
Dick Eastman Online
2/21/2001 - Archive


More on LDS Church’s Adoption of the XML Standard
Last week I wrote an article titled "LDS Family and Church History Department Adopts XML Standard." In the article, I wrote, "Randy Bryson of The Church of Jesus Christ of Latter-day Saints announced that the Church is now standardizing on the XML programming language for all future software products."

That article generated a lot of follow-up e-mail messages. Lots of people asked about this new standard while others asked, "What is XML?" I will say that XML is short for "eXtensible Markup Language," a markup language that is becoming very popular on the World Wide Web. For any further details, I would refer the reader to XML.com or any of several excellent technical books on the subject.

As to the first question, "What is this new standard?" I will say that this announcement is an indication of the Family and Church History Department’s future directions. It is not yet a defined specification. It simply is an announcement that a new file standard will be created and that it will be based upon XML.

In the past week several people have expressed concerns about other parts of the announcement. While most people applaud the adoption of XML, some were disappointed that the Family and Church History Department apparently is looking only at XML as a replacement method that will be used to handle the same data elements as today’s GEDCOM standard. Many genealogists were hoping that the Family and Church History Department would improve the data handling to avoid the ambiguities and shortcomings of today’s data handling.

I was planning to write some more details in this week’s newsletter about the XML announcement by the Family and Church History Department. As I was struggling with the wording, I received a copy of a message that Bob Velke posted to the TMG-L mailing list in response to questions triggered by this newsletter. Bob is President of Wholly Genes Software, the producers of The Master Genealogist. He is also a member of the GENTECH Lexicon Working Group and an accomplished programmer who is an expert in GEDCOM. In short, he is much more knowledgeable about the ins and outs of GEDCOM and XML than I am. His message explains the adoption of XML and the potential shortcomings much more clearly than I could.

Bob Velke has kindly given permission for me to re-publish his message in its entirety in this newsletter:

As Dick Eastman reported in his Feb. 12 newsletter, the Family and Church History Department of the LDS Church has announced its intention to use XML for all future software products, including future versions of GEDCOM. While I agree that the move toward XML "eventually will benefit all genealogists," I've received a flood of e-mails from researchers wondering what XML is and in what way it may have (in Dick’s words) "an immediate impact on producers of genealogy software."

In a nutshell, XML is a generalized "markup language": a way of defining "tags" or codes for data within a document to make it easier to interpret and communicate. XML shares a common ancestry and in many ways looks similar to HTML (the stuff behind most Web pages). The strength of XML, however, is in the fact that it is extensible (XML=eXtensible Markup Language), meaning that it is not limited to supporting any particular data types. Using XML, someone (like the folks at the LDS Church) can define a set of "supported" data types (person, event, date, place, etc.) and the appropriate relationships between them. That set of "rules" is then published and becomes the template to which two or more programs can map their data, resulting in a system for exchanging genealogical data.
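To make the idea of "tags" and published "rules" concrete, here is a minimal sketch in Python. The tag and attribute names (genealogy, person, event, date, place) are invented for illustration and are assumptions; they do not come from any published GEDCOM or LDS specification. The point is only that a general-purpose XML parser can read the data once the vocabulary is agreed upon.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these tag and attribute names are invented for
# illustration and do not come from any published GEDCOM or LDS schema.
SAMPLE = """
<genealogy>
  <person id="I1">
    <name>John Smith</name>
    <event type="birth">
      <date>1850-03-12</date>
      <place>Boston, Massachusetts</place>
    </event>
  </person>
  <person id="I2">
    <name>Mary Jones</name>
    <event type="birth">
      <date>1852-07-01</date>
      <place>Salem, Massachusetts</place>
    </event>
  </person>
</genealogy>
"""


def read_births(xml_text):
    """Return a (name, date, place) tuple for each person's birth event."""
    root = ET.fromstring(xml_text.strip())
    births = []
    for person in root.findall("person"):
        name = person.findtext("name")
        # Select only <event> children whose type attribute is "birth".
        for event in person.findall("event[@type='birth']"):
            births.append((name, event.findtext("date"), event.findtext("place")))
    return births


if __name__ == "__main__":
    for row in read_births(SAMPLE):
        print(row)
```

Nothing in the parser is specific to genealogy; only the agreed set of tags is. That is the sense in which XML is the language for writing the rules rather than the rules themselves.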

So XML is not a communication standard in itself. It is the language with which the rules of a communication standard can be defined. GEDCOM 5.5 (and previous) did something similar but it used a proprietary language for expressing those rules. XML, on the other hand, is a worldwide standard. Given access to the file containing the GEDCOM-specific "rules," any XML-aware application (browser, editor, publisher, indexer, search engine, etc.) will automatically become a GEDCOM-aware application.

Any software developer who is surprised by the LDS announcement must have had his head in the sand for the last several years. The entire programming world has embraced XML for years, especially in relation to Web integration and data exchange. As evidenced by any technical journal or computer bookstore, XML is all the rage. If the LDS Church or any other developer announced that it did NOT intend to embrace XML in future products, then THAT would be news <g>.

But the LDS Church did not announce the release of an XML version of GEDCOM. They merely announced the intention to do so. Software developers should know that the Church made the same announcement in September 2000 at the FGS Conference. It was widely reported on the GEDCOM-L mailing list and others and is no secret. So while this may be interesting news to researchers, it should not have any "immediate impact" on software developers . . . except perhaps as a wake-up call for those who have not been paying attention.

At that same conference last year, however, the LDS Church made an announcement that was much more startling and is more likely to have a measurable impact on researchers. As readers may know, GEDCOM was originally developed as a means for LDS members to communicate family history data with the Church and among themselves. Over the years, the larger genealogical community (developers and non-LDS researchers) appealed to the Family and Church History Department to expand and modify GEDCOM to accommodate genealogical issues that are not among the Church's core needs. The Family and Church History Department modified its original mission and in GEDCOM 5.5 and the "Future Directions" data model tried to address those needs as a service to that larger community.

At the FGS conference, however, representatives of the LDS Church announced that they have come to realize that their efforts to accommodate the larger genealogical community have diminished their ability to pursue their original mission to serve Church members. As a result, they have resolved to re-focus on that original mission, even to the extent that "professional researchers" (their words), sociologists, and others may not be accommodated.

No one should begrudge the LDS Church its right to direct its efforts in ways that best serve its membership. But the message for the rest of the genealogical community is that we should no longer expect that GEDCOM will evolve in any direction that diverges from the core needs of the LDS Church.

Specifically, the LDS representatives announced that they have abandoned the "Future Directions" data model and that future products will revert to a conceptual blueprint of genealogical data that more closely matches that of GEDCOM 5.5. That is bad news for software developers and researchers who had hoped that GEDCOM would evolve in more worldly directions in order to overcome its many limitations.

In the most startling part of the announcement, however, the LDS representatives reported that they have abandoned any plans for future versions of GEDCOM to distinguish evidence from conclusions! While the lack of such a distinction is widely seen as the single most glaring weakness of GEDCOM 5.5, the issue was described by LDS representatives as "not fundamental to the needs of our Church members."

As a result of these revelations, some developers and researchers question whether ANY future announcement about GEDCOM is likely to have any measurable impact (immediate or otherwise) on the genealogical community. Absent any meaningful evolution of the underlying data model, GEDCOM’s shift towards XML is largely one of representation alone. That is, if the "rules" support the same limited data types and relationships, a new wrapper is just window-dressing.

The proposal to rewrap the same old GEDCOM within XML should be anti-climactic for developers who saw Michael Kay do the same thing several years ago, posting it to the Web as "GEDML". But while the power of XML alone does offer a few advantages (better support for diacritic characters, multimedia links, and text formatting, for instance), the most substantial problems of data transfer are perpetuated by GEDCOM’s limited data model. As far as the completeness and accuracy of data transfers are concerned, it seems that v5.5 may be as good as GEDCOM will ever get.
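The following sketch illustrates what "rewrapping" means in practice. It converts a few GEDCOM-style lines (level number, optional cross-reference, tag, value) into nested XML elements using Python. It is an assumption-laden illustration only: the wrapper element name GED and the ID attribute are invented here, the parser ignores most of the GEDCOM grammar, and it is not the actual GEDML format or any LDS draft.

```python
import xml.etree.ElementTree as ET

# A tiny GEDCOM-style fragment with made-up data, for illustration only.
GEDCOM_SAMPLE = """\
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 12 MAR 1850
2 PLAC Boston, Massachusetts
"""


def gedcom_to_xml(text):
    """Nest GEDCOM-style lines into XML elements by their level numbers."""
    root = ET.Element("GED")  # hypothetical wrapper element name
    stack = [root]  # stack[n] is the parent for a level-n line
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.strip().split(" ", 2)
        level = int(parts[0])
        xref = None
        if parts[1].startswith("@"):
            # Lines such as "0 @I1@ INDI" carry a cross-reference before the tag.
            xref = parts[1].strip("@")
            rest = parts[2].split(" ", 1)
            tag = rest[0]
            value = rest[1] if len(rest) > 1 else ""
        else:
            tag = parts[1]
            value = parts[2] if len(parts) > 2 else ""
        # Drop any elements deeper than this line's parent.
        del stack[level + 1:]
        elem = ET.SubElement(stack[level], tag)
        if xref:
            elem.set("ID", xref)
        if value:
            elem.text = value
        stack.append(elem)
    return root


if __name__ == "__main__":
    print(ET.tostring(gedcom_to_xml(GEDCOM_SAMPLE), encoding="unicode"))
```

The output carries exactly the same tags and values as the input, only with angle brackets around them. A change of syntax like this adds no new kinds of data or relationships, which is why the data model matters more than the wrapper.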

Indeed, the introduction of yet another GEDCOM specification with minimal improvements may simply complicate the lives of researchers, diluting what semblance of a "standard" has been forged over the last few years. Add to that the likelihood (if history is any judge) that a variety of draft specifications will be implemented in commercial products, and you have a recipe for even less reliable communication. At least three such GEDCOM drafts (v5.5.1, v5.6, and v6.0) are already in limited circulation.

Until a communication standard is developed that will distinguish evidence from conclusions, recognize non-traditional or ambiguous family constructs (e.g., children with only one parent known to be in common, etc.), and resolve many of the other failings of the current "standard," developers and researchers will have little reason to migrate away from GEDCOM 5.5. We are at least accustomed to dealing with those limitations and its various software permutations ("Better the devil you know . . ."). Unless or until the LDS Church announces an intention to discontinue support for GEDCOM 5.5 within its future products (a step I hope would be recognized as unwise), I see very little incentive for software developers to scramble to support GEDCOM XML 6.0.
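For readers wondering what "distinguishing evidence from conclusions" might look like in a data model, here is one hedged sketch in Python. The class names, fields, and sample records are all hypothetical and invented for illustration; they are not taken from GEDCOM, the GENTECH GDM, or any product. The idea is simply that what a source says is stored separately from what the researcher has decided is true, with links between the two.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """What a single source says, recorded as found (hypothetical structure)."""
    source: str
    claim: str


@dataclass
class Conclusion:
    """What the researcher believes, linked to the evidence for and against."""
    statement: str
    supported_by: list = field(default_factory=list)
    contradicted_by: list = field(default_factory=list)


def sources_for(conclusion):
    """List the names of the sources that support a conclusion."""
    return [item.source for item in conclusion.supported_by]


# Made-up records for illustration only.
bible = Evidence("Family Bible", "John Smith born 12 Mar 1850")
census = Evidence("1860 census", "John Smith, age 10")
gravestone = Evidence("Gravestone", "John Smith born 1851")

birth = Conclusion(
    statement="John Smith was born 12 Mar 1850",
    supported_by=[bible, census],
    contradicted_by=[gravestone],
)

if __name__ == "__main__":
    print(birth.statement)
    print("Supported by:", sources_for(birth))
    print("Contradicted by:", [item.source for item in birth.contradicted_by])
```

In a model with only a single birth fact per person, the conflicting gravestone date would either overwrite the conclusion or be lost. Keeping the two layers apart lets both survive a transfer between programs.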

So what is the prospect of an effective replacement for GEDCOM 5.5? Dick Eastman described Wholly Genes Software’s GenBridge technology as one alternative to GEDCOM. As users of The Master Genealogist know, GenBridge reads data directly from the most popular program formats and typically produces a much more complete and accurate data transfer than is possible with GEDCOM. But GenBridge does not produce an intermediate or archival format and it does not have an export element. That is, it is strictly an import technology. So while it does provide much better results than GEDCOM for specific tasks (and you will soon see some Web databases using GenBridge to process submissions in FTW, PAF, and other formats), that is not the same as a communication standard. GenBridge has never been intended as a wholesale replacement of GEDCOM.

The need for improved communication of genealogical data is not a new idea, and many efforts are being made in that direction (see links below). But progress is much slower than anyone would like. GENTECH’s Lexicon Working Group (sponsored in part by APG, BCG, FGS, NEHGS, and NGS) completed its first phase of work with the publication of the GENTECH Genealogical Data Model (GDM), which was the first organized attempt to thoroughly document the nature of genealogical data. The GDM took several years to develop, but it has spawned a number of data modeling discussions among developers and has furthered the goal of developing a common genealogical lexicon, both of which are prerequisites for any new communication standard.

The next phase of GENTECH’s efforts (the "LeXML project") is implementing the GDM within an XML framework. The LDS Church has expressed a willingness to participate in LeXML and to sanction the project if it can be developed in such a way that, as a subset, it supports their data model and objectives. I believe that this cooperative effort offers the best prospects for a long-term solution that will serve the interests of the entire genealogical community—but I believe that it is likely to be several more years before researchers see a new communication standard that is developed through it or any of the similar efforts listed below.

Bob Velke
President, Wholly Genes Software
and member of the GENTECH Lexicon Working Group

Referenced links and other related projects:

I would like to thank Bob Velke for two things: (1) writing an expert’s explanation of the issues, and (2) allowing me to publish his explanation in this newsletter.
