That article generated a lot of follow-up e-mail messages. Lots
of people asked about this new standard while others asked, "What is XML?"
I will say that XML is short for "eXtensible Markup Language," a
markup language that is becoming very popular on the World Wide Web. For
any further details, I would refer the reader to XML.com or any of several excellent technical books on the subject.
As to the first question, "What is this new standard?"
I will say that this announcement is an indication of the Family and Church History
Department’s future directions. It is not yet a defined specification. It
simply is an announcement that a new file standard will be created and that it
will be based upon XML.
In the past week several people have expressed concerns about
other parts of the announcement. While most people applaud the adoption of XML,
some were disappointed that the Family and Church History Department apparently is looking
only at XML as a replacement method that will be used to handle the same data
elements as today’s GEDCOM standard. Many genealogists were hoping that the
Family and Church History Department would improve the data handling to avoid the
ambiguities and shortcomings of today’s data handling.
I was planning to write some more details in this week’s
newsletter about the XML announcement by the Family and Church History Department. As I was
struggling with the wording, I received a copy of a message that Bob Velke
posted to the TMG-L mailing list in response to questions triggered by this
newsletter. Bob is President of Wholly Genes Software, the producers of The
Master Genealogist. He is also a member of the GENTECH Lexicon Working Group and
is an accomplished programmer who is an expert in GEDCOM. In short, he is an
expert on the topic and much more knowledgeable about the ins and outs of GEDCOM
and XML than I am. His message explains the adoption of XML and the potential
shortcomings much more clearly than I could.
Bob Velke has kindly given permission for me to re-publish his
message in its entirety in this newsletter:
As Dick Eastman reported in his Feb. 12 newsletter, the
Family and Church History Department of the LDS Church has announced its intention to
use XML for all future software products, including future versions of
GEDCOM. While I agree that the move toward XML "eventually will benefit
all genealogists," I've received a flood of e-mails from researchers
wondering what XML is and in what way it may have (in Dick’s words)
"an immediate impact on producers of genealogy software."
In a nutshell, XML is a generalized "markup
language": a way of defining "tags" or codes for data within
a document to make it easier to interpret and communicate. XML shares a
common ancestry and in many ways looks similar to HTML (the stuff behind
most Web pages). The strength of XML, however, is in the fact that it is
extensible (XML=eXtensible Markup Language), meaning that it is not limited
to supporting any particular data types. Using XML, someone (like the folks
at the LDS Church) can define a set of "supported" data types (person,
event, date, place, etc.) and the appropriate relationships between them.
That set of "rules" is then published and becomes the template to
which two or more programs can map their data, resulting in a system for
exchanging genealogical data.
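As a rough illustration of the idea of tagged data, the sketch below marks up one person with a birth event and reads it back with Python's standard XML parser. The element and attribute names (genealogy, person, event, date, place, id, type) are invented for this example and are not taken from any GEDCOM specification or LDS announcement.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: these element and attribute names are invented
# for illustration and do not come from any published GEDCOM rules.
DOCUMENT = """
<genealogy>
  <person id="I1">
    <name>John Smith</name>
    <event type="birth">
      <date>12 FEB 1850</date>
      <place>Salem, Massachusetts</place>
    </event>
  </person>
</genealogy>
"""


def read_people(xml_text):
    """Return one dict per <person>, holding its id, name, and events."""
    root = ET.fromstring(xml_text)
    people = []
    for person in root.findall("person"):
        events = [
            {
                "type": event.get("type"),
                "date": event.findtext("date"),
                "place": event.findtext("place"),
            }
            for event in person.findall("event")
        ]
        people.append(
            {
                "id": person.get("id"),
                "name": person.findtext("name"),
                "events": events,
            }
        )
    return people


people = read_people(DOCUMENT)
print(people[0]["name"], people[0]["events"][0]["date"])
```

Two programs that agree on the same set of tags could exchange a file like this without either one knowing how the other stores its data internally.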
So XML is not a communication standard in itself. It is the
language with which the rules of a communication standard can be defined.
GEDCOM 5.5 (and previous) did something similar but it used a proprietary
language for expressing those rules. XML, on the other hand, is a worldwide
standard. Given access to the file containing the GEDCOM-specific
"rules," any XML-aware application (browser, editor, publisher,
indexer, search engine, etc.) will automatically become a GEDCOM-aware
application.
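To make the point about generic tools concrete, here is a minimal sketch of a function that knows nothing about genealogy yet can still walk any well-formed XML document and list what it contains. The sample document and its tag names are hypothetical. Checking a file against the published "rules" would need a schema-aware parser, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET


def list_paths(xml_text):
    """List (path, text) for every element that carries text.

    The function has no knowledge of the vocabulary; it simply follows
    the nesting that any XML parser exposes.
    """
    root = ET.fromstring(xml_text)
    found = []

    def walk(element, prefix):
        path = prefix + "/" + element.tag
        text = (element.text or "").strip()
        if text:
            found.append((path, text))
        for child in element:
            walk(child, path)

    walk(root, "")
    return found


# Hypothetical sample; the tag names are assumptions for illustration.
SAMPLE = (
    "<genealogy><person><name>Mary Jones</name>"
    "<event><place>Boston</place></event></person></genealogy>"
)

for path, text in list_paths(SAMPLE):
    print(path, "=", text)
```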
Any software developer who is surprised by the LDS
announcement must have had his head in the sand for the last several years.
The entire programming world has embraced XML for years, especially in
relation to Web integration and data exchange. As evidenced by any technical
journal or computer bookstore, XML is all the rage. If the LDS Church or any other
developer announced that it did NOT intend to embrace XML in future
products, then THAT would be news <g>.
But the LDS Church did not announce the release of an XML version of
GEDCOM. They merely announced the intention to do so. Software developers
should know that the Church made the same announcement in September 2000 at the
FGS Conference. It was widely reported on the GEDCOM-L mailing list and
others and is no secret. So while this may be interesting news to
researchers, it should not have any "immediate impact" on software
developers . . . except perhaps as a wake-up call for those who have not been
paying attention.
At that same conference last year, however, the LDS Church made an
announcement that was much more startling and is more likely to have a
measurable impact on researchers. As readers may know, GEDCOM was originally
developed as a means for LDS members to communicate family history data with
the Church and among themselves. Over the years, the larger genealogical
community (developers and non-LDS researchers) appealed to the Family and Church History Department to expand and modify GEDCOM to
accommodate genealogical issues that are not among the Church's core needs.
The Family and Church History Department modified its original mission and in GEDCOM 5.5 and the "Future
Directions" data model tried to address those needs as a service to
that larger community.
At the FGS conference, however, representatives of the LDS Church
announced that they have come to realize that their efforts to accommodate
the larger genealogical community have diminished their ability to pursue
their original mission to serve Church members. As a result, they have
resolved to re-focus on that original mission, even to the extent that
"professional researchers" (their words), sociologists, and others
may not be accommodated.
No one should begrudge the LDS Church its right to direct its
efforts in ways that best serve its membership. But the message for the rest
of the genealogical community is that we should no longer expect that GEDCOM
will evolve in any direction that diverges from the core needs of the LDS
Church.
Specifically, the LDS representatives announced that they
have abandoned the "Future Directions" data model and that future
products will revert to a conceptual blueprint of genealogical data that
more closely matches that of GEDCOM 5.5. That is bad news for software
developers and researchers who had hoped that GEDCOM would evolve in more
worldly directions in order to overcome its many limitations.
In the most startling part of the announcement, however, the
LDS representatives reported that they have abandoned any plans for future
versions of GEDCOM to distinguish evidence from conclusions! While the lack
of such a distinction is widely seen as the single most glaring weakness of
GEDCOM 5.5, the issue was described by LDS representatives as "not
fundamental to the needs of our Church members."
As a result of these revelations, some developers and
researchers question whether ANY future announcement about GEDCOM is likely
to have any measurable impact (immediate or otherwise) on the genealogical
community. Absent any meaningful evolution of the underlying data model,
GEDCOM’s shift towards XML is largely one of representation alone. That
is, if the "rules" support the same limited data types and
relationships, a new wrapper is just window-dressing.
The proposal to rewrap the same old GEDCOM within XML should
be anti-climactic for developers who saw Michael Kay do the same thing
several years ago, posting it to the Web as "GEDML".
But while the power of XML alone does offer a few advantages (better support
for diacritic characters, multimedia links, and text formatting, for
instance), the most substantial problems of data transfer are perpetuated by
GEDCOM’s limited data model. As far as the completeness and accuracy of data
transfers are concerned, it seems that v5.5 may be as good as GEDCOM will ever get.
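The "new wrapper" point can be seen in a small sketch that rewraps GEDCOM-style lines as nested XML elements, one element per line. This is only an illustration of the general approach, not a reproduction of GEDML or of any LDS draft: the root element name GED and the ID attribute are assumptions, and the code expects well-formed input in which each level is at most one deeper than the line before it.

```python
import xml.etree.ElementTree as ET


def gedcom_to_xml(gedcom_text):
    """Rewrap GEDCOM-style lines as nested XML, one element per line."""
    root = ET.Element("GED")  # assumed root name, for illustration only
    # stack[n] is the parent for a line at level n; stack[0] is the root.
    stack = [root]
    for raw in gedcom_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split(" ", 2)
        level = int(parts[0])
        xref = None
        if len(parts) > 1 and parts[1].startswith("@"):
            # Form: level @xref@ TAG [value]
            xref = parts[1].strip("@")
            rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
            tag = rest[0]
            value = rest[1] if len(rest) > 1 else ""
        else:
            # Form: level TAG [value]
            tag = parts[1]
            value = parts[2] if len(parts) > 2 else ""
        # Discard any open elements deeper than this line's parent.
        del stack[level + 1:]
        element = ET.SubElement(stack[level], tag)
        if xref:
            element.set("ID", xref)
        if value:
            element.text = value
        stack.append(element)
    return root


SAMPLE = """0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 12 FEB 1850
2 PLAC Salem, Massachusetts
"""

xml_text = ET.tostring(gedcom_to_xml(SAMPLE), encoding="unicode")
print(xml_text)
```

The tags and their nesting come straight from the input lines, so nothing is gained or lost in the translation: whatever the original data model could not express, the XML version cannot express either.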
Indeed, the introduction of yet another GEDCOM specification
with minimal improvements may simply complicate the lives of researchers,
diluting what semblance of a "standard" has been forged over the
last few years. Add to that the likelihood (if history is any judge) that a
variety of draft specifications will be implemented in commercial products,
and you have a recipe for even less reliable communication. At least three
such GEDCOM drafts (v5.5.1, v5.6, and v6.0) are already in limited
circulation.
Until a communication standard is developed that will
distinguish evidence from conclusions, recognize non-traditional or
ambiguous family constructs (e.g., children with only one parent known to be
in common, etc.), and resolve many of the other failings of the current
"standard," developers and researchers will have little reason to
migrate away from GEDCOM 5.5. We are at least accustomed to dealing with
those limitations and its various software permutations ("Better the
devil you know . . ."). Unless or until the LDS Church announces an intention to
discontinue support for GEDCOM 5.5 within its future products (a step I hope
would be recognized as unwise), I see very little incentive for software
developers to scramble to support GEDCOM XML 6.0.
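For readers wondering what "distinguishing evidence from conclusions" might look like in practice, the following is a purely hypothetical sketch. The class names, fields, and sample records are invented for illustration and are not drawn from the GENTECH data model, any GEDCOM draft, or any existing product.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """What a single source says, recorded as found."""
    source: str
    claim: str


@dataclass
class Conclusion:
    """A researcher's judgment, linked to the evidence it rests on."""
    statement: str
    supporting: list = field(default_factory=list)
    conflicting: list = field(default_factory=list)


# Hypothetical records, invented for this example.
census = Evidence("1850 census (hypothetical)", "John Smith, age 30")
bible = Evidence("Family Bible (hypothetical)", "John Smith born 3 MAR 1821")

birth = Conclusion(
    "John Smith was born 3 MAR 1821",
    supporting=[bible],
    conflicting=[census],
)

print(birth.statement)
print("supported by:", [e.source for e in birth.supporting])
print("in conflict with:", [e.source for e in birth.conflicting])
```

A format that stores only the final birth date loses the trail back to the two sources; keeping the records separate lets a later researcher see why the conclusion was reached and what argues against it.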
So what is the prospect of an effective replacement for
GEDCOM 5.5? Dick Eastman described Wholly Genes Software’s GenBridge
technology as one alternative to GEDCOM. As users of The Master Genealogist
know, GenBridge reads data directly from the most popular program formats
and typically produces a much more complete and accurate data transfer than
is possible with GEDCOM.
But GenBridge does not produce an intermediate or archival format and it
does not have an export element. That is, it is strictly an import
technology. So while it does provide much better results than GEDCOM for
specific tasks (and you will soon see some Web databases using GenBridge to
process submissions in FTW, PAF, and other formats), that is not the same as
a communication standard. GenBridge has never been intended as a wholesale
replacement of GEDCOM.
The need for improved communication of genealogical data is
not a new idea, and many efforts are being made in that direction (see links
below). But progress is much slower than anyone would like. GENTECH’s
Lexicon Working Group (sponsored in part by APG, BCG, FGS, NEHGS, and NGS)
completed its first phase of work with the publication of the GENTECH
Genealogical Data Model (GDM),
which was the first organized attempt to thoroughly document the nature of
genealogical data. The GDM took several years to develop, but it has spawned
a number of data modeling discussions among developers and has furthered the
goal of developing a common genealogical lexicon, both of which are
prerequisites for any new communication standard.
The next phase of GENTECH’s efforts (the "LeXML
project") is implementing the GDM within an XML framework. The LDS Church has
expressed a willingness to participate in LeXML and to sanction the project
if it can be developed in such a way that, as a subset, it supports their
data model and objectives. I believe that this cooperative effort offers the
best prospects for a long-term solution that will serve the interests of the
entire genealogical community, but I believe that it is likely to be
several more years before researchers see a new communication standard that
is developed through it or any of the similar efforts listed below.
Bob Velke
President, Wholly Genes Software
and member of the GENTECH Lexicon Working Group
Referenced links and other related projects:
I would like to thank Bob Velke for two things: (1) writing an
expert’s explanation of the issues, and (2) allowing me to publish his
explanation in this newsletter.