At a technical session of the GENTECH 2001 conference last week,
Randy Bryson of The Church of Jesus Christ of Latter-day Saints announced that the Church is now standardizing on XML, a markup language for structured data,
for all future software products. This announcement will have an immediate
impact on producers of genealogy software and eventually will benefit all
genealogists.
Mr. Bryson is the director of the FamilySearch Internet
Genealogy Service for the LDS Family and Church History Department and also is the
Information Technology manager over the Ancestral File, Resource Files, Research
Guidance and Extraction applications. As such, he is responsible for
compatibility among these products. The de facto data exchange standard for many
years has been GEDCOM, a file format that is well-known for its imperfections.
GEDCOM, an abbreviation for GEnealogical Data COMmunication, was created by the
LDS Church in the mid-1980s as a method of exchanging genealogy data between
different programs. The specifications for GEDCOM file format have been updated
a few times since then, and GEDCOM files have become the most common method of
exchanging data between distant relatives. GEDCOM files also are used to
contribute an individual’s data to the large, centralized databases of the LDS
Church and other organizations.
In their first iteration, GEDCOM files consisted of ASCII text.
Unlike the binary files used by most other programs, you can open a GEDCOM file with
a simple text editor and read the data contained therein. Later versions of
GEDCOM were expanded to include ANSEL and Unicode, in addition to ASCII. Because
of these updates, GEDCOM files can now handle umlauts and accents and other
marks common in European alphabets. However, you can still read this data with a
text editor, such as Windows Notepad.
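Because a GEDCOM file is plain text, a program can read it one line at a time. Each line carries a level number, an optional cross-reference ID wrapped in @ signs, a tag, and an optional value. The following is only a minimal sketch in Python, not the code of any particular genealogy program; the function name parse_gedcom_line and the sample record are made up for illustration.

```python
def parse_gedcom_line(line):
    """Split one GEDCOM line into (level, xref, tag, value).

    Hypothetical helper for illustration only; it ignores details
    such as continuation lines and character-set handling.
    """
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@"):
        # Form: level @XREF@ TAG [value]
        xref = parts[1]
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        tag = rest[0]
        value = rest[1] if len(rest) > 1 else ""
    else:
        # Form: level TAG [value]
        xref = None
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
    return level, xref, tag, value


# A made-up sample record, as it would appear in a text editor.
sample = [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1 JAN 1850",
]

for text in sample:
    print(parse_gedcom_line(text))
```

The level numbers are what give the flat text its hierarchy: the DATE line at level 2 belongs to the BIRT line at level 1 above it.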
GEDCOM has always suffered from numerous shortcomings, one
limitation being its reliance on plain text. Other limitations have included difficulties
with handling non-European names, imprecise data, and the contradictory data
that we all find in genealogy research.
In the 1990s, two separate and exhaustive studies of exchanging
data between genealogy programs were made. The two were conducted more or less
simultaneously:
- One study was the GEDCOM Testbook Project, funded by GENTECH. The results
of that project are called "GEDCOM Interchange Study Summary."
The GENTECH effort later spun off a second, larger study, called the GENTECH
Genealogical Data Model. While not dealing directly with the GEDCOM standard,
it does address many issues that GEDCOM programmers need to be familiar with.
- The other study was conducted by the Family and Church History Department of the LDS
Church. It resulted in the GEDCOM Future Directions document, published by the
Family and Church History Department. [To view this document, click on the link above; then double-click the "Future" folder and open the Gedfmstr.pdf document, a PDF file that requires Adobe Acrobat Reader to read.]
The two studies differed in scope and purpose. Their
conclusions and recommendations also differed, although they
overlapped in some ways. It is interesting to note that the XML standard was mostly
unknown at the time these studies began but came into prominence before the
conclusion of these studies. While XML was not cited as a specific
recommendation in either study, I have since heard the authors of both studies
make reference to XML as a possible solution to some of the shortcomings of
today’s methodologies.
XML is an abbreviation for "Extensible Markup
Language," a markup language for describing structured data that has become very popular for
applications that function on the World Wide Web. If you have made airline
reservations online or purchased other goods from an online merchant, you have
probably used an XML-based application without realizing it. A discussion of XML
is beyond the scope of this article. For reference, I would suggest you start at XML.com
or with any of the many good books on the topic available at your local
bookstore.
I also should mention another response to GEDCOM’s
shortcomings: Wholly Genes Software created GenBridge, a different method of
directly transferring data between different databases that does not use GEDCOM
at all. While Wholly Genes has had great success with GenBridge, other software
producers have not yet adopted it.
Randy Bryson’s announcement of the adoption of XML illustrates
the LDS Church’s concerns and plans. Obviously, the programmers at the Family and Church History Department have read these two studies and are proceeding with some of
the recommendations. The introduction of XML will increase accuracy as well as
allow for the use of non-European characters. A future release of the GEDCOM
standard will be XML-based. The LDS databases, such as the Ancestral File,
Pedigree Resource File, International Genealogical Index, and others, will
also accept XML data.
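No XML-based GEDCOM specification has been published yet, so any example is guesswork. Still, a rough sketch may help show why XML suits this kind of data: GEDCOM's level numbers map naturally onto nested XML elements. In the Python sketch below, the function name records_to_xml, the root element name "gedcom," and the choice to use lowercased GEDCOM tags as element names are all my own assumptions, not part of any announced format.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records):
    """Nest (level, tag, value) tuples into an XML tree.

    Illustrative only: the element names are assumptions, since
    no XML-based GEDCOM format has been defined.
    """
    root = ET.Element("gedcom")  # hypothetical root element name
    stack = [root]  # stack[i] is the parent for a line at level i
    for level, tag, value in records:
        # Drop any elements deeper than this line's parent.
        del stack[level + 1:]
        elem = ET.SubElement(stack[level], tag.lower())
        if value:
            elem.text = value
        stack.append(elem)
    return root


# A made-up individual record expressed as (level, tag, value).
records = [
    (0, "INDI", ""),
    (1, "NAME", "John /Smith/"),
    (1, "BIRT", ""),
    (2, "DATE", "1 JAN 1850"),
]

tree = records_to_xml(records)
print(ET.tostring(tree, encoding="unicode"))
```

The sketch assumes the records are well formed, with each level at most one deeper than the line before it. Whatever the final format looks like, the hierarchy would be carried by the markup itself rather than by level numbers at the start of each line.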
My guess is that the commercial Internet genealogy databases (Ancestry.com,
Genealogy.com, OneGreatFamily.com, etc.) will also convert to XML input, perhaps
even before the LDS Church completes its conversion. Obviously, all the
genealogy programs used by individuals will also need to produce XML-formatted
GEDCOM files in compliance with the new specification. I am sure we will see
future versions of The Master Genealogist, Personal Ancestral File, Family Tree
Maker, Family Origins, Legacy, and other genealogy programs that will produce XML
files, once the new GEDCOM replacement format has been defined.
None of this exists today. Randy Bryson’s announcement simply
indicates a future course. I suspect it will be two years or even longer before
the new XML format is in place and in use. However, the benefits will justify
the wait.