I frequently mention the acronym "GEDCOM" in this newsletter.
This week, a reader wrote to me with an excellent question: "What is GEDCOM?"
I realized that I haven't explained that buzzword in a long, long time.
So here is a brief, non-technical explanation of the term for the newer subscribers
to this publication.
GEDCOM is an abbreviation that stands for GEnealogical
Data COMmunication. In short, GEDCOM is the language by which
different genealogy software programs talk to one another. The purpose is to
exchange data between dissimilar programs without having to manually re-enter
all the data on a keyboard.
To illustrate the importance of GEDCOM, step back in time with
me for a moment. Back before the invention of GEDCOM and before the invention
of the home computer, I entered onto eighty-column punch cards the names and
limited information about 200 or so of my ancestors. I did this after hours
in my employer's data center. I then used the employer's mainframe
computer that cost hundreds of thousands of dollars to sort the data and to
print a few crude reports. Luckily for me, my employer allowed me to use all
the mainframe time I wanted during the evening, after the company finished its
daily work.
Around 1980, I built my own home computer. I decided to put my
genealogy database onto the new system, but it would not read eighty-column
punch cards. I manually re-typed every bit of data into a dBASE-II program that
I wrote. My database had grown; I had to enter data on 400 or so individuals.
I stored the information on eight-inch floppy disks attached to my homemade
eight-bit CP/M computer that had 64 kilobytes of memory.
Some time later I discovered a CP/M genealogy program that would
operate on my system. Unlike my crude, homemade program, this new genealogy
program printed pedigree charts, family group sheets, and other reports. I decided
to convert to the new, more powerful program (although I must say that it was
rather elementary when compared to today's powerful programs). My database
had grown to about 600 individuals, and I could not find any method of easily
copying that data into the new program. I first printed out the information
from the dBASE-II database. Then I sat at my computer for several evenings,
reading the information on paper and re-typing every bit of it into my new program.
I bet you can guess the next step: I purchased an IBM clone in
1984 and decided to move my data to this new powerhouse. After all, it had 640
kilobytes of memory and a 20-megabyte hard drive that I was certain that I could
never fill. Having been rather active in my genealogy research, now I had information
about 1,200 people to re-enter. I printed out the entire database from the old
system onto paper and then manually re-typed it into the new PC powerhouse.
That effort took weeks, and I promised myself, "Never again!"
Newer genealogy programs appeared in the following years, each
with new features that I found enticing. However, I continued to use the same
program simply because I didn't want to go through the keyboard effort
again. Then the Church of Jesus Christ of Latter-day Saints announced something
new: a file format called GEDCOM. This new proposed standard file format was
designed to allow different genealogy programs to exchange data. There was only
one problem at the time: the only program that could read and write GEDCOM data
was the one written by The Church of Jesus Christ of Latter-day Saints.
GEDCOM is a standard, not a program. As such, genealogy programs
that are going to use the same data have to be written by the programmers to
handle GEDCOM files. If you are trying to transfer data from one program to
another, only to discover that only one of the programs supports GEDCOM, you
are out of luck. Instead, both programs have to support GEDCOM.
Slowly, over a period of several years, other genealogy programs
began to add the ability to read and write GEDCOM files. It was now possible
to move data from one genealogy program to another without manually re-typing
everything. The author of the genealogy program that I used never did add GEDCOM
capability. Luckily for me, someone else eventually wrote a small routine that
would export data from this program in GEDCOM format, and I was then able to
move my data to more powerful new programs.
By 1990, I was writing articles on CompuServe, advising everyone
to never use a genealogy program that lacked GEDCOM capabilities. Luckily, that
is not much of an issue this year. All of today's major genealogy programs
will import and export GEDCOM data. Data transfer is still a problem for those
using older genealogy programs without GEDCOM capability; many people still
find their data trapped in these "islands." For them, there is no
easy solution.
Unlike the "dark ages" of the 1980s, it is now common
for people to use two or three or even more genealogy programs. You may find
one program that you prefer to use for storing all the bits of information that
you encounter in your research efforts. However, you might prefer the printed
reports or multimedia scrapbook features of a different program. Thanks to GEDCOM,
you can easily move your data from one program to another. You can also share
information with distant cousins using yet other genealogy programs by sending
GEDCOM files to each other by e-mail.
The instructions for creating or reading GEDCOM files will vary
from one program to another. You need to consult the program's HELP files
to find the exact sequence of instructions your genealogy program requires.
You need to be aware that the GEDCOM standard is not perfect.
For one thing, not all the data fields are
specified precisely in the GEDCOM specifications. Next, not all the programmers
of the various genealogy programs interpreted the specifications in exactly
the same manner. For instance, your present genealogy program might be perfectly
happy with a birth date listed as "after 1847 but before 1852." However,
once that information is exported in a GEDCOM file and then imported into a
different program, the birth date may say something else. Typically, it is simply
left blank.
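For readers who are curious about what this looks like inside the file, here is a minimal sketch in Python. It assumes the GEDCOM 5.5 convention of writing a date range as "BET 1847 AND 1852," which is roughly how an exporting program would express the example above. The two importer functions, strict_import and tolerant_import, are hypothetical and invented purely for illustration; they are not taken from any real genealogy program.

```python
import re

# A small fragment in the style of a GEDCOM 5.5 file. Each line starts
# with a level number, followed by a tag and an optional value.
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE BET 1847 AND 1852
"""

def find_dates(text):
    """Return the value of every DATE line in the fragment."""
    dates = []
    for line in text.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[1] == "DATE":
            dates.append(parts[2])
    return dates

def strict_import(value):
    """Hypothetical importer that accepts only a bare four-digit year."""
    return value if re.fullmatch(r"\d{4}", value) else ""

def tolerant_import(value):
    """Hypothetical importer that also understands BET ... AND ... ranges."""
    match = re.fullmatch(r"BET (\d{4}) AND (\d{4})", value)
    if match:
        return (int(match.group(1)), int(match.group(2)))
    return strict_import(value)

for date in find_dates(SAMPLE):
    # The strict importer yields an empty string; the tolerant one keeps the range.
    print(repr(strict_import(date)), tolerant_import(date))
```

The strict importer stands in for a receiving program that never implemented date ranges: the value it cannot understand simply comes out blank, even though the GEDCOM file itself carried the information.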
Another problem is that not all genealogy programs have the same
ideas about databases. One program may have only one field for "occupation,"
assuming that every person on the face of the earth never, ever changed careers.
Another genealogy program may have the ability to record multiple occupations
during the person's lifetime. When transferring data via GEDCOM from the
more powerful program to the simpler one, some of these occupations will be
lost. These are a couple of simple examples; you can find numerous other inconsistencies
when moving data between dissimilar programs.
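The occupation example can be sketched in a few lines of Python. GEDCOM uses the OCCU tag for an occupation, and a person's record may contain more than one such line. The sample record and both importer functions below are made up for illustration; real programs store their data in their own ways.

```python
# A made-up record with two occupations for the same person.
RECORD = [
    "0 @I1@ INDI",
    "1 NAME Mary /Jones/",
    "1 OCCU Schoolteacher",
    "1 OCCU Postmistress",
]

def import_single_field(lines):
    """Hypothetical simple program: one slot per tag, so later values overwrite earlier ones."""
    person = {}
    for line in lines:
        level, tag, *rest = line.split(" ", 2)
        if level == "1":
            person[tag] = rest[0] if rest else ""
    return person

def import_multi_field(lines):
    """Hypothetical richer program: keeps every value seen for a tag."""
    person = {}
    for line in lines:
        level, tag, *rest = line.split(" ", 2)
        if level == "1":
            person.setdefault(tag, []).append(rest[0] if rest else "")
    return person

print(import_single_field(RECORD)["OCCU"])
print(import_multi_field(RECORD)["OCCU"])
```

In this sketch the simpler importer ends up with only the last occupation it read, while the richer one keeps both. Which occupation a real single-field program keeps, if any, depends on how that program was written.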
Another limitation is the fact that the present GEDCOM standard
was created before the popularity of multimedia. You can transfer textual data,
such as names, dates, and locations rather well in GEDCOM. However, transferring
scanned images, sound clips and movies from one genealogy program to another
is almost impossible to accomplish via GEDCOM files.
There is another problem with translating from one format to another,
that of data integrity. Translating from one program's database to GEDCOM
is sort of the same as translating from one spoken language to another. The
basics work, but subtleties and details sometimes do not translate well. Then,
when translating to the third language (the receiving genealogy program's
database), more translation losses creep in. I well remember reading a technical
manual some years ago that had been written in Japanese and then translated
into Chinese. At a later date, the Chinese version was translated into English.
The resultant English manual was barely readable. The same may happen with translating
a database from Program A into GEDCOM and then from GEDCOM into Program B.
A new method of transferring data between different genealogy
programs was announced some time ago by Wholly Genes Software. Their GenBridge
technology reads data from one program directly into a second program without
requiring a "double translation" via GEDCOM. The result is a much
more accurate transfer process. However, other genealogy developers have yet
to adopt GenBridge. To date, this technology is only available in software produced
by Wholly Genes: The Master Genealogist and Family Tree Super Tools.
Despite all the shortcomings, GEDCOM is still a simple and somewhat
effective method of transferring genealogy data from one program to another.
Most of the data will transfer properly, and then there are easy ways of reviewing
the data to look for errors. The names, dates and locations normally transfer
correctly. Text, events, notes and source citations may not always work perfectly.
The exact problems encountered will depend upon the two genealogy programs involved.
Most modern genealogy programs will create an error log of GEDCOM
data imported but not understood by the receiving program. You can read that
log file to see what the program detected as inconsistent, then manually go
in and fix the errors. While tedious, this is still a lot better than re-keying
everything!
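As a rough idea of how such an error log might be produced, here is a short Python sketch. The list of known tags, the sample lines, and the wording of the log messages are all assumptions made for illustration; every genealogy program has its own list of supported tags and its own log format. The "_NICK" line follows the GEDCOM convention that program-specific tags begin with an underscore.

```python
# Tags this hypothetical importer understands.
KNOWN_TAGS = {"INDI", "NAME", "BIRT", "DATE", "PLAC"}

LINES = [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 _NICK Jack",
    "1 BIRT",
    "2 DATE 1850",
]

def import_with_log(lines):
    """Return (accepted lines, log entries for lines that were not understood)."""
    accepted, log = [], []
    for number, line in enumerate(lines, start=1):
        parts = line.split(" ", 2)
        # Record lines such as "0 @I1@ INDI" carry an ID before the tag.
        if len(parts) > 2 and parts[1].startswith("@"):
            tag = parts[2]
        else:
            tag = parts[1]
        if tag in KNOWN_TAGS:
            accepted.append(line)
        else:
            log.append(f"line {number}: unrecognized tag {tag}: {line}")
    return accepted, log

accepted, log = import_with_log(LINES)
for entry in log:
    print(entry)
```

Here the nickname line is the only one the importer does not understand, so it is the only entry in the log. That entry is what you would then review and fix by hand in the receiving program.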
A few weeks ago a new GEDCOM standard was proposed that is to
be based upon XML, a markup language that is popular on the World Wide
Web. This new standard should greatly improve data transfer accuracy. See my
article on the proposal for details. However, don't look for this new GEDCOM 6.0 any time soon.
It is still a proposal and probably will not appear in genealogy programs for
another couple of years.
I offer this as a non-technical explanation of GEDCOM plus some
commentary on its use. For more details and for technical explanations of the
inner workings of GEDCOM, I would suggest that you read the following: