
"Along Those Lines"
1/22/1999 - Archive


The Digitization of Genealogical Data
Whenever I work with microfilmed U.S. census records, I wish for a simpler means of locating and accessing information. Certainly it is wonderful to have census finding aids such as the indexes that direct us to specific areas of microfilm. No matter how great the index, however, after spending some time in front of the reader my eyes are spinning in my head like the wheels in a Las Vegas slot machine.

How wonderful it would be if computers had been available when the census records were being microfilmed. Instead of the grainy and uneven images we see on film today, census records could have been scanned and digitized. Brightness and contrast could have been adjusted to provide a clearer, more readable image. And perhaps the census indexes could have been created in searchable databases with hyperlinks directly from the index entry to the digitized image of the page that we want to see.

The digitization of primary source records could not be imagined half a century ago. As recently as ten years ago, the scanning and storage of such data was inconceivable because of technology limitations and the costs of scanning equipment and computer storage space. Now that scanners are cheap and computer storage media are more compact and inexpensive, the futuristic dream of digitizing information of genealogical value is within our grasp.

In "Along Those Lines . . ." this week, let's talk about some of the issues involved in making this dream a reality. There are many questions to be answered before embarking on the project. These include:

  • Who will fund the project?
  • Who will decide the scope?
  • What form will the presentation take?
  • What of fragile records?
  • Who will perform the labor?
  • Where will the data be stored and in what format and medium?
  • How will the public access data?
  • Will there be a cost to access the data?

Let's discuss each one.

Who Will Fund and Decide the Scope of the Project?
I have no doubt that most or all public records will be computerized at some point. It is not just genealogists who would use these records; lawyers, researchers, historians, sociologists and many others would find computerized access to such data invaluable.

A primary question, of course, is "Who will fund such a project?" Federal, state and local governments are under great pressure to control costs. Will they be willing to approve and earmark funds for digitization of records? The process of scanning a record, adjusting the image to its most readable form, and saving it to disk can be a labor-intensive effort. Multiply the effort by the many billions of pieces of paper records that exist, and you have a monumental and expensive task. In addition, the cost of providing computer storage space is an expensive proposition.

If governments were to fund such a project, they would, like any well-run business, need to justify the expenditures to taxpayers and to develop a cost recovery scenario to pay for the project. A scaled approach, beginning perhaps with birth, marriage and death records and expanding to other types of records, might be the most advantageous plan. Over time, perhaps land and probate records, court minutes, and a variety of other records could be added.

Genealogical and historical societies certainly lack the funding required to undertake such an effort, and the USGenWeb organization is not prepared for a project of this scope.

It is possible that a public-private partnership could be developed among governments, individual contributors, and companies and organizations with a vested interest in ready access to such records. Such a partnership could get the project off the ground and, once started, it is possible that income generated by paying users could sustain the system.

Whoever funds the project will undoubtedly determine what direction the project might take. This would include what records to digitize initially, whether to index and link records together, where the data will be stored, how it will be accessed, what if any charge there will be for access to digitized records, and a variety of other issues.

What Form Will the Presentation Take?
The organization of the data is an important consideration. The type of data will determine its organization. Census records, for instance, are organized on microfilm by geographical area, by enumeration district, by page and then by line entry. Indexes are organized by state and then by name. Marriage records are organized by county or town or other division, sequentially (for the most part as registered), and indexed by groom's name and hopefully also by bride's name. Land and tax records require a vastly different organizational structure depending on the location, the type of record, and the local government's structure.
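As a rough sketch of the census arrangement just described, the film order and the name index can be thought of as two different sort orders over the same line entries. Everything in this example is an assumption made for illustration: the field names, the sample entries, and the image file names are invented and do not come from any real census or index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CensusEntry:
    """One line on a census page (all field names are illustrative)."""
    state: str
    county: str
    enumeration_district: int
    page: int
    line: int
    surname: str
    given_name: str
    image_file: str  # hypothetical name of the scanned page image


def film_order(entries):
    """Sort the way the microfilm is arranged: area, district, page, line."""
    return sorted(
        entries,
        key=lambda e: (e.state, e.county, e.enumeration_district, e.page, e.line),
    )


def name_index(entries):
    """Sort the way an index is arranged: by state, then by name."""
    return sorted(entries, key=lambda e: (e.state, e.surname, e.given_name))


def find(entries, state, surname):
    """Return the page image for every index entry matching state and surname."""
    return [
        e.image_file
        for e in name_index(entries)
        if e.state == state and e.surname.lower() == surname.lower()
    ]


# Invented sample data, not taken from any real census.
SAMPLE = [
    CensusEntry("NC", "Beta", 12, 4, 31, "Smith", "Samuel", "nc_beta_ed12_p004.img"),
    CensusEntry("NC", "Alpha", 3, 1, 7, "Jones", "Ann", "nc_alpha_ed03_p001.img"),
    CensusEntry("NC", "Beta", 12, 4, 2, "Brown", "John", "nc_beta_ed12_p004.img"),
]

if __name__ == "__main__":
    for entry in film_order(SAMPLE):
        print(entry.county, entry.enumeration_district, entry.page, entry.line, entry.surname)
    print(find(SAMPLE, "NC", "smith"))
```

The point of the sketch is only that one set of entries can serve both arrangements, with each index entry carrying a pointer to the page image. Marriage, land and tax records would need different fields and sort keys.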

If there are so many different records, and different ways to organize them, what form will the presentation of the data take? Will there need to be a standardized organizational and indexing structure devised to simplify the indexing and access of the records?

Should all records be scanned in their original size or scaled to make them fit on a standard 8.5"x11" page when printed? For larger documents, this could be a nightmare. However, a record printed in such a way would provide a means of seeing the details and, if necessary, a photocopy of the original could be ordered from the holding archive.

Finally, should every digitized record contain a citation in the margin? Gee, wouldn't that be a nice addition!

Hopefully the determination of the data's organization and presentation would be made by a team of representatives from all interested areas.

What of Fragile Records?
What can be done with those records that are fragile and delicate, whose mere handling may be destructive? What about light-sensitive materials that might be damaged by the scanning process? Officials, curators, librarians, archivists, and administrators will need to make decisions to exclude specific records from the digitization process for these reasons. However, the records must be included in the process of building the indexes so that researchers are aware of the records' existence. Perhaps transcriptions should be included in lieu of a digitized document.
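One way to picture an index that lists a record even when no image exists is an entry whose image is optional. The following is a minimal sketch under assumed names: the `IndexRecord` fields and the sample records are hypothetical and are not drawn from any existing archive system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexRecord:
    """An index entry that exists whether or not the record was scanned."""
    title: str
    holding_archive: str
    image_file: Optional[str] = None        # None when the original was not scanned
    transcription: Optional[str] = None     # text supplied in lieu of an image
    exclusion_reason: Optional[str] = None  # e.g. "too fragile to handle"


def describe(record: IndexRecord) -> str:
    """Tell the researcher what is available for this record."""
    if record.image_file is not None:
        return f"{record.title}: image available ({record.image_file})"
    reason = record.exclusion_reason or "reason not recorded"
    if record.transcription is not None:
        return f"{record.title}: not scanned ({reason}); transcription available"
    return f"{record.title}: not scanned ({reason}); contact {record.holding_archive}"


if __name__ == "__main__":
    # Invented examples for illustration only.
    records = [
        IndexRecord("Deed book A", "County courthouse", image_file="deed_a.img"),
        IndexRecord(
            "Family Bible page",
            "State archive",
            transcription="Births recorded on the flyleaf",
            exclusion_reason="too fragile to handle",
        ),
        IndexRecord("Tintype portrait", "Historical society", exclusion_reason="light-sensitive"),
    ]
    for record in records:
        print(describe(record))
```

Because the entry is created independently of the scan, a researcher searching the index still learns that the record exists and where it is held.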

Who Will Perform the Labor?
History tells us that the copying of records, in whatever form, should be done by qualified people. If you've worked with microfilmed census and Soundex records, you know all too well that the work was not well supervised, there was little quality control, and that many records were so sloppily filmed as to be unreadable.

Digitization of the records needs a better approach. Qualified and certified technicians are required. They must certainly be experts in the use of computers and scanning equipment. They must also understand archive concerns and have an understanding of the records they are processing.

Where Will the Data Be Stored and in What Format and Medium?
A key consideration is where the data can and will be stored. Will dedicated computers be allocated to the storage of digitized records and their indexes? Will these be large mainframe computers, midrange, or a network of PCs with large clusters of data storage devices? Should data be stored on hard disks, floppy disks, CD-ROMs, or other new media? What will the backup plan involve in the event of a computer crash—for both the computer and the data?

Data format is a concern too. Digitization of document images involves the use of a variety of file formats. Uncompressed bitmap files preserve every scanned pixel, but they are data intensive and take a long time to download. Other image file formats, such as .JPG and .GIF, are more economical in size but sacrifice something in image quality: .JPG compression discards some detail, and .GIF images are limited to 256 colors. The technician performing the scanning must be the one to determine what format provides the best possible image.
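To put the size of an uncompressed bitmap in perspective, here is a back-of-the-envelope calculation. The scan settings (a letter-size page at 300 dots per inch) and the 56,000 bits-per-second modem speed are assumptions chosen for illustration, and the arithmetic ignores file headers, row padding and transmission overhead.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate size of an uncompressed scan, ignoring headers and padding."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def download_seconds(size_bytes, bits_per_second):
    """Idealized transfer time at a given line speed (no protocol overhead)."""
    return size_bytes * 8 / bits_per_second


if __name__ == "__main__":
    MODEM_BPS = 56_000  # assumed dial-up speed

    # (label, bits per pixel) for three common scan modes
    for label, depth in [("black and white", 1), ("grayscale", 8), ("color", 24)]:
        size = uncompressed_bytes(8.5, 11, 300, depth)
        minutes = download_seconds(size, MODEM_BPS) / 60
        print(f"{label:>15}: {size:>12,} bytes, about {minutes:.1f} minutes to download")
```

Under these assumptions a single grayscale page comes to 8,415,000 bytes, or roughly twenty minutes of download time, which is why the choice of a more compact format matters so much for remote access.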

Another format concern is whether to use the .PDF document format for text files rather than the older text file format (.TXT) or any of the word processor formats. The .PDF format has the advantage of providing a common, high quality standard that incorporates the best of word processing and formatting without having a dependency on a specific word processor or a version/release of it. .PDF files are being used extensively at Federal government Web sites today, such as the Library of Congress's Thomas site. The software used to read .PDF files is the Adobe Acrobat Reader, free for download from the Adobe Web site. This format will probably become outdated as technology advances so it becomes important to make intelligent decisions about textual document formats that anticipate future needs.

How Will the Public Access Data?
Remote access to information is imperative. With the advent of the Internet's World Wide Web a mere six years ago, we have become dependent on rapid access to huge amounts of information from our home and office computers. There is no reason to expect the public to use any other means to access digitized records.

The days of purchasing CD-ROM products with copies of images and transcribed records are almost over. Only if there is no other alternative available should you consider investing in CDs. Five years ago I purchased a CD of the Social Security Death Index because that was the only way I could access it. The SSDI is now available at Web sites on the Internet, most notably at Ancestry.com, which maintains a current version and provides a link from each record to a slick letter-writing facility you can use to request records from the Social Security Administration. Therefore, I would never consider buying another CD-ROM version of the SSDI.

Online databases and archives are rapidly being developed and their functionality is expanding. The National Archives and Records Administration (NARA) maintains a massive, expanding database called NAIL in which there are huge amounts of indexed information about its holdings. Accessible at http://www.nara.gov/nara/nail.html, it also contains a tremendous collection of digitized records: photographs, historic posters, documents (many of a genealogical nature), maps, and a selection of fascinating sound files. NARA is a leader in archival maintenance and in providing access to citizens, and its NAIL database is a fine example of what could be done with the digitization of records.

Will There Be a Cost to Access the Data?
Most genealogists are used to contacting archives, courthouses, libraries and other places to obtain copies of records. There is almost always a cost for the copies. A growing number of us are subscribers to "pay databases" where many kinds of genealogical data are available. These databases are usually accessible on an unlimited basis for a flat monthly, quarterly or annual subscription fee. The fee covers the cost of the computer storage, the telecommunications equipment and expense, the acquisition of new materials for the database and the personnel needed to develop and maintain the facility. Considering all of these factors, a good subscription database is a bargain. Companies such as Ancestry.com (http://www.ancestry.com) have proved it can be a viable and cost-effective means of providing access.

Placing digitized information into a pay database seems a viable option for storage. Genealogists and other researchers, already accustomed to paying for photocopies of records from courthouses and the like, could subscribe to the database and pay a per copy charge. They might also have a "pay-per-use" option, not unlike pay-per-view cable events, where they provide credit card information to access the database and print records.

When weighed against the costs of traveling to a location to access the original record, or the cost of letter writing to request and pay for photocopies, working with an Internet-based Web site with a pay database seems an attractive method of gaining access to information.

The Future Is Now . . . For a Price
As you can see, there are a lot of questions to be answered, but the technology to digitize public records is available today. The process of identifying records to be digitized, scanning and indexing them, storing and maintaining them on computers, and supplying the telecommunications equipment needed for access is not inexpensive.

It is important to let local, state and national government officials know that you, personally, are interested in the availability of online access to digitized copies of the records they hold. Let your genealogical and historical societies know of your interest too. Let them know by writing letters and e-mail.

The future is available today . . . for a price. The monies will never be allocated unless you let your interest be known.

Happy hunting!

George



Copyright 1998 George G. Morgan
All Rights reserved

"Along Those Lines ..." is a weekly feature of the Genealogy Forum
on America Online (Keyword: ROOTS).

This column originally appeared in the Genealogy Forum on America Online.

You may send e-mail to alonglines@aol.com. George Morgan would like to hear from you but, because of the volume of e-mail, is unable to respond personally to each letter. He also regrets that he cannot assist you with your personal genealogical research.

