Whenever I work with microfilmed U.S. census records, I wish for a
simpler means of locating and accessing information. Certainly it is
wonderful to have census finding aids such as the indexes that direct us
to specific areas of microfilm. No matter how great the index, however,
after spending some time in front of the reader my eyes are spinning in
my head like the wheels in a Las Vegas slot machine.
How wonderful it would be if computers had been available when the
census records were being microfilmed. Instead of the grainy and uneven
images we see on film today, census records could have been scanned and
digitized. Brightness and contrast could have been adjusted to provide a
clearer, more readable image. And perhaps the census indexes could have
been created in searchable databases with hyperlinks directly from the
index entry to the digitized image of the page that we want to see.
The digitization of primary source records could not be imagined half a
century ago. As recently as ten years ago, the scanning and storage of
such data was inconceivable because of technology limitations and the
costs of scanning equipment and computer storage space. Now that
scanners are cheap and computer storage media are more compact and
inexpensive, the futuristic dream of digitizing information of
genealogical value is within our grasp.
In "Along Those Lines . . ." this week, let's talk about some of the
issues involved in making this dream a reality. There are many
questions to be answered before embarking on the project. These
include:
- Who will fund the project?
- Who will decide the scope?
- What form will the presentation take?
- What of fragile records?
- Who will perform the labor?
- Where will the data be stored and in what format and medium?
- How will the public access data?
- Will there be a cost to access the data?
Let's discuss each one.
Who Will Fund and Decide the Scope of the Project?
I have no doubt that most, if not all, public records will be
computerized at some point. It is not just genealogists who would use
these records; lawyers, researchers, historians, sociologists and many
others would find computerized access to such data invaluable.
A primary question, of course, is "Who will fund such a project?"
Federal, state and local governments are under great pressure to control
costs. Will they be willing to approve and earmark funds for
digitization of records? The process of scanning a record, adjusting the
image to its most readable form, and saving it to disk can be a
labor-intensive effort. Multiply the effort by the many billions of pieces
of paper that exist, and you have a monumental and expensive task.
In addition, providing computer storage space is an expensive
proposition.
Were governments to fund such a project, they, like any well-run
business, would need to justify the expenditures to taxpayers and
develop a cost recovery plan to pay for the project. A phased
approach, beginning perhaps with birth, marriage and death records and
expanding to other types of records, might be the most advantageous
plan. Over time, land and probate records, court minutes, and a
variety of other records could be added.
Genealogical and historical societies certainly lack the funding
required to undertake such an effort, and the USGenWeb organization is
not prepared for a project of this scope.
A public-private partnership might be formed among governments,
companies and organizations with a vested interest in ready access to
such records, and individual contributors. Such a partnership could get
the project off the ground and, once it was started, income generated by
paying users could sustain the system.
Whoever funds the project will undoubtedly determine the direction it
takes. This would include what records to digitize initially, whether to
index and link records together, where the data will be stored, how it
will be accessed, what charge, if any, there will be for access to
digitized records, and a variety of other issues.
What Form Will the Presentation Take?
The organization of the data is an important consideration. The type of
data will determine its organization. Census records, for instance, are
organized on microfilm by geographic area, then by enumeration district,
by page, and finally by line. Indexes are organized by state and then by
name. Marriage records are organized by county, town or other
division, sequentially (for the most part as registered), and indexed by
groom's name and hopefully also by bride's name. Land and tax records
require a vastly different organizational structure depending on the
location, the type of record, and the local government's structure.
With so many different kinds of records, and so many ways to organize
them, what form will the presentation of the data take? Will a
standardized organizational and indexing structure need to be devised to
simplify the indexing of and access to the records?
Should all records be scanned at their original size or scaled to fit
on a standard 8.5"x11" page when printed? For larger documents, scaling
could be a nightmare. However, even a reduced printout would still let a
researcher see the content of the record and, if necessary, a full-size
photocopy of the original could be ordered from the holding archive.
Finally, should every digitized record contain a citation in the margin?
Gee, wouldn't that be a nice addition!
Ideally, decisions about the data's organization and presentation would
be made by a team of representatives from all interested parties.
What of Fragile Records?
What can be done with records that are so fragile and delicate that
merely handling them may be destructive? What about light-sensitive
materials that might be damaged by the scanning process? Officials,
curators, librarians, archivists, and administrators will need to decide
which records to exclude from the digitization process for these
reasons. However, excluded records must still be included when the
indexes are built so that researchers are aware of the records'
existence.
existence. Perhaps transcriptions should be included in lieu of a
digitized document.
Who Will Perform the Labor?
History tells us that the copying of records, in whatever form, should
be done by qualified people. If you've worked with microfilmed census
and Soundex records, you know all too well that the work was not well
supervised, there was little quality control, and that many records were
so sloppily filmed as to be unreadable.
Digitization of the records needs a better approach. Qualified and
certified technicians are required. They must be experts in the use of
computers and scanning equipment. They must also understand archival
concerns and the records they are processing.
Where Will the Data Be Stored and in What Format and Medium?
A key consideration is where the data can and will be stored. Will
dedicated computers be allocated to the storage of digitized records and
their indexes? Will these be large mainframe computers, midrange
systems, or a network of PCs with large clusters of data storage
devices? Should data be stored on hard disks, floppy disks, CD-ROMs, or
other new media? What will the backup plan be, for both the computer
and the data, in the event of a computer crash?
Data format is a concern too. Digitization of document images involves
the use of a variety of file formats. Uncompressed bitmap files preserve
every pixel of the scan, but they are data intensive and take a long
time to download. Other image file formats, such as .JPG and .GIF, are
more economical in size but sacrifice some image quality: JPEG
compression discards fine detail, and GIF is limited to 256 colors. The
technician performing the scanning should determine which format
provides the best possible image.
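To put rough numbers on "data intensive," here is a minimal Python sketch that estimates the raw size of an uncompressed scan and how long it would take to download. The letter-size page, 300 dpi scanning resolution, 8-bit grayscale depth and 28.8 kbps modem speed are illustrative assumptions, not figures from any archive, and the calculation ignores file headers and row padding.

```python
# Rough estimate of uncompressed scan size and modem download time.
# All inputs (page size, dpi, bit depth, modem speed) are illustrative assumptions.

def uncompressed_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw pixel data for a scanned page, ignoring file headers and row padding."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def download_seconds(size_bytes, bits_per_second):
    """Time to transfer size_bytes over a link of the given speed, ignoring overhead."""
    return size_bytes * 8 / bits_per_second


if __name__ == "__main__":
    # Letter-size page scanned at 300 dpi in 8-bit grayscale (assumed settings).
    page_bytes = uncompressed_size_bytes(8.5, 11, 300, 8)
    minutes = download_seconds(page_bytes, 28_800) / 60
    print(f"{page_bytes:,} bytes, about {minutes:.0f} minutes at 28.8 kbps")
```

At those assumed settings a single page comes to 8,415,000 bytes, or roughly 39 minutes over a 28.8 kbps modem. Scanning in 24-bit color would triple the size, which is why compressed formats are attractive despite their trade-offs.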
Another format concern is whether to use the .PDF document format for
text files rather than the older text file format (.TXT) or any of the
word processor formats. The .PDF format has the advantage of providing
a common, high-quality standard that incorporates the best of word
processing and formatting without depending on a specific word
processor or a particular version of it. .PDF files are being used
extensively at Federal government Web sites today, such as the Library
of Congress's Thomas site. .PDF files can be read with Adobe Acrobat
Reader, which is free for download from the Adobe Web site. Even this
format will probably become outdated as technology advances, so it is
important to make intelligent decisions about textual document
formats that anticipate future needs.
How Will the Public Access Data?
Remote access to information is imperative. With the advent of the
Internet's World Wide Web a mere six years ago, we have become dependent
on rapid access to huge amounts of information from our home and office
computers. There is no reason to expect the public to use any other
means to access digitized records.
The days of purchasing CD-ROM products with copies of images and
transcribed records are almost over. Only if there is no alternative
should you consider investing in CDs. Five years
ago I purchased a CD of the Social Security Death Index because that was
the only way I could access it. The SSDI is now available at Web sites
on the Internet, most notably at Ancestry.com, which maintains a current
version and provides a link from each record to a slick letter-writing
facility you can use to request records from the Social Security
Administration. Therefore, I would never consider buying another CD-ROM
version of the SSDI.
Online databases and archives are rapidly being developed and their
functionality is expanding. The National Archives and Records
Administration (NARA) maintains a massive, expanding database called
NAIL in which there are huge amounts of indexed information about its
holdings. Accessible at http://www.nara.gov/nara/nail.html, it also
contains a tremendous collection of digitized records: photographs,
historic posters, documents (many of a genealogical nature), maps, and a
selection of fascinating sound files. NARA is a leader in archival
maintenance and in providing access to citizens, and its NAIL database
is a fine example of what could be done with the digitization of records.
Will There Be a Cost to Access the Data?
Most genealogists are used to contacting archives, courthouses,
libraries and other places to obtain copies of records. There is almost
always a cost for the copies. A growing number of us are subscribers to
"pay databases" where many kinds of genealogical data are available.
These databases are usually accessible on an unlimited basis for a flat
monthly, quarterly or annual subscription fee. The fee covers the cost
of the computer storage, the telecommunications equipment and expense,
the acquisition of new materials for the database and the personnel
needed to develop and maintain the facility. Considering all of these
factors, a good subscription database is a bargain. Companies such as
Ancestry.com (http://www.ancestry.com) have proved that this model can
be a viable and cost-effective means of providing access.
Placing digitized information into a pay database seems a viable option
for storing it and providing access. Genealogists and other researchers,
already accustomed to paying for photocopies of records from courthouses
and the like, could subscribe to the database and pay a per-copy
charge. They might also
have a "pay-per-use" option, not unlike pay-per-view cable events, where
they provide credit card information to access the database and print
records.
When weighed against the costs of traveling to a location to access the
original record, or the cost of letter writing to request and pay for
photocopies, working with an Internet-based Web site with a pay database
seems an attractive method of gaining access to information.
The Future Is Now . . . For a Price
As you can see, there are a lot of questions to be answered, but the
technology to digitize public records is available today. The process
of identifying records to be digitized, scanning and indexing them,
storing and maintaining them on computers, and providing
telecommunications equipment to provide access is not inexpensive.
There is a cost associated with the effort.
It is important to let local, state and national government officials
know that you, personally, are interested in the availability of online
access to digitized copies of the records they hold. Let your
genealogical and historical societies know of your interest too. Let
them know by sending letters and e-mail.
The future is available today . . . for a price. The monies will never
be allocated unless you let your interest be known.
Happy hunting!
George
Copyright 1998 George G. Morgan
All rights reserved.
"Along Those Lines ..." is a weekly feature of the Genealogy Forum
on America Online (Keyword: ROOTS).
This column originally appeared in the Genealogy Forum on America Online.
You may send e-mail to alonglines@aol.com. George Morgan would like
to hear from you but, because of the volume of e-mail,
is unable to respond to each letter individually.
He also regrets that he cannot assist you with
your personal genealogical research.