Ancestry Magazine
7/1/2000 - Archive

July/August 2000 Vol. 18 No. 4

U.S. Census 2000: How America "Keeps" What America Needs

In a quiet moment, I got online and filled out the 2000 census form via the Census Bureau’s Web site. It took less than five minutes to fill out and return my family’s 2000 Census questionnaire—names, ages, relationships, ethnicity, telephone number, and status of home ownership.

As I responded to the questions and finally pushed the "Send" button, I began to think about what I had done. I had just created an original source record regarding the life of my family that had never been (and likely will never be) committed to paper. From my keystrokes and across cyberspace to the Census Bureau’s Web server, no paper record was created. In 2072, when my posterity will be searching original sources for details of their ancestors’ lives, they won’t see my chicken scratches on government-printed paper. The "document" in the term "source document" no longer applies to this future genealogical record.

The 2000 U.S. Census is now behind us. Ahead of us are some necessary decisions of census preservation, but the technology used to collect and tabulate the 2000 Census makes these decisions unique.

No Paper Trail
My census data will join the data of millions of others to be processed in the Internet Data Collection (IDC) program. The Census Bureau estimates the number of respondents using the Internet method will be anywhere from one percent to as much as twelve percent of the total census respondents.

This Internet-submitted information will be further merged with other census responses that were not submitted on paper either. These include the results of the Telephone Questionnaire Assistant program, in which operators ask census questions and enter the answers directly into the system. Some two percent of all census questionnaires will be answered over the phone. So as many as fourteen percent of census responses will have no corresponding paper form. Therefore, only about eighty-five percent of the 2000 Census will be recorded on paper. This has important implications when considering how to preserve the census for the future.

Image Is Everything
Perhaps the most innovative part of processing Census 2000 is how the paper forms are handled once they are returned to the Census Bureau. Returned paper forms are distributed to one of four data capture centers around the country. These centers process the forms using the Data Capture System (DCS) 2000, which checks the forms into the center and digitally images and optically reads the information on the census forms. It then converts the data electronically for transmission to the Census Bureau.

The result is a set of optically scanned images of the completed census forms, but only for about eighty-five percent of the total responses. Tests of the equipment show that the OMR (the machine that scans checked boxes) translates the paper-based information into electronic files at an accuracy rate of 99.69 percent on the short forms and 99.64 percent on the long forms. OCR (the machine that scans handwriting) accuracy is 99.10 percent for the short forms and 99.67 percent for the long forms.

The census forms that could not be scanned, e.g., the enumeration of institutions such as hospitals, prisons, and shelters, are eventually funneled into the Individual Census Record File (ICRF). The ICRF is the master file for the census. It contains all recorded responses from the householders, plus other information. This single ASCII data file represents all the data recorded by the 2000 Census.

Preservation Decisions
Once the Census Bureau has completed its tabulation and made its reports to the public, it has little further use for the records themselves. After the census is complete, the Bureau needs the original paper forms, the electronic images, the electronic responses, and the other electronic files only for administrative purposes. After a short period of time, the originals are no longer of value to the Census Bureau.

Enter the National Archives and Records Administration. NARA is charged with preserving the nation’s past by overseeing the management of federal records. NARA preserves records that help ensure the rights of citizens, that document the actions of government, or that have other historical value.

The 2000 Census is only one example of the many records that NARA must decide whether or not to preserve. With so much of the 2000 Census in electronic format, the announced creation of an Electronic Records Archives will no doubt play a large part in the preservation of the census. So far, because NARA has had to make hard choices in the face of rapidly changing technologies, they have chosen to preserve only certain parts of the census.

The completed paper census forms will be destroyed confidentially by the Census Bureau fifteen days after confirmation of data transmission reaches Census Bureau headquarters. The images of scanned census forms will be destroyed when they are ten years old or when they are no longer needed for evaluation purposes (whichever is later). The Individual Census Record File (ICRF) will be transferred by the Census Bureau to the National Archives for permanent preservation three years after the completion of the census.

NARA has proposed not preserving the scanned images for the following four primary reasons.

1. Scanned Images Are Not Unique
NARA considers the scanned images to be non-unique because they believe that the ICRF represents a reliable copy of the information provided on the images. However, the OMR and OCR processes will only translate information contained in certain designated areas of the paper form. The scanned image of the form contains a near photographic representation of the entire form, complete with doodles, mistakes, eraser marks, cross-outs, and any messages written in the margins. Perhaps most important to a future family historian, examples of ancestors’ handwriting would be preserved in the scanned images. The information on the scanned images and the information in the ICRF are therefore different. But whether the differences are significant enough to warrant preservation is debatable.

Then there is the question of the errors in the ICRF. Even with extremely low error rates of between .31 and .9 percent, there is the potential of up to 1.3 million errors on the ICRF. These errors will be the result of DCS failures to correctly translate the marks made by the householder. We are all familiar with errors found on past censuses. But we usually encounter these errors as a result of the original record being filled out incorrectly by the householder or enumerator, and not as a result of a transcription process. So up to 1.3 million scanned images of returned Census 2000 forms may indeed be unique. They would represent the only accurate copy of the forms in existence after the paper originals are destroyed.
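The arithmetic behind that estimate can be sketched in a few lines. This is only an illustration: the article does not say how many scanned items the error rates apply to, so the sketch backs out the count implied by the 1.3 million upper bound rather than using a published figure. The function and variable names are hypothetical.

```python
# Error rates quoted in the article (100 percent minus the stated accuracy).
LOW_ERROR_RATE = 0.0031    # 99.69 percent accurate
HIGH_ERROR_RATE = 0.009    # 99.10 percent accurate
MAX_ERRORS = 1_300_000     # upper bound quoted in the article


def implied_item_count(max_errors: float, high_rate: float) -> float:
    """Back out the number of scanned items implied by the upper bound.

    Assumption: the 1.3 million figure is the highest error rate applied
    to every item. The article does not state the item count itself.
    """
    return max_errors / high_rate


def error_range(item_count: float, low_rate: float, high_rate: float):
    """Return the (low, high) number of expected transcription errors."""
    return item_count * low_rate, item_count * high_rate


items = implied_item_count(MAX_ERRORS, HIGH_ERROR_RATE)
low, high = error_range(items, LOW_ERROR_RATE, HIGH_ERROR_RATE)

print(f"Implied items scanned: about {items / 1e6:.0f} million")
print(f"Expected errors: about {low:,.0f} to {high:,.0f}")
```

Under that assumption, even the most favorable error rate still leaves several hundred thousand records whose only accurate copy would be the scanned image.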

2. Most Scanned Images Will Be Blank
The total size of the scanned images is estimated to be sixty-two terabytes (trillion bytes). The real problem is that most of that volume is blank. Perhaps two-thirds of the census forms’ sheets are unused, since the householder was not required to complete them. Of the sixty-two terabytes, about forty-one terabytes will be empty pages.

While the volume is very large, the storage cost for this information is surprisingly small. It will cost about $1.2 million per year to store sixty-two terabytes of data. And the cost of storing a gigabyte has been falling at a rate of forty percent per year in recent years.

Electronic image files can, of course, be edited. It might be a very big project, but the sixty-two terabytes could be whittled down. Removing the blank areas of the forms would reduce the above estimated storage cost to $420,000 per year. And at some point, the cost of storing the scanned images of eighty-five percent of the 2000 Census will become more affordable as technology advances.
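A rough sketch of the storage arithmetic follows. It rests on two assumptions that the article does not state: that storage cost scales in direct proportion to volume, and that the forty percent annual decline continues unchanged. The function names are hypothetical.

```python
# Figures quoted in the article.
TOTAL_TB = 62             # estimated size of all scanned images
BLANK_TB = 41             # estimated share that is empty pages
ANNUAL_COST = 1_200_000   # dollars per year to store the full 62 terabytes
ANNUAL_DECLINE = 0.40     # yearly fall in the cost of storing a gigabyte


def scaled_cost(cost: float, total_tb: float, kept_tb: float) -> float:
    """Scale the yearly cost linearly with the volume kept (an assumption)."""
    return cost * kept_tb / total_tb


def projected_cost(cost: float, decline: float, years: int) -> float:
    """Yearly cost after `years` of a constant rate of decline (an assumption)."""
    return cost * (1 - decline) ** years


trimmed = scaled_cost(ANNUAL_COST, TOTAL_TB, TOTAL_TB - BLANK_TB)
print(f"Yearly cost without blank pages: about ${trimmed:,.0f}")

for years in (5, 10):
    full = projected_cost(ANNUAL_COST, ANNUAL_DECLINE, years)
    print(f"Yearly cost for all 62 terabytes after {years} years: about ${full:,.0f}")
```

Simple proportional scaling gives a figure a little over $400,000 per year, in the same neighborhood as the $420,000 estimate above. If the decline in storage prices held, the yearly cost of keeping every image, blank pages included, would fall below $100,000 within five years.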

3. Scanned Images Are Not Searchable
Then there is the problem of locating images. The scanned images are organized the same way the paper forms were: in the order in which they were processed. This organizational method is extremely inconvenient when searching for specific information either in the paper forms or in the scanned images. There may be new and better ways of indexing images in the future, but we shouldn’t count on it.

When past censuses were first preserved beyond the Census Bureau’s need for them, they were neither indexed nor easily searched. Significant efforts had to be made for these censuses to become easily usable for researchers. Reproduction technology like microfilm has allowed the census images to be accessed in a convenient form. Volunteer, commercial, and government efforts have all been used to create appropriate indexes for some of these past censuses.

The bottom line is, simple human effort, organized and aided by existing technologies, could create usable indexes to the scanned images of the 2000 Census without the need for any future technology. Why would we expect the future to be so different from the past?

4. Scanned Images Are Not the Complete Census
Finally, because only eighty-five percent of the 2000 Census will be represented by scanned images, the National Archives has not recommended the preservation of the images. But don’t we all wish we had eighty-five percent of the 1890 Census images available to us now?

However, there are additional problems with the scanned images. Forms not distributed as part of the postal mailing may not have pre-printed address information on them. This may represent as much as twenty percent of all scanned images. Images of only eighty-five percent of the census are clearly not as valuable as an electronic transcription of the entire census, as represented by the ICRF. If forced to choose between the two sources for preservation purposes, the Individual Census Record File is the obvious choice.

But must we discard the scanned images in order to preserve the ICRF? Of course not. The discarding of the scanned images because they do not represent all of the census returns makes sense only if the ICRF is a complete and thorough representation of the scanned questionnaires themselves. Clearly, it is not.

The 2000 Census provides a unique opportunity to preserve a significant part of the census in a computerized, visual format, if NARA chooses to do so. NARA has to make hard choices regarding what records are worthy of preservation. But it is with a great deal of appreciation for the work of the National Archives that I respectfully disagree with their proposal to discard the scanned census images. And I am not alone.

The official bodies of the genealogical community have not been quiet on the issue of preserving the 2000 Census images. The Federation of Genealogical Societies and the National Genealogical Society have a long-standing cooperative venture that addresses records preservation issues such as this. This committee has sent its opinion to NARA, arguing in favor of preserving the scanned images.

The comment period for Proposed Disposition Job Number N1-29-00-02 is now past; however, the Census Bureau has announced plans to save the image files for at least the next ten years. NARA’s reports applaud the Bureau’s decision and indicate that this grace decade may be used for further discussions of the final disposition of the scanned images.

We can improve the research sources available to future genealogists with what we decide now. By preserving the scanned images, we will be allowing our children’s children and their children to have access to all of the Census 2000 records when they need them.

Special Editor's Note
After this issue of Ancestry was sent to press, Archivist of the United States John W. Carlin announced that the scanned image files of individual responses to the 2000 Census will become part of the permanent records. Nearly all of the public comments NARA received urged the permanent retention of the scanned images of the Census questionnaires and forms. Special thanks to all of our readers who voiced their opinions and influenced the preservation decisions of Census 2000.

Mark Howells is a Certified Information Systems Auditor and a Certified Information Systems Security Professional. He hosts the Norfolk-L genealogy mailing list and is chairman of the Internet Branch of the Norfolk Family History Society.

