Overview of Databases and Indexes
From Ancestry.com Wiki
This article is part of the General References and Guides series:
Introduction to the General References and Guides
Overview of Databases and Indexes
Database and Index Types
List of Specific Databases and Indexes
List of Useful Finding Aid References
A database is any collection of information that is organized for rapid search and easy retrieval. The term usually refers to computerized (electronic) records, but it can also refer to manual (non-electronic) records. In the past, large databases were published on microfilm and microfiche (manual records), but the growth of electronic publishing has led to ever-larger databases being compiled and published. Today, most databases are posted on the Internet; others are published on removable media such as compact discs, and many electronic databases are published in both formats. Databases are of great interest to genealogists because they are easily used sources of information.
Most databases are compiled from other records, and such a database is considered a derivative, not an original, source. Its information must therefore be used with caution and verified against the original records from which it was taken, because errors are often introduced during data entry. Moreover, it is unwise to rest a genealogical conclusion on any single source, whether derivative (such as a database) or original. Good research requires collecting, comparing, and carefully evaluating all the information available from multiple sources.
A growing number of databases contain information that was originally compiled on computers, such as the Social Security Death Index. Others, such as some census databases, are derived from older, original records. See List of Specific Databases and Indexes for a number of the major databases available for genealogical research.
Some manual databases have existed for several decades, since before the term “database” became popular. Manual databases are often unique and accessible at only one location, usually the institution where they were created, although some have been published, usually on microfilm. The more recent development of automated (computerized) databases allows access from more than one location, primarily through the Internet and CD-ROM.
While databases include actual genealogical data, indexes generally give very little genealogical information. Instead, indexes are primarily finding aids: they refer the user to other sources of information about a subject. Most family historians look for references to their ancestors in indexes as well as in databases. Indexes are crucial to successful research because they let the researcher move through the information in many sources far more efficiently. Although many indexes are topical (that is, they indicate where particular topics are treated), the following discussion focuses on nominal (name) indexes.
Some genealogical indexes have broad application; others have very limited uses. Generally, indexes cover two categories of records: compiled sources, which usually contain secondary information (family or local histories, genealogies written by others, journal articles, and family group sheets), and original sources, which generally provide primary information (military rolls, immigration lists, census records, and the like).
Indexes that list individuals may give either each subject’s full name, including the given name (personal name indexes), or the surname only, with page references for each occurrence of that surname (surname indexes). They can be comprehensive, indexing every occurrence of a name in the source, or selective, indexing only its major occurrences. In a selective index, the head of the family may be the only person indexed, even though the whole family is named in the record entry itself. A source may also have locality, topical, or major-entry indexes.
Indexes compiled by government clerks for wills, deeds, and court cases are personal-name “subject” indexes; they refer only to the principals in each transaction. Witnesses, jurors, clerks, and others mentioned in the documents are rarely indexed. Government indexes especially, though not exclusively, may not be strictly alphabetical. For example, all of the A entries may be grouped together without being alphabetized within the group, so Abbott may come after Arnold. In some indexes, names are arranged by the first letter of the given name, by the first three letters of the surname, or by the first and third letters of either name; some are alphabetized by given name regardless of the first letter of the surname. Others, like the Soundex indexes, are arranged so that names that sound alike are indexed together.
The original indexes in most compiled histories generally include only topical or surname entries. Comprehensive, every-name indexes are sometimes compiled later for genealogical use. These supplements may be bound into the original volume; written, typed, or printed as a separate volume; or added to the pages of a reprint edition. As you search a record, whether compiled or original, check carefully for multiple indexes. You may find them at the end, in the middle, or, conveniently, at the front of the record, and they may also be listed in the table of contents.
No index is perfectly accurate or complete. Whether prepared manually or by computer, indexes contain omissions, incomplete entries, and typographical errors. The key to using any index is to understand who created it and why. Successful researchers spend as much time getting to know an index as they spend using it. The index’s preface or introduction, how-to books and articles, other researchers, and your own experiments with the index can all reveal much about its usefulness.
An index is only as accurate as its source. If a family history has errors, those errors will be indexed. Misspelled words, garbled names, and incorrect page references will be indexed as well. It is not the indexer’s place to correct errors, even obvious ones, although some indexers add prefaces or footnotes to warn users. If a record is in a foreign language or has been damaged, names may be indecipherable or illegible. Even a skilled indexer dealing with unfamiliar names may misread a spelling and place the name in an entirely different part of the index from where it belongs. Cross-references for spelling variants and for multiple entries may be omitted to save space, time, or money.
Indexers select entries according to their own criteria, and the best ones describe their selection process for the reader’s benefit. For example, Schneider and Snyder may be indexed together or separately in a surname index. If the index is topical, who chose the topics? Are public officials indexed together, individually by name, or by government agency? Entries in a family history index may be divided into descendants, spouses who married into the family, ancestors of the central couple, and places where the family lived, each in a separate index. Check them all. Women may have been omitted from an index: if you’re looking for Mary Loomis and the index lists only John, Joseph, Michael, and Stephen Loomis, check those entries, because the indexer may have included only Mary’s brothers and father. Children and grandparents may have been treated similarly.
Almost any name can be spelled in more than one way. The Cole family of New York sometimes appears as Kool, reflecting Dutch influence, or Kohl, reflecting German influence, yet many families with this name descend from the New England Coles. In a strictly alphabetical index, each spelling variant must be checked separately to find all of the data. Be especially watchful for variants that begin with a vowel: even a simple name such as Ott can appear as Ot, Otte, Utt, or Autt. Thompson is often spelled Thomson, without the p, which gives it a different Soundex code in census and other government indexes.
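To see why a dropped letter or a different initial letter changes where a name is filed, here is a minimal Python sketch of the American Soundex rules as they are commonly described: keep the first letter, code the remaining consonants as digits, drop vowels (including Y) as well as H and W, count adjacent letters with the same code once, and pad or cut the result to four characters. The function name `soundex` is our own. Treat the output as illustrative only: the clerks who coded the census Soundex indexes did not always apply the rules consistently (for example, with H and W, or with prefixes such as Van and De), so a printed index may file a name under a different code.

```python
# Digit codes for consonants in American Soundex; vowels, Y, H, and W get no code.
SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"),
        ("CGJKQSXZ", "2"),
        ("DT", "3"),
        ("L", "4"),
        ("MN", "5"),
        ("R", "6"),
    )
    for letter in letters
}


def soundex(name: str) -> str:
    """Return a four-character Soundex code such as 'T512' (illustrative sketch)."""
    # Keep only the letters A-Z; spaces, apostrophes, and hyphens are ignored.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = [letters[0]]                      # the first letter is always kept as-is
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit:
            if digit != prev:                # adjacent letters with the same code count once
                code.append(digit)
            prev = digit
        elif ch in "HW":
            continue                         # H and W do not separate same-coded letters
        else:
            prev = ""                        # a vowel (or Y) does separate them
    return ("".join(code) + "000")[:4]       # pad with zeros or truncate to 4 characters


for surname in ("Thompson", "Thomson", "Schneider", "Snyder", "Ott", "Utt", "Autt"):
    print(surname, soundex(surname))
```

With this sketch, Thompson codes as T512 and Thomson as T525, so the two spellings are filed apart. Schneider and Snyder both code as S536 and land together, while Ott (O300), Utt (U300), and Autt (A300) stay apart because Soundex always keeps the initial letter; this is why vowel-initial variants need separate searches.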
Both given names and family names may have been translated from other forms. Jacob is the Latin and German form of James. The Slavic Vojtech becomes Adelbert or Albert in English. Polly and Mary are interchangeable, as are Sarah and Sally. The Huguenot Le Counte becomes the Dutch de Graff, and de la Maiste becomes Delamater. Some Germans translated their surnames into English: Zimmerman became Carpenter and Schwartz became Black. Be especially alert to such changes in the first and second generations of a family in America.
When searching indexes, look for the less common name first. For example, to find the marriage of Mary Loomis and John Smith, check Loomis first because it is the less common surname. If you are searching a Loomis family history, however, reverse the process: nearly every entry there is a Loomis, so look for a John Smith married to a Mary Loomis. This method is faster and usually more effective.
The growing popularity of genealogy and the increasing number of genealogical publications mean that more and more indexes are created and published each year. Additional indexes are mentioned throughout this book. All of these indexes serve the same general purpose: to help researchers find individuals faster and locate important information more easily.
An excellent article on the use of printed indexes is Donald Lines Jacobus’s “Tricks in Using Indexed Genealogical Books,” which covers some of these rules in greater detail. Keep in mind that indexes are tools, not sources. For more information on the nature of genealogical indexes and how to use them more effectively, see Kip Sperry’s “Published Indexes,” chapter 6 in Printed Sources.
Differences Between Databases and Indexes
Databases and indexes are often confused. In fact, many databases are called indexes even though they include more information than an index traditionally does. How do the two differ? A database is more than an index if it includes significant information about its subjects. An index typically includes only enough information to (1) identify a subject and (2) refer the researcher to another source for further information. Databases usually cite their sources too, but they may also include some, if not all, of the known information about their subjects. The distinction may seem minor, but it matters in practice. A database may contain enough information for a researcher’s needs, while an index usually only points to the information, which the researcher must still retrieve from another source. Access to that source may not be easy, and retrieving it adds a step to the research process. For this reason, researchers usually prefer databases.
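The difference can be pictured as two record shapes. The Python sketch below is purely illustrative: the class names `IndexEntry` and `DatabaseRecord`, the helper `next_step`, their fields, and the sample values are all hypothetical, not the layout of any real index or database. An index entry holds just enough to identify a person and point elsewhere; a database record may carry the facts themselves along with a citation.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class IndexEntry:
    """Hypothetical index entry: identifies a subject and points to another source."""
    name: str
    source: str      # where the details are actually recorded
    page: str        # locator within that source


@dataclass
class DatabaseRecord:
    """Hypothetical database record: carries details itself, plus its source."""
    name: str
    source: str
    birth_date: Optional[str] = None
    birth_place: Optional[str] = None


def next_step(item: Union[IndexEntry, DatabaseRecord]) -> str:
    """Describe what a researcher must do next to learn the subject's birth date."""
    if isinstance(item, DatabaseRecord) and item.birth_date:
        # The fact is already here, but it should still be checked against the original.
        return f"{item.name}: born {item.birth_date}; verify in {item.source}"
    if isinstance(item, IndexEntry):
        # An index only points elsewhere, so another retrieval step is needed.
        return f"{item.name}: retrieve {item.source}, page {item.page}"
    return f"{item.name}: no birth date recorded; consult {item.source}"


# Invented sample data, for illustration only.
print(next_step(IndexEntry("Mary Loomis", "Loomis family history", "112")))
print(next_step(DatabaseRecord("Mary Loomis", "Loomis family history", birth_date="1802")))
```

Both records name the same person, but only the database record answers the question without another trip to the source. Even then, the database's date is derivative and should be verified against the original record.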