A few years ago I tried using voice recognition software, hoping it might simplify the task of transcribing wills, deeds, and other genealogical documents. Many of the words found in the old documents are not commonly used in today's speech, so I spent much time correcting and adding words manually. The computer was also sluggish and froze often. After many frustrating attempts to transcribe verbally with the computer, I finally abandoned the idea and reverted to the old read-and-type method. At the time, home computers simply did not have sufficient speed or memory to process the speech software in a timely manner.
Recently, I learned that the medical profession is successfully using voice recognition software to dictate patient medical reports, record diagnoses and laboratory results, and even write prescriptions. The computer can recognize difficult medical procedures and terminology, as well as the names of medications, much more easily than the pharmacist can decipher the doctor's illegible handwriting. If the computer can successfully understand medical terminology, why can't it do the same for genealogical terms? Maybe a more powerful computer and a newer release of software would have better success than before. Perhaps I should give voice recognition another try.
Voice recognition systems have changed significantly in the past couple of years. Current technology permits dictation in a natural tone, in complete phrases and sentences, just as you would speak in normal conversation. Described as "continuous speech" systems, the new technology is easier to use, more accurate, and much faster, provided you have a computer with sufficient processing power.
The four major voice recognition programs currently on the market are IBM ViaVoice, Dragon NaturallySpeaking, Philips FreeSpeech 2000, and L&H VoiceXpress. Last year, L&H purchased Dragon NaturallySpeaking, but it is currently marketing both products.
Each of these products offers three or more versions, with prices ranging from $50 to $200. The higher-priced versions include text-to-voice capability, which enables the computer to read the dictated text back to you for proofreading. They also permit you to control the computer and software using only voice commands.
Even the least expensive versions have simple internal word processors, much like WordPad or SpeakPad, which don't require a lot of memory. The more expensive versions include the capability to dictate directly into Word, WordPerfect, and many other applications. While these features may sound attractive, genealogists should be aware that they require significant computer resources when they are used. Unless your computer is very powerful, you may be disappointed with the end results.
Some of the less expensive versions require that all corrections be made to the dictated text before saving the transcribed document. Others let you save the dictated voice file with the document so that you can make corrections later. This creates a large temporary sound file, but after you make corrections, the document can be saved without the sound. This feature gives you much more flexibility in your schedule.
In all versions, the software package includes a special headset microphone, which should be used for maximum recognition. The microphone is specially designed to block out external noises and to provide the best voice pattern, thereby enhancing voice recognition. In addition, the headset keeps the microphone at a constant distance from your mouth for maximum quality.
Before installing any voice recognition software, be sure to verify that your computer meets the minimum specifications to handle the program. Most of the packages suggest at least a 266 MHz Pentium II (or equivalent) processor with 64 MB of RAM and Windows 98. Of course, a faster processor with more RAM will allow the software to recognize and process the speech more quickly and accurately. Most of these programs require about 300 MB of hard disk space. (ViaVoice Pro requires 510 MB of free space.)
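The check described above can be sketched in a few lines of Python. This is only an illustration: the minimum figures come from the paragraph above, while the `Specs` class, its field names, and the sample machine are hypothetical and are not part of any of the products mentioned.

```python
from dataclasses import dataclass


@dataclass
class Specs:
    # Hypothetical field names chosen for this sketch.
    cpu_mhz: int
    ram_mb: int
    free_disk_mb: int


# Typical minimums quoted in the article (266 MHz, 64 MB RAM, about 300 MB disk).
MINIMUM = Specs(cpu_mhz=266, ram_mb=64, free_disk_mb=300)


def shortfalls(machine, minimum=MINIMUM):
    """Return one message for each specification that falls below the minimum."""
    problems = []
    for field in ("cpu_mhz", "ram_mb", "free_disk_mb"):
        have = getattr(machine, field)
        need = getattr(minimum, field)
        if have < need:
            problems.append(f"{field}: have {have}, need at least {need}")
    return problems


# An invented older machine: enough disk space, but too slow and short on RAM.
old_pc = Specs(cpu_mhz=200, ram_mb=32, free_disk_mb=500)
for line in shortfalls(old_pc):
    print(line)
```

An empty result means the machine meets the stated minimums; as noted above, a machine that only just meets them may still feel slow.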
Installing the software is straightforward and takes only about 15 minutes. After the program files are installed, you are prompted to calibrate the microphone and to create a voice model. This is easily accomplished by reading one of four short stories furnished in the program and viewed on the screen. As you read the lines, the script automatically fades and scrolls to indicate that the computer understands what you said. The software records how you pronounce certain sounds and phrases, so it can recognize and type your words later.
After you are finished, the software analyzes your speech pattern and stores it in memory. This takes an additional 10 to 15 minutes, depending on your computer's processing speed.
The Voice Model
You can continue training the voice model by reading additional stories now or at any later time. Naturally, the more stories you read, the more the computer can refine your voice model and gain more accurate recognition of your dictation. Later, as you use the software for actual dictation and make corrections, you can add the corrected information to the voice model and thereby increase future recognition reliability. (Also, several people can use the same software for dictation, but each user must create his or her own individual voice pattern and remember to select it before dictating.)
After the voice model is completed, the program can analyze your previously typed documents. This is perhaps the easiest and surest way to build a vocabulary and add any special words you will be using in your genealogical documents. Simply specify which files the computer should look at and it will go through them to find new words not already in its vocabulary. The new words are displayed in order of the number of times they appear in the documents, so you can decide whether or not to add each new word to the permanent vocabulary. While the computer looks for new words, it also studies the context in which those words are used, and stores that information within the voice model.
As an example, if you dictate a simple word like "to," the software won't know whether you intend "two," "to," or "too" unless it understands the context in which the word is being used. This type of analysis is also performed as you dictate new text; the software learns your style of speaking and the context of documents that you often dictate. While the analysis is not foolproof, it certainly seems to get it right more often than wrong.
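To make the idea of context concrete, here is a toy Python sketch. It is not how any of the commercial products work internally; it simply assumes that counting which word follows each spelling in previously typed text is enough to choose among the three. The function names and the sample sentences are invented for illustration.

```python
from collections import Counter, defaultdict

HOMOPHONES = {"to", "two", "too"}


def learn_contexts(sentences):
    """Count which word follows each homophone in previously typed text."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            if current in HOMOPHONES:
                following[current][nxt] += 1
    return following


def pick_spelling(next_word, following):
    """Choose the spelling seen most often before next_word.

    sorted() makes ties deterministic: with no evidence at all, "to" wins.
    """
    return max(sorted(HOMOPHONES), key=lambda w: following[w][next_word.lower()])


# Invented sample text standing in for previously typed documents.
samples = [
    "I give two acres to my son",
    "I leave the mill to my wife",
    "two acres of bottom land",
    "my daughter too shall inherit",
]
contexts = learn_contexts(samples)
print(pick_spelling("acres", contexts))
print(pick_spelling("my", contexts))
```

With these samples the sketch picks "two" before "acres" and "to" before "my". Real software weighs far more context than a single following word, which is one reason the analysis improves as it sees more of your documents.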
Vocabulary Manager
Voice recognition software uses vocabularies similar to the dictionaries supplied with high-end word processors. The software checks spoken words against the vocabulary in much the same way that a spell checker checks typed words against its dictionary. If the software does not understand the pronunciation of a word, or does not find it in the vocabulary, it questions the word and offers substitutions. Of course, the more words in the vocabulary, the more likely it is that the software will select the correct one, which also speeds up recognition.
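The spell-checker comparison can be illustrated with a short Python sketch. Note the substitution: real dictation software compares sounds, whereas this example compares spellings using the standard library's `difflib.get_close_matches`. The `check_word` function and the small vocabulary are hypothetical.

```python
import difflib


def check_word(heard, vocabulary, max_suggestions=3):
    """Return [] if the word is in the vocabulary, else the closest entries.

    Similarity here is based on spelling, as a stand-in for the acoustic
    matching that real voice recognition software performs.
    """
    word = heard.lower()
    if word in vocabulary:
        return []
    return difflib.get_close_matches(
        word, sorted(vocabulary), n=max_suggestions, cutoff=0.6
    )


# A tiny invented vocabulary of probate terms.
vocabulary = {
    "administrator", "administratrix", "executor", "executrix",
    "testator", "testatrix", "intestate",
}
print(check_word("executor", vocabulary))   # known word, nothing to question
print(check_word("executer", vocabulary))   # unknown word, substitutions offered
```

A larger vocabulary gives the first branch more chances to succeed, which mirrors the point above: the more words the software already knows, the less often it has to guess.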
New words may be individually added to the vocabulary at any time. You can open the Vocabulary Manager to add words and record your pronunciation to the vocabulary. Once a new word is added and trained, the software rarely misses it during dictation. One of the keys to successful transcription of genealogical documents is the building of a large vocabulary of genealogical terms.
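The document scan described in the Voice Model section, which lists unfamiliar words by how often they appear, can be approximated in Python as follows. This is a minimal sketch under simple assumptions: words are runs of letters and apostrophes, and the "known vocabulary" is just a set of lowercase words. The function name and the sample text are invented.

```python
import re
from collections import Counter


def new_words_by_frequency(text, known_vocabulary):
    """List words missing from the vocabulary, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in known_vocabulary)
    return counts.most_common()


# Invented stand-ins for the built-in vocabulary and a transcribed deed.
known = {"i", "the", "my", "to", "of", "and", "give", "son", "land"}
deed = (
    "I give to my son the messuage and the appurtenances. "
    "The messuage and tenement."
)
for word, count in new_words_by_frequency(deed, known):
    print(count, word)
```

Reviewing the list from the top lets you add the terms that matter most first, such as "messuage" in this sample.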
Desktop or Laptop
Desktop computers will usually perform better for voice recognition than laptops with the same specifications, for a variety of reasons. The sound input circuitry of the smaller computer is electronically "noisier" than that of the desktop and is physically closer to other noisy components such as the disk drive motor. The power-saving features and smaller components of the laptop also contribute to some loss of input sound quality. ViaVoice Pro comes with an optional USB connection for the headset microphone that bypasses the internal sound card, significantly improving input sound quality.
Some of the newer desktop computers come with the sound card built into the motherboard. Those units are not as desirable for voice recognition as units that have a separate sound card. If you are having difficulty with voice recognition, and your computer is of that type, you might consider adding a separate sound card. The price of a sound card is usually based on the quality of sound output and not the input. In voice recognition, only the input is of importance, so any quality sound card would be satisfactory.
Tips for Success
1. Have plenty of computer power: at a minimum, a Pentium III (or equivalent), 128 MB of RAM, and Windows 98.
2. Purchase the latest version of whichever software you choose. It does not have to be the most expensive version, but this technology is improving so quickly that it changes significantly each year.
3. Spend plenty of time creating a good voice model; read all the stories the software offers.
4. Let the computer analyze as many genealogical documents as you have available.
5. Continue to add words to build a large vocabulary of unique genealogical terminology.
6. Continue correcting mistakes during dictation so they are added to improve the voice model.
7. Dictate in a reasonably quiet, controlled sound environment.
Jim Slade is on the GENTECH Board of Directors. A retired civil engineer, he is active in genealogical research and lectures nationally on subjects relating to the use of computers for genealogy research. He is the former chairman of the NGS Computer Interest Group and has led the Genealogy Group of the Oklahoma City Computer Users since 1993. He can be reached by e-mail at sladej@swbell.net.
Return to the Genealogical Computing Summer 2001 Table of Contents.