Genealogical Computing
1/1/2001 - Archive

Winter 2001 Vol. 20.3

Experiments with Voice Recognition Software

I have never written a column the way I am writing this one. The words you are reading are words I am speaking to my computer. My hands are not touching the keyboard. Is this the wave of the future? To answer this question, we’ll need to take a look at some history: the history of voice recognition.

When we look at science fiction stories from the past fifty years, we see that the future does not include people typing on computer keyboards or, for that matter, using a computer mouse. Instead, what we see is people speaking to their computers and the computers speaking back. As it turns out, having a computer convert text into spoken language is not terribly difficult. Research in this area has made it possible for blind computer users to have documents read out loud to them. In the few cases where the computer mispronounces a word, the listener can usually figure out from the context what was intended. The process by which computers convert text into speech is known as voice synthesis.

When I was an undergraduate student in the late 1970s, researchers were working on a far more difficult problem: converting human speech into text. This process, known as voice recognition, is a very different kind of problem from voice synthesis. For a computer to convert human speech into text, it would have to distinguish between speech and noise. It would have to understand both low-pitched and high-pitched voices. It would have to cope with regional and ethnic accents. It would have to understand the speaker, even when the speaker had a cold. Beyond all of that, it would have to take a stream of spoken language and figure out which text the speaker most likely meant. And it would have to work as quickly as the speaker was speaking.

Until recently, most home computers did not have the processing power to perform accurate voice recognition quickly. If you had purchased voice recognition software a few years ago, you would probably have been disappointed with the speed and quality of its performance. As of the year 2000, things have changed.

The Major Contenders
Today, three product lines dominate the voice recognition market. They are IBM's ViaVoice, Dragon NaturallySpeaking, and L&H Voice Xpress. Although there are three different product lines, there are only two companies, because Lernout & Hauspie purchased Dragon Systems in 2000. (At this point in the article, I'm going to switch back to using the keyboard. More about that later.) Within each of the three product lines are products priced at different levels that offer different sets of features. To help you understand this, think about word processing on your own Windows-based computer. If all you need to do is type some ordinary text, you can get by with Notepad. If you need to format that text (change the font size, for instance), you can move up to using WordPad. And if you need every possible bell and whistle, you can purchase and use something like Microsoft Word. In the same way, the various voice recognition software packages are available in at least three different levels.

Lernout & Hauspie announced on 14 August 2000 the release of version 5 of the Dragon NaturallySpeaking product line. All products in that line require a Windows 98 (or later) system running at 266 MHz (or faster), with at least 64 MB RAM and 150 MB of free hard-disk space. The three primary products in the line are Essentials ($59), Standard ($109), and Preferred ($199). As with its competitors, all of these Dragon products come with a headset microphone (which you’ll need to use instead of the microphone that came with your PC). All of the Dragon products allow you to create and send e-mail, and to have the e-mail you receive read aloud. You may add your own custom vocabulary to the vocabulary already built into the products. The Standard version adds the ability to use commands to activate features of Microsoft Word or Corel WordPerfect, and the Preferred version allows the use of special dictation devices (which would allow you to compose your e-mail or documents anywhere, and then play back the recording to the software for transcription).

Although Lernout & Hauspie has purchased Dragon Systems, it continues to offer version 5 of its own line of voice recognition software: L&H Voice Xpress. Its three primary products are known as Standard ($49), Advanced ($79), and Professional ($149). The differences among the three products are essentially the same as the differences among the three products of the Dragon NaturallySpeaking line. System requirements are almost identical, too, although 250 MB of free hard-disk space is needed. (By the way, I have L&H Voice Xpress Advanced installed on my home system. I used it to compose the first few paragraphs of this article.)

On 29 August 2000, IBM announced the latest release of its own voice recognition software: ViaVoice for Windows, release 8.0. This product is available in four editions: Personal ($29), Standard ($59), Advanced ($99), and Pro ($199). The Personal and Standard editions have processor and memory requirements similar to those of their competitors, but they require 460 MB of free hard-disk space. Also like their competitors, the more expensive editions provide additional features, such as the ability to work more closely with Microsoft Word.

If all you need is a program that allows you to compose the text of your e-mail or word-processing documents, the least expensive programs should be sufficient. If your needs are geared toward using your voice to control additional word-processing features or using a portable dictation recorder, you’ll need the more expensive programs. Be sure to visit the Web sites for the various programs so you can review the features in more detail. In some cases, you may be able to buy the product online and obtain a rebate.

But What Is It Really Like?
For me, the most impressive thing about voice recognition software is how accurate it can be. To obtain a high level of accuracy, the software must deal with a number of problems. First, it must seek to eliminate background noise. To do this, the software is usually packaged with a headset microphone, which keeps your hands free and the microphone positioned at a proper distance from your mouth. The software may require you to be silent for a few moments in order for it to ascertain how much background noise there is. Then it may require you to read some text (such as numbers). At this point, the software can determine whether or not there will be too much background noise for it to distinguish your voice.
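For readers curious about what such a background-noise check might involve, here is a minimal Python sketch. It is an illustration only, not the method used by any of the products discussed. It assumes the audio has already been captured as two non-empty lists of sample values (one recorded during the silent pause, one while reading the text aloud), and the 15-decibel threshold is an arbitrary value chosen for the example.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty list of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def signal_to_noise_db(silence, speech):
    """Compare the speech level with the background level, in decibels."""
    noise = rms(silence)
    signal = rms(speech)
    if signal == 0:
        return -math.inf  # nothing was heard at all
    if noise == 0:
        return math.inf   # a perfectly silent room
    return 20 * math.log10(signal / noise)


def quiet_enough(silence, speech, min_snr_db=15.0):
    """True if the voice stands out clearly from the background.

    min_snr_db is a hypothetical threshold, not a published figure.
    """
    return signal_to_noise_db(silence, speech) >= min_snr_db
```

A quiet room paired with a clear voice passes the check, while a room nearly as loud as the speaker fails it.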

Although the human ear and brain can immediately understand language spoken by people they have never heard before (even with fairly thick accents), voice recognition software must be trained to respond to a single voice. To do this, you may be asked to read a large amount of text from the screen. The L&H software I use gives you the choice of many different kinds of text to read, so at least you won’t be too bored! Once the software has figured out the way you pronounce words, you can then train it further by providing it with a number of documents you have already written. In this way, the software can figure out your writing style. Also, the software looks for vocabulary terms that are not part of its original vocabulary. For instance, my documents contain a large number of genealogy-specific terms. After the software examines the documents I have given it, it compiles a list of unfamiliar terms and asks me if I want to have those added to its vocabulary.
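The vocabulary step described above can be approximated in a few lines of Python. This is a simplified sketch under stated assumptions, not the L&H implementation: the function name find_unfamiliar_terms is hypothetical, the built-in vocabulary is assumed to be a plain list of words, and a "word" is assumed to be any run of letters, apostrophes, and hyphens.

```python
import re
from collections import Counter


def find_unfamiliar_terms(documents, known_vocabulary, min_count=1):
    """List words found in the documents but not in the known vocabulary.

    Terms are returned in lowercase, most frequent first. All names
    here are illustrative, not part of any real product.
    """
    known = {word.lower() for word in known_vocabulary}
    counts = Counter()
    for text in documents:
        # Treat each run of letters (plus ' and -) as one word.
        for word in re.findall(r"[A-Za-z][A-Za-z'-]*", text):
            lowered = word.lower()
            if lowered not in known:
                counts[lowered] += 1
    return [word for word, n in counts.most_common() if n >= min_count]
```

Given a handful of genealogy documents, a routine like this would surface terms such as "testator" or "probate" so that the user can decide which ones to add.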

Later, it is possible to spend more time with the software to perform additional training. With some of the current packages, you can expect accuracy in the 98 percent range. I have found that I need to be careful not to slur some of my terms. Otherwise, the software may confuse similar-sounding words such as "and" and "in." However, you do not have to speak unnaturally slowly, nor do you have to say each word by itself. You can speak entire sentences. The software usually takes a few seconds after each phrase to process it. (It is probably trying to figure out which phrase is most likely correct based upon your pronunciation and the grammar of the sentence.) If it makes a mistake, you can say something like "Undo that," and it will back up, allowing you to pronounce the text again.
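To make the guess in that parenthesis concrete, here is a toy Python sketch of how a program could weigh pronunciation against context when choosing between "and" and "in." Everything in it is an assumption made for illustration: the function name pick_word, the acoustic scores, the word-pair counts, and the simple way the two are multiplied. Commercial products are certainly more sophisticated.

```python
def pick_word(previous_word, candidates, pair_counts):
    """Choose the candidate that best fits both sound and context.

    candidates:  dict mapping each possible word to a score for how
                 closely it matched the sound (higher is better).
    pair_counts: dict mapping (previous word, word) to how often that
                 pair appeared in the user's earlier documents.
    Both inputs are hypothetical stand-ins for real models.
    """
    def score(word):
        seen = pair_counts.get((previous_word, word), 0)
        # Add 1 so that a never-seen pair is unlikely, not impossible.
        return candidates[word] * (1 + seen)

    return max(candidates, key=score)
```

If the sound slightly favors "and" but the user's documents contain "born in" far more often than "born and," the context wins and "in" is chosen.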

Sooner or later, the phone will ring or you’ll need to yell at the cat, but you obviously don’t want to have your words appear as part of the document you’re working on. Fortunately, the software allows you to tell it to "Stop listening." At that point, you can say nearly anything you want, and the software will blissfully ignore you. When you’re ready to return to the task at hand, you can say something like "Listen to me," and it dutifully switches back into its dictation mode.

Additional features allow you to indicate that you want to capitalize words, and the software understands the words "comma," "period," and so forth. You may also switch the software into spelling mode (an essential tool for any genealogist dealing with lots of hard-to-spell proper names). The software even understands such commands as "new line" or "new paragraph." With enough practice, you’ll find yourself rolling merrily along without having to put your hands on the keyboard or mouse.
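As a rough illustration of how spoken commands such as "comma" or "new paragraph" could be turned into formatting, consider the following Python sketch. It assumes the recognizer has already produced a list of items in which each command arrives as a single entry; the command tables and the function name render_dictation are hypothetical and do not reflect any product's actual command set.

```python
# Hypothetical command tables for the example.
PUNCTUATION = {"comma": ",", "period": ".", "question mark": "?"}
BREAKS = {"new line": "\n", "new paragraph": "\n\n"}


def render_dictation(items):
    """Turn recognized words and spoken commands into formatted text."""
    text = ""
    for item in items:
        if item in PUNCTUATION:
            # Punctuation attaches directly to the preceding word.
            text = text.rstrip(" ") + PUNCTUATION[item]
        elif item in BREAKS:
            text = text.rstrip(" ") + BREAKS[item]
        else:
            # Ordinary word: add a space unless a new line just began.
            if text and not text.endswith(("\n", " ")):
                text += " "
            text += item
    return text
```

Note that this sketch would also treat the word "period" as punctuation in a phrase like "the colonial period"; telling the two uses apart is one of the problems real software has to handle.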

Although the software strikes me as especially useful for composing long documents in word-processing software, it can also be used to compose e-mail or to create the text notes within your genealogy database software.

Don't Throw Away the Keyboard Yet
Because this is still a relatively new technology, it isn’t perfect. Even with training, the software makes mistakes, and you may need to return to the keyboard to fix those few words the software can’t seem to understand. Also, the software uses a lot of your computer’s processing power, which means you can expect occasional slow behavior or system freezes. It is a good idea to frequently save the documents you are working on.

If more than one person uses your computer and each plans to use the voice recognition software, each user will have to train it separately. We have not yet reached that science-fiction future in which computers automatically understand every new voice that speaks to them.

For the last paragraph of this article, I have decided to turn the software back on. I’m still amazed at the accuracy that has been reached by the software, and I now consider it a useful tool rather than a toy. I know I will need to spend more time training it, but I believe the time will be well spent.


Drew Smith, MLS, is an instructor at the University of South Florida in Tampa, where he teaches library/Internet research skills and genealogical librarianship. He is the webmaster and listowner for Librarians Serving Genealogists. He is also a past leader of the Genealogy and Local History Interest Group of the Florida Library Association.

