Back in the 1980s I saw the movie WarGames. It starred Matthew Broderick
as a teenage computer geek who hacks his way into the U.S. missile defense
system and almost causes a war. What really intrigued me about the movie
was the talking computer. I vividly remember it asking, "Shall we play a
game?" and wishing that I could somehow make my TRS-80 from Radio Shack
talk to me. I was sure I
would be talking to my computer within a few years. Of course, I was
wrong.
Also in the 80s, the television show Knight Rider (starring David
Hasselhoff) popularized the concept of the talking car. Again, I was sure
I would be talking to my car in the near future. Again I was wrong.
Prime-time TV viewers tired of the talking car, and David Hasselhoff went
on to bigger and better things. Such has been the history of speech
recognition -- it always seems to be right around the corner, yet it never
really becomes mainstream. But now, all that is about to change.
FASTER! FASTER!
Gordon Moore, one of the founders of Intel, is most famous for the
observation known as Moore's Law, popularly paraphrased as, "The speed of
microprocessors will double every 12 to 24 months." (This is usually
averaged into the more commonly heard 18-month figure. Strictly speaking,
Moore's prediction concerned the number of transistors on a chip, but
performance has tracked it closely.) His prediction has held up, and the
past few years have proven to be unique in the PC market.
Until recently, most PC users would see a dramatic improvement in the
performance of desktop applications by purchasing a computer with a faster
processor. However, processors have become so fast in recent years that
the change in response time for most word processors, spreadsheets, and
Web browsers has become negligible from one generation of chip to the next.
With the possible exception of graphically intensive video games, the PC
market is losing its motivation to upgrade.
This brings us to speech recognition, which is one of the most
processor-intensive applications known. For speech rec to work even
reasonably well, it takes a very fast processor. The problem in the past
was that even the fastest computers didn't do a very good job of dealing
with speech recognition. The technology was just ahead of its time, and
the few computers that did run a speech rec application were usually
prohibitively expensive. However, thanks to the ongoing advances in
microprocessors, the technology is finally at the point where speech
recognition can reach the mainstream.
NEW TARGETS, NEW APPS
As the need to upgrade computers every few years just to run basic
applications dwindles, microprocessor vendors are simultaneously
discovering the need to target new markets to sell their wares. As an
example, Intel has repeatedly positioned its Pentium III processor as an
ideal platform for speech rec and has built in special instructions to
better handle the technology. Always the supportive partner, Microsoft has
stated that Windows 2000 also has special features to make speech
rec applications run more smoothly.
While this may seem interesting only to the techies, it is important to
remember that these industry giants have the clout to create new,
previously non-existent markets for their products. If they decide speech
rec should become mainstream, it will become so.
There are three basic platforms for speech recognition: PC-based software,
embedded applications, and the telephone.
PC-BASED SOFTWARE
Most people are familiar with PC products such as Dragon Systems'
NaturallySpeaking. This product lets users dictate into a word processor
using the spoken word, rather than the keyboard. In the past, products like
these were tedious to use, had low reliability, and required over thirty
minutes of speaker-specific training. However, with the new technologies the
accuracy is extremely high and the training time is around five minutes.
Several of Dragon's products actually require a Pentium III because they
rely on its special instructions.
EMBEDDED APPLICATIONS
The embedded applications market (composed of cell phone handsets,
auto computers and stereos, and home automation) was almost
non-existent just a few years ago. Today it holds the potential to
generate billions of dollars in the near term.
Now that microprocessors are cheap enough to embed in a low-end
handset, there is almost no reason not to include speech rec in even the
most basic offering. The same holds true for car stereos, home
entertainment centers, and various other types of home automation. Very
soon consumers will be asking their car radio to switch the channel, and
instructing their ovens to "Cook the pot roast at 450 degrees."
TELEPHONE
Finally, there are the products that are most familiar to those in the
voice world -- applications that exist remotely and are accessed via the
telephone. Already familiar are stock quotes and dial-by-name directories
using speech rec. Unfortunately, even these high-end products are still
sometimes unwieldy. The slightest mispronunciation, slip in grammar, or
pause can cause the system to falter.
Nevertheless, the next generation of speech rec applications holds
amazing potential. Customers will be able to find the appropriate sales or
service department automatically just by asking. Natural language
processors will interpret myriad voice commands, questions and complaints,
and send the caller to his or her destination quickly and easily.
GLIMPSING THE FUTURE
On the horizon are security products that analyze a voice print
instead of requesting an account number and access code. Help desks will
increasingly become automated as speech rec systems gain the intelligence
to answer common questions without any human intervention. The net result
is that call centers will run more efficiently and cheaply. Agents will
only be used when absolutely necessary, and then only for more advanced
sales or technical assistance. Many of the traditional "galley
slaves" of the call center stadium will need to increase their skill
set to remain employed.
Not surprisingly, all of these applications still sound like something
out of science fiction. These are not new ideas, though. They are old ones
simply waiting for the technology to catch up with the minds of speech rec
innovators. Some of these products are here now, and some are just around
the corner. But the time is right, Moore's Law has been proven correct,
and finally the technology is here for speech recognition.
Brian Strachman is an industry analyst, Voice and Data Communications,
Cahners In-Stat Group. To correspond with the author, please send your
comments to [email protected].
[ return to the July 2000 table of contents ]