Who says voice portals are hot? Venture
capitalists, for one. Voice portals are toll-free numbers that provide
voice access to Internet information. Yes, superficially this sounds like
interactive voice response (IVR) warmed over, but it's a lot more, which is why it's so interesting to
VCs as well as to developers and service providers. Voice portals give an
early view of the ultimate user interface for the telephone network.
Unlike earlier touch-tone-based IVR technology, voice portals utilize the
latest speech processing technologies, both automatic speech recognition
(ASR) and text-to-speech (TTS), to provide a more natural user interface.
It's still not a human conversation. On the other hand, voice portals give
access to Internet information when you're without a computer or an
Internet connection, so a little awkwardness is acceptable. In fact, if
you are mobile and need information or directions, a voice portal can be a lot less
awkward than other solutions. So where are voice portals headed, and what
do they mean for the next round of developments in mobile communications?
Well, let's start where it all began: speech.
SPEECH TECHNOLOGY
ASR technology has been around for decades, gradually getting better
as algorithms, and more significantly computer technology, have improved.
Early ASR systems had small vocabularies and limited accuracy. But even 20 years
ago it was useful for some command and control applications, especially if
the operator was working in a dark room (i.e., couldn't see a keyboard) or
if the operator's hands were busy.
For example, one early application let workers place parcel post packages on
a conveyor belt while speaking their ZIP codes, so a downstream conveyor
system could route the packages to the correct truck. This early
application worked because only one speaker was active at a time, so
the system could be trained for that voice alone. The operators also spoke
words from a very limited vocabulary: the ten digits, plus variants such as
"zero" and "oh" for 0. And finally, the operators' hands were
full, so they were motivated to work with a system even though it required
them to speak carefully and distinctly.
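To show how small that vocabulary really is, here is a minimal Python sketch of the step after recognition: turning the recognizer's word output into a five-digit ZIP code. The names DIGIT_WORDS and words_to_zip are hypothetical, and the sketch assumes the recognizer has already returned one word per spoken digit; it is not a description of how the original sorting system worked.

```python
# Hypothetical sketch: the entire vocabulary of a ZIP-code sorting recognizer.
# "zero" and "oh" are both accepted as spoken variants of the digit 0.
DIGIT_WORDS = {
    "zero": "0", "oh": "0",
    "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def words_to_zip(words):
    """Map recognized words to a five-digit ZIP code, or return None if invalid."""
    digits = []
    for word in words:
        digit = DIGIT_WORDS.get(word.lower())
        if digit is None:
            return None  # out-of-vocabulary word: reject and have the operator repeat
        digits.append(digit)
    # Anything other than exactly five digits is also rejected.
    return "".join(digits) if len(digits) == 5 else None


print(words_to_zip(["oh", "two", "one", "three", "nine"]))  # prints 02139
```

With only eleven legal words and a fixed length of five, the recognizer has very few choices to distinguish at each step, which is a large part of why such an early system could be practical.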
With advances in computer technology, speech recognition has come a
long way. Today's systems can be speaker-independent. And vocabularies of
hundreds or thousands of words are routinely available. In fact, dictation
systems are available that can handle vocabularies of several hundred
thousand words (if trained for a single user).
TTS technology has likewise made great strides. Early systems were
notorious for sounding like a drunken Swedish robot (at least to my
American-English-tuned ears). But with the advent of very large memories
and increasing computer power, TTS quality has improved to the point where
some versions sound almost human.
PERSONAL ASSISTANTS
The first commercial attempts at a new user interface for telecom
began four years ago. Leveraging advances in speech technology, companies
such as Wildfire and Webley introduced over-the-telephone automated
personal assistants that act as a front end to unified messaging services
and provide a personal phone book/auto-dialer, personal calendar, etc.
Even though these services are convenient for some users, they haven't
taken off like wildfire.
Integration with corporate voice mail systems is limited, and it turns
out the user interface is not quite convenient enough for everyone. You
have to say precisely what you want, using a limited vocabulary, and often
give multiple commands. DTMF (touch-tone) shortcuts are often faster. In addition,
because your personal assistant attempts to capture information about
incoming calls, your callers end up in a more protracted dialog than they
would have if they simply had reached an answering machine or a
conventional voice mail system. Clearly, there are interface issues yet to
be resolved.
However, these personal telecommunications assistant applications have
been used successfully for several years now. The companies providing
them have not gone under; they are gaining ground. But their services
haven't become ubiquitous either.
THE NEXT STEP
The next dramatic step in voice user interface development has emerged
in recent months: voice portals that provide telephone access to Internet
data. Companies such as Tellme, BeVocal, Quack.com, and Audiopoint are
providing access to Web information -- stock quotes, sports results,
weather, directions, nearby restaurants, flight information -- without the
benefit of a traditional browser. Typically, access is via an 800 number
and spoken commands.
Besides access to Web information, typical portals offer voice-commerce
transactions, service personalization, location-specific targeted
advertising, secure voiceprint verification, and general information
delivery via voice, WAP, fax, e-mail, and text paging. The big payoff is
in widespread consumer adoption, but many companies are also targeting
third-party developers and corporate voice portals that handle internal
employee communications or external customer service. Think of this as
next-generation IVR. Investment money is pouring into such companies, each
with a slightly different spin on the ultimate voice portal.
While it's too early to pick the winners, the winning criteria are
clear: usability, usability, usability. The challenge is to design the
dialog -- the user interface -- so it's easy for the casual user, while
allowing experts to immediately access the information they want. The
winning portal will start with an excellent user interface, provide ways
to personalize it, and then continue to improve it day-by-day,
week-by-week. The best speech recognition and text-to-speech technology
and the fastest response and highest availability will help. But these are
just contributors to usability.
TECHNOLOGY
System availability will need to match that of the current telephone
system; that is, it must always work. Service platforms will need to scale to
T1, T3, or higher-capacity trunk access and support SS7 signaling, high-performance echo
cancellation, and simultaneous input and output of speech (so that
callers can barge in while prompts are playing). Very high-volume systems
will also need to integrate streaming media converters so that
callers can listen to live sports, Internet radio stations, or prerecorded
Internet content.
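Barge-in is why simultaneous input and output matter: the platform must keep listening while it plays a prompt and stop the prompt as soon as the caller starts talking. The Python sketch below is a simplified illustration under stated assumptions: audio arrives as frames of 16-bit PCM samples (160 samples is 20 ms at 8 kHz), the caller's input has already passed through echo cancellation, and speech is detected with a crude energy threshold. The function name play_with_barge_in and the threshold value are hypothetical; real platforms use far more robust voice-activity detection.

```python
def play_with_barge_in(prompt_frames, caller_frames, threshold=500.0):
    """Play prompt frames until caller speech is detected (barge-in).

    Returns (frames_played, barged_in). Each frame is assumed to be a non-empty
    list of 16-bit PCM samples, and caller_frames are assumed echo-cancelled.
    """
    played = 0
    for out_frame, in_frame in zip(prompt_frames, caller_frames):
        # Crude voice-activity check: mean absolute amplitude of the input frame.
        energy = sum(abs(s) for s in in_frame) / len(in_frame)
        if energy > threshold:
            return played, True  # caller spoke: stop the prompt, start recognizing
        played += 1  # stand-in for writing out_frame to the outgoing channel
    return played, False


# Usage: a 10-frame prompt; the caller starts speaking during frame 3.
prompt = [[1000] * 160 for _ in range(10)]
silence = [[20] * 160]
speech = [[3000] * 160]
caller = silence * 3 + speech * 7
print(play_with_barge_in(prompt, caller))  # prints (3, True)
```

The echo-cancellation assumption is the important one. Without it, the prompt's own echo returning on the caller's channel could exceed the threshold and cut the prompt off even though the caller never spoke.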
All of this technology is available today on open telecommunications
platforms. Early systems use PCI-based computers, but a robust deployment
configuration will use a CompactPCI chassis loaded with hot-swappable
trunk interfaces as the telecom front end, connected by redundant Ethernet
networks to a conventional distributed computer system that hosts the
speech recognizers, databases, and Internet access engine.
Customization will be critical to allow repeat users to streamline
their use of the system. Users can customize their service over the
telephone, via WAP, or, more easily, through a normal Web interface on a
large-screen home or office PC.
THE ROLE OF ADVERTISING
As with conventional portals like Yahoo, "free" service
actually means it is advertiser-supported. Striking a delicate balance
between pleasant service and revenue generation is critical. One example is
free phone calls to listed advertisers. If the user identifies a
restaurant of interest, the voice portal connects the caller directly to
the restaurant by placing an outgoing call and bridging it to the
caller's existing call. The restaurant pays for the referral.
VOICE PORTAL ADOPTION
There will be many places for differentiation in the voice portal
market, and the vendor who makes it as easy as possible is going to win.
So, if personal communications assistants have been adopted slowly, why
should voice portals take off rapidly? There are several reasons. The
speech technologies have gotten better over time. Internet adoption has
been phenomenal. And mobile telephone use has soared.
THE EVOLUTION OF VOICE PORTALS
With widespread Internet adoption and the more recent emergence of
always-on broadband connections, people are beginning to rely on easy
access to Web information. Separately, a large segment of the population
has become accustomed to keeping in constant contact via mobile phones,
pagers, and other forms of wireless communication. Taken together, these
phenomena have prompted the development of WAP. But WAP-enabled handsets
are limited by their small screens and low-bandwidth
connections. Third-generation (3G) wireless promises to improve bandwidth,
but it will take years to become widely deployed.
This creates an interesting opportunity: combining voice portal
technology with WAP-enabled handsets in a multimodal solution. It's been
well established that combining voice and pointing improves user
performance at computer tasks. Likewise, visual output combined with speech
and other sound is much more compelling than either alone. Studies have
documented drops in input error rate of as much as 50 percent when using
multimodal input.
But more compelling is an across-the-board user preference for
multimodal systems. A good combination of voice portal and WAP
technologies will improve service for mobile users. Multimedia is coming
to your phone -- even if it only has a 9.6 Kbps link and a tiny display.
To move beyond mobile users and begin to provide multimodal access for
desktop users may take another round of improvements in speech
technologies, but this is coming. Academic research is focused on computer
understanding, if not of all natural language, at least of certain topics
of discussion. One remarkable demonstration of this technology involves a
computer listening to broadcast news and indexing the topics for later
retrieval. As this level of speech understanding is combined with speech
recognition, recognition performance will continue to improve and dialogs
will further simplify. We are a long way from Star Trek-style natural
conversations with our computers, but we're not too far away from an
entirely new user interface for our telephone system.
Brough Turner is senior vice president of technology at Natural
MicroSystems, a leading provider of hardware and software technologies for
developers of high-value telecommunications solutions. For more
information, call Natural MicroSystems at 508-620-9300, or visit the
company's Web site at nmss.com. E-mail to the author is also welcome.
[ return
to the July 2000 table of contents ] |