TMCnet - World's Largest Communications and Technology Community




On The Horizon
July 2000


Voice Portals: Building A Network You Can Talk To

By Brough Turner


Who says voice portals are hot? Venture capitalists, for one. Voice portals are toll-free numbers that provide voice access to Internet information. Yes, superficially this sounds like interactive voice response (IVR) warmed over, but it's a lot more, which is why it's so interesting to VCs as well as to developers and service providers. Voice portals give an early view of the ultimate user interface for the telephone network. Unlike earlier touch-tone-based IVR technology, voice portals use the latest speech processing technologies, both automatic speech recognition (ASR) and text-to-speech (TTS), to provide a more natural user interface. It's still not a human conversation. On the other hand, voice portals give access to Internet information when you're without a computer or an Internet connection, so a little awkwardness is acceptable. In fact, if you are mobile and need information or directions, it can be a lot less awkward than other solutions. So where are voice portals headed, and what do they mean for the next round of developments in mobile communications? Well, let's start where it all began: speech.

ASR technology has been around for decades, gradually getting better as algorithms, and more significantly computer technology, have improved. Early ASR had very limited vocabularies and accuracy. But even 20 years ago it was useful for some command and control applications, especially if the operator was working in a dark room (i.e., couldn't see a keyboard) or if the operator's hands were busy.

For example, one early application had operators place parcel post packages on a conveyor belt while speaking their ZIP codes, so a downstream conveyor system could route the packages to the correct truck. This application worked for three reasons. Only one speaker was active at a time, so the system could be trained for that voice alone. The operators spoke words from a very limited vocabulary: the ten digits, with variants like "zero" and "oh." And the operators' hands were full, so they were motivated to work with a system even though it required them to speak carefully and distinctly.
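With a vocabulary of only the ten digits plus a variant like "oh," turning recognizer output into a routing key reduces to a table lookup. The sketch below assumes a recognizer that already emits one word per spoken digit; the recognizer itself is not shown. The table and the function name `words_to_zip` are illustrative and are not taken from any real system.

```python
# Illustrative word-to-digit table: the ten digits plus the "oh" variant for zero.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def words_to_zip(words):
    """Map recognized digit words to a five-digit ZIP code.

    Returns None if any word is outside the vocabulary or if there are not
    exactly five digits, so the operator can be asked to repeat.
    """
    digits = []
    for word in words:
        digit = DIGIT_WORDS.get(word.lower())
        if digit is None:
            return None  # out-of-vocabulary word: reject rather than guess
        digits.append(digit)
    if len(digits) != 5:
        return None  # wrong length for a ZIP code
    return "".join(digits)


print(words_to_zip(["oh", "six", "four", "eight", "four"]))  # 06484
```

Rejecting anything outside the vocabulary, rather than guessing, is what makes a tiny vocabulary reliable: every accepted result is a well-formed ZIP code.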

With advances in computer technology, speech recognition has come a long way. Today's systems can be speaker-independent. And vocabularies of hundreds or thousands of words are routinely available. In fact, dictation systems are available that can handle vocabularies of several hundred thousand words (if trained for a single user).

TTS technology has likewise made great strides. Early systems were notorious for sounding like a drunken Swedish robot (at least to my American-English-tuned ears). But with the advent of very large memories and increasing computer power, TTS quality has improved to the point where some versions sound almost human.

The first commercial attempts at a new user interface for telecom began four years ago. Leveraging advances in speech technology, companies such as Wildfire and Webley introduced over-the-telephone automated personal assistants that act as a front end to unified messaging services and provide a personal phone book/auto-dialer, personal calendar, etc. Even though these services are convenient for some users, they haven't taken off like wildfire.

Integration with corporate voice mail systems is limited, and it turns out the user interface is not quite convenient enough for everyone. You have to say precisely what you want, using a limited vocabulary, and often giving multiple commands. DTMF shortcuts are often faster. In addition, because your personal assistant attempts to capture information about incoming calls, your callers end up in a more protracted dialog than they would have if they simply had reached an answering machine or a conventional voice mail system. Clearly, there are interface issues yet to be resolved.

However, these personal telecommunications assistant applications have been in use successfully for several years now. The companies providing them have not gone under; they are gaining ground. But their services haven't become ubiquitous either.

The next dramatic step in voice user interface development has emerged in recent months: voice portals that provide telephone access to Internet data. Companies such as Tellme, BeVocal, Quack.com, and Audiopoint are providing access to Web information -- stock quotes, sports results, weather, directions, nearby restaurants, flight information -- without the benefit of a traditional browser. Typically, access is via an 800 number and spoken commands.

Besides access to Web information, typical portals offer voice-commerce transactions, service personalization, location-specific targeted advertising, secure voiceprint verification, and general information delivery via voice, WAP, fax, e-mail, and text paging. The big payoff is in widespread consumer adoption, but many companies are also targeting third-party developers and corporate voice portals that handle internal employee communications or external customer service. Think of this as next-generation IVR. Investment money is pouring into such companies, each with a slightly different spin on the ultimate voice portal.

While it's too early to pick the winners, the winning criteria are clear: usability, usability, usability. The challenge is to design the dialog -- the user interface -- so it's easy for the casual user, while allowing experts to immediately access the information they want. The winning portal will start with an excellent user interface, provide ways to personalize it, and then continue to improve it day-by-day, week-by-week. The best speech recognition and text-to-speech technology and the fastest response and highest availability will help. But these are just contributors to usability.
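One way a single dialog can serve both casual and expert callers is to let a caller say the topic and its details in one utterance, and to fall back to a follow-up prompt when the details are missing. The sketch below is a toy version of that idea. The menu topics, the prompts, and the `handle_utterance` function are hypothetical, and the code works on already-recognized text rather than on audio.

```python
# Hypothetical top-level topics, each mapped to its follow-up prompt.
MENU = {"stocks": "Which company?", "weather": "Which city?", "sports": "Which team?"}


def handle_utterance(utterance):
    """Return the portal's next prompt or action for one top-level utterance.

    Experts can say the topic and its argument together ("weather boston");
    casual users can say just the topic and are prompted for the rest.
    """
    words = utterance.lower().split()
    if not words or words[0] not in MENU:
        # Unrecognized input: list the available topics.
        return "You can say: " + ", ".join(MENU) + "."
    topic, rest = words[0], words[1:]
    if rest:
        # Expert path: the argument was supplied, so skip the follow-up prompt.
        return f"LOOKUP {topic}: {' '.join(rest)}"
    # Casual path: ask for the missing detail.
    return MENU[topic]
```

For example, `handle_utterance("weather boston")` goes straight to a lookup, while `handle_utterance("weather")` returns the prompt "Which city?".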

System availability will need to mimic that of the current telephone system, i.e., it always works. Service platforms will need to scale to T1, T3, or higher-capacity trunk access; support SS7 signaling and high-performance echo cancellation; and handle simultaneous input and output of speech (so callers can barge in while prompts are playing). The highest-volume systems will also need to be paired with streaming media converters so callers can listen to live sports, Internet radio stations, or prerecorded Internet content.
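Barge-in means listening while talking. The platform keeps playing the prompt, checks each incoming frame of caller audio, and stops as soon as the caller starts speaking. The sketch below shows that control loop, using a crude energy threshold as a stand-in for real voice activity detection. It makes several assumptions. The audio is 16-bit PCM. Echo cancellation has already removed the prompt's own echo from the input; without it, the prompt itself could trigger a false barge-in. The threshold value and the `play` callback are illustrative.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame of 16-bit PCM samples."""
    return sum(s * s for s in samples) / len(samples)


def play_with_barge_in(prompt_frames, input_frames, play, threshold=1_000_000):
    """Play prompt frames until the caller starts speaking.

    prompt_frames and input_frames are equal-length lists of frames (lists of
    ints); input_frames are assumed to be echo-cancelled already. play is a
    callback that sends one frame to the caller. The default threshold is an
    illustrative value (an RMS level of 1000 on a 16-bit scale). Returns the
    number of prompt frames played before barge-in, or len(prompt_frames) if
    the caller never spoke.
    """
    for i, (out_frame, in_frame) in enumerate(zip(prompt_frames, input_frames)):
        # Check the caller's audio for this frame period before playing more prompt.
        if frame_energy(in_frame) > threshold:
            return i  # barge-in: stop the prompt and hand input to the recognizer
        play(out_frame)
    return len(prompt_frames)
```

With real audio a fixed energy threshold is fragile, and it appears here only to make the control flow visible. The point is that output and input are handled in the same loop, which is why the platform must support simultaneous speech input and output.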

All of this technology is available today on open telecommunications platforms. Early systems are using PCI computers, but a robust deployment configuration will use a CompactPCI chassis loaded with hot-swappable trunk interfaces as the telecom front end, connected by redundant Ethernet networks to a conventional distributed computer system that hosts the speech recognizers, databases, and Internet access engine.
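With redundant Ethernet between the telecom front end and the back-end computers, failover can be as simple as trying the same back-end host over each network in turn. The sketch below uses Python's standard `socket.create_connection`. The addresses and the `connect_redundant` helper are hypothetical. A production system would also monitor both links continuously, rather than discovering a failure only when a request is made.

```python
import socket


def connect_redundant(addresses, timeout=0.5, connect=socket.create_connection):
    """Return a connection to the first reachable address, trying each in order.

    addresses lists the same back-end host as seen over each Ethernet segment,
    e.g. [("10.0.1.20", 5000), ("10.0.2.20", 5000)] (hypothetical addresses).
    connect is injectable so the failover logic can be tested without a network.
    """
    last_error = None
    for address in addresses:
        try:
            return connect(address, timeout=timeout)
        except OSError as err:
            last_error = err  # this segment or interface is down: try the next one
    raise ConnectionError(f"no back-end reachable: {last_error}")
```

Keeping the retry order fixed (primary segment first) makes the system's behavior predictable. Spreading load across both segments would be a separate design decision.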

Customization will be critical to allow repeat users to streamline their use of the system. Access to customization can be over the telephone, via WAP, or, more easily, through a normal Web interface from a large-screen home or office PC.

As with conventional portals like Yahoo, "free" service actually means it is advertiser-supported. Striking a delicate balance between pleasant service and revenue generation is critical. An example is free phone calls to listed advertisers. If the user identifies a restaurant of interest, the voice portal will directly connect him/her to the restaurant by placing an outgoing call and bridging the connection to the new call. The restaurant pays for the referral.
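The referral flow described above amounts to four steps: look up the advertiser, place an outgoing call, bridge it to the caller's existing call, and record the referral for billing. The sketch assumes a hypothetical telephony object with `place_call` and `bridge` methods, which is not any specific vendor's API. It also assumes the advertiser is billed only when the outgoing call is actually answered.

```python
from dataclasses import dataclass, field


@dataclass
class ReferralLedger:
    """Hypothetical record of referrals to bill to each advertiser."""
    counts: dict = field(default_factory=dict)

    def record(self, advertiser):
        self.counts[advertiser] = self.counts.get(advertiser, 0) + 1


def connect_caller_to_advertiser(caller_leg, advertiser, directory, telephony, ledger):
    """Place an outgoing call to the advertiser and bridge it to the caller's call.

    directory maps advertiser names to phone numbers. telephony is any object
    with place_call(number) -> leg (or None on failure) and bridge(a, b);
    both interfaces are hypothetical. Returns True if the caller was connected.
    """
    number = directory.get(advertiser)
    if number is None:
        return False  # not a listed advertiser: no free call offered
    outbound_leg = telephony.place_call(number)
    if outbound_leg is None:
        return False  # busy or no answer
    telephony.bridge(caller_leg, outbound_leg)
    ledger.record(advertiser)  # assumption: bill only answered referrals
    return True
```

Recording the referral only after the bridge succeeds keeps billing tied to calls that actually connected, which matters for keeping advertisers' trust.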

There will be many opportunities for differentiation in the voice portal market, and the vendor whose service is easiest to use is going to win. So, if personal communications assistants have been adopted slowly, why should voice portals take off rapidly? There are several reasons. The speech technologies have gotten better over time. Internet adoption has been phenomenal. And mobile telephone use has soared.

With widespread Internet adoption and the more recent emergence of always-on broadband connections, people are beginning to rely on easy access to Web information. Separately, a large segment of the population has become accustomed to keeping in constant contact via mobile phones, pagers, and other forms of wireless communication. Taken together, these phenomena have prompted the development of WAP. But WAP-enabled handsets are limited by their small screens and relatively narrow bandwidth connections. Third generation (3G) wireless promises to improve bandwidth, but it will take years to become widely deployed.

An especially interesting opportunity, then, is to combine voice portal technology with WAP-enabled handsets in a multimodal solution. It's been well established that combining voice and pointing improves user performance at computer tasks. Likewise, visual output combined with speech and other sound is much more compelling than either alone. Studies have documented drops in input error rate of as much as 50 percent when using multimodal input.

But more compelling is an across-the-board user preference for multimodal systems. A good combination of voice portal and WAP technologies will improve service for mobile users. Multimedia is coming to your phone -- even if it only has a 9.6 Kbps link and a tiny display.

To move beyond mobile users and begin to provide multimodal access for desktop users may take another round of improvements in speech technologies, but this is coming. Academic research is focused on computer understanding, if not of all natural language, at least of certain topics of discussion. One remarkable demonstration of this technology involves a computer listening to broadcast news and indexing the topics for later retrieval. As this level of speech understanding is combined with speech recognition, recognition performance will continue to improve and dialogs will further simplify. We are a long way from Star Trek-style natural conversations with our computers, but we're not too far away from an entirely new user interface for our telephone system.

Brough Turner is senior vice president of technology at Natural MicroSystems, a leading provider of hardware and software technologies for developers of high-value telecommunications solutions. For more information, call Natural MicroSystems at 508-620-9300, or visit the company's Web site at nmss.com. E-mail to the author (addressed to brough_turner@nmss.com) is also welcome.


Technology Marketing Corporation

2 Trap Falls Road Suite 106, Shelton, CT 06484 USA
Ph: +1-203-852-6800, 800-243-6002
