TMCnet - World's Largest Communications and Technology Community
February 2000

 

Paying Lip Service To Speech Rec

BY CAROL DRZEWIANOWSKI


How many of you remember the scene from the movie LA Story, in which Steve Martin hooks up his new telephone that features dial-by-voice? In case you've forgotten, or haven't seen the movie, Steve's character picks up the handset and says, "Call Mom." He hears the other phone ringing, but is surprised when he reaches not his mother but a pizza place. Obviously, speech rec applications have improved dramatically since this film was made. (Or maybe the mix-up was because Steve Martin did not speak clearly enough!) But where will this technology take us in the next months and years of the new century?

I thought that there was no better way to glimpse the future than to pry inside the minds of just a few industry leaders in this area. In the past few years, we've all witnessed advancements in speech recognition technology. IVR has made it easier and faster for us to get information over the phone. Personal assistants "listen" to our commands and read our e-mail to us over the phone. And lately we've even seen voice-enabled Web browsing. But what other kinds of changes will speech recognition technology make in the way that people and machines communicate?

CH-CH-CH-CHANGES
According to Joe Yaworski, vice president and general manager of Unisys Natural Language Understanding Business Initiative, speech will soon be everywhere -- from your PC, to your VCR, and even your toaster. "Eventually speech will be one of the modes of interfacing and controlling all different types of devices and services," he said. "Not necessarily the exclusive mode, but certainly one of the preferred modes. Speech will be used where it can simplify and improve the interface with a device, or where hands-free operation is needed, or where access to the device can be provided over the phone."

One problem with speech recognition products is that people want to communicate with machines naturally, without having to be computer experts. "Eventually it will become easier to pick up the phone or talk to your computer in a natural way to retrieve the information you want or to perform tasks," said Heather Howland, marketing director of Phonetic Systems.

"People want to be able to get information quickly and easily," she said. "Too often people sit on hold for long periods of time just to get a simple answer. Or they will navigate through mazes on a Web site or software application to perform a simple task. Within the next few years, speech recognition will be much more commonplace. While it is already starting to infiltrate the market, you will find more and more applications, both on the e-commerce side and the telephony side, that will be driven with speech recognition."

ADVANTAGES
There are plenty of advantages to using speech applications. Steve Gladstone, vice president and general manager of Hammer Technologies, cites portability and accessibility as two main benefits. "When I think of my 78-year-old mother, she hates both computers and cell phones," he said. "Speech recognition will allow her to talk to all sorts of applications that were previously unavailable to her."

Steve Ehrlich, vice president of marketing at Nuance, said, "For the consumers, it's better and more convenient service and access to information that might have once only been available over the Web. For enterprises, it comes down to the ability to provide better service, often at a fraction of the cost of live operators. For dot-com companies, speech offers a way to broaden the reach of their products, without building and staffing a large call center. In many cases, speech also provides a competitive advantage."

HURDLING THE OBSTACLES
Yet, despite the advantages, speech recognition is still in its early stages of development. Not all of the technology issues have been resolved. Bill Ledingham, vice president of product development for SpeechWorks International, said, "The technology is here and has proven itself in a number of large-scale customer-facing deployments at mainstream companies over the past several years. What SpeechWorks is working on now is making it less expensive to deploy (through increased processor performance and advanced application tools), and adding capabilities to handle even more complex, natural language dialogs. What we are also doing now is marrying the Web and the telephone through products like SpeechSite. This proves that the technology is sophisticated enough to conduct major transactions and handle multiple requests from many different callers just as you find on mainstream e-commerce sites."

There is always room for improvement, and speech recognition technology still has a few hurdles to leap. According to Ehrlich, "From an end-user standpoint, there are a number of people who have never used a speech system and don't really trust that it works. As the number of deployed applications grows, this will cease to be a problem. From an enterprise standpoint, the biggest problem is the lack of skilled speech developers. Reusable components (such as Nuance SpeechObjects), which encapsulate the voice interface, the hardest part of the design and development process, should help to lower this obstacle."

Yaworski also cites developers as an obstacle: "If we have to rely only on the developers employed by the speech recognizer vendors, then the speech market will never grow. The Unisys NLSA toolkit is intended to make speech so easy that any reasonably good applications developer can now build a workable speech application."

Howland, however, sees that there are different bars to lower, including people's expectations. "Many expect the futuristic vision of being able to have free-form conversations with machines," she said. "But in reality, the technology just isn't there yet. Right now the applications are more structured than most people want them to be."

According to Howland, obstacles have been fostered and set by the industry, as well. "There is too much talk about the underlying technology, and not enough about using the technologies with applications to make them successful. Unfortunately, there are many speech vendors in the industry that continue to dash the hopes of their customer base with unrealized expectations. This becomes an impediment to the growth of an industry. Vendors need to be more open about what they can actually deliver, and not what they want to be able to deliver. Customers need to be careful, and should do their homework before committing to a vendor. Asking the right questions can mean the difference between getting a product that solves a problem, versus a product that creates one."

WHERE DO WE GO FROM HERE?
It seems that the Internet is the hotbed from which some new speech rec seedlings will begin sprouting. Yaworski said, "The Internet is simply exploding, with the volume of business being conducted online growing by leaps and bounds. Yet, even here in the United States, two-thirds of the people do not have access to an Internet browser. Even those who do have a browser only have access when sitting at their desk. There are ten times more phones in the world than browsers. Using speech technology to voice-enable Web sites so that they can be accessed by telephone would immediately expand the market for Web-based commerce."

Ehrlich agrees. "The next few years will see the birth of the voice Web and that will fuel broad adoption of the technology by carriers and voice portals around the world," he said.

According to Ledingham, "We are seeing recent enormous demand from the dot-com companies and mainstream companies that recognize the need to expand their e-business channel strategies. Speech recognition gives these companies a way to reach customers who might not have access to the Internet, while at the same time, offering those customers the same self-service options available on the Web."

Moreover, Ledingham believes that dot-com companies will also need to differentiate themselves in a very crowded market, and they will turn to speech systems to accomplish that goal.

CONCLUSION
Certainly, a lot is going on in the area of speech recognition. Other companies like Lernout & Hauspie, Parlance, Edify, Philips Speech Processing, Dragon Systems, and Vodavi-CT (to name but a few) are helping to advance speech technology even further.

And if the industry trend toward speech-enabling Web sites continues, it could change not only the reach of the Web, but also how the telephone is used. The marriage of the two may seem unlikely, but it's happening, and as speech technology and the Web grow (and grow up) together, it's very likely that the two will reside happily together.


Look Ma, No Hands!

BY MIA CARLEY

I suppose I'm a bit fidgety during the workday because I like to move around in my chair and stretch my arms and back on a regular basis. But, if the truth be told, it's not because I'm inattentive, it's because my arms and wrists get sore from typing all day. Yet, as a Web editor, how else can I use the Internet without putting the same strain on my body?

With this in mind, I was intrigued when I noticed a unified messaging site announcing speech recognition software for its services. I noticed this the same day I read about a new audio service provider. Imagine being liberated from your keyboard or computer by the power of your voice!

Speech recognition is not a new technology. It's used widely in the enterprise environment, often as IVR (interactive voice response). So, whether you're calling a company looking for a friend's extension, calling information, or making your way through a customer service line, speech recognition is alive and well. Its application to the Internet, however, is what's bringing this technology to the forefront once again.

BROWSE BY VOICE
California-based CoolSpeak.com is just one company that is providing free, integrated, global communications services or Web-based unified messaging. Users can access their messages from their desktop, telephone, or browser. Its main features include real-time voice chat (PC-to-PC only), real-time text chat, e-mail, voice mail, e-mail forwarding, community bulletin board, instant notification, and multiple greetings. It is your standard Web-based unified messaging site.

However, it's their newest addition, CoolSpeak Interactive Voice Assistant (IVA), that sets them apart. CoolSpeak IVA is a speech recognition plug-in designed by Wizzard Software, specifically for CoolSpeak.com subscribers. Thanks to a combination of IBM's ViaVoice speech recognition, Lernout & Hauspie's text-to-speech engine, Microsoft Agent avatars, and a portable headset from Andrea Electronics, IVA users can navigate through CoolSpeak's services simply by voicing simple commands like "Go to inbox" or "Read new messages." It is important to note that IVA's language requirement is English. They do plan on developing foreign language additions, but the date of their deployment is not known.
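For readers curious how spoken commands such as "Go to inbox" might be turned into actions once a recognizer has produced text, here is a minimal Python sketch. It is not CoolSpeak's or Wizzard's implementation; the function names, the reply strings, and the idea of an exact-match phrase table are all assumptions made for illustration, and the speech recognizer itself is left out.

```python
from typing import Callable, Dict


def normalize(utterance: str) -> str:
    """Lowercase, trim surrounding spaces and end punctuation, collapse whitespace."""
    return " ".join(utterance.lower().strip(" .!?").split())


def build_dispatcher(commands: Dict[str, Callable[[], str]]) -> Callable[[str], str]:
    """Return a function that maps recognized text to a hypothetical action."""
    table = {normalize(phrase): action for phrase, action in commands.items()}

    def dispatch(utterance: str) -> str:
        action = table.get(normalize(utterance))
        if action is None:
            # Structured applications only accept phrases they know about.
            return "Sorry, I did not understand that."
        return action()

    return dispatch


# Hypothetical actions; a real plug-in would drive the messaging service here.
dispatch = build_dispatcher({
    "Go to inbox": lambda: "Opening inbox",
    "Read new messages": lambda: "Reading new messages",
})
```

A fixed phrase table like this is deliberately rigid, which matches the structured, command-style applications described in this issue rather than free-form conversation.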

Unlike the rest of their services, CoolSpeak IVA is not free. It's currently available for an introductory price of $75.00 to any member of CoolSpeak. Purchase the software online from Wizzard at www.wizzardsoftware.com. Ah, the sweet sound of my voice and the silence of the keyboard!

AUDIO SERVICE PROVIDERS
For those looking for a broader level of service, InternetSpeech.com offers netECHO for full phone-based Internet access. Subscribers will be able to navigate their way through the Internet by simply picking up the phone, wireless or not. Subscribers dial a toll-free number and have immediate access to the Web. netECHO offers all the services you would expect from your traditional service provider, such as:

  • Surfing the entire Internet (sites do not have to be speech enabled).
  • E-mail accounts with the ability to send voice mail attachments in .WAV format.
  • Trading stocks or other commercial transactions.
  • Voice mail.

While some sites have voice-enabled themselves using VXML (see Mark Robins' September 1999 Internet Telephony article), InternetSpeech.com understands this is a new technology and has yet to be applied universally. Instead, netECHO's technology resides on the InternetSpeech.com server, thus enabling their subscribers to surf the entire Internet.
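To make the server-side idea concrete, the following Python sketch shows one way a server could reduce an ordinary Web page to plain text that a text-to-speech engine could then read over the phone. This is only an illustration under stated assumptions: it is not netECHO's actual method, the class and function names are hypothetical, and the text-to-speech and telephony steps are omitted.

```python
from html.parser import HTMLParser


class SpeakableText(HTMLParser):
    """Collect visible text from a page, ignoring script and style content."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = " ".join(data.split())  # collapse runs of whitespace
            if text:
                self.parts.append(text)


def page_to_speech_text(html: str) -> str:
    """Return the page's visible text as one string (hypothetical helper)."""
    parser = SpeakableText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Because the conversion happens on the server, the Web site itself needs no changes, which is the point the paragraph above makes about sites not having to be speech enabled.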

netECHO is English only as well, but the company does have plans to develop foreign language versions. Their launch is due by the end of the first quarter of 2000.

I SAY, "SURF THE WEB!"
Someday, people (not only Web editors!) who experience the same eye, back, and wrist strain will be able to do all their Internet work and research with their own voice. Until then, keep an eye out for companies like CoolSpeak.com and InternetSpeech.com, which are leading the way in the application of speech recognition on the Web.

Opening up the Internet to phone-based users will expand its application to those who are vision impaired, without a computer, or limited to a mobile phone. On the other hand, computer-based users will be able to sit back, relax, and let their voice do the surfing. From either view, it's a win-win situation.


Getting Started With Speech Rec

In order to help you sift through the many companies offering different variations of speech technology, here's a small sampling of companies to get you started:

Dragon Systems: Dragon NaturallySpeaking, Dragon NaturallySpeaking Developer Suite
Dragon NaturallySpeaking transcribes speech immediately, so that it appears as text on the screen and in reports, letters, e-mail messages, chat rooms, and instant messaging windows. It allows you to format and edit documents by voice, navigate the Internet, and navigate and control the desktop by speaking drop-down menu commands. Dragon NaturallySpeaking Developer Suite provides a complete solution for creating high-performance applications benefiting from the advanced features of NaturallySpeaking technology. Included in the Developer Suite are the latest versions of the Dragon NaturallySpeaking SDK, the Dragon NaturalVoc Tool, and other components.

Edify: Electronic Workforce
The Electronic Workforce (EWF) is designed to quickly develop and deploy scalable Internet and voice e-commerce applications. EWF comprises a multi-channel server (which supports Web, IVR/speech recognition, and other channels) together with enterprise connectivity, development tools, and operational management tools. Edify supports both "small vocabulary" and "large vocabulary" speech recognition applications.

Lernout & Hauspie: L&H Voice Xpress
L&H Voice Xpress Advanced uses the voice technology available to turn your computer into a powerful voice-driven tool that can recognize not just what you want to type, but what you want to do. Various flavors of Voice Xpress (from Professional to Standard) are offered.

Nuance: Nuance 6, Nuance Express, Nuance Verifier, Nuance Developers Toolkit, SpeechObjects
Nuance 6 employs distributed client/server architecture to achieve scalability in large call centers for even complex systems. Java and ActiveX APIs enable developers to create applications using the powerful languages and tools with which they are already familiar. Nuance Express includes ready-to-use packaged grammars to shorten the development cycle. Nuance Verifier creates a voiceprint for users who enroll their voice in a simple one-time interaction. Nuance Developers Toolkit provides a solution for all phases of speech development: speech design, rapid prototyping, development, customization, and tuning. SpeechObjects is a set of reusable components that incorporate design and development standards. SpeechObjects does not require developers to have specific speech or linguistics experience, so it decreases the level of expertise needed for developing high-quality speech applications.

Philips Speech Processing: SpeechPearl, SpeechMania, SpeechWave, Voice ReQuest
This family of products consists of the core engine for host-based, large vocabulary natural language recognition and additional modules like the SpeechPath resource manager. SpeechMania's mixed-initiative dialog gives the caller the impression of a truly natural dialogue and makes SpeechMania act as a full replacement of the human operator. The SpeechWave product group is made to be embedded on DSP-based telephony cards, and supports relatively small vocabularies in both whole-word and phonetic modes. Pure ReQuest! automatically routes an organization's incoming or internal telephone calls by enabling callers to simply say the name of the person or department desired.

Phonetic Systems: PhoneticOperator
Delivering 24/7 speech-enabled auto-attendant, call routing, and information retrieval, PhoneticOperator gives corporations, call centers, and telcos accurate speech recognition that provides live operator-class performance.

SpeechWorks: SpeechSite, SpeechWorks 6
SpeechSite directs calls, delivers company information, provides fax back services, supports e-commerce transactions and more, helping to bring the Web model of self-service to your telephone. SpeechWorks 6 is a solution that makes it easy to design and deploy automated speech recognition applications. SpeechWorks 6 includes the SMARTRecognizer core recognition engine, DialogModule building blocks, and Natural Language and Tuning Tools for rapid application development.

Unisys: NLSA
NLSA (Natural Language Speech Assistant) equips developers with the tools for writing speech-enabled applications without learning the details of programming speech recognizers. It migrates easily to different speech recognizers, and it can be used to enhance current IVR applications or to develop new, more capable applications by capitalizing on the speech technology available today.

Vodavi-CT: PathFinder, Expresso!
PathFinder is a Windows NT-based communications server that takes advantage of the latest in voice processing technology. Custom IVR solutions are available for PathFinder to expand your company's horizons. Expresso! is a self-contained voice processing system that delivers large capability in a compact and economical package. It scales from two to eight ports, with a minimum of 35 hours of storage.

