Imagine you are a customer and you have just completed
a call with your insurance company. For the first time
you can recall, you hang up smiling. You have received
all the requested information about your policy
quickly, without spending an eternity on hold or
having to bear those long, annoying touch-tone menus --
and the call was completed without ever talking to a
live agent.
Does this sound like science fiction? Welcome to
the new millennium, where more of the classic
touch-tone interactive voice response (IVR) systems
that have dominated the enterprise for 30 or more
years are being given major face lifts. Or, more to
the point, a new set of vocal cords.
Voice recognition technology has come of age and,
when integrated into business-critical systems such as
IVR and office automation systems, it can provide a
new level of service at a surprisingly reasonable
cost. Replacing basic hierarchical dual-tone
multifrequency (DTMF) menus and enabling complex
dialogs that employ natural language understanding,
automatic speech recognition (ASR) is finding its way
into enterprises conducting e-business and into
carriers deploying voice-activated dialing and
automated directory assistance.
Thanks to language modeling, sophisticated grammars
and accuracy tuning tools, speaker-independent ASR
engines can attain an accuracy rate of 97 percent or
better, rivaling that of a live agent. Combined with
natural language understanding, this allows callers to
navigate an application without having to follow a
strict menu structure common in a typical IVR system.
For instance, a caller who wants to transfer $100
between his or her bank accounts need not listen to a
series of prompts such as "for transfer, press one"
and "for checking account, press two."
All that is required is a verbal caller request
such as, "I'd like to transfer $100 from my savings
account to my checking account, please." The
application responds by prompting the caller to
articulate the account number and, upon validation,
handles the transaction appropriately.
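The transfer request above can be sketched in a few lines of Python. This is only an illustration of turning recognized text into an intent with its details, assuming the ASR engine has already produced the sentence as a string. The pattern, the function name parse_transfer and the field names are hypothetical; a production system would rely on tuned grammars or language models rather than a single regular expression.

```python
import re
from typing import Optional

# Hypothetical pattern for one kind of request. A real application would use
# a tuned grammar or language model, not a single regular expression.
TRANSFER_PATTERN = re.compile(
    r"transfer\s+\$?(?P<amount>\d+(?:\.\d{1,2})?)"
    r"(?:\s+dollars)?"
    r"\s+from\s+(?:my\s+)?(?P<source>\w+)(?:\s+account)?"
    r"\s+to\s+(?:my\s+)?(?P<target>\w+)(?:\s+account)?",
    re.IGNORECASE,
)


def parse_transfer(utterance: str) -> Optional[dict]:
    """Return the intent and its details for a transfer request, or None."""
    match = TRANSFER_PATTERN.search(utterance)
    if match is None:
        return None  # the caller asked for something else
    return {
        "intent": "transfer",
        "amount": float(match.group("amount")),
        "source": match.group("source").lower(),
        "target": match.group("target").lower(),
    }


request = parse_transfer(
    "I'd like to transfer $100 from my savings account "
    "to my checking account, please."
)
```

Here the caller supplies the amount and both accounts in one sentence, so the application has nothing left to ask except the account number.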
ASR is only part of this remarkable achievement.
Text-to-speech (TTS), a technology that converts plain
ASCII text into synthesized speech, now produces
quality very close to the natural human voice. This
human-like automated response allows callers to listen
and understand with ease rather than struggle through
the tedious, monotone sound that has been the hallmark
of TTS for 35 or more years. TTS technology is capable
of simulating actual speech while maintaining the
appropriate prosody, speed, voice inflections and
other characteristics that are important to human
communication.
While businesses experiment with voice technology,
the industry itself has accelerated the development
efforts of linguists, dialog designers and speech
technology engineers to create a broader selection of
vastly improved products. To aid this development
effort, tools are available that range from grammar
and vocabulary to call flow and dialog design,
enabling the creation of extensive, complex and
accurate applications. Additionally, speaker
verification, which is a biometric technology,
provides unsurpassed security over the telephone when
coupled with traditional passwords or account numbers.
In this global business environment, supporting
multiple languages is vital. Currently, most of the
languages spoken in North and South America, Western
and Eastern Europe and Asia are supported by
speech recognition, while languages for developing
markets such as India, the Middle East and Africa are
either available or in development.
VoiceXML: The Emerging Standard
Voice technology is bridging yet another gap to access
the enormous expanse of information contained on the
Web. VoiceXML, a markup language from the same
family as HTML, allows voice applications to be served
up to speech browsers in the same way that HTML pages
are served up to traditional Web browsers. The
similarity of VoiceXML to HTML makes it easy for
developers to create voice applications that
leverage the existing Web infrastructure, letting
companies extend existing investments by adding voice
access to information on the Web.
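As a rough illustration of how a Web server might hand a dialog to a speech browser, the Python sketch below builds a one-question VoiceXML document. The element names (vxml, form, field, prompt, option, filled, submit) follow the W3C VoiceXML 2.0 specification. The form itself, the field name account_type, the function name and the URL are hypothetical.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_balance_dialog(submit_url: str) -> str:
    """Build a one-field VoiceXML form and return it as a string."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "balance"})

    # A field collects one spoken answer from the caller.
    field = ET.SubElement(form, "field", {"name": "account_type"})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Which account would you like, savings or checking?"

    # Each option is a phrase the caller may say to fill the field.
    for choice in ("savings", "checking"):
        ET.SubElement(field, "option").text = choice

    # Once the field is filled, send the answer back to the Web server.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(
        filled, "submit", {"next": submit_url, "namelist": "account_type"}
    )
    return ET.tostring(vxml, encoding="unicode")


document = build_balance_dialog("http://example.com/balance")
```

The resulting text can be returned from an ordinary Web server, much as an HTML page would be.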
VoiceXML is a particularly compelling emerging
technology, and it represents the first potential
standard for voice applications.
Originating with the VoiceXML Forum, a consortium of
companies that includes Motorola, IBM, Lucent and
AT&T, the VoiceXML standard is now the
responsibility of the World Wide Web Consortium (W3C),
an organization with a long record of establishing
technological standards. The involvement of the W3C,
coupled with widespread developer acceptance of the
VoiceXML specification, will promote interoperability
among diverse voice applications and businesses around
the globe.
Connecting The Voice Application
Voice recognition over the telephone has created some
challenges for telephony equipment vendors. Often when
a caller's voice is transmitted over the traditional
public switched telephone networks (PSTN) to an ASR
engine, it can be garbled and difficult to understand.
Additionally, satellite repeaters and other telephone
equipment can introduce echo, static or noise, which
negatively affect the accuracy rate of the speech
recognition engine.
Voice over IP (VoIP) technology has become
prevalent in large enterprises and, when introduced,
can cause packet loss, latency and jitter that degrade
the voice sample. To counter these negative effects,
equipment vendors are designing telephone network
interfaces that provide superior echo cancellation,
noise filters, jitter buffers and caching to improve
voice quality and deliver excellent speech
recognition. The accuracy of the ASR, of course,
remains the major factor in user adoption of voice
applications.
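To make the idea of a jitter buffer concrete, here is a minimal Python sketch. It assumes each voice packet carries a sequence number, holds a few packets before releasing any, and plays them back in order. The class name, method names and the depth of three packets are illustrative assumptions, not any vendor's design. It does not attempt loss concealment or timing, both of which a real network interface would need.

```python
import heapq
from typing import List, Optional, Tuple

Packet = Tuple[int, bytes]  # (sequence number, audio payload)


class JitterBuffer:
    """Hold a few packets so they can be played in sequence order."""

    def __init__(self, depth: int = 3) -> None:
        self.depth = depth  # packets to hold before releasing any
        self._heap: List[Packet] = []
        self._last_played: Optional[int] = None

    def push(self, seq: int, payload: bytes) -> None:
        """Accept a packet unless its playout slot has already passed."""
        if self._last_played is not None and seq <= self._last_played:
            return  # too late to be useful, so drop it
        heapq.heappush(self._heap, (seq, payload))

    def pop(self) -> Optional[Packet]:
        """Release the earliest packet once enough are buffered."""
        if len(self._heap) < self.depth:
            return None  # still filling the buffer
        packet = heapq.heappop(self._heap)
        self._last_played = packet[0]
        return packet

    def drain(self) -> List[Packet]:
        """Release everything left, in order, at the end of the call."""
        remaining = []
        while self._heap:
            packet = heapq.heappop(self._heap)
            self._last_played = packet[0]
            remaining.append(packet)
        return remaining
```

A deeper buffer tolerates more reordering but adds delay before the ASR engine hears the caller, which is the trade-off equipment vendors must tune.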
The Session Initiation Protocol (SIP) standard is
also emerging throughout VoIP networks to handle call
control in a distributed network. Easier to use and
more flexible than other protocols, such as H.323 and
Megaco, SIP is becoming the preferred protocol for
voice application developers. SIP has won widespread
adoption by high-profile organizations such as
Microsoft, which has integrated the SIP standard into
its latest operating system.
Adoption Is The Key
Voice recognition over the telephone has reached a
critical mass, demonstrated by the response of
businesses to dramatically increased end-user
adoption. For service providers, voice applications
provide a competitive differentiation that can drive
revenue. Additional benefits to wireless carriers
include promotion of the safe use of cell phones while
driving and the increase of usable "minutes" that the
carriers sell on their networks. Carriers recognize
that speech technology represents more than another
enhanced service. Speech recognition improves the
usability of existing services and allows for the
expansion of new, revenue-generating applications,
such as instant conferencing and instant messaging
services.
Increased employee productivity, higher customer
satisfaction, sales automation and more efficient
service centers all contribute to the bottom line of
any enterprise. Currently, there is a wide range of
revenue-generating voice applications available,
including customer relationship management (CRM)
systems, sales automation, e-business systems such as
stock trading and voice banking, and more
sophisticated IVR systems.
IVR vendors are now scrambling to integrate speech
recognition into their products, propelled largely by
competitive pressures, although this differentiation
could be short-lived once speech becomes commonplace
in the IVR. Many IVRs have reached the limit of the
functionality that DTMF allows. Fortunately, speech
can broaden the services that an IVR can provide.
Voice recognition and voice-enabled applications
have hit the mainstream, and we should expect
explosive growth in these types of services in the
near future.
Steve Parsons is director of product management
for the New Network Services division of NMS
Communications (formerly Natural MicroSystems). In
this position he is responsible for product marketing
and management of HearSay, the company's high-density
voice portal platform, integrating NMS telephony
hardware with best-of-breed speech products.
[ Return
To The October 2001 Table Of Contents ]