You've probably been reading lately about voice
processing and future technologies in the IP
environment. One technology making especially strong
gains is speech recognition. But why is speech becoming
so important in IP telephony, and why now?
There are several key, interrelated factors driving
the growth of speech in IP. First, enterprises are
finding that deploying speech-enabled IP telephony
applications over virtual private networks (VPNs) can
reduce their costs and provide a better business model.
At the same time, demand is rising for enhanced
services. Mobile users expect the same universal access
other users have to those services, and speech plays a
key role in delivering it. Finally, standards (and
standards-based markup languages like VoiceXML) are maturing and
becoming widely deployed. It all adds up to explosive
growth for speech technologies.
Back To Basics
At the heart of today's speech-IP deployments are basic
business principles: cementing customer loyalty,
expanding services, and reducing costs.
One way many companies are looking to meet these
goals is by leveraging a service-provider-managed VPN to
reduce costs and expand the range of services they can
offer their customers.
In a traditional enterprise call center environment,
callers use a circuit connection through the PSTN,
interacting with a speech-enabled interactive voice
response (IVR) system. The company pays the 800 charges
and the telephony hardware resides at the company site,
where it is managed and maintained. Today many companies
are looking for less expensive alternatives to this
One alternative is outsourcing speech applications.
Many hosting or application network-based companies
offer their enterprise customers services that let an
end user simply dial a local 800 number, which is
automatically translated to the local point-of-presence (POP)
server of the network service provider -- incurring no
long-distance charges. The call's Dialed Number Identification
Service (DNIS) information is used to determine the client;
speech engines then fetch the desired
information. All the telephony hardware, speech
interpreters, and engines are at the edge of the network
-- owned and managed by the network service provider,
which can lower costs for its customers by using the
same infrastructure to support many clients. For service
providers, speech hosting is another service to add to
their hosting portfolios. The services these companies
provide empower enterprises to reduce their PSTN
hardware requirements -- lowering costs and driving
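
The DNIS lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not any provider's actual implementation: the phone numbers, client names, URLs, and the HostedApp, normalize, and route_call names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HostedApp:
    """One enterprise client's speech application on the shared platform."""
    client: str
    start_url: str  # hypothetical location of the application's first dialog


# Hypothetical routing table: dialed number (DNIS) -> hosted application.
# Every number and URL here is made up for illustration.
DNIS_TABLE = {
    "8005550100": HostedApp("Example Airline", "http://apps.example.com/airline/start.vxml"),
    "8005550101": HostedApp("Example Bank", "http://apps.example.com/bank/start.vxml"),
}


def normalize(dnis: str) -> str:
    """Keep digits only and drop a leading country code of 1."""
    digits = "".join(ch for ch in dnis if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def route_call(dnis: str) -> Optional[HostedApp]:
    """Return the hosted application for the dialed number, or None if unknown."""
    return DNIS_TABLE.get(normalize(dnis))
```

Because the table is keyed on the dialed number, one set of telephony hardware and speech engines can serve many clients: route_call("1-800-555-0100") selects the first application, while an unlisted number returns None and can be sent to a default greeting.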
Another key reason for the speech explosion is the
explosion in mobile devices. Mobile users expect the
same enhanced services and easy information access their
office-bound colleagues receive over the Web. The IP
environment allows companies to deliver information to
their customers and prospects based on their location
and preferences. Speech-enabled applications give mobile
users easy voice access to any kind of IP-based
Today's mobile devices range from pure personal
assistants to traditional cell phones. Connectivity
ranges from pure telephony to pure wireless. Speech
recognition is a key input method for all these devices.
Already speech is playing a key role in services like
voice portals, unified messaging, and network-based
personal assistants. Other emerging applications will
also find speech the best way to present information.
One fast-growing class of applications, for instance,
uses location-based services to push information to mobile
users. If you travel to a certain city
three or four times a year, you might want to receive
local information updates over your mobile device. The
easiest way to interact with the application is with
simple spoken commands.
Growth Through VoiceXML
Standards play a key role in the growth of any
technology, and for voice-based services the most
exciting new standard is Voice eXtensible Markup
Language (VoiceXML). The explosive growth of the
Internet was triggered by the acceptance of the HTML
markup standard, which allowed everyone worldwide to
access a common Web structure. Today, the VoiceXML
standard is poised to drive explosive growth for
speech-based services by making them faster and easier
to develop, deploy, and configure.
VoiceXML was designed for building audio dialogues
that feature synthesized speech output (text-to-speech
or TTS), digitized audio output, the recognition of
spoken and keyed (touchtone, or DTMF) input, the recording of
spoken input, and telephony features such as call transfer
and disconnect.
The most important design goal of VoiceXML is to bring
the advantages of Web-based development and content
delivery to IVR applications.
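
To make the Web-style development model concrete, here is a minimal sketch of a VoiceXML dialog, embedded in a short Python script that parses it the way any XML document can be parsed. The dialog uses VoiceXML 2.0 element names; the grammar file cities.grxml, the portal URL, and the summarize helper are assumptions made up for this illustration, not part of any real service.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace

# A hypothetical weather dialog: prompt the caller, recognize a city name,
# then submit the result to a Web server, much as an HTML form would.
WEATHER_DIALOG = """<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="weather">
    <field name="city">
      <prompt>Which city would you like the weather for?</prompt>
      <grammar src="cities.grxml" type="application/srgs+xml"/>
      <noinput>Sorry, I did not hear you. <reprompt/></noinput>
      <nomatch>Sorry, I did not understand. <reprompt/></nomatch>
    </field>
    <block>
      <submit next="http://portal.example.com/weather" namelist="city"/>
    </block>
  </form>
</vxml>
"""


def summarize(vxml_text: str) -> dict:
    """List the fields, prompts, and submit targets found in a VoiceXML document."""
    root = ET.fromstring(vxml_text)
    ns = {"v": VXML_NS}
    return {
        "fields": [f.get("name") for f in root.iterfind(".//v:field", ns)],
        "prompts": [
            " ".join("".join(p.itertext()).split())
            for p in root.iterfind(".//v:prompt", ns)
        ],
        "submits": [s.get("next") for s in root.iterfind(".//v:submit", ns)],
    }


if __name__ == "__main__":
    print(summarize(WEATHER_DIALOG))
```

The dialog itself is just a document fetched from a Web server, so a developer can generate, validate, and update it with ordinary Web tools while the voice platform handles recognition and playback.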
One exciting use of this new language is in voice
portal services, which let callers use spoken commands
to access and retrieve Web content like weather and
traffic information, stock quotes, and sports scores.
The kind of information voice portals offer is limited
only by the provider's imagination and the interests of
VoiceXML is also used to provide access to virtual
personal assistants and Web-based unified messaging
applications. Callers can hear their voice mail and even
have their e-mails and faxes read to them over the phone
-- all without keying in a single letter or number
except the basic access phone number.
The bottom line? The transition to a speech-enabled
IP environment is on. And there is no doubt that speech
will play a key role in the next-generation network.
Jim Machi is Director, Product Management for the Intel
Telecommunication and Embedded Group. The Intel
Telecommunication and Embedded Group develops advanced
communications technologies and products that merge data
and voice technologies into a single network.