Many people seem excited about the innovative
services made possible by the move from circuit to VoIP. But when you ask
for specific examples, they go quiet. That's because services you can
deploy today on VoIP, you can also deploy on circuit. Even so, popular
wisdom is not entirely wrong. After growing untidily for more than a hundred
years, circuit networks are a tangle of protocols, standards, and
proprietary implementations presided over by incumbents with little
motivation to change. The move to packets for transport flattens out
everything, lowers technical barriers to entry, and promises new services.
Whatever new services evolve as voice moves from circuit to packet, it is
a safe bet that many of them will depend on speech technologies. Automatic
speech recognition (ASR) and text-to-speech (TTS) have slowly and steadily
improved so that they are now usable by "normal" people (translation: by
me). Of course, to be impressive, ASR-based systems still have to be
carefully designed, then laboriously tested and tweaked. But the best are
really good. For example, the interactive voice response (IVR) application
that asks for my American Airlines frequent flier number is more accurate
than a human ever was. (And it understands a New York accent.) Even better,
the number automatically goes to the agent who handles the call, so the
agent doesn't have to ask for it again. That we find this impressive shows
how pathetically low our expectations of IVR really are. How could any
competent system designer automatically capture an account number, then
routinely require the caller to speak it again to the agent, as they do
at numerous credit card companies?
While this magazine is typically about telephony, I know you won't be
surprised to hear that voice isn't the only kind of data that can be
transported by IP. The Internet is transforming the way companies do
business. E-Commerce systems are based on a constellation of technologies
including application server platforms like Java 2 Enterprise Edition
(J2EE), Web servers, and markup languages like HyperText Markup Language
(HTML) and eXtensible Markup Language (XML).
There's an old saying that when all you have is a hammer, everything
looks like a nail. Similarly, when the e-Commerce contingent started
planning to voice-enable their systems, they thought in terms of markup
languages for implementation. There have been several such initiatives, but
the first to catch on was Voice Extensible Markup Language (VoiceXML). Then,
just a few months ago, a new contender emerged: Speech Application Language
Tags (SALT).
Considering that they use similar technology to achieve similar ends,
VoiceXML and SALT are surprisingly different in their philosophy, their
application, and their intellectual property environment.
Both are markup languages. SALT consists of extensions to HTML, while
VoiceXML is a complete markup language in its own right.
Both are designed to speech-enable Web applications, starting with the
notion of a Web browser. A regular Web browser uses a screen to render
output and a keyboard and mouse to collect input. Exactly the same basic
model could use TTS to speak the Web page through a loudspeaker or telephone
instead of rendering it to a screen. Similarly, input could be collected by
ASR or touchtone instead of a keyboard and mouse. This is the basic idea of
VoiceXML and half the SALT story.
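
To make the browser analogy concrete, here is a minimal Python sketch of one page description driven through two front ends. Nothing in it comes from the VoiceXML or SALT specifications: `Field`, `run_page`, and the fake input and output helpers are hypothetical names invented for illustration, and the "devices" are stand-ins that only record what a real screen or TTS engine would present.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Field:
    name: str   # key under which the answer is stored
    label: str  # text shown on screen or spoken to the caller


def run_page(fields: List[Field],
             render: Callable[[str], None],
             collect: Callable[[Field], str]) -> Dict[str, str]:
    """Walk the page once: present each field, then collect its value."""
    answers: Dict[str, str] = {}
    for field in fields:
        render(field.label)
        answers[field.name] = collect(field)
    return answers


# Hypothetical stand-ins for real devices. A visual browser would draw to a
# screen and read the keyboard; a voice browser would call a TTS engine and
# an ASR or touchtone recognizer.
def make_fake_output(log: List[str], prefix: str) -> Callable[[str], None]:
    return lambda text: log.append(f"{prefix}: {text}")


def make_fake_input(canned: Dict[str, str]) -> Callable[[Field], str]:
    return lambda field: canned[field.name]


page = [Field("origin", "Departing from"), Field("destination", "Arriving at")]
canned_answers = {"origin": "SFO", "destination": "JFK"}

screen_log: List[str] = []
typed = run_page(page, make_fake_output(screen_log, "SCREEN"),
                 make_fake_input(canned_answers))

phone_log: List[str] = []
spoken = run_page(page, make_fake_output(phone_log, "TTS"),
                  make_fake_input(canned_answers))
```

The point of the sketch is that `run_page` never changes. Only the functions plugged in for output and input differ, which is the sense in which a voice browser is "exactly the same basic model" as a visual one.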
VoiceXML, in existence longer than SALT, is on version 2.0. It has
roughly 80 tags compared to SALT's seven. This radical size difference
comes from a fundamental difference in philosophy. While SALT is an add-on
to HTML, VoiceXML is a complete markup language in its own right. As a
stand-alone language, it has to supply its own
procedural flow of control. VoiceXML defines tags like "if" and "goto."
Purists see this as a weakness of VoiceXML, since a page markup language
theoretically has no business performing the role of a procedural scripting
language.
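
For readers who have not seen flow control expressed as markup, the following Python sketch builds a small VoiceXML-style dialog with the standard library's `xml.etree.ElementTree` and serializes it. The element names (`vxml`, `form`, `field`, `prompt`, `filled`, `if`, `else`, `goto`) are VoiceXML 2.0 tags, but the dialog itself is an assumption made up for illustration: the seven-digit account rule, the form ids, and the prompts are hypothetical, the namespace declaration is omitted, and the output has not been validated against the specification or run on a voice browser.

```python
import xml.etree.ElementTree as ET


def build_dialog() -> ET.Element:
    """Assemble a two-form dialog whose branching lives in the markup."""
    vxml = ET.Element("vxml", version="2.0")

    # First form: ask for an account number (hypothetical example).
    form = ET.SubElement(vxml, "form", id="ask_number")
    field = ET.SubElement(form, "field", name="account", type="digits")
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Please say your account number."

    # Once the field is filled, the page itself decides where to go next.
    filled = ET.SubElement(field, "filled")
    branch = ET.SubElement(filled, "if", cond="account.length == 7")
    ET.SubElement(branch, "goto", next="#confirm")
    ET.SubElement(branch, "else")
    ET.SubElement(branch, "goto", next="#ask_number")

    # Second form: a simple acknowledgement.
    confirm = ET.SubElement(vxml, "form", id="confirm")
    block = ET.SubElement(confirm, "block")
    block.text = "Thank you."
    return vxml


document = build_dialog()
markup = ET.tostring(document, encoding="unicode")
print(markup)
```

The branch and the two jumps sit inside the page description rather than in a separate script, which is exactly what the purists object to.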
Beyond brevity and elegance, SALT reaps another major benefit from
tagging on to HTML rather than standing alone: It can add speech to regular
HTML Web pages. This feature, termed "multi-modality," is not currently
available in VoiceXML. This means that SALT is applicable where VoiceXML
isn't. For example, imagine using a personal digital assistant (PDA) to
book a flight on a Web site like Travelocity. First, you tap the stylus on
the "departing from" field, then you use a pop-up keyboard or Graffiti
to scratch in a city name. But stylus input is a drag. It would be much
easier to simply speak "San Francisco" into the PDA microphone and see
"SFO" magically appear on the screen. This is what SALT does.
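
The PDA scenario boils down to one form field with two ways in. The Python sketch below models that idea only. It does not use SALT's tags or any real recognizer: `FormField`, `on_typed`, `on_spoken`, and the tiny `AIRPORT_CODES` table (standing in for a speech grammar) are all hypothetical names chosen for illustration.

```python
from typing import Dict, Optional

# Hypothetical city-to-code table standing in for a speech grammar.
AIRPORT_CODES: Dict[str, str] = {
    "san francisco": "SFO",
    "new york": "JFK",
}


class FormField:
    """A single on-screen field that accepts stylus or spoken input."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.value: Optional[str] = None

    def on_typed(self, text: str) -> None:
        # Stylus or keyboard path: take the text as entered.
        self.value = text.strip().upper()

    def on_spoken(self, utterance: str) -> bool:
        # Speech path: accept only utterances the "grammar" knows about.
        code = AIRPORT_CODES.get(utterance.strip().lower())
        if code is None:
            return False  # not recognized; leave the field unchanged
        self.value = code
        return True


departing = FormField("departing_from")
recognized = departing.on_spoken("San Francisco")
```

After the spoken input is accepted, `departing.value` holds the same kind of airport code that the stylus path would have produced, so the rest of the page does not need to know which mode the user chose.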
The SALT forum was founded by Microsoft, Intel, Cisco, Comverse, Philips,
and SpeechWorks. Because Microsoft is an active participant, it's not hard
to envision SALT being integrated into Visual Studio, FrontPage, Internet
Information Server, and Internet Explorer. If this happens, adding speech
support to a Web page will become a simple matter of setting an attribute on
a field in FrontPage. That's exciting.
So does SALT supersede VoiceXML? No. VoiceXML is incumbent, promulgated
by the W3C, and supported by dozens of companies including Intel, IBM,
Motorola, SpeechWorks, and Nuance. It is widely deployed and familiar to
thousands of programmers. There are dozens of implementations and a sizable
body of plug-in modules ("speech objects" or "dialog modules") that
acquire common data types like driver's license numbers, phone numbers,
social security numbers, account numbers, and so on. More importantly, the
development environment is very approachable. For example, Tellme offers a
simple over-the-Web way for anybody to create a sample VoiceXML application
and deploy a demo right on Tellme's servers (http://studio.tellme.com).
So we should not consider an alternative to VoiceXML as the beginning of
a new standards war, but as evidence of the coming together of speech,
telephony, and the Web: a fertile field ready to yield an abundant
harvest.
Jim Machi is director, product management for the Network Processing
Division of the Intel Communications Group. Intel, the world's largest
chip maker, is also a leading manufacturer of computer, networking, and
communications products. For more information, visit www.intel.com.
To The July 2002 Table Of Contents ]