
Industry Insight
July 2002

Jim Machi

Season with SALT, Sprinkle with VoiceXML


Many people seem excited about the innovative services made possible by the move from circuit to VoIP. But when you ask for specific examples, they go quiet. That's because services you can deploy today on VoIP, you can also deploy on circuit. Even so, popular wisdom is not entirely wrong. After growing untidily for more than a hundred years, circuit networks are a tangle of protocols, standards, and proprietary implementations presided over by incumbents with little motivation to change. The move to packets for transport flattens out everything, lowers technical barriers to entry, and promises new services.

Whatever new services evolve as voice moves from circuit to packet, it is a safe bet that many of them will depend on speech technologies. Automatic speech recognition (ASR) and text-to-speech (TTS) have slowly and steadily improved so that they are now usable by "normal" people (translation: by me). Of course, to be impressive, ASR-based systems still have to be carefully designed, then laboriously tested and tweaked. But the best are really good. For example, the interactive voice response (IVR) application that asks for my American Airlines frequent flier number is more accurate than a human ever was. (And it understands a New York accent.) Even better, the number automatically goes to the agent who handles the call, so the agent doesn't have to ask for it again. That we find this impressive shows how pathetically low our expectations of IVR really are. How could any competent system designer automatically capture an account number, then routinely require the caller to speak it again to the agent, as they do at numerous credit card companies?

While this magazine is typically about telephony, I know you won't be surprised to hear that voice isn't the only kind of data that can be transported by IP. The Internet is transforming the way companies do business. E-Commerce systems are based on a constellation of technologies including application server platforms like Java 2 Enterprise Edition (J2EE), Web servers, and markup languages like Hyper Text Markup Language (HTML) and eXtensible Markup Language (XML).

There's an old saying that when all you have is a hammer, everything looks like a nail. Similarly, when the e-Commerce contingent started planning to voice-enable their systems, they thought in terms of markup languages for implementation. There have been several such initiatives, but the first to catch on was Voice Extensible Markup Language (VoiceXML). Then, just a few months ago, a new contender emerged: Speech Application Language Tags (SALT).

Considering that they use similar technology to achieve similar ends, VoiceXML and SALT are surprisingly different in their philosophy, their application, and their intellectual property environment.

Both are markup languages. SALT consists of extensions to HTML, while VoiceXML is a complete markup language in its own right.

Both are designed to speech-enable Web applications, starting with the notion of a Web browser. A regular Web browser uses a screen to render output and a keyboard and mouse to collect input. Exactly the same basic model could use TTS to speak the Web page through a loudspeaker or telephone instead of rendering it to a screen. Similarly, input could be collected by ASR or touchtone instead of a keyboard and mouse. This is the basic idea of VoiceXML and half the SALT story.
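The browser model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Field`, `run_page`, and `make_voice_channel` are hypothetical names, and the "TTS" and "ASR" functions are stand-ins that record text and return canned replies rather than touching any real speech engine. The point is that the page logic stays the same while the output and input channels are swapped.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Field:
    name: str    # key under which the answer is stored
    label: str   # text shown on screen or spoken to the caller


def run_page(fields: List[Field],
             render: Callable[[str], None],
             collect: Callable[[Field], str]) -> Dict[str, str]:
    """Walk the page's fields: present each label, then gather the reply."""
    answers: Dict[str, str] = {}
    for field in fields:
        render(field.label)
        answers[field.name] = collect(field)
    return answers


def make_voice_channel(canned_replies: Dict[str, str]) -> Tuple[
        Callable[[str], None], Callable[[Field], str], List[str]]:
    """Hypothetical stand-ins for TTS and ASR engines; no audio is involved."""
    spoken: List[str] = []

    def tts(text: str) -> None:
        spoken.append(text)  # a real engine would synthesize audio here

    def asr(field: Field) -> str:
        return canned_replies[field.name]  # a real engine would recognize speech

    return tts, asr, spoken


# The same page definition serves both the visual and the voice browser.
PAGE = [Field("origin", "Departing from?"), Field("dest", "Arriving at?")]

tts, asr, spoken = make_voice_channel(
    {"origin": "San Francisco", "dest": "New York"})
voice_answers = run_page(PAGE, tts, asr)
```

A screen-and-keyboard version would call the same function as `run_page(PAGE, print, lambda field: input())`, which is the sense in which the voice browser is "exactly the same basic model."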

VoiceXML, in existence longer than SALT, is on version 2.0. It has roughly 80 tags compared to SALT's seven. This radical size difference comes from a fundamental difference in philosophy. As an add-on, SALT can lean on HTML for everything that is not speech, while VoiceXML, as a complete markup language, has to define it all itself.

Like HTML, SALT uses ECMAScript (also known as JavaScript) to program procedural flow of control. VoiceXML defines tags like "if" and "goto." Purists see this as a weakness of VoiceXML, since a page markup language theoretically has no business performing the role of a procedural scripting language.
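To see what the purists are objecting to, it helps to look at control flow written as markup. The sketch below embeds a small VoiceXML 2.0 fragment in Python and lists the control-flow tags it contains. The form names, the eight-digit account rule, and the helper `control_flow_tags` are assumptions made up for illustration; only the element names and the namespace come from VoiceXML itself.

```python
import xml.etree.ElementTree as ET
from typing import List

VXML_NS = "http://www.w3.org/2001/vxml"
CONTROL_FLOW = {"if", "elseif", "else", "goto"}

# Illustrative page: the branch and the jump are expressed as tags,
# where a SALT page would leave them to ECMAScript.
PAGE = """
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="lookup">
    <field name="account" type="digits">
      <prompt>Please say your account number.</prompt>
      <filled>
        <if cond="account.length == 8">
          <goto next="#confirm"/>
        <else/>
          <clear namelist="account"/>
        </if>
      </filled>
    </field>
  </form>
  <form id="confirm">
    <block><prompt>Thank you.</prompt></block>
  </form>
</vxml>
"""


def control_flow_tags(page: str) -> List[str]:
    """Return the procedural tags found in the page, in document order."""
    root = ET.fromstring(page)
    prefix = "{" + VXML_NS + "}"
    found: List[str] = []
    for element in root.iter():
        # ElementTree reports namespaced tags as "{namespace}localname".
        tag = element.tag
        local = tag[len(prefix):] if tag.startswith(prefix) else tag
        if local in CONTROL_FLOW:
            found.append(local)
    return found


tags = control_flow_tags(PAGE)
```

Here `tags` comes back as `["if", "goto", "else"]`: three pieces of program logic living inside what is nominally a page description.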

Beyond brevity and elegance, SALT reaps another major benefit from tagging on to HTML rather than standing alone: It can add speech to regular HTML Web pages. This feature, termed "multi-modality," is not currently available in VoiceXML. This means that SALT is applicable where VoiceXML isn't. For example, imagine using a personal digital assistant (PDA) to book a flight on a Web site like Travelocity. First, you tap the stylus on the "departing from" field, then you use a pop-up keyboard or Graffiti to scratch in a city name. But stylus input is a drag. It would be much easier to simply speak "San Francisco" into the PDA microphone and see "SFO" magically appear on the screen. This is what SALT does.
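The PDA scenario can be mimicked in Python to show the idea of multi-modality: one form, filled partly by stylus and partly by voice. This is a sketch, not SALT itself. The grammar table, the field names, and the functions `recognize` and `fill_by_voice` are all hypothetical, and the "recognizer" is just a dictionary lookup on text standing in for real speech recognition.

```python
from typing import Dict, Optional

# Hypothetical grammar: spoken city names mapped to the value the form wants.
CITY_GRAMMAR: Dict[str, str] = {
    "san francisco": "SFO",
    "austin": "AUS",
}


def recognize(utterance: str, grammar: Dict[str, str]) -> Optional[str]:
    """Stand-in for ASR: match the utterance against the grammar."""
    return grammar.get(utterance.strip().lower())


def fill_by_voice(form: Dict[str, str], field_name: str,
                  utterance: str, grammar: Dict[str, str]) -> bool:
    """Bind a recognized value to a visual form field; report success."""
    value = recognize(utterance, grammar)
    if value is None:
        return False  # nothing matched, so the field is left untouched
    form[field_name] = value
    return True


# One form, two input modes.
form = {"departing_from": "", "arriving_at": ""}
form["arriving_at"] = "AUS"  # entered with the stylus
accepted = fill_by_voice(form, "departing_from", "San Francisco", CITY_GRAMMAR)
rejected = fill_by_voice(form, "departing_from", "Atlantis", CITY_GRAMMAR)
```

After this runs, the form holds "SFO" and "AUS" side by side, and the unrecognized city leaves the field as it was. The screen never goes away; speech is simply one more way to fill it in.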

The SALT Forum was founded by Microsoft, Intel, Cisco, Comverse, Philips, and SpeechWorks. Because Microsoft is an active participant, it's not hard to envision SALT being integrated into Visual Studio, FrontPage, Internet Information Server, and Internet Explorer. If this happens, adding speech support to a Web page will become a simple matter of setting an attribute on a field in FrontPage. That's exciting.

So does SALT supersede VoiceXML? No. VoiceXML is incumbent, promulgated by the W3C, and supported by dozens of companies including Intel, IBM, Motorola, SpeechWorks, and Nuance. It is widely deployed and familiar to thousands of programmers. There are dozens of implementations and a sizable body of plug-in modules ("speech objects" or "dialog modules") that acquire common data types like driver's license numbers, phone numbers, social security numbers, account numbers, and so on. More importantly, the development environment is very approachable. For example, Tellme offers a simple over-the-Web way for anybody to create a sample VoiceXML application and deploy a demo right on Tellme's servers (http://studio.tellme.com).

So we should not consider an alternative to VoiceXML as the beginning of a new standards war, but as evidence of the coming together of speech, telephony, and the Web: a fertile field ready to yield an abundant harvest.

Jim Machi is director, product management for the Network Processing Division of the Intel Communications Group. Intel, the world's largest chip maker, is also a leading manufacturer of computer, networking, and communications products. For more information, visit www.intel.com.
