Internet Telephony Industry Insight

July 2002
Season with SALT, Sprinkle with VoiceXML
BY JIM MACHI

Many people seem excited about the innovative services made possible by the move from circuit to VoIP. But when you ask for specific examples, they go quiet. That's because services you can deploy today on VoIP, you can also deploy on circuit. Even so, popular wisdom is not entirely wrong. After growing untidily for more than a hundred years, circuit networks are a tangle of protocols, standards, and proprietary implementations presided over by incumbents with little motivation to change. The move to packets for transport flattens out everything, lowers technical barriers to entry, and promises new services.

Whatever new services evolve as voice moves from circuit to packet, it is a safe bet that many of them will depend on speech technologies. Automatic speech recognition (ASR) and text-to-speech (TTS) have slowly and steadily improved so that they are now usable by "normal" people (translation: by me). Of course, to be impressive, ASR-based systems still have to be carefully designed, then laboriously tested and tweaked. But the best are really good. For example, the interactive voice response (IVR) application that asks for my American Airlines frequent flier number is more accurate than a human ever was. (And it understands a New York accent.) Even better, the number automatically goes to the agent who handles the call, so the agent doesn't have to ask for it again.

That we find this impressive shows how pathetically low our expectations of IVR really are. How could any competent system designer automatically capture an account number, then routinely require the caller to speak it again to the agent, as they do at numerous credit card companies?

While this magazine is typically about telephony, I know you won't be surprised to hear that voice isn't the only kind of data that can be transported by IP. The Internet is transforming the way companies do business.
E-Commerce systems are based on a constellation of technologies including application server platforms like Java 2 Enterprise Edition (J2EE), Web servers, and markup languages like HyperText Markup Language (HTML) and eXtensible Markup Language (XML). There's an old saying that when all you have is a hammer, everything looks like a nail. Similarly, when the e-Commerce contingent started planning to voice-enable their systems, they thought in terms of markup languages for implementation. There have been several such initiatives, but the first to catch on was Voice Extensible Markup Language (VoiceXML). Then, just a few months ago, a new contender emerged: Speech Application Language Tags (SALT).

Considering that they use similar technology to achieve similar ends, VoiceXML and SALT are surprisingly different in their philosophy, their application, and their intellectual property environment. Both are markup languages. SALT consists of extensions to HTML, while VoiceXML is a complete markup language in its own right. Both are designed to speech-enable Web applications, starting with the notion of a Web browser. A regular Web browser uses a screen to render output and a keyboard and mouse to collect input. Exactly the same basic model could use TTS to speak the Web page through a loudspeaker or telephone instead of rendering it to a screen. Similarly, input could be collected by ASR or touchtone instead of a keyboard and mouse. This is the basic idea of VoiceXML and half the SALT story.

VoiceXML, in existence longer than SALT, is on version 2.0. It has roughly 80 tags compared to SALT's seven. This radical size difference comes from a fundamental difference in philosophy. While SALT is an add-on to HTML, VoiceXML is a complete markup language in its own right. Like HTML, SALT relies on ECMAScript (the standardized form of JavaScript) to program procedural flow of control.
VoiceXML, by contrast, defines its own flow-control tags like "if" and "goto." Purists see this as a weakness of VoiceXML, since a page markup language theoretically has no business performing the role of a procedural scripting language.

Beyond brevity and elegance, SALT reaps another major benefit from tagging on to HTML rather than standing alone: it can add speech to regular HTML Web pages. This feature, termed "multi-modality," is not currently available in VoiceXML. This means that SALT is applicable where VoiceXML isn't. For example, imagine using a personal digital assistant (PDA) to book a flight on a Web site like Travelocity. First, you tap the stylus on the "departing from" field, then you use a pop-up keyboard or Graffiti to scratch in a city name. But stylus input is a drag. It would be much easier to simply speak "San Francisco" into the PDA microphone and see "SFO" magically appear on the screen. This is what SALT does.

The SALT Forum was founded by Microsoft, Intel, Cisco, Comverse, Philips, and SpeechWorks. Because Microsoft is an active participant, it's not hard to envision SALT being integrated into Visual Studio, FrontPage, Internet Information Server, and Internet Explorer. If this happens, adding speech support to a Web page will become a simple matter of setting an attribute on a field in FrontPage. That's exciting.

So does SALT supersede VoiceXML? No. VoiceXML is the incumbent, promulgated by the W3C, and supported by dozens of companies including Intel, IBM, Motorola, SpeechWorks, and Nuance. It is widely deployed and familiar to thousands of programmers. There are dozens of implementations and a sizable body of plug-in modules ("speech objects" or "dialog modules") that acquire common data types like driver's license numbers, phone numbers, social security numbers, account numbers, and so on. More importantly, the development environment is very approachable.
For example, Tellme offers a simple over-the-Web way for anybody to create a sample VoiceXML application and deploy a demo right on Tellme's servers (http://studio.tellme.com).

So we should not consider an alternative to VoiceXML as the beginning of a new standards war, but as evidence of the coming together of speech, telephony, and the Web: a fertile field ready to yield an abundant harvest.

Jim Machi is director, product management for the Network Processing Division of the Intel Communications Group. Intel, the world's largest chip maker, is also a leading manufacturer of computer, networking, and communications products. For more information, visit www.intel.com.
