TMCnet

Feature Article
September 2002


SALT: The Light In Speech Mark-Up

BY ALBERT KOOIMAN & DR. KUAN SAN WANG

In October 2001, Cisco, Comverse, Intel, Microsoft, Philips, and SpeechWorks established the SALT Forum to create a standard mark-up language for multimodal and telephony speech applications. The initiative has gained momentum since its inception; in the last few months alone, 43 companies have joined the Forum. Because the SALT Forum identified the telephony market as one of its targets, the industry has speculated that the SALT (Speech Application Language Tags) specification competes with VoiceXML 2.0, which, if all goes well, will receive the World Wide Web Consortium's (W3C) Recommendation status in spring 2003. VoiceXML and the SALT specification need not be seen as competing standards; each is a viable standard in its own right that can be chosen for deployment on its own merits.

Web, What Web?
For several years now, the buzz about a voice-enabled Web has created tremendous excitement and promise in the industry. Although the voice-enabled Web is still in its infancy, momentum has been building, and hundreds of companies have embraced VoiceXML.

A key driver of the voice-enabled Web is the expectation that commerce will increasingly run on Internet standards, and that adding a voice channel to give mobile users access to that commerce will grow business. Whereas VoiceXML addresses the voice-only case, more and more people are expected to choose how they access an application on the basis of what is most convenient: keyboard and mouse, a stylus, or speech.

The SALT Forum was created on that very principle: the ability to simply add a speech access channel to an existing GUI-based Web application. The SALT specification builds on a few Internet paradigms as the cornerstone for its success:

  • Clean separation of representation, logic, and data;
  • Event-driven, object-oriented programming approach to speech interface design; and
  • Extendable XML model.

Moreover, the SALT specification allows for the integration of different modalities into one unified execution model, so developers do not need to learn a different language for each modality.

Multimodal is Different
Applications based on the SALT specification can either use the speech channel alone and be accessed only by voice, or they can be multimodal and give visual feedback. The user can choose to speak, click a mouse, or enter text via a keyboard, a stylus, or whatever other means are available. SALT-based applications can run on a server in the network, on a handheld device, or on a mix of the two, and the SALT specification defines profiles for each of these deployments. One key element in this is that SALT is event-driven. In the multimodal world, you never know what will happen next: where the user will click on the page, which field will be filled first, whether more than one piece of information will arrive at once, and so on. This is similar to the telephony world, where most events are asynchronous as well.
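
To make the event-driven idea concrete, here is a minimal sketch in Python rather than in SALT mark-up. It is not taken from the SALT specification: the `Form` class, its `on_input` handler, and the field names are all hypothetical, chosen only to show a form that accepts input from any modality, in any order, with one or several values per event.

```python
from typing import Dict, List, Optional, Tuple


class Form:
    """Hypothetical form whose fields may be filled in any order, by any modality."""

    def __init__(self, field_names: List[str]) -> None:
        self.values: Dict[str, Optional[str]] = {name: None for name in field_names}
        self.log: List[Tuple[str, str]] = []  # (modality, field) in arrival order

    def on_input(self, modality: str, updates: Dict[str, str]) -> None:
        """Event handler: a single event may carry one or several field values."""
        for name, value in updates.items():
            if name in self.values:  # ignore values for fields the form does not have
                self.values[name] = value
                self.log.append((modality, name))

    def missing(self) -> List[str]:
        """Fields that still need a value, in declaration order."""
        return [name for name, value in self.values.items() if value is None]


form = Form(["origin", "destination", "date"])

# One spoken utterance fills two fields at once.
form.on_input("speech", {"origin": "Boston", "destination": "Seattle"})
after_speech = form.missing()

# The remaining field arrives later through a different modality.
form.on_input("click", {"date": "2002-09-15"})
after_click = form.missing()
```

Nothing in the handler dictates which field comes first or which modality supplies it; the application simply reacts to whatever event arrives next.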

Stairway To Heaven
This is all possible because SALT concentrates on the basics: providing a doorway for input and output to Web applications, and giving the programmer fine control over implementing the user interface that makes the most sense for the application by using a scripting language like ECMAScript (a.k.a. JavaScript). The SALT specification introduces an event-driven, object-oriented programming model to speech interface design. This programming model has proven to be a powerful and flexible paradigm that meets the most demanding requirements in creating sophisticated user interfaces, most notably for GUI and Web applications. It is also already familiar to the developer community at large, so software engineers will be able to apply their experience and best practices directly to developing applications with the SALT specification. By following widely used programming models and using existing programming languages, the SALT Forum believes that speech programming will become much more mainstream.

The SALT object model is fairly simple, with only a few objects; the main ones, "listen" and "prompt," handle speech input and output processing respectively. Underneath this simple cover, however, lies the rich functionality the SALT specification has to offer. The key to making the SALT specification versatile, yet simple, is to separate data and operations in the most logical way. Object models often become unnecessarily complicated because too many function calls must be used to manage complex data structures. The SALT specification's design avoids this problem by using XML to represent complicated data. As a result, SALT objects only need a few methods, such as starting and stopping audio streams, and a few events reporting synthesis progress and recognition results.
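
The division of labor described here (a handful of methods and events on the objects, with the complicated data carried as XML) can be illustrated with a toy model. The Python sketch below is an assumption-laden stand-in, not the SALT object model itself: the `Prompt` and `Listen` classes, the `simulate_result` helper, and the shape of the result document are all hypothetical, and no audio is actually played or recognized.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional


class Prompt:
    """Toy speech-output object: start/stop plus a completion event."""

    def __init__(self, text: str,
                 on_complete: Optional[Callable[["Prompt"], None]] = None) -> None:
        self.text = text
        self.on_complete = on_complete
        self.playing = False

    def start(self) -> None:
        self.playing = True
        # A real engine would synthesize audio here; this sketch finishes at once.
        self.stop()
        if self.on_complete:
            self.on_complete(self)

    def stop(self) -> None:
        self.playing = False


class Listen:
    """Toy speech-input object: start/stop plus a recognition event."""

    def __init__(self, on_reco: Callable[[ET.Element], None]) -> None:
        self.on_reco = on_reco
        self.active = False

    def start(self) -> None:
        self.active = True

    def stop(self) -> None:
        self.active = False

    def simulate_result(self, xml_text: str) -> None:
        """Hypothetical stand-in for a recognizer delivering an XML result."""
        if not self.active:
            return  # results are ignored unless listening has been started
        self.stop()
        self.on_reco(ET.fromstring(xml_text))


captured: Dict[str, Optional[str]] = {}


def handle_reco(result: ET.Element) -> None:
    # The structured data lives in the XML document, not in extra method calls.
    captured["city"] = result.findtext("city")


listen = Listen(on_reco=handle_reco)
prompt = Prompt("Which city?", on_complete=lambda p: listen.start())

prompt.start()  # when the prompt completes, listening begins
listen.simulate_result("<result><city>Austin</city></result>")
```

Each object exposes only `start` and `stop` and raises a single event; everything about the recognition result is read from the XML document handed to the event handler.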

Who Cares Who Is Talking?
The SALT specification is designed to be extensible, and it defines standard ways to add functionality not covered by its current 1.0 version. For example, the SALT specification incorporates a mechanism to extend the "listen" object for speaker identification and verification. The specification also allows interoperability with a wide range of input/output devices, such as Instant Messaging, Internet chatting, VoIP or general telephony, global positioning systems for location-aware applications, and text telephones (TTY) or Braille devices for users with hearing or visual impairments. In addition to physical I/O devices, the same mechanism also makes the SALT specification Web Services ready, in the sense that SALT-based documents can have simple and secure links to the Web Services available on the Internet. These extensions also enable SALT to be easily interfaced with legacy infrastructures so that existing investments are preserved. In other words, SALT extension standards simply take advantage of XML and realize its benefits to the fullest. The SALT specification empowers developers to introduce extensions, while using XML to ensure that these extensions do not sacrifice application portability and interoperability.

A <Programmer> Knows Brackets
The strength of the SALT specification is that it concentrates on the basics and does not invent a new programming language with a new browser. VoiceXML has become a standalone programming language with a lot of brackets, one that combines procedural and declarative approaches and has several limitations in this respect. However, VoiceXML has created something very useful: a universal interactive voice response (IVR) scripting language that can be run on many IVR systems. This language addresses customer demand for interoperability, which is the prime benefit standards can deliver.

VoiceXML, like IVR, is based on fixed, menu-driven turns with a synchronous execution model. It uses menus or forms, which are processed by a Form Interpretation Algorithm (FIA) that synchronizes speech input and output. This FIA may actually get in the developer's way. The synchronous execution model makes it difficult to integrate asynchronous modalities, or to use VoiceXML together with such technologies. In addition, the FIA makes the browser so "heavy" that it has to reside in the network. These are major issues that will be discussed by the W3C Voice Browser group in the framework of VoiceXML 3.0. Combining VoiceXML with other modalities, such as handling speech and point-and-click simultaneously, will raise technological issues that are already solved in the execution environments of SALT.
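
The synchronous, turn-by-turn character of such a model can be seen in a deliberately simplified Python sketch. This is not the FIA as defined in the VoiceXML specification, which handles considerably more (guard conditions, form-level grammars, event handling); the `run_fia_like` function and its select/collect/process comments are a hypothetical reduction meant only to show the blocking loop.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def run_fia_like(fields: List[str],
                 answers: Iterable[str]) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Simplified synchronous loop: one prompt, one answer, one field per turn."""
    values: Dict[str, Optional[str]] = {name: None for name in fields}
    transcript: List[str] = []
    answer_stream = iter(answers)

    while True:
        # Select: the first field, in document order, that is still unfilled.
        pending = next((name for name in fields if values[name] is None), None)
        if pending is None:
            break  # every field is filled, so the form is complete

        # Collect: play the prompt, then block until the caller answers.
        transcript.append(f"prompt:{pending}")
        try:
            answer = next(answer_stream)
        except StopIteration:
            break  # the caller hung up or gave no further input

        # Process: store the answer, then go around the loop again.
        values[pending] = answer

    return values, transcript


values, transcript = run_fia_like(["origin", "destination"], ["Boston", "Seattle"])
```

In this sketch the loop, not the user, decides which field is handled next, and nothing else can happen while it waits for an answer; that is the property that makes asynchronous input such as a mouse click awkward to fit in.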

Voice Browsing Standards
There is a perception in the industry that a "standards war" is going on between SALT and the upcoming VoiceXML 2.0 specification, which is currently approaching Candidate Recommendation status at the W3C. In this context, it is worth looking at how much the SALT specification uses the work done by the W3C thus far. What many people do not realize is that the Voice Browser Group of the W3C, in addition to creating VoiceXML, has developed other standard specifications that are used by SALT, including the Speech Recognition Grammar Specification and the Speech Synthesis Markup Language, as well as mark-up languages for call control, semantic interpretation, and so on.

In the SALT Forum, interoperability is a design principle of great value. Therefore, it is recommended that all SALT-based browsers use these W3C standards as a common denominator. This leverages the W3C Voice Browser Group's work within the SALT specification.

All On The Same Page?
The founders of the SALT Forum all have a vested interest in promoting speech to continue the success of the Internet. Most of them have chosen to keep investing in VoiceXML in addition to investing in SALT. In the W3C, the first discussions on VoiceXML 3.0 have started, which offer the opportunity to take the best of the VoiceXML 2.0 and SALT specifications. In parallel, the W3C has established a multimodal working group, called Multimodal Interaction Activity. This working group has a charter for the coming two years, and will take its time to find common ground on the requirements and to develop a common standard for multimodal applications. In the meantime, the industry can use both approaches: VoiceXML when a voice-only, server-based solution is needed, or SALT when an application is based on an existing Web application and may need to offer multimodality as well.

Albert Kooiman is director of business development at Philips Speech Processing. Dr. Kuan San Wang is a researcher with Microsoft Corp. Microsoft and Philips are both founding members of the SALT Forum, which brings together a diverse group of companies sharing a common interest in developing and promoting speech technologies for multimodal and telephony applications. For more information, visit them online at www.saltforum.org.
