Feature Article
March 2003

Bringing The Benefits Of Speech To Web Interfaces


A quick look at the ideas behind SALT (Speech Application Language Tags) shows how it can streamline the development of speech interfaces. Here are SALT's three primary design principles:

• Use a Web-centric model to build speech applications in order to take advantage of application code, development tools, and developer skill sets that currently exist.

• Engineer a lightweight and flexible specification that is simple to learn and can be utilized in a wide variety of applications.

• Unite and build upon existing telephony, Web, and speech standards to simplify the development and deployment of standards-based speech solutions.

SALT is a collection of XML tags designed to hide the complexity of managing speech input and output. For example, rather than specifying all of the control structure necessary to load, configure, invoke, and manage the execution of a speech recognition engine, a developer can use a <listen> tag with subtags that specify the grammar used by the speech recognition engine. The results of the engine's processing are then bound to program variables. By hiding many of the low-level control details, SALT tags enable the developer to specify speech functions at a high level of abstraction.
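As a rough illustration of that abstraction, the Python sketch below parses a SALT-style listen element and copies a recognition result into program variables. The markup, the attribute names, the grammar file name, and the recognition result are all assumptions made up for illustration, and the apply_binds helper is hypothetical. It imitates the idea of binding results to variables and is not a SALT interpreter; the SALT specification remains the authority on exact syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical SALT-style markup: a <listen> element with a grammar subtag
# and <bind> subtags that say where recognized values should go.
SALT_SNIPPET = """
<listen id="recoTrip">
  <grammar src="cities.grxml"/>
  <bind targetelement="departure" value="//origin"/>
  <bind targetelement="arrival" value="//destination"/>
</listen>
"""

# Hypothetical recognition result, standing in for what a speech engine
# might return after the user says "from Boston to Austin".
RECO_RESULT = (
    "<result><origin>Boston</origin>"
    "<destination>Austin</destination></result>"
)


def apply_binds(listen_xml: str, result_xml: str) -> dict:
    """Copy values from a recognition result into named variables."""
    listen = ET.fromstring(listen_xml.strip())
    result = ET.fromstring(result_xml)
    variables = {}
    for bind in listen.findall("bind"):
        # ElementTree only accepts relative paths, so "//origin"
        # is rewritten as ".//origin" before searching the result.
        node = result.find("." + bind.get("value"))
        if node is not None:
            variables[bind.get("targetelement")] = node.text
    return variables


form_fields = apply_binds(SALT_SNIPPET, RECO_RESULT)
print(form_fields)
```

The application code never loads or configures an engine here; it only declares which grammar to use and where the results should land, which is the level of abstraction the tags are meant to provide.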

Today's users access information through two primary interfaces: voice and visual. For example, if you want to see your checking account balance online, you can use the Web browser on your personal computer. You issue commands using a keyboard and mouse, and the system presents information to you visually. Alternatively, if you want to check your balance by phone, you dial your bank's interactive voice response (IVR) system and interact with the system using voice, touch-tones, and prompts.

Unfortunately, these two applications were typically developed by two different sets of programmers writing in two very different languages. The applications reside on separate systems and have separate support infrastructures.

SALT helps converge these separate infrastructures by allowing different user interfaces to be developed for a common set of business logic (Figure 1). Because SALT only addresses the presentation of information, companies can take an existing Web application and use SALT to build a voice-enabled interface. Sharing business logic across multiple applications extends a company's investment in Web applications by reusing Web development tools, Web programmers, and Web infrastructure.
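The separation between shared business logic and interchangeable presentation can be sketched in a few lines of Python. Everything named here (the Account class, get_balance, the two render functions, the sample account) is hypothetical and chosen only to mirror the bank-balance example above.

```python
from dataclasses import dataclass


@dataclass
class Account:
    owner: str
    balance_cents: int


# Shared business logic: knows nothing about how the result is presented.
def get_balance(accounts: dict, account_id: str) -> int:
    return accounts[account_id].balance_cents


# Visual presentation: markup for a browser.
def render_visual(balance_cents: int) -> str:
    return f"<p>Balance: ${balance_cents / 100:.2f}</p>"


# Voice presentation: text to be spoken as a prompt.
def render_voice(balance_cents: int) -> str:
    dollars, cents = divmod(balance_cents, 100)
    return f"Your balance is {dollars} dollars and {cents} cents."


accounts = {"chk-1": Account("Pat", 123456)}
balance = get_balance(accounts, "chk-1")
print(render_visual(balance))
print(render_voice(balance))
```

Only the two render functions differ between the Web and telephone versions; the lookup is written once and reused.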

In addition, SALT enables a third kind of interface called multimodal, which combines both voice and visual interfaces. For many applications, multimodal provides a more natural, user-friendly interface. For example, a PDA could access and display a stock portfolio while the user issues buy/sell orders by speaking. Likewise, someone planning a trip could speak departure and destination cities, and view complicated scheduling options on the screen. SALT provides a seamless way to build voice and multimodal user interfaces that share common business logic.

The famous architect Mies van der Rohe, known for his highly functional but visually understated building designs where "less is more," could easily have been the inspiration for SALT.

The architects of SALT did not choose to create an entirely new standalone programming language. Rather, they built upon existing languages (and standards) by creating a set of tags that are designed to be embedded into markup languages such as XHTML or WML. By building on languages that so many programmers already know, developers can get up to speed on SALT quickly. In addition, SALT interpreters can run on a wide variety of devices, including personal computers, PDAs, and "smartphones." This increases the number of users who can access a given application.

SALT is extensible as well as flexible. It defines an element for message exchange known as Simple Messaging Extension (SMEX). These elements communicate with the external components of the SALT platform and allow SALT applications to interact with external applications that provide call control, database access, and messaging.

Voice, visual, and multimodal interfaces can present the same information to users in very different ways. An awareness of these differences can help programmers who have only worked on visual interfaces learn effective voice and multimodal techniques. Even though the language and logic of SALT will be very familiar to developers, designing an effective interface requires an understanding of the way different interfaces are affected by dialogue style and modality.

Dialogue Style

Dialogue style describes how the user interacts with the application. Dialogues can be user-directed, system-directed, or a combination of the two (known as mixed initiative). Telephony applications such as IVR tend to be system-directed. The system (i.e., application) issues commands and users respond. For example, when you call your bank to check your account balance, the system guides you through the process of collecting information from you and providing information to you. In contrast, Web applications accessed on our PCs are user-directed. Users tell the "system" (the Web application) to provide the information they need.
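The difference between dialogue styles can be made concrete with a small Python sketch. The slot names, prompts, and both functions are hypothetical; the point is only who decides what gets asked next.

```python
SLOTS = ["account", "action"]  # information the application needs

PROMPTS = {
    "account": "Which account?",
    "action": "What would you like to do?",
}


def system_directed(answers: list) -> tuple:
    """The system asks for every slot in a fixed order; the user only answers."""
    filled, asked = {}, []
    for slot, answer in zip(SLOTS, answers):
        asked.append(PROMPTS[slot])
        filled[slot] = answer
    return filled, asked


def mixed_initiative(volunteered: dict) -> tuple:
    """The user volunteers any slots up front; the system asks only for the rest."""
    filled = dict(volunteered)
    asked = [PROMPTS[slot] for slot in SLOTS if slot not in filled]
    return filled, asked


filled_a, asked_a = system_directed(["checking", "balance"])
filled_b, asked_b = mixed_initiative({"account": "checking"})
print(asked_a)
print(asked_b)
```

In the system-directed call both prompts are played in order, while in the mixed-initiative call the system skips the question the user has already answered.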


Modality

Modality describes the kind of interaction that takes place between users and the system. For example, when calling an IVR system, you are prompted for input, you respond, the system responds to you, etc. When you call and ask for your last ten deposits, the system "reads" them to you one at a time (i.e., serially).

Visual interfaces also employ a single mode of interaction, but have very different characteristics. A "page" of information is displayed on your PC, mobile phone, or PDA. Unlike voice interfaces, visual interfaces can present many different pieces of information to the user simultaneously. In the above example, all of your last ten deposits could be shown on the screen at the same time.

As we have discussed, multimodal interfaces combine the two modes. Users can initiate a session by phone, receive information on a screen, and respond by voice or on screen. This can provide a more natural, user-friendly way of interacting with applications.

As previously mentioned, SALT tags are designed to be embedded in common Web markup languages. These "SALT-enhanced" markup languages can be combined with a common Web scripting language such as ECMAScript to create dynamic, speech-enabled Web pages that support a range of dialogue styles. The voice and visual presentation is coded in the markup language while the dialogue flow is coded in the scripting language. In this way, SALT can support both system-directed applications, such as IVR systems, as well as mixed-initiative, multimodal interfaces. In developing these types of applications, the asynchronous nature of the applications created with markup languages is an important advantage for SALT. XHTML, for example, can sit idle indefinitely until a user takes action, or it can be scripted to be more system-directed.
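The split between presentation and event-driven dialogue flow can be imitated in Python, used here only as a stand-in for a browser scripting language. The Page class, the event names "click" and "reco", and both handlers are hypothetical and do not come from the SALT specification.

```python
class Page:
    """A toy page: fields hold state, handlers hold the dialogue flow."""

    def __init__(self):
        self.fields = {"city": None}
        self.handlers = {}
        self.spoken = []  # prompts the page has "played"

    def on(self, event: str, handler) -> None:
        self.handlers[event] = handler

    def fire(self, event: str, payload=None) -> None:
        # Nothing runs until an event arrives; unknown events are ignored.
        handler = self.handlers.get(event)
        if handler:
            handler(self, payload)


def ask_city(page, _payload):
    page.spoken.append("Which city?")


def store_city(page, city):
    page.fields["city"] = city
    page.spoken.append(f"Showing flights to {city}.")


page = Page()
page.on("click", ask_city)    # the user acts first by selecting the field
page.on("reco", store_city)   # the speech result arrives later

page.fire("click")
page.fire("reco", "Austin")
print(page.fields, page.spoken)
```

Because the page only reacts to events, it can sit idle until the user acts. A more system-directed version would fire the first prompt itself as soon as the page loads.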


By defining a simple and flexible specification for developing speech interfaces to Web applications, SALT will help bring together telephony and Web applications. As a result, SALT can reduce the cost of managing a separate voice application infrastructure as well as enable development of advanced, multimodal interfaces for current and next generation applications.

Peter Gavalakis is marketing manager for Intel and a member of the SALT Forum Marketing Working Group. For more information about SALT, SALT-based products, or to download a copy of the specification, go to the SALT Forum Web site at www.saltforum.org.


A Closer Look At Call Control
Telephones are much more common than personal computers today. In the future, it seems likely that many multimodal devices with wireless connectivity will utilize existing mobile and landline networks. All but the simplest speech applications require a way to establish and manage connections to these endpoints.
SALT defines a Call Control object to implement telephony functions. Alternatively, developers can use the SMEX message exchange interface. Using SMEX, developers can benefit from the variety of call control mechanisms that currently exist. These include both proprietary implementations and standards such as CSTA. In either case, the developer is abstracted from the underlying transport (e.g., ISDN over T1/E1, SIP, H.323, etc.). Companies that offer a wide variety of circuit and packet-network connectivity products are working with developers of SALT browsers to ensure that a wide variety of network connectivity choices are available.
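The idea of shielding the developer from the transport can be sketched as follows. This is not the SALT Call Control object or the SMEX interface; the Transport, SipTransport, IsdnTransport, and CallControl classes, the make_call method, and the phone number are all hypothetical, and the strings returned merely name the signaling message each transport would use to start a call.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical transport adapter; real platforms define their own."""

    @abstractmethod
    def dial(self, number: str) -> str:
        ...


class SipTransport(Transport):
    def dial(self, number: str) -> str:
        return f"SIP INVITE to {number}"


class IsdnTransport(Transport):
    def dial(self, number: str) -> str:
        return f"ISDN SETUP to {number}"


class CallControl:
    """Application-facing object: the same calls regardless of transport."""

    def __init__(self, transport: Transport):
        self._transport = transport
        self.log = []

    def make_call(self, number: str) -> None:
        self.log.append(self._transport.dial(number))


for transport in (SipTransport(), IsdnTransport()):
    call = CallControl(transport)
    call.make_call("5551234")
    print(call.log)
```

The application code in the loop is identical for both networks; only the adapter passed to CallControl changes.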
