Bringing The Benefits Of Speech To The Web
BY PETER GAVALAKIS
A quick look at the ideas behind SALT shows how it can streamline
development of speech interfaces. Here are SALT's three primary design
goals:
• Use a Web-centric model to build speech applications in order to take
advantage of application code, development tools, and developer skill sets
that currently exist.
• Engineer a lightweight and flexible specification that is simple to
learn and can be utilized in a wide variety of applications.
• Unite and build upon existing telephony, Web, and speech standards to
simplify the development and deployment of standards-based speech
applications.
SALT is a collection of XML tags designed to hide the complexity of
managing speech input and output. For example, rather than specify all of
the control structure necessary to load, configure, invoke, and manage the
execution of a speech recognition engine, SALT allows a developer to use a
<listen> tag with subtags that specify the grammar used by the
speech recognition engine. Results of the engine's processing are
then bound to program variables. By hiding many of the low-level control
details, SALT tags enable the developer to specify speech functions at a
high level of abstraction.
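As an illustrative sketch, such a tag might look like the fragment below. The element names (<listen>, <grammar>, <bind>) come from the SALT specification, but the grammar file, element ids, and XPath expression are hypothetical:

```xml
<!-- Hypothetical SALT fragment: recognize a city name and bind the
     result into a page element. File names and ids are illustrative. -->
<listen id="lsnCity">
  <!-- Subtag pointing the recognition engine at a grammar of cities -->
  <grammar src="cities.grxml" />
  <!-- Copy the recognized value from the engine's result into the
       page element with id "txtCity" -->
  <bind targetelement="txtCity" value="//city" />
</listen>
```

The developer never touches the engine's load, configure, or invoke steps; the platform handles them when the listen element starts.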
SEPARATING BUSINESS LOGIC AND USER INTERFACES
Today's users access information through two primary interfaces:
voice and visual. For example, if you want to see your checking account
balance online, you can use the Web browser on your personal computer. You
issue commands using a keyboard and mouse, and the system presents
information to you visually. Alternatively, if you want to check your
balance by phone, you dial your bank's interactive voice response (IVR)
system and interact with the system using voice, touch-tones, and prompts.
Unfortunately, these applications were developed by two different
sets of programmers writing in two very different languages. The
applications reside on separate systems and have separate support
staffs.
SALT helps converge these separate infrastructures by allowing
different user interfaces to be developed for a common set of business
logic (Figure 1). Because SALT only addresses the presentation of
information, companies can take an existing Web application and use SALT
to build a voice-enabled interface. Sharing business logic across multiple
applications extends a company's investment in Web applications by
reusing Web development tools, Web programmers, and Web infrastructure.
In addition, SALT enables a third kind of interface called multimodal,
which combines both voice and visual interfaces. For many applications,
multimodal provides a more natural, user-friendly interface. For example,
a PDA could access and display a stock portfolio while the user issues
buy/sell orders by speaking. Likewise, someone planning a trip could speak
departure and destination cities, and view complicated scheduling options
on the screen. SALT provides a seamless way to build voice and multimodal
user interfaces that share common business logic.
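A multimodal page along the lines of the travel example might pair an ordinary HTML control with a listen element, as in this sketch. The namespace URI, ids, and grammar file are assumptions for illustration:

```xml
<!-- Illustrative multimodal fragment: tapping the text box starts
     recognition, and the spoken city is written back into the box.
     All names here are hypothetical. -->
<html xmlns:salt="http://www.saltforum.org/2002/SALT">
  <body>
    <!-- Visual input: the user can type the departure city... -->
    <input type="text" id="txtDeparture"
           onclick="lsnDeparture.Start()" />
    <!-- ...or speak it, via tap-to-talk recognition -->
    <salt:listen id="lsnDeparture">
      <salt:grammar src="cities.grxml" />
      <salt:bind targetelement="txtDeparture" value="//city" />
    </salt:listen>
  </body>
</html>
```

Both input paths update the same field, so the business logic behind the page is unchanged regardless of which mode the user chooses.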
LESS IS MORE
The famous architect Mies van der Rohe, known for his highly
functional but visually understated building designs where "less is
more," could easily have been the inspiration for SALT.
The architects of SALT did not choose to create an entirely new standalone
programming language. Rather, they built upon existing languages (and
standards) by creating a set of tags that are designed to be embedded into
markup languages such as XHTML or WML. By building on languages that so
many programmers already know, developers can get up to speed on SALT
quickly. In addition, SALT interpreters can run on a wide variety of
devices, including personal computers, PDAs, and "smartphones." This
increases the number of users who can access a given application.
SALT is extensible as well as flexible. It defines an element for message
exchange known as Simple Messaging Extension (SMEX). These elements
communicate with the external components of the SALT platform and allow
SALT applications to interact with external applications that provide call
control, database access, and messaging.
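As a rough sketch of the idea (the element shape follows the SALT specification's smex interface, but the parameter names and message fields here are invented):

```xml
<!-- Illustrative smex element: exchange messages with an external
     call-control component on the platform. Parameter and field
     names are hypothetical. -->
<smex id="callControl">
  <!-- Platform-specific configuration parameter -->
  <param name="server">callserver.example.com</param>
  <!-- Copy a field of an incoming platform message into the page -->
  <bind targetelement="txtCallerId" value="//callerid" />
</smex>
```

Because smex is just another element on the page, these platform exchanges use the same binding and eventing model as speech input and output.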
ALL USER INTERFACES ARE NOT CREATED EQUAL
Voice, visual, and multimodal interfaces can present the same
information to users in very different ways. An awareness of these
differences can help programmers who have only worked on visual interfaces
learn effective voice and multimodal techniques. Even though the language
and logic of SALT will be very familiar to developers, designing an
effective interface requires an understanding of the way different
interfaces are affected by dialogue style and modality.
Dialogue style describes how the user interacts with the application.
Dialogues can be user-directed, system-directed, or a combination of the
two (known as mixed initiative). Telephony applications such as IVR tend
to be system-directed. The system (i.e., application) issues commands and
users respond. For example, when you call your bank to check your account
balance, the system guides you through the process of collecting
information from you and providing information to you. In contrast, Web
applications accessed on our PCs are user-directed. Users tell the "system"
(the Web application) to provide the information they need.
Modality describes the kind of interaction that takes place between
users and the system. For example, when calling an IVR system, you are
prompted for input, you respond, the system responds to you, etc. When you
call and ask for your last ten deposits, the system "reads" them to
you one at a time (i.e., serially).
Visual interfaces also employ a single mode of interaction, but have very
different characteristics. A "page" of information is displayed on
your PC, mobile phone, or PDA. Unlike voice interfaces, visual interfaces
can present many different pieces of information to the user
simultaneously. In the above example, all of your last ten deposits could
be shown on the screen at the same time.
As we have discussed, multimodal interfaces combine the two modes. Users
can initiate a session by phone, receive information on a screen, and
respond by voice or on screen. This can provide a more natural,
user-friendly way of interacting with applications.
GET READY FOR MULTIMODAL
As previously mentioned, SALT tags are designed to be embedded in
common Web markup languages. These "SALT-enhanced" markup languages
can be combined with a common Web scripting language such as ECMAScript to
create dynamic, speech-enabled Web pages that support a range of dialogue
styles. The voice and visual presentation is coded in the markup language
while the dialogue flow is coded in the scripting language. In this way,
SALT can support both system-directed applications, such as IVR systems,
as well as mixed-initiative, multimodal interfaces. In developing these
types of applications, the asynchronous nature of the applications created
with markup languages is an important advantage for SALT. XHTML, for
example, can sit idle indefinitely until a user takes action, or it can be
scripted to be more system-directed.
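Putting markup and script together, a system-directed turn might be sketched as follows. The event name onreco comes from the SALT model (it fires when recognition succeeds); the ids, grammar file, and handler logic are hypothetical:

```xml
<!-- Illustrative system-directed flow: the markup declares the prompt
     and the recognition step, while script drives the dialogue. -->
<prompt id="askAccount">
  Which account would you like to check?
</prompt>
<listen id="lsnAccount" onreco="handleAccount()">
  <grammar src="accounts.grxml" />
</listen>
<script type="text/ecmascript">
  // Dialogue flow lives in script: play the prompt, then listen.
  function askBalance() {
    askAccount.Start();   // speak the question
    lsnAccount.Start();   // start recognition
  }
  function handleAccount() {
    // Inspect lsnAccount's result and move to the next dialogue turn.
  }
</script>
```

The same page could instead sit idle and start recognition only when the user acts, which is how a mixed-initiative or multimodal interface would use these elements.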
THE PROMISE OF SALT
By defining a simple and flexible specification for developing speech
interfaces to Web applications, SALT will help bring together telephony
and Web applications. As a result, SALT can reduce the cost of managing a
separate voice application infrastructure as well as enable development of
advanced, multimodal interfaces for current and next-generation
applications.
Peter Gavalakis is marketing manager for Intel and a member of the SALT
Forum Marketing Working Group. For more information about SALT, SALT-based
products, or to download a copy of the specification, go to the SALT Forum
Web site at www.saltforum.org.
[ Return To The March 2003
Table Of Contents ]