VoiceXML Versus SALT: Selecting A Voice Application Standard
By George T. Platt, Intervoice
When it comes to speech application standards, it seems we've been
asking all the wrong questions.
The VXML versus SALT debate is currently a hot topic in the IT conference
rooms of organizations that rely on efficient, cost-effective contact
centers. Phrases like 'intense competition' and 'battle royale' are bouncing
around the trade press. Rival consortiums are at work writing specs and
generating headlines, and some of the biggest names in technology have
entered the VXML versus SALT fray.
So, which speech standard will win, VXML or SALT? A good question, but
possibly the wrong question.
Given the real reasons organizations deploy speech-enabled technologies
and the fundamental nature of technology standards themselves, our focus
should be more on the application than on the application standard.
Ask yourself this: how often does someone using an accounting program or
other application know or care whether that software was written in C or in
Java? When was the last time a customer hung up the phone and
said, 'Hey, that was the best VoiceXML application I have ever heard'?
The fact is, it's easy enough to get caught up in the debate over which
standard is superior and which will dominate. Standards matter. But what
matters most to the end user, and therefore what should matter most to
contact centers and their system suppliers, is the quality and performance
of the application itself.
With the end user in mind, now may be the time to ask some different and
more relevant questions about VXML and SALT. The answers may surprise you.
The Future Of Speech
One thing is certain: speech recognition is the future of voice automation
and a very important part of a customer or employee self-service strategy.
In fact, some industry analyst firms indicate that over 80 percent of all
customer interaction is still done with a telephone call.
Interactive voice response (IVR) is now a foundation technology of the
customer service marketplace and is common in the contact centers of
companies in financial services, insurance, telecommunications and a wide
range of other industries.
As customer-oriented organizations seek to drive both service performance
and cost efficiencies, the next major wave of investment is the
incorporation of speech-based IVR solutions. The entire concept of IVR
systems is being transformed by new and powerful speech-enabling innovations
such as speech recognition, text-to-speech and speaker verification.
By incorporating speech technologies into their voice automation systems,
enterprises are increasing productivity, cost efficiency and customer
satisfaction.
But, as we have seen in so many other technology environments, the move
to speech-driven automation has sparked an intense discussion over the
relative merits and viability of the standards that underlie this
still-emerging technology: Voice eXtensible Markup Language (VXML) and
Speech Application Language Tags (SALT).
These two evolving standards are making headlines as industry analysts,
development groups and IT vendors jockey for position in the growing
speech-enabled marketplace. Here are snapshot views of what their respective
forums have to say about each.
VoiceXML
First published in 2000 by a consortium of 500 companies under the auspices
of the VoiceXML Forum, VoiceXML has been described as the HTML of the voice
Web. VXML is an open, standard markup language for voice applications.
Originally developed for telephony applications, VXML harnesses the large
Web infrastructure created for HTML to simplify the development and
implementation of voice applications.
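To make the "HTML of the voice Web" comparison concrete, here is a minimal
sketch of how an ordinary Web application might generate a VoiceXML 2.0 page
for a voice browser to fetch, just as it would generate HTML for a desktop
browser. The dialog, the field name "account" and the submit URL are
hypothetical and chosen only for illustration; the element names (vxml, form,
field, prompt, block, submit) come from the VoiceXML 2.0 specification.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C for VoiceXML 2.0 documents.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_account_dialog(submit_url: str) -> str:
    """Return a minimal VoiceXML 2.0 document as a string.

    The dialog asks the caller for an account number and posts it back to
    submit_url (a hypothetical back-end address supplied by the caller).
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "balance"})

    # A field collects one piece of input; "digits" is a built-in grammar
    # type in VoiceXML 2.0 (platform support for built-ins can vary).
    field = ET.SubElement(form, "field", {"name": "account", "type": "digits"})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Please say or key in your account number."

    # Once the field is filled, the block runs and submits the value
    # to the Web server over HTTP.
    block = ET.SubElement(form, "block")
    ET.SubElement(block, "submit", {"next": submit_url, "namelist": "account"})

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_account_dialog("http://example.com/balance"))
```

Any Web server that can return this text in response to an HTTP request could
serve it to a VoiceXML browser; the sketch leaves out grammars, error handling
and the telephony side entirely.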
Control of the VoiceXML standard has been given to the World Wide Web
Consortium (W3C), and that group published the VoiceXML 2.0 version upon
which a number of product solutions are now based. The VoiceXML Forum says
VXML takes advantage of several industry trends, including the growth of the
World Wide Web and the migration of the Web beyond the desktop computer, as
well as improvements in computer-based speech recognition and text-to-speech
synthesis.
SALT
As described by the SALT Forum, SALT extends existing Web markup languages
such as HTML, XHTML and XML to enable multimodal and telephony access to the
Web. The SALT 1.0 specification enables multimodal and telephony-enabled
access to information, applications and Web services from personal
computers, telephones, tablet PCs and wireless personal digital assistants.
This powerful multimodal access will allow end users to interact with
applications in a number of ways, such as audio, speech and synthesized
speech, plain text, mouse or keyboard, video or graphics. The SALT 1.0
specification is currently under consideration within the World Wide Web
Consortium.
This is what the backers of each standard have to say about themselves.
But how should managers of contact centers evaluate the relative pros and
cons of SALT and VoiceXML?
Five Questions To Ask
If you are an IT manager responsible for the performance of a contact
center, Web infrastructure or any other form of customer or employee
self-service, and you see speech-enabled automation as a natural part of
your user interface, which standard is right for you?
Here are five questions that go straight to the heart of the VXML versus
SALT debate. Ask them, and get the right answers, before you make a decision
on speech technology standards.
1. What is your current Web infrastructure?
Is your existing Web infrastructure built on J2EE or .NET? If you are
developing applications in the .NET environment, the Microsoft Speech Server
SALT browser provides a very clean, seamless integration. The Microsoft
Speech SDK provides speech development tools and ASP.NET components that
integrate into the Microsoft Visual Studio .NET development environment and
the .NET application server. So for contact centers or Web development
groups with an existing .NET Web environment, SALT is the obvious choice.
Companies that have adopted the J2EE Web infrastructure may have an
easier time developing VoiceXML applications. Technically, VoiceXML and SALT
browsers will work with any Web server. However, the development tools that
are included by VoiceXML vendors are usually Java-based, while the tools
included with the Microsoft SALT browser will obviously be tied to .NET.
Java developers can still take advantage of the Microsoft Speech Server and
development tools by using a .NET server with Web services that communicate
with back-end J2EE components for data access and business transactions.
While it is certainly possible to make SALT work in a Java environment, many
J2EE-based organizations will probably choose VoiceXML.
2. Are speed and vendor support important to you?
If you want to deploy an open standards speech-enabled voice automation
application rapidly on a proven technology platform, then VoiceXML provides
an advantage in terms of time-to-market and a diversity of vendor offerings.
Compared to the relatively new SALT standard, the more mature VoiceXML has
been under development for several years and is now in its second major
specification release. Additionally, product support for VoiceXML has been
introduced by most (if not all) IVR vendors in the marketplace.
Organizations can leverage VoiceXML to immediately deploy an
open-standards IVR with full integration to the call center, PBX, ACDs and
CTI, and enjoy the technical support and service of established system
suppliers. Vendors that support both the VoiceXML and SALT standards give
companies an added degree of flexibility; they can deploy now using
established VXML solutions and, if conditions warrant, migrate smoothly to
SALT-based applications at some point in the future. In fact, a single
customer or employee interaction could include interaction with both
standards seamlessly within a single call.
3. Do you need multimodal access?
If multimodal access by devices including mobile phones and wireless PDAs,
in addition to traditional telephony and Web browsers, will be an important
consideration in the future, then the SALT standard makes better sense than
VoiceXML. Multimodal access is a core capability of the SALT specification,
while VoiceXML is a voice interface language originally designed
specifically for the voice user interface.
This doesn't mean you can't voice-enable a Web site using VoiceXML. X+V
(or XHTML plus Voice) extends the VoiceXML specification by adding
multimodal attributes. However, if multimodal access is a central issue in
your deployment, SALT may be your best option given its more granular
control of multimodal events and the fact that this capability was built
into the requirements of its design from day one.
For example, the New York City Department of Education (NYC DOE) boasts
the largest school system in the country with over one million students. To
optimize the children's educational experience by addressing parental
concerns and encouraging parental involvement, the NYC DOE is using a
speech-enabled application on a SALT-powered platform to enable parents to
check things such as their child's attendance record, course grades and
lunch menu for the day. Much of this information is already available to the
parents via the NYC DOE Web site, but they are using speech technologies to
enable round-the-clock accessibility to the information for parents that
don't have consistent access to a computer.
4. Which standard will best support your existing infrastructure?
The standard you select must support your existing technology
infrastructure. However, the plain truth is that both VoiceXML and SALT are
equally inadequate in their ability to integrate into a call center
environment. VoiceXML and SALT are presentation-layer specifications,
meaning they address the user interface (voice and multimodal), but do not
address integration or back-end functionality requirements.
At best, standards provide a baseline framework for such things as the
hardware platforms (Intel, Windows, Linux), telephony integration (ISDN,
SS7, SIP) and the voice user interface (VoiceXML and SALT). But standards do
not encompass all of the components needed to integrate and deploy a voice
automation solution. If we consider technologies such as call control, CTI
and legacy host integration, we see that standards do not cover every
crucial element necessary for a successful voice automation solution.
Tools for operating, maintaining and administering the systems, and for
developing and debugging applications, are also needed to support the full
lifecycle of a deployment, and the standards themselves do not adequately
address them.
To create a workable solution, you need all of these elements, only some of
which are supported by an open standard. However, in the end, it's up to the
solutions vendor to provide all of the product components that are needed to
develop, maintain and report on a voice application. In fact, an
organization can rely on open standards to build, perhaps, half of an open
IVR solution, and the rest must be supplied by a vendor. Because neither
SALT nor VXML provides all of the features needed for an IVR solution,
organizations that must deploy these solutions in the context of the larger
call center environment may wish to seek out solutions that support both
SALT and VXML.
There are no agreed-upon standards for call control, CTI or data/host
integration, for example, which means each vendor will deploy its own
solution. To ensure optimum flexibility, choose a platform that supports
either your preferred standard or both standards, so that it can accommodate
the elements of your existing or planned infrastructure.
5. Which standard will win in the contact center market?
The answer is, we just don't know and in reality we don't need to know.
There will always be new and competing ideas on standards, and while either
SALT or VXML may one day emerge as the dominant player, they may coexist
equally for a significant period of time. In fact, there's even talk that
the two standards may one day come together into one. The other thing of
which we can be certain is that standards such as VXML and SALT will
continue to evolve, and that new standards will be created to address new
functionality in the future. That's why it makes no sense to delay the
launch of a flexible, cost-efficient voice automation solution until the
standards sort themselves out.
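The rules of thumb in questions 1 through 3 can be written down as a small
decision sketch. This is only an illustration of the reasoning above, not a
selection tool: the class and function names are hypothetical, and weighting
each question equally is an assumption the article does not make.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical summary of a contact center's answers."""
    web_stack: str             # "dotnet", "j2ee" or anything else
    needs_multimodal: bool     # question 3
    needs_rapid_rollout: bool  # question 2


def suggest_standard(d: Deployment) -> str:
    """Tally the article's rules of thumb, one vote per question (assumed)."""
    votes = {"VoiceXML": 0, "SALT": 0}

    # Question 1: existing Web infrastructure.
    if d.web_stack == "dotnet":
        votes["SALT"] += 1
    elif d.web_stack == "j2ee":
        votes["VoiceXML"] += 1

    # Question 2: speed to market and breadth of vendor support.
    if d.needs_rapid_rollout:
        votes["VoiceXML"] += 1

    # Question 3: multimodal access as a central requirement.
    if d.needs_multimodal:
        votes["SALT"] += 1

    if votes["VoiceXML"] == votes["SALT"]:
        # Questions 4 and 5: neither standard covers everything, so a
        # platform that supports both keeps options open.
        return "either (prefer a vendor that supports both)"
    return max(votes, key=votes.get)


if __name__ == "__main__":
    print(suggest_standard(Deployment("j2ee", False, True)))
```

A tie is reported rather than broken because, as questions 4 and 5 argue, the
standard alone does not decide the success of the deployment.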
Standards In Perspective
It is easy enough to get caught up in the debate over the relative
advantages of one standard or another. Standards matter. What matters more
is that you get the speech-enabled application right. In the end, customers
interface with and react to applications and not the standards that help
enable those applications.
By keeping your focus on the quality and efficiency of the customer
interaction, and the wider set of automated voice technologies needed to
support more natural and effective communications, you can put the ongoing
debate over standards in its proper perspective.
Organizations invest in speech recognition solutions to improve the
quality of customer interactions. Standards are just a means to that more
important end.
George Platt is senior vice president for Business Development and Marketing
for Intervoice (www.intervoice.com).
For information and subscriptions, visit www.TMCnet.com or call 203-852-6800.
[ Return To The May 2004 Table Of Contents ]