VoiceWeb. Voice Portals. VoiceXML. They are terms that
have gone from flights of fancy just a few years ago to
hard-charging technology realities today. And as
VoiceXML continues to move forward, it is poised to
enter the marketplace with an audible bang. VoiceXML
will profoundly affect the interactive voice response
(IVR) industry.
Just as HTML revolutionized the Internet and powered
the phenomenal growth of the Web, VoiceXML seems
positioned to revolutionize IVR by laying the groundwork
for the VoiceWeb. And while this revolution holds many
promises, pitfalls remain.
The VoiceXML Advantage
VoiceXML removes application dependence on proprietary
architectures. Voice applications in the IVR world have
traditionally been developed based on either proprietary
scripting languages or on proprietary APIs. As with any
proprietary language or API, a single vendor's
implementation is often limited because it lacks the
broader industry perspective that has shaped VoiceXML
throughout its development.
Additionally, as the service provider's corpus of
vendor-specific application code grows, so too does the
dependency on that vendor. In time, the cost to
redevelop applications on a better platform can be
prohibitive, restricting the service provider to a less
price-competitive or feature-competitive solution.
However, with applications developed in VoiceXML,
application portability is greatly enhanced, freeing the
service provider to focus more directly on
price/performance and OA&M feature comparisons when
selecting a platform vendor.
Dependence on proprietary system-level hardware and
software technology is reduced. By abstracting
underlying interfaces to telephony and voice-processing
hardware and software, VoiceXML frees the service
provider to select compliant platforms that provide
support for multiple options in "best-of-breed"
component technologies like voice recognition and
text-to-speech.
In a technological environment where fundamental
improvements in voice recognition and text-to-speech
(among other technologies) continue to be made, it is
imperative to the service provider that hardware,
software, and even system component vendor changes can
be made with maximum transparency.
VoiceXML Makes Transparency Possible
A standardized language fuels competition among
development tool vendors. As voice application
development shifts to standardized VoiceXML, a
marketplace for sophisticated VoiceXML-targeted
integrated development environments (IDEs) will broaden.
Standardization of the markup language will ensure
that applications developed with an IDE from one vendor
interoperate with a VoiceXML platform from another --
ending the rigid lack of choice in the vertically
integrated proprietary IVR model. The resulting
marketplace competition will bring better IDE choices to
the service provider.
Rather than recruiting, training, and retaining
developers with narrow proprietary development skills,
service providers using VoiceXML can draw on the much
larger pool of Web developers. As
an added benefit, these developers will also be capable
of deploying multi-modal (Web and voice) applications,
while clients get the best of both worlds -- quicker
application development with the same security and
As the W3C standardizes formal multi-modal markup
languages, these same developer resources will be
ideally positioned to leverage their existing skills in
developing applications using these new languages.
Embracing Web development methodologies means
VoiceXML builds from experience. Web infrastructure and
development methodologies will be leveraged as VoiceXML-based
services are deployed. Load-balancing, caching,
middleware, and application servers that support
existing Web-based services will become equally useful
in supporting voice-based services. In fact, the
VoiceXML architecture fits the n-tier development
paradigm well, meaning real reuse of business logic
components between Web and voice applications may
finally become a practical reality.
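A minimal sketch of that reuse, in Python, might look like the following. Everything in it is an illustrative assumption: the function names, the in-memory account data, and the simplified markup. The point is only that one business-logic function can feed both an HTML view and a VoiceXML view.

```python
from xml.sax.saxutils import escape


# Business-logic tier: knows nothing about how the result is presented.
def account_balance(account_id: str) -> float:
    # Hypothetical in-memory data standing in for a real back end.
    balances = {"1001": 250.75, "1002": 12.00}
    return balances[account_id]


# Presentation tier for the Web: wraps the result in HTML.
def render_html(account_id: str) -> str:
    amount = account_balance(account_id)
    return f"<html><body><p>Balance: ${amount:.2f}</p></body></html>"


# Presentation tier for voice: wraps the same result in a VoiceXML prompt.
def render_vxml(account_id: str) -> str:
    amount = account_balance(account_id)
    dollars, cents = divmod(round(amount * 100), 100)
    spoken = escape(f"Your balance is {dollars} dollars and {cents} cents.")
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        "  <form>\n"
        f"    <block><prompt>{spoken}</prompt></block>\n"
        "  </form>\n"
        "</vxml>\n"
    )
```

Only the two rendering functions differ between channels. A change to how balances are looked up would be made once, in account_balance.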
A marketplace for third-party voice application
components is thus spawned. Leveraging standardized
voice markup, third-party software developers will be
able to write powerful and reusable server-side
voice-application components for J2EE or .Net
environments. Libraries of such components will greatly
accelerate application development. Written in
object-oriented languages like Java or C#, for example,
these components could be easily customized or extended
to more quickly create powerful applications.
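As a rough illustration of such a component, the Python sketch below defines a small class that renders a VoiceXML menu, plus a subclass that customizes it. The class names, the greeting hook, and the example phrases and URLs are all hypothetical. The article names Java and C# as likely languages, and Python is used here only to keep the sketch short.

```python
from xml.sax.saxutils import escape, quoteattr


class VoiceMenu:
    """Hypothetical reusable component that renders a VoiceXML <menu>."""

    def __init__(self, prompt: str, choices: dict[str, str]):
        self.prompt = prompt
        self.choices = choices  # spoken phrase -> target document URL

    def greeting(self) -> str:
        # Hook that subclasses can override to customize the prompt.
        return self.prompt

    def render(self) -> str:
        lines = [
            '<?xml version="1.0"?>',
            '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">',
            "  <menu>",
            f"    <prompt>{escape(self.greeting())}</prompt>",
        ]
        for phrase, target in self.choices.items():
            lines.append(
                f"    <choice next={quoteattr(target)}>{escape(phrase)}</choice>"
            )
        lines += ["  </menu>", "</vxml>"]
        return "\n".join(lines)


class BrandedMenu(VoiceMenu):
    """Extension of the base component: same menu, customized greeting."""

    def greeting(self) -> str:
        return "Welcome to Example Bank. " + self.prompt
```

A service provider could then write BrandedMenu("Say balance or agent.", {"balance": "balance.vxml"}).render() and get a complete document without touching the markup directly.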
More Flexible Service Models
Unlike its predecessors, VoiceXML offers a much richer
array of application deployment options. Traditional
IVR service providers
develop, host, and maintain voice applications for their
clients. With VoiceXML, however, more options become
available. A third party may develop the application.
The client may elect to host (via HTTP) and directly
control the source code (VoiceXML, middleware code,
voice recordings, grammars, etc.), yet contract with a
service provider to handle telephony and media services.
With applications coded in VoiceXML, a client can much
more easily choose to switch from insourcing to
outsourcing or to move from one service provider to
another. This is flexibility that meets the diverse
demands of an increasingly sophisticated client base.
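To make the client-hosted option concrete, here is a minimal sketch of a client serving its own VoiceXML document over HTTP, using the WSGI support in the Python standard library. The path, port, and greeting text are assumptions made for illustration. The application/voicexml+xml media type was registered after this article was written, and the service provider's voice browser is assumed to fetch the document much as a Web browser fetches a page.

```python
from wsgiref.simple_server import make_server

# Hypothetical document the client controls and hosts itself.
GREETING_VXML = (
    '<?xml version="1.0"?>\n'
    '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
    "  <form>\n"
    "    <block><prompt>Thank you for calling.</prompt></block>\n"
    "  </form>\n"
    "</vxml>\n"
)


def app(environ, start_response):
    """WSGI application that serves one VoiceXML document."""
    if environ.get("PATH_INFO") == "/greeting.vxml":
        body = GREETING_VXML.encode("utf-8")
        start_response(
            "200 OK",
            [
                ("Content-Type", "application/voicexml+xml"),
                ("Content-Length", str(len(body))),
            ],
        )
        return [body]
    body = b"not found"
    start_response(
        "404 Not Found",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


def serve(port: int = 8000) -> None:
    """Run the application until interrupted (the port is an assumption)."""
    with make_server("", port, app) as httpd:
        httpd.serve_forever()
```

Calling serve() starts the server. Moving to a different service provider would, in this sketch, mean pointing the new provider's platform at the same URL rather than rewriting the application.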
VoiceXML also enables development of a VoiceWeb.
Although a convincing underlying business model that
would support a pervasive VoiceWeb remains elusive, the
potential for expanding the demand for voice services on
a VoiceWeb is exciting. Whether a VoiceWeb is the
ultimate outcome of VoiceXML's continued development and
implementation remains to be seen, but the numerous
other advantages of VoiceXML are, in any case,
Understanding The Risks
As with any new technology, there are risks attached to
VoiceXML and some very important questions about its
present and future remain unanswered. Despite the
promise that VoiceXML holds for the IVR industry, the
road to eventual success is not without hazards.
Will VoiceXML implementations become balkanized,
disabling interoperability? Early implementations of
VoiceXML have included many proprietary extensions. To
be fair, many of these implementations preceded the
VoiceXML 1.0 specification, but if VoiceXML
implementations continue with such an "embrace and
extend" philosophy, balkanization will occur and a
meaningful VoiceWeb will be an improbability.
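One way a service provider might gauge this risk in its own code base is to scan documents for elements outside the standard vocabulary. The Python sketch below is only an illustration under stated assumptions: the allowlist is deliberately partial and is not the full VoiceXML element set, the namespace URI is the one used by VoiceXML 2.0, and the acme extension in the usage note is invented.

```python
import xml.etree.ElementTree as ET

# Namespace used by VoiceXML 2.0 documents; "" admits un-namespaced documents.
ALLOWED_NAMESPACES = {"", "http://www.w3.org/2001/vxml"}

# Partial allowlist, for illustration only; not the full element set.
KNOWN_ELEMENTS = {
    "vxml", "form", "menu", "choice", "field", "block", "prompt",
    "grammar", "filled", "noinput", "nomatch", "goto", "submit",
    "var", "assign", "audio",
}


def find_extensions(document: str) -> list[str]:
    """Return local names of elements outside the namespaces or allowlist."""
    flagged = []
    for element in ET.fromstring(document).iter():
        tag = element.tag
        if tag.startswith("{"):
            # ElementTree writes namespaced tags as "{uri}local".
            namespace, _, local = tag[1:].partition("}")
        else:
            namespace, local = "", tag
        if namespace not in ALLOWED_NAMESPACES or local not in KNOWN_ELEMENTS:
            flagged.append(local)
    return flagged
```

Run against a document that mixes standard elements with a hypothetical acme:whisper element from a vendor namespace, find_extensions reports only the vendor element. The result is a rough measure of how much of an application would need rework on another platform.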
In reality, support for an entire constellation of
markup languages will be necessary to ensure practical
interoperability of voice applications among diverse
platforms. Just as uneven browser support for HTML,
ECMAScript, DHTML, DOM, etc., has been a problem for
Web development, so too will uneven platform support
for VoiceXML, SGML, SSML, CCML, etc., be a problem for
voice development. The industry needs to unite in its
demand for adherence to standards. In fact, without
standards adherence, VoiceXML will ultimately be a
Gaps in the current VoiceXML 1.0 specification need
to be closed. However, since the real impact of VoiceXML
likely won't be felt until version 2.0 is implemented
and service providers feel more comfortable with the
technology, many of the version 1.0 gaps that are fixed
in 2.0 won't really be a significant issue.
Nevertheless, even with the 2.0 specification in place,
more work remains.
Call handling is often cited as being a significant
deficiency, although hopefully efforts like CCML will
remedy this. Asynchronous event handling, however,
remains an example of an open gap. As long as
such gaps exist, the temptation will be present for
diverse proprietary extensions to be made that hinder
How Will Future Multi-Modal Standards Impact
Completion of the first VoiceXML specification (DialogML
or VoiceXML 2.0) to receive technical approval from the W3C
Voice Browser Working Group is expected soon.
Technology, however, is never satisfied with the status
quo.
The W3C expects to soon form another working group to
specify a multi-modal dialog markup language supporting
both visual and verbal user interfaces. This work will
be essential as 3G wireless services develop, but
questions about how this proposed markup language will
fit with VoiceXML are pertinent. For instance, will this
new markup language essentially be a superset of
VoiceXML and, if so, might VoiceXML gradually be
replaced as the "lingua franca" of the VoiceWeb in the
What Does It All Mean?
It's clear that VoiceXML still faces significant growing
pains on its way to becoming a viable, widespread
solution. However, the advantages of VoiceXML have
broad and deep appeal. Our industry has waited for
the many benefits that VoiceXML now promises for a very
long time. We think that our wait has been well
James Harvey has been Call
Interactive's director of technical operations since
1998. He is responsible for managing the company's
software and systems architecture, and heading up the
Network and Systems Development group. Call Interactive,
a subsidiary of First Data Corp., is a leading provider
of automated customer care solutions and a member of the