It has been a whirlwind two years for the organizations that have
worked diligently to standardize the methods of creating speech-enabled
applications. In early 2001, the World Wide Web Consortium (W3C) adopted
VoiceXML as a markup language for creating speech applications such as
self-service, voice portals, and voice-enabled dialing. Shortly thereafter,
the Voice Browser Working Group (VBWG) -- a W3C working group dedicated to
evolving VoiceXML -- began working on the VoiceXML 2.0 specification; they
released the first public working draft in October 2001.
As developers and the voice industry started adopting VoiceXML, many
found that the standard wasn't robust enough to meet their needs. While VoiceXML
has well-rounded dialog interaction capabilities, it lacks the call
control functionality that is critical in many enterprise applications,
such as those deployed in contact centers.
This discovery led several members of the VBWG to propose a separate
markup language that would define the specification for call control
capabilities. That standard is Call Control eXtensible Markup Language
(CCXML), and it picks up where VoiceXML leaves off in terms of call
control functionality. The VBWG released the first public working draft in
Unlike VoiceXML, which is designed primarily to serve voice dialogs
between the user and the computer, CCXML offers sophisticated
call-handling capabilities and supports complex telephony applications such as
multi-party conferencing and interactive voice response on multiple call
legs (dedicated communication paths between either two callers or a caller
and a speech-enabled application). CCXML provides unique benefits for
enterprises, service providers, and developers that are taking advantage of
the latest advancements in speech-enabled technology.
Extending The Capabilities Of VoiceXML
The current VoiceXML specification includes rudimentary call transfer
capabilities but has some limitations. For example, it does not have the
features required to control outbound call legs. VoiceXML allows an
application to transfer a caller to another person; however, once the call
is transferred, the application terminates. The caller does not have the
ability to loop back to the application; instead, they would have to call the
There are other limitations in the current VoiceXML specification:
- Applications can't control the outbound call leg once a call is
- The standard does not facilitate the initiation of a dialog session
or voice user interface on a transferred call leg.
- VoiceXML does not support conferencing capabilities where multiple
callers can talk to each other at the same time.
- VoiceXML cannot support the development of many essential contact
center applications such as whispering (providing the data a caller enters
to an agent prior to call transfer) or supervised transfer (in which
agents can put callers on hold and introduce them prior to
transferring to other agents).
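To make the whisper idea concrete, here is a minimal Python sketch of the order of operations a call control layer might follow. It is an illustration only: the function name, the step strings, and the sample caller data are all hypothetical, and nothing here is CCXML or VoiceXML syntax. It also shows the caller being handed back to the application when the agent does not answer, which is the loop-back behavior described above as missing from a plain VoiceXML transfer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TransferResult:
    bridged: bool                      # True if caller and agent were connected
    steps: List[str] = field(default_factory=list)


def whisper_transfer(caller_data: Dict[str, str], agent_answers: bool) -> TransferResult:
    """Hypothetical whisper transfer: brief the agent, then bridge the call."""
    steps = ["hold caller", "dial agent"]
    if not agent_answers:
        # The application keeps control, so the caller can loop back to it.
        steps.append("return caller to application")
        return TransferResult(False, steps)
    # Summarize what the caller already entered, in a stable key order.
    summary = ", ".join(f"{key}={value}" for key, value in sorted(caller_data.items()))
    steps.append(f"whisper to agent: {summary}")
    steps.append("bridge caller and agent")
    return TransferResult(True, steps)
```

Calling `whisper_transfer({"account": "1234", "reason": "billing"}, agent_answers=True)` returns the four steps in order, with the agent hearing the caller's data before the two legs are bridged.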
CCXML is designed to overcome these limitations but does not replace
VoiceXML. Instead, the two standards complement each other to provide the
best elements of application development with VoiceXML, and essential call
control functionality with CCXML. Therefore, developers can write
applications that are usable and conversational and that meet enterprise contact
center needs. The combination of CCXML with VoiceXML makes speech
With CCXML, developers can enhance their voice applications with the
following call control capabilities (and more), which provide a personalized
application experience that will increase the adoption of speech-enabled
- Control multiple call legs: Developers can place multiple outbound
calls and control each outbound call leg independently. Each call may
optionally provide voice dialog interaction.
- Event handling: Applications can handle asynchronous events that come
from the telephony infrastructure and the VoiceXML interpreter.
- Start and stop an IVR session: Developers have complete control
over initiating and terminating IVR sessions that are executed in a
- Conditional logic: Similar to VoiceXML applications, CCXML
applications can use conditional elements to implement the business
- Web server interaction: Communicate with Web servers using HTTP to
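The capabilities listed above can be pictured as an event loop that tracks each call leg separately. The following Python sketch is a toy model under stated assumptions, not CCXML itself: the class names, event names, and states are invented for illustration. It shows asynchronous events being dispatched to handlers, two legs being controlled independently, and a dialog (IVR) session being started and stopped on a leg without ending the call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CallLeg:
    leg_id: str
    state: str = "idle"            # idle -> alerting -> connected -> disconnected
    dialog_running: bool = False   # True while an IVR dialog runs on this leg


class CallController:
    """Hypothetical event processor that tracks several call legs independently."""

    def __init__(self) -> None:
        self.legs: Dict[str, CallLeg] = {}
        self.log: List[str] = []
        # Event name -> handler, loosely analogous to event-driven transitions.
        self.handlers: Dict[str, Callable[[CallLeg], None]] = {
            "alerting": self._on_alerting,
            "connected": self._on_connected,
            "dialog.exit": self._on_dialog_exit,
            "disconnected": self._on_disconnected,
        }

    def place_call(self, leg_id: str) -> CallLeg:
        leg = CallLeg(leg_id)
        self.legs[leg_id] = leg
        return leg

    def dispatch(self, event: str, leg_id: str) -> None:
        """Deliver one asynchronous event to the leg it belongs to."""
        leg = self.legs[leg_id]
        handler = self.handlers.get(event)
        if handler is None:
            self.log.append(f"{leg_id}: ignored {event}")
            return
        handler(leg)
        self.log.append(f"{leg_id}: {event} -> {leg.state}")

    def _on_alerting(self, leg: CallLeg) -> None:
        leg.state = "alerting"

    def _on_connected(self, leg: CallLeg) -> None:
        leg.state = "connected"
        leg.dialog_running = True      # start an IVR dialog on this leg

    def _on_dialog_exit(self, leg: CallLeg) -> None:
        leg.dialog_running = False     # the leg stays connected after the dialog

    def _on_disconnected(self, leg: CallLeg) -> None:
        leg.state = "disconnected"
        leg.dialog_running = False


def demo() -> CallController:
    ctl = CallController()
    ctl.place_call("caller")
    ctl.place_call("agent")
    events = [("alerting", "caller"), ("connected", "caller"),
              ("alerting", "agent"), ("dialog.exit", "caller"),
              ("connected", "agent")]
    for event, leg_id in events:
        ctl.dispatch(event, leg_id)
    return ctl
```

In `demo()`, the caller's dialog ends while the caller leg remains connected, and the agent leg progresses on its own schedule; that separation between call state and dialog state is the point of the sketch.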
With the advanced features of CCXML, developers can create applications
that involve complex telephony and dialog interaction. There are several
user scenarios where CCXML is a natural choice for developers, including
find-me/follow-me applications and multi-party conferencing (see table
Find-me/follow-me applications allow users to be reached at one of several
locations by a single number. Users benefit from using a single number, and
callers do not have to memorize several numbers.
||In this scenario,
contact center agents can transfer callers to other agents
multiple times. Applications can use the DTMF input collected from
either call leg (caller or recipient) to determine its origin and
handle calls appropriately.
||In a supervised
transfer, agents can put callers on hold and introduce them prior
to transferring the call to other agents. Such introductions
enable personalized customer support and improve the user
||Providing a message or
data to the recipient is generally referred to as a whisper
transfer. A variety of user information is provided to the agent
in order to facilitate personalized and quick customer
|Transfer and run a
||In this scenario, a
caller is transferred to a number and has a voice user interface
presented to them to provide a personalized experience before an
agent interacts with the caller. This scenario might be useful to
limit user inputs before a call is transferred to an appropriate
IVR application and/or an agent.
||CCXML is used to create
conferencing applications that allow multiple users to talk to
each other. These applications may also have a self-service
component that requires dialog interaction provided by VoiceXML.
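As a rough illustration of the find-me/follow-me scenario, the Python sketch below tries a list of numbers in order and stops at the first one that answers. It assumes a hypothetical `try_call` callback that places one outbound call and reports whether it was answered; the phone numbers are made up, and a real application would express this logic in CCXML against a telephony platform rather than in Python.

```python
from typing import Callable, List, Optional, Tuple


def find_me(numbers: List[str], try_call: Callable[[str], bool]) -> Optional[str]:
    """Dial each number in order; return the first that answers, else None."""
    for number in numbers:
        if try_call(number):
            return number
    return None


def demo() -> Tuple[Optional[str], List[str]]:
    answering = {"555-0102"}          # made-up number that picks up
    attempts: List[str] = []

    def fake_try_call(number: str) -> bool:
        # Stand-in for placing a real outbound call leg.
        attempts.append(number)
        return number in answering

    reached = find_me(["555-0101", "555-0102", "555-0103"], fake_try_call)
    return reached, attempts
```

Because each attempt is a separate outbound leg, the application can stop dialing as soon as one location answers, and a `None` result gives it a place to fall back to something else, such as taking a message.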
As the variety of user scenarios enabled by CCXML demonstrates, the
standard is poised to make speech applications enterprise-ready and to
provide the much-needed impetus for customer adoption and growth of the
voice Web. CCXML offers benefits to all stakeholders in the
industry, including enterprises, application developers, service
providers, and voice Web platform/ASP vendors.
CCXML enables developers of enhanced service platforms and ASP vendors
to expand the capabilities of their service creation and service delivery
environments and provide added value to their partners and customers.
Benefits To Enterprises
Enterprises benefit from CCXML in a number of ways. Standardization of
call control capabilities enables enterprises to use multiple vendors to
build a best-of-breed solution that meets their unique voice application
needs. Companies have more flexibility and fewer risks than they would
have with a single vendor strategy.
Additionally, deploying speech-enabled self-service applications in a
contact center setting decreases the number of calls to operators,
increases customer loyalty, and improves the customer experience. Companies
can expand their contact centers without having to hire more staff, and
better manage their internal resources.
Enterprises can also make effective decisions on build-versus-buy strategies,
as a variety of application vendors will compete for
enterprise business with applications based on open standards.
Benefits To Service Providers
Wireline and wireless service providers can use the standards-based
service creation and delivery functionalities available in voice Web
application platforms to reduce operating costs associated with internal
Voice Web platforms based on standards like CCXML also expand service
providers' business opportunities in the voice application
hosting market. Carriers can provide value-added voice application hosting
services to enterprises that want the benefits of carrier-class call
control and routing functionality and service creation capabilities.
Enterprises also benefit because they maintain control of their corporate
Benefits To Application Developers
Consulting companies and systems integrators can generate revenues by
providing custom speech-enabled applications that deliver the benefits of
CCXML. Developers have to write an application only once and can then run
it on all standards-compliant VoiceXML and CCXML interpreters. This helps
developers sell their applications to a larger number of customers using
different channels and with minimal changes, as standards-based
interpreters are embedded in a variety of software platforms, gateways, and
devices. Application developers can also differentiate themselves from
incumbent service providers who deliver expensive, legacy applications
that require significant capital expenditures to scale and operational
expenditures to maintain appropriate levels of service.
CCXML also lowers labor and capital equipment costs, because developers
can draw on free developer programs and on programmers who are already
well versed in Web technologies and XML.
CCXML And VoiceXML: Complementary, Not Competing
CCXML is designed to work together with VoiceXML to fill the gaps
in today's call control capabilities. While CCXML is interesting
for standards enthusiasts and developers, enterprises are the real
beneficiaries because they can finally look forward to deploying
speech-enabled applications that benefit their customers.
CCXML is still at the draft stage, and it will be several months before the W3C
adopts the specification. Still, the development of the specification and
its future adoption represent an important
milestone in the two-year journey of standards development for the voice
Srinivas Penumaka is a product manager with Telera,
a software company whose Voice Web Application Platform enables service
providers to deliver a new class of advanced, business-centric voice
applications to enterprise customers. For more information on
standardization efforts, please visit www.w3.org/voice.