TMCnet - World's Largest Communications and Technology Community




Speech Technology: What’s The Word Of Tomorrow?

By Brian Garr, IBM Software Group


Speech Technology – Evolution Or Revolution?
IBM is no stranger to revolutions: we introduced the IBM PC, and the PC caused a revolution in the business world. It dramatically changed the processes that businesses used to create value in the marketplace. Does speech technology have the same ability?

Humans, despite all the visual and auditory cues we get from looking at a speaker, still have a word error rate of about three to five percent. Understanding speech is therefore a monumental task for a computer that cannot see, hear or think. After 35 years of work and some 250 patents contributed by really smart people at IBM Research, IBM concluded that a computer can be capable of recognizing human speech, and we have been highly successful at it, to the point where we are deploying speech in cars, in handheld devices and in the contact center. But will the pervasiveness of speech cause a revolution in the IT industry? Or will it drive the evolution of IT and contact centers by improving their ability to retain customers, bring down costs and create new streams of revenue? Will speech revolutionize the way we humans interact with computers?
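Word error rate, the metric behind the three-to-five-percent figure for human listeners, is conventionally computed as the word-level edit distance between a reference transcript and the recognizer's output: substitutions S, deletions D and insertions I, divided by the number of words N in the reference, so WER = (S + D + I) / N. The Python sketch below shows that standard calculation; the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "please transfer five hundred dollars to savings"
    hyp = "please transfer nine hundred dollars savings"
    # One substitution (five -> nine) and one deletion (to) over 7 words.
    print(f"WER = {word_error_rate(ref, hyp):.2%}")  # prints: WER = 28.57%
```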

Speech is following a gradual, evolutionary path toward a different, more complex and better form. Speech recognition started off as its own separate and unique pillar. People first used it for dictation, as long as they paused properly between words. The technology then evolved to allow natural, continuous speech, with no pauses required.

Natural speech improved some lives and businesses. You can speak faster than you can type, but most people were not willing to put in the time and effort to train and maintain the system. Then we started putting speech into things other than desktop computers, and we started making people’s lives even better. The more places we put speech and find it an advantage, the more the “silos” break down and the more speech becomes pervasive, as well as easier to design, deploy and use.

Making Sense Of Your Options
Speech applications in the market today fall into two general areas. The first, and most common, is the directed dialog application, which is in strong demand in industries such as financial services and banking, where customers want to obtain their bank balance, locate a nearby ATM or transfer funds. The financial sector has always been the leader in the use of speech technologies. Other industries with strong demand for these kinds of inquiry/transaction interactions include insurance, telecommunications, utilities, government, travel and consumer packaged goods.
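In a directed dialog, the system asks a specific question and accepts only a small, fixed set of answers at each step, which keeps recognition manageable. A minimal Python sketch of that idea follows. The steps, prompts and accepted phrases for this banking line are hypothetical, and the recognizer is simulated by passing in already-recognized text.

```python
from dataclasses import dataclass, field


@dataclass
class DialogStep:
    prompt: str
    # The step's "grammar": accepted phrases mapped to the next step.
    choices: dict[str, str] = field(default_factory=dict)


# Hypothetical call flow for a bank's self-service line; "done" ends the call.
CALL_FLOW = {
    "main": DialogStep("Say balance, transfer, or ATM.",
                       {"balance": "account", "transfer": "account", "atm": "done"}),
    "account": DialogStep("Checking or savings?",
                          {"checking": "done", "savings": "done"}),
}


def next_step(current: str, utterance: str) -> str:
    """Move on only if the utterance is in the current step's grammar;
    otherwise stay on the same step so its prompt is replayed."""
    return CALL_FLOW[current].choices.get(utterance.strip().lower(), current)


def run(utterances: list[str]) -> list[str]:
    """Feed a scripted list of recognized utterances through the flow."""
    step = "main"
    path = [step]
    for said in utterances:
        if step == "done":
            break
        step = next_step(step, said)
        path.append(step)
    return path


if __name__ == "__main__":
    # An out-of-grammar answer triggers a reprompt of the main menu.
    print(run(["uh, my money", "Balance", "savings"]))
    # prints: ['main', 'main', 'account', 'done']
```

Constraining each step to a short list of phrases is what makes directed dialog predictable; the cost is that callers must answer in the system's terms.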

The second most common application in the market now is intelligent call routing using natural language understanding. Rather than listening to a litany of possible menu routes, the caller hears “How may I help you?” The system evaluates the caller’s response against the set of allowable actions and routes the call appropriately. This kind of application cuts down on the number of different phone numbers required for different functions, and it leads to greater customer retention and satisfaction because callers are directed to the right place immediately. A call routing application may, for example, connect you to a loan specialist or send you to a directed dialog system to obtain your loan balance.
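The routing step can be pictured as classifying the caller's open-ended answer into one of a fixed set of allowable destinations. Production systems use statistical natural language understanding models trained on real call data; the Python sketch below substitutes a simple keyword-overlap score purely to show the shape of the decision. The destination names and keyword lists are hypothetical.

```python
import re

# Hypothetical routing table: destination -> keywords that suggest it.
ROUTES = {
    "loan_specialist": {"loan", "mortgage", "refinance", "borrow"},
    "balance_dialog": {"balance", "owe", "much", "statement"},
    "card_services": {"card", "lost", "stolen", "pin"},
}
FALLBACK = "live_agent"


def route(utterance: str, min_hits: int = 1) -> str:
    """Pick the destination whose keywords overlap the utterance most;
    fall back to a live agent when nothing matches well enough."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    scores = {dest: len(words & keys) for dest, keys in ROUTES.items()}
    best = max(scores, key=scores.get)  # ties go to the first route listed
    return best if scores[best] >= min_hits else FALLBACK


if __name__ == "__main__":
    print(route("How much do I owe on my loan?"))      # balance_dialog
    print(route("I'd like to refinance my mortgage"))  # loan_specialist
```

The first caller mentions a loan but is really asking for a balance, so the sketch sends the call to the directed dialog rather than to a specialist, mirroring the example in the text.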

Speech verification, by contrast, is still in its very early adoption phase. Many businesses, particularly in the financial services sector, are trying it out, and it has huge potential to reduce fraud and identity theft. But there is still very little government regulation establishing when speech verification counts as legal proof of action, comparable to the government’s declaration that a facsimile is a legal copy. There are many flavors of speech verification, and the applications and user interfaces are still being refined.
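At its core, speaker verification is an accept-or-reject decision: the system compares features extracted from the caller's voice against a voiceprint captured at enrollment and accepts the claimed identity only if the similarity clears a threshold. The Python sketch below shows only that decision step. It assumes fixed-length feature vectors (real systems derive them from acoustic models), cosine similarity as the score, and a hypothetical threshold of 0.8.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two feature vectors, from -1 (opposite) to 1 (identical)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("feature vectors must be non-zero")
    return dot / norm


def verify_speaker(enrolled: list[float], attempt: list[float],
                   threshold: float = 0.8) -> bool:
    """Accept the claimed identity only if the voices are similar enough.
    The 0.8 default is an illustrative assumption, not a tuned value."""
    return cosine_similarity(enrolled, attempt) >= threshold


if __name__ == "__main__":
    enrolled = [0.9, 0.1, 0.4]  # made-up voiceprint from enrollment
    print(verify_speaker(enrolled, [0.85, 0.15, 0.38]))  # True
    print(verify_speaker(enrolled, [0.1, 0.9, 0.2]))     # False
```

Raising the threshold rejects more impostors but also more genuine callers; where to set it is a business decision as much as a technical one.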

Continuing The Evolution Of Speech And Your Business
We are just now beginning to see how natural language self-service can deliver real benefits to companies across all industries. These applications are pleasant to interact with, and they enhance the customer experience while creating an on-demand environment for customer service. Their value will expand as we develop new tools to reduce the time-to-market for advanced natural language applications.

We have also seen great advances in the quality of text-to-speech (TTS) over the past five years, and the technology will continue to become more natural and easier to listen to. That will inevitably lead to more applications that use text-to-speech rather than recorded prompts. Finally, one area that is virtually untapped is speaker-independent transcription over the phone, which would allow business people to dictate e-mail messages or add notes to their customer databases by phone. In fact, we have a project in IBM Research that is working to make this a reality in the next five years.

The Value Of A Service-Oriented Architecture In The Contact Center
Let’s start with a generally accepted truth: businesses want to be “on-demand businesses.” They want to be able to respond to new threats and opportunities quickly. They want to be able to integrate end-to-end solutions that leverage all the pieces of their infrastructure to reduce transaction costs. And they want to integrate across industry value nets, with their partners, suppliers and customers.

A service-oriented architecture (SOA) is the key to achieving “on-demand” results. SOAs provide companies with the flexibility to treat elements of business processes and the underlying IT infrastructure as secure, standardized components (services) that can be rapidly reused and combined to address changing business priorities. These services are used to help get the right information to the right people at the right time.

The contact center needs to be one of the doorways to the SOA. The financial services sector is leading the way today in developing SOAs. The keys to retaining customers, reducing costs and creating new cross-sell opportunities and revenue streams lie in the contact center’s ability to quickly make self-service available through multiple channels. In a hypothetical banking example, an SOA at Bank XYZ might have the following standardized components that can be called upon as needed:

• Customer authentication;

• Account balance;

• Loan application;

• Risk management;

• Credit check;

• Credit processing; and

• Loan fulfillment.

All of these services can stand on their own, but when they are easily accessible through open standards they become powerful tools for meeting contact center goals.
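The value of the Bank XYZ list lies in composition: because each service has a standard interface, a contact center application can chain them into a larger transaction. The Python sketch below illustrates that idea with in-process functions standing in for services; in a real SOA each would sit behind an open-standards interface such as a Web service. All function names, the approval rule and the customer data are hypothetical.

```python
from typing import Callable

Service = Callable[[dict], dict]

# Hypothetical back-end data standing in for Bank XYZ's systems of record.
PINS = {"C-1001": "4321", "C-2002": "1111"}
CREDIT_SCORES = {"C-1001": 720, "C-2002": 580}


# Each hypothetical service takes and returns a plain dict, so any service
# can be chained after any other.
def authenticate_customer(req: dict) -> dict:
    stored = PINS.get(req["customer"])
    return {**req, "authenticated": stored is not None and stored == req.get("pin")}


def credit_check(req: dict) -> dict:
    score = CREDIT_SCORES.get(req["customer"], 0) if req["authenticated"] else 0
    return {**req, "credit_score": score}


def risk_management(req: dict) -> dict:
    ok = req["credit_score"] >= 650 and req["amount"] <= 50_000
    return {**req, "approved": ok}


def loan_fulfillment(req: dict) -> dict:
    return {**req, "status": "funded" if req["approved"] else "declined"}


def compose(*services: Service) -> Service:
    """Chain independent services into one business process, left to right."""
    def process(req: dict) -> dict:
        for service in services:
            req = service(req)
        return req
    return process


# A loan application assembled from reusable parts. An IVR, a Web site or an
# agent desktop could all call the same composed process.
loan_application = compose(authenticate_customer, credit_check,
                           risk_management, loan_fulfillment)

if __name__ == "__main__":
    result = loan_application({"customer": "C-1001", "pin": "4321", "amount": 20_000})
    print(result["status"])  # funded
```

Because authentication is its own service, the same `authenticate_customer` step could front a balance inquiry or a funds transfer without being rewritten.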

Speech technology is transforming the contact center from a “siloed” cost center to an integral part of a company’s business strategy. Not only have open standards made the contact center one of the doorways to the implementation of on-demand services, but these open standards have enabled horizontal integration of the contact center into the mainstream IT infrastructure. Capabilities such as verification take interacting via speech to a whole new plane, doing for the telephone what fingerprint technology has done for laptops, thereby opening up the universe of speech transactions even further. With these advances in conversational self-service, contact centers can transform into profit centers as they become a place to satisfy, retain and upsell current customers, as well as a place to service new customers.

For contact centers to grab the attention of the CEO and become more than an expense center, they need to embrace open standards and the creation of reusable business components. IBM is working with customers at all levels of SOA adoption, from Web services through contact center transformation and enterprisewide IT transformation to on-demand business transformation. SOAs don’t just happen: they require planning and skill to build, deploy and use in a managed and secure environment. SOAs also require a strong middleware presence, and the contact center takes advantage of that middleware presence to create profits, boost customer retention and lower costs.

Open Standards Create A Climate For Choice In The Contact Center
Across virtually all industries, companies with contact centers are transforming their customer service operations to reduce costs, increase customer satisfaction, grow revenue and attain competitive advantage. Fundamental to this transformation is the creation, adoption and promulgation of open standards.

We have seen this transformation occur repeatedly in technology cycles. Without open standards, the World Wide Web would be a hodgepodge of markup languages, each requiring a unique browser with unique extensions and capabilities. But because of open standards, the Web has flourished. Business people can choose from four or five popular browsers and be quite certain that most of the Web sites they visit will work just fine. As a result, companies can use the Web as a channel for sales and customer self-service at a dramatically lower cost than in the late 1990s.

That same phenomenon is occurring now in contact centers. The legacy paradigm is that everything runs on the IVR platform. The business logic is written in a proprietary language unique to each vendor’s IVR. Redundancy and failover are difficult and expensive because everything runs on the same machine. Moving to a new IVR platform means a total rewrite of the applications in a new, and equally proprietary, language.

Speech entered the picture through proprietary APIs that were different for each IVR vendor, so the choice of speech vendor depended on cooperation between the speech vendor and the IVR vendor.

Today we have made a complete shift because of open standards. With the advent and acceptance of VoiceXML, the IVR primarily takes on the job of answering the phone, while an application server supplies VoiceXML pages that the IVR’s voice browser renders. The business logic is therefore separated from the IVR function of answering and transferring phone calls, so we can now launch and monitor Web applications and speech applications from the same application server. What used to be two silos of technology is now one horizontal, integrated infrastructure. But VoiceXML is just one of the open standards that give customers portability. The Speech Recognition Grammar Specification (SRGS) and the Speech Synthesis Markup Language (SSML) standardize grammar formats and text-to-speech markup, so that not only the applications but also their associated grammars are portable. VoiceXML was first submitted to the W3C on May 22, 2000, and adoption has grown rapidly since.
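To make the division of labor concrete: in this model the application server's job is to produce VoiceXML documents, which the voice browser then renders. The Python sketch below builds a minimal VoiceXML 2.0 form with the standard library's `xml.etree.ElementTree`. The form asks for an account type, points at an external SRGS grammar, and submits the answer back to the server; the grammar file name and submit URL are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"
ET.register_namespace("", VXML_NS)  # serialize VoiceXML as the default namespace


def _q(tag: str) -> str:
    """Qualify a tag name with the VoiceXML namespace."""
    return f"{{{VXML_NS}}}{tag}"


def build_account_form(grammar_url: str, submit_url: str) -> str:
    """Return a minimal VoiceXML 2.0 document asking for an account type."""
    vxml = ET.Element(_q("vxml"), {"version": "2.0"})
    form = ET.SubElement(vxml, _q("form"), {"id": "balance"})
    field = ET.SubElement(form, _q("field"), {"name": "account"})
    ET.SubElement(field, _q("prompt")).text = "Which account, checking or savings?"
    # Recognition is constrained by an SRGS grammar kept in its own file,
    # so the grammar is as portable as the page itself.
    ET.SubElement(field, _q("grammar"),
                  {"src": grammar_url, "type": "application/srgs+xml"})
    # Once the field is filled, send the recognized value back to the server.
    filled = ET.SubElement(field, _q("filled"))
    ET.SubElement(filled, _q("submit"), {"next": submit_url, "namelist": "account"})
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_account_form("account.grxml", "http://app.example.com/balance"))
```

Nothing in the generated page refers to a particular IVR, which is the portability point: any VoiceXML 2.0 browser should be able to render it.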

According to the VoiceXML Forum (www.voicexml.org/faqs.html), there are thousands of VoiceXML applications running on platforms from nearly 100 different vendors. Not only has the standard taken off, but it has spawned a whole new category of tools from vendors who specialize in application builders that generate VoiceXML.

Now that we have a well-adopted open standard that makes our applications portable and gives us a choice of IVR vendor, how do we get a choice of speech vendor? The answer lies in a new Internet Engineering Task Force (IETF) standard called the Media Resource Control Protocol (MRCP). Although the MRCP spec is only two years old, all three of the major speech vendors already claim support for it.

With MRCP, the proprietary connectors between the IVR and the speech vendor are gone. Open standards once again create a climate for choice.
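For a sense of what replaces the proprietary connectors, consider how a text-to-speech request looks on the wire in MRCP version 2, the later revision published by the IETF as RFC 6787 (its message layout differs in detail from version 1). Each request begins with a start line carrying the protocol version, the total message length in octets, the method name and a request ID, followed by headers, a blank line and an optional body such as SSML. The Python sketch below assembles such a request. The channel identifier, request ID and SSML text are placeholder values, and a real client would first set up the session through SIP.

```python
from typing import Dict, Optional


def build_mrcp2_request(method: str, request_id: int, channel_id: str,
                        headers: Optional[Dict[str, str]] = None,
                        body: str = "",
                        content_type: str = "application/ssml+xml") -> bytes:
    """Assemble a request laid out as in MRCPv2 (RFC 6787):
    start line, headers, blank line, optional body."""
    all_headers = {"Channel-Identifier": channel_id, **(headers or {})}
    body_bytes = body.encode("utf-8")
    if body_bytes:
        all_headers["Content-Type"] = content_type
        all_headers["Content-Length"] = str(len(body_bytes))
    header_block = "".join(f"{name}: {value}\r\n" for name, value in all_headers.items())
    rest = header_block.encode("utf-8") + b"\r\n" + body_bytes

    # The message-length field counts every octet of the message, including
    # the start line that contains it, so recompute until it stops changing.
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("ascii")
        total = len(start_line) + len(rest)
        if total == length:
            return start_line + rest
        length = total


if __name__ == "__main__":
    ssml = ('<?xml version="1.0"?>'
            '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
            'xml:lang="en-US">Your balance is two hundred dollars.</speak>')
    msg = build_mrcp2_request("SPEAK", 543257, "32AECB23433802@speechsynth",
                              {"Voice-Gender": "neutral"}, ssml)
    print(msg.decode("utf-8"))
```

The request itself is vendor-neutral text: the same SPEAK message, carrying standard SSML, can go to any MRCPv2-compliant synthesizer.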

Call Control eXtensible Markup Language (CCXML) is a new proposed standard before the W3C that will standardize the call control functions of an IVR, such as “answer a call,” “hang up,” or “transfer a call.”
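CCXML treats call control as a set of handlers for telephony events, such as accepting an incoming call or disconnecting when a dialog finishes, kept separate from the VoiceXML dialog that runs once the call is connected. The Python sketch below imitates that event-driven style for answering, transferring and hanging up. The event names are modeled on CCXML's connection and dialog events, but the class, its methods and the transfer target are hypothetical and not part of any CCXML implementation.

```python
from dataclasses import dataclass, field


@dataclass
class CallController:
    """Toy event-driven call controller; all names are hypothetical."""
    transfer_to: str = ""  # if set, connected calls are transferred here
    log: list[str] = field(default_factory=list)

    def handle(self, event: str, call_id: str) -> None:
        """Dispatch a telephony event to its handler, CCXML-style."""
        handlers = {
            "connection.alerting": self.answer,
            "connection.connected": self.on_connected,
            "dialog.exit": self.hang_up,
            "connection.disconnected": self.end_session,
        }
        handler = handlers.get(event)
        if handler is None:
            self.log.append(f"{call_id}: ignored {event}")
        else:
            handler(call_id)

    def answer(self, call_id: str) -> None:
        self.log.append(f"{call_id}: answer")

    def on_connected(self, call_id: str) -> None:
        # Call control decides where the call goes; the dialog itself
        # (prompts, grammars) would live in VoiceXML.
        if self.transfer_to:
            self.log.append(f"{call_id}: transfer to {self.transfer_to}")
        else:
            self.log.append(f"{call_id}: start VoiceXML dialog")

    def hang_up(self, call_id: str) -> None:
        self.log.append(f"{call_id}: hang up")

    def end_session(self, call_id: str) -> None:
        self.log.append(f"{call_id}: end session")


if __name__ == "__main__":
    ctl = CallController()
    for ev in ["connection.alerting", "connection.connected",
               "dialog.exit", "connection.disconnected"]:
        ctl.handle(ev, "call-1")
    print("\n".join(ctl.log))
```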

As we adopt more open standards in the contact center space, the proprietary nature of IVRs will disappear and prices will go down. The contact center becomes horizontally integrated with the rest of the IT shop, and economies of scale take shape, driving down the total cost of customer care while improving the customer experience. There are still many legacy, proprietary IVR systems out there, left over from the Y2K buying binge. But as businesses see the need to improve customer care and create new channels of revenue, they will see the business justification for moving to open standards-based systems that provide reduced cost per touch, portability, protection of their investment and integration with their other contact center channels.

Challenges Moving Forward
The biggest challenge moving forward is the one thing over which we have no control: time. It will take time for the call center to converge with the contact center. It will take time to develop new methodologies that drastically reduce the cost of implementing speech self-service applications. It will take time for companies to become on-demand businesses with service-oriented architectures. We have already seen some consolidation in this industry, and you can be sure there is more to come. The investment required in speech research and development is very high, and the real challenge is who has the staying power to get past the “curve of enlightenment” to mass adoption.

Brian Garr is Program Director and Segment Manager for Contact Center Solutions in the Software Group of IBM. He has been with IBM for six years and is an evangelist and speaker on machine translation, text-to-speech and speech recognition.
