TMCnet News

IBM And Speech Technology: An Interview With Bruce Morse
[June 15, 2005]

Where does IBM stand in the realm of speech technologies? CIS spoke with Bruce Morse, vice president of Contact Center Solutions for the IBM Software Group.

By Tracey Schelmetic, Editorial Director, CUSTOMER INTER@CTION Solutions

To give readers a more comprehensive picture of where IBM stands today in the realm of speech technologies, Customer Interaction Solutions recently spoke with Bruce Morse, vice president of Contact Center Solutions for the IBM Software Group. Morse is responsible for establishing IBM as a major software provider for developing, deploying and managing contact center solutions. He has over 25 years of software and hardware experience in the information processing industry and has held executive positions in marketing, development, finance and business development. Prior to his current role, he was vice president, marketing, sales and business development for IBM’s Pervasive Computing business. In that role, he built a number of strategic alliances that established industry software specifications and standards, and he significantly expanded IBM’s software and services participation in the wireless/mobile and speech markets.

CIS: Historically, from where did IBM’s speech technologies grow in IBM’s product family?

Morse: IBM’s interest and investment in speech recognition began at IBM Research over 30 years ago. We anticipated that as the technology matured it would become the preferred method of accessing and interacting with information technology in a wide variety of scenarios. We’re now at an inflection point in speech recognition where users find it to be a satisfactory and pleasant way to do personal and company business.

IBM was the first to use a purely statistical approach to voice technology while others attempted to teach a computer how to mimic human linguistics. IBM shipped dictation software in the early 1990s, and a few years later produced its first speech recognition software family, VoiceType. IBM ViaVoice products were introduced in the late 1990s, and they continue to evolve today in offerings such as IBM Embedded ViaVoice, which speech-enables personal digital assistants (PDAs) and in-vehicle telematics.

IBM speech technologies are now an integral part of the WebSphere family of products. They leverage WebSphere process and application integration capabilities to model, simulate and optimize business processes, and to reliably and seamlessly exchange data between multiple applications.

As a technology company that has helped millions of customers make smart IT investments, IBM is uniquely positioned to help companies extend access to those systems to their employees, customers and business partners. Just as the personal computer and Web browser have opened up application access to millions of users, speech technology extends access to the two billion telephones in the world today, as well as to all kinds of mobile devices.

As the most natural way to interact, speech is at the beginning of a tidal wave in contact centers, devices and automobiles. Speech allows people to interact easily and cost-effectively; it improves customer service and lowers cost. The return on investment (ROI) for speech-enabled applications can be dramatic.

CIS: Why do you believe that speech is best delivered in an on-demand model?

Morse: In today’s business environment, companies have to be flexible, responsive and able to take advantage of opportunities instantly. That is the essence of the on-demand model. As a primary interface to a company’s customers, speech-enabled applications are at the forefront of the on-demand model. Contact centers worldwide are increasingly looking at integrating all methods of customer interaction, including Web and telephone, to ensure a consistent customer experience, reduce cost and drive revenue growth through cross-selling and upselling. Speech-enabled contact centers ensure that up-to-the-minute customer information is available and leveraged across multiple communications channels. For example, a retail bank may want to know when a customer calls requesting forms to apply for a home equity loan so [the bank] can immediately route [the customer] to a live agent, bypassing the speech application entirely in order to close the business quickly. When interest rates change, the bank may want to change its Web and speech-enabled applications immediately to cross-sell certain offerings over others. IBM provides highly flexible and customizable speech solutions built on the highly acclaimed WebSphere Application Server platform.

CIS: Why do you believe a company like IBM is better suited to offer speech than its many niche competitors?

Morse: Speech has evolved into a mature enabling technology that reaches far beyond turning spoken words into text. Speech extends access and interaction to an enterprise’s data and business processes, improving customer service while reducing the total cost of completing a transaction. Integrating speech access to business processes in a cost-effective, flexible and secure way requires a deep understanding of the enterprise’s IT infrastructure and business processes. IBM’s position as the leading middleware provider and our expertise in business process transformation uniquely position us to help our clients leverage speech to improve customer service, reduce cost and drive incremental revenue.

IBM is recognized around the world as one of the pioneers in speech research and development, with deep expertise to analyze, design and deploy speech-enabled applications. IBM’s research organization has over 30 years of experience in speech. It is highly skilled in voice user interface design, persona development and grammar development; holds more than 250 speech patents; and has over 100 researchers in speech labs worldwide, including China, Haifa, Tokyo, India and Almaden, working in more than 15 languages. Our work ranges from contact centers to mobile devices to automobiles. IBM is a leader in driving and incorporating speech standards such as VoiceXML and MRCP, and in standards bodies such as the W3C. We work with companies of all sizes. IBM was the first to deploy natural language understanding in an automated contact center. For two consecutive years, J.D. Power and Associates surveys rating customer satisfaction with in-car navigation systems found the top cars were from Honda and Acura, which use IBM’s Embedded ViaVoice speech recognition technology. Our contact center customers have found that our speech solutions improve call retention rates by six to 10 percent, cut call times by 10 percent and decrease costs by up to 90 percent compared to agent-assisted services.

IBM is also helping the large community of developers, ISVs and customers deploy and manage speech enablement. We have made significant contributions to the speech industry through open standards work on VoiceXML, CCXML and MRCP, as well as to the Eclipse Foundation, including our recently announced contributions of VoiceXML and CCXML editors. In addition, we recently announced our contribution to the Apache Foundation of the Reusable Dialog Components (RDC) framework. A barrier to the adoption of speech capabilities has been the specialized skill required to build high-quality voice user interfaces. By moving that skill into the RDCs themselves, which application developers can then assemble into applications, IBM lets experienced application developers concentrate on what they know best, while skilled voice user interface designers do their work up front, inside the RDC.

IBM regularly participates in performance improvements and transformation efforts for the world’s leading organizations through our management consulting group, IBM Business Consulting Services. Our ongoing involvement with all of the major industries gives us a deep understanding of industry business models. Our teams ensure that our solutions are relevant, practical and well thought out.

CIS: On what applications for speech is IBM focusing?

Morse: IBM is focused on developing and offering first-class speech capabilities and tools, while our business partners and customers provide targeted speech-enabled applications. We are focused on three primary areas:

• Contact center functionality, such as call routing and natural language understanding.

• Multimodal interaction, or the ability to combine multiple input/output methods in the same interaction or session. IBM’s WebSphere software integrates different modes of data entry — speech, keyboard strokes, visual and handwriting-recognition technology. For example, one of our customers developed speech, keyboard and handwriting-enabled input and output applications on handhelds used by doctors in the pediatric intensive care unit of Miami Children’s Hospital. Healthcare providers can give spoken commands to access and input patient information and can enter repetitive data using multiple modes of interaction.

• Embedded speech in telematics (e.g., vehicles), devices (e.g., cell phones, PDAs) and other consumer appliances (e.g., set-top boxes, DVD players). For example, IBM Embedded ViaVoice technology in OnStar provides, on some models, the basis for a hands-free in-vehicle safety, security and communication service, putting the company at the forefront of the automotive telematics industry.
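The multimodal interaction described in the list above was standardized in XHTML+Voice (X+V), a profile IBM helped author, which embeds VoiceXML fragments in an XHTML page and binds them to visual fields with XML Events. The sketch below is illustrative of that profile; the field names, grammar file and event wiring are hypothetical, not a tested IBM sample.

```xml
<!-- Hypothetical X+V page: one text field that can be filled by
     keyboard, stylus or voice. Names and URIs are placeholders. -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Patient lookup</title>
    <!-- VoiceXML fragment: fills the visual field when spoken -->
    <vxml:form id="sayPatientId">
      <vxml:field name="pid">
        <vxml:prompt>Say the patient number.</vxml:prompt>
        <vxml:grammar src="patient-id.grxml"
                      type="application/srgs+xml"/>
        <vxml:filled>
          <vxml:assign name="document.getElementById('pid').value"
                       expr="pid"/>
        </vxml:filled>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <!-- Gaining focus activates the voice dialog; typing still works -->
    <input type="text" id="pid" ev:event="focus"
           ev:handler="#sayPatientId"/>
  </body>
</html>
```

The same field accepts whichever input mode is most convenient at the moment, which is the essence of the hospital scenario Morse describes.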

CIS: Speech has historically been considered a “high-maintenance” technology. How is IBM carrying out its promise to lower development time and complexity?

Morse: There are two million to three million J2EE developers in the marketplace, and our tooling and open source strategy has been to enable this highly skilled group to expand its reach into speech enablement. By creating plug-ins to the Eclipse framework, we help developers leverage their existing skills in Web development to extend to speech. We are contributing to the speech industry’s efforts in order to shorten development time and decrease complexity through our commitment to open standards such as VoiceXML, CCXML, MRCP, XHTML and X+V. In addition, we have donated approximately 20 VoiceXML Reusable Dialog Components (RDCs) to the open-source community through IBM’s Alphaworks.

As more and more of the speech ecosystem adopts and writes to the RDC framework, the time and the skills needed to deploy will come down considerably. By moving the voice user interface (VUI) skill from the application layer to the RDC layer, we leverage the skills up front that are most in demand, which allows the J2EE developer to take advantage of the best practices already deployed internally in the RDC. IBM donated the framework and example tags to the Apache Software Foundation last fall, and we made them available to interested members of the community through the Apache Taglibs sandbox project. The financial value of this contribution was approximately $10 million.
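The division of labor Morse describes can be illustrated with a JSP fragment in the style of the Apache Taglibs RDC project: the J2EE developer composes components, while prompting, confirmation and error recovery live inside each RDC. The tag and attribute names below are drawn from that project's component set but should be treated as a sketch, not verified against a particular release.

```jsp
<%-- Hypothetical JSP page composing two Reusable Dialog
     Components. Each tag renders the VoiceXML for a complete
     sub-dialog, including re-prompting on misrecognition. --%>
<%@ taglib prefix="rdc" uri="http://jakarta.apache.org/taglibs/rdc-1.0" %>
<rdc:group id="payment">
  <!-- Collect a dollar amount, then a due date -->
  <rdc:currency id="amount" confirm="true"/>
  <rdc:date id="dueDate" confirm="true"/>
</rdc:group>
```

Because the voice user interface expertise is baked into the component, the page author never writes a prompt or a grammar directly.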

CIS: Many companies still don’t understand why they need speech, or if they do, they don’t understand what’s involved in implementing it. How is IBM helping customers to understand the benefits?

Morse: We have worked with a variety of clients to successfully implement speech solutions. The best way to communicate the benefits of these solutions, and what’s involved in implementing them, is to use case studies and to describe the dramatic return on investment that many companies achieve once the solutions are deployed. We share these stories on our Web site, in our press releases and in our advertising. We publish technical papers that describe the implementation effort. Most important, our worldwide sales, services and consulting teams show customers the benefits of speech in hundreds of one-on-one customer engagements every year, as well as at many industry trade shows and events.

CIS: What level of knowledge must a user possess in order to administer and make changes to call flows?

Morse: First, the adoption of the VoiceXML standard has changed the way we administer contact center applications. We have moved the business logic away from the proprietary interactive voice response (IVR) scripting language to the Web application server. This has been a game-changing event, as the administration and development of speech-enabled applications moves to the millions of J2EE developers, therefore opening up the ability to manage call flows to a much larger community of developers.
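The shift Morse describes is visible in a minimal VoiceXML document, which a J2EE application generates and serves like any other Web page. The URLs and grammar file below are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="mainMenu">
    <field name="choice">
      <prompt>Say balance or transfer.</prompt>
      <grammar src="menu.grxml" type="application/srgs+xml"/>
      <filled>
        <!-- Business logic stays on the Web application server:
             the recognized value is posted back like an HTML form -->
        <submit next="http://example.com/bank/route.jsp"
                namelist="choice"/>
      </filled>
    </field>
  </form>
</vxml>
```

The IVR platform only interprets the markup; the routing decision is made server-side by the same J2EE code that serves the Web channel.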

Second, IBM has Eclipse-based plug-ins, such as the Call Flow Builder. It allows for graphical drag-and-drop modifications to call flows, making call flow maintenance an intuitive administrative step that does not require the knowledge of a proprietary scripting language.

The implementation of the call flows is also greatly simplified with the advent of VoiceXML. Preexisting scripts used for a particular task can be reused by the speech application, so there is no need to redevelop scripts for existing tasks.
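In VoiceXML terms, this reuse typically happens through the `<subdialog>` element, which invokes an existing dialog document rather than redeveloping it. A minimal sketch, with placeholder URIs and a hypothetical `lookup.vxml` that returns an account number via `<return>`:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="payBill">
    <!-- Reuse an existing account-lookup dialog as-is -->
    <subdialog name="account"
               src="http://example.com/ivr/lookup.vxml">
      <filled>
        <prompt>Found account <value expr="account.number"/>.</prompt>
      </filled>
    </subdialog>
  </form>
</vxml>
```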

CIS: What's the average implementation time, using a midsized company as an example?

Morse: The length of a speech implementation project is dependent on many factors. It should be broken up into several distinct phases, which include: business and application objectives; usability and human factors; business process integration; call flow design; development; testing; and deployment and post-deployment tuning. The final area that can impact the schedule is the level of training the customer has (which is why most of our initial deployments are done in conjunction with a very skilled systems integrator). Assuming all of these phases are included, implementation of a simple speech-enabled application can range from one to six months. A project of medium complexity can take three to nine months, and a complex application takes six months to one year.

Using standards-based programming techniques such as VoiceXML, the development, testing and deployment elements can be done more efficiently by reusing applications and application components that the enterprise has already developed and deployed, thus reducing implementation time and ensuring a rapid return on investment.

CIS: Is speech technology feasible for smaller companies?

Morse: Speech is a technology that can offset contact center costs, which makes it a very good source of bottom-line return for small companies. Implemented correctly, it can also improve customer satisfaction and generate revenue through upselling and cross-selling. It allows a small company to establish a unique persona and to gain differentiation in the marketplace.

There are many IBM business partners that offer tailored speech-enabled application solutions to small to medium-sized businesses. Although some small companies have the in-house expertise to deploy speech in their own environment, others may find it more cost-effective to outsource the speech elements, and a number of solutions are now available for them to do so.


Tracey Schelmetic is editorial director for CUSTOMER INTER@CTION Solutions. For more articles by Tracey Schelmetic, please visit:
