TMCnet - World's Largest Communications and Technology Community

Special Feature
November/December 2001

Optimized Telephony Platforms In The Voice Portal Market

BY DOUG PETTY

Finding and maintaining a competitive advantage in today's service provider markets presents a host of daunting challenges. Price wars have proven a flop. Customers are more fickle and demanding. And the latest economic slowdown has made the task of finding new sources of revenue even more difficult than it was just one year ago.

Is there help on the way? Pose that question to industry insiders these days and chances are that many will point to the burgeoning voice portal market as a good way to reduce customer churn and increase cash flow. For those who might be skeptical, consider this: market consultancy The Kelsey Group forecasts that the U.S. market alone for voice portal services and technologies will exceed $12 billion by 2005. Ovum, a consultancy based in the UK, predicts a world market of $26 billion, while Allied Business Intelligence estimates 56 million mobile voice portal users by year-end 2005, 250,000 voice sites, and a $50 billion v-commerce market.

Voice portals connect users to applications through a telephone. They combine best-of-breed applications with the world's most user-friendly interface (speech recognition), as well as text-to-speech technology and innovative hardware components. Most currently offer customers access to Internet content, stockbroker information, e-mails, and airline reservation systems. Many predict these services will soon be followed by v-commerce, ticketing, and enterprise applications like voice dialing and extranet access. For service providers, the great short-term hope for voice portals is as churn preventers. They may also provide a way to increase subscription and per-call fees, improve service usage, and help add subscribers through new customized services.

VOICE PORTAL PLANNING
Getting a share of this emerging market, however, requires careful planning and wise technology choices. Advancements in computer telephony technology will be critical to success within the rapidly evolving voice portal arena. Service providers as well as application developers will need to find optimized platforms that maximize voice portal performance by facilitating a balance among speech recognition, telephony, and computing resources. This will ensure each resource is used to its full capacity, while helping decrease speech recognition costs by multiplexing incoming audio channels across the back-end software modules. Next-generation architectures will also implement critical features such as echo cancellation on digital signal processors (DSPs), thereby enhancing performance by offloading all but automatic speech recognition (ASR) and text-to-speech (TTS) functionality from the host processor.
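One way to picture the multiplexing described above is a small pool of recognizer instances shared among a larger number of phone channels, with a recognizer attached only while a caller is actually speaking. The following is a minimal sketch under that assumption; the class and method names are hypothetical and do not correspond to any vendor's API.

```python
from collections import deque


class RecognizerPool:
    """Shares a small set of recognizer instances among many phone channels."""

    def __init__(self, recognizer_count):
        self.free = deque(range(recognizer_count))  # idle recognizer ids
        self.assigned = {}                          # channel id -> recognizer id

    def speech_started(self, channel):
        """Attach a recognizer when a caller starts talking; None if all are busy."""
        if channel in self.assigned:
            return self.assigned[channel]
        if not self.free:
            return None
        recognizer = self.free.popleft()
        self.assigned[channel] = recognizer
        return recognizer

    def speech_ended(self, channel):
        """Return the recognizer to the pool once the utterance is finished."""
        recognizer = self.assigned.pop(channel, None)
        if recognizer is not None:
            self.free.append(recognizer)
```

Because callers spend much of a call listening rather than talking, a pool like this can in principle be smaller than the number of open lines. How much smaller depends on traffic patterns that this sketch does not model.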

The need for this kind of balancing act can be partly explained by the limitations of today's host CPU architectures. Pentium and RISC processors have finally gotten fast enough to do the intensive digitization, probability calculations, and matching once done exclusively by more expensive DSP resource cards. But even dual Pentium 800 systems can run only a limited number of active simultaneous ASR or TTS channels. These systems generally can handle a maximum of only 100 incoming ASR or TTS channels per chassis.

The most appropriate way for service providers to improve per-chassis scalability and cost-effectiveness for current voice portal architectures is therefore to increase DSP horsepower rather than line densities. Each telephony network card should be equipped with enough processing muscle and flexibility to perform both incoming and outgoing audio streaming on all ports simultaneously. The ability to dynamically configure processing capabilities on each resource card helps reduce hardware costs and overall system complexity.

Flexible, clean, and low-latency audio streaming capabilities lie at the core of any optimized voice portal platform. Service providers should look for systems providing small buffers, G.168 echo cancellation, and voice activity detection, that is, the ability to accurately and efficiently stream only relevant voice signals to speech recognition engines.

Buffer sizes play an important role in overall system performance. Large buffers will result in high latency and consequent noticeable delays in ASR or TTS response times. Small buffers can help do away with latency problems. But they also place an added burden on precious host processing cycles. Balance is therefore needed. Buffers in the six-millisecond (ms) range required for voice-over-IP (VoIP) applications should be considered excessively demanding for voice portals. Those over the 100-ms range, however, will not be able to stream audio quickly enough to speech recognizers. Optimized platforms at the present time will come equipped with buffers in the 100-ms range to ensure adequate performance without placing an undue strain on processor resources.
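The trade-off can be made concrete with some rough arithmetic. The sketch below assumes 8 kHz, 8-bit telephony audio (as with G.711) and a 100-channel chassis; neither figure is specified for buffering in the article, and the function name is illustrative only. It compares how often the host must service a buffer at the two sizes mentioned above.

```python
SAMPLE_RATE_HZ = 8000   # assumed: narrowband telephony audio
BYTES_PER_SAMPLE = 1    # assumed: 8-bit companded samples


def buffer_cost(buffer_ms, channels):
    """Return (bytes per buffer, buffer hand-offs per second across all channels)."""
    bytes_per_buffer = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * buffer_ms // 1000
    handoffs_per_second = channels * 1000 / buffer_ms
    return bytes_per_buffer, handoffs_per_second


for buffer_ms in (6, 100):
    size, rate = buffer_cost(buffer_ms, channels=100)
    print(f"{buffer_ms} ms buffer: {size} bytes, about {rate:.0f} hand-offs per second")
```

Under these assumptions, a 6-ms buffer forces the host to handle well over ten times as many hand-offs per second as a 100-ms buffer. A 100-ms buffer, in turn, adds at least 100 ms of delay before the recognizer sees the audio.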

An ability to detect and forward only relevant voice signals to host-based speech recognition engines also improves voice portal performance while reducing the load placed on CPUs. Voice detection technologies are currently under development by several telephony and portal vendors. The most cost-effective may turn out to be an energy-based detector embedded within play/record DSP applications that can be dynamically configured on a per-channel basis according to the status of the recording application. Pre-speech buffers will be critical to the effectiveness of this technology. That's because words at the beginning of a speech signal can be omitted if there is a split-second delay in triggering the speech recognizer engine. What's therefore needed is a mechanism to continuously buffer voice signals and forward this small segment at the front of incoming voice signals when triggered. At the start of a speech burst, at least 250 ms of speech preceding its detection should therefore be passed from a buffer to the host.
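The pre-speech mechanism can be sketched as a ring buffer in front of a simple energy threshold. This is an illustration only: the 10-ms frame length, the threshold value, and the function names are assumptions, and a production detector would also need end-of-speech handling, which is omitted here.

```python
from collections import deque

FRAME_MS = 10          # assumed frame length
PRE_SPEECH_MS = 250    # from the text: audio preceding detection to forward


def frame_energy(frame):
    """Mean squared amplitude of one frame of linear PCM samples."""
    return sum(s * s for s in frame) / len(frame)


def gate_speech(frames, threshold):
    """Yield the frames that should be forwarded to the recognizer.

    Frames are held in a ring buffer until one reaches the energy threshold;
    the buffered pre-speech frames are then flushed ahead of the live audio.
    """
    history = deque(maxlen=PRE_SPEECH_MS // FRAME_MS)
    triggered = False
    for frame in frames:
        if triggered:
            yield frame
        elif frame_energy(frame) >= threshold:
            triggered = True
            yield from history   # the pre-speech segment
            yield frame
        else:
            history.append(frame)


# Usage: 400 ms of silence followed by 50 ms of loud audio.
silence = [[0] * 80 for _ in range(40)]
speech = [[1000] * 80 for _ in range(5)]
forwarded = list(gate_speech(silence + speech, threshold=10_000))
```

In this usage example, only the last 250 ms of silence plus the speech frames reach the recognizer; the earlier silence never touches the host.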

Optimized voice portal platforms will also come equipped with market-leading G.168 echo cancellation. This is critical for cut-through and barge-in functionality. Echo in a telephony communications circuit occurs when a portion of the transmitted signal is reflected back through the circuit to its point of origin. This can interfere with signal detection. When a voice prompt is being played, for example, a portion of the prompt can be reflected back. This reflected signal might interfere with the signal the speech recognizer is attempting to identify, reducing system performance. Echo cancellers with a minimum tail length of 12 ms can help alleviate this problem through a series of predictive algorithms that subtract echo from the merged source signal.
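To illustrate the idea of predicting and subtracting echo, here is a minimal sketch using a normalized least-mean-squares (NLMS) adaptive filter. NLMS is chosen here purely for illustration; G.168 specifies performance requirements rather than a particular algorithm, and the article does not say which method any vendor uses. The 8 kHz sample rate, the simulated echo path, and all names are assumptions.

```python
import random

SAMPLE_RATE_HZ = 8000                       # assumed narrowband telephony
TAIL_MS = 12                                # minimum tail length from the text
TAPS = SAMPLE_RATE_HZ * TAIL_MS // 1000     # 96 filter taps


def cancel_echo(far_end, mic, step=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the far-end echo from the mic signal."""
    weights = [0.0] * TAPS
    history = [0.0] * TAPS          # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history.insert(0, x)
        history.pop()
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate        # what is left after removing predicted echo
        power = sum(h * h for h in history) + eps
        gain = step * error / power
        weights = [w + gain * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual


# Simulated line: the prompt returns delayed by 40 samples (5 ms) at 30% amplitude.
rng = random.Random(0)
prompt = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
echo = [0.0] * 40 + [0.3 * s for s in prompt[:-40]]
cleaned = cancel_echo(prompt, echo)
```

The tail length sets how many past samples the filter can model: an echo delayed beyond 12 ms would fall outside the 96 taps and could not be removed by this filter. The sketch also ignores double-talk (the caller speaking over the prompt), which a real canceller must detect so that it does not adapt on the caller's voice.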

Over the next few years, the voice technology behind today's leading portals could change both Web-based services and services that have little or nothing to do with the Internet. It could streamline almost every informational interaction conducted over the phone, from receiving directory assistance and making reservations to retrieving bank balances and wrestling with corporate help desks. Such innovative technologies will ultimately enable service providers to evolve voice portals into a form of "intelligent dialtone" offering ubiquitous access to any product or service from any location simply by lifting a telephone handset or turning on a mobile phone. By deploying optimized platforms today, service providers will ensure they are well positioned to take a leading role in this new paradigm.

Doug Petty is vice president of technology at Pika Technologies, Inc., a leading developer of robust hardware and software platforms and components that allow developers to build scalable, next-generation voice applications. Pika is committed to delivering high levels of customer care and satisfaction. The company serves customers in North America, Europe, and Asia. For more information, please visit their Web site at www.pikatech.com.



Hosted Speech Recognition For Customer Service

BY BRUCE POLLOCK AND JONATHAN MCINTOSH

Speech recognition technology is powerful when used in call centers to handle calls or portions of calls normally handled by operators. The value proposition for speech-enabling a subset of calls traditionally handled by operators is strong: companies can save money, improve customer service, and increase revenues. Per-call savings range from 50 to 70 percent over live agents, depending on a number of factors including the length and type of call being speech-enabled. Speech recognition can also help companies increase their self-service throughput rate (e.g., in the IVR system prior to transferring the call to an agent) by reducing the number of zero-outs that may be happening in the touch-tone system.

Speech can improve customer service too. First, it reduces hold times by handling simple inquiries instantly (thus removing these calls from the queue), leaving operators to handle the more complex calls (or portions of calls) more quickly. This helps to cut abandonment rates and increase revenues where there is a financial transaction involved (e.g., in a catalog call center).

Second, it allows companies that currently operate on a business-day basis (e.g., 7:30 am to 6 pm) to be accessible on a 24x7 basis for simple customer inquiries like bill balances, account/shipment status, etc. Third, speech makes it easier for mobile phone users and callers using phones with keypads in the handset to communicate with the company they are doing business with. Callers don't have to be professional contortionists to be able to complete their self-service transactions!

THE CASE FOR SPEECH ASPs
Designing, building, tuning, and managing an enterprise call center speech recognition application that meaningfully achieves cost reduction, revenue, and service enhancement objectives is a difficult task. Most companies don't have the in-house speech recognition expertise and financial resources to do so, which is why it makes sense to rely on a service provider (outsourcer) with experience in this specialized field.

Experienced speech service providers can get a hosted application scoped, designed, and operating in about a third of the time of the average on-premises project, so call centers can enjoy cost savings, revenue increases, and service improvements much faster than if they attempt an in-house initiative. If you're considering an in-house project, don't forget to include the cost of recruiting and training a speech team and the opportunity cost of tying up your internal IT group; chances are, you'll save money by going the ASP route.

Another benefit of dealing with a speech ASP is that most of them charge on an affordable per-minute or per-completed-call basis, and don't charge the large up-front fees that you'd have to pay in the form of capital expenditures if you wanted to build an on-site speech system. When you add up hardware, software, integration, tuning, and other expenses involved in an in-house initiative, you may find that it's more money and development time than you anticipated. Moreover, the number of speech ports you'll need to handle your peak call loads is larger for an in-house speech system than it is if you deal with an ASP, because ASPs spread call loads from various customers, giving them higher port utilization.
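The port-utilization argument can be illustrated with the classic Erlang B traffic formula, which relates offered call load, number of ports, and the chance that a call finds every port busy. The figures below are hypothetical: four call centers each offering 10 erlangs at peak, a 1 percent blocking target, and peaks that coincide (a conservative assumption, since staggered peaks would favor pooling even more). The article itself gives no traffic numbers.

```python
def erlang_b(offered_erlangs, ports):
    """Blocking probability for a given offered load and port count."""
    blocking = 1.0
    for n in range(1, ports + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking


def ports_needed(offered_erlangs, max_blocking=0.01):
    """Smallest port count that keeps blocking at or below the target."""
    ports = 1
    while erlang_b(offered_erlangs, ports) > max_blocking:
        ports += 1
    return ports


separate = 4 * ports_needed(10.0)   # four call centers, each sized alone
pooled = ports_needed(40.0)         # the same traffic on one shared platform
print(separate, pooled)
```

Under these assumptions the shared platform needs fewer ports in total than the four separate systems combined, which is the higher utilization the authors describe.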

The leading speech ASPs in the market today also have dialog design and human factors expertise, which are prerequisite skills for building a speech recognition system that callers will love and use. Dialog design shapes the interaction between the caller and the speech recognition system. It's both an art and a science, and if you get this part of the application right, you're on your way to having a speech system that has a meaningful impact on cost reduction and service improvement.

When choosing a speech ASP, prospective customers should focus on the company's expertise in areas like dialog design (as mentioned above) and experience in developing and deploying production speech applications. Additionally, prospects should assess the ASP's experience in building robust, reliable host-interface systems and deploying computer telephony integration (CTI) for call centers, because most if not all speech recognition applications require some type of operator support. Prospects should also ensure the speech ASP has the capacity and operational redundancy to handle peak call loads. Ideally, the ASP should be able to demonstrate how it manages these complexities today for its current customer base.

Finally, here are a few tips for speech ASPs and prospective speech clients to help their programs run smoothly and effectively:

  • First, establish a speech recognition project committee (to manage the project day to day), and a smaller call center agents' advisory group for input on the initial application;
  • Second, ensure the speech ASP trains the call center agents on the functions, features, and benefits of the speech system. Armed with good information, agents can be allies and proponents of speech in the call center and help to generate more self-service inquiries; and
  • Third, create a feedback loop so that agents can forward caller feedback to the speech ASP so the speech application(s) can be continuously improved.

Bruce Pollock handles speech recognition services, and Jonathan McIntosh is senior vice president at West Interactive. West is a leading provider of outsourced customer contact solutions that help Fortune 1000 companies acquire, retain, and grow profitable customer relationships. West's customized solutions incorporate integrated speech recognition, IVR, Internet, and live outsourced operator services. For information, visit their Web site at www.west.com.



Eight Things To Consider For Your Voice Portal

BY DAVID FRIEND

The ability of computers to understand human speech has been a goal of computer scientists for over 40 years. In the late '60s, I worked on a project for the U.S. Postal Service that was intended to allow mail sorters to speak zip codes into a computer and have an envelope correctly routed. Unfortunately, even the best computers of that era couldn't manage a vocabulary of 10 spoken words.

Today, inexpensive PCs armed with advanced speech recognition software can identify thousands of words in dozens of languages with sufficient accuracy for a great number of uses. The world's one billion phones are about to be transformed into its most ubiquitous I/O device. Here are eight things you'll need to consider when deploying speech applications for your company:

  1. Build vs. Buy
    Speech apps today are still mostly custom-designed and run on proprietary telephony hardware platforms that grew out of the voice mail and interactive voice response (IVR) industry. In the near future, we'll see a shift to running suites of off-the-shelf speech applications on commodity servers and IP gear. Applications will no longer be tied to a specific hardware platform but will simply be delivered as a CD-ROM. Great new GUI tools are already making the job of building and modifying applications much simpler, reducing development time by 70 to 80 percent over writing code.

  2. Text-to-Speech Engines
    Text-to-speech (TTS) engines let computers talk. The main things to worry about with TTS are how natural it sounds and what languages it can speak. Unless you're assembling your own speech application platform from scratch, however, your platform vendor will probably make its own decision on which TTS engine to use.

  3. Automated Speech Recognition
    Again, the automatic speech recognition (ASR) engine is just one component of a speech application. With the exception of language support, the three leaders, Nuance, Philips Speech Processing, and SpeechWorks, are similar and I'd recommend letting the platform vendor choose. Differences are subtle and subjective, so I'm not sure you should worry about it unless you're building your own platform (not recommended!).

  4. Development Environment
    The development environment is something you should definitely worry about. The best GUI development tools can cut 70 percent or more off the time it takes to develop or modify an application. They also insulate your developers from some of the more arcane low-level details of telephony programming. Programmers like to write code, but a mission-critical speech application written in C++ is going to be a nightmare to maintain. Pick the best tool and learn how to use it. Take a cue from the Web world where Web design tools rule. 

  5. Run-time Platform
    Run-time Platform refers to the physical equipment that the phone lines plug into. Most of the platforms on the market today are based on specialized telephony switches and DSPs, all developed for traditional TDM-style telephony. But the world is moving inexorably to IP telephony and new platforms are emerging that process speech with standard Pentium-powered computers and IP switches. Costs are much lower, scalability is much greater, architecture is more open, and development tools more robust. The new generation of voice services platforms straddles the phone network and the Internet, and hence these devices are ready to support applications that marry the phone and the Web.

  6. VoiceXML
    VoiceXML is an XML-based markup language that many vendors of automated speech recognition (ASR) engines are promoting for the development of voice applications, particularly those applications that use speech recognition. Most voice applications, however, require features and call flows that go way beyond what VoiceXML can do today, and nearly everyone in the industry uses tools or other languages in addition to VoiceXML. If you are planning to develop your own speech recognition applications, you should understand your vendor's position with respect to supporting VoiceXML, as it will probably be important a couple of years from now.

  7. Applications
    Most companies need to develop their own specific applications, just as they need to develop their own Web sites. However, there are certain horizontal applications that nearly everyone seems to want, such as having e-mail read to them over the phone, access to calendars, task lists, contact lists, and so forth. There is no more point in trying to build one of these applications yourself than there would be in developing your own word processor. If you can buy and modify it, that's the way to go.

  8. User Interface
    Designing a good user interface for a telephony application is not as simple as it would seem. Remember those horrible, cluttered, early attempts at Web screen design? Well, now we have horrible, frustrating telephony interfaces and the skills for designing good ones are still scarce. At a conference recently, I showed an example of a speech interface I stumbled across that made me answer six questions to locate a restaurant when two would have gotten me to the same place in half the time. Good development tools will encourage good design practices, and modifying a well-done commercial application will produce better results than starting from scratch. 

CONCLUSION
Just as most of us have grown to prefer ATMs over human tellers, speech applications will increasingly replace humans. Organizations and end users will both benefit from increased satisfaction levels and a surge in efficiency. As a result, you won't have to endure that annoying "Your call is important to us" message that has been plaguing enterprises and consumers for years.

David Friend is chairman and CEO of Sonexis, Inc. Sonexis is a global provider of voice solutions for enterprise customers and service providers, enabling anytime, anywhere transactions and information access. Its products and services reduce the cost, time-to-market, and complexity of developing voice solutions. Sonexis voice solutions focus on consumer and enterprise voice portals, voice commerce, enhanced self-service, and voice-enabled CRM. Contact Sonexis at www.sonexis.com.
