
November 1997


To: CTI Readers

Subject: What If The Enterprise Landed On The Tower Of Babel?

BY Tom Keating


It’s all too easy to imagine a computer that can recognize and translate human speech. We see it on television all the time. Take Star Trek, for example. Nearly every episode of that program depends on the “universal translator.” The galaxy may be a veritable Babel, what with all those aliens threatening each other in strange tongues, but everyone is confident that the universal translator will make sure every statement is heard the way the speaker intended. Never, no matter how strange the alien creature, is a pronouncement such as, “I will accept your gracious invitation and all your thoughtful suggestions” misinterpreted as, say, “I will obliterate your ridiculous planet and all its noxious inhabitants.”

Confronted with the science fiction vision of speech recognition and translation, people with technical knowledge are almost embarrassed to admit the true state of today’s technology, especially if there are Trekkies around. Which is too bad. Although having that e-mail about your next appointment read to you over the phone isn’t the stuff of high adventure, it is useful, and far from trivial as a technical accomplishment. Further, today’s technology keeps improving. How long will it be before technological fact catches up with science fiction?

PRIMITIVE BEGINNINGS
I remember when having your computer convert text to speech was a novelty. I couldn’t wait to add text-to-speech capability to my first real computer, a Tandy TRS-80 Color Computer (COCO), by installing RealTalker, a cartridge with a special chip inside. RealTalker plugged right into the side of the computer.

I invited my friends over and showed them how I could make my computer talk. They were, to my disappointment, unimpressed. But I knew my audience, and I instructed the computer to say things that would land any Star Trek crewman in the brig. Then, when COCO started swearing a blue streak, eyes widened and jaws dropped. My friends finally agreed that computers were cool after all. They all wanted a computer just like mine. (If only marketing text-to-speech were so easy today!)

I have to admit, though, the voice wasn’t all that great. The computer’s voice was just like the one used by the computer in the movie War Games. Basically, the computer sounded like a computer, not a human. The technology for making a computer sound like a human just wasn’t there yet.

If text-to-speech was primitive 15 years ago, speech recognition was even worse. As a matter of fact, getting a computer to recognize the human voice is much more complex than getting a computer to perform text-to-speech conversion. Yet the complexity of the task hasn’t daunted researchers. The Advanced Research Projects Agency (ARPA) has supported research at several institutions, including the Massachusetts Institute of Technology (MIT), to foster the development of speech recognition.

Why is speech recognition so important that the government would see fit to give it a helping hand? It could become the ultimate human/computer interface. No more clunky keyboard or “carpal tunnel syndrome inducing” mouse! No more telephone keypad! Just tell the computer what you want, and the computer responds. One day, we may, like Star Trek’s Scotty, be amazed that anyone would communicate with a computer any other way. (In one of the Star Trek feature films, Scotty traveled back in time, and was enjoined to use a mouse. So, he picked up the mouse and spoke into it, announcing, “Computer: I want you to…”)

PRACTICAL ADVANTAGES OF SPEECH RECOGNITION
Speech recognition is an interesting niche in the CTI industry. Everyone has been predicting its explosive growth, particularly in the IVR industry. Several IVR systems give you the convenience of being able to speak digits to traverse the IVR’s menu tree rather than using DTMF digits. Although DTMF has been the traditional interface for telephone users, this interface has several drawbacks. Using DTMF is time-consuming and often frustrating. Resorting to speech, on the other hand, is natural and much more powerful. In addition, many would-be DTMF users have rotary phones. Outside the U.S., the percentage of people using rotary phones is even higher.

Here’s another example of how speech recognition can be advantageous. How many times have you called someone without knowing what that person’s extension was? You could, of course, go to the company directory and enter the person’s name. But this has three disadvantages. First, you may misspell the person’s name. Second, it takes a long time to key in a person’s name. Third, you will run up your phone bill keying in people’s names (or, in the case of 800 numbers, run up the phone bill of the person you’re calling).
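To make the dial-by-name drawbacks concrete, here is a minimal sketch of how a name maps onto telephone keypad digits. It assumes the common letter layout in which Q and Z sit on the 7 and 9 keys (some older phones omit those letters), and the function name name_to_dtmf is purely illustrative, not part of any IVR product.

```python
# Letter groups printed on a standard telephone keypad (assumed layout).
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}

# Reverse lookup: letter -> the digit a caller must press.
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def name_to_dtmf(name):
    """Return the digits a caller would key in to spell a name.

    Characters that are not on the keypad (spaces, hyphens) are skipped.
    """
    return "".join(
        LETTER_TO_DIGIT[ch] for ch in name.upper() if ch in LETTER_TO_DIGIT
    )


print(name_to_dtmf("Smith"))       # 76484
print(name_to_dtmf("Smyth"))       # 76984, one wrong letter, a different entry
print(len(name_to_dtmf("John Smith")))  # 9 key presses for a short name
```

A single misspelled letter produces a different digit string, so the directory lookup fails, and even a short name costs the caller nine key presses. Speaking the name avoids both problems.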

Does the technology exist today for implementing speech recognition into an IVR/auto-attendant system? You bet! I’ve seen at least one company (Dialogic, in their sales department) implement speech recognition into their Interactive Voice Response system. So, it’s only a matter of time before other companies see the benefits of implementing speech recognition into their phone systems.

The latest continuous speech recognition systems allow callers to exchange much more information using complete phrases and sentences in a single response. Today, callers need not restrict themselves to simple, short phrases or numeric responses. Instead, they can give detailed verbal instructions to the system. For instance, if you are traveling and wish to remotely retrieve e-mail with a particular date, you can just say, “Retrieve e-mail from November 9, 1997.” Another example might be, “Read all voice mail from John Smith.” Accessing features in this way was impossible, or at the least very difficult, through the traditional DTMF interface. Thus, the latest generation of automatic speech recognition technology is helping today’s highly mobile business professional conduct business while traveling.
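Once the recognizer has turned a spoken sentence into text, the application still has to work out what the caller asked for. The sketch below shows one simple way to do that for the two example commands above. The regular expressions, the parse_command function, and the dictionary it returns are assumptions made for illustration; a real system would use the grammar facilities of its speech recognition engine.

```python
import re
from datetime import date

MONTHS = ("january february march april may june july august "
          "september october november december").split()

# Hypothetical command patterns for the two spoken examples.
EMAIL_BY_DATE = re.compile(
    r"^retrieve e-?mail from (?P<month>[a-z]+) (?P<day>\d{1,2}), (?P<year>\d{4})\.?$",
    re.IGNORECASE)
VOICE_MAIL_BY_SENDER = re.compile(
    r"^read all voice ?mail from (?P<sender>.+?)\.?$", re.IGNORECASE)


def parse_command(utterance):
    """Turn recognized text into a structured request, or None if unknown."""
    text = utterance.strip()

    match = EMAIL_BY_DATE.match(text)
    if match and match.group("month").lower() in MONTHS:
        month = MONTHS.index(match.group("month").lower()) + 1
        when = date(int(match.group("year")), month, int(match.group("day")))
        return {"action": "retrieve_email", "date": when}

    match = VOICE_MAIL_BY_SENDER.match(text)
    if match:
        return {"action": "read_voice_mail", "sender": match.group("sender")}

    return None  # not a command this sketch understands


print(parse_command("Retrieve e-mail from November 9, 1997."))
print(parse_command("Read all voice mail from John Smith."))
```

Each request carries a parameter (a date, a sender's name) that would take many key presses to enter through DTMF, which is why a single spoken sentence is so much more convenient.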

VENDOR COMMITMENT TO SPEECH RECOGNITION
Several vendors are active in speech recognition. One such company, Dragon Systems, is discussed in this issue. (See our review of Dragon Systems’ NaturallySpeaking.) In addition, several vendors who produce application generators have integrated speech recognition technology into their software. For instance, Artisoft’s Visual Voice application generator has integrated text-to-speech as well as speech recognition capabilities.

Brooktrout has entered the speech recognition arena by taking advantage of technology from Voice Control Systems. Brooktrout has used this technology to add a module to their Show N Tel product.

Speech Solutions (a subsidiary of Global Intellicom, Inc.) has developed speech recognition ActiveX custom controls. These ActiveX controls allow programmers to add speech recognition capabilities to applications simply by dropping the controls into any development environment that supports ActiveX, such as Visual Basic. (For more information, check out their Web site at www.speechsolutions.com.) These are just a few examples of how speech recognition is being embraced by software vendors.

What about the hardware side? Every voice processing board manufacturer has been scrambling to add speech recognition to its product line or to partner with a leading speech recognition vendor. For instance, Dialogic has their Antares line for speech recognition.

Natural MicroSystems, like Brooktrout, uses technology from Voice Control Systems. NMS’s recently released NaturalRecognition 2.0 leverages the speech recognition technology from Voice Control Systems to provide speaker-independent and speaker-dependent speech recognition capabilities. NMS’s NaturalRecognition supports 16 channels of speech recognition in a single PC slot that also provides fax and call processing capabilities.

THE MICROSOFT ANGLE
One other interesting tidbit in the speech recognition industry is that Microsoft recently took a $45 million stake in Belgian speech technology firm Lernout & Hauspie, which analysts say is about 5 to 7 percent of L&H’s capital. Microsoft’s entry into the speech recognition field certainly bodes well for this industry and only reinforces my belief that speech recognition will become increasingly important. Rumor also has it that Microsoft wishes to embed speech recognition technology into the operating system. This is very interesting. How this will affect other speech recognition vendors remains to be seen.

However, in my opinion, hardware-based speech recognition doesn’t have much to worry about. The reason is that hardware-based speech recognition uses DSP technology, which allows for much more scalability. Without hardware, you are relying on the computer processor for speech recognition, which, as you probably already know, “chews up” a lot of CPU cycles. Therefore, I firmly believe that L&H’s technology, once embedded into Windows, will be targeted toward the low end of speech recognition.

I should note, however, that with the increasing power of computer processors, this “low end” scenario may not hold true forever. Still, in my opinion, DSPs offer more power per price-point than a computer processor, and thus there will still be a need for hardware-based speech recognition using DSP technology for higher-end applications.

Having speech recognition embedded into the next Microsoft operating system will most likely affect software-based speech recognition vendors first. Thus, companies such as Dragon Systems and IBM would be wise to keep their eyes open. IBM has a software-based speech recognition product called ViaVoice. This product is similar to Dragon Systems’ NaturallySpeaking in that ViaVoice is a continuous speech recognition software product. I believe that L&H’s technology will first be used for entering voice commands into Windows rather than (or in addition to) using the mouse, or for dictation into Microsoft Office applications. We’ll find out in 1998 when the newest version of Windows is released!

UNDERLYING DRIVERS
Several forces are driving speech recognition, making it a practical and useful technology. First, the algorithms behind speech recognition are getting better. Second, the technology continues to improve and become more efficient in the use of computer resources. Third, there are more software development tools, such as application generators, for writing speech recognition applications. These software development tools make it possible to quickly and easily develop speech recognition applications; simply drag-and-drop a speech recognition block from a palette of blocks and edit the properties.

We should also remember that the processing power in computers continues to increase in accordance with Moore’s Law, which in its popular form holds that processing power doubles roughly every 18 months. Indeed, Moore’s Law has been holding true for at least the past 15 years. As computer processors become faster and faster, and as speech recognition uses more efficient and accurate algorithms, these gains, together with the factors listed above, will greatly enhance speech recognition technology.
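As a rough back-of-the-envelope check on what that doubling rate implies, the sketch below compounds one doubling every 18 months over a given number of years. The function name growth_factor is illustrative, and the result is simple arithmetic, not a measurement of any real processor.

```python
def growth_factor(years, doubling_period_months=18):
    """Return how many times more processing power results after
    `years`, assuming one doubling per `doubling_period_months`."""
    doublings = years * 12 / doubling_period_months
    return 2 ** doublings


# 15 years at one doubling every 18 months is 10 doublings.
print(growth_factor(15))  # 1024.0
```

In other words, under this assumption a processor today would have about a thousand times the power of one from 15 years ago, which goes a long way toward explaining why software-only speech recognition is becoming practical.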

A HYPOTHETICAL KILLER APP
We all know that text-to-speech technology has been around for a long time, certainly as far back as 15 years ago, when I played with my first text-to-speech product, RealTalker. But now, I’d like to describe a hypothetical killer app that should not, given improvements in speech recognition, remain hypothetical for long.

Suppose we take a text-to-speech product, similar to RealTalker, but much more advanced. Suppose the software has very advanced knowledge and intelligence, including grammar rules and other language rules, which give it the capability to translate text from one language to text in another language, similar to what most of us did in high school when we took Spanish 101 or French 101. You could then convert the translated text to speech using text-to-speech. Thus, any written text in English could be converted to speech in French, German, Russian, or whatever language you choose.

Now consider this possibility. Many speech recognition products support multiple languages and can perform speech-to-text conversion. Take my previous example of converting text in one language to text in another language. Now using speech recognition you could, theoretically, talk to anyone in the world over the telephone, even if that person speaks a completely different language.

Here’s how it could be done. Speech recognition would first convert the speech to text. Next, a special text-to-text converter using grammar rules and the like would translate the text to another language. Finally, text-to-speech at the other end of the phone would speak the translated text in the native language of the person to whom you are talking. Universal language communication: not since the destruction of the Tower of Babel has it been possible.
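The three steps can be sketched as a simple pipeline. Everything in the sketch is a placeholder: the stage functions, the toy phrase table, and the translate_call name are assumptions made for illustration, and none of the stages performs real speech processing or real translation. The point is only how the output of each stage feeds the next.

```python
from typing import Callable

# Toy English-to-French table standing in for a real translation engine.
PHRASE_TABLE = {"hello": "bonjour", "thank you": "merci"}


def recognize_speech(audio: bytes) -> str:
    """Stand-in for speech recognition: pretend the audio is already text."""
    return audio.decode("utf-8")


def translate_text(text: str) -> str:
    """Stand-in for the text-to-text converter: whole-phrase lookup only.
    Unknown phrases are passed through unchanged."""
    return PHRASE_TABLE.get(text.lower(), text)


def synthesize_speech(text: str) -> bytes:
    """Stand-in for text-to-speech: returns the text as bytes, not audio."""
    return text.encode("utf-8")


def translate_call(
    audio: bytes,
    recognize: Callable[[bytes], str] = recognize_speech,
    translate: Callable[[str], str] = translate_text,
    synthesize: Callable[[str], bytes] = synthesize_speech,
) -> bytes:
    """Chain the three stages for one utterance."""
    text = recognize(audio)        # step 1: speech to text
    translated = translate(text)   # step 2: text to text in the other language
    return synthesize(translated)  # step 3: text to speech for the listener


print(translate_call(b"Hello"))  # b'bonjour'
```

Because each stage is passed in as a function, any one of them could be swapped for a real engine without touching the other two, which is roughly how such a product would be assembled from separate speech recognition, translation, and text-to-speech components.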


Learn About Speech Recognition At CTI Expo

For the latest developments in speech recognition hardware and software, attending CTI EXPO (May 19-22 in Baltimore, MD) is an absolute must! Many leading speech recognition companies will be there, showcasing their latest and greatest speech recognition products. The amount of information you will learn from attending this show will be tremendous. We’ve also gone to great lengths to design an objective and educational seminar program that covers an extensive array of CTI topics. I hope to see you there!


What's Hot

Selsius Systems came to CTI Magazine’s offices and demonstrated their LAN PBX, an innovative product that integrates not only the PC server but also the desktop phone handsets onto the LAN! The phones actually have a 10Base-T Ethernet port that allows you to connect the phone to the LAN. Each phone is assigned an IP address (or you can even use DHCP) and can be monitored or have its settings changed, all from a Web browser. The power and robustness of this type of solution are quite apparent. For more information, see the CTI News section, which includes a story on this product.

Banksoft has acquired all product lines, assets, and technologies of PureData, Simplified Telephony, and Kolvox. PureData is known for its SatisFAXtion line of fax server cards and solutions, which were originally acquired from Intel Corporation. Sim-phony (another name for Simplified Telephony) is the creator of the voice messaging CTI software product called Simphony NT. Sim-phony has made it easy for resellers to get into computer-telephony integration by providing a free CD which contains an introduction to CTI, plus a demonstration of Sim-phony NT, which simplifies CTI for both vendors’ salespeople and their prospects. (See www.sim-phony.com.) Kolvox develops voice productivity software products using speech recognition engines from IBM, Kurzweil, and Dragon Systems. In addition, PureData is offering resellers up to a $250 rebate on its award-winning line of SatisFAXtion fax modem cards. Resellers receive a rebate of $250 on the SatisFAXtion-OnDemand Platinum product, $50 on the SatisFAXtion-OnDemand Gold, and $50 on SatisFAXtion-OnDemand for Fax Servers. For more information, check out www.SatisFAXtion.com.






