Speech Recognition and Text to Speech Technology

TMCnet - World's Largest Communications and Technology Community

November 10, 2006

The Features and Functionalities of TTS Director: A Q&A Session with Loquendo's Ornella Ambrois

By Stefania Viscusi, TMCnet Assistant Editor

Advancements in speech recognition and text-to-speech technology are making possible spoken audio that is natural and accurate. With TTS, interaction with devices and applications becomes easier and more efficient.
To make these life-like interactions between human and computer-generated voice possible, it is necessary to have a powerful development tool that offers optimal quality speech interfaces.
Loquendo, provider of speech recognition and text to speech solutions, with over 30 years of R&D experience, recognized this need and offers TTS Director as a solution to help users write and modify vocal messages for their applications.
To find out more about TTS Director, its features and its functionality, I asked Ornella Ambrois, Marketing Manager at Loquendo, about TTS in general and Loquendo's TTS Director in particular.
SV: What are some of the ways Text to Speech (TTS) can be used?
OA: Loquendo’s TTS can be used in a great variety of applications, such as the announcement of train departure times or the reading of map directions by car navigators. One of the fundamental applications of TTS is in the call center, where a large number of customer inquiries can be handled efficiently by an automated service, reducing costs while extending the service to round-the-clock access. TTS has a further advantage over recorded prompts because dialogue and prompts can be modified as easily as a web page; texts can simply be rewritten as required, and the use of TTS allows uniformity of voices to be maintained.
TTS also plays a vital role in e-learning and m-learning projects, where it greatly increases accessibility and ease of use, as well as in many entertainment contexts such as avatars in computer games and providing additional audio content to MMS messages.
SV: How has TTS improved the way speech is being used today?
OA: TTS has made speech technology more flexible and more human. Now, automated call centers can not only understand a caller’s questions but also respond to them in an expressive and agreeable way. An increased emotional range also means that TTS is the right solution for a number of applications, making communication more cost effective, more efficient and more responsive to customers’ needs. In recent years, an increasingly important trend in the speech technology market has been the use of TTS to enable interaction with mobile phones, PDAs and computers. This is of particular relevance for those with disabilities, providing a vital lifeline that allows them access to technology which, without TTS, would be denied.
SV: What is a common goal when using TTS and how does Loquendo answer to that need?
OA: TTS provides us with many benefits over recorded prompts since it is capable of reading dynamic data such as news, emails and many kinds of information services without the need to re-record. In addition, any information database can be accessed 24/7 by a TTS enabled service, providing callers access to limitless amounts of information (both public and personal) from government services, banks, train timetables etc. that, with recorded speech, would simply not be possible.
In fact, the use of TTS substantially reduces costs because there is no need to organize and finance new recordings since any prompt can be easily altered or completely rewritten, while still maintaining the same voices for the reading of these prompts.
With self-service applications, call time is also reduced while the caller is given a service that is both intelligible and pleasing to listen to. Furthermore, TTS improves accessibility to technology (particularly, but not only, for the disabled) and provides an increased and improved dissemination of information in many contexts, such as up to date traffic and safety info available with car navigators.
SV: Why was TTS Director created? Who is it aimed toward?
OA: TTS Director was created to provide an intuitive, user-friendly environment in which to create lifelike, expressive synthetic speech material, with the emphasis on simplicity of use.
With TTS Director, the whole process of writing prompts becomes quick and easy, while any glitches can be identified and ironed out straightaway.
The application is aimed at anyone wishing to design their own prompts, particularly those with a limited knowledge of the tools and control tags available with TTS technology.
SV: What are some of the distinguishing features of Loquendo's TTS Director?
OA: The primary feature of Loquendo's TTS Director is that it enables the use, in a greatly simplified and user-friendly way, of all the many distinguishing features of Loquendo’s award-winning TTS, namely: real expressivity, mixed language support, the audio mixer, the plug-in lexicon, etc.
For speech application developers, having all these tools readily available greatly simplifies the creation of prompts.
TTS Director’s design is extremely user friendly: text can be typed directly into the edit box, control tags can be inserted from the drop-down menus, and the results can be listened to and fine-tuned until the prompt is just how the user wants it.
The feature of TTS Director that really sets it apart, however, is Expressive Speech, whereby a wide selection of commonly used phrases is pronounced in a truly lifelike way. Loquendo’s Expressive Speech makes such phrases and exclamations (How are you? I’m so sorry!) sound natural, lively and pleasing to listen to, whereas without this feature the same phrases would sound flat, expressionless and insincere.
Mixed language support facilitates the correct reading of multilingual texts. One approach is to switch voices at every language change; with Language Guesser, each new language is detected automatically and the voice changed accordingly. However, this approach does not provide the best results when dealing with truly mixed-language text, where changes occur frequently and are embedded in the sentence, as in Web content, e-mails or information services, where foreign names and phrases (e.g. foreign film titles) are common. In this case the Phonetic Mapping approach is effective: the mixed-language text can be read by a single Loquendo voice, which will read out any foreign words correctly while retaining its native accent. Phonetic Mapping makes such mixed-language reading possible by applying the phonetic transcriber to the foreign-language text and then mapping the transcribed phonemes onto those of the voice's native language in order to access its acoustic units.
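As a rough illustration of the phonetic mapping idea described above, the following Python sketch transcribes a foreign word into phonemes and then maps each phoneme onto the inventory of a native voice. It is not Loquendo's implementation: the phoneme symbols, the toy English transcriptions, the Italian-like native inventory and the function names are all assumptions made up for this example.

```python
# Hypothetical sketch of phonetic mapping. All tables and symbols below are
# invented for illustration; none of them are Loquendo data or APIs.

# Toy transcriber for the foreign language (English): word -> phonemes.
ENGLISH_TRANSCRIPTIONS = {
    "the": ["DH", "AH"],
    "thing": ["TH", "IH", "NG"],
}

# Toy phoneme inventory of the native voice (assumed Italian-like).
NATIVE_PHONEMES = {"d", "t", "a", "i", "n", "g"}

# Each foreign phoneme maps to one or more native phonemes, because the
# native voice only has acoustic units for its own inventory.
FOREIGN_TO_NATIVE = {
    "DH": ["d"],
    "TH": ["t"],
    "AH": ["a"],
    "IH": ["i"],
    "NG": ["n", "g"],
}


def transcribe(word):
    """Step 1: apply the foreign-language phonetic transcriber."""
    return ENGLISH_TRANSCRIPTIONS[word.lower()]


def map_to_native(phonemes):
    """Step 2: map transcribed phonemes onto the native inventory."""
    native = []
    for phoneme in phonemes:
        native.extend(FOREIGN_TO_NATIVE[phoneme])
    return native


def read_foreign_word(word):
    """Return the native phonemes the voice would use for a foreign word."""
    return map_to_native(transcribe(word))


if __name__ == "__main__":
    print(read_foreign_word("thing"))
```

In this sketch the foreign word keeps its own pronunciation pattern, but every sound is rendered with the closest unit the native voice has, which is one plausible reading of "retaining its native accent".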
The Audio Mixer enables music and sound effects to be added to a prompt, as well as the rhythmic synchronisation of music and dialogue. Reverb, echo and stereo effects can be added, and prompts and music can be looped or faded in and out.
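To make the fading, looping and mixing operations concrete, here is a minimal Python sketch that works on plain lists of audio samples. It only illustrates the general signal operations; the function names, the linear fade shape and the default gain are assumptions and do not describe how the Audio Mixer itself works.

```python
# Hypothetical sketch of fade and mix operations on raw sample lists.
# Names and defaults are invented for illustration.


def fade(samples, fade_len):
    """Apply a linear fade-in and fade-out of fade_len samples each.

    Assumes fade_len is at most half the number of samples.
    """
    out = list(samples)
    n = len(out)
    for i in range(min(fade_len, n)):
        gain = i / fade_len  # ramps from 0.0 up towards 1.0
        out[i] *= gain           # fade in at the start
        out[n - 1 - i] *= gain   # fade out at the end
    return out


def mix(speech, music, music_gain=0.3):
    """Add attenuated music under speech, looping the music if it is shorter."""
    if not music:
        return list(speech)
    return [s + music_gain * music[i % len(music)] for i, s in enumerate(speech)]


if __name__ == "__main__":
    prompt = fade([1.0] * 6, fade_len=2)
    print(mix(prompt, [1.0, -1.0], music_gain=0.5))
```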
Also available with TTS Director is the plug-in lexicon feature, whereby specially compiled lexicons of word transcriptions can be added to the speech engine’s vocabulary. One popular example is the SMS lexicon, which enables the reading of the many abbreviations currently used in text messages and emails, which would otherwise not be correctly pronounced by the TTS engine. Any speech application involving any kind of jargon benefits from the compilation of a lexicon to enable the correct pronunciation of technical terms and less common words and names.
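The plug-in lexicon idea can be sketched as a lookup applied to the text before synthesis. The Python example below is a simplification under stated assumptions: it expands abbreviations into ordinary words, whereas the lexicons described above hold word transcriptions, and the entries and function name are invented for illustration rather than taken from Loquendo's SMS lexicon.

```python
import re

# Hypothetical SMS lexicon; the entries are examples, not Loquendo's data.
SMS_LEXICON = {
    "btw": "by the way",
    "thx": "thanks",
    "l8r": "later",
    "gr8": "great",
}


def expand_with_lexicon(text, lexicon):
    """Replace every token found in the lexicon, leaving other text as-is."""

    def replace(match):
        token = match.group(0)
        # Look tokens up case-insensitively; unknown tokens pass through.
        return lexicon.get(token.lower(), token)

    return re.sub(r"[A-Za-z0-9]+", replace, text)


if __name__ == "__main__":
    print(expand_with_lexicon("Thx, see you l8r!", SMS_LEXICON))
```

A jargon lexicon for a specialist application could be plugged in the same way by passing a different dictionary.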
SV: How are developers able to utilize TTS Director to make speech interaction more life-like?
OA: TTS Director offers many features to make speech more lifelike: the tone and timbre of the voice can be modified, pauses can be added, speaking rate and emphasis can be adjusted. Combined together, these tools provide a powerful and effective means of adjusting the character of any Loquendo voice to suit the application’s needs.
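Loquendo's own control tags are not shown in this interview, so the sketch below uses the W3C Speech Synthesis Markup Language (SSML) as a stand-in to show how pauses, speaking rate and emphasis are typically expressed as markup around the text. The function name and default values are assumptions, and the output is not TTS Director syntax.

```python
import xml.etree.ElementTree as ET


def build_prompt(greeting, body, rate="slow", pause_ms=400):
    """Build a small SSML prompt with emphasis, a pause and a rate change.

    A complete SSML document would also declare the SSML namespace on the
    <speak> element; it is omitted here to keep the sketch short.
    """
    speak = ET.Element("speak", {"version": "1.0", "xml:lang": "en-US"})

    # Emphasize the greeting.
    emphasis = ET.SubElement(speak, "emphasis", {"level": "strong"})
    emphasis.text = greeting

    # Insert a pause between the greeting and the body.
    ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})

    # Read the body at the requested speaking rate.
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})
    prosody.text = body

    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_prompt("Good afternoon!", "Thank you for calling."))
```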
Expressive Cues, on the other hand, make speech interaction truly lifelike, enabling the TTS to respond, for example, to a caller’s questions in a really natural way. For other applications, such as talking email or computer game voiceovers, TTS is able to express a wide range of emotions, to laugh or to yawn, to be angry or sad.
These Expressive Cues are accessible through an easy to use menu with TTS Director, from which the required phrase is selected and automatically inserted into the text.
SV: Is there a level of skill needed to be able to use TTS Director?
OA: TTS Director has been designed specifically for use by those with little technical knowledge of speech applications, while at the same time it remains an invaluable tool for application developers with a high level of know-how.
The various features and menus are intuitive and easy to use, and the user need only click on the required setting to produce the desired effect.
If help should be needed, a user guide can be rapidly accessed from the toolbar. 
SV: Can you describe expressive synthetic speech and how it is exclusive to Loquendo's technology?
OA: Loquendo is the first speech technology company to provide truly lifelike speech. Many commonly used expressions are pronounced in a really natural sounding way, adding a far wider emotional range that can be exploited in any speech application. An automated call center, for example, can greet its callers (Good afternoon! Thank you for calling! Goodbye!) in a way that is natural, expressive and pleasing on the ear. Your computer can read your emails to you in a tone and style that more closely reflect the content of the text, simply following the punctuation used by the writer (How are you? Great to hear from you! Miss you!). Loquendo’s TTS is unique in its field for having pioneered this important step towards truly human sounding speech.
SV: Do you see an increasing importance being placed on TTS in the future? How will Loquendo answer to these demands?
OA: The use of TTS is becoming more widespread, and the industry at large is currently experiencing phenomenal growth. While much of this growth stems from the market’s rapid uptake of automated call centres and other tried and tested applications, TTS is also rapidly moving into new areas: human-machine interfaces are becoming increasingly important, whereby more and more features on mobile phones, PDAs and computers can be operated by voice. This is, of course, of particular importance to those who are differently abled, but is also highly relevant for other users who often discover that hands-free operation is more pleasant, more convenient, and less likely to cause repetitive strain injuries.
This widespread adoption of speech technology will demand synthetic voices that are natural sounding, emotionally versatile and of the highest quality. To this end, Loquendo has undertaken much innovative research on the fluency and naturalness of its voices.
Innovative research has also been conducted on personalizing a voice. Saving various audio settings allows the user to construct a voice with a character all of its own. You can, for example, create the voice of a child, an elderly person, or a cartoon character.
For many years Loquendo has been interested in increasing the emotional element of its voices, in such a way that the tone of the voice may be changed to suit the application. However, the speech industry does not currently offer solutions capable of changing TTS voice style and tone upon command, or according to the nature of the text itself, without compromising on acoustic quality. In the near future, however, we believe research will make it possible to synthesize any kind of text in a preferred style (e.g. emphatic, formal or informal) and in the emotional tone required (e.g. happy, sad, angry etc.).
Stefania Viscusi is an established writer and avid reader. To see more of her articles, please visit Stefania Viscusi’s columnist page.

