Building Blocks For Building
Leveraging The Eight Service Building Blocks Of Media Servers To Drive Costs Out Of Your Network
BY GRANT HENDERSON
Voice processing hardware can be found everywhere in today's advanced wireline and wireless telecommunication networks. Whether embedded in traditional Class 5 switching equipment or offered as part of voice conferencing, IVR, or voice mail service nodes, specialized hardware provides the fundamental processing and media manipulation required for a broad range of network and enhanced services. While the historical proliferation of voice processing hardware has proven essential to the introduction of many new services, it has also resulted in significant functional duplication and unnecessarily high capital and operational costs for service providers.
Media servers, the newest entry to the voice processing market, are designed specifically to help reduce the duplication and costs traditionally borne by service providers. Unlike their service node counterparts, media servers are not tightly coupled to a particular application or to a single vendor's services architecture. Instead, media servers are multiservice, shared network resources that provide eight generic service building blocks that can be applied, unchanged, to a broad range of services.
THE BUILDING BLOCKS CONCEPT
Any parent knows that LEGO blocks, those simple, multicolored plastic building blocks, can be combined by children and adults alike to make some amazing creations. The media server takes this same approach. A small collection of atomic building blocks is made available by the media server. These building blocks, much like LEGO pieces, can be combined to create more complex capabilities for enhanced services. When coupled with application-specific service logic executing in a softswitch or application server, the eight key building blocks can be used to create any of today's existing services and are fundamental enablers for new service development and delivery. The eight fundamental media processing building blocks are described below.
The first and most fundamental building block is the media server's ability to play announcements. Announcements, also known as prompts, are required for essentially any service that needs to communicate information to a subscriber through the voice channel. Announcements, consisting of one or more pre-recorded audio clips, may be provisioned on the media server in non-volatile memory or fetched from a remote server using standard protocols such as NFS or HTTP. When controlled by service logic in an application server or softswitch, the media server's announcement functionality can be utilized as part of a wide range of services including service branding, informational messages, and other capabilities traditionally provided by dedicated announcement servers.
Tones And DTMF
Tones are also used to communicate information to subscribers. However, the most common application of tones for media servers is to allow users to communicate information to the service. DTMF, an acronym for dual-tone multi-frequency, is the basis of service control in today's networks. By pressing keys on their handset, subscribers can enter PINs to gain access to a service, self-activate and manage services, or navigate through service control menus. DTMF tone processing is performed in the media server's DSPs and works very well on uncompressed codecs like G.711. When using G.723.1, G.729, or other compressed codecs, DTMF tones are not reliably detected, so media servers implement RFC 2833, a mechanism to communicate out-of-band DTMF events.
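To make the RFC 2833 mechanism concrete, here is a minimal Python sketch of the four-byte telephone-event payload that carries a DTMF digit in place of in-band tones. The field layout (event code, end bit, volume, duration) follows the RFC; the function names and default values are illustrative assumptions, and a real media server would wrap this payload in RTP packets with their own headers and timing.

```python
import struct

# RFC 2833 event codes: digits 0-9 are 0-9, '*' is 10, '#' is 11, A-D are 12-15.
DTMF_KEYS = "0123456789*#ABCD"


def pack_dtmf_event(digit, end=False, volume=10, duration=800):
    """Pack one telephone-event payload (hypothetical helper name).

    volume is the tone level in -dBm0 (0-63); duration is in RTP
    timestamp units (800 units is 100 ms at an 8 kHz clock).
    """
    if digit not in DTMF_KEYS:
        raise ValueError(f"not a DTMF key: {digit!r}")
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    # Second byte: E (end) bit, reserved bit left at 0, then 6-bit volume.
    flags = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", DTMF_KEYS.index(digit), flags, duration)


def unpack_dtmf_event(payload):
    """Return (digit, end, volume, duration) from a 4-byte payload."""
    event, flags, duration = struct.unpack("!BBH", payload)
    if event >= len(DTMF_KEYS):
        raise ValueError(f"not a DTMF event code: {event}")
    return DTMF_KEYS[event], bool(flags & 0x80), flags & 0x3F, duration
```

Because the digit travels as a named event rather than as audio, it survives any codec unchanged, which is why this approach suits G.723.1 and G.729 streams.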
By combining DTMF collection and announcements together, essentially any service that requires an IVR can be replicated. These include services such as prepaid calling card, subscriber activation, and management of CLASS features and network ACD functions.
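As a rough illustration of how these two building blocks combine, the Python sketch below models a PIN prompt of the kind used by a prepaid calling card service. Everything in it is hypothetical: the prompt file names, the function names, and the idea of representing key presses as a string. In a real deployment the service logic would run in a softswitch or application server and drive the media server over a control protocol.

```python
def collect_digits(key_presses, max_digits=4, terminator="#"):
    """Gather digits until the terminator key or max_digits is reached."""
    collected = []
    for key in key_presses:
        if key == terminator:
            break
        if key.isdigit():
            collected.append(key)  # non-digit keys such as '*' are ignored
        if len(collected) == max_digits:
            break
    return "".join(collected)


def pin_dialog(key_presses, expected_pin):
    """Return the prompts the service logic would ask the media server to play."""
    prompts = ["enter_pin.wav"]  # announcement building block
    entered = collect_digits(key_presses, max_digits=len(expected_pin))  # DTMF building block
    prompts.append("welcome.wav" if entered == expected_pin else "invalid_pin.wav")
    return prompts
```

The media server contributes only the generic pieces (play a clip, report key presses); the decision about what a correct PIN means stays in the service logic.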
Voice Recording And Playback
The media server's ability to capture and play back user recordings can be coupled with its announcement and DTMF collection capabilities to provide the media processing features required by today's voice messaging systems. Since voice messaging systems deployed in carrier networks tend to serve hundreds of thousands of subscribers, the storage requirements for these systems can be massive. As a result, media servers are typically connected to an external NFS or HTTP storage server providing hundreds of gigabytes of storage capacity. In this configuration, the media server provides the IVR and RTP streaming/recording engine for a voice messaging service while the NFS/HTTP server provides the scalable and cost-effective message store. External storage servers may also be used in conferencing applications where conference recording is an important feature.
In other services, recorded audio may be smaller and more transient in nature. For example, many reservation-less conferencing systems allow the user to record their name when joining a conference. This recording is then used to automatically announce the new participant and is also called on if the moderator requests a roll call. Since this recording is only required for the duration of the conference, it is transient in nature and is typically stored in RAM inside the media server to reduce network complexity.
While we are on the topic of audio conferencing, it is useful to quickly examine some of the media server's features that enable it to be applied to audio conferencing services, ranging from residential three-way and Centrex six-way calling to complex business conferencing applications.
The media server's DSPs are ideally suited to execute complex audio mixing algorithms that combine the audio from multiple callers into a composite stream. For services that require only basic conferencing, such as residential three-way conferencing and Centrex six-way conferencing, simple algorithms can be used. However, for advanced conferencing services, such as business conferencing services offered by ASPs, IXCs, and PTTs, additional complexity in the mixing algorithm and extra features such as the ability to mute one or more participants, support listen-only participants, and support very large numbers of participants are required and offered by the media server.
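A simple mixing algorithm of the kind suited to three-way or six-way calling can be sketched in Python as follows. This is an illustrative "mix-minus" approach (each participant hears the sum of everyone else), not a description of any particular vendor's DSP code; the function name, the dictionary-of-frames input, and the hard clipping to the 16-bit PCM range are all assumptions made for the example.

```python
PCM_MIN, PCM_MAX = -32768, 32767  # 16-bit linear PCM sample range


def mix_frame(frames, muted=frozenset()):
    """Mix one audio frame per participant.

    frames maps a participant name to a list of PCM samples (all lists the
    same length). Muted or listen-only participants are named in `muted`:
    they still receive the mix but contribute nothing to it.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    speakers = {p: s for p, s in frames.items() if p not in muted}

    # Sum every speaking participant once.
    total = [0] * length
    for samples in speakers.values():
        for i, sample in enumerate(samples):
            total[i] += sample

    # Each participant hears the total minus their own contribution.
    output = {}
    for participant in frames:
        own = speakers.get(participant)
        mixed = [t - (own[i] if own is not None else 0) for i, t in enumerate(total)]
        output[participant] = [max(PCM_MIN, min(PCM_MAX, m)) for m in mixed]
    return output
```

Advanced business conferencing would typically add refinements that this sketch omits, such as selecting only the loudest few talkers or applying gain control rather than hard clipping.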
Speech-based technologies, namely text to speech (TTS) and automatic speech recognition (ASR), represent the fifth and sixth service building blocks offered by media servers. A media server vendor can choose to implement ASR and TTS features within the media server itself or can interface to third-party speech servers through a variety of proprietary interface mechanisms. More recently, work has been initiated within the IETF that will see the development of a standardized interface between speech servers and media servers.
While text to speech has traditionally sounded robotic and unnatural, there has been a significant improvement in quality by leading vendors such as Nuance, Speechworks, and AT&T. Using these technologies, it is possible for media servers to offer the text to speech capabilities needed for e-mail readers, richer information services, SMS messaging to legacy handsets, and other applications that require the rendering of text into spoken words.
Speech recognition can be applied to existing services, for example to enable speech control of voice mail or conferencing, or to create new services such as voice portals.
Much like text to speech, media servers can offer speech recognition features using one of two models: an external speech recognition server or an engine embedded in the media server. The first and most popular is the external, client/server model. Under this model, the media server acts as a telephony-conditioning engine and provides buffering for utterance detection. The actual speech recognition function is performed on a separate server where the speech recognition engine resides.
A second approach embeds the speech recognition engine inside the media server. Which is better? Well, that depends on your applications and network design. External speech recognition has the advantage when it comes to very large vocabularies but increases the complexity of the network and can result in communication link failures between the media server and speech server. Internal speech recognition is well suited for small- to medium-size vocabulary applications such as speech-based dialing and "command and control" interfaces but is more complex to implement in the media server and is not well suited to applications such as stock quotation or airline reservation systems.
The seventh building block is fax processing. While there are two forms of fax used in packet voice networks, namely T.37 (store and forward) and T.38 (fax relay), the most common use of fax with media servers is T.38 for unified messaging applications.
Arguably, the media server's ability to provide video streaming, video bridging, and video recording/playback functions could be the eighth, ninth, and tenth service building blocks. Each is a unique capability of the media server, and each can be used individually or together to create a variety of video-enabled services. However, at least for now, the use of video in the public network services offered by telephone companies is a niche application. Thus we elect to lump all of these capabilities into the eighth and final service building block.
SAVINGS AND BENEFITS
There are many benefits to service providers who adopt a service platform model based on generic media servers combined with third-party service logic. First, capital and operational costs can be reduced significantly. For example, a typical conferencing solution in the PSTN will cost between $800 and $1,000 per port. It is not uncommon for comparable decomposed solutions to be half to one quarter the price. In addition, while traditional conferencing bridges occupy approximately 20 to 24 inches of rack space to deliver 450 ports, the decomposed architecture is capable of delivering well over 15,000 ports in the same footprint. Thus it is clear that even with a single service alone, there are significant CAPEX and OPEX savings to be realized. These savings increase as the media server begins to service additional applications.
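The arithmetic behind these figures can be worked through in a few lines of Python. The inputs are only the numbers quoted above; the 450-port deployment size used for the total is an assumption chosen to match the traditional bridge, and the variable names are illustrative.

```python
# Figures quoted in the article.
traditional_per_port = (800, 1000)   # dollars per port, low and high
price_fractions = (0.25, 0.5)        # decomposed cost: one quarter to one half
traditional_ports = 450              # ports in 20 to 24 inches of rack space
decomposed_ports = 15000             # "well over" this many in the same footprint

# Decomposed per-port range: cheapest case is a quarter of the low price,
# most expensive case is half of the high price.
decomposed_per_port = (
    traditional_per_port[0] * price_fractions[0],
    traditional_per_port[1] * price_fractions[1],
)

# Port density improvement for the same rack footprint (a lower bound).
density_ratio = decomposed_ports / traditional_ports

# Assumed example: total capital cost of a 450-port deployment.
ports = 450
traditional_total = tuple(p * ports for p in traditional_per_port)
decomposed_total = tuple(p * ports for p in decomposed_per_port)

print(f"Decomposed cost per port: ${decomposed_per_port[0]:.0f} to ${decomposed_per_port[1]:.0f}")
print(f"Density improvement: at least {density_ratio:.1f}x")
print(f"450 ports, traditional: ${traditional_total[0]:,.0f} to ${traditional_total[1]:,.0f}")
print(f"450 ports, decomposed: ${decomposed_total[0]:,.0f} to ${decomposed_total[1]:,.0f}")
```

On these numbers the decomposed solution works out to roughly $200 to $500 per port and at least 33 times the port density of a traditional bridge.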
Furthermore, this open model of service development allows an unprecedented number of application developers to create and deploy new telecommunication services. Finally, by replacing proprietary hardware with multiservice media processing resources, fewer nodes need to be deployed and these assets can be leveraged across multiple applications. New service introduction can also be accelerated since new service return on investment and turn up time can be significantly improved when leveraging previously deployed hardware.
As shown above, the eight service building blocks are at the heart of most enhanced services, and the differences amongst services tend to reside principally at the service logic layer, executing in the softswitch or application server. Through a planned and phased consolidation of their media processing platforms, service providers can realize real and valuable cost savings today while paving the way for a more elegant services architecture, one that results in better reuse of assets and offers improved ability to introduce and deliver new high-margin, differentiated services.
Grant Henderson is co-founder and executive vice president of marketing and strategy at Convedia Corporation, a leading supplier of next generation, softswitch-compliant media servers designed to enable communications service providers to rapidly deliver innovative and differentiated voice and video services over IP networks. For more information, visit the company online at