IMS Magazine
February 2007 — Volume 2 / Number 1
IMS Feature Article

The Media Server in IP Networks

By Tom Ray



The history of the media server goes back to the 1960s with automatic time and temperature announcements and Automatic Number Announcers. (ANA plays a message to a field technician that gives the telephone number associated with a wire pair.) The 1990s saw the wide deployment of stand-alone messaging, audio-conferencing, and fax servers. Then, in 1998-1999, the media server began to morph into its current form.

The media-server (MS) and application-server (AS) network-equipment product categories were children of the late-’90s tech bubble. Cognitronics, which had been in the announcement business since 1961, and startups Convedia, IP Unity, and Snowshore (now part of Cantata) are examples of the companies that helped define today’s media-server product category; BayPackets and Iperia are similar examples for application servers.

Well before IP telephony, voice-announcement media servers were peripherals to network switches. When IP telephony obviated TDM switches, there was still the need for network announcements and a plethora of other functions needed to implement value-adding services. So application servers evolved to fill the space left by the departure of the Class 5 switch. At the same time (1998-2000) the industry was defining the protocols (IPDC, MGCP, then Megaco/H.248) needed to logically and physically separate the media gateway controller from the media gateway. So the AS-MS separation was architecturally consistent.

But today, the industry is still struggling to come up with a protocol standard to support the AS-MS separation. This article describes the contending and developing protocols and standards that have evolved to support this control connection and the rapid development of value-adding applications.

IP Multimedia Subsystem (IMS)

IP service-network architectures are moving towards the IP Multimedia Subsystem (IMS) architecture developed by the 3GPP. IMS holds the promise of lowering the cost of deploying and maintaining flexible-function service networks and the service applications that use them. Industry-standard inter-entity protocols allow the various network elements to interoperate. Other application development standards help reduce the cost of service development.

The raison d’être of IMS is to connect a caller with the desired service and then, of course, to have the application servers and media servers within the network render that service. That’s a given. But the IMS excitement comes from its promise to do this in a way that vastly improves user value and experience, even while it reduces the service provider’s cost of development, deployment, and OSS expense.

IMS provides the following:

• The separation of services from transport: Just as the business of building and maintaining roads is different from that of service stations and restaurants, the business of building and managing networks should be different from the business of implementing and deploying value-adding services.

• Access-independent service networks: With IMS, the subscribers’ access device and network can be transparent to the service. Of course, the service application must know the capabilities of the endpoint, but those capabilities may be delivered by any access network.

• Separation of call and session control from transport: These two functions require significantly different resources and skills to implement. Separation permits a tighter focus of the industry’s innovation and production resources. Notice that the Call Session Control Function (CSCF) has only dashed lines as connections in the IMS diagram, Figure 1.

• Separation of service applications and media-service resources: Media servers (the “Media Resource Function,” or MRF) and the Application Servers are separated (see IMS diagram). The MRF can provide media services to multiple independent Application Servers. The separation of the media gateway from the media-gateway controller, and of the application server from the media server, are perfect examples of using network functional separation to achieve skill-set and resource separation in network equipment.

• Roaming: IMS allows subscribers to roam from the enterprise to a service provider’s network, and beyond to another service provider’s network, without service interruption.

In addition to an open-ended list of network-based applications, the IMS architecture supports address resolution and routing, authentication, location and presence, network-based storage, and emergency services. And it is just as applicable to the enterprise as to the communications service provider. Absent “walled gardens” erected by carriers, the widespread implementation of IMS can dramatically lower the cost of developing and deploying network-based services, unleashing a torrent of innovation. Closed, carrier-controlled service networks, even though they may be based on IMS, will stifle innovation and reduce consumer choice.

Media Server (MRF)-Related Standards

The 3GPP has decomposed the MRF into the Media Resource Function Controller (MRFC) and the Media Resource Function Processor (MRFP). Possibly for consistency with the decomposition of the Media Gateway (MG) and the Media Gateway Controller (MGC), IMS goes even further and specifies the use of H.248/Megaco as the standard for the control of the MRFP by the MRFC, as it does for the MGC and MG. However, few vendors have supported the use of H.248 for MRFC-MRFP control.

• VoiceXML makes the implementation of voice response applications (audio dialogs) productive and independent of call-control and other application logic. VoiceXML scripts are stored on an HTTP server and supplied to the media server “VoiceXML browser.”

• CCXML (Call Control XML) is designed to provide telephony call-control support for VoiceXML. CCXML scripts are stored on an HTTP server and supplied to the media server “voice browser.” A CCXML script might support an “outdial” notification function in a messaging system.

• NETANN (Basic Network Media Services with SIP), now on its eleventh version, is a SIP-based AS-MS protocol that supports the implementation of network announcements (“I’m sorry, the number…”) and other basic functions. NETANN can also invoke VoiceXML scripts.

• MSCML (Media Server Command Markup Language) is a SIP-based AS-MS protocol used to implement advanced conferencing and fax applications.

• MSML/MOML (Media Sessions Markup Language/Media Objects Markup Language) are SIP-based AS-MS protocols used to implement advanced conferencing and fax applications.

• Media Server Control is a new initiative of an informal IETF working group called “Network Working Group”, which is specifying a “SIP control framework”. This approach separates SIP-based AS-MS control into various XML packages, which cover basic IVR, VoiceXML, fax, and conferencing. (See Figure 2.)

• MRCP (Media Resource Control Protocol, RFC 4463) is a SIP-based protocol that gives VoiceXML access to text-to-speech (TTS) and automatic speech-recognition (ASR) functions. Using a network protocol to separate these resources from the primary media server allows them to be developed independently and frees the service provider from dependence on any particular TTS or ASR implementation.

For example, a unified-messaging (UM) application might redirect a call stream to a T.38-capable fax media server (FMS) to handle a fax receive operation. The FMS will receive the fax and store it in a network-file-system location specified directly by the application server via MSCML. A subscriber may later retrieve the fax under control of the UM system.

A slightly different configuration has the UM application using MSCML to specify the URI of a VoiceXML script, which will lead the subscriber through a series of options. CCXML can be embedded in the VoiceXML script to implement a call-control function such as dialing the subscriber to notify her of the new fax message. And MRCP can be used in a VoiceXML script to utilize a TTS or ASR resource (text-to-speech/automatic speech recognition). The network diagram that supports this scenario is shown above.
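As a rough illustration of how an application server can hand a media server the location of a prompt or a VoiceXML script, the sketch below builds the style of SIP Request-URI that NETANN (published as RFC 4240) defines for announcements and dialogs. The `annc`/`play=` and `dialog`/`voicexml=` conventions and the `repeat` parameter follow that specification. The host names, script URLs, and helper names (`netann_announcement_uri`, `netann_dialog_uri`) are hypothetical, and the percent-encoding policy is an assumption based on the SIP URI grammar rather than anything stated in this article.

```python
from typing import Optional
from urllib.parse import quote

# Assumption: besides letters, digits and "-_.~", only "/" and ":" are left
# unescaped in a SIP URI parameter value; everything else is percent-encoded.
_SAFE = "/:"


def _param(name: str, value: str) -> str:
    """Format one URI parameter, percent-encoding the value."""
    return f"{name}={quote(value, safe=_SAFE)}"


def netann_announcement_uri(ms_host: str, prompt_url: str,
                            repeat: Optional[int] = None) -> str:
    """Request-URI asking the media server to play a prompt (user part 'annc')."""
    params = [_param("play", prompt_url)]
    if repeat is not None:
        params.append(f"repeat={repeat}")
    return f"sip:annc@{ms_host};" + ";".join(params)


def netann_dialog_uri(ms_host: str, script_url: str) -> str:
    """Request-URI asking the media server to run a VoiceXML script (user part 'dialog')."""
    return f"sip:dialog@{ms_host};{_param('voicexml', script_url)}"


if __name__ == "__main__":
    # Hypothetical hosts and URLs, for illustration only.
    print(netann_announcement_uri("ms.example.net",
                                  "http://prompts.example.net/busy.wav"))
    print(netann_dialog_uri("ms.example.net",
                            "http://apps.example.net/um/menu.vxml?box=1234"))
```

The application server would place such a URI in the SIP INVITE it sends to the media server. MSCML and MSML/MOML instead carry their richer XML requests in SIP message bodies, which this sketch does not attempt to show.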

Protocol Wars

Back in the mid-1990s there were two competing and incompatible open-architecture PCM highways: MVIP, sponsored by NMS Communications, and SCSA, sponsored by Dialogic (now part of Eicon Networks). These two PCM highways performed the same function of PCM interconnect, with few technical differences. Although two competing de facto standards were better than none, two were worse than one. Finally, after five years of the so-called bus wars, a new industry group, the ECTF, published its H.100 PCM-highway specification, creating a permanent cease-fire.

Now, we have a similar situation in AS-MS protocols.

In February 2003, Snowshore Networks, now part of Cantata, published MSCML. In June of the same year, Convedia published MSML/MOML. Both of these XML-based AS-MS protocols were meant to pick up where NETANN and VoiceXML left off to offer more comprehensive conferencing support. In the same year Commetrex extended each protocol to support fax media servers. Today, application and media-server vendors are under pressure to support both. Although the primary vendors will argue the merits of one over the other, many observers point out that the advantages of one over the other do not overcome the disadvantages of having two competing protocols.

An informal industry group, the Network Working Group, also known as Media Server Control (MEDIACTL), which includes Ubiquity Software, BlankSpace, Hewlett-Packard, and Radvision, can play the same role today as the ECTF played in the mid-’90s. To date, the group has developed three draft documents:

• The Control Framework uses SIP to establish a control session between an application server and media server. The framework does not address the specifics of media control, which is the subject of various control packages.

• The IVR Package supports voice-response functionality using the Control Framework.

• The VoiceXML Package supports scripted voice within the Control Framework.

• A Fax Package will be added in 4Q2006.

One possible outcome of MEDIACTL, which has been an informal group for two years, is to “embrace and extend” the best of MSCML and MSML/MOML, thereby ending the AS-MS protocol wars. In October of 2006 the group became an IETF “BOF” (Birds of a Feather), which is one step before becoming a full IETF “Working Group”.

Looking Ahead

The media server and its supporting protocols and standards open the door to more than the separation of the application server from the media server. They also allow the development and deployment of network-based services to be separated from the underlying network. But the fact that the separation is supported does not mean it will happen. That will be up to the legislative, judicial, and regulatory bodies in each national administration.

True separation will mean layers of competition that will more effectively marshal the industry’s resources. This will drive down costs, lowering the investment hurdles for a rich range of unanticipated services accessed from the entire array of networks and network-access devices.

Tom Ray is the Executive Vice President of Sales and Marketing at Commetrex Corporation. He is the former Senior Vice President, Sales and Business Development, for Vocalocity.
