TAPI 3.0: Excerpts From The Microsoft Whitepaper
As telephony and call control become more common at the desktop, a general telephony
interface is needed to enable applications to access all the telephony options available
on any machine. Additionally, it is imperative that the media or data on a call is
available to applications in a standard manner.
Microsoft's TAPI 3.0 provides simple and generic methods for making connections
between two or more machines, and accessing any media streams involved in that connection.
It abstracts call-control functionality to allow different, and seemingly incompatible,
communication protocols to expose a common interface to applications. Much of TAPI's
design anticipates IP telephony, a demand poised for explosive growth as organizations
begin an historic shift from expensive and inflexible circuit-switched public telephone
networks to intelligent, flexible, and inexpensive IP networks. Now in its third major
version, TAPI is suitable for quick and easy development of IP telephony applications.
INSIDE TAPI 3.0
TAPI 3.0 integrates multimedia stream control with legacy telephony. It is an evolution of
the TAPI 2.1 API to the COM model. Besides supporting classic telephony providers, TAPI
3.0 supports standard H.323 conferencing and IP multicast conferencing. TAPI 3.0 utilizes
the Windows NT 5.0 Active Directory service to simplify deployment within an organization,
and it supports Quality of Service (QoS) features to improve conference quality and
network manageability.
There are four major components to TAPI 3.0:
TAPI 3.0 COM API: In contrast to TAPI 2.1, the TAPI 3.0 API is implemented as
a suite of Component Object Model (COM) objects. Moving TAPI to the object-oriented COM
model allows component upgrades of TAPI features. It also allows developers to write
TAPI-enabled applications in any language, such as Java, Visual Basic, or C/C++.
TAPI Server: The TAPI Server process (TAPISRV.EXE) abstracts the TSPI (TAPI
Service Provider Interface) from TAPI 3.0 and TAPI 2.1, which allows TAPI 2.1 Telephony
Service Providers to be used with TAPI 3.0. It also maintains the internal state of TAPI.
Telephony Service Providers (TSPs): These are responsible for resolving the
protocol-independent call model of TAPI into protocol-specific call control mechanisms.
TAPI 3.0 provides backward
compatibility with TAPI 2.1 TSPs. Two IP telephony service providers (and their associated
MSPs) ship by default with TAPI 3.0: the H.323 TSP and the IP Multicast Conferencing TSP,
which are discussed later in this document.
Media Stream Providers: TAPI 3.0 provides a uniform way to access the media
streams in a call, supporting the DirectShow API as the primary media stream handler. TAPI
Media Stream Providers (MSPs) implement DirectShow interfaces for a particular TSP and are
required for any telephony service that makes use of DirectShow streaming. Generic streams
are handled by the application.
CALL CONTROL MODEL
There are five objects in the TAPI 3.0 API:
- TAPI.
- Address.
- Terminal.
- Call.
- CallHub.
The TAPI object is the application's entry point to TAPI 3.0. This object
represents all telephony resources to which the local computer has access, allowing an
application to enumerate all local and remote addresses. An Address object represents the
origination or destination point for a call. Address capabilities, such as media and
terminal support, can be retrieved from this object. An application can wait for a call on
an Address object, or can create an outgoing call object from an Address object.
A Terminal object represents the sink, or renderer, at the termination or origination
point of a connection. The Terminal object can map to hardware used for human interaction,
such as a telephone or microphone, but can also be a file or any other device capable of
receiving input or creating output. The Call object represents an address's
connection between the local address and one or more other addresses. (This connection can
be made directly or through a CallHub.) The Call object can be imagined as a first-party
view of a telephone call. All call control is done through the Call object. There is a
call object for each member of a CallHub.
The CallHub object represents a set of related calls. A CallHub object cannot be
created directly by an application; CallHub objects are created indirectly when incoming
calls are received through TAPI 3.0. Using a CallHub object, a user can enumerate the other
participants in a call or conference, and possibly (because of the location-independent
nature of COM) perform call control on the remote Call objects associated with those
users, subject to sufficient permissions.
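The relationships among the five objects can be sketched as a small in-memory model. This is only an illustration of the object roles described above: the class and method names (Tapi, receive_call, create_call, participants) and the example addresses are hypothetical and are not the actual TAPI 3.0 COM interfaces.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Terminal:
    """Sink or renderer at one end of a connection, e.g. a microphone or a file."""
    name: str


@dataclass
class Call:
    """First-party view of a connection from one address to another."""
    local: str
    remote: str
    state: str = "idle"
    terminals: List[Terminal] = field(default_factory=list)
    hub: Optional["CallHub"] = field(default=None, repr=False, compare=False)

    def answer(self) -> None:
        self.state = "connected"

    def disconnect(self) -> None:
        self.state = "disconnected"


@dataclass
class CallHub:
    """Set of related calls, one Call per participant."""
    calls: List[Call] = field(default_factory=list)

    def participants(self) -> List[str]:
        return [call.local for call in self.calls]


@dataclass
class Address:
    """Origination or destination point for a call, with its media capabilities."""
    name: str
    media: List[str]

    def create_call(self, remote: str) -> Call:
        return Call(local=self.name, remote=remote)


class Tapi:
    """Entry point that enumerates addresses and delivers incoming calls."""

    def __init__(self, addresses: List[Address]) -> None:
        self._addresses = list(addresses)

    def enumerate_addresses(self) -> List[Address]:
        return list(self._addresses)

    def receive_call(self, address: Address, remote: str) -> Call:
        # The hub is built here, as a side effect of the incoming call,
        # rather than by application code.
        local_call = Call(local=address.name, remote=remote, state="offering")
        remote_call = Call(local=remote, remote=address.name, state="connected")
        hub = CallHub([local_call, remote_call])
        local_call.hub = hub
        remote_call.hub = hub
        return local_call


tapi = Tapi([Address("alice@example.com", ["audio", "video"])])
address = tapi.enumerate_addresses()[0]
incoming = tapi.receive_call(address, "bob@example.com")
incoming.terminals.append(Terminal("speakers"))
incoming.answer()
outgoing = address.create_call("carol@example.com")
```

In this sketch the application only ever touches the hub through a Call it already holds, which mirrors the rule that CallHub objects are not created directly.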
MEDIA STREAMING
The Windows operating system provides an extensible framework for efficient control and
manipulation of streaming media called the DirectShow API. DirectShow, through its exposed
COM interfaces, provides TAPI 3.0 with unified stream control.
At the heart of the DirectShow services is a modular system of pluggable components
called filters, arranged in a configuration called a filter graph. A component called the
filter graph manager oversees the connection of these filters and controls the
stream's data flow. Each filter's capabilities are described by a number of
special COM interfaces called pins. Each pin instance can consume or produce streaming
data, such as digital audio. While COM objects are usually exposed in user mode programs,
the DirectShow streaming architecture includes an extension to the Windows driver model
that allows the connection of media streams directly at the device driver level.
These high-performance streaming extensions to the Windows driver model avoid
user-to-kernel mode transitions, and allow efficient routing of data streams between
different hardware components at the device driver level. Each kernel mode filter is
mirrored by a corresponding user mode proxy that facilitates connection setup and can be
used to control hardware-specific features.
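The filter graph idea can be illustrated with a toy model in which each filter declares the media type it accepts and produces, and a graph object refuses to connect mismatched neighbors. This is a conceptual sketch only: the Filter and FilterGraph classes are hypothetical, pins are reduced to two string attributes, and the "codec" is a stand-in that simply drops every other byte. None of this is the DirectShow API.

```python
from typing import Callable, List, Optional


class Filter:
    """A pluggable processing stage with one input pin and one output pin."""

    def __init__(self, name: str, accepts: Optional[str], produces: Optional[str],
                 transform: Callable[[bytes], bytes]) -> None:
        self.name = name
        self.accepts = accepts      # media type of the input pin (None for a source)
        self.produces = produces    # media type of the output pin (None for a renderer)
        self.transform = transform


class FilterGraph:
    """Plays the role of the filter graph manager: connects filters, moves data."""

    def __init__(self) -> None:
        self.chain: List[Filter] = []

    def add(self, new_filter: Filter) -> None:
        # Only connect an output pin to an input pin of the same media type.
        if self.chain and self.chain[-1].produces != new_filter.accepts:
            raise ValueError(
                f"cannot connect {self.chain[-1].name} to {new_filter.name}")
        self.chain.append(new_filter)

    def run(self, sample: bytes) -> bytes:
        # Push one sample through every filter in connection order.
        for stage in self.chain:
            sample = stage.transform(sample)
        return sample


source = Filter("capture", None, "pcm", lambda data: data)
codec = Filter("toy-codec", "pcm", "compressed", lambda data: data[::2])
renderer = Filter("network-sink", "compressed", None, lambda data: data)

graph = FilterGraph()
for stage in (source, codec, renderer):
    graph.add(stage)

result = graph.run(b"abcdef")
```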
DirectShow network filters extend the streaming architecture to machines connected on
an IP network. The Real-time Transport Protocol (RTP), designed to carry real-time data
over connectionless networks, transports TAPI media streams, and provides appropriate time
stamp information. TAPI 3.0 includes a kernel mode RTP network filter. TAPI 3.0 utilizes
this technology to present a unified access method for the media streams in multimedia
calls. Applications can route these streams by manipulating corresponding filter graphs;
they can also easily connect streams from multiple calls for bridging and conferencing
capabilities.
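To make the time stamp information concrete, the following sketch packs and unpacks the fixed 12-byte RTP header. The field layout comes from the IETF RTP specification, not from this article, and the helper names (pack_rtp_header, unpack_rtp_header) and sample values are illustrative assumptions. It does not represent the TAPI RTP filter itself.

```python
import struct

RTP_VERSION = 2
HEADER_FORMAT = "!BBHII"  # flags, marker/payload type, sequence, timestamp, SSRC


def pack_rtp_header(payload_type: int, sequence: int, timestamp: int,
                    ssrc: int, marker: bool = False) -> bytes:
    """Build a fixed RTP header with no padding, extension, or CSRC entries."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(HEADER_FORMAT, byte0, byte1, sequence & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> dict:
    """Decode the first 12 bytes of an RTP packet into named fields."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack(
        HEADER_FORMAT, data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Two consecutive audio packets: the timestamp advances by the number of
# samples carried in each packet (160 here, an assumed 20 ms at 8 kHz).
first = pack_rtp_header(payload_type=0, sequence=1, timestamp=0, ssrc=0x1234)
second = pack_rtp_header(payload_type=0, sequence=2, timestamp=160, ssrc=0x1234)
decoded = unpack_rtp_header(second)
```

A receiver can use the sequence number to detect loss or reordering and the timestamp to schedule playback, which is why a connectionless transport is workable for real-time media.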
TAPI 3.0 H.323 TSP
The H.323 Telephony Service Provider (TSP), along with its associated Media Stream
Provider (MSP), allows TAPI-enabled applications to engage in multimedia sessions with any
H.323-compliant terminal on the local-area network. Specifically, the H.323 TSP
implements the H.323 signaling stack. The TSP accepts a number of
different address formats, including name, machine name, and e-mail address. The H.323 MSP
is responsible for constructing the DirectShow filter graph for an H.323 connection
(including the RTP, RTP payload handler, codec, sink, and renderer filters).
INTEGRATION WITH WINDOWS NT 5.0 ACTIVE DIRECTORY
H.323 telephony is complicated by the reality that a user's network address (in this
case, a user's IP address) is highly volatile and cannot be counted on to remain
unchanged between H.323 sessions. The TAPI H.323 TSP utilizes the services of the Windows
NT Active Directory to perform user-to-IP address resolution. Specifically, user-to-IP
mapping information is stored and continually refreshed using the Internet Locator Service
(ILS) Dynamic Directory, a real-time server component of the Active Directory.
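A continually refreshed user-to-IP mapping can be pictured as a table whose entries expire unless the client re-registers. The sketch below assumes a simple time-to-live policy. The DynamicDirectory class, its method names, and the 60-second lifetime are hypothetical and are not taken from ILS or the Active Directory.

```python
from typing import Dict, Optional, Tuple


class DynamicDirectory:
    """Toy user-to-IP registry whose entries expire unless refreshed."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._entries: Dict[str, Tuple[str, float]] = {}  # user -> (ip, expiry)

    def refresh(self, user: str, ip: str, now: float) -> None:
        # A client calls this periodically; a new IP simply replaces the old one.
        self._entries[user] = (ip, now + self.ttl)

    def resolve(self, user: str, now: float) -> Optional[str]:
        entry = self._entries.get(user)
        if entry is None or entry[1] <= now:
            # Stale or unknown: drop the entry so it is never handed out.
            self._entries.pop(user, None)
            return None
        return entry[0]


example = DynamicDirectory(ttl_seconds=60)
example.refresh("alice", "10.0.0.5", now=0)
current_ip = example.resolve("alice", now=30)
```

Passing the clock value explicitly keeps the sketch deterministic; a real service would read the system clock.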
IP MULTICAST CONFERENCING IN TAPI 3.0
IP Multicast is an extension to IP that allows for efficient group communication. IP
Multicast arose out of the need for a lightweight, scalable conferencing solution that
solved the problems associated with real-time traffic over a datagram, best-effort
network. There are many advantages to using IP Multicast:
scalability, fault tolerance, robustness, and ease of setup. The IP Multicast conferencing
model incorporates the following key features:
- No global coordination is needed to add and remove members from a conference.
- To reach a multicast group, a user sends data to a single multicast IP address. No
knowledge of the other users in a group is necessary.
- To receive data, users register their interest in a particular multicast IP address with
a multicast-aware router. No knowledge of the other users in a group is necessary.
- Routers hide the multicast implementation details from the user.
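The send and receive behavior in the list above maps onto ordinary UDP sockets. The following is a minimal sketch using Python's standard socket module. The helper names and any group address or port passed to them are assumptions for illustration, and this is not how the TAPI conferencing provider is implemented.

```python
import ipaddress
import socket


def is_multicast(address: str) -> bool:
    """Return True if the address lies in the IP multicast range."""
    return ipaddress.ip_address(address).is_multicast


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte group/interface structure used to join a group."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_receiver(group: str, port: int) -> socket.socket:
    """Register interest in a group; the network delivers matching datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def send_to_group(group: str, port: int, payload: bytes, ttl: int = 1) -> None:
    """Send one datagram to the group address; no member list is needed."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

A receiver would call open_receiver with a group address and port and then read with recvfrom, while any sender calls send_to_group with the same pair. Neither side needs to know who else is in the group.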
TAPI 3.0 IP MULTICAST CONFERENCING TSP
The IP Multicast Conferencing TSP is chiefly responsible for resolving conference names to
IP multicast addresses, using the Session Description Protocol (SDP) conference
descriptors stored in the ILS Dynamic Directory Conference Server. It is complemented by
the Rendezvous conference controls, described later in this document. The IP Multicast
Conferencing MSP is responsible for constructing an appropriate DirectShow filter graph
for an IP multicast connection (including RTP, RTP payload handler, codec, sink, and
renderer filters).
TAPI 3.0 uses the IETF standard Session Description Protocol (SDP) to advertise IP
multicast conferences across the enterprise. SDP descriptors are stored in the Windows NT
Active Directory, specifically in the ILS Dynamic Directory Conference Server. In
contrast to the Dynamic Directory servers utilized by the H.323 TSP, there is only one ILS
Conference Server per enterprise: conference announcements are not continually
refreshed and therefore consume little bandwidth.
TAPI 3.0 RENDEZVOUS CONTROLS
The Rendezvous Controls are a set of COM components that abstract the concept of a
conference directory, providing a mechanism to advertise new multicast conferences and to
discover existing ones. They provide a common schema (SDP) for conference announcement, as
well as scriptable interfaces, authentication, encryption, and access control features.
A session description is broken into three main parts: a single Session Description,
zero or more Time Descriptions, and zero or more Media Descriptions. The Session
Description contains global attributes that apply to the whole conference or all media
streams. Time Descriptions contain conference start, stop, and repeat time information,
while Media Descriptions contain details that are specific to a particular media stream.
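The three-part structure can be seen by splitting a small SDP descriptor into its sections. The descriptor below is an invented example (names, addresses, ports, and times are placeholders), and the parser is a simplified sketch. It assumes every line after the first "t=" line belongs to a time or media section, which is enough for this sample but not for every valid descriptor.

```python
from typing import Dict, List, Tuple

SAMPLE_SDP = """\
v=0
o=alice 2890844526 2890842807 IN IP4 10.0.0.5
s=Weekly engineering review
c=IN IP4 239.1.2.3/15
t=2873397496 2873404696
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 31
"""

Section = List[Tuple[str, str]]


def parse_sdp(text: str) -> Dict[str, object]:
    """Split an SDP descriptor into session, time, and media sections."""
    session: Section = []
    times: List[Section] = []
    media: List[Section] = []
    current = session
    for line in text.strip().splitlines():
        key, _, value = line.partition("=")
        if key == "t":
            current = []            # each t= line opens a Time Description
            times.append(current)
        elif key == "m":
            current = []            # each m= line opens a Media Description
            media.append(current)
        current.append((key, value))
    return {"session": session, "times": times, "media": media}


parsed = parse_sdp(SAMPLE_SDP)
```

Here the session section carries the conference name and the multicast address, while each media section names a stream type, port, and transport.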
While traditional IP multicast conferences operating over the MBONE (IP Multicast
Backbone) have advertised conferences using a push model based on the Session Announcement
Protocol (SAP), TAPI 3.0 utilizes a pull-based approach using Windows NT Active Directory
services. This approach offers numerous advantages, among them bandwidth conservation and
ease of administration.
CONFERENCE SECURITY MODEL
TAPI 3.0's conference security system addresses who can create, delete, and view
conference announcements. The security system also serves to prevent conference
eavesdropping. TAPI 3.0 utilizes the security features of the Windows NT Active Directory
and LDAP to provide for secure conferencing over insecure networks such as the Internet.
Each object in the Active Directory can be associated with an Access Control List (ACL)
specifying object access rights on a user or group basis. By associating ACLs with SDP
conference descriptors, conference creators can specify who can enumerate and view
conference announcements. User authentication is provided by the Windows NT security
subsystem.
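The effect of attaching an ACL to each conference descriptor can be sketched with plain dictionaries. The right names ("view", "delete"), the users, and the groups below are hypothetical, and the functions are an illustration of per-user and per-group checks rather than the Active Directory security API.

```python
from typing import Dict, Iterable, List, Set

Acl = Dict[str, Set[str]]  # principal (user or group name) -> granted rights


def is_allowed(acl: Acl, user: str, groups: Iterable[str], right: str) -> bool:
    """A right is granted if the user or any of the user's groups holds it."""
    principals = [user, *groups]
    return any(right in acl.get(principal, set()) for principal in principals)


def visible_conferences(directory: Dict[str, Acl], user: str,
                        groups: Iterable[str]) -> List[str]:
    """List only the announcements this user may enumerate and view."""
    group_list = list(groups)
    return sorted(name for name, acl in directory.items()
                  if is_allowed(acl, user, group_list, "view"))


directory = {
    "all-hands": {"everyone": {"view"}, "alice": {"view", "delete"}},
    "board-review": {"executives": {"view"}, "carol": {"view", "delete"}},
}

bob_sees = visible_conferences(directory, "bob", ["everyone"])
```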
QoS AND TAPI 3.0
Quality of Service (QoS) in TAPI 3.0 is handled through the DirectShow RTP filter, which
negotiates bandwidth capabilities with the network based on the requirements of the
DirectShow codecs associated with a particular media stream. These requirements are
indicated to the RTP filter by the codecs via its own QoS interface. The RTP filter then
uses the COM Winsock2 QoS interfaces to indicate, in an abstract form, its QoS
requirements to the Winsock2 QoS service provider (QoS SP). The QoS SP, in turn, invokes a
number of varying QoS mechanisms appropriate for the application, the underlying media,
and the network, in order to guarantee appropriate end-to-end QoS. These mechanisms
include:
- The Resource Reservation Protocol (RSVP).
- Local Traffic Control (Packet Scheduling, 802.1p, and appropriate layer 2 signaling
mechanisms).
- IP Type of Service and DTR (delay, throughput, reliability) header settings.
RSVP
The Resource Reservation Protocol (RSVP) is an IETF standard designed to support resource
(for example, bandwidth) reservations through networks of varying topologies and media.
Through RSVP, a user's Quality of Service requests are propagated to all routers
along the data path, allowing the network to reconfigure itself (at all network levels) to
meet the desired level of service.
Local Traffic Control
Packet Scheduling: This mechanism can be used in conjunction with RSVP (if the
underlying network is RSVP-enabled) or without RSVP. Traffic is identified as belonging to
one flow or another, and packets from each flow are scheduled in accordance with the
traffic control parameters for the flow. These parameters generally include a scheduled
rate (token bucket parameter) and some indication of priority. The former is used to pace
the transmission of packets to the network. The latter is used to determine the order in
which packets should be submitted to the network when congestion occurs.
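The two traffic control parameters can be illustrated separately: a token bucket paces a flow to its scheduled rate, and a priority value orders packets that are waiting. This is a generic sketch of those two ideas with assumed names and numbers. It is not the Windows packet scheduler.

```python
from typing import List, Tuple


class TokenBucket:
    """Paces a flow: tokens accrue at `rate` bytes/s up to `burst` bytes."""

    def __init__(self, rate: float, burst: float, start: float = 0.0) -> None:
        self.rate = rate
        self.capacity = burst
        self.tokens = burst
        self.last = start

    def allow(self, size: int, now: float) -> bool:
        # Credit tokens for the time elapsed since the previous check.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False  # the packet must wait until enough tokens accumulate


def transmit_order(queued: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Order waiting packets by priority, highest first, keeping arrival order
    among packets of equal priority (sorted() is stable)."""
    return sorted(queued, key=lambda packet: -packet[0])


bucket = TokenBucket(rate=1000, burst=1500)
decisions = [
    bucket.allow(1000, now=0.0),
    bucket.allow(1000, now=0.25),
    bucket.allow(1000, now=0.5),
]
ordered = transmit_order([(1, "data"), (5, "voice"), (1, "mail")])
```

In the example, the second packet is held back because only 750 bytes of credit are available at that moment; by the third check the bucket has refilled to 1000 bytes.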
802.1p: Traffic control can also be used to determine the 802.1 User Priority value (a
MAC header field used to indicate relative packet priority) to be associated with each
transmitted packet. 802.1p-enabled switches can then give preferential treatment to
certain packets over others, providing additional Quality of Service support at the data
link layer level.
Layer 2 Signaling Mechanisms: In response to Winsock 2 QoS APIs, the QoS service
provider may invoke additional traffic control mechanisms depending on the specific
underlying data link layer. It may signal an underlying ATM network, for instance, to set
up an appropriate virtual circuit for each flow. When the underlying media is a
traditional 802 shared media network, the QoS service provider may extend the standard
RSVP mechanism to signal a Subnet Bandwidth Manager (SBM). The SBM provides centralized
bandwidth management on shared networks.
IP Type Of Service
Each IP packet contains a three-bit Precedence field, which indicates the priority
of the packet. An additional field can be used to indicate a delay, throughput, or
reliability preference to the network. Local traffic control can be used to set these bits
in the IP headers of packets on particular flows. As a result, packets belonging to a flow
will later be treated appropriately by layer 3 devices on the network. These fields are
analogous to 802.1p priority settings but are interpreted by higher layer network devices.
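The bit layout behind these fields can be shown with a small helper that assembles the Type of Service byte. The positions used (precedence in the top three bits, followed by the delay, throughput, and reliability bits) follow the original IP specification rather than this article, and the function names are illustrative. The apply_tos helper assumes a platform whose socket module exposes IP_TOS.

```python
import socket


def tos_byte(precedence: int, low_delay: bool = False,
             high_throughput: bool = False,
             high_reliability: bool = False) -> int:
    """Combine a three-bit precedence with the D, T, and R preference bits."""
    if not 0 <= precedence <= 7:
        raise ValueError("precedence must fit in three bits (0-7)")
    return ((precedence << 5)
            | (int(low_delay) << 4)
            | (int(high_throughput) << 3)
            | (int(high_reliability) << 2))


def apply_tos(sock: socket.socket, tos: int) -> None:
    """Ask the local stack to mark outgoing packets on this socket.

    Assumes the platform supports the IP_TOS socket option.
    """
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)


# A voice flow might request high precedence and low delay.
voice_tos = tos_byte(precedence=5, low_delay=True)
```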
ENTERPRISE DEPLOYMENT OF TAPI 3.0
TAPI 3.0 has been designed to scale from the smallest business up to the largest
organizations, while at the same time taking advantage of the Windows NT Active Directory
to bring IP telephony to the enterprise.
The ILS Dynamic Directory Servers and the ILS Dynamic Directory Conference Server
provide functionality for point-to-point and multiparty conferencing. IP telephony clients
can utilize video and audio capture equipment, but can also support legacy telephones
through the use of a PSTN add-in card. The IP/PSTN Gateway digitizes incoming analog voice
calls from PSTN lines and encapsulates them in H.323 streams, and vice versa, providing
users with the ability to send and receive legacy voice calls through existing telephony
infrastructure. The H.323 Proxy allows H.323 clients connectivity with the Internet by
forwarding H.323 streams through the enterprise firewall. This enables H.323 Internet,
Intranet, and business-to-business connectivity.
The function of the IP Multicast Proxy is somewhat similar to that of the H.323 Proxy
(to forward multicast conference packets), but it also furnishes clients with the
ability to propagate selected conference announcements to and from the Internet. The IP
Multicast Proxy monitors conference announcements stored on the ILS Dynamic Directory
Conference Server and broadcasts conferences with appropriate scope and security
attributes to the Internet using the Session Announcement Protocol (SAP).
Conversely, the IP Multicast Proxy listens for appropriate conferences from those
broadcast over the Internet and populates the ILS Dynamic Directory Conference Server with
these announcements. In this manner, the IP Multicast Proxy allows users conference
connectivity over the Internet while ensuring the confidentiality and security of private
conferences.
The information contained in this article represents the current view of Microsoft
Corporation on the issues discussed. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft. For
more information on Microsoft's CTI initiatives, TAPI 3.0, or for a complete version
of the white paper, IP Telephony With TAPI 3.0, visit the Microsoft Web site at www.microsoft.com. Direct correspondence to:
Microsoft Corporation, One Microsoft Way, Redmond, WA 98052-6399 USA.