Application Flow Management: At Last A Useful Model For The Enterprise
BY THERESA MCGUIRE AND EILEEN HAGGERTY
Today's corporate networks are crucial to the success of the business, and while
managing these networks has always been a challenge, the explosion of Enterprise Resource
Planning (ERP) applications such as SAP R/3, Microsoft Exchange, PeopleSoft, and Lotus
Notes has changed the game entirely. The network is now a major corporate asset, and the
complexities of keeping ERP applications under control and identifying and resolving
performance issues across diverse local and wide area networks will be your most
challenging task to date. You need visibility into the application layer to be able to
protect the important business processes and make certain that mission-critical
applications don't suffer. Preparing for the battle is easy if you understand the
Enterprise Service Level Management Model. Using Application Flow Management as both your
offensive and defensive weapon will not only bring you to battle readiness; it will
help ensure that you win the war against enterprise-wide network performance problems.
WHAT YOU NEED TO KNOW
How is your WAN performing? Is it meeting standard performance metrics on availability,
latency, and throughput? What is the percentage of network utilization and what protocols
are being carried? What applications are running and how much bandwidth does each
application consume? Who are the users and is there a set pattern of usage? Are any
applications fighting for network bandwidth? Can you trace the exact user who has launched
a process that is bringing your network to its knees? If you don't know the answers
to these questions when the network is running smoothly, how will you even begin to fix it
when trouble starts?
Not all network congestion problems are related to a specific protocol or application,
although it may seem that way from a cursory look at the network traffic. In reality, the
problem may be that a particular user or host is misusing the network resources,
generating excessive traffic and impacting a critical application. Application layer tools
allow you to gather information at all layers and track network usage all the way down to
the node.
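As a rough illustration of tracking usage down to the node, the sketch below totals traffic per host and per application from a handful of flow records. The record layout (source host, application, bytes) and all of the sample values are assumptions made for this example; real monitoring tools export their own formats.

```python
from collections import defaultdict

# Hypothetical flow records: (source host, application, bytes transferred).
# The layout and the values are illustrative only.
FLOWS = [
    ("10.0.0.5", "SAP R/3", 120_000),
    ("10.0.0.7", "SMTP", 40_000),
    ("10.0.0.5", "SMTP", 10_000),
    ("10.0.0.9", "Lotus Notes", 30_000),
    ("10.0.0.7", "SMTP", 900_000),
]


def bytes_by(flows, key_index):
    """Sum bytes per key (0 = host, 1 = application), largest total first."""
    totals = defaultdict(int)
    for flow in flows:
        totals[flow[key_index]] += flow[2]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def share_of_traffic(flows, host):
    """Fraction of all observed bytes that one host generated."""
    total = sum(flow[2] for flow in flows)
    if total == 0:
        return 0.0
    return sum(flow[2] for flow in flows if flow[0] == host) / total


if __name__ == "__main__":
    print("By application:", bytes_by(FLOWS, 1))
    print("By host:", bytes_by(FLOWS, 0))
    print("Share for 10.0.0.7:", round(share_of_traffic(FLOWS, "10.0.0.7"), 3))
```

In this sample, SMTP looks like the problem protocol at first glance, but the per-host view shows that a single node accounts for most of the traffic.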
Another service of great value is the ability to capture network packets and decode
them in detail, allowing a network manager to view the contents of the traffic directly.
Using this feature, a network manager can go beyond simply looking at IPX traffic, and
instead look at NetWare Directory Services (NDS) or Service Advertising Protocol (SAP) updates, and implement changes
accordingly.
Furthermore, these decoding tools should allow the management personnel to view
application-specific protocols, such as SMTP, Oracle's SQL*Net, and others. Since
this type of traffic generally is harder to filter out at the router level, having the
ability to examine application-specific traffic patterns allows for changes to be made to
the end systems directly, allowing the enterprise network to run more efficiently.
Whatever you implement should not only take such measurements but also let you
establish an Application Service Level Agreement (ASLA) with your users, in which you set
expectations for Lotus Notes or SAP R/3 response time. You should also be able to set a
series of alarms to notify you if the ASLA is being violated. On a weekly or monthly basis,
you could generate reports on how the network performed overall against the ASLAs for
mission-critical applications. Web
access, with appropriate security, would be the ideal medium to easily distribute your
network performance reports to your department manager, a corporate VP, or even your
application users.
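One possible way to express an ASLA and its alarm is sketched below. The `Asla` fields, the 2-second target, and the 95 percent compliance figure are assumptions chosen for illustration; an actual agreement would define its own terms.

```python
from dataclasses import dataclass


@dataclass
class Asla:
    """Hypothetical ASLA: a response-time target and a required hit rate."""
    application: str
    max_response_s: float   # agreed response-time target, in seconds
    min_compliance: float   # fraction of samples that must meet the target


def compliance(samples, target_s):
    """Fraction of response-time samples at or under the target."""
    if not samples:
        return 1.0  # no transactions observed, so nothing was violated
    return sum(1 for s in samples if s <= target_s) / len(samples)


def check_asla(asla, samples):
    """Return an alarm message if the ASLA is violated, otherwise None."""
    rate = compliance(samples, asla.max_response_s)
    if rate < asla.min_compliance:
        return (
            f"ALARM: {asla.application} met its {asla.max_response_s:.1f}s "
            f"target in {rate:.0%} of samples "
            f"(required {asla.min_compliance:.0%})"
        )
    return None


if __name__ == "__main__":
    notes = Asla("Lotus Notes", max_response_s=2.0, min_compliance=0.95)
    print(check_asla(notes, [0.8, 1.2, 3.5, 1.9]))
```

The same compliance figure, computed per week or per month, could feed the periodic reports.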
END USER PERCEPTION
Most end users reporting a problem to the help desk express it as an application
response time problem: "My e-mail is too slow" or "I have to wait forever to get
confirmation on an order entry." The help desk can test for network delay by pinging both
the end user and the application server to see what response times are being experienced.
Ping is a good ballpark tool, but it isn't an accurate measurement of the delay that the
user is actually experiencing because it operates at a different protocol layer than the
user's application. Deeper troubleshooting is required.
The performance of an individual application depends on the inter-related performance
of many individual network elements. Traditional network management systems are successful
at focusing on the individual components and devices of a network such as routers,
servers, and switches in the LAN, or DSU/CSUs, multiplexers, and switches in the WAN.
Application Flow Management isn't meant to replace these traditional tools but
rather to look at the data in the connections between these devices. It's the only
tool with visibility into the application layer to find the root source of a problem when
all of the individual network elements are reporting no physical or logical failures.
When a performance problem like this develops, it frequently consumes multiple IS
resources. Help desk personnel will pass what sounds like a network problem to the LAN or
WAN manager, who may investigate and determine it's not a network problem and pass it
on to a systems guru. If the systems manager eliminates the server and database equipment
as the source of the problem, then it's passed on to the application developer. Each
group uses a different set of tools for troubleshooting its piece of the network, but
none of those tools is capable of seeing the big picture from the end
user's point of view.
One of the goals of Application Flow Management is to provide a comprehensive tool that
not only delivers the big picture and the crucial new information to address each
group's separate issues, but also satisfies the necessary categories of network
management for all topologies:
- Real-time troubleshooting.
- Historical baselining and trending.
- Visibility to applications for end-user support.
- Capacity planning.
This would be a good time to review the Enterprise Performance Management Model and lay
out some guidelines.
One simple way to approach the Enterprise Performance Management Model is to correlate
its three main layers to similar functions within the OSI layers:
- Layer 1 (physical) falls under control of Connectivity Assurance.
- Layers 2 and 3 (data link and network) are checked within Network Service Level
Verification and Analysis.
- Layers 4-7 are bundled into the Application Performance Assurance layer.
Application Flow Management occurs at Layer 7, but it relies heavily on the stable
foundation provided by the two lower layers of the model. After all, if you are not
connected to the network or packets are not getting through, then you know that
application service levels are not being met either. Troubleshooting should always start
by eliminating causes at the lower layers first.
CONNECTIVITY ASSURANCE
Connectivity assurance includes physical layer connectivity checks and functions such as
diagnostics and data collection. Connectivity testing tells you whether or not the network
and its various components are functional, but it does not tell you how well they are
functioning. At this layer, we still rely heavily on discrete network components to feed
us information on basic device health and status and network port and link statistics.
NETWORK SERVICE LEVEL
Once you have determined that your basic LAN and WAN physical connections are up, then you
need to proceed to the network service level assurance layer to address a possible WAN
performance issue. Luckily, WAN performance tools have been continually fine-tuned since
the days of manual physical loopbacks and external BERT pattern testing. Advanced
management tools such as Simple Network Management Protocol (SNMP) and Remote Monitoring
(RMON) have greatly improved life for the network manager.
WAN reliability and quality have also improved enough to allow service
providers to offer specific service level agreements (SLAs) on WAN performance. An SLA
will guarantee explicit levels of network performance based on three key measurements:
network availability, latency, and throughput.
Regardless of whether you have a contractual SLA, you always need to be able to measure
those basic network parameters to quickly determine if the WAN is the problem. Accurate
measurements of latency and throughput with real-time and historical troubleshooting
capabilities will help you quickly identify a WAN problem that can be turned over to your
carrier for resolution. RMON is the leading technology used for the collection of network
performance statistics for real-time and historical reporting. In North America, the
DSU/CSU is a natural place to integrate RMON intelligence, since it historically has been
the agreed-upon demarcation point between the service provider and the user.
WAN devices with embedded RMON functionality can provide real-time and historical
reports showing network performance, throughput, latency, and availability for SLA
verification. Detailed reports can be generated on both link and circuit utilization for
capacity planning. With this information, network managers can assess whether they need to
subscribe to more or less bandwidth. WAN links are typically the most expensive item in a
telecom budget, so proper sizing and utilization are vital.
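The arithmetic behind availability and utilization reports is straightforward, as the sketch below shows. The function names, the 256 kbps circuit, the 5-minute sample interval, and the 80/20 percent sizing thresholds are all assumptions for illustration, not values from any SLA or product.

```python
def availability_pct(period_seconds, outage_seconds):
    """Percentage of the reporting period during which the link was up."""
    return 100.0 * (period_seconds - outage_seconds) / period_seconds


def utilization_pct(bytes_carried, interval_seconds, link_bps):
    """Average utilization of a link over one sampling interval."""
    return 100.0 * bytes_carried * 8 / (interval_seconds * link_bps)


def sizing_hint(peak_utilization_pct, high=80.0, low=20.0):
    """Crude sizing suggestion; the thresholds are arbitrary examples."""
    if peak_utilization_pct >= high:
        return "consider more bandwidth"
    if peak_utilization_pct <= low:
        return "consider less bandwidth"
    return "current bandwidth looks adequate"


if __name__ == "__main__":
    # 30-day month with 1,296 seconds of outage.
    print(f"{availability_pct(30 * 24 * 3600, 1_296):.2f}% available")
    # 4,800,000 bytes carried in a 5-minute interval on a 256 kbps circuit.
    print(f"{utilization_pct(4_800_000, 300, 256_000):.1f}% utilized")
    print(sizing_hint(92.0))
```

Sizing decisions would normally rest on sustained peaks across many intervals rather than on a single sample.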
APPLICATION PERFORMANCE ANALYSIS
Now that you have used connectivity assurance and network service level assurance to
eliminate the WAN as the cause, what's next? This is where application performance
assurance steps in to provide the necessary visibility into activity at Layers 3-7. It's not
just for response time problems, either. For example, application performance assurance is
useful when trying to determine how well the network will respond to changes in the types
of traffic being carried. If a network manager wanted to know what the impact of migrating
SNA traffic onto the network was going to be, he/she could use application performance
assurance tools to build predictive models that would show the effect on the network.
METHODS
Measuring and tracking the response times of critical applications allows you to set
expectation levels with your users and provides you with key information during
troubleshooting. To address our user's complaint that order entry delays are
intolerable, the three main techniques that could be used for further evaluation are:
- ICMP/protocol ping.
- Application simulations.
- Probing and observation of actual application layer requests from the user to the server
and back.
The industry-standard ICMP (Internet Control Message Protocol) ping is not the most
comprehensive, but it's the easiest and most common method used to check basic
connectivity and response time. A ping is sent to a remote device and network flight time
is calculated based on the arrival time of the echo reply.
The next two methods, application simulations and probing, get you to the right point for
troubleshooting the application layer but they will require an investment in
either additional hardware or software tools.
For application simulations, some tools use scripts to emulate a user transaction
across a network and then measure the response time, throughput, and connectivity based on
the simulated transaction. "Skinny" software agents are loaded onto network
elements, most likely the application server and the remote clients. Simulations
have some limitations in that they are launched after the fact and will not reflect
the user's exact experience. If you were troubleshooting with simulations across a
WAN, an additional concern would be the increased traffic load that may impede other
users.
The other application level approach is to observe actual user transactions. By placing
monitoring tools in local and remote sites you can obtain the information you need for
complete response time calculations. The monitoring tool closest to the user (client) will
clock the user's request as it leaves the site and clock the response as it returns to
arrive at the total application response time. The monitoring device in front of the
server will clock the request as it enters the server and the response as it exits. The
difference is the application server "think time." Subtracting server think time from
application response time gives you the network "flight time." You are observing the end
user's actual request and the server's response, not a simulation.
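The subtraction described above can be written out as a short sketch. The `Transaction` field names and the sample timestamps are assumptions for illustration. Each difference is taken between two readings from the same probe, so in this simplified view the client-side and server-side clocks do not need to agree with each other.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    """Hypothetical probe timestamps for one transaction, in seconds."""
    client_request_out: float   # request leaves the client site
    client_response_in: float   # response arrives back at the client site
    server_request_in: float    # request enters the server
    server_response_out: float  # response leaves the server


def decompose(t):
    """Return (total response time, server think time, network flight time)."""
    total = t.client_response_in - t.client_request_out
    think = t.server_response_out - t.server_request_in
    flight = total - think
    return total, think, flight


if __name__ == "__main__":
    # The client and server probes use unrelated clocks in this example.
    order_entry = Transaction(10.00, 12.50, 100.40, 102.10)
    total, think, flight = decompose(order_entry)
    print(f"total={total:.2f}s think={think:.2f}s flight={flight:.2f}s")
```

Here the server accounts for roughly two-thirds of the delay the user sees, which points the investigation at the server rather than the WAN.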
Monitoring also does not add traffic load to the network. By baselining and trending
this information over time, you can establish predictable response times and identify
which, the server or the network, is causing the delay.
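A minimal sketch of baselining, assuming a simple rule of "mean plus three standard deviations" as the limit of normal behavior. That rule, the function names, and the sample histories are illustrative assumptions; commercial tools may use quite different statistics.

```python
from statistics import mean, stdev


def is_abnormal(value, history, k=3.0):
    """True if value exceeds the historical mean by more than k deviations.

    history needs at least two samples for stdev() to be defined.
    """
    return value > mean(history) + k * stdev(history)


def likely_cause(think, flight, think_history, flight_history):
    """Compare the current think and flight times against their baselines."""
    server_slow = is_abnormal(think, think_history)
    network_slow = is_abnormal(flight, flight_history)
    if server_slow and network_slow:
        return "both"
    if server_slow:
        return "server"
    if network_slow:
        return "network"
    return "neither"


if __name__ == "__main__":
    # Hypothetical baselines, in seconds, gathered while things ran smoothly.
    think_history = [1.0, 1.1, 0.9, 1.0, 1.05]
    flight_history = [0.30, 0.35, 0.28, 0.32, 0.30]
    print(likely_cause(1.05, 0.90, think_history, flight_history))
```

With these sample histories, a flight time of 0.90 seconds stands well outside the baseline while the think time does not, so the network is the likely culprit.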
SUMMARY
Application Flow Management is all about managing the network for the business use of the
company. When the success of your company depends on the performance of your
business-critical applications, then you must be aware of the health of the network. With
the right tools and information, you can prioritize resources, direct the right people to
correct problems at the right time, decrease the time it takes to close trouble tickets,
better manage your network budget and perhaps even reduce it. Ultimately, you will have a
very satisfied user population. Isn't that what it's all about?
Theresa McGuire is product marketing manager, Paradyne Broadband Access Solutions,
and Eileen Haggerty is business development manager, NetScout Systems. Paradyne and
NetScout are partners in providing standards-based solutions designed to monitor and
ensure frame relay network service levels and monitor distributed applications from end to
end across enterprise networks. For more information, please visit their Web sites at www.paradyne.com and www.netscout.com.