July 2009 | Volume 12 / Number 7
Network Monitoring in the Enterprise
By: Richard “Zippy” Grigonis
Network monitoring often conjures up such minutiae as router traffic, Quality of Service (QoS), Quality of Experience (QoE), intrusion detection, reporting and alerting on network and systems availability, usually under the control of a friendly, web-based GUI. But in fact the concept of network monitoring is becoming part of a much larger monitoring process that involves the real-time examination and tweaking of the whole business process.
That’s not to say that paying attention to details such as QoS and network elements isn’t important. Take ADTRAN, a prime supplier of network and communications equipment. Their Enterprise Networks Division specializes in Internetworking, VoIP and IP telephony solutions to help Small-to-Medium-sized Businesses (SMBs) and enterprises implement voice, video, data and Internet connectivity over various kinds of wide and local area networks.
Jamie Britnell, a product manager in ADTRAN’s Enterprise Networks Division, says, “Any of our products capable of running voice, such as our NetVanta line or Total Access 900 series, is now subject to our Voice Quality Monitoring [VQM], which moves beyond Quality of Service [QoS], thus allowing the network administrator to examine the full data stream and identify problem areas down to the packet level using a GUI. We saw the need to implement some type of voice quality monitoring as networks transition from just a pure data pipe to more of a mixed voice and data model. Obviously, VoIP is essentially data, but you need to classify that traffic in a different way, and the QoS aspect of it is quite important compared to ‘ordinary’ data that isn’t real-time in nature. So the VQM takes a look into all of the data traffic, and specifically picks out the voice part, and allows you to do network troubleshooting and monitoring of your voice traffic.”
Britnell continues: “We look at things such as MOS [Mean Opinion Scores], packet delay, jitter, packet loss and discarded packets, using an algorithm that generates that information. All this allows the service provider to look and see what’s going on with quality on the customer side. A lot of voice quality monitoring in the past had been in the network, using things such as Brix and Empirix-type probes, and there wasn’t a lot being done by the CPE or on-site itself. With our VQM, we integrated all of that technology into the CPE device without the need to resort to some other type of probe at the customer side; everything can now be integrated into the device itself. We put all those features and functionality into our CPE device and we look at both incoming and outgoing RTP streams so you can do real-time measurements for VoIP calls. We look at both RTP streams and we can determine all of the information I mentioned earlier related to what’s going into and out of the router on the customer side. This allows the service provider to take network monitoring one step further and actually monitor the voice traffic. In looking at the RTP streams, in the stream from the service provider network the call quality could be good but the call quality from the customer side could have some issues. We troubleshoot everything from the CPE device.”
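Britnell does not describe the algorithm behind VQM, so the following is only a minimal sketch of how per-stream figures of this kind are commonly derived, not ADTRAN’s implementation. It assumes the interarrival jitter estimator from RFC 3550, a naive loss count from RTP sequence numbers (ignoring wraparound and reordering), and the R-factor-to-MOS mapping from ITU-T G.107. All function names are hypothetical.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate per RFC 3550: J += (|D| - J) / 16."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        # D compares the spacing of two packets at the receiver and at the sender.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def loss_fraction(sequence_numbers):
    """Fraction of packets missing between the lowest and highest sequence number seen."""
    expected = max(sequence_numbers) - min(sequence_numbers) + 1
    return 1.0 - len(set(sequence_numbers)) / expected


def r_factor_to_mos(r):
    """Map an E-model R-factor to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# Times in milliseconds; the third packet arrives 5 ms late.
jitter_ms = interarrival_jitter([0, 20, 40], [0, 20, 45])
loss = loss_fraction([1, 2, 4, 5])  # packet 3 never arrived
mos = r_factor_to_mos(93.2)
```

Running the same calculation separately on the incoming and outgoing RTP streams is what would let a device tell a problem on the provider side from one on the customer side. Deriving the R-factor itself from delay and loss is left out here because it depends on codec-specific parameters.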
“Then there’s our n-Command Enterprise and MSP editions of the platform which can aggregate all of the information and help the CPE devices report to us and some third-party application like a Brix or an Empirix so you get a full end-to-end analysis of what’s going on,” says Britnell. “The n-Command suite of managed services software and network productivity tools for NetVanta and Total Access-based networks helps IT administrators in that it simplifies network operation, device installation and configuration with quick device discovery, global modification of features and Access Control Lists, and you can automate individual or network-wide firmware upgrades, configuration changes or backups. It’ll also let you view and report content and historical VoIP performance statistics and generally monitor the overall health of the network by location or customer. But in terms of what VQM does, all you need is a CPE device on the customer side of the router or the IAD, and we calculate that information internal to our MOS. It’s not only good for the service provider, but the enterprise customer as well. If a company has an IP PBX or distributed enterprise type of application, they can use this information to determine call quality issues too.”
“One differentiator we have that makes our products easier to use is our GUI,” says Britnell. “Information may be out there in other equipment but sometimes it’s not presented in a very intuitive fashion, and that can be a burden for those smaller enterprise customers who may not have a full-time IT staff. So we saw the need to present all of the information in a friendly, accessible, interactive GUI.”
Whose Network is It, Anyway?
It’s becoming difficult to talk about enterprise monitoring without talking about what service providers are doing, since network connectivity to partners, suppliers and customers is on the rise. Federating data, SIP trunking and various other practices can sometimes give enterprises the impression that they’re in the network business, and the mix and interplay of outside services, overlays and CPE-based technologies adds to the complexity of any given network, making the definition of ‘network monitoring’ an interesting challenge in itself.
Over at Computer Associates, Steve Guthrie, CA’s Product Marketing Director, says, “Certainly there has been a lot going on in the area of MPLS management for service providers and large enterprises. Why is this important? Because in communications, service providers are looking at their MPLS backbone as the delivery mechanism for new services, whether it’s metro Ethernet or IPTV or video-on-demand, or hosted VoIP applications or even managed services. It’s fair to say that MPLS took its time maturing and getting its ‘feet on the ground’. But that’s behind us now and it’s the future of networking. They’re even running ATM and frame relay over it. So we see service providers needing good, deep visibility into their core traffic engineered networks and having the ability to rapidly identify where any problem originates and resolve it, because now we’re talking about service level agreements, revenue generating services and ‘stickiness’. This new technology that we’re announcing is the Spectrum MPLS Transport Manager. I asked an ISP a few days ago what all of this meant to him, and we started talking about the cyclotrons at CERN, which generate a firehose of raw data that is pushed out of CERN and it goes over an MPLS backbone network, and here in the U.S. the Energy Sciences Network receives that data and then pumps it out to Lawrence Livermore or to Fermi Labs, who then process the data and throw it back to CERN with analysis and meaning. If any degradation or outage occurs in the MPLS backbone, then that data will probably never get processed, because there’s so much more data coming down the pipe. I don’t think they discard it, but they never get a chance to process it. That’s why uptime is important to all of our products. CERN’s MPLS network has been able to achieve 99.997 percent uptime. Whenever they have an outage, they need to resolve it quickly in order to get that data back into processing mode. They figure that with our CA Spectrum MPLS Transport Manager, they’ll reduce the time to repair/resolve by an order of magnitude.”
“What does that mean? I’ll tell you,” says Guthrie. “An easy problem could take 10 minutes to resolve today, but with the CA Spectrum MPLS Transport Manager, it might take as little as a minute. That’s the one feature that I think is important on the Spectrum side of the house. On the enterprise side, our CA eHealth Performance Manager helps you improve IT service quality, increase IT efficiency and reduce cost, since it proactively manages performance of the enterprise voice and data networks, physical and virtual servers, databases, and client-server applications. We continue to improve its proactive performance management capabilities for VoIP systems and with our latest announcement we’re focused on the Cisco Unified Communications Manager as well as Siemens HiPath.”
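To put Guthrie’s uptime and repair-time figures in concrete terms, here is a small worked calculation. It is a sketch under stated assumptions: a 365-day year, and the idealized premise that a tenfold faster repair shrinks total outage time tenfold because the number of outages stays the same. The function names are illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(uptime_percent):
    """Minutes of outage per year implied by an availability percentage."""
    return (100.0 - uptime_percent) / 100.0 * MINUTES_PER_YEAR


def downtime_after_speedup(downtime_minutes, speedup):
    """Downtime if every repair were `speedup` times faster (idealized assumption)."""
    return downtime_minutes / speedup


budget = downtime_minutes_per_year(99.997)   # roughly 15.8 minutes per year
faster = downtime_after_speedup(budget, 10)  # roughly 1.6 minutes per year
```

At 99.997 percent uptime the whole year’s outage allowance is about a quarter of an hour, which is why cutting a 10-minute repair to one minute matters so much at that level.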
“CA has a third product,” says Guthrie. “It’s called CA eHealth for Voice. This is a standalone solution that handles IP telephony for Cisco, Avaya and Nortel systems, and TDM or legacy telephony management for Avaya and Nortel. What we’re doing with eHealth Performance Manager Release 6.1.1 is that we’re moving the Cisco IP telephony support into eHealth and we’re adding coverage of Siemens HiPath. There are several reasons we’re doing that. First, IT departments are asking us to help them reduce the number of tools that they use. We have clients with up to 58 monitoring and management tools and they’re trying to reduce that. Recently we worked with 2 companies. With one we helped them go from 26 to 6 tools, and in the other case we helped them transition from 14 to 7 tools. So there’s clearly a desire and trend for IT to reduce the number of tools they use. They tell us that it would be ideal to move IP telephony management into the eHealth product proper.”
Brian Bakstran, CA’s VP of Product Marketing, says, “Next, eHealth has some very significant built-in intelligence concerning ‘time-over-threshold’ and ‘deviation from normal’. That allows us to filter out what I’ll call ‘infrequent spikes’. In the case of telephony, the CPU of the call processing server and its memory usage may spike because of a transient high call demand. So you have a CPU that spikes once or twice in any given 1-hour period, say, and you must ask yourself whether that’s something you really should take a look at, or is it truly transient and it’s going to come-and-go, and it’s not going to have an impact on your users. With time-over-threshold and deviation from normal, eHealth is able to filter out those transient events and only alert you concerning what it determines to be systemic events. This is powerful technology based on built-in intelligence, to help IT focus on real problems and not be distracted by the occasional spike or two.”
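The two filters Bakstran names can be illustrated with a short sketch. This is not CA’s algorithm, which the article does not describe. It assumes “time-over-threshold” means the fraction of samples in a window that exceed a limit, and “deviation from normal” means the window’s mean sitting more than k standard deviations from a historical baseline. The function names, the threshold values and the choice of k are all hypothetical.

```python
from statistics import mean, pstdev


def time_over_threshold(samples, threshold, min_fraction):
    """True if at least `min_fraction` of the samples exceed `threshold`."""
    over = sum(1 for s in samples if s > threshold)
    return over / len(samples) >= min_fraction


def deviates_from_normal(samples, baseline, k=3.0):
    """True if the window mean is more than k baseline standard deviations from normal."""
    return abs(mean(samples) - mean(baseline)) > k * pstdev(baseline)


def is_systemic(samples, baseline, threshold=80, min_fraction=0.25, k=3.0):
    """Alert only when both filters agree, so isolated spikes are ignored."""
    return (time_over_threshold(samples, threshold, min_fraction)
            and deviates_from_normal(samples, baseline, k))


baseline = [28, 30, 32, 30, 29, 31]  # typical CPU percent for this server
spiky_hour = [30] * 58 + [95] * 2    # two transient spikes in 60 one-minute samples
busy_hour = [30] * 20 + [90] * 40    # sustained load for 40 minutes

alert_on_spikes = is_systemic(spiky_hour, baseline)
alert_on_busy = is_systemic(busy_hour, baseline)
```

With these inputs the two-spike hour raises no alert while the sustained hour does, which matches the behavior Bakstran describes: transient events are filtered out and only systemic ones surface.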
Monitoring as a Service
Brad Canham, Vice President of Business Development at Dotcom-Monitor, says, “Once a company signs up and has our service, they can pick the various pieces of monitoring tools they need to use. It’s different than some other companies. It’s not an ‘a la carte’ service where you buy, say, VoIP monitoring and that’s what you get. We provide the whole suite and then people can use a tool and then stop using it, and then add more tools as they see fit. That is a trend in the industry that makes sense: companies wanting to use services as they’re needed. It’s the benefit of a SaaS suite rather than buying a big application and piece of infrastructure to do monitoring.”
“In terms of what we monitor and why, our focus is on server performance monitoring,” says Canham. “That’s how Dotcom-Monitor was built from the ground up: for IT administrators who are responsible for SLA requirements and server performance. So all the pieces of the product, everything from detection to the diagnostics on the back-end to the notification processes, are very specifically built for SLA management and server performance monitoring. A key piece that makes Dotcom-Monitor a bit different is just the way we manage volume. The Dotcom-Monitor user interface and the way the workflow is set up are arranged to ensure IT administrators can easily manage large volumes of tasks and/or ‘devices’ in a straightforward manner.”
Vadim Mazo, President and CTO of Dotcom-Monitor, says, “With SaaS, people come to our website and they sign up for a package of functions and devices they want to monitor. We have servers that keep the data and we have 15 distributed agents around the world, on separate backbones. They give good coverage of the worldwide Internet infrastructure. Those remote agents perform actual monitoring. For example, if you have a website and you would like to make sure it’s accessible from Japan, Germany, Canada, Texas, and so forth, those monitoring agents will perform the actual tests. And so we can help you figure out whether there’s a problem with, say, the Asia network.”
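Mazo’s point about localizing a fault from several vantage points can be sketched as follows. This is an illustration, not Dotcom-Monitor’s code: the agent names, the region labels and the rule “a region is suspect only when every agent in it fails” are all assumptions, and the test results are supplied as plain data rather than gathered over the network.

```python
from collections import defaultdict


def failing_regions(results):
    """Return regions where every agent's check failed.

    `results` is an iterable of (agent, region, ok) tuples, one per remote agent.
    """
    by_region = defaultdict(list)
    for _agent, region, ok in results:
        by_region[region].append(ok)
    # A single success in a region suggests the site is reachable from there.
    return sorted(region for region, oks in by_region.items() if not any(oks))


# Hypothetical outcome of one round of checks against the same website.
results = [
    ("tokyo", "Asia", False),
    ("singapore", "Asia", False),
    ("frankfurt", "Europe", True),
    ("toronto", "North America", True),
    ("dallas", "North America", False),
]

suspect = failing_regions(results)
```

Here both Asian agents fail while other regions still reach the site, so the problem is more likely in the Asian network path than on the server itself. The single failure in Dallas is not enough to flag North America under this rule.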
Dotcom-Monitor’s new SIP Monitor for SMB VoIP service manages SLA requirements and hybrid VoIP traffic routes and proactively mimics the end-user’s perspective from external locations, rather than only relying on passive internal network analysis systems. It can quickly identify and pinpoint VoIP connectivity error conditions. SIP Monitor is a simple, cost-effective external system, rather than a large, expensive in-house system. Indeed, it can be configured and managed with little or no IT expertise. Moreover, its proactive monitoring ensures that connectivity errors can be addressed before the errors become downtime problems for customers. The SIP monitoring service ensures SMBs can rely on their VoIP systems, VoIP service providers can monitor their infrastructure, VoIP wholesalers can monitor service provider connectivity and reliability, and VoIP VARs and managed service providers can count on client uptime and revenue.
Seeing the Forest from the Trees
As perhaps the world’s leading provider of broadband satellite services, networks, and products for enterprises, governments, small businesses, and consumers, Hughes Network Systems, a subsidiary of Hughes Communications, offers complete turnkey solutions, including program management, installation, training, maintenance and support for professional and rapid deployment anywhere, worldwide. Their Network Operations Center is staffed by engineers who monitor customer networks 7x24. It’s all backed by an extensive field operations organization that provides quick service to all Hughes customers.
Hughes Network Systems’ Doug Medina, Senior Director of Marketing, says, “I’m on the B2B side of our business. Since 1985 we’ve been servicing large enterprise customers. Walmart was one of our first customers and still is. Our strength is providing managed network services to large, distributed enterprises in North America, such as BP, Sonic Restaurants, Exxon, Blockbuster, and so forth. They come to us because we can securely and cost-effectively interconnect all of their distributor branch locations back to their headquarters through a network management portal we call a Customer Gateway.”
“Everyone thinks we’re the satellite guys,” says Medina. “That’s true. If you’re a consumer who can’t get on the Internet because you can’t get DSL or cable, HughesNet is one company you’d approach for satellite service. On the B2B side, we’ve not only done satellites since the late 1980s, but we’ve also added a whole LAN-based suite of services. So not only do we sell broadband satellite, but we also offer cable, DSL, T1, 3G wireless and so forth, all under one network management umbrella. The way the service is delivered, every one of our large enterprise customers gets the portal access, which is not-too-originally named the Customer Gateway. It’s a managed network portal. For example, if you’re BP, you can get online and see your 12,000 locations. It doesn’t matter whether they’re on DSL or satellite. You can see just with one view a U.S. network and all the locations with nodes in green, red or yellow, depending on whether they are on DSL or broadband satellite. The first tier help desk can set trouble tickets. They can go in and actually see pictures of their installations. They can also see moves, adds and changes. They can also obtain performance and management capabilities. For example, you can see the top 20 applications that are running across the network, and you can get an idea if one site is very ‘chatty’ or using more bandwidth than you anticipated. You can ‘drill in’ and take a look at it and get network reports and graphs to determine what applications are running and why the thing is using bandwidth at 3 a.m. when that shouldn’t be happening. That’s our Customer Gateway Portal. One of its claims to fame is that it is totally agnostic to the underlying broadband infrastructure.”
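The kind of report Medina describes, top applications by traffic plus sites using more bandwidth than anticipated, reduces to a simple aggregation. The sketch below is illustrative only and says nothing about how the Hughes portal is built. The flow-record layout (site, application, bytes), the per-site budget dictionary and the function names are all assumptions.

```python
from collections import Counter


def top_applications(flows, n=20):
    """Rank applications by total bytes across all sites."""
    totals = Counter()
    for _site, app, nbytes in flows:
        totals[app] += nbytes
    return totals.most_common(n)


def chatty_sites(flows, expected_bytes):
    """Sites whose total traffic exceeds the budget anticipated for them."""
    per_site = Counter()
    for site, _app, nbytes in flows:
        per_site[site] += nbytes
    return sorted(site for site, total in per_site.items()
                  if site in expected_bytes and total > expected_bytes[site])


# Hypothetical flow records: (site, application, bytes transferred).
flows = [
    ("store-1", "pos", 500),
    ("store-1", "video", 9000),
    ("store-2", "pos", 700),
    ("store-2", "email", 300),
]

top_two = top_applications(flows, n=2)
over_budget = chatty_sites(flows, {"store-1": 5000, "store-2": 5000})
```

Because the records carry no notion of access type, the same report works whether a site is on DSL or satellite, which is the transport-agnostic view Medina emphasizes. Adding a timestamp field would allow the 3 a.m. question to be answered the same way.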
“As for trends,” says Medina, “our customers, particularly when deploying large networks, and opening stores, for example, want the ability to schedule installations, schedule maintenance and see a picture of what the installation looks like in case there’s a problem. Since they’re doing tier 1 help desk and we’re tier 2 in support of their tier 1 help desk, they’re keen on that capability, because it acts as their eyes and ears out there. They can see a photo of what the installation is supposed to look like when it’s finished, and if there’s any problem they may tell a non-technician out at the store to plug something in, jiggle a cord or check to see if a particular light is on. You can also check to see if a site is using backup communications. Some of our customers actually buy dual paths, such as a landline connection and a wireless overlay connection.”
Monitoring Not Just Your Network, But Your Business Too
As business processes become communications-enabled, the formerly simple act of network monitoring becomes part of a larger overall monitoring activity involving the whole organization as a system.
For example, Nimsoft Monitoring Solutions (NMS) offers technology for monitoring the performance and availability of your entire IT infrastructure, both physical and virtualized. NMS strives to achieve ease of use and speed of deployment in monitoring, even when confronted with monitoring requirements ranging from emerging companies having a few servers to an MSP managing several network operations centers, or a Fortune 100 enterprise seeking to monitor vast mission-critical business services and the entire infrastructure. To that end (and also expanding on the idea), the Nimsoft Monitoring Solution consists of integrated event, performance and availability, end-user response, service level and Business Service Management (BSM) monitoring, with bi-directional data integration into related applications such as CMDB (Configuration Management Database) and service desk. Nimsoft’s broad approach means that they’re really presenting a business service and can aggregate data across various disciplines.
Among the many functional modules in Nimsoft’s portfolio, the new Nimsoft BSM Express can put IT infrastructure performance into a business context by monitoring key performance indicators of business services, and presents these metrics along with user experience and infrastructure health information, thus providing actionable insights into how to tune IT performance so it enables maximum business productivity. BSM Express thus shows how well IT services align with business goals, enabling the business to quickly visualize the health of their services and to account for and measure the financial impact that poor performance or downtime has on the organization.
Nimsoft’s product suite also offers monitoring solutions for verifying SLAs, databases and applications, the latter giving administrators a complete, multi-layered view of such apps as Exchange, Active Directory, VoIP, Apache, Tomcat, JBoss, WebSphere, WebLogic, Citrix, and many others, as well as .NET, J2EE, and custom applications. Nimsoft can do everything from providing a picture of host server performance metrics to reporting on the actual performance end-users experience, which aids in troubleshooting issues. (User experience is measured with both active and passive solutions, which can be used individually or in tandem.)
NMS’ single, administrator-friendly console monitors all core server resources and aids in centralized management of remote processes and services. Administrators can also leverage monitoring data using real-time alarm dashboards, performance trend reports, and SLA compliance reports. NMS “dashboards” can yield a single view of such metrics as help desk call statistics, application performance metrics, IT resource utilization, and much more. NMS dashboards can be accessed remotely via the web.
Chris O’Connell, Director of Product Management at Nimsoft, says, “We’ve been in the business for 10 years and have 800 customers, 200 of which are MSPs, a fact of which we’re quite proud. Then there are all the different verticals such as banking, government, retail, and so forth. We’re also quite proud of our three-tiered architecture, unlike that of many of our competitors. It’s very modern, ‘lightweight’ and adaptable. Our open application interface even enables you to code the agent probes. As IT gets closer to the nature of a business and everybody is paying attention to where the dollars are going, many of our customers have been asking us about Business Service Management. They want to allocate resources and processes to what they’re doing in terms of their business. They want to prioritize all of their IT based on the business, and not just some of it. Many customers ask for this technology. Back in July 2008 Nimsoft acquired a company called Indicative Software, which had some great technology in the area that we adapted, and we have launched a new product called the Nimsoft Monitoring Solution with BSM Express. Customers have the infrastructure and the applications, but now they need to understand how it all relates to the business process. They not only need a nice dashboard, but they need to be able to break out the data by the components of the supply chain. They need more information than just, ‘is the network and the applications running on it healthy or not’. They need to understand how the components are performing based on IT. The middle level executives and administration people are being squeezed by the CIOs and by the lines of business to deliver on the ability to prioritize IT and show in the boardroom what IT is doing for the business and how it reacts to business challenges. It’s been an interesting year as we’ve been building our product line.”
“It’s all about visualization, as well as separation and aggregation of information, and also of dynamics,” says O’Connell. “This isn’t business intelligence where you run a report every night. This is a real-time dynamic where customers need to know what’s going on right now. They don’t have the luxury of waiting 24 hours or running reports. We see that there is a play where customers having 1,000 or so employees are really demanding this type of technology. Interestingly, MSPs are serving their own customers by delivering service portals, as it were, and they’re looking for something like BSM to plug into that portal so that they can be more transparent to their customers and show how they’re delivering on the contract that they signed with them.”
Getting Lost in the Machinery
What started out as a look at enterprise network monitoring progressed to an examination of how service providers and enterprises deal with each other’s turf, and finally how monitoring
The following companies were mentioned in this article:
ADTRAN - (www.adtran.com)
Computer Associates - (www.ca.com)
Dotcom-Monitor - (www.dotcom-monitor.com)
Hughes Network Systems - (www.hughes.com)
Nimsoft - (www.nimsoft.com)