
Performance Monitoring

Peter J. Welcher


Introduction

In teaching the CiscoWorks class and doing network management consulting, one of the hot topics I frequently encounter is "what tool should I use for Performance Monitoring?" I've looked at a few of the available products, and I'd like to share some of my impressions with you, the reader, this month.

Just as "Network Management" seems to mean different things to different people, so does "Performance Monitoring". This whole topic seems to be good for starting debates, so I'm expecting to get lots of email telling me about neat products I overlooked, things I got right or wrong, etc. I'm extra opinionated this month, so don't let me down!

Device Managers

Performance Management might refer to an occasional look at a device to see how it's doing. The look might be a Cisco router show command, or it might be via some graphical interface, perhaps CiscoView. That's certainly important. It's going to become a lot simpler to do this, from what I'm hearing. A lot of vendors will be coming out with Web-Browser network management (monitoring?) tools. I'd prefer to call such tools "Device Managers". They are typically used to interact with a single device.

Vendors certainly have an incentive to provide such Device Managers. The use of a Web format frees the vendor from having to worry about the user's choice of computer platform. Less obviously, it frees the vendor from having to code to the requirements of HP, IBM, Sun, or Cabletron. Reduced Time-To-Market and simplicity mean lower cost, which is a darn powerful incentive for vendors to do this sort of thing. It may well mean more innovation, with vendors using the quality and power of their Web interface as another feature distinguishing their product.

That's great!

I do find myself asking, why would we want to interact with a single device? There seem to be two main reasons:

(How often do you have the time to browse your routers and switches, perhaps port-by-port, looking for trouble? I don't think we've had any blue moons lately!)

Moving Beyond Single Devices

What people do from this point on seems to vary a good bit: The first of these is more of a trouble-ticket response and resolution technique. It still requires that you have a trouble report before you start to react. Realistically, that's the level a good many organizations are operating at right now. It works, it's all that folks have time for, and it certainly is possible to spend a great deal of money and not end up with much more capability.

Most organizations would like to be pro-active, however. That's the attraction of Performance Management software and/or the RMON / Trap approach. The difficult part lies in the details, and it's curiously hard to tell from the press releases how easy or hard a product is to put into use. That means word-of-mouth and consultants play a role in the adoption of such products, and it takes a while before they become prevalent.

Some analogies come to mind. Some Performance Management software is like the way some companies manage people. They have centralized management, with one first-level manager for every X employees, one second-level manager for every X first-level managers, and so on. This is what "distributed management" seems to come down to right now, generally with few management stations.

The RMON approach is like the TV stations that get people to call news in on their car phones. (How come we don't have a network protocol that advertises where to send SNMP Traps?) Again, the issue here is how easy it is to set up or change the setup.

What's best? Perhaps a combination of the two approaches. That's like using car phones to report emergencies, and otherwise reporting routine news from a home or office phone, in a regularly scheduled call.

For the RMON approach to be pro-active, we might have to send SNMP Traps not just when there's a real problem, but when some normal thresholds are exceeded. That's like having your smoke detector go off because you're exceeding the EPA smog limits. Since my home cooking does this frequently, I can honestly report that it leaves you with the urge to pull the batteries out of the smoke detector. If the Traps go to a trouble-ticket system, we may be able to distinguish between true "fires" and the messages that say "heads up, the WAN links are getting busy". It can be done, but it takes work to meaningfully filter messages based on Trap contents.
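To make the "fire" versus "heads up" distinction concrete, here is a minimal Python sketch of sorting incoming Traps by their contents. The trap names linkDown, coldStart, and risingAlarm are standard SNMPv1 and RMON trap names; the Trap record, the 90% critical level, and the choice of which kinds count as fires are assumptions for illustration, not any product's rules.

```python
from dataclasses import dataclass

@dataclass
class Trap:
    source: str         # device that sent the trap
    kind: str           # trap name, e.g. "linkDown" or "risingAlarm"
    value: float = 0.0  # sampled value (percent) carried by threshold alarms

# Assumed policy: hard failures are always "fires"; RMON rising-threshold
# alarms are only "heads up" unless the sample is past a critical level.
FIRE_KINDS = {"linkDown", "coldStart"}
CRITICAL_PERCENT = 90.0  # hypothetical cutoff

def classify(trap: Trap) -> str:
    if trap.kind in FIRE_KINDS:
        return "fire"
    if trap.kind == "risingAlarm":
        return "fire" if trap.value >= CRITICAL_PERCENT else "heads-up"
    return "ignore"

def triage(traps):
    """Split traps into queues: page someone for fires, review heads-ups daily."""
    queues = {"fire": [], "heads-up": [], "ignore": []}
    for trap in traps:
        queues[classify(trap)].append(trap)
    return queues
```

Keeping the policy in two small tables (the fire kinds and the cutoff) is what makes it practical to tune the filtering without rewriting it, which is the "takes work" part of the job.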

What Are We Really Looking For?

I'm looking for a little more. One of the best troubleshooting tricks is to be able to answer "what changed?" So I'd like to snapshot a network, so that I can watch it change over time. And so that when I see 80% utilization on a WAN link, I can find out (quickly!) whether the utilization has been creeping upwards over time, or whether this is a sudden increase.
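Answering "creeping or sudden?" starts with turning raw interface counters into utilization. Here is a minimal sketch, assuming you poll the MIB-II ifInOctets (or ifOutOctets) counter and know the interface's ifSpeed. The 20-point jump used to call a change "sudden" is an arbitrary, hypothetical choice.

```python
COUNTER32_WRAP = 2 ** 32  # MIB-II ifInOctets/ifOutOctets are 32-bit counters

def octet_delta(prev: int, curr: int) -> int:
    """Octets counted between two polls, allowing for one counter wrap."""
    return (curr - prev) % COUNTER32_WRAP

def utilization_percent(prev: int, curr: int, seconds: float, if_speed_bps: int) -> float:
    """Percent of ifSpeed used in one direction over the polling interval."""
    bits = octet_delta(prev, curr) * 8
    return 100.0 * bits / (seconds * if_speed_bps)

def classify_change(history, latest, jump=20.0):
    """history: earlier utilization samples (percent), oldest first.
    'sudden' if latest jumped since the previous sample, 'creeping' if it
    has drifted up since the oldest sample, otherwise 'steady'."""
    if latest - history[-1] >= jump:
        return "sudden"
    if latest - history[0] >= jump:
        return "creeping"
    return "steady"
```

For example, 28,950,000 octets in 300 seconds on a T1 (ifSpeed 1,544,000) works out to 50% utilization. For a full-duplex serial link, compute input and output separately and report the larger, since either direction can be the bottleneck.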

But I don't want a wad of paper. Paper is bulky, and I can't search it easily. More than that, the "wad of paper" doesn't tell me what's important. I want reports that point me at potential trouble spots.

Since your wishes may be a little different, here's a list of features Performance Management software might provide.

Let's try looking at products, drop a few names, see what's available in the marketplace.

Performance Management Products

The first category of products that comes to mind is the Net Management base platform packages: IBM NetView for AIX (SystemView, etc.), HP OpenView, the former SunNet Manager, Solstice Enterprise Manager, or Cabletron Spectrum. The first three can do polling. NetView for AIX can even store data into one huge table in a SQL database. Graphics they've got, with printing via screendump in NetView/OpenView/SunNet Manager. That matches one of my wish list items, and can match a few more with some work. Just do it in your spare time! (That is not very cost-effective unless you're a Very Big Organization and can afford to customize things.) Figuring out which interface is busiest can involve a lot of clicking. If you want printed graphs, that means more custom work, or someone gets to click through various menus a lot. In other words, such tools are okay for trouble reporting. But for Performance Management, they're only good for watching a small number of devices, not so good if you want to watch 100-200 or more routers and get information rather than data.
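The "which interface is busiest?" question is the kind of thing a few lines of code answer better than a lot of clicking. A minimal sketch, assuming you already have utilization samples (percent) per router interface from whatever does the polling; the tuple layout is hypothetical.

```python
from collections import defaultdict
from statistics import mean

def busiest_interfaces(polls, top_n=10):
    """polls: iterable of (router, interface, utilization_percent), one tuple
    per interface per polling interval. Returns (router, interface, peak,
    average) rows, highest peak first."""
    samples = defaultdict(list)
    for router, ifname, pct in polls:
        samples[(router, ifname)].append(pct)
    rows = [(router, ifname, max(v), mean(v)) for (router, ifname), v in samples.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top_n]
```

Sorting by peak rather than average highlights links that saturate at busy hours even when they look fine over a whole day; the average column keeps that in perspective.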

A company named 3DV (www.dddv.com) has products which run on Windows NT. The product allegedly scales to 150-250 routers managed. The attraction of the product is simplicity: install it (and lots of disk space), point it at your routers, and let it cook. Every Monday morning you archive old data, move the data around, generate reports, etc. That's it! The downside is simplicity: there isn't much you can change. The reports look pretty useful, although the problem thresholds seem rather conservative (as in, if it tells you you've got a problem, you may really have a problem!) Graphics and drill-down add detail to the on-line reports. Telling statement (paraphrased) from an anonymous source: "It paid for itself the first time we used it. We spotted a couple of T1's that weren't being used. We'd botched the IP addresses on the interfaces."
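The unused-T1 story suggests a cheap check any polling setup can run: flag links that carried almost no traffic over a whole week. This is a sketch of that idea, not a description of how 3DV's reports work; the one-megabyte weekly floor and the (router, interface) keys are assumptions.

```python
def idle_links(weekly_octets, min_octets=1_000_000):
    """weekly_octets: dict of (router, interface) -> octets in + out over a week.
    Returns the links below the (hypothetical) floor, sorted for the report."""
    return sorted(link for link, total in weekly_octets.items() if total < min_octets)
```

A paid-for circuit showing up here is worth a look at its interface configuration before anything else.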

Concord Communications (www.concord.com) has a router Network Health Report, and they are now also doing Frame Relay reporting. The UNIX-based Concord tools sound like 3DV on steroids, ditto the pricing. Looks pretty functional from the glossies but I haven't encountered anyone using the Concord software. My quick impression is that if you're in this price range you might be using some RMON probes instead, depending on your topology. If you really like or hate this product, please let me know (and why, and if it's ok to pass it on)!

Visual Networks (www.visualnetworks.com) sells a line of CSU/DSU units with integrated WAN RMON-like probes. They also sell reporting software. I believe their line is now also being resold as part of the HP NetMetrix line. Specialties include per-PVC traffic-by-protocol breakdown. If you have a star Frame Relay topology with the hub router on an Ethernet backbone, you might be able to achieve much the same result by putting an RMON probe on the backbone Ethernet. The Visual software comes in PC or UNIX versions, and can be used in a star topology with just one of their probes, at the cost of some minor loss in reporting capability. If you use their probes exclusively, proprietary CSU/DSU-to-CSU/DSU delay measurements are apparently available.

HP NetMetrix is perhaps the most complete RMON probe/software product line, and also has a large market share. It consists of various software packages and hardware probes. NetMetrix can apparently profile network traffic patterns and then set its own thresholds, which is useful if you're pressed for time. It looks good in the glossies. It also looks like you can spend a lot of money on it.
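Letting the tool set its own thresholds usually means learning a baseline from past samples. One common statistical approach, shown here as a sketch and not as a claim about NetMetrix's actual method, is to alarm when a sample exceeds the mean plus k standard deviations; k = 3 and the 10% floor are hypothetical defaults.

```python
from statistics import mean, pstdev

def baseline_threshold(samples, k=3.0, floor=10.0):
    """Alarm threshold (percent) learned from past utilization samples.
    The floor keeps a nearly idle link from alarming on tiny bursts, and
    the result is capped at 100%."""
    learned = mean(samples) + k * pstdev(samples)
    return max(floor, min(100.0, learned))

def exceeds_baseline(samples, value, k=3.0):
    """True if a new sample is above the threshold learned from history."""
    return value > baseline_threshold(samples, k)
```

A link that normally swings between 10% and 30% would get a threshold of 50%, while a steadily busy link gets a threshold close to its usual level.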

RMON software in general is a moving target right now. Frontier Software (www.frontier.com) is partnered with Network General, Cabletron, and Cisco, and in terms of market share they are #2 after HP. Competitors Axon and Armon were recently bought by 3Com and Bay Networks, which may affect their future impact in terms of innovation (in which direction, I'm not sure). In general, network management software is expensive to develop and serves a relatively small market, so it's hard to predict what we'll be seeing in this category in the future. The core functionality usually is an RMON probe Device Manager plus software that reports on what the probe is seeing. Usually meters show LAN traffic utilization, errors, and so on. Packet analysis and multi-probe reporting capabilities are natural directions in which to grow this feature set.

TRENDsnmp+ (Desktalk Systems, www.desktalk.com) is another interesting-looking product. It comes with a Sybase runtime database and runs on UNIX. The GUI lets you set up polling, build reports or graphs, and automate printing of reports or graphs if you so desire. It comes with an impressive-looking set of canned reports, including level-of-service agreement reports and graphs. If you don't like the level-of-service weights, you can define your own. Free 30-day trial, some minor quirks, but my impression and the feedback so far is "pretty good, and a heck of a lot cheaper than most alternatives." They also have a scheme for distributed pollers in a large network, with TCP-based regular (nightly?) transfer of data back to a central database. This looks like a very good tool if you want regular network reporting, access to old data, and automated disposal of data after a specified period of time.
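Automated disposal of old data is often combined with rolling detail up into summaries before it is discarded, so you keep the long-term trend without the bulk. A minimal sketch of that idea, assuming rows of (timestamp, link, utilization percent); the 30-day window and daily averaging are hypothetical choices, not TRENDsnmp+'s actual scheme.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

def age_out(rows, now, keep_days=30):
    """rows: list of (timestamp, link, utilization_percent).
    Keeps raw rows newer than keep_days; older rows are collapsed into one
    daily-average row per (date, link) instead of being kept in full."""
    cutoff = now - timedelta(days=keep_days)
    recent = [r for r in rows if r[0] >= cutoff]
    old = defaultdict(list)
    for ts, link, pct in rows:
        if ts < cutoff:
            old[(ts.date(), link)].append(pct)
    daily = sorted((day, link, mean(vals)) for (day, link), vals in old.items())
    return recent, daily
```

Run weekly, this keeps a month of full detail and a compact daily history behind it, which is usually enough to answer "has this been creeping up?"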

I'm very familiar with the Netsys (www.netsystech.com) Performance Tool, from teaching the class on it. It is a superset of the Connectivity Tool, and runs on UNIX. The idea of the Connectivity Tool is that you feed it your Cisco router configs; it parses them and provides a sanity check and error report. It also displays your network topology, as inferred from the config files. It simulates routing tables (including access lists, distribute lists, IPX SAP filters, redistribution, etc.). A previous article discussed this aspect of Netsys in more detail; see http://www.netcraftsmen.net/welcher/papers/netsys.htm . If you're trying to do Cisco router config offline testing, problem resolution, or change control, you need this tool now.

The Netsys Performance Tool builds on the above capabilities. It automates collection of MIB-II and Cisco MIB data, or IP Accounting data, or Frontier Extended RMON data. It comes with a script for translation of Sniffer(TM) CSV files to its format. The ASCII data format is well-documented, and a simple Perl script would be all you'd need to convert any reasonable data format into the desired form.
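The conversion point is easy to illustrate. The Performance Tool's real file layout is not described here, so both sides of this Python sketch (the author suggests Perl; the idea is the same) are hypothetical: an input CSV with Station, Interface, InBytes, OutBytes, and Seconds columns, and an output line of router, interface, input bits/s, and output bits/s.

```python
import csv
import io

def convert(csv_text: str) -> str:
    """Turn hypothetical CSV byte counts into hypothetical
    'router interface in_bps out_bps' lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        seconds = float(row["Seconds"])
        in_bps = int(row["InBytes"]) * 8 / seconds
        out_bps = int(row["OutBytes"]) * 8 / seconds
        lines.append(f"{row['Station']} {row['Interface']} {in_bps:.0f} {out_bps:.0f}")
    return "\n".join(lines) + "\n"
```

Swapping in the real column names and output fields from the tool's documentation is the only change needed for a given data source.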

The idea of the Performance Tool is to snapshot your network's traffic and other information, load it in, and run an analysis on it. There is Cisco internal router performance modelling data built in, so the tool can estimate router CPU utilization. It builds text reports that can be accessed on-line, with drill-down for interface and similar details. On-line reports can be sorted on any column by clicking on the column header, a very nice touch. There are also summary reports pointing to key routers and links. All this can be printed or saved to file as desired.

But the neat part of this is that the reports are also visually available, as a colorized form of the network topology map. So, for example, the Utilization report colors links and routers to reflect their respective percent utilization. Glance at the map, and see in red or orange all the busy links and devices. Numerous other colorized displays provide insight into your network. They include (among others):

The best part of all: you can simulate link or router failures, re-analyze, and watch what that does to the utilization and other statistics! Best comment I've heard: "My boss doesn't like tables; they're too technical. He understands a map showing red."
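The what-if idea can be sketched in a few dozen lines of Python: route each traffic demand over the fewest-hop path, total the load on each link, color each link by utilization, then remove a link and do it again. Hop-count routing, the site names, and the color bands are simplifying assumptions; the Performance Tool simulates the routing in your actual router configs, which this sketch does not attempt.

```python
from collections import deque

# Hypothetical color bands: (upper bound in percent, color); anything above is red
BANDS = [(25.0, "green"), (50.0, "yellow"), (75.0, "orange")]

def color_for(pct: float) -> str:
    for upper, color in BANDS:
        if pct < upper:
            return color
    return "red"

def shortest_path(links, src, dst):
    """Fewest-hop path over undirected links (frozenset pairs), or None."""
    adj = {}
    for link in links:
        a, b = tuple(link)
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in sorted(adj.get(node, [])):  # sorted for repeatable results
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        return None
    path = [dst]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]

def link_utilization(links, capacity_kbps, demands_kbps):
    """Percent utilization of each link when every (src, dst) demand
    follows its fewest-hop path; unroutable demands are dropped."""
    load = {link: 0.0 for link in links}
    for (src, dst), kbps in demands_kbps.items():
        path = shortest_path(links, src, dst)
        if path is None:
            continue
        for a, b in zip(path, path[1:]):
            load[frozenset((a, b))] += kbps
    return {link: 100.0 * load[link] / capacity_kbps[link] for link in links}

def colored_map(links, capacity_kbps, demands_kbps):
    """Map each link (as a sorted name pair) to its display color."""
    util = link_utilization(links, capacity_kbps, demands_kbps)
    return {tuple(sorted(link)): color_for(pct) for link, pct in util.items()}
```

With three sites HQ, A, and B fully meshed on 1000 kbps links, and A and B sending 300 and 400 kbps to HQ, the HQ-B link goes from yellow (40%) to orange (70%) once HQ-A fails and site A's traffic detours through B.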

Seriously, I'm very impressed with the Netsys products. The programmers keep doing extremely neat stuff, the user interface is very good, the software is very functional for its short time in existence, and the company is partly owned by both Cisco and Network General, so it should stay healthy. You may have heard of the Performance Tool as a modeling tool, but lately I've been thinking of it as a Performance Management tool as well. It's a little more focused on examining a snapshot of network performance in detail than on regular reporting. But that seems to be the way most folks work. The Perl scripts that actually collect the data (normally driven from a very good GUI) appear to be usable from the UNIX command line as well, in which case cron and a small amount of UNIX scripting might suffice to set up regular data collection.


Dr. Peter J. Welcher (CCIE #1773, CCSI #94014) is a Senior Consultant with Chesapeake NetCraftsmen. NetCraftsmen is a high-end consulting firm and Cisco Premier Partner dedicated to quality consulting and knowledge transfer. NetCraftsmen has nine CCIE's, with expertise including large network high-availability routing/switching and design, VoIP, QoS, MPLS, network management, security, IP multicast, and other areas. See http://www.netcraftsmen.net for more information about NetCraftsmen. Pete's links start at http://www.netcraftsmen.net/welcher . New articles will be posted under the Articles link. Questions, suggestions for articles, etc. can be sent to pjw@netcraftsmen.net . 



9/96
Copyright 1996, Peter J. Welcher