In a previous article we talked about RMON, the SNMP Remote Monitoring MIB. We can use RMON management software to set up network monitoring in LAN and WAN probes. We can then retrieve data, graph it, and produce reports. RMON includes packet filtering and capture. With good RMON software, we can do remote packet analysis (think Distributed Sniffer (TM), but standards compliant). The RMON2 standard extends this, providing visibility and statistics at higher layers of the OSI Reference Model. This gives visibility into application layer data flows, which is very useful for managing and modeling a network.
We also noted that certain IOS 11.2 images implement full RMON capabilities. Also, and more crucial to this month's article: the Cisco IOS now implements at least two RMON groups, alarm and event, as of IOS 11.1. Some of the Catalyst switches also implement these groups (Cat1200, Cat5000).
The previous article showed (briefly) the configuration commands that exploit these two groups. I'll bet 90% of you zipped right past those sleep-inducing configuration commands! Well, good thing you did!
This month we'll talk about the Threshold Manager, a relatively new tool for CiscoView (CiscoWorks) that eases use of the event and alarm RMON groups. If you've got support, the price is right! And the user interface is good enough I think Cisco should clone the programmer!
Note: Threshold Manager was developed for CiscoView 4 and has not been carried forward to CiscoView 5 (Web CiscoView). 7/24/2001
Just testing if variables exceed or fall below threshold values doesn't do us much good unless we've got a way to get the information out of the device. TM offers us a choice. We can have the router log the event, send an SNMP trap, or both.
When active on-screen, TM also retrieves the log table periodically from the router it is pointed at. So we can easily see what's been happening within that router. Maybe it sent us a trap, or we have some network problem, and we use this capability to review what else has happened recently in that router. Imagine responding to a trouble-ticket saying that the network is slow. You figure out what WAN link is pertinent, pull up TM, and there's a bunch of WAN utilization events in its log. That gets you started. RMON and RMON2 can then help track down heavy users of the link, if that's necessary.
As you'll recall, RMON doesn't repeatedly alarm when a threshold is repeatedly crossed. There's a rising and a falling value with each threshold. If we cross the rising-threshold, no further alarms occur until the variable being watched drops below the falling-threshold. And vice versa. That cuts the nuisance alarms significantly.
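To make the rising/falling behavior concrete, here is a minimal Python sketch of the idea. It is a simulation, not router code: the function name and the sample values are made up for illustration, it assumes both directions start out armed, and it uses "at or above" / "at or below" comparisons as the RMON alarm group defines them.

```python
def rmon_alarm_events(samples, rising, falling):
    """Simulate which samples would fire a rising or falling event.

    Hypothetical helper for illustration only. Assumes both directions
    start out armed; a direction re-arms only after the opposite
    threshold has been crossed.
    """
    events = []
    armed_rising = True
    armed_falling = True
    for i, value in enumerate(samples):
        if value >= rising and armed_rising:
            events.append((i, "rising"))
            armed_rising = False   # no more rising events...
            armed_falling = True   # ...until we see a falling one
        elif value <= falling and armed_falling:
            events.append((i, "falling"))
            armed_falling = False
            armed_rising = True
    return events


# Utilization-style samples with rising=80 and falling=40.
samples = [50, 90, 95, 85, 50, 92, 20, 88]
print(rmon_alarm_events(samples, rising=80, falling=40))
```

Note that the samples at 95, 85 and 92 produce nothing: the variable never got down to the falling threshold in between, so the rising alarm stays quiet.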
The default policy has 18 pre-defined thresholds for SNMP variables:
In particular, the above variables let the router tell us when it gets too busy, or when free memory gets low. The router can also tell us when interface utilization gets too high. Be careful with ifInOctets and ifOutOctets, which only count octets in traffic to or through the router. They don't fully reflect LAN utilization. Instead, enable "rmon promiscuous" (carefully) on an Ethernet. This watches all packets on an Ethernet, at some cost in CPU performance. You'll need to put a threshold on etherStatsOctets for this to do you some good.
Did I mention that the interface-specific variables can automatically be checked on each router interface? TM gets a list of interfaces from the router, and sets up an appropriate threshold for each. We click on the ones we're interested in, and "enforce" those. That's pretty simple!
To do all this, TM has to look at the interval specified for testing the variable under scrutiny, say ifInOctets. It then calculates from the bandwidth (obtained by SNMP) how many bytes it takes to exceed the threshold utilization percentage over the test interval. (RMON doesn't do much in the way of calculations, so all we can really check with variables like ifInOctets and ifOutOctets is whether they increased by more than some number.) That saves us a bit of fiddling with numbers.
Let's walk through the sort of calculation TM does behind the scenes. Suppose we're checking ifOutOctets every 60 seconds on a 56Kbps line. Suppose we're going to have the router tell us when utilization hits (passes) 80%. TM will calculate that 60 * 56,000 = 3,360,000 bits might move on a 56Kbps line in 60 seconds. Divide by 8 to get 420,000 bytes. Our threshold is 80% of that, or 336,000 bytes. So the router is to check ifOutOctets, check it again after 60 seconds, and if ifOutOctets goes up by more than 336000, it is to let us know.
Having TM calculate this for us is a real convenience!
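If you ever need to do the arithmetic yourself, the calculation above can be sketched in a few lines of Python. The function name is hypothetical, and this only mirrors the worked example; it is not TM's actual code.

```python
def delta_threshold_octets(bandwidth_bps, interval_s, utilization_pct):
    """Octets that correspond to a given utilization over one test interval.

    Hypothetical helper: bits the link can carry in the interval,
    divided by 8 to get octets, scaled by the utilization percentage.
    Integer arithmetic, so odd inputs are rounded down.
    """
    return bandwidth_bps * interval_s * utilization_pct // (8 * 100)


# The 56Kbps example: 60 second interval, 80% utilization.
print(delta_threshold_octets(56_000, 60, 80))  # prints 336000
```

The result is the number that goes after "rising-threshold" in a delta-type alarm.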
By the way, we don't have to figure out which SNMP variables are available in each device. TM has some built-in knowledge of what each Cisco device has available in it.
If you have support for CiscoView or CiscoWorks, you should be able to get to the CiscoView Upgrade Planner on www.cisco.com. Go to the CiscoView Upgrade Packages area. The URL is http://www.cisco.com/cgi-bin/tablebuild.pl/cview311.
Look for stuff starting with TM (for Threshold Manager) and use your browser to ftp a copy back to your site. You'll need to match your operating system in the name of the file you download. It runs with CiscoView for various flavors of UNIX, also for MS Windows (95 or NT).
Use tar to unpack, then cvinstall to install TM. Run it from CiscoView, or from the command line. The documentation is included in the download as tmgsg.pdf (Adobe Acrobat viewable: you've got Acrobat with the Cisco documentation CD). There's also an informative readme file, which I'll summarize mercilessly in what follows.
Or start from the command line:
tm -I ip_address -p $NMSROOT/etc/cview/devices/Threshold-Mgr (or -n host_name). The "-p" option and its argument tell TM where to find its policy files. Other command line options specify SNMP community strings, timeouts, etc. The PDF documentation covers the options.
When we enforce a TM policy, we're setting things in the router by SNMP. Not too surprisingly, those settings are visible in the running configuration on the router. Save that to NVRAM, and you don't have to re-enforce the policy after a router reboot.
That suggests a procedure to me. Suppose you have 100 or 1000 routers that you'd like to use TM with. Maybe you use TM to set up one router, and try that out for a while. As you gain experience with TM, you vary some of the default policy thresholds -- your tastes vary from Cisco's. Maybe you add some additional thresholds.
Don't go overboard: the router may not like it. There's no point in logging a zillion events that you're never going to look at or act upon. There's even less point to torture-testing your network with a zillion SNMP traps reporting such events!
Well, OK, now we've settled on a pretty good policy.
We could point TM at various routers, click, and set up thresholds on our other 99 or 999 routers. That might get a bit tedious! But when we look at our test router's configuration, we see lines like the following. And we can use the CiscoWorks Snap-In Tool (User Command set) to feed similar configuration lines out to all our other routers with similar interfaces and bandwidths. (Or whatever other method we use to bulk-configure routers. Carefully.) Do whatever is easiest. Don't feel like you have to use the tool all the time!
The following are from a 2509 router with Ethernet and two serials, bandwidths 1000 bps and 125000 bps. I added this to a recent CiscoWorks class, and set some routers up to really try TM out. [I've obviously deleted some lines for brevity]. Here's some of what we saw:
rmon event 1 log trap public description "Rising Event for ifInOctets" owner admin
rmon event 2 log trap public description "Falling Event for ifInOctets" owner admin
rmon event 3 log trap public description "Rising Event for ifInOctets" owner admin
rmon event 4 log trap public description "Falling Event for ifInOctets" owner admin
(etc.)
rmon event 13 log trap public description "Rising Event for locIfCarTrans" owner admin
rmon event 14 log trap public description "Falling Event for locIfCarTrans" owner admin
(etc.)
So these are all the events, all the things that can happen, including descriptive text. Each one has a sequential number. We can see the sequence numbers in what follows, where for example the first alarm uses events 1 and 2.
rmon alarm 1 ifEntry.10.1 60 delta rising-threshold 37500000 1 falling-threshold 18750000 2 owner admin
rmon alarm 2 ifEntry.10.2 60 delta rising-threshold 3750 3 falling-threshold 1875 4 owner admin
rmon alarm 3 ifEntry.10.3 60 delta rising-threshold 468750 5 falling-threshold 234375 6 owner admin
rmon alarm 7 lifEntry.21.1 60 delta rising-threshold 10 13 falling-threshold 10 14 owner admin
rmon alarm 8 lifEntry.21.2 60 delta rising-threshold 10 15 falling-threshold 10 16 owner admin
rmon alarm 9 lifEntry.21.3 60 delta rising-threshold 10 17 falling-threshold 10 18 owner admin
So our Ethernet octet count is tested for changes ("delta") every 60 seconds. If it increased by more than 37500000 (50% utilization of a 10 Mbps Ethernet), we've crossed the rising threshold, and the router logs the event and sends a trap. We then must have a 60 second period with less than 18750000 octets (25% utilization), which causes a falling-threshold event, before we get another rising-threshold event.
Looking at this, we can also see the differing octet counts set up automatically for us by TM.
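If you go the bulk-configuration route, a small script can churn out lines like these for each router. The following Python is only a sketch under some assumptions: the function name is made up, the 50% rising and 25% falling percentages and the event numbering (alarm n uses events 2n-1 and 2n) are inferred from the sample configuration above, and the Ethernet is taken to be 10 Mbps. It covers only the ifInOctets alarms (ifEntry.10), not the rest of the policy.

```python
def ifinoctets_policy(interfaces, interval_s=60, rising_pct=50, falling_pct=25,
                      community="public", owner="admin"):
    """Build rmon event/alarm lines for ifInOctets on each interface.

    Hypothetical helper. `interfaces` is a list of (ifIndex, bandwidth_bps)
    pairs. The line format is copied from the sample configuration in the
    article; check it against your own IOS release before pushing it out.
    """
    events, alarms = [], []
    for n, (if_index, bandwidth_bps) in enumerate(interfaces, start=1):
        octets_at_100 = bandwidth_bps * interval_s // 8
        rising = octets_at_100 * rising_pct // 100
        falling = octets_at_100 * falling_pct // 100
        rise_ev, fall_ev = 2 * n - 1, 2 * n
        for ev, kind in ((rise_ev, "Rising"), (fall_ev, "Falling")):
            events.append(
                f"rmon event {ev} log trap {community} "
                f'description "{kind} Event for ifInOctets" owner {owner}'
            )
        alarms.append(
            f"rmon alarm {n} ifEntry.10.{if_index} {interval_s} delta "
            f"rising-threshold {rising} {rise_ev} "
            f"falling-threshold {falling} {fall_ev} owner {owner}"
        )
    return events + alarms


# Ethernet (assumed 10 Mbps) plus the two serials at 1000 and 125000 bps.
for line in ifinoctets_policy([(1, 10_000_000), (2, 1_000), (3, 125_000)]):
    print(line)
```

With those three interfaces, the alarm lines come out with the same thresholds as alarms 1 through 3 above, which is a handy sanity check before you feed anything to 99 other routers.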
The following suggests the sorts of information available through the TM window. Note the log entries, which are what show up in the TM main window.
tokyo>show rmon event
Event 1 is active, owned by admin
Description is Rising Event for ifInOctets
Event firing causes log and trap to community public, last fired 00:00:00
Event 2 is active, owned by admin
Description is Falling Event for ifInOctets
Event firing causes log and trap to community public, last fired 2d05h
Current log entries:
index time description
1 2d05h Falling Event for ifInOctets
Event 3 is active, owned by admin
Description is Rising Event for ifInOctets
Event firing causes log and trap to community public, last fired 2d05h
Current log entries:
index time description
1 2d05h Rising Event for ifInOctets
(etc.)
Event 9 is active, owned by admin
Description is Rising Event for ifOutOctets
Event firing causes log and trap to community public, last fired 2d06h
Current log entries:
index time description
1 2d05h Rising Event for ifOutOctets
2 2d06h Rising Event for ifOutOctets
(etc.)
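If you capture this output from a batch of routers, it is easy enough to pick apart with a script. Here is a rough Python sketch that parses text in the format quoted above. The function name is hypothetical, and real output may differ by IOS release, so treat the patterns as assumptions to adjust.

```python
import re

# Patterns match the output as quoted in this article (an assumption).
EVENT_RE = re.compile(r"^Event (\d+) is (\w+), owned by (\S+)")
LOG_RE = re.compile(r"^(\d+)\s+(\S+)\s+(.+)$")


def parse_show_rmon_event(text):
    """Return {event_number: details} from 'show rmon event' style text."""
    events = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = EVENT_RE.match(line)
        if m:
            current = {"status": m.group(2), "owner": m.group(3),
                       "description": None, "last_fired": None, "log": []}
            events[int(m.group(1))] = current
            continue
        if current is None:
            continue  # skip the prompt line and anything before the first event
        if line.startswith("Description is "):
            current["description"] = line[len("Description is "):]
        elif "last fired " in line:
            current["last_fired"] = line.rsplit("last fired ", 1)[1]
        else:
            m = LOG_RE.match(line)  # log rows start with an index number
            if m:
                current["log"].append((int(m.group(1)), m.group(2), m.group(3)))
    return events


SAMPLE = """tokyo>show rmon event
Event 9 is active, owned by admin
Description is Rising Event for ifOutOctets
Event firing causes log and trap to community public, last fired 2d06h
Current log entries:
index time description
1 2d05h Rising Event for ifOutOctets
2 2d06h Rising Event for ifOutOctets
"""

parsed = parse_show_rmon_event(SAMPLE)
print(parsed[9]["last_fired"], len(parsed[9]["log"]))
```

From there it is a short step to listing which routers have logged rising events recently.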
Dr. Peter J. Welcher (CCIE #1773, CCSI #94014) is a Senior Consultant with Chesapeake NetCraftsmen. NetCraftsmen is a high-end consulting firm and Cisco Premier Partner dedicated to quality consulting and knowledge transfer. NetCraftsmen has nine CCIE's, with expertise including large network high-availability routing/switching and design, VoIP, QoS, MPLS, network management, security, IP multicast, and other areas. See http://www.netcraftsmen.net for more information about NetCraftsmen. Pete's links start at http://www.netcraftsmen.net/welcher . New articles will be posted under the Articles link. Questions, suggestions for articles, etc. can be sent to pjw@netcraftsmen.net .