Network Management with Periscope NetMRI

   
Peter J. Welcher and Kevin Stone

NOTE: the product name has changed from "Periscope" to "NetMRI" -- pjw.

Introduction

Last month's article was about managing networks and spotting problems. It can be found at http://www.netcraftsmen.net/welcher/papers/perf-dsniff.html. In that article, I mentioned our Network Health Check service, in which we analyze problems in networks. I have begun distinguishing between design analysis and technical analysis; both are components of what we do. I also think the distinction between problem-spotting and performance improvement is important, since improving performance in an already solid network may require deeper analysis.

One of the reasons this is on my mind is that we've been conducting field trials of a product from a partner company, Netcordia, and from our associate and friend Terry Slattery.

This new product, Periscope NetMRI, partly automates the technical information gathering that we may do in our network health checks. We and others have worked with Netcordia on what the product does, what's needed, what it does well, and where it needs improvement. The tightly focused and highly expert programming team at Netcordia has assimilated all the input and bug reports and refined the product. (The people there are very good, and fast too!) All this has had me thinking: what do I want in a network management tool, why is that appropriate, and what is the "right" approach?

I thought this month we'd take a brief look at Netcordia's NetMRI Periscope. It takes a fresh approach: an automated, easy-to-use network tool. The article ends with commentary from a site that tried the product. It was generous of them to allow public display of information that could be taken to indicate a condition less than perfection; in other words, a real network!

Please also bear in mind that this is my take on the product, and that this article was not particularly solicited by Netcordia. They might well have chosen to emphasize other parts of the product had they written it. I do know the folks there well and had some input into the product design, so there is perhaps some conflict of interest. Both the product team and I would be interested in hearing what you like and don't like about NetMRI Periscope, and what you think your network management needs are. See also http://www.netcordia.com.

Network Management Trends

I've been seeing two trends affecting network management.

First, products cost far too much, which keeps many small to medium-sized companies from using them. Second, products are too hard to use. Again, small to medium-sized companies are hit the worst by this.

If there are only one, two, or a few "network administrators" who have to manage servers, network devices, firewalls, management software, etc., there is often little time to work with the management software. Worse, when you sit down to use it, you find out that recently added devices aren't known to the product, or there is some other snafu (Situation Normal All Fouled Up). So you end up spending time fiddling with the network management tool instead of getting results.

Netcordia sees all this as an opportunity to try a different approach to network management. Initial quality seems high, despite this being version 1.0. The price isn't low, but it isn't that high either. Part of the justification is that it may save some fraction of a full-time employee's time, both through its built-in expertise and through the information it gathers.

We at Chesapeake Netcraftsmen see an opportunity here too. Automated information gathering makes us more productive in doing network health checks. We can either come in and just do an automated (quarterly?) network checkup with a report, or let Periscope do its thing while we gather design and other information for a deeper look at the network.

About Periscope

Periscope is sold as an appliance. You'll want to warn the security administrator that you're going to be using it (we'll see why in a second). Then you plug the small box (3 x 6 x 10 inches) into your network. It requests an address via DHCP. It is also reachable at a private address from a PC on the same physical segment (VLAN) that has been given another address in the same private subnet. The box runs Linux, but you don't need to know that, since all administration is done via a Web interface. By the way, don't obstruct the box's fan; the unit gets very hot to the touch! (A literally hot product.)

When Periscope discovers it is on a new network, the Web interface presents a short series of screens asking about the network it is on. The main point is for you to specify the CIDR blocks of addresses that make up the network to be managed. You also specify the SNMP read-only community string you're using. That's about all you tell it in the present version. Periscope also has some built-in SNMP community strings that are considered "obvious guesses". Yes, "public" and "private" are on the list. If you use multiple strings in your network, tell Periscope all of them, and it will figure out which community string belongs to which device.
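The article doesn't describe how Periscope matches community strings to devices. As an illustration only, here is a minimal Python sketch under stated assumptions: the CIDR blocks are expanded with the standard `ipaddress` module, and each host is tried with the user's strings first and then a built-in guess list. `probe` is a hypothetical stand-in for a real SNMP GET, and `DEFAULT_GUESSES`, `candidate_hosts`, and `match_communities` are illustrative names, not Periscope internals.

```python
import ipaddress
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical built-in "obvious guesses"; the product's real list is not published.
DEFAULT_GUESSES = ["public", "private"]


def candidate_hosts(cidr_blocks: Iterable[str]) -> List[str]:
    """Expand the user-supplied CIDR blocks into individual host addresses."""
    hosts: List[str] = []
    for block in cidr_blocks:
        network = ipaddress.ip_network(block, strict=False)
        hosts.extend(str(host) for host in network.hosts())
    return hosts


def match_communities(
    hosts: Iterable[str],
    user_strings: List[str],
    probe: Callable[[str, str], bool],
) -> Dict[str, Optional[str]]:
    """Map each host to the first community string it answers to, or None.

    probe(host, community) is a hypothetical stand-in for a real SNMP GET
    (for example, of sysObjectID); it returns True if the device replied.
    """
    # User-supplied strings first, then the built-in guesses, without duplicates.
    ordered = list(dict.fromkeys(user_strings + DEFAULT_GUESSES))
    return {host: next((c for c in ordered if probe(host, c)), None) for host in hosts}


if __name__ == "__main__":
    # Fake network: two devices using different read-only strings.
    configured = {"10.1.1.1": "netops-ro", "10.1.1.2": "public"}

    def fake_probe(host: str, community: str) -> bool:
        return configured.get(host) == community

    print(match_communities(candidate_hosts(["10.1.1.0/30"]), ["netops-ro"], fake_probe))
```

Trying the user's own strings before the guesses keeps the number of failed SNMP requests (and authentication-failure traps) down on well-configured devices.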

Periscope then starts intelligently discovering the network, or the part of it you've told it to check out. It conducts some port scanning and characterizes the devices found (as PC, router, switch, firewall, etc.). All this is visible in a list of discovered devices, along with numbers representing the degree of certainty of each device identification.

This is why you do have to alert the security administrator: the initial characterization and port scanning should trigger warnings on an IDS. You probably don't want the security staff running around thinking there's a hacker probing from the inside. Of course, if you're cruel, you could treat this as a test of whether the IDS output is actually being read and acted on.
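To make the idea of a certainty number concrete, here is a deliberately simple Python sketch, not Periscope's actual method: each device class has a hypothetical signature of ports, and the certainty is the fraction of that signature found open. The `SIGNATURES` table and `classify` function are assumptions for illustration; a real classifier would weigh many more clues.

```python
from typing import Dict, Set, Tuple

# Hypothetical port signatures. A real product would also use SNMP
# sysObjectID, service banners, MAC vendor prefixes, and so on.
SIGNATURES: Dict[str, Set[int]] = {
    "router": {22, 23, 161},        # SSH, Telnet, SNMP
    "switch": {23, 80, 161},        # Telnet, HTTP management, SNMP
    "windows_pc": {135, 139, 445},  # MS RPC, NetBIOS, SMB
    "printer": {515, 631, 9100},    # LPD, IPP, raw printing
}


def classify(open_ports: Set[int]) -> Tuple[str, float]:
    """Return (best-guess device class, certainty from 0.0 to 1.0).

    Certainty here is simply the share of a class's signature ports
    that were found open; ties go to the class listed first.
    """
    best_class, best_score = "unknown", 0.0
    for device_class, ports in SIGNATURES.items():
        score = len(open_ports & ports) / len(ports)
        if score > best_score:
            best_class, best_score = device_class, score
    return best_class, round(best_score, 2)
```

For example, a host answering on 135, 139, and 445 scores 1.0 as `windows_pc`, while one answering only on 22 and 161 is a `router` with certainty 0.67.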

Looking for Trouble

So you plug the Periscope box in, give it a little bit of starting information, and it quietly starts discovering. After 24 hours or so, it has usually found out enough to start producing interesting results.

How do you usually spot problems in a network? If you wait for something to turn red, you only find out about problems when something fails. Ideally, you'd like to catch them earlier. Some high-end products, such as Concord, produce various health reports, using metrics to prioritize what needs your attention. If you gather data, a TopN listing as in SolarWinds Orion does help show the hot spots or problem areas over some period of time. Threshold alarms sent to an HP OpenView or CiscoWorks Device Fault Manager console are yet another way.

Periscope does something a bit different. It adds some intelligent criteria to a TopN approach, and provides a daily report on current problem areas in the network.

The philosophy of Periscope is "no news is good news". That is, it is not primarily intended as a reporting tool, although the data is there and accessible. Who goes looking at graphs for fun? (Besides me, I mean.) What most people want is something that cuts to the chase and tells them about the things they care about that need fixing. And that's what the Issues screen in Periscope does. See the following figure.

By the way, the figures are all based on data from real networks, and shown with permission.  Addresses, names, and other information were changed so as to reveal little about the internals of the demo networks. The issues shown have since been resolved.

[Figure: Issues Screen]

The numbers after each issue indicate the number of devices with the problem. Periscope orders the issues based on severity, using a weighting scheme reflecting several factors (technical severity, number of devices, etc.). You then click on an Issue to obtain more detail. Here are some more issues:

[Figure: Issues Screen 2]
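The exact weighting Periscope uses isn't given here. As a rough illustration of severity-plus-device-count ordering, this Python sketch scores each issue as a hypothetical severity weight times the number of affected devices and sorts on that score. `SEVERITY_WEIGHT`, `Issue`, and `rank_issues` are invented names, and the weights are assumptions.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical severity weights; the product's real scheme is not published.
SEVERITY_WEIGHT = {"error": 10, "warning": 3, "info": 1}


@dataclass
class Issue:
    name: str
    severity: str  # "error", "warning", or "info"
    devices: int   # number of devices exhibiting the issue


def rank_issues(issues: List[Issue]) -> List[Issue]:
    """Order issues most important first: severity weight times device count."""
    return sorted(
        issues,
        key=lambda issue: SEVERITY_WEIGHT[issue.severity] * issue.devices,
        reverse=True,
    )
```

With these weights, an error on 5 devices (score 50) outranks a warning on 12 devices (36), which in turn outranks an error on only 2 devices (20). A real scheme would likely add more factors, as the text notes.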

Clicking on an issue drills down to detail, such as the following:

[Figure: Issue A]

If you want, you can click on the device name or address. This gets you to a screen summarizing the device interfaces. From there, you can click to obtain various performance graphs. For example:

[Figure E: memory graph for a device]

Note the steadily decreasing memory, indicating a memory leak. (Time for a new Cisco IOS image!)
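Spotting "steadily decreasing memory" automatically can be done in several ways. One simple approach, sketched here in Python as an assumption rather than a description of how Periscope does it, is to fit a least-squares line to free-memory samples and, if the slope is negative, project when memory would run out. The function names and the choice of hours as the time unit are illustrative.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (time in hours, free memory in bytes)


def memory_trend(samples: List[Sample]) -> float:
    """Least-squares slope of free memory versus time, in bytes per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def hours_until_exhausted(samples: List[Sample]) -> Optional[float]:
    """Project hours until free memory hits zero, or None if it isn't falling.

    The projection starts from the most recent sample.
    """
    slope = memory_trend(samples)
    if slope >= 0:
        return None
    _, last_free = samples[-1]
    return last_free / -slope
```

Samples of 1000, 900, 800, and 700 bytes taken an hour apart give a slope of -100 bytes per hour and a projection of 7 hours. In practice you'd want many samples over days, since free memory on a router also fluctuates for normal reasons.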

It turns out there are various Cisco IOS images with SNMP problems, including memory leaks. Periscope now knows about these, and reports if it encounters them. If the problem is one that might lead to a reboot (per public Cisco bug reports), Periscope allows you to disable SNMP on the device. This is visible now as a separate issue in the issues screen captures above.

Here's another (very common) problem caught by Periscope:

[Figure F: duplex problem report]

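If a PC is hard-coded to full duplex while the switch port is set to autonegotiate, the switch port cannot learn the PC's duplex setting and falls back to half duplex. The resulting duplex mismatch causes errors and slowness for the user. Periscope automates finding such ports.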
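The mismatch logic is easy to model once you know both ends' settings. This Python sketch covers configuration only, and assumes both ends are 10/100 Ethernet capable of full duplex. It does not reflect how Periscope actually gathers the data, which may rely on interface counters instead. `effective_duplex` and `duplex_mismatch` are hypothetical helper names.

```python
def effective_duplex(local: str, peer: str) -> str:
    """Duplex that one end actually runs, given both ends' settings.

    Settings are "auto", "full", or "half". Assumes both ends support
    full duplex, as is typical for 10/100 Ethernet.
    """
    if local != "auto":
        return local  # hard-coded: runs exactly as configured
    if peer == "auto":
        return "full"  # both negotiate and agree on the best common mode
    return "half"      # peer doesn't negotiate: auto side falls back to half


def duplex_mismatch(pc_setting: str, switch_setting: str) -> bool:
    """True when the two ends of the link end up in different duplex modes."""
    return effective_duplex(pc_setting, switch_setting) != effective_duplex(
        switch_setting, pc_setting
    )
```

The case from the text, `duplex_mismatch("full", "auto")`, returns True. Hard-coding both ends to full, or leaving both on auto, avoids the problem.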

Here's another interesting report from Periscope:

[Figure G: VLAN report]

This shows VLANs, identified by number, name, and root bridge. If you look closely, you can see that the root bridge for the selected switch is the switch itself (middle, top), and that it is the only switch with that root bridge for VLAN 91 (left: count is 1). This happens when switches aren't cleanly trunked together, as can happen when stackable switches are connected centrally to routed ports but not connected to each other (as was originally planned).
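The "count is 1" clue generalizes: if switches carrying the same VLAN report different spanning-tree root bridges, that VLAN is split into separate Layer 2 islands. Here is a minimal Python sketch of that check, assuming you already have (switch, VLAN, root bridge) observations, for example from the BRIDGE-MIB on each switch. `split_vlans` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def split_vlans(
    observations: Iterable[Tuple[str, int, str]],
) -> Dict[int, Dict[str, int]]:
    """Find VLANs whose switches disagree about the spanning-tree root.

    Each observation is (switch_name, vlan_id, root_bridge_id) as reported
    by one switch. Returns {vlan_id: {root_bridge_id: switch_count}} for
    every VLAN that has more than one root bridge.
    """
    roots: Dict[int, Counter] = defaultdict(Counter)
    for _switch, vlan, root in observations:
        roots[vlan][root] += 1
    return {vlan: dict(counts) for vlan, counts in roots.items() if len(counts) > 1}
```

A switch that is its own root for VLAN 91 while two other switches point at a core switch shows up as `{91: {"sw1": 1, "core": 2}}`, matching the situation in the figure.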

A similar report shows HSRP addresses. When no standby (backup) peer is found for an HSRP address, there's a problem: nothing will take over if the active router fails. This may be getting added as another Issue even as I type.
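The HSRP check amounts to grouping router reports by virtual address and looking for a missing standby. This Python sketch assumes the (router, virtual IP, state) data has already been collected, for example via SNMP or `show standby` output. `groups_without_backup` is an illustrative name, not a Periscope function.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def groups_without_backup(entries: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Return HSRP virtual addresses that have no standby router.

    Each entry is (router_name, virtual_ip, state), where state is the
    HSRP state that router reports for the group, e.g. "Active" or "Standby".
    """
    states: Dict[str, Set[str]] = defaultdict(set)
    for _router, virtual_ip, state in entries:
        states[virtual_ip].add(state.lower())
    return sorted(vip for vip, seen in states.items() if "standby" not in seen)
```

Given one address with both an Active and a Standby router and another with only an Active router, only the second address is flagged.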

Reporting in Periscope

Periscope contains a wealth of data. If the network is healthy, little of the work it is doing is visible to the user. So the current product includes some overview reporting. Plans include automated emailing of PDF versions of these reports to managers who subscribe. Future releases may also have more reports available. For now, this is another way to keep an eye on your network.

To get to the reports, click on Reports. The Network Health report is the default choice. It starts with a score, which should be taken as a relative health measure for the one network being monitored: 100% means no problems, and points come off based on the severity of issues and the number of devices affected.
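Netcordia doesn't publish the scoring formula. To show the general shape of "start at 100, deduct by severity and devices affected", here is a Python sketch in which each issue deducts a hypothetical maximum for its severity, scaled by the fraction of devices it affects. `MAX_DEDUCTION` and `health_score` are assumptions for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical maximum deduction per issue if it affected every device.
MAX_DEDUCTION = {"error": 20.0, "warning": 5.0, "info": 1.0}


def health_score(issues: Iterable[Tuple[str, int]], total_devices: int) -> float:
    """Score from 0 to 100, where 100 means no problems.

    Each (severity, devices_affected) issue deducts its severity's maximum
    scaled by the share of devices affected. The result floors at 0.
    """
    if total_devices <= 0:
        raise ValueError("total_devices must be positive")
    deduction = sum(
        MAX_DEDUCTION[severity] * devices / total_devices
        for severity, devices in issues
    )
    return round(max(0.0, 100.0 - deduction), 1)
```

In a 200-device network, an error on 50 devices (deducting 5) plus a warning on 100 devices (deducting 2.5) yields 92.5. Because the score is relative to one network's size and issue mix, comparing it across different networks means little, which matches the caution in the text.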

[Figure H: Network Health report score]

Scrolling down, we see:

[Figure I: number of issues over time]

This lets you see whether the number of issues is going up or down over time. 

A wide variety of graphs show counts and changes for various aspects of the network. One of the graphs under the Performance link follows. (There are plenty more where this one came from!)

[Figure J: device uptime graph]

Note that device uptime is broken out into ranges of time, so you can see at a glance how long the devices in this network have been up. Reboots generally indicate a problem; stable devices just stay up and working.
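Uptime data like this usually comes from the standard MIB-II `sysUpTime` object, whose TimeTicks value counts hundredths of a second. This Python sketch groups polled values into ranges. The bucket boundaries and the function names `uptime_bucket` and `uptime_histogram` are assumptions, not Periscope's actual ranges.

```python
from collections import Counter
from typing import Dict

TICKS_PER_DAY = 100 * 86400  # sysUpTime TimeTicks are hundredths of a second

# Hypothetical range boundaries, in days, with their labels.
BUCKETS = [(1, "< 1 day"), (7, "1-7 days"), (30, "7-30 days"), (90, "30-90 days")]


def uptime_bucket(timeticks: int) -> str:
    """Label for one device's sysUpTime value.

    Note: the 32-bit TimeTicks counter wraps after roughly 497 days, so a
    very long-running device can appear to have a short uptime.
    """
    days = timeticks / TICKS_PER_DAY
    for limit, label in BUCKETS:
        if days < limit:
            return label
    return "90+ days"


def uptime_histogram(uptimes: Dict[str, int]) -> Counter:
    """Count devices per uptime range, given {device_name: sysUpTime ticks}."""
    return Counter(uptime_bucket(ticks) for ticks in uptimes.values())
```

A sudden growth in the "< 1 day" range is exactly the kind of change that deserves a look, since stable devices normally drift toward the longer ranges.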

Field Experience

Periscope has been field tested at a number of sites. One of them has agreed to be mentioned in this article. The following comment comes via Kevin Stone.

The Johns Hopkins Bloomberg School of Public Health, a leading international authority on public health, is dedicated to protecting health and saving lives. Every day, the School works to keep millions around the world safe from illness and injury by pioneering new research, deploying its knowledge and expertise in the field, and educating tomorrow's scientists and practitioners in the global defense of human life. See http://www.jhsph.edu/ for more information about the School.

To accomplish these goals, the School has put in place a state-of-the-art data network. As the School has grown, so has the network, to over 200 network devices and 3500 computers. Managing this network cost-effectively without the help of diagnostic tools has become overwhelming. Equally overwhelming have been the complexity and cost of most network management applications. One of the priorities of the Information Systems department has been to document and put in place tools that provide the data necessary to ensure proper operation of the network, shown below.

The School has installed Periscope by Netcordia to help do that, and has realized many operational improvements through its use. As an appliance, Periscope solves the complexity issue by requiring only minimal configuration and providing browser-based access. Building Netcordia's expertise into the Periscope system allows even the most junior staff member to pinpoint problems and take action.

Periscope has provided the School with invaluable information without inundating the staff with alerts and reams of data. Device discovery and identification have allowed the team to identify rogue wireless access points and see what devices are on the network. This has been invaluable in the fluid academic research environment at the School.

The Issues screen, with its ranked items and color coding, allows quick determination of critical problems. The network health report gives both management and the network team a single page that shows the relative health of the entire network and how it has changed from previous days. More detail is readily available if the network team needs it for further monitoring and troubleshooting.

Overall the Periscope appliance has reduced the troubleshooting workload of the network staff while allowing the staff to improve the operation of the network with minimal setup time and configuration.

Kevin also notes that he likes the "Servers Alive" product from Woodstone (http://www.woodstone.nu). This product tracks service availability with a set of features that strike me as appropriate to Windows-based environments. It looks to me (Pete) like a supported commercial counterpart to the Big Brother/Sister tools that were popular a couple of years back.

Conclusion

I like the Periscope product a lot. It provides easy startup and simplicity of use. It catches a bunch of things and reports them simply as Issues. If you have time to go poking through the reports, you can learn more. 

Periscope is not currently a fault and event management system. You can buy HP OpenView or What's Up Gold for that.

Nor is Periscope currently a reporting product per se. It does tell you what you need to know about potential problems. Future versions may add a network map, more reporting, trending, and/or a wider variety of issues.

Please contact Chesapeake Netcraftsmen (http://www.netcraftsmen.net) if you'd like us to come do a Network Health Check, with or without Periscope. Please contact Netcordia (http://www.netcordia.com) for more information about the product, or to pass along what you're looking for in a network management product.


Dr. Peter J. Welcher (CCIE #1773, CCSI #94014, CCIP) is a Senior Consultant with Chesapeake NetCraftsmen. NetCraftsmen is a high-end consulting firm and Cisco Premier Partner dedicated to quality consulting and knowledge transfer. NetCraftsmen has eight CCIEs, with expertise including large network high-availability routing/switching and design, VoIP, QoS, MPLS, IPSec VPN, wireless LAN and bridging, network management, security, IP multicast, and other areas. See http://www.netcraftsmen.net for more information about NetCraftsmen. Pete's links start at http://www.netcraftsmen.net/welcher . New articles will be posted under the Articles link. Questions, suggestions for articles, etc. can be sent to pjw <at> netcraftsmen <dot> net.

Kevin J. Stone is a Senior Network Administrator at the Johns Hopkins Bloomberg School of Public Health. Kevin has over 12 years in the networking field, with experience ranging from startups to Fortune 1000 companies and now higher education. Kevin works with both servers and network equipment, and likes to emphasize manageability and monitoring in the networks he runs.

9/8/2003
Copyright (C)  2003  Peter J. Welcher