NOTE: the product name has changed from "Periscope" to "NetMRI" -- pjw.
Introduction
Last month's article was about managing networks and spotting problems.
It can be found at http://www.netcraftsmen.net/welcher/papers/perf-dsniff.html.
In that article, I mentioned our Network Health Check service, where
we analyze problems in networks. I have begun distinguishing between
design and technical analysis. Both are components of what we do. I
also think the distinction between problem-spotting and performance
improvement is important, since improving performance in a solid
network may require deeper analysis.
One of the reasons this is on my mind is that we've been
conducting field trials of a product from a partner company
(Netcordia) and associate and friend (Terry Slattery).
This new product, Periscope NetMRI, in part automates some of the
technical information gathering that we may do in our network health
checks. We and other parties have worked with Netcordia on what the
product does, what's needed, what it does well, and where improvement
is needed. The tightly focussed and highly expert programming team at
Netcordia has assimilated all the inputs and bug reports, and refined
the product. (The guys there are very good, and fast too!) All this
stimulus has had me thinking: what do I want in a network management
tool, why is it appropriate, and what's the "right" approach?
I thought this month we'd take a brief look at Netcordia's
NetMRI Periscope. It is a fresh take on a user-friendly,
automated, easy-to-use network tool. The article ends with
commentary from a network that tried the product. It was generous
of them to allow public display of information that could be
taken to indicate a condition less than perfection -- in other words, a
real network!
Please also bear in mind this is my take on the product, and that this
article was not particularly solicited by Netcordia. They might well
have chosen to emphasize other parts of the product had they written
this article. I do know the folks there well, and had some input to the
product design, so there is perhaps some conflict of interest. Both I
and the product team would be interested in hearing what you like and
don't like about NetMRI Periscope, and what you think your network
management needs are. See also http://www.netcordia.com.
Network Management Trends
There are two trends I've been seeing affecting network management.
First, products cost far too much, excluding many small to
medium size companies from using them. And second, products are too
hard to use. Again, small to medium sized companies are hit the worst
by this.
If there are only one, two, or a few "network administrators"
who have to manage servers, network devices, firewalls,
management software, etc., there is often little time to work
with the management software. Worse, when you sit down to use it, you
find out that recently-added devices aren't known to the product, or
there is some other snafu (Situation Normal All Fouled Up) in using it.
So you end up spending time fiddling with the network management tool,
not getting results.
Netcordia sees all this as an opportunity to try a different
approach to network management. Initial quality seems high
(despite being version 1.0). Price isn't cheap but isn't that expensive
either. Some of the justification is that it saves perhaps some
fraction of a full time employee, both in terms of automated expertise
and in terms of the information it gathers.
We at Chesapeake Netcraftsmen see an opportunity here too. Automated
information gathering makes us more productive in doing network health
checks. That way we can either come in and just do an automated
(quarterly?) network checkup with report, or we can let Periscope do
its thing while we gather design and other information for a deeper
look at the network.
About Periscope
Periscope is sold as an appliance. You'll want to warn the security
administrator you're going to be using it (we'll see why in a second).
Then you plug the small box (3 x 6 x 10 inches) into your
network. It requests an address via DHCP. It is also accessible via a
private address if you put a PC on the same physical segment (VLAN)
with another address in the same private subnet. The box runs Linux,
but you don't need to know that, since all administration is done via a
Web interface. By the way, don't obstruct the fan in the box: the
product does get very hot to the touch! (Literally a hot product.)
When Periscope discovers it is on a new network, the web
interface comes up with a short series of screens asking you about the
network it is on. The point to this is for you to specify CIDR blocks
of addresses making up the network to be managed. You also specify the
SNMP read-only community string you're using. That's about all you tell
it in the present version. Periscope also has some built-in SNMP
community strings, considered "obvious guesses". Yes, public and private
are on the list. If you have multiple strings in your network, tell
them all to Periscope, and it will figure out which community string
belongs to which device.
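The matching of community strings to devices can be pictured with a short sketch. This is an illustration only, not Netcordia's code: the `probe` callback, the `fake_probe` stand-in, the addresses, and the `netops-ro` string are all hypothetical, and a real tool would send an SNMP request where `fake_probe` consults a table.

```python
import ipaddress
from typing import Callable, Iterable, Optional

# The defaults the article says are tried as "obvious guesses".
OBVIOUS_GUESSES = ["public", "private"]


def hosts_in_blocks(cidr_blocks: Iterable[str]) -> list[str]:
    """Expand CIDR blocks into the individual host addresses to check."""
    hosts = []
    for block in cidr_blocks:
        network = ipaddress.ip_network(block, strict=False)
        hosts.extend(str(addr) for addr in network.hosts())
    return hosts


def match_communities(
    hosts: Iterable[str],
    communities: Iterable[str],
    probe: Callable[[str, str], bool],
) -> dict[str, Optional[str]]:
    """Map each host to the first community string it answers to, else None."""
    candidates = list(communities)
    return {
        host: next((c for c in candidates if probe(host, c)), None)
        for host in hosts
    }


def fake_probe(host: str, community: str) -> bool:
    """Hypothetical stand-in for an SNMP read; answers from a fixed table."""
    known = {"10.1.1.1": "netops-ro", "10.1.1.2": "public"}
    return known.get(host) == community


hosts = hosts_in_blocks(["10.1.1.0/30"])
mapping = match_communities(hosts, ["netops-ro"] + OBVIOUS_GUESSES, fake_probe)
print(mapping)
```

Trying the strings you supplied before the obvious guesses is a design choice made for this sketch; the article does not say in what order the product tries them.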
Periscope then starts intelligently discovering the network or
part of the network you've told it to check out. It conducts some port
scanning and characterizes the devices found (as PC, router, switch,
firewall, etc.). All this is visible through a list of discovered
devices, along with numbers representing degree of certainty as to the
device identification.
This is why you do have to alert the security administrator:
the initial characterization and port scanning should trigger warnings
on an IDS. You probably don't want the security person(s) running
around thinking there's a hacker probing from the inside. Of course, if
you're cruel, you could reason that this is a test of whether that IDS
output is actually being read and reacted to.
Looking for Trouble
So you plug the Periscope box in, give it a little bit of
starting information, and it quietly starts discovering. After 24 hours
or so, it has usually found out enough to start producing interesting
results.
How do you usually spot problems in a network? If you wait for
something to turn red, you only find out about problems when something
fails. Ideally, you'd like to catch them earlier. Some high-end
products like Concord produce various health reports, using metrics to
prioritize what needs your attention. If you gather data, a TopN
listing as in SolarWinds Orion does help show the hot spots or problem
areas over some period of time. Threshold alarms sent to an HP OpenView
or CiscoWorks Device Fault Manager console are yet another way.
Periscope does something a bit different. It adds some
intelligent criteria to a TopN approach, and provides a daily report on
current problem areas in the network.
The philosophy of Periscope is "no news is good news". That
is, it is not primarily intended as a reporting tool, although the data
is there and accessible. Who goes looking at graphs for fun? (Besides
me, I mean.) What most people want is something that cuts to the chase
and tells them about stuff they care about that needs fixing. And
that's what the Issues screen in Periscope does. See the
following figure.
By the way, the figures are all based on data from real
networks, and shown with permission. Addresses, names, and other
information were changed so as to reveal little about the internals of
the demo networks. The issues shown have since been
resolved.

The numbers after each issue indicate the number of devices
with the problem. Periscope orders the issues based on severity, using
a weighting scheme reflecting several factors (technical severity,
number of devices, etc.). You then click on an Issue to obtain more
detail. Here are some more issues:

Clicking on an issue drills down to detail, such as the
following:

If you want, you can click on the device name or address. This
gets you to a screen summarizing the device interfaces. From there,
you can click to obtain various performance graphs. For example:

Note the steadily decreasing memory, indicating a memory leak.
(Time for a new Cisco IOS image!).
It turns out there are various Cisco IOS images with SNMP
problems, including memory leaks. Periscope now knows about these, and
reports if it encounters them. If the problem is one that might lead to
a reboot (per public Cisco bug reports), Periscope allows you to
disable SNMP on the device. This is visible now as a separate issue in
the issues screen captures above.
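One way to turn "steadily decreasing memory" into an automatic check is to fit a straight line to free-memory samples and look at the slope. The sketch below assumes hourly samples in bytes and a hypothetical threshold of 1 KB per hour; it is not a description of how Periscope itself decides.

```python
def free_memory_slope(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of free memory (bytes) against time (hours)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to fit a trend")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    covariance = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    return covariance / variance


def looks_like_leak(samples, min_loss_per_hour: float = 1024.0) -> bool:
    """Flag free memory that trends down faster than the (assumed) threshold."""
    return free_memory_slope(samples) < -min_loss_per_hour


# Made-up data: one device loses 50,000 bytes every hour, the other just jitters.
leaking = [(h, 20_000_000 - 50_000 * h) for h in range(24)]
steady = [(h, 20_000_000 + (1000 if h % 2 else -1000)) for h in range(24)]

print(looks_like_leak(leaking), looks_like_leak(steady))
```

A fitted slope is less jumpy than comparing the first and last samples, since a single odd reading cannot trigger the flag by itself.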
Here's another (very common) problem caught by Periscope:

If you have full duplex hardcoded into a PC and the switch port is
attempting to autonegotiate, the switch side falls back to half duplex,
and you will have duplex errors and user slowness. Periscope automates
finding such ports.
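A duplex mismatch usually leaves a recognizable pattern in the interface counters: late collisions on the side running half duplex, and FCS (CRC) errors on the side running full duplex. The sketch below flags ports on that basis. The `PortStats` record, the counter names, and the threshold of 100 are assumptions made for illustration, not the product's actual test.

```python
from dataclasses import dataclass


@dataclass
class PortStats:
    device: str
    port: str
    duplex: str           # operating duplex: "full" or "half"
    late_collisions: int  # typical on the half-duplex side of a mismatch
    fcs_errors: int       # typical on the full-duplex side of a mismatch


def likely_duplex_mismatch(p: PortStats, min_errors: int = 100) -> bool:
    """Hypothetical heuristic based on which error counter is climbing."""
    if p.duplex == "half":
        return p.late_collisions >= min_errors
    return p.fcs_errors >= min_errors


ports = [
    PortStats("sw1", "Fa0/1", "half", late_collisions=4200, fcs_errors=3),
    PortStats("sw1", "Fa0/2", "full", late_collisions=0, fcs_errors=0),
    PortStats("sw2", "Fa0/7", "full", late_collisions=0, fcs_errors=950),
]
suspects = [(p.device, p.port) for p in ports if likely_duplex_mismatch(p)]
print(suspects)
```

FCS errors can also come from bad cabling, so a flagged full-duplex port is a candidate to check rather than a confirmed mismatch.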
Here's another interesting report from Periscope:

This shows VLANs, identified by number, name, and root
bridge. If you look closely, you can see that the root bridge for the
selected switch is the switch itself (middle, top), and that it
is the only switch with that root bridge for VLAN 91 (left: count is
1). This happens when switches aren't cleanly trunked together, as can
happen when stackables are centrally connected to routed ports, but not
connected to each other (as was originally planned).
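The reasoning in that report can be sketched in a few lines: group switches by VLAN and by the root bridge each one reports, then look for VLANs where the switches disagree. The row format and the switch names here are hypothetical.

```python
from collections import defaultdict


def split_vlans(rows):
    """rows: (switch, vlan, root_bridge) tuples, one per switch per VLAN.

    Returns only the VLANs whose switches report more than one root bridge,
    with the switches listed under the root each one sees.
    """
    roots_by_vlan = defaultdict(lambda: defaultdict(list))
    for switch, vlan, root in rows:
        roots_by_vlan[vlan][root].append(switch)
    return {
        vlan: {root: sorted(switches) for root, switches in roots.items()}
        for vlan, roots in roots_by_vlan.items()
        if len(roots) > 1
    }


rows = [
    ("core1", 91, "core1"),
    ("access1", 91, "core1"),
    ("access2", 91, "access2"),  # believes it is the root: count of 1
    ("core1", 10, "core1"),
    ("access1", 10, "core1"),
]
print(split_vlans(rows))
```

In the output, a root bridge with a single member that is the switch itself corresponds to the "count is 1" case described above.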
A similar report shows HSRP addresses. When no inactive backup
peers are found, there's a problem. This may be getting added as
another Issue even as I type.
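A check along those lines might look like the following sketch. The row format and the state names ("active", "standby") are assumptions chosen for readability; this is not the product's implementation.

```python
from collections import defaultdict


def groups_without_standby(rows):
    """rows: (router, hsrp_group, virtual_ip, state) tuples.

    Returns the (group, virtual_ip) pairs for which no router reports
    the standby state, meaning there is no backup ready to take over.
    """
    states = defaultdict(set)
    for _router, group, virtual_ip, state in rows:
        states[(group, virtual_ip)].add(state)
    return sorted(key for key, seen in states.items() if "standby" not in seen)


rows = [
    ("r1", 1, "10.1.1.1", "active"),
    ("r2", 1, "10.1.1.1", "standby"),
    ("r3", 2, "10.1.2.1", "active"),  # nobody is backing this one up
]
print(groups_without_standby(rows))
```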
Reporting in Periscope
Periscope contains a wealth of data. If the network is healthy, all the
work it is doing is not readily visible to the user. So there is some
overview reporting in the current product. Plans include automated
emailing of PDF versions of these to managers who subscribe.
Future releases may also have more reports available. For now,
this is another way to keep an eye on your network.
To get to the Reports, you click on Reports. The Network
Health report is the default choice. It starts with a score, which
should be taken as a relative health measure for the one network being
monitored. 100% means no problems. Points come off based on severity of
issues and number of devices affected.
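The article does not give the actual formula, but the idea of weighting issues by severity and by number of devices, then subtracting from 100, can be sketched as follows. The severity weights, the issue names, and the percent-of-devices scaling are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical severity weights; the product's real weighting is not given here.
SEVERITY_WEIGHT = {"info": 1, "warning": 3, "error": 6}


@dataclass
class Issue:
    name: str
    severity: str
    devices_affected: int


def issue_weight(issue: Issue, total_devices: int) -> float:
    """Weight grows with severity and with the percent of devices affected."""
    percent = 100 * issue.devices_affected / total_devices
    return SEVERITY_WEIGHT[issue.severity] * percent


def rank_issues(issues: list[Issue], total_devices: int) -> list[Issue]:
    """Heaviest issue first, as on a ranked Issues screen."""
    return sorted(issues, key=lambda i: issue_weight(i, total_devices), reverse=True)


def health_score(issues: list[Issue], total_devices: int) -> float:
    """Start at 100 and take points off per issue, never going below zero."""
    penalty = sum(issue_weight(i, total_devices) for i in issues)
    return max(0.0, 100.0 - penalty)


issues = [
    Issue("Duplex mismatch", "error", 4),
    Issue("SNMP memory leak", "warning", 10),
    Issue("No syslog server", "info", 20),
]
ranked = rank_issues(issues, total_devices=200)
score = health_score(issues, total_devices=200)
print([i.name for i in ranked], score)
```

With these made-up weights, a moderate problem on many devices can outrank a severe one on a few, which is the kind of trade-off a weighting scheme has to settle.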

Scrolling down, we see:

This lets you see whether the number of issues is going up or down over
time.
A wide variety of graphs indicate numbers and changes for
various aspects of the network. One of the graphs under the
Performance link follows. (There are plenty more where this one came
from!)

Note that the graph groups devices by how long they have been up, so
you can see at a glance which devices in this network have rebooted
recently. Reboots generally indicate a problem; stable devices just
stay up and working.
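Grouping devices by uptime is simple to sketch from SNMP data, since sysUpTime is reported in hundredths of a second. The bucket boundaries and device names below are hypothetical, and the product's own ranges may differ.

```python
from collections import Counter

# SNMP sysUpTime is reported in hundredths of a second (TimeTicks).
TICKS_PER_DAY = 100 * 60 * 60 * 24

# Hypothetical bucket boundaries, in days.
BUCKETS = [(1, "under 1 day"), (7, "1 to 7 days"), (30, "7 to 30 days")]


def uptime_bucket(sysuptime_ticks: int) -> str:
    """Return the label of the first bucket the uptime falls into."""
    days = sysuptime_ticks / TICKS_PER_DAY
    for limit, label in BUCKETS:
        if days < limit:
            return label
    return "over 30 days"


def uptime_histogram(ticks_by_device: dict[str, int]) -> Counter:
    """Count devices per uptime bucket."""
    return Counter(uptime_bucket(t) for t in ticks_by_device.values())


sample = {
    "core1": 200 * TICKS_PER_DAY,
    "access1": 3 * TICKS_PER_DAY,
    "access2": TICKS_PER_DAY // 2,  # rebooted within the last day
    "access3": 45 * TICKS_PER_DAY,
}
histogram = uptime_histogram(sample)
print(histogram)
```

One caveat: sysUpTime is a 32-bit counter and wraps after roughly 497 days, so a very stable device can briefly look like it has just rebooted.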
Field Experience
Periscope has been field tested at a number of sites. One such site
has agreed to be mentioned in this article. The following comment
comes via Kevin Stone.
Johns Hopkins Bloomberg School of Public Health as a leading
international authority on public health is dedicated to protecting
health and saving lives. Every day, the School works to keep millions
around the world safe from illness and injury by pioneering new
research, deploying its knowledge and expertise in the field, and
educating tomorrow's scientists and practitioners in the global defense
of human life. See http://www.jhsph.edu/ for more information about the
School.
To accomplish these goals the School has put in place a state of the
art data network. As the School has grown so has the network, to over
200 network devices and 3500 computers. Managing this network cost
effectively without the help of diagnostic tools has become
overwhelming. Equally overwhelming has been the complexity and cost of
most network management applications. One of the priorities of the
Information Systems department has been to document and put in place
tools to provide the necessary data to ensure proper operation of the
network, shown below.
The School has installed Periscope by Netcordia to help do that, and
has realized many operational improvements through its use. As an
appliance Periscope solves the complexity issue by only requiring
minimal configuration and providing browser based access. Taking the
expertise at Netcordia and putting it into the Periscope system allows
even the most junior staff member to pinpoint problems and take action.
Periscope has provided the School with invaluable information without
inundating the staff with alerts and reams of data. Device discovery
and identification has allowed the team to identify rogue wireless
access points and see what devices are on the network. This has been
invaluable in the fluid academic research environment at the School.
The Issues screen with its ranked items and color coding allows for
quick determination of critical problems.
The network health report allows both management and the network team
to have a single page that shows the relative health of the entire
network and how this has changed from previous days. More detail is
readily available if the network team needs it for further monitoring
and troubleshooting.
Overall the Periscope appliance has reduced the troubleshooting
workload of the network staff while allowing the staff to improve the
operation of the network with minimal setup time and configuration.
Kevin also notes that he likes the "Servers Alive" product
from Woodstone (http://www.woodstone.nu).
This product tracks service availability with a set of features that
strike me as appropriate to Windows-based environments. It looks
to me (Pete) like a supported commercial counterpart to the Big
Brother/Sister tools that were popular a couple of years back.
Conclusion
I like the Periscope product a lot. It provides easy startup and
simplicity of use. It catches a bunch of things and reports them simply
as Issues. If you have time to go poking through the reports, you can
learn more.
Periscope is not now a Fault and Event management system. You
can buy HP OpenView or WhatsUp Gold for that.
Periscope is not right now a reporting product per se. It does
tell you what you need to know about potential problems. Future
versions may add a network map, more reporting, trending, and/or a
wider variety of issues.
Please contact Chesapeake Netcraftsmen (http://www.netcraftsmen.net) if
you'd like us to come do a Network Health Check, with or without
Periscope. Please contact Netcordia (http://www.netcordia.com)
for more information about the product, or to pass along what you're
looking for in a network management product.
Dr. Peter J. Welcher (CCIE #1773, CCSI #94014, CCIP) is a
Senior Consultant with Chesapeake NetCraftsmen. NetCraftsmen is a
high-end consulting firm and Cisco Premier Partner dedicated to quality
consulting and knowledge transfer. NetCraftsmen has eight CCIEs, with
expertise including large network high-availability routing/switching
and design, VoIP, QoS, MPLS, IPSec VPN, wireless LAN and
bridging, network management, security, IP multicast, and other
areas. See
http://www.netcraftsmen.net for more information about
NetCraftsmen. Pete's links start at
http://www.netcraftsmen.net/welcher . New articles will be posted
under the Articles link. Questions, suggestions for articles, etc. can
be sent to pjw
<at> netcraftsmen <dot> net.
Kevin J. Stone is a Senior Network Administrator at the Johns
Hopkins Bloomberg School of Public Health. Kevin has over 12 years in
the networking field, with companies ranging from startups to the
Fortune 1000, and now has higher education experience. Kevin works
with both server and network equipment and likes to emphasize
manageability and monitoring in networks he runs.
9/8/2003
Copyright (C) 2003 Peter J. Welcher