Introduction

I've been doing some work with Fluke Networks recently. This article covers the subject of some of that work, namely NetFlow and NetFlow reporting. We'll talk a little bit about what NetFlow is and how it collects traffic data for us. We'll then look at how useful that information can be. We'll also briefly discuss how to configure NetFlow, and usage considerations.

Show command output is the easiest way for you to check out NetFlow: enable it on one or a couple of devices, and then you can see the sort of data it makes available. Samples are in my prior article. This article updates an article from 2001:
http://www.netcraftsmen.net/welcher/papers/netflow.html.
See that article for some details omitted from this one.
About NetFlow

When Cisco first introduced NetFlow a while ago, in or before the Cisco IOS 11.1 CA code, it was a caching technique based on "flows". It still is, but the emphasis is now on statistics collection. NetFlow caching is still present and may help when you have long access lists.

A network flow is a unidirectional stream of packets, identified by source and destination IP addresses, IP protocol, source and destination ports (where relevant), and Type of Service (TOS) byte. Inbound interface and other information can be tracked per flow. The crucial statistics tracked by NetFlow are packet and byte counts. When you enable NetFlow on a router or switch, statistics are collected on the IP traffic passing through that device. The expiring flow data can then be exported to a collection system for post-processing and storage. Reporting software then gives you access to the information.

Flow expiration is based on obvious termination of the flow (TCP FIN or RST) and on timers, including a longevity timer that ensures data on long-lived flows is exported periodically, so that information isn't lost.

Using NetFlow Data

NetFlow data export allows you to gather data from across a network about the traffic passing through that network.

What reports you can then pull out depends on the choice of reporting tool. For example, I recently experimented with ntop, an interesting tool (free except on Windows) that can work with either packet capture or NetFlow data. I found the reports showed me top talkers and, to some extent, the ports used. But the reporting didn't let me slice and dice the data in the ways I needed: I wanted site-to-site flows, and instead I was buried in a mass of host-specific information. Don't get me wrong, ntop has its uses.

What else could you want? Well, NetFlow contains information
about interfaces. So one might want reporting on how much traffic passes through each interface, and what the application mix is on that interface. One might also want to report on the top transmitters and receivers, and on top conversations. This could be useful, for example, when a WAN site reports slowness. One group I've worked with uses distributed Sniffer for this. But NetFlow can provide the same information in near real time. (There is a little delay due to the export processing.)

Another use that's being seen in the field is cost recovery through identifying wasted bandwidth. Are your users running peer-to-peer file or music sharing? Perhaps they are acting as ad hoc servers, unaware of the bandwidth effects and of the legal and other implications (risks). For that matter, identifying applications that are major bandwidth consumers helps you decide where to focus efforts to increase efficiency.

Several NetFlow vendors, as well as Cisco, have realized that NetFlow can provide good information about virus or worm outbreaks and other unusual activity. You are receiving data on every source and destination IP address, as well as ports. You can then watch traffic headed to a well-known port such as 445. A sorted report on sources of traffic flows going to port 445 then quickly identifies infected computers. If the infected computer runs, say, a small FTP server that other hosts download malware from, you can also look at its clients, that is, hosts sending FTP traffic to that computer.

Service providers use the BGP AS information in NetFlow to track flows between BGP autonomous systems (ASes) transiting the SP network.

You can get at some of this using RMON probes and reporting
software. My experience with RMON is that the reporting does interface-centric tasks well, but often isn't as good at "across the network" information, for example reporting on one port across multiple probes. This can be true with NetFlow data reporting as well: if the data storage is optimized for one type of report, it may be impossible, or very slow and painful, to extract reports that "go against the grain".

You can also, to some extent, get at this sort of information using packet capture and analysis. The problem with that approach is that the packet capture buffer is limited in size, so the analysis shows you one snapshot in time rather than more comprehensive data. My analogy is that Ethereal, and packet capture in general, is the equivalent of a microscope. Sometimes a microscope is just not the right tool: too tight a focus, too narrow a view.

NetFlow Data Export Packets
The following screen capture shows an Ethereal decode of a
NetFlow export packet. Note the version is 5. The packet contains
a version-specific header, followed by flow records. The capture
expands one of the flow records so you can see what's in a typical flow
record.
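To make the header-plus-records layout concrete, here is a minimal Python sketch (standard library only) that decodes a NetFlow version 5 export packet, assuming the published v5 format: a 24-byte header followed by fixed-size 48-byte flow records in network byte order. It then produces the kind of sorted report mentioned earlier, ranking sources by the number of TCP flows they send to destination port 445. The function names and the synthetic test packet are hypothetical illustrations, not part of Ethereal or any NetFlow product.

```python
import socket
import struct
from collections import Counter
from typing import NamedTuple

# NetFlow v5 wire format, network byte order ("!" means no alignment padding).
# Header: version, count, sys_uptime, unix_secs, unix_nsecs,
#         flow_sequence, engine_type, engine_id, sampling_interval
V5_HEADER = struct.Struct("!HHIIIIBBH")                 # 24 bytes
# Record: srcaddr, dstaddr, nexthop, input, output, dPkts, dOctets,
#         first, last, srcport, dstport, pad1, tcp_flags, prot, tos,
#         src_as, dst_as, src_mask, dst_mask, pad2
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")   # 48 bytes


class FlowRecord(NamedTuple):
    """The subset of v5 record fields used by the report below."""
    src: str
    dst: str
    in_if: int       # input interface ifIndex
    out_if: int      # output interface ifIndex
    packets: int
    octets: int
    src_port: int
    dst_port: int
    protocol: int    # IP protocol number (6 = TCP, 17 = UDP)
    tos: int


def parse_v5(packet: bytes) -> list[FlowRecord]:
    """Decode one NetFlow v5 export packet (the UDP payload) into records."""
    version, count, *_ = V5_HEADER.unpack_from(packet, 0)
    if version != 5:
        raise ValueError(f"not a NetFlow v5 packet (version {version})")
    if len(packet) < V5_HEADER.size + count * V5_RECORD.size:
        raise ValueError("packet is shorter than its header's record count")
    records = []
    for i in range(count):
        offset = V5_HEADER.size + i * V5_RECORD.size
        (src, dst, _nexthop, in_if, out_if, pkts, octets, _first, _last,
         sport, dport, _pad1, _tcp_flags, prot, tos,
         _src_as, _dst_as, _src_mask, _dst_mask, _pad2) = V5_RECORD.unpack_from(
            packet, offset)
        records.append(FlowRecord(socket.inet_ntoa(src), socket.inet_ntoa(dst),
                                  in_if, out_if, pkts, octets,
                                  sport, dport, prot, tos))
    return records


def top_sources_to_port(records, port, protocol=6, n=10):
    """Rank source addresses by how many flows they sent to one destination port."""
    counts = Counter(r.src for r in records
                     if r.dst_port == port and r.protocol == protocol)
    return counts.most_common(n)


# Hypothetical test helpers: build synthetic packets with arbitrary field values.
def pack_v5_record(src, dst, dst_port, octets=1500, protocol=6):
    return V5_RECORD.pack(socket.inet_aton(src), socket.inet_aton(dst), bytes(4),
                          1, 2,                  # input / output ifIndex
                          3, octets,             # dPkts, dOctets
                          0, 0,                  # first, last (sysUptime, ms)
                          1025, dst_port,        # srcport, dstport
                          0, 0x02, protocol, 0,  # pad1, tcp_flags (SYN), prot, tos
                          0, 0, 24, 24, 0)       # src_as, dst_as, masks, pad2


def pack_v5_packet(records: list[bytes]) -> bytes:
    return V5_HEADER.pack(5, len(records), 0, 0, 0, 0, 0, 0, 0) + b"".join(records)


if __name__ == "__main__":
    # Two hosts sending to TCP/445, plus one unrelated web flow.
    pkt = pack_v5_packet([pack_v5_record("10.1.1.5", "10.2.0.1", 445),
                          pack_v5_record("10.1.1.5", "10.2.0.2", 445),
                          pack_v5_record("10.1.1.9", "10.2.0.3", 445),
                          pack_v5_record("10.1.1.7", "10.2.0.4", 80)])
    for source, flows in top_sources_to_port(parse_v5(pkt), 445):
        print(source, flows)
```

In a real collector, the input to parse_v5 would be the payload of each UDP datagram arriving on the export port, and the header's flow_sequence counter can be used to spot lost export packets.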
There's an excellent Cisco reference
document about NetFlow. If you want more detail about packet
formats, which NetFlow variants are supported in which IOS/CatOS
versions, that sort of thing, it is the place to look. It is the first
Cisco reference below.

About IPFIX

The IETF has been working to standardize NetFlow. The effort is named IPFIX, which stands for IP Flow Information eXport. The IETF considered working implementations as a starting point, and elected to work from Cisco NetFlow version 9. Version 9 extends classic NetFlow by using templates to describe the flow records, which provides extensibility. The working group's charter also led it to provide for securing the flow information: SCTP (Stream Control Transmission Protocol) can be used for transport instead of TCP or UDP, and IPsec or TLS can be used to secure the exported data.

The IPFIX standard also allows for sampled data, which relieves devices of the burden of classifying and reporting on each and every packet. Cisco recommends random (probabilistic) sampling rather than deterministic 1-in-N sampling, so that recurrent (periodic) traffic patterns don't cause flows to be systematically missed.

More IPFIX information can be found at the IPFIX URLs listed below. Most major vendors, including Cisco, appear to intend to support IPFIX going forward.

Fluke Networks ReporterAnalyzer

Fluke Networks has been moving up the OSI stack, and now
supplies a wide range of link testing tools and "higher level"
reporting tools. Their
website is at the URL http://www.flukenetworks.com/us/default.htm.
Fluke Networks
is now selling a NetFlow-based product, ReporterAnalyzer. It is an OEM
version of
NetQoS's product of the same name. I'm including it here because it
has some interesting and useful NetFlow capabilities. I've driven the
user interface, but haven't used this product in production. The site
administrator was rather enthusiastic about the product, however!
Product information can be found at http://www.flukenetworks.com/us/WAN/Monitoring+Analysis+Diagramming/ReporterAnalyzer/Overview.htm.
The screen capture of a sample "calendar graph" for one
interface over time (color indicates percent utilization) follows.
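The percent-utilization figures behind a graph like this come down to simple arithmetic on exported byte counts: for each time bucket, utilization = 100 × 8 × bytes / (bucket length in seconds × link speed in bit/s). The Python sketch below is a hypothetical illustration, not how ReporterAnalyzer computes its graphs. It assumes unsampled flow records that carry an end time (already converted to Unix seconds), an input ifIndex and an octet count, and it credits each flow's bytes to the hour in which the flow ended; the longevity timer mentioned earlier limits how far a long-lived flow's bytes can be shifted by that simplification.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class FlowBytes(NamedTuple):
    end_time: float   # flow end time, Unix seconds (assumed pre-converted)
    in_if: int        # input interface ifIndex
    octets: int       # bytes in the flow (unsampled)


def hourly_utilization(flows: Iterable[FlowBytes], if_index: int,
                       link_bps: float, bucket_s: int = 3600) -> dict[int, float]:
    """Return {bucket start time: percent inbound utilization} for one interface.

    Each flow's bytes are credited entirely to the bucket in which it ended.
    """
    octets_per_bucket: dict[int, int] = defaultdict(int)
    for f in flows:
        if f.in_if == if_index:
            bucket = int(f.end_time // bucket_s) * bucket_s
            octets_per_bucket[bucket] += f.octets
    capacity_bits = link_bps * bucket_s      # what the link could carry per bucket
    return {b: 100.0 * 8 * o / capacity_bits
            for b, o in sorted(octets_per_bucket.items())}
```

If the device uses sampled NetFlow, the byte counts would first need scaling by the sampling rate before this calculation means anything.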
There is an interesting white paper by NetQoS at http://www.cisco.com/warp/public/732/partnerpgm/docs/netqos_netflow_rmon2.pdf. It shows some examples of how NetFlow data solves customer problems. Yes, it is a marketing document, but it nicely makes some points about where this sort of data can be useful.

When you compare this product to the freeware below, bear in mind that the usual choice applies. Generally you can pay for a supported product that works smoothly and does a lot, or sink time and energy into freeware. With commercial network management software, I have lately been recommending trying it in your own network first, to ensure it gets you the reports you want and that it works reasonably well in production.
NetFlow Freeware

I've recently been experiencing a certain amount of frustration with freeware tools. Admittedly, I'm running Cygwin and not one of the more standard Linux variants. Having said that, much of the freeware is very poorly documented, especially concerning which versions of other freeware the build process actually works with. I have a hypothesis ("Pete's Law of Freeware"?) that the hassle factor goes up with the square of the number of free tools involved. So, for example, I'm pro-Cacti, since it is available on Windows and involves relatively few (5?) disparate components. Ethereal is great: two components, and it seems to work well. RRFW / Torrus looks interesting and powerful, and builds on Fedora, but has too many components (20?) that apparently only sometimes work together. The more complete NetFlow reporting freeware packages below look a bit risky, in that they involve getting a fair number of other components to work well together.
Impact of NetFlow

If you're thinking about deploying NetFlow, you might want to think about and plan for the following potential issues or concerns:

Concerning other impacts, you don't want to just go turning on NetFlow everywhere. You want to think about what the key reports are, and what you're trying to get out of NetFlow. Often NetFlow on data center Layer 3 switches or on WAN routers suffices. You also want a gradual deployment plan: bugs and surprises do happen.

The device impact comes from two sources. First, traffic needs to be characterized so the proper packet and byte counts can be incremented. This includes cache management, and takes CPU and memory. Second, the CPU is also used to build the UDP NetFlow export packets and transmit them.
Sending that data then consumes some network bandwidth.

Per Cisco, processor impact is generally low, but it does need to be considered if your CPU is already high or spiking. One reference based on testing reports about 4% CPU impact for 10,000 active flows, rising to 16% for 65,000 active flows. "Your mileage may vary." Memory impact is about 64 bytes per cache entry, and can be controlled by adjusting the cache size, at least on software-based platforms (from 1K to 512K entries). The 6500 does NetFlow in hardware, and its cache size isn't configurable (32K to 230K flows, depending on the PFC, from PFC2 to PFC3BXL).

The last planning factor is ensuring sufficient collector
capacity. This depends on the operating system and collection software used. Fluke Networks' collector ("Harvester") can handle approximately 20 routers, sometimes as many as 50, but this depends on the rate at which it is receiving flow information, which can be more than 100,000 flows per second. I haven't researched the capacity of collectors from other vendors (or freeware).

Configuring NetFlow

You need to make sure CEF (or distributed CEF, or fast switching) is running first. You then configure "ip flow ingress" on the desired interfaces; the old syntax was "ip route-cache flow". That starts statistics collection, which can be viewed using show commands such as "show ip cache flow". Starting with collection alone can be a good idea: among other things, the number of active flows can be used with the above information to estimate export traffic volume before you actually activate NetFlow export.

To start NetFlow export, configure the destination collector and UDP port (and, optionally, the export version) as follows:

router(config)# ip flow-export destination 10.1.1.1 9995

It is also a good idea to specify a loopback interface as the NetFlow export packet source ("ip flow-export source"), to avoid any potential confusion at the receiving collector. See my prior article for some of the more detailed configuration options, and the Cisco documentation (links below) for full details.

NetFlow References

The following table includes some of the best references I found while researching NetFlow. There are many more research papers out there doing statistical and lab analysis of NetFlow or IPFIX. Some do get rather technical!

Reader Participation
I've got a reader-participation project this month, one I'd appreciate
your (brief!) thoughts and email on. I'd like to write an article
titled something like "Surprise: Top 10 (or 20) Things That Defeat or
Disable CEF". For example: