NetFlow and IPFIX
I've been doing some work with Fluke Networks recently. This article is about the subject of some of that work, namely NetFlow and NetFlow Reporting. We'll talk a little bit about what NetFlow is and how it collects traffic data for us. We'll then look at how useful that information can be. We'll also briefly discuss how to configure NetFlow and usage considerations.
Show command output is the easiest way for you to check out NetFlow: enable it on one or a couple of devices, and then you can see the sort of data it makes available. Samples are in my prior article.
This article updates an article from 2001: NetFlow. See that article for some details omitted from this one.
When Cisco first introduced NetFlow a while ago, in or before the Cisco IOS 11.1 CA code, it was a caching technique based on "flows". It still is, but the emphasis is now on statistics collection. NetFlow caching is still present and may help when you have long access lists.
A network flow is a unidirectional stream of packets, identified by source and destination IP addresses, IP protocol, source and destination ports (where relevant), and Type of Service (TOS) byte. Inbound interface and other information can be tracked per flow. The crucial statistics tracked by NetFlow are packet and byte counts.
When you enable NetFlow on a router or switch, statistics are collected on the IP traffic passing through that device. The expiring flow data can then be exported to a collection system for post-processing and storage. Reporting software then gives you access to the information.
Flow expiration is based on obvious termination of the flow (TCP FIN or RST) and timers, including a longevity timer to ensure that data is occasionally exported, to prevent loss of information.
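To make the flow and expiration ideas above concrete, here is a minimal sketch of a flow cache in Python. It is not how any router implements NetFlow. The class and field names are hypothetical, and the two timer values are assumptions chosen for illustration, so check your platform's actual settings.

```python
from dataclasses import dataclass
from typing import NamedTuple

# Assumed timer values, for illustration only.
INACTIVE_TIMEOUT = 15.0        # seconds without packets before a flow expires
ACTIVE_TIMEOUT = 30 * 60.0     # longevity timer: long-lived flows still get exported

TCP = 6
TCP_FIN, TCP_RST = 0x01, 0x04


class FlowKey(NamedTuple):
    """Fields that identify a unidirectional flow."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int
    tos: int


@dataclass
class FlowStats:
    """Per-flow information; packet and byte counts are the crucial part."""
    input_if: int
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


class FlowCache:
    def __init__(self):
        self.flows = {}      # FlowKey -> FlowStats for active flows
        self.exported = []   # expired flows waiting to go to a collector

    def observe(self, key, length, now, input_if, tcp_flags=0):
        """Account for one packet of `length` bytes seen at time `now`."""
        stats = self.flows.get(key)
        if stats is None:
            stats = self.flows[key] = FlowStats(input_if, now, now)
        stats.packets += 1
        stats.octets += length
        stats.last_seen = now
        # Obvious termination of a TCP flow expires it right away.
        if key.protocol == TCP and tcp_flags & (TCP_FIN | TCP_RST):
            self._expire(key)

    def age(self, now):
        """Expire flows that are idle or that have lived too long."""
        for key, stats in list(self.flows.items()):
            idle = now - stats.last_seen >= INACTIVE_TIMEOUT
            long_lived = now - stats.first_seen >= ACTIVE_TIMEOUT
            if idle or long_lived:
                self._expire(key)

    def _expire(self, key):
        self.exported.append((key, self.flows.pop(key)))
```

A caller would invoke `observe()` per packet and `age()` periodically, then ship whatever accumulates in `exported` to the collector.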
Using NetFlow Data
NetFlow data export allows you to gather data from across a network about the traffic passing through that network.
What reports you can pull out then depends on the choice of reporting tool. For example, I recently experimented with ntop, an interesting freeware (except on Windows) tool that can work with either packet capture or NetFlow data. I found the reports showed me top talkers and ports used to some extent. But the reporting didn't let me slice and dice the data in the ways I needed. I wanted site to site flows. Instead, I was buried in a mass of host-specific information. Don't get me wrong, ntop has its uses.
What else could you want? Well, NetFlow contains information about interfaces. So one might want reporting on how much traffic passes through each interface, and what the application mix is on that interface. One might also want to report on the top transmitters and receivers, and on top conversations. This could be useful, for example, when there is a report that a WAN site is experiencing slowness. One group I've worked with uses distributed Sniffer for this. But NetFlow can provide the same information, in near realtime. (There is a little delay due to the export processing.)
Another use that's being seen in the field is cost recovery through identification of wasted bandwidth. Are your users running peer-to-peer file/music sharing? Perhaps they are acting as an ad hoc server, and not aware of the bandwidth effects, or of the legal and other implications (risks). For that matter, identifying applications that are major bandwidth consumers helps you decide where to focus efforts to increase efficiency.
Several NetFlow vendors as well as Cisco have realized that NetFlow can provide good information about virus or worm outbreaks and other unusual activity. You are receiving data on every source and destination IP, as well as ports. You can then watch traffic headed to a well-known port such as 445. A sorted report on sources of traffic flows going to port 445 then quickly identifies infected computers. If the infected computer runs, say, an FTP serverlet to download malware, you can also look at clients, that is, hosts sending FTP traffic to that computer.
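The sorted report on port 445 sources is easy to sketch. The Python below assumes flow records have already been collected and decoded into dictionaries; the field names `src_ip` and `dst_port` and the sample addresses are made up for illustration, not taken from any particular product.

```python
from collections import Counter


def top_sources_to_port(flows, port, limit=10):
    """Return (source IP, flow count) pairs for flows headed to `port`,
    sorted with the busiest sources first."""
    counts = Counter(f["src_ip"] for f in flows if f["dst_port"] == port)
    return counts.most_common(limit)


# Hypothetical sample data: one host is opening many flows to port 445.
sample_flows = (
    [{"src_ip": "10.1.1.50", "dst_port": 445} for _ in range(40)]
    + [{"src_ip": "10.1.2.7", "dst_port": 445} for _ in range(3)]
    + [{"src_ip": "10.1.2.7", "dst_port": 80} for _ in range(25)]
)

for src, n in top_sources_to_port(sample_flows, 445):
    print(f"{src:15s} {n:6d} flows to port 445")
```

A host that shows up with a flow count far above its neighbors is the one to go look at first.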
Service Providers use the BGP AS information in NetFlow to track flows between BGP ASes transiting the SP network.
You can get at some of this using RMON probes and reporting software. My experience with RMON is that the reporting does interface-centric tasks well, but often isn't as good at "across the network" information, e.g. reporting on one port across multiple probes. This can be true with NetFlow data reporting as well: if the data storage is optimized for one type of report, it may be impossible or very slow and painful to extract reports that "go against the grain".
You can to some extent get at this sort of information using packet capture and analysis. The problem with such an approach is the packet capture buffer is limited in size, and the analysis therefore shows you one snapshot in time, rather than more comprehensive data. My analogy for this is that Ethereal and packet capture in general is the equivalent of a microscope. Sometimes, a microscope is just not the right tool: too tight a focus, too narrow a view.
NetFlow Data Export Packets
The following screen capture shows an Ethereal decode of a NetFlow export packet. Note the version is 5. The packet contains a version-specific header, followed by flow records. The capture expands one of the flow records so you can see what's in a typical flow record.
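For readers who would rather see the layout in code than in a decode, here is a sketch of a version 5 export packet parser in Python. It assumes the commonly documented version 5 layout (a 24-byte header followed by 48-byte flow records); the function name and the dictionary keys are my own, and only some of the record fields are surfaced.

```python
import socket
import struct

# Header: version, count, sysUptime, unix_secs, unix_nsecs,
#         flow_sequence, engine_type, engine_id, sampling_interval
V5_HEADER = struct.Struct("!HHIIIIBBH")                 # 24 bytes
# Record: srcaddr, dstaddr, nexthop, input, output, dPkts, dOctets,
#         first, last, srcport, dstport, pad1, tcp_flags, prot, tos,
#         src_as, dst_as, src_mask, dst_mask, pad2
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")   # 48 bytes


def parse_v5(datagram):
    """Split one NetFlow v5 export datagram into (header, records)."""
    hdr = V5_HEADER.unpack_from(datagram, 0)
    version, count, sequence = hdr[0], hdr[1], hdr[5]
    if version != 5:
        raise ValueError(f"not a version 5 export packet: version={version}")
    records = []
    for i in range(count):
        f = V5_RECORD.unpack_from(datagram, V5_HEADER.size + i * V5_RECORD.size)
        records.append({
            "src_ip": socket.inet_ntoa(f[0]),
            "dst_ip": socket.inet_ntoa(f[1]),
            "next_hop": socket.inet_ntoa(f[2]),
            "input_if": f[3],
            "output_if": f[4],
            "packets": f[5],
            "octets": f[6],
            "src_port": f[9],
            "dst_port": f[10],
            "tcp_flags": f[12],
            "protocol": f[13],
            "tos": f[14],
            "src_as": f[15],
            "dst_as": f[16],
        })
    header = {"version": version, "count": count, "sequence": sequence}
    return header, records
```

Version 9 and IPFIX cannot be parsed with fixed offsets like this, since the record layout is described by templates sent in the export stream.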
There's an excellent Cisco reference document about NetFlow. If you want more detail about packet formats, which NetFlow variants are supported in which IOS/CatOS versions, that sort of thing, it is the place to look. It is the first Cisco reference below.
The IETF has been working to standardize NetFlow. The effort is named IPFIX, which stands for IP Flow Information Export. The IETF considered working implementations as a starting point, and elected to work from Cisco NetFlow version 9. Version 9 extends classic NetFlow by using templates to describe the flow records. This provides extensibility. The charter for the working group also led them to allow securing the flow information -- SCTP secure stream transport can be used instead of TCP or UDP for transport. IPsec or TLS can also be used.
The IPFIX standard also allows for sampled data, which alleviates the burden on devices of classifying and reporting on each and every packet. Cisco is recommending random sampling (probabilistic sampling) to ensure you don't miss flows, for example when there are recurrent (periodic) data patterns.
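A toy example shows why random sampling matters when traffic is periodic. The Python sketch below is an illustration under contrived assumptions, not a model of any device: ten hypothetical flows take turns sending in a strict rotation, and the sampling rate happens to match the period. The function names are my own.

```python
import random


def deterministic_sample(packets, n):
    """Take every n-th packet, starting with the first."""
    return packets[::n]


def random_sample(packets, n, rng):
    """Take one randomly chosen packet out of each window of n packets."""
    return [rng.choice(packets[start:start + n])
            for start in range(0, len(packets), n)]


# Ten flows sending in strict rotation: flow-0, flow-1, ..., flow-9, flow-0, ...
packets = [f"flow-{i % 10}" for i in range(10_000)]

seen_deterministic = set(deterministic_sample(packets, 10))
seen_random = set(random_sample(packets, 10, random.Random(1)))

print("deterministic 1-in-10 saw:", sorted(seen_deterministic))
print("random 1-in-10 saw", len(seen_random), "distinct flows")
```

The deterministic sampler keeps landing on the same slot in the rotation, so it only ever sees one flow. The random sampler looks at the same number of packets but sees all ten flows.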
More IPFIX information can be found at the IPFIX URL's listed below.
Most major vendors including Cisco appear to intend to support IPFIX going forward.
Fluke Networks ReporterAnalyzer
Fluke Networks has been moving up the OSI stack, and now supplies a wide range of link testing tools and "higher level" reporting tools. Their website is at the URL http://www.flukenetworks.com/us/default.htm.
Fluke Networks is now selling a NetFlow-based product, ReporterAnalyzer. It is an OEM version of NetQoS's product of the same name. I'm including it here because it has some interesting and useful NetFlow capabilities. I've driven the user interface, but haven't used this product in production. The site administrator was rather enthusiastic about the product, however! Product information can be found at http://www.flukenetworks.com/us/WAN/Monitoring+Analysis+Diagramming/ReporterAnalyzer/Overview.htm.
The screen capture of a sample "calendar graph" for one interface over time (color indicates percent utilization) follows.
There is an interesting white paper by NetQoS at http://www.cisco.com/warp/public/732/partnerpgm/docs/netqos_netflow_rmon2.pdf. It shows some examples of how NetFlow data solves customer problems. Yes, it is a marketing document, but it nicely makes some points about where this sort of data can be useful.
When you compare to the freeware below, bear in mind that the usual choice applies here. Generally you can pay for a supported product that works smoothly and does a lot, or sink time and energy into freeware. With commercial network management software, I have lately been recommending trying it in your network, to ensure it gets you the reports you want, and that it works reasonably well in production.
I've recently been experiencing a certain amount of frustration with freeware tools. Admittedly, I'm running cygwin and not one of the more standard Linux variants. Having said that, much of the freeware is very poorly documented, especially concerning what versions of other freeware the build process actually works with. I have a hypothesis ("Pete's Law of Freeware"?) that the hassle factor goes up with the square of the number of free tools included. So for example I'm pro-Cacti, since it is available on Windows and involves relatively few (5?) disparate components. Ethereal is great -- 2 components, seems to work well. RRFW / Torrus looks interesting and powerful, builds on Fedora, but has too many components (20?) that apparently only sometimes work together. The more complete NetFlow reporting freeware packages below look a bit risky, in that they involve getting a fair number of other components to work well together.
Impact of NetFlow
If you're thinking about deploying NetFlow, you might want to think about and plan for the following potential issues or concerns:
- Meeting hardware and software pre-requisites
- Impact on infrastructure devices (processor, memory)
- Network bandwidth
- Collector capacity
The pre-requisites are easy: check your vendor (if non-Cisco). If Cisco, run code that's not ancient. Done! Ok, if you want IPFIX / NetFlow version 9, you need relatively recent code.
Concerning other impacts, you don't want to just go turning on NetFlow everywhere. Think about what the key reports are and what you're trying to get out of NetFlow. Often NetFlow on data center Layer 3 switches or on WAN routers suffices. You also want a gradual deployment plan: bugs and surprises do happen.
The device impact comes from two sources. First, traffic needs to be characterized so the proper packet and byte counts can be incremented, which includes cache management. That takes CPU and memory. Second, the CPU is also used to build the UDP NetFlow export packets and transmit them. Sending that data then consumes some network bandwidth.
Per Cisco, processor impact is generally low, but does need to be considered if your CPU is high or spiking already. One reference based on testing says about 4% CPU impact for 10,000 active flows, up to 16% for 65,000 active flows. "Your mileage may vary". Memory impact is about 64 bytes per cache entry, and can be controlled by adjusting the cache size, at least in software-based platforms (1 to 512 KB). The 6500 does NetFlow in hardware and cache size isn't configurable (from 32 to 230 K flows, based on PFC2 to PFC3BXL).
Concerning network bandwidth, the rule of thumb appears to be 1-1.5% of the total bandwidth of interfaces where NetFlow is enabled. A research study seems to fall in the same ballpark, reporting about 8 Mbps of export traffic per million active flows. I like having two ways to estimate, since that allows a cross-check. I can't speak to the accuracy of these formulas.
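The memory and bandwidth rules of thumb quoted above are simple enough to put into a few lines of Python for a quick cross-check. The constants come straight from the figures in the text. The function names and the example inputs (65,000 active flows, two 45 Mbps links) are just illustrations, and as noted, I can't speak to the accuracy of the formulas themselves.

```python
BYTES_PER_CACHE_ENTRY = 64              # approximate memory per cache entry
EXPORT_FRACTION = (0.01, 0.015)         # 1-1.5% of NetFlow-enabled bandwidth
EXPORT_MBPS_PER_MILLION_FLOWS = 8.0     # research-study figure


def cache_memory_bytes(cache_entries):
    """Approximate memory consumed by the flow cache."""
    return cache_entries * BYTES_PER_CACHE_ENTRY


def export_mbps_from_bandwidth(total_interface_mbps):
    """(low, high) export estimate from total NetFlow-enabled bandwidth."""
    low, high = EXPORT_FRACTION
    return total_interface_mbps * low, total_interface_mbps * high


def export_mbps_from_flows(active_flows):
    """Export estimate from the number of active flows."""
    return active_flows / 1_000_000 * EXPORT_MBPS_PER_MILLION_FLOWS


flows = 65_000
print(f"cache memory: about {cache_memory_bytes(flows) / 1e6:.1f} MB")
low, high = export_mbps_from_bandwidth(2 * 45)
print(f"export by bandwidth rule: {low:.2f} to {high:.2f} Mbps")
print(f"export by flow-count rule: {export_mbps_from_flows(flows):.2f} Mbps")
```

If the two export estimates land far apart for your numbers, that is a hint to measure rather than trust either one.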
The last planning factor is ensuring sufficient collector capacity. This is a characteristic of the Operating System and collection software used. Fluke Networks' collector ("Harvester") can handle approximately 20 routers, sometimes as many as 50, but this depends on the rate at which it is receiving flow information. That can be more than 100,000 flows / second. I haven't researched the capacity of collectors from other vendors (or freeware).
You need to make sure CEF (distributed CEF, fast caching) is running first.
You then configure "ip flow ingress" on the desired interfaces. The old syntax was "ip route-cache flow". That starts statistics collection, which can be viewed using show commands. Looking at the show output before you export anything can be a good idea. Among other things, the number of active flows can be used with the above information to estimate export traffic volume before you actually go and activate NetFlow export.
To start NetFlow export, configure the destination collector and port (and version, optionally) as follows:
router(config)# ip flow-export destination 10.1.1.1 9995
router(config)# ip flow-export version 5
router(config)# ip flow-export source loopback0
It is a good idea to specify a loopback interface as the NetFlow export packet source (last line above), to avoid any potential confusion by the receiving collector.
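If you want to confirm that export packets are actually arriving before pointing a real collector at them, a few lines of Python will do. This is a bare-bones sketch, not a collector: the function names are hypothetical, the default port 9995 simply matches the example configuration above, and it reads only the first two header fields (version and record count), which version 5 and version 9 share.

```python
import socket
import struct


def open_collector(host="0.0.0.0", port=9995):
    """Bind a UDP socket on the port the routers export to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_export(sock):
    """Wait for one export datagram; return (exporter IP, version, count)."""
    datagram, (exporter_ip, _src_port) = sock.recvfrom(65535)
    version, count = struct.unpack_from("!HH", datagram, 0)
    return exporter_ip, version, count


# Example use (blocks until packets arrive, so it is left as a comment):
#
#   sock = open_collector()
#   while True:
#       exporter, version, count = receive_export(sock)
#       print(f"{exporter}: NetFlow v{version}, {count} records")
```

The exporter address printed here is the source address of the export packet. With a loopback configured as the export source, it stays the same no matter which physical interface the packet left the router on.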
See my prior article for some of the more detailed configuration options, and the Cisco documentation (links below) for full details.
The following table includes some of the best references I found while researching NetFlow. There are many more research papers out there doing statistical and lab analysis of NetFlow or IPFIX. Some do get rather technical!
I've got a reader-participation project this month, one I'd appreciate your (brief!) thoughts and email on. I'd like to write an article titled something like "Surprise: Top 10 (or 20) Things That Defeat or Disable CEF". For example:
- Using a packet or QoS classification ACL with "log" in it
- Large packets sent out a GRE tunnel needing fragmentation and being process switched because of that
What other things disable CEF? I'm particularly interested in those that most surprised you. Bragging rights go to the "best" entry.
For those worried that I ignored other commercial products, my grounds are that I don't know much about them. I've talked to a couple of vendors of NetFlow software at trade shows, but I'm left feeling that these tools really need a little "testdrive" time before you can spot what they do well, and what they don't do or don't do all that well. Here's a Google link for those wishing to investigate other NetFlow vendors: http://www.google.com/search?hl=en&q=netflow+reporting+product.
I'd like to say thanks to all the folks that came up to say hello and catch up at Networkers 2005 in Las Vegas. I hope to see you there again next year! Also thanks to the people who have emailed me. It's very gratifying to know these articles are being read!
Your comments, questions, and suggestions for future articles are of course welcome! See below to decipher my email address.
Copyright (C) 2005, Peter J. Welcher