Troubleshooting Too Much Multicast
In this article we describe a real-world problem observed in a customer network. We hope this story is useful as a tale in preventative design. The story might also help you recognize and troubleshoot this particular multicast issue in your own network. We will of course conceal the identities of all parties involved. In this case, we think the story is really not about any error on anyone's part, but about some multicast behavior one might not expect, behavior it behooves us all to know about.
If you aren't comfortable with multicast, you might want to skim Pete's previous articles on multicast for a brief introduction.
The network we will be talking about belongs to a large financial enterprise using a fairly common high availability Cisco infrastructure. Let's call it Company X.
To understand the multicast problem, we need to first understand the Company X network. It is a fairly common large Enterprise design.
The Company X network uses dual 6513's, with dual power, dual Supervisors, and dual MSFC's. They are deployed to provide Layer 3 switching at the core and distribution layers. There are 6509's without MSFC's acting as server switches, and 4006's as closet ("IDF") switches. All edge switches, closet or server, are dual-connected to the distribution layer 6513's. Servers are (supposed to be) dual-homed to two switches, with a third management interface. VLAN's and STP domains are small, each running from one closet or server switch to a distribution layer ("MDF") switch. The Layer 3 switches run OSPF. Most VLAN interfaces are passive for OSPF, reducing adjacencies and OSPF traffic.
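To make the dual-connection point concrete, here is a minimal Python sketch under stated assumptions: it models a tiny, hypothetical slice of a hierarchy like this one (two core, two distribution, one closet and one server switch; the names and links are illustrative, not Company X's real topology), fails each link in turn, and checks whether every switch can still reach every other.

```python
# Hypothetical slice of the hierarchy; device names and links are illustrative only.
LINKS = [
    ("core1", "core2"),
    ("core1", "dist1"), ("core1", "dist2"),
    ("core2", "dist1"), ("core2", "dist2"),
    ("dist1", "dist2"),
    ("idf1", "dist1"), ("idf1", "dist2"),  # closet (IDF) switch, dual-connected
    ("srv1", "dist1"), ("srv1", "dist2"),  # server switch, dual-connected
]


def is_connected(nodes, links):
    """Return True if every node can reach every other node over the links."""
    adjacency = {node: set() for node in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = next(iter(adjacency))
    seen, stack = {start}, [start]
    while stack:  # iterative depth-first search
        for peer in adjacency[stack.pop()] - seen:
            seen.add(peer)
            stack.append(peer)
    return len(seen) == len(adjacency)


def single_link_failures_survived(links):
    """Fail each link in turn; map link -> True if the network stays connected."""
    nodes = {node for link in links for node in link}
    return {
        link: is_connected(nodes, [other for other in links if other != link])
        for link in links
    }


if __name__ == "__main__":
    # Every single-link failure leaves this dual-homed model connected.
    print(all(single_link_failures_survived(LINKS).values()))
```

Dropping one uplink from the list (making a switch single-homed) makes the check fail, which is exactly the property the dual connections buy you, and also why a single failure can go unnoticed.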
This brings to Pete's easily side-tracked mind some short but timely comments on switched network design. We've put them at the end, after the main story.
In general, Company X has been very happy with its Cisco switched network. One comment we heard was that the network was "too reliable": links would fail without causing any overall outage, so a failure could easily go unnoticed until a second failure happened. This organization formerly had a Token Ring network with daily outages. Quite a change!
This does raise the issue of network management, managing events, and some related comments. We've also put these at the end.
Multicast at Company X
Multicast was deployed at Company X to support upgrading a substantial Novell server infrastructure to IP and Service Location Protocol (SLP). Version 2 of SLP is specified in RFC 2608, with refinements in other RFC's. To read up on it, you can find the RFC at: http://www.ietf.org/rfc/rfc2608.txt
Searching Novell's site of course yields many links on SLP. One good starting document is at: http://support.novell.com/cgi-bin/search/searchtid.cgi?/10014788.htm
In SLP, Service Agents in printers, servers, and so on respond to User Agent requests for a service if they can provide that service. Alternatively, an optional Directory Agent can learn about services from Service Agent registration messages and then respond to User Agent requests itself. For those who collect well-known multicast addresses, SLP uses 224.0.1.22, and 224.0.1.35 is used for locating Directory Agents.
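A toy Python model of the three SLP roles may help. It is a sketch only: the class and function names, the sample addresses, and the service-type strings are hypothetical, and it counts messages in a simplified way rather than modeling the RFC 2608 message formats.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceAgent:
    """A printer, server, etc. that answers for the services it offers."""
    address: str
    services: set


@dataclass
class DirectoryAgent:
    """Optional DA that learns services from Service Agent registrations."""
    registrations: dict = field(default_factory=dict)

    def register(self, agent):
        for service in agent.services:
            self.registrations.setdefault(service, []).append(agent.address)


def find_service(service, agents, directory_agent=None):
    """Return (addresses, message_count) for one User Agent request.

    Without a DA: one multicast request, plus one reply per matching SA.
    With a DA: one request to the DA and one reply from it.
    (Registration and DA-discovery traffic is deliberately not counted.)
    """
    if directory_agent is not None:
        return list(directory_agent.registrations.get(service, [])), 2
    replies = [a.address for a in agents if service in a.services]
    return replies, 1 + len(replies)


# Hypothetical sample data; the service-type strings are illustrative.
SAMPLE_AGENTS = [
    ServiceAgent("10.1.1.10", {"service:printer"}),
    ServiceAgent("10.1.1.20", {"service:fileserver"}),
    ServiceAgent("10.1.1.30", {"service:printer"}),
]
```

Without a DA, a lookup for "service:printer" costs one multicast plus two replies; once both printers have registered with a DA, the same lookup is a single request and reply.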
The point of SLP is that it uses multicast as an efficient mechanism for ad hoc service discovery. It uses bandwidth only when needed, whereas the older Novell SAP advertisements were broadcast regularly, interrupting every CPU on each Novell segment. SLP is much wiser in its use of bandwidth.
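A rough back-of-the-envelope comparison, as a Python sketch. The 60-second SAP interval and the limit of 7 service entries per SAP packet are standard IPX SAP behavior; the workload numbers used in the example (50 services, 20 lookups an hour, 2 replies each) are purely hypothetical.

```python
import math

# Standard IPX SAP behavior: periodic broadcasts every 60 seconds,
# at most 7 service entries per SAP packet.
SAP_INTERVAL_S = 60
SAP_ENTRIES_PER_PACKET = 7


def sap_broadcasts_per_hour(services_advertised):
    """Broadcast SAP packets per hour from one router re-advertising N
    services onto a segment, whether or not anyone needs them."""
    per_update = math.ceil(services_advertised / SAP_ENTRIES_PER_PACKET)
    return per_update * (3600 // SAP_INTERVAL_S)


def slp_packets_per_hour(queries_per_hour, replies_per_query):
    """On-demand SLP: one request plus its replies, only when someone asks."""
    return queries_per_hour * (1 + replies_per_query)


if __name__ == "__main__":
    # Hypothetical workload: 50 advertised services; 20 lookups/hour, 2 replies each.
    print(sap_broadcasts_per_hour(50), slp_packets_per_hour(20, 2))
```

The SAP figure is a fixed tax on every host on the segment; the SLP figure scales with actual demand and drops to zero when nobody is looking for anything.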
When multicast was deployed at Company X, we set up router interfaces with PIM sparse-dense mode to make it self-adjusting. With sparse-dense mode on interfaces, each multicast group is handled in dense mode if no sparse mode Rendezvous Point (RP) is known for that group address, and in sparse mode otherwise.
Auto-RP is used to scalably manage RP's and provide some resiliency should the primary RP fail. This alleviates most concerns about losing the acting RP and reverting to PIM-DM, with consequent floods of multicast every three minutes.
In Auto-RP, routers can be set up as candidate RP's, for all or for selected multicast groups. They advertise themselves via multicast to 224.0.1.39. Mapping agent routers receive these multicasts and select the candidate with the highest IP address as the acting RP for each group or range of groups advertised. They then periodically advertise this information to multicast group 224.0.1.40. Other routers receive this multicast and thereby learn the RP's for the advertised multicast groups or ranges.
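The mapping agent's selection rule can be sketched in a few lines of Python. This is an illustration under assumptions, not IOS code: announcements are given as hypothetical (candidate RP address, group range) string pairs, and timers and holdtimes are ignored. Note that addresses must be compared numerically, not as strings.

```python
import ipaddress


def elect_rps(announcements):
    """Choose the acting RP per group range, Auto-RP mapping-agent style.

    announcements: iterable of (candidate_rp_ip, group_range_cidr) pairs, as
    heard on 224.0.1.39. For each range, the highest candidate IP wins.
    Returns {group_range_cidr: rp_ip}, which the mapping agent would then
    advertise on 224.0.1.40.
    """
    winners = {}
    for rp, group_range in announcements:
        rp_ip = ipaddress.ip_address(rp)
        net = ipaddress.ip_network(group_range)
        # IPv4Address objects compare numerically, so 10.0.0.10 > 10.0.0.9.
        if net not in winners or rp_ip > winners[net]:
            winners[net] = rp_ip
    return {str(net): str(ip) for net, ip in winners.items()}
```

For example, with candidates 10.0.0.1 and 10.0.0.2 both announcing 224.0.0.0/4, the sketch picks 10.0.0.2; a plain string comparison would wrongly prefer 10.0.0.9 over 10.0.0.10.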
Generally with Auto-RP, interfaces are put into sparse-dense mode: the Auto-RP groups themselves are handled in dense mode, and since RP's become known for the other multicast groups, those groups run in sparse mode. This solves the chicken-and-egg problem: if you wanted Auto-RP itself to run in sparse mode, you'd have to statically configure RP's for the Auto-RP groups into every router. The whole point is to avoid all that static configuration in each and every router. Instead, with Auto-RP, you configure candidate RP's and mapping agents, and that's it.
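The per-group decision that sparse-dense mode makes can be sketched as follows. Assumptions: the RP mapping table is a hypothetical {group range: RP} dictionary with non-overlapping ranges, and the sketch simply hard-codes the two Auto-RP groups as dense, mirroring the description above rather than the exact logic of any particular IOS release.

```python
import ipaddress

# The two Auto-RP groups, handled in dense mode in this sketch so that
# Auto-RP can bootstrap without statically configured RPs.
AUTO_RP_GROUPS = {
    ipaddress.ip_address("224.0.1.39"),  # candidate RP announcements
    ipaddress.ip_address("224.0.1.40"),  # mapping agent discovery
}


def forwarding_mode(group, rp_mappings):
    """Return ("sparse", rp) or ("dense", None) for a group on a
    sparse-dense mode interface.

    rp_mappings: {group_range_cidr: rp_ip}, e.g. learned via Auto-RP.
    Assumes non-overlapping ranges; the first covering range is used.
    """
    addr = ipaddress.ip_address(group)
    if addr in AUTO_RP_GROUPS:
        return ("dense", None)
    for group_range, rp in rp_mappings.items():
        if addr in ipaddress.ip_network(group_range):
            return ("sparse", rp)
    return ("dense", None)  # no RP known: fall back to dense mode
```

With an empty mapping table every group falls back to dense mode, which is the "losing the acting RP and reverting to PIM-DM" situation described earlier.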
By the way, the Auto-RP advertisements go out every 60 seconds. That's relevant later.
For how to configure Auto-RP, and a little more detail, see IP Multicast and PIM Rendezvous Points.
What We Saw
At the time this happened, we were starting a QoS project at Company X, and were reviewing frame drop rates in switches. The reasoning was that QoS in the campus is about protecting VoIP and Video over IP from packet or frame drops. So we thought we'd see if the switches were dropping anything, or if the utilization was so low that drops just weren't happening. You might consider going through this drill if you need to prioritize, to decide whether to implement QoS in the campus or the WAN first.
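If you poll interface counters yourself rather than relying on a reporting tool, the drop-rate arithmetic is simple, apart from counter wrap. A minimal Python sketch, assuming you sample an output-discard counter (such as IF-MIB ifOutDiscards) and a matching output-frame count at two times; the function names and the choice of "sent plus discarded" as the denominator are assumptions.

```python
COUNTER32_MODULUS = 2 ** 32  # SNMP Counter32 values wrap at 2^32


def counter_delta(old, new, modulus=COUNTER32_MODULUS):
    """Increase of a wrapping counter between two polls (assumes at most one wrap)."""
    return (new - old) % modulus


def discard_percent(discards_old, discards_new, frames_old, frames_new):
    """Percent of offered frames discarded between two polls of one interface.

    "Offered" is taken here as frames sent plus frames discarded.
    """
    discarded = counter_delta(discards_old, discards_new)
    sent = counter_delta(frames_old, frames_new)
    offered = sent + discarded
    return 0.0 if offered == 0 else 100.0 * discarded / offered
```

The modulo handles a frame counter that wrapped between polls, for example from 4294967000 to 9704, which is an increase of 10,000 frames rather than a large negative number.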
Pete's perspective lately is: if you can possibly do so, it's time to get going on QoS. You may need it more than you suspect! Can you honestly say you know all the traffic on your network, and there's no central backup or video or Kazaa (Gnutella, Napster, etc.) or other bandwidth hogs? Because with any of these, you may have some interesting bursts of traffic, drops, etc. If you're trying to do VoIP or IP Video Conferencing (IPVC), then you might want to protect it from bandwidth-hungry or bursty applications. (And of course we'd be glad to provide you with QoS consulting services.)
The following is what we observed at Company X using the Concord Network Health reports, and this was on a good day. When we did some consistency checking, we found PIM commands missing on some VLAN interfaces and corrected that; we also reduced the number of Auto-RP mapping agents from 4 to 2. Before these changes, multicast bursts of about 40,000 frames per second had been seen.
Here is a sample Concord Network Health report graph showing the frame discards we noticed:
Tracking It Down
Upon drilling down, it turned out the discard and multicast bursts were occurring almost exactly every four minutes. Packet capture and analysis confirmed this. Troubleshooting tip: knowing how often something happens is sometimes a good clue!
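Finding the period can be automated when you have rate samples. A minimal Python sketch, assuming time-ordered (seconds, frames per second) samples exported from a reporting tool or capture; the threshold is a hypothetical value you would pick from the data.

```python
from statistics import median


def burst_starts(samples, threshold):
    """Timestamps where the rate first rises above threshold.

    samples: (timestamp_seconds, frames_per_second) pairs in time order.
    Consecutive high samples count as one burst.
    """
    starts, in_burst = [], False
    for t, rate in samples:
        high = rate > threshold
        if high and not in_burst:
            starts.append(t)
        in_burst = high
    return starts


def estimated_period(samples, threshold):
    """Median gap between burst starts, or None with fewer than two bursts."""
    starts = burst_starts(samples, threshold)
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return median(gaps) if gaps else None
```

Using the median rather than the mean keeps one missed or extra burst from skewing the estimate much.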
It wasn't long before our thoughts turned to PIM Dense Mode (PIM-DM), which re-floods every 3 minutes. How to get from 3 to 4 minutes? Some head scratching ensued. The PIM Prune state that temporarily blocks PIM-DM flooding ages out after 3 minutes. But the actual multicast traffic to the groups in question was only being sent once a minute. So the timing went: send multicast (flood, then prune), wait out the 3-minute prune timeout, wait about one more minute for the next multicast to be sent, and that one gets flooded again: major traffic burst!
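A small simulation in Python shows how a 3-minute prune timer and 1-minute sends can yield 4-minute bursts. The 180-second and 60-second values come from the text; the 1-second delay before prunes take effect is an assumption, and it matters.

```python
PRUNE_HOLD_S = 180     # prune state lifetime, as described in the text
SEND_INTERVAL_S = 60   # Auto-RP messages are sent every 60 seconds
PRUNE_DELAY_S = 1      # assumed time for prunes to take effect after a flood


def flood_times(duration_s, prune_delay_s=PRUNE_DELAY_S):
    """Send times (seconds) at which a periodic dense-mode packet is flooded."""
    floods, prune_expires = [], None
    for t in range(0, duration_s, SEND_INTERVAL_S):
        if prune_expires is None or t >= prune_expires:
            floods.append(t)
            # Downstream routers prune shortly after the flood arrives.
            prune_expires = t + prune_delay_s + PRUNE_HOLD_S
    return floods
```

With the assumed delay, floods occur at 0, 240, 480, ... seconds. Setting prune_delay_s to 0, so the prune expires exactly as the 180-second message is sent, makes the same model flood every 180 seconds instead; the 4-minute period depends on the prunes outliving that message by a hair.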
The packet capture also helped in another regard. Troubleshooting tip: packet captures often do! Use the RSPAN feature in your Cisco switches to capture sample traffic when you have a problem.
In this case, the packet capture was interesting when we looked at sources and destinations. There were only two sources and only two destination IP multicast groups. The two sources were the two Auto-RP mapping agents. The two destinations were 224.0.1.39 and 224.0.1.40, which are the multicast groups used by Auto-RP.
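Summarizing a capture by source and destination is the step that exposed the Auto-RP traffic here. A minimal Python sketch, assuming the capture has already been reduced to (source IP, destination IP) string pairs; the function names are hypothetical.

```python
from collections import Counter

AUTO_RP_GROUPS = {"224.0.1.39", "224.0.1.40"}


def top_flows(packets, n=5):
    """Count (source, destination) pairs in a capture and return the top n."""
    return Counter((src, dst) for src, dst in packets).most_common(n)


def auto_rp_share(packets):
    """Fraction of captured packets addressed to the Auto-RP groups."""
    if not packets:
        return 0.0
    hits = sum(1 for _, dst in packets if dst in AUTO_RP_GROUPS)
    return hits / len(packets)
```

When a burst consists almost entirely of a couple of (source, group) pairs, as it did here, the top of that list tells you where to look next.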
Since the network was using Auto-RP, all multicast groups should have been using PIM Sparse Mode (PIM-SM). But when you think about it, or look back at your favorite multicast reference, there are two exceptions with Auto-RP: the two Auto-RP groups themselves, which normally use PIM-DM. That told us that nothing too terrible was happening: only the Auto-RP PIM-DM control traffic was subject to bursts. The next question was why.
There were two possibilities: that this was normal burstiness, or that somewhere we were getting an IP multicast forwarding and multiplication loop. To cut to the chase, it turned out the latter was the problem.
What had happened was as follows. On one of the MSFC's, someone had fat-fingered an OSPF network statement, typing the wildcard mask 0.0.0.255 as 0.0.0. 225. A previous Cisco IOS version had apparently auto-corrected this, or maybe it just never got noticed. When the switch was upgraded, this statement didn't pass the syntax check, so it was never applied or saved to NVRAM.
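The matching rule behind an OSPF network statement is a masked comparison: wildcard bits set to 1 are ignored, and all other bits must match. A minimal Python sketch with hypothetical addresses. In this incident the mistyped statement was rejected outright, so OSPF simply never ran on the interface; the sketch also shows that a wildcard of 0.0.0.225 would not have covered the whole /24 even if it had been accepted.

```python
import ipaddress


def ospf_network_matches(interface_ip, network, wildcard):
    """True if an interface address falls under an OSPF 'network' statement.

    Wildcard bits set to 1 are "don't care"; all other bits must equal the
    corresponding bits of the network address.
    """
    ip = int(ipaddress.IPv4Address(interface_ip))
    net = int(ipaddress.IPv4Address(network))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (ip & care) == (net & care)
```

225 is binary 11100001, so with that wildcard the bits worth 2, 4, 8 and 16 must match the network address: 10.1.100.1 would still match 10.1.100.0, but 10.1.100.2 would not.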
So on one of the distribution layer MSFC's, OSPF was running on only one of the two key VLAN interfaces back to the core Layer 3 switches. Say the two VLAN's were 100 and 101, with OSPF on the misconfigured MSFC running only on VLAN 101. By bad luck, the RPF selection process on all the other distribution layer MSFC's chose VLAN 100 as the RPF interface. So when the key PIM-DM multicast arrived, 29 MSFC's forwarded it from VLAN 100 to VLAN 101. The resulting 29 packets all passed the RPF check on the misconfigured MSFC, which forwarded them back onto VLAN 100, whereupon all 29 MSFC's flooded them back to VLAN 101... Luckily the TTL was set to 8, and the multicasts had to go two hops to reach the part of the network with this problem, so this only resulted in approximately 29 + 29 x 29 + 29 x 29 x 29 packets, about 25,000 for each original multicast; even the second pass alone (29 x 29) amounts to roughly a 1:900 multiplication.
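The packet count above is a geometric series. A short Python sketch that evaluates it for any number of forwarding routers and loop passes; 29 forwarders and 3 passes are the values from the text, and the function name is hypothetical.

```python
def loop_packets(forwarders, passes):
    """Packets produced by a multiplying forwarding loop.

    Each pass around the loop multiplies the newest packet count by the
    number of routers forwarding it: forwarders + forwarders**2 + ...
    """
    return sum(forwarders ** k for k in range(1, passes + 1))
```

loop_packets(29, 3) gives 25,259, and each extra pass would multiply the newest term by another 29, which is why the TTL limit mattered so much here.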
The following diagram shows the first stages of this, but with only 3 switches forwarding. In the real world, there were a lot more switches, and each had two MSFC's in it, both forwarding until pruned.
Correcting the OSPF network statement for VLAN 100 on that one MSFC fixed the problem.
The moral? Unicast routing "steers" or directs multicast forwarding, via the RPF check. If you see odd multicast behavior, check for unicast routing anomalies. And avoid PIM-DM wherever it is reasonably possible to do so, except for low bandwidth applications!
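The RPF check itself is easy to sketch. A minimal Python model, assuming a simplified unicast routing table of {prefix: interface} with longest-prefix match; the source address, prefix, and VLAN names are hypothetical, and real routers also have to break ties between equal-cost paths, which is not modeled here.

```python
import ipaddress

# Hypothetical routing tables: a healthy MSFC whose route back to the
# source uses Vlan100, and the misconfigured one that only knows Vlan101.
HEALTHY_MSFC_ROUTES = {"10.0.0.0/24": "Vlan100"}
MISCONFIGURED_MSFC_ROUTES = {"10.0.0.0/24": "Vlan101"}


def rpf_interface(source, routes):
    """Interface of the longest-prefix unicast route back to the source,
    or None if no route matches."""
    addr = ipaddress.ip_address(source)
    best = None
    for prefix, interface in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, interface)
    return best[1] if best else None


def rpf_accepts(source, arrival_interface, routes):
    """A multicast packet passes the RPF check only if it arrived on the
    interface the router would use to send unicast traffic to its source."""
    return rpf_interface(source, routes) == arrival_interface
```

With these tables, a packet from 10.0.0.1 arriving on Vlan101 fails the check on a healthy MSFC but passes on the misconfigured one, which is exactly the asymmetry that let the loop form.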
Good Switched Network Design
Let's digress for a moment about switched network design. This might fit under the title "how to have many fewer Really Bad days with your switched network", i.e. not having to troubleshoot in the first place by using good design practices.
Some large organizations are still doing mostly Layer 2 switching in their networks. We have been brought in to diagnose a couple of networks like this, generally in the middle of a network meltdown. Turning off STP (Spanning Tree Protocol) in a flat Layer 2 network can be a very Bad Thing!
In general, it costs serious money to put in Layer 3 switches and create small "trouble-walled" Layer 2 switching domains. But it is well worth it, because Layer 3 designs generally fail in a localized fashion, whereas pure Layer 2 designs can take out the entire Layer 2 switched network! It's a whole lot better to have only one building down than an entire campus! If you haven't personally experienced this, a recent, highly public occurrence may have served as a wake-up call for those with large flat Layer 2 topologies.
Sometimes pure flat Layer 2 networks happen through lack of central control and management, lack of planning, rapid growth, or these days, unwise cost avoidance. Several of these have been issues in the medical (hospital, HMO) networks we've seen. There is often no funding or political support for infrastructure in hospitals, or staff is simply spread too thin. Layer 2 looks attractive for reducing management costs ("plug them in and they just work"). In the past, we have helped a couple of such organizations build more solid networks with appropriate use of Layer 3 switching. For more about good design practices for switched networks, see Pete's Switching: Campus Design article. The information is still valid; the only change is that much more powerful Layer 3 switches are now available, at quite attractive prices.
This is going to lead us briefly into a second side track (rant?), something Pete has seen in several organizations now. As the network becomes more reliable, network management practices need to change. Change Control becomes a factor, to reduce and manage human error. Network management and reporting tools become more important. Supporting VoIP and QoS requires raising the bar in terms of procedures and practices. Staff have to transition from being "fire-fighting heroes" to "managers of devices", which sometimes requires an attitude change. It's really easy to get motivated to be a hero. Wading through network management reports, on the other hand, makes for a quite different workday.
If you're skeptical about this, let us ask you: how do you locate duplex or speed mismatches in your switched network? If you're using port error reports, especially reports of the ports with the highest error rates, then you are probably well along the learning curve, already using some tools to help provide proactive quality service. If you're responding to help desk phone calls because the user says things are slow, you might want to start finding a network management reporting tool you can afford (both as to financial cost and maintenance hassle) and are comfortable with.
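A "worst ports first" report needs little more than per-port counters. A minimal Python sketch; the per-port dictionary keys and port names are hypothetical, and the duplex hints follow the usual rule of thumb (late collisions on the half-duplex end of a mismatch, FCS errors and runts on the full-duplex end), which is a clue to investigate, not a diagnosis.

```python
def error_rate(port):
    """Errors per million frames for one port's counters (hypothetical keys)."""
    frames = port["in_frames"] + port["out_frames"]
    errors = port["fcs_errors"] + port["runts"] + port["late_collisions"]
    return 0.0 if frames == 0 else 1e6 * errors / frames


def likely_duplex_side(port):
    """Rough hint only: late collisions suggest the half-duplex end of a
    mismatch; FCS errors and runts suggest the full-duplex end."""
    if port["late_collisions"] > 0:
        return "half-duplex side?"
    if port["fcs_errors"] > 0 or port["runts"] > 0:
        return "full-duplex side?"
    return ""


def worst_ports(ports, n=10):
    """The n ports with the highest error rates, worst first."""
    ranked = sorted(ports, key=error_rate, reverse=True)
    return [(p["name"], round(error_rate(p), 1), likely_duplex_side(p)) for p in ranked[:n]]
```

Reviewing the top of such a list regularly turns "the user says things are slow" into a short, ranked work queue.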
For more along this line, you might try the Cisco VoIP readiness questionnaire (http://tools.cisco.com/Assessments/jsp/welcome.jsp?asmt=VOIP). Pete takes this series of questions as indicating areas where VoIP projects might run into trouble. Possible causes include lack of network management reporting or QoS, lack of adequate voice requirements information, and lack of adequate voice design. Anyway, if VoIP might be in your future, spend a few minutes answering all the questions in this assessment; it may save you some grief later on!
Some of this is checking that you did your homework on overall and VoIP network design. But there are also a number of questions that more or less ask how good an idea you have of your network's behavior, and to what extent you've got the appropriate management tools in place.
What shape are your network management tools in? When were they last used? Is it time to get them off the shelf, dust them off, get them configured properly, and use them to squeeze better behavior out of what you've got? Especially if you're probably not investing much in new bandwidth or equipment?
There are some fresh QoS seminars posted at Seminars. I'm re-working and adding to the content as time permits. I've recently had the pleasure of working with a couple of large organizations on QoS, and also have delivered some onsite customized QoS classes extending the Cisco DQOS course content. I hope to soon be presenting some of the QoS (and MPLS) seminar material in cities in the Northeast.
I've posted a couple of documents that I hope are of general interest. One is a Network Management Configuration Template, which is intended to represent the current Best Practice for configuring Cisco routers and switches for manageability. See also Cisco Router Management Configuration Template.
The second document I've posted is an updated version of our "Switching Cheat Sheet", a summary of the key CatOS and Cisco IOS commands for working with switches. The PDF is in landscape mode, so it looks a whole lot better printed than it does 90-degree rotated on screen. You can find it at the Cisco Catalyst Command Comparison Sheet PDF.
I'm also posting slides from a customized course on Cisco basics, HP OpenView, and CiscoWorks, as time permits. The last few chapters can be used as a tutorial in CiscoWorks. The course is intended as a 7-day introduction to network management of Cisco devices, for those with little to no prior Cisco or network management experience. The course shows screen captures of the previous version of CiscoWorks, due to the needs of the particular very large customer. However, other than the initial drawers screen, little of the remaining user interface has changed. The courseware can be found at http://www.netcraftsmen.net/welcher/ovcourse/index.html
Copyright (C) 2002, Peter J. Welcher