Some of the long-time and very experienced folks that founded Chesapeake Computer Consultants (which grew into Mentor) are starting a new company. The new company is named Chesapeake Netcraftsmen, and our web page is at http://www.netcraftsmen.net . (Yes, please watch out for the ".net", also that's "men" plural). We're going to be focused on consulting (mostly, with the occasional special training class). We have a core group of 6+ CCIE's, a couple of other very savvy people, and tons of experience. (If you remember "Chesapeake Computer Consultants" you're probably experienced too.)
We've been talking about IP Multicast. This month I'm going to start covering PIM Sparse Mode. Previous articles that might be of interest:
PIM Sparse Mode uses an explicit request approach, where a router has to ask for the multicast feed with a PIM Join message. PIM Sparse Mode is indicated when you need more precise control, especially when you have large volumes of IP multicast traffic compared to your bandwidth. PIM Sparse Mode scales rather well, because packets only go where they are needed, and because it creates state in routers only as needed. Because of this, it has been written up as an Internet Experimental Protocol. See http://www.ietf.org/rfc/rfc2362.txt .
The price we pay for this extra control is mild extra complexity. PIM Sparse Mode uses a special router called a Rendezvous Point (RP) to connect the flow source or multicast tree to the router next to the wannabe receiver. The RP is typically used only temporarily, as we'll see below.
There can be different RP's for different multicast groups, which is one way to spread the load. There is usually one RP per multicast gropu. Redundancy of RP's is an advanced topic, and requires a little deeper expertise. One way to do this is with the MSDP protocol (possible later article in the series).
Recall that a PIM Join message is sent towards a Source (or for PIM-SM, possibly towards an RP), based on unicast routing. The Join message says in effect "we need a copy of the multicasts over here". It connects the sender of the Join and intervening routers to any existing multicast tree, all the way back to the target of the Join if necessary. A Prune message says in effect "we no longer need this over here". A router receiving a Prune sees whether it has any other interfaces requiring the multicast flow, and if not, sends its own Prune message. One advanced technique is to arrange a separate and perhaps different copy of the unicast routing information just for multicast purposes. This allows "steering" of the Join messages. MultiProtocol BGP, MBGP, for multicast, is one way to do this (possible later article in the series).
Let's talk through the basics of how the RP is used. Let's assume the source starts sending before there any receivers. If things happen the other way around, some of the details change slightly, but it's not very different.
So: the multicast source starts sending. As we've already noted, there is no protocol or anything for registering sources with IP multicast. The source sends and it is up to the neighboring router(s) to do the right thing. With PIM-SM, the neighboring router knows about the RP. (How it knows is a topic for a whole separate article.) The neighboring router forwards the multicast data to the RP by encapsulating it in a unicast Register message or messages. Normal routing delivers the Register to the RP. The RP de-encapsulates the multicast and forwards copies down any Shared Tree (there is one pre-built if there were receivers Joined up before the Source started sending). If there are receivers (Shared Tree state outbound interfaces), the RP sends a PIM Join back towards the Source. This connects the Source to the RP with a Source Tree, the (S, G) Shortest Path Tree (SPT). Once the RP receives multicasts along this SPT, it sends a Register-Stop to tell the router by the Source to stop sending Register packets. The reason for this behavior is that no multicast packets are lost, if there are receivers already present.
By the way, if there are no receivers present, the Register-Stop message is sent. Then when a receiver subsequently shows up (IGMP to neighbor router, PIM Join from neighbor router back to RP), then the RP sends the PIM Join to the Source at that time.
The following figure assumes there is a source and active receivers (not shown). The shown receiver sends a IGMP Report to router D. Router D then sends a PIM Join towards the RP. Since there are other receivers, the RP is already joined to the Source Tree (shown in blue) and is receiving the multicast flow. It passes the Source Tree flow packets on via the Shared Tree, shown in green.
Well, now we've got the packets going from the Source to the RP along the
Source Tree (Shortest Path Tree, SPT), and from the RP to the receiver along
the Shared Tree. When the aggregate (*, G) packet bit rate (from all sources)
exceeds a threshold in Kbps, this triggers the router nearest the receiver
to try to join the Source Tree. It sends a Join towards the source of the
multicast flow. Note that the prior Join it sent was towards the RP. The Join
towards the source goes router by router towards the Source until it encounters
a router that is already in the Source Tree. This adds the router near the
receiver to the Source Tree. When a packet is actually received along that
tree, a Prune is sent towards the RP. In effect, "thanks, but I'm now getting
my multicast wholesale, not retail", since this process cuts out the RP in
the middle.
The following figure shows how this works. The top left red arrows show the Join towards the Source. This gets the top blue flow going, packets being forwarded along the Source Tree. The lower right red arrows then are the Prunes, since the Shared Tree flow is no longer needed (shown as green dashed line). Note that the Source Tree packets arrive at the Receiver along a more direct path, generally with lower latency.
By the way, we control the threshold. It is configurable. Default is zero Kbps: receive one packet, and switch over to Source Tree. If we have many sources for a particular multicast group (think conference call, VoIP), then there is a (S, G) Source Tree entry for each one. If we set the threshold to never activate, then all packets go through the RP (sort of like a conference calling bridge), using only the (*, G) Shared Tree. The threshold is also used for switchback as well as switchover. Low rate (S, G) Source Trees are switched back over to the Shared Tree. The volume of traffic is checked every minute.
If a receiver wishes to join, and its neighbor router is on the SPT (Source
Tree), then the outgoing interface Shared Tree entry is copied to the Source
Tree entry, which protects against having to send traffic to the RP and then
"back" to the router on the SPT.
By the way, you may be wondering, what is the point of having the RP here? Because of the threshold mechanism, the RP gives us a way to use the Shared Tree, and control the explosive creation of state information in routers if many receivers join at the same time.
When a PIM-SM router receives a multicast packet, it checks the Source Tree for that particular source address and multicast group (destination) address. If there is no entry present, it then checks for a Shared Tree (*, G) entry for the multicast group. If entries are present for both trees, the inbound interface tells the router which tree to use. If both trees have the same inbound interface, then the RP bit for an (S, G) entry prevents duplicate packets: this indicates that the RPF interface lies along the Shared Tree.
For a multicast flow with at least one active receiver, the path between the source and the RP will be part of the Source Tree. (Note that "the path between" is a bit vague here, I'm trying to stay away from giving too much detail.)
Shared Tree entries will connect the RP to some of the receivers. The RPF interface for the (*, G) Shared Tree is the interface in the direction of the RP, not the multicast group source. That why there is the possibility of the (S, G) Source and (*, G) Shared Trees having different RPF interfaces.
The Shared Tree (*, G) entries show interfaces where a join to the RP was received, or interfaces with directly connected group members (configured or IGMP received). The Source Tree (S, G) entries show where a Join or a Prune or a Register was received.
ip multicast-routingYou also need to tell each router the RP, either for all groups or for selected groups (using access lists to specify which RP for which groups). This can be done statically with:interface ethernet 0
ip address 10.1.1.1 255.255.255.0
ip pim sparse-mode
interface ethernet 1
ip address 10.1.2.1 255.255.255.0
ip pim sparse-mode
ip pim rp-address 1.2.3.4We'll look at the other options for managing RP in the next article.
Dr. Peter J. Welcher (CCIE #1773, CCSI #94014) is a Senior Consultant with Chesapeake NetCraftsmen. NetCraftsmen is a high-end consulting firm and Cisco Premier Partner dedicated to quality consulting and knowledge transfer. NetCraftsmen has nine CCIE's, with expertise including large network high-availability routing/switching and design, VoIP, QoS, MPLS, network management, security, IP multicast, and other areas. See http://www.netcraftsmen.net for more information about NetCraftsmen. Pete's links start at http://www.netcraftsmen.net/welcher . New articles will be posted under the Articles link. Questions, suggestions for articles, etc. can be sent to pjw@netcraftsmen.net .