The Missing Link



  Peter J. Welcher
 
   


 

Introduction

Hello again! Last month we discussed some background info about Certificate Authorities. You can find the article at:  http://www.netcraftsmen.net/welcher/papers/certauth01.html

I planned to write about lab work with the Cisco  IOS Certificate Authority for this month's article. However, I didn't look ahead, and the Feature Navigator now tells me this feature is supported on 2620 XM but not 2620. It's the usual CIsco IOS image size sort of thing, my convenient lab router has enough RAM (64 M) but not enough flash (16 M, needs 32 M). Time to punt (seeing as it is football season as I write this).

So this month we'll talk about something else.

I'd like to thank Chespeake Netcraftsmen's David Yarashus for this month's topic.  He and I have been working in several large enterprise and government organizations' designs using 3 or 4 layers of campus hierarchy. We've had some good discussions about why or why not should distribution layer switches be linked. Since he and I have both recently had fun evenings troubleshooting situations where the lack of a link was part of the problem, it seems like a timely topic to talk about. The topic may start off sounding a bit trivial, but I hope by the time we're done you'll see that it really does matter! And I do hope that along the way I manage to make it interesting as well.

The rest of this article discusses the question at hand, and then goes through some Real World situations where we've seen this problem arise.

The Basic Setting

To explain what the fuss is about, see the following figure. Recall that L2/L3 Hierarchical Design uses a L3 switches at the distribution layer, which might be for the entire building or for a group of floors, part of the building, etc. This approach has the virtue of modularity, so that you can use a cookie-cutter design.

Missing Link, basic drawing

The big virtue to Hierarchical Design is limiting the size of spanning tree domains, so that L2 issues, broadcast storms, etc. don't affect too much of your network. I and my coworkers see security and some other folks now recommending disabling STP (Spanning Tree Protocol) on user and server ports. If you've ever had to troubleshoot a Spanning Tree Loop, you'll know why we strongly advise that you leave STP running on access ports. Think of it as insurance against having a really bad day (or week!). We've seen 3 or 4 enterprise melt-downs now, they aren't pretty, it takes considerable time to track down the cause, and they can be career-impacting events. We've seen too many shops say "we'll be careful and so we don't have to worry about spanning tree loops". Well, other people may connect or rearrange things. They might not be as careful. And it's too easy to have accidents. Conclusion: don't disable STP on switches!

Ok, if you have a couple of hundred users on two to four switches, this all may seem a bit abstract to you. I do recommend using a two layer hierarchy there, with say one or two core L3 switches and a bunch of access switches, sized to fit number of users, budget, and other needs. The key thing to realize in smaller networks is that networks tend to grow, sometimes quickly, and if you start out modular, you won't have to spend time reworking things. We've seen some ad hoc and daisy-chained switch networks that "just kept growing" over time. They may work, but they can also take substantial time to administer.

As more and more sites use L3 switches in wiring closets and elsewhere, the traditional link between distribution layer switches seems to be vanishing. We also see sites with L2 closets not using the link, which can be a rather bad idea as well, unless you have amazing levels of internal discipline. The next section explains why.

VLAN Black-Holing

For this section, please refer to the following figure.

VLAN black-holing figure

Suppose the "Missing Link" isn't there. In this story, the access switches are Layer 2 only. Suppose the blue links are say VLAN 10, all in one subnet. Perhaps all the users on the connected access switches are in VLAN 10 as well. The two distribution switches are connected to that subnet. Since they are L3 switches (routers), they advertise the connected subnet to R.O.N. (the Rest of The Network).

The problem I'm about to describe can happen even if VLAN 10 only goes to one closet, but some other port  somewhere on a distribution switch is in VLAN 10. 

Suppose a failure occurs, as indicated by the red X.  Because another port on switch A is in VLAN 10 and is up, the L3 interface VLAN 10 is also considered to be still up. So switch A continues advertising that subnet to RON. Switch B also advertises that subnet, since it too is connected to VLAN 10. As packets are sent from RON back towards users, the packets may go to A or B. But if they go to A, they have no way within VLAN 10 to reach switch C -- VLAN 10 has become discontiguous (not all in one piece). In that case, A has to drop the packets. This is known as "black-holing". They go into A and never come out. Even after the user MAC address ages out, flooding the packets through the parts of VLAN 10 that A is connected to doesn't reach switch C. Eventually, ARP ages out, and the ARP broadcast from A also fails to reach C.

The diagram was updated 1/16/2005 for clarity, since either a double outage is needed to create the problem, or there has to be only one closet switch (plus some other port on switch A) in VLAN10.  Even if there is a single outage in the above picture, you have traffic to C going via one of the other closet switches and switch B, which is undesirable. Better to add the dotted link!

Usually we design for this setting with the Missing Link a trunk between A and B. We then allow VLAN 10 across the trunk. Consider our failure scenario now. When the packet arrives at A, it is connected to the trunk, even after the outage.  Flooding frames in the VLAN still reaches C. After STP settles, packets from the user will probably update the MAC tables in A, B, and C, providing an optimal L2 path from A to C and the end user.

Yes, you can plan to absolutely never ever put more than one port on a distribution layer switch into any closet/user VLAN. But will everybody faithfully do that? Will new employees get the word and understand the importance? If somebody fails to follw the plan, you won't notice until there's an outage and you get to figure out why the user lost connectivity, despite the redundant connections. Or maybe there's a department server you hook up to the distribution switch, and put into the user VLAN. In short, planning this way fails to tolerate exceptions. And while we try not to have to create them, some business or networking situations require exceptions, time after time.

By the way, this can also happen with L3 switches if some other port on the distribution switch gets put into the same VLAN as the link to the L3 switch. That's why with the newer Cisco code we prefer to make switch-to-switch links routed links, i.e. put the address on the Gig interface rather than on a VLAN interface.

Please note that I still do believe in portfast (and specifying non-trunking) for user ports. Portfast short-circuits some of STP startup. Whereas BPDU filtering disables the ability of STP to function, almost the same thing as turning it off.

Summarization Misses That Link!

Change the story now. For now, suppose the routing protocol is OSPF. Suppose some or all of the access layer switches are L3 switches, i.e. routers. Suppose the distribution switches are OSPF Area Border Routers (ABR's). This is commonly done to allow the subnets in the building or L3 module to be summarized outside the building. Suppose you're summarizing like that.

Now suppose a link failure occurs. You may see traffic following a path like that in the following picture.

OSPF missing link figure

What happened is that the packet arrived at switch/router A due to the summary route. Router A learned the specific subnet route from D who learned it from B. And B is connected to the relevant subnet, if switch C is L2-only. So traffic takes the indicated round-about path. This can be particularly disconcerting if one intended all the access switches to be non-transit. If the volume of traffic clobbers switch D's links, this can be disconcerting, especially if D happens to connect to the mainframe (as in one outage we recently saw).

Nothing said above is really OSPF-specific, other than the picture. So the above might  in fact also happen with EIGRP, assuming you're doing route summarization at the switches A and B (sort of an EIGRP ABR setting). The problem is related to the summarization, not the routing protocol.

The reason the story started out with OSPF is that OSPF can take some really weird paths when there's an outage. The reason is OSPF's routing rules:

  • Traffic within a region cannot exit the region
  • Traffic between regions must exit the first region, transit area 0 without detours via another region, then exit and travel the rest of the way within the destination region

The way to prevent OSPF weird paths is to link A and B in the picture, preferably with a link on each side of the area border. That way traffic can get from A to B within area 0 and also within area 5.

What about with EIGRP? It seems like if you're summarizing, it's a good idea to have a link within the "area 5" part of the above picture, i.e. a link between A and B that does not summarize the building routes. And make any routers in the "area" EIGRP stubs (non-transit routers).

Links on both sides of the border

We've seen a lot of sites going to "passive-interface default" for large L3 switches. This keeps from forming a lot of OSPF or EIGRP adjacencies. If you do this, you do have to remember to explicitly "no passive-interface ..." for the link between A and B. This is a nice control for switches with enterprises with most interfaces connected to user or server VLAN's. You un-passive the interfaces that connect to the outside world or the other routers.

HSRP and What It Misses

Now we get to a situation I ran into recently at 4 AM. The following figure illustrates the problem.

HSRP problem

The network in question happened to have an MPLS VPN core running over optical gear, but while they added complexity and other factors to troubleshoot, they're irrelevant to the present story. A chain of events led to no OSPF adjacencies over the link with the red X (and one or two other places). My guess at the time was stuck OSPF state due to changes, probably taking out OPSF authentication. (The order in which you delete the commands may matter in some IOS releases).

Router A connected to a GSR and I didn't have permission to bounce the OSPF process on the GSR. The first focus was getting up and running. Router B was happily talking OSPF to the MPLS core network. It had an MPLS VPN configuration error at the time, but that's not really relevant here. For security and simplicity, the Customer Edge routers had static routes and we didn't have access to them at the time because another organization controlled them. But I guessed that the design had the HSRP address as next hop, and later it did turn out the static routes did in fact forward traffic to the HSRP primary, A. Some voice gateways in a separate VLAN had B as HSRP primary, and I was told that they were actually happily forwarding.

The problem was that the customer static routes were forwarding to A, as shown by the blue arrows. But A had no routes to anywhere but connected subnets. So it was dropping packets for other destinations. It had no way to pass them to B, because there was no OSPF on the customer side of routers A and B.

The solution at the time was to change the HSRP priorities in A and B to make B primary and allow B to preempt A. That swung the traffic from the customer edge routers over to B and got the customer up and running. That's how I determined that the guess about the HSRP address being the static route next hop was correct.

The interesting thing to me here is the design point it makes. There are two design methods that come to mind for avoiding this black-holing problem:

  • Connect A and B with a link inside area 0 (the provider side area for OSPF)
  • Have HSRP track state
The first of these is basically what we just went over in the Summarization section above. If A and B were connected and running OSPF, then A would be able to route outwards via B. One reason this hadn't been done was apparently that A and B were 7301 routers, with 3 Gig links. These had been used up, one to the MPLS side and two to the local switches and VLAN's. So creating a link inside OSPF area 0 would have required adding another VLAN and running backbone traffic through the customer-side trunks and switches. This is enough of an exception that one can imagine it leading to difficulties, possible security concerns, etc.

The other approach would be to use HSRP tracking. This feature in the past allowed you to configure HSRP to track the state of a link. If the link went down, the HSRP priority would be lowered, allowing the other router to become the primary for HSRP. In this particular incident, the routers considered the A to Core link to be up. In fact, we used it to telnet to A from the NOC (telnet to the core GSR, then to the directly connected interface). However, HSRP can now track state of objects. So loss of routes or connectivity could be used to trigger HSRP failover. In the case above, the customer routers needed to establish a GRE tunnel to run IBGP over (I'm not going to go into why). So loss of a route to the other tunnel endpoint might have been an interesting object to track. Tracking routing state would not have sufficed, routers A thought OSPF  was active but saw no neighbors.

We don't have space to go into details here. The object tracking feature was added beginning in Cisco  IOS 12.2(15) T, with enhancements in 12.3 as well. Things you can track:

  • IP routing state of an interface
  • Line protocol state of interface
  • IP route reachability
  • Threshold for IP route metrics
  • Lists of the above and boolean expressions
  • SAA round-trip information

There's a new acronym to go with this: FHRP = First Hop Routing Protocol (HSRP, GLBP, VRRP). They can all track objects in the new code.  Static and Policy-Based Routing can also do this!

Summary

I haven't seen most of this anywhere in print / web form, with the exception of the HSRP design issue noted above. Here are the links concerning FHRP's and tracking objects:
 
Tracking objects: http://www.cisco.com/univercd/cc/td/doc/product/software/ios122/122newft/122t/122t15/fthsrptk.htm

Standby track command:  http://www.cisco.com/univercd/cc/td/doc/product/software/ios122/122cgcr/fipras_r/1rfip2.htm#wp1021373

FHRP's and SAA:

http://www.cisco.com/en/US/products/sw/iosswrel/ps5207/products_feature_guide09186a00801d2d74.html

Objects and routing:

http://www.cisco.com/en/US/products/sw/iosswrel/ps5207/products_feature_guide09186a00801d1e95.html

http://www.cisco.com/en/US/about/ac123/ac114/ac173/Q2-04/department_techtips.html

http://www.cisco.com/en/US/about/ac123/ac114/ac173/Q2-04/department_techtips.html

This topic definitely feels like an article I'm going to want to write, after some lab time!

Your comments, questions, and suggestions for future articles are of course welcome! See below to decipher my email address.



Dr. Peter J. Welcher (CCIE #1773, CCSI #94014, CCIP) is a Senior Consultant with Chesapeake NetCraftsmen. NetCraftsmen is a high-end consulting firm and Cisco Premier Partner dedicated to quality consulting and knowledge transfer. NetCraftsmen has ten CCIE's, with expertise including large network high-availability routing/switching and design, VoIP, QoS, MPLS, IPSec VPN, wireless LAN and bridging,  network management, security, IP multicast, and other areas. See http://www.netcraftsmen.net for more information about NetCraftsmen. Pete's links start at http://www.netcraftsmen.net/welcher . New articles will be posted under the Articles link. Questions, suggestions for articles, etc. can be sent to pjw <at> netcraftsmen <dot> net (formatted this way to fool email harvesting software).

11/4/2004, updated 1/16/2005
Copyright (C)  2004  Peter J. Welcher