Simplicity and Layer 2
My intent today is to continue our quest for Simplicity in Network Design in a more systematic fashion, by tackling Simplicity and Layer 2. What is simple and what is not simple at Layer 2 of a network design?
Do you know where your VLANs are right now? Obviously, they're in your switches. But what I'm trying to get at is: could someone have extended a VLAN? Can you predict which VLANs go where? Are there boundaries that VLANs absolutely do not cross?
Layer 2 Closets
I've seen two extreme Layer 2 cases, poles apart from each other. Looking at them may help get at this question of Layer 2 simplicity.
Design #1 is a Layer 3 only closet design we've done (as part of any of several complete network overhauls). Any VLANs are purely limited to the closet. In the most organized form, each closet has, say, VLANs 101-108, where 101 is data, 102 voice, 103 guest, etc. (in some organized fashion). And the addressing is of the form 10.(building bits).(L3 closet number bits)(3 bits of 0-7 = VLAN minus 101).(host bits). Yes, that addressing isn't all that simple. At least not until you see the troubleshooting benefits (or earn the secret decoder ring). But I digress. The boundary is the closet. Admittedly, we're not opening the can of worms known as the datacenter -- yet.
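To make the "secret decoder ring" concrete, here is a minimal Python sketch of decoding such an address. The exact bit widths are my assumptions for illustration, not taken from any particular deployment: the whole second octet is the building number, the top 5 bits of the third octet are the closet number, the low 3 bits are the VLAN index (VLAN minus 101), and the fourth octet is host bits, so each VLAN is a /24. The function name decode_closet_address is hypothetical.

```python
import ipaddress

def decode_closet_address(ip: str) -> dict:
    """Split a 10.x.y.z address into building, closet, VLAN and host.

    Assumed layout (illustrative only):
      octet 2            -> building number
      octet 3, bits 7-3  -> Layer 3 closet number (0-31)
      octet 3, bits 2-0  -> VLAN index (0-7), VLAN ID = 101 + index
      octet 4            -> host
    """
    octets = ipaddress.IPv4Address(ip).packed
    if octets[0] != 10:
        raise ValueError(f"{ip} is not in 10.0.0.0/8")
    return {
        "building": octets[1],
        "closet": octets[2] >> 3,            # upper 5 bits of the third octet
        "vlan": 101 + (octets[2] & 0b111),   # lower 3 bits of the third octet
        "host": octets[3],
    }

print(decode_closet_address("10.12.42.57"))
```

For 10.12.42.57, the third octet 42 is 00101 010 in binary, so this is building 12, closet 5, VLAN index 2, i.e. VLAN 103 (guest, in the example numbering above).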
Design #2 is a common design. In the most feverishly flamboyant form, there are VLANs that run from multiple closets to distribution layer to core to datacenter core to datacenter access switches to certain servers. An example might be printers and print server. I give extra credit points to the site that had a huge wallchart with color-coded lines showing which VLANs go where. That's a LOT of work to build -- but could be very helpful in troubleshooting. It looked like a printed circuit diagram. More extra credit to sites using this "sprawling VLAN" approach that track and register VLAN numbers, insisting on global uniqueness. Chosen admin tool: Excel.
Which of these would you rather administer?
My answer: the first. It'll be rock solid. And if there's a problem, if I know the IP address I know fairly precisely where the offending device is, without much research. No astonishment (i.e. complies with our starting point, the Principle of Least Astonishment).
In the second example, of course most switches are connected with trunks, and if the site likes outages, they haven't limited which VLANs are allowed on which trunks. So if someone adds a VLAN 666 to a closet, then later in the datacenter, voila, they're connected to each other. Seems like there could be a lot of room for surprises there -- max astonishment?
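One way to catch unrestricted trunks before they surprise you is to scan switch configurations for trunk interfaces that have no allowed-VLAN list. The following is a rough Python sketch, assuming Cisco IOS-style configuration text in which "switchport mode trunk" marks a trunk and "switchport trunk allowed vlan" restricts it. The function name and the sample configuration are hypothetical, and ports that become trunks by dynamic negotiation are not detected.

```python
def unrestricted_trunks(config_text: str) -> list[str]:
    """Return trunk interfaces that have no allowed-VLAN list configured."""
    trunk_ports: list[str] = []   # interfaces with "switchport mode trunk"
    restricted: set[str] = set()  # interfaces with an allowed-VLAN list
    current = None                # interface block currently being read
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("interface "):
            current = line.split(None, 1)[1]
        elif current is None:
            continue
        elif line == "switchport mode trunk":
            trunk_ports.append(current)
        elif line.startswith("switchport trunk allowed vlan"):
            restricted.add(current)
    return [port for port in trunk_ports if port not in restricted]

# Hypothetical sample configuration for illustration.
SAMPLE = """\
interface GigabitEthernet1/0/1
 switchport mode trunk
 switchport trunk allowed vlan 101-108
interface GigabitEthernet1/0/2
 switchport mode trunk
interface GigabitEthernet1/0/3
 switchport mode access
"""

print(unrestricted_trunks(SAMPLE))
```

In the sample, only GigabitEthernet1/0/2 is reported: it is a trunk that will carry any VLAN someone happens to create, VLAN 666 included.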
What makes this second design not simple? I claim that you either need to have and maintain the giant printed circuit diagram, or every troubleshooting session starts by figuring out today's version of which VLANs go where. With sprawling VLANs that are not localized, traffic may have to cross a considerable portion of the network to get to its default gateway -- and then could get routed right back in some other VLAN to a server next to the first one. So you get poor precision of traffic delivery: complex traffic patterns. And lastly, you have the risk of a bridging loop, aka Spanning Tree meltdown. Also known as "the network is down, all day" in some cases.
You can find IP addresses in Design #2. All you have to do is look at the ARP table to find the MAC address, then find the edge port that MAC address is transmitting on. Bonus points if you have a tool like Cisco Campus Manager that learns and tracks such information. Otherwise, the process of tracking the MAC address to the access layer can be mildly time-consuming.
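The research steps above amount to two table lookups. Here is a minimal Python sketch, assuming the ARP table and the per-switch MAC address tables have already been collected into dictionaries; it does not talk to real devices. All names and sample data are hypothetical.

```python
def locate_ip(ip, arp_table, mac_tables, inter_switch_ports):
    """Return (switch, port) for the edge port where `ip` lives, or None.

    arp_table:          {ip: mac}
    mac_tables:         {switch: {mac: port}}
    inter_switch_ports: {switch: set of ports that lead to other switches}
    """
    mac = arp_table.get(ip)
    if mac is None:
        return None
    for switch, table in mac_tables.items():
        port = table.get(mac)
        # A MAC learned on a switch-to-switch link lives somewhere else,
        # so only a hit on a non-inter-switch port counts as the edge port.
        if port is not None and port not in inter_switch_ports.get(switch, set()):
            return switch, port
    return None

# Hypothetical sample data.
arp_table = {"10.1.20.15": "0011.2233.4455"}
mac_tables = {
    "dist-1": {"0011.2233.4455": "Te1/1"},
    "closet-3": {"0011.2233.4455": "Gi1/0/7"},
}
inter_switch_ports = {"dist-1": {"Te1/1"}, "closet-3": {"Te1/1/1"}}

print(locate_ip("10.1.20.15", arp_table, mac_tables, inter_switch_ports))
```

Note that the loop may have to visit every switch the VLAN reaches. The wider the VLAN sprawls, the more MAC tables there are to search.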
I see the difference between these two situations as Knowing versus Having to Research. In the first case, everything follows a simple pattern that can be learned and used. In the second case, there is no structure, so everything you do involves looking around, and not just locally but anywhere the VLANs might span to.
Layer 2 Boundaries
Limiting the scope of any VLAN can help, in two ways. First, it can help with the Astonishment factor by giving you predictable limits on how far the VLAN might sprawl. Second, the extent of a VLAN is the failure domain when a bridging loop occurs, so by setting boundaries you're limiting the impact of a "VLAN Event".
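To put a number on that failure domain, you can walk the trunks that carry a given VLAN and see which switches it reaches. Below is a minimal Python sketch using a breadth-first search. It assumes you already have a list of trunks with their allowed VLANs, and as a simplification it treats a VLAN as present on both ends of any trunk that allows it. The switch names and data are hypothetical.

```python
from collections import deque

def failure_domain(start, vlan, trunks):
    """Switches reachable from `start` over trunks that carry `vlan`.

    trunks is a list of (switch_a, switch_b, allowed_vlans) tuples.
    """
    # Build adjacency using only the trunks that allow this VLAN.
    neighbors = {}
    for a, b, allowed in trunks:
        if vlan in allowed:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    # Breadth-first search from the starting switch.
    seen = {start}
    queue = deque([start])
    while queue:
        switch = queue.popleft()
        for peer in neighbors.get(switch, ()):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen

# Hypothetical sample topology.
trunks = [
    ("closet-1", "dist-1", {101, 102, 666}),
    ("dist-1", "core-1", {666}),
    ("core-1", "dc-access-1", {200, 666}),
    ("closet-2", "dist-1", {101, 102}),
]

print(sorted(failure_domain("closet-1", 666, trunks)))
print(sorted(failure_domain("closet-1", 101, trunks)))
```

In this sample, VLAN 666 reaches from the closet through distribution and core to a datacenter access switch, while VLAN 101 stops at the distribution switch. A bridging loop in VLAN 666 would take out all four switches.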
In general, my colleagues and I find that closets can be purely Layer 3 if you are willing to make the effort. We've heard a number of excuses but most if not all seem to boil down to "we don't want to". Exception: medical sites with clinical VLANs or old applications where Layer 2 is a hard-coded program requirement, or security isolation is a requirement. (See however the prior blog about Segmentation, which might provide a Layer 3 alternative such as VRF Lite.)
I'm willing to live with pseudo-Layer 3 closets, where the closet VLANs terminate in routed ports on the distribution switch. I insist on routed ports to make sure that the closet VLANs do not extend across more than one closet. And yes, clinical settings may require VLANs with larger extent. The design there turns to cost trade-offs: where do we put the boundary, how many separate servers and nurse stations will it take, that sort of thing.
If someone insists, I can live with Layer 2 closets and putting the routing SVIs at the distribution layer, although I much prefer to keep the STP domain size down to 10-15 switches max. Preferably that means the VLANs also stay within the building.
I am definitely a fan of not allowing VLANs to sprawl across the closet / distribution and core / datacenter boundaries. The core network should be highly stable (cf. various Cisco design guides and the ARCH course, which Carole and I wrote version 2.0 of). STP and highly stable do not belong together, to my mind.
Part of my point here is to have clear design boundaries, so everyone including operations has the same expectations as to where VLANs may go, where they may be found. If you go with Layer 3 closets, the VLANs are pretty much contained solely within the datacenter, which greatly simplifies life. The closets pretty much become maintenance-free for long periods of time. Now that's simple!
To FHRP or not to FHRP?
(With apologies to Shakespeare.) When you're doing VSS on 6500 switches, you don't need a First Hop Routing Protocol (FHRP -- HSRP, VRRP, or GLBP), since the dual Sup dual chassis approach provides redundancy for any default gateway on the VSS chassis cluster.
When you're doing VPC on Nexus switches, you do need an FHRP, since the "real" interface and MAC addresses cannot be shared across the two chassis, which do retain their separate identities. However, when you select an FHRP, HSRP or VRRP will serve just as well as GLBP, since both VPC peers will forward frames sent to the virtual MAC of the virtual IP default gateway. That already load balances outbound (upstream) traffic.
Layer 2 Datacenters
This blog is getting a bit long, so I'm going to save the topic of VLANs in datacenters for another blog.
What's the Answer?
Minimize risk and maintenance hassle by keeping VLAN scopes small. Do Layer 3 closets if you can, or pseudo-Layer 3 closets to reduce cost.
As with teenagers, you need to set and enforce boundaries well before you experience problems. I prefer to contain VLANs in the datacenter (and maybe some in perimeter or DMZ areas -- which are often part of the datacenter too).
If you do not set boundaries, someday a bridging loop will find you. How big an outage can you withstand? One building? Entire campus? Entire datacenter? Multiple datacenters? Multiple campuses? Are you willing to bet your career on it? Because you may well have in fact done so!
Should your career survive STP meltdown day, how much do you want to have to rework? Over-engineering for robustness (i.e. Layer 3) means some bother now but you will not have to ever go back and do it over again. In a hurry.
Part of the reason I'm making this point (or ranting, depending on your Point of View) is that I see folks not pushing back, e.g. at server admins who want any VLAN anywhere in the datacenter. I see it as push back now or you may pay for not doing so later. Namely, when the entire datacenter has had 3 outage days in one quarter. The sheer magnitude of effort to clean up Layer 2 sprawl means that the network gets added to the list of "fragile apps that break when you look cross-eyed at them", the Permanent Repository of Pain and Cause of Late-Night Change Windows (PRPCALNCW). That's a "we dug ourselves the hole and we now have to live with it" situation -- I personally would find it hard to be working constantly in such a situation.
Relevant Prior Blogs
Here are some somewhat on-topic blogs by myself and other Chesapeake NetCraftsmen people:
You can find many more at http://www.netcraftsmen.net/resources/blogs.html
See also the blog by Chesapeake NetCraftsmen's Augustine Traore, where he measured the benefit of various STP defensive measures, Protecting Switches Against Layer 2 Loops. Nifty idea, good stuff!