Designing FabricPath
What is FabricPath?

FabricPath is a routed alternative to Spanning Tree Protocol (STP) and Cisco virtual port-channel (vPC) technology. The reasons for using FabricPath are that it provides routed protection against spanning tree loops, keeps all links forwarding (unlike STP), and is a bit easier to configure than vPC, especially as your datacenter becomes larger. The arguments against FabricPath: it might not yet be mature, and it is Cisco-proprietary, whereas TRILL will be the standard for FabricPath-like behavior.

Scott Lowe has a nice basic writeup about FabricPath at http://blog.scottlowe.org/2011/07/12/brkdct-2081-cisco-fabricpath-technology-and-design/

What Does FabricPath Do?

FabricPath does MAC-in-MAC encapsulation to transport Layer 2 frames across a FabricPath network. The transport is based on routed forwarding to another FabricPath switch. As with SAN FSPF routing, FabricPath routing is a link state protocol, which tracks how to get data to each of the participating switches.

When an L2 frame comes in on a "classic Ethernet" port, a LAN MAC switching lookup occurs. If the switching lookup indicates that the destination MAC is reached via FabricPath, it will also indicate which FabricPath edge switch to send the frame to. That switch ID can then be looked up in the FabricPath routing table. A path is chosen, and the frame is MAC-in-MAC encapsulated and routed over to the destination FabricPath switch. That switch de-encapsulates the frame and forwards it in normal L2 fashion.

FabricPath allows for multiple "topologies", i.e. separate layers of FabricPath operation. It also does multi-pathing, up to 16 paths, each of which can be a 16-fold 10 Gbps port-channel.

FabricPath uses a time to live (TTL) to protect against short-lived or other routing problems (bugs?) that might somehow cause a routing loop.

The underlying routing is based on IS-IS, as is TRILL. (Brocade used the program code they have, so their TRILL implementation is based on FSPF.)
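To make the forwarding steps above a bit more concrete, here is a minimal Python sketch of the decision an edge switch makes. It is a toy model, not how NX-OS is implemented: the class and field names, the initial TTL of 32, the CRC-based flow hash, and the "flood" fallback for unknown destinations are all assumptions made for illustration.

```python
import zlib
from dataclasses import dataclass

MAX_PATHS = 16  # multi-pathing limit quoted in the article


@dataclass
class Frame:
    src_mac: str
    dst_mac: str
    payload: bytes = b""


@dataclass
class Encapsulated:
    src_switch: int   # switch ID of the ingress edge switch
    dst_switch: int   # switch ID of the egress edge switch
    ttl: int
    inner: Frame      # the original L2 frame, carried unchanged


class EdgeSwitch:
    def __init__(self, switch_id, initial_ttl=32):
        self.switch_id = switch_id
        self.initial_ttl = initial_ttl  # hypothetical default, not a Cisco value
        # MAC -> ("local", port) or ("fabric", remote switch ID)
        self.mac_table = {}
        # remote switch ID -> list of equal-cost outgoing fabric links
        self.routes = {}

    def forward(self, frame):
        """Return (action, outgoing port or link, frame or encapsulated frame)."""
        entry = self.mac_table.get(frame.dst_mac)
        if entry is None:
            # Simplification: unknown destinations are just marked for flooding.
            return ("flood", None, frame)
        kind, target = entry
        if kind == "local":
            return ("local", target, frame)
        # Destination is behind another edge switch: look up its switch ID,
        # pick one equal-cost path per flow, then encapsulate.
        paths = self.routes[target][:MAX_PATHS]
        flow = f"{frame.src_mac}>{frame.dst_mac}".encode()
        link = paths[zlib.crc32(flow) % len(paths)]
        pkt = Encapsulated(self.switch_id, target, self.initial_ttl, frame)
        return ("fabric", link, pkt)


def transit(pkt):
    """One hop inside the fabric: drop on TTL expiry, otherwise decrement."""
    if pkt.ttl <= 1:
        return None
    return Encapsulated(pkt.src_switch, pkt.dst_switch, pkt.ttl - 1, pkt.inner)
```

Hashing on the flow keeps all frames of one conversation on the same path, which avoids reordering. The TTL check in `transit` is what bounds the damage if a transient routing loop does occur.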
Why FabricPath not TRILL?

FabricPath appears to have scaling benefits compared to TRILL. One is conversational learning, i.e. an edge device learns MAC address / switch mappings only for the MAC addresses that some locally-attached system behind the FabricPath edge device is actually conversing with. The edge devices do not learn all source MACs seen via ARP flooding.

Per the article at http://lamejournal.com/2011/05/16/layer-2-routing-sort-of-and-trill/, it sounds like TRILL can optionally learn all MAC addresses from edge devices. This seems rather undesirable to me. The article draws a comparison with Cisco OTV, which tracks reachability of MAC addresses. Fair enough, that may be a limiting factor for OTV. Which raises the question: if I'm criticizing TRILL for promiscuous MAC learning, shouldn't I do the same for OTV? Probably.

FabricPath allows for vPC+, which enables dual-active FHRP behavior at the edge. This is useful for scaling up routing off of FabricPath VLANs.

FabricPath peers only on point-to-point links. To me, that's a distinct plus for bandwidth tracking and for preserving the routing model end-to-end. I see only risk from having switches interconnecting FabricPath routing peers.

Other than that, the web seems to have a lot of noise but little signal on the FabricPath versus TRILL topic.

Designing FabricPath

I'm amused by what I'm seeing in print. Most FabricPath designs show a spine-edge approach, as in the following diagram.
Note: the heavier links are dual-link vPC peer-link port-channels, drawn this way to reduce visual clutter.

I like this design. It is a Clos fabric, an optimal structure for maximizing bandwidth between arbitrary (or selected) endpoints. If you want more bandwidth, you can either add links or add spine nodes. If you start exceeding the 16-fold multi-pathing limit, you can port-channel links between the same switch pairs to add bandwidth without pushing beyond 16 paths.

What we do is turn the middle into a FabricPath routed domain. We do that by configuring the interfaces shown in red to be FabricPath links.
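The bandwidth reasoning above can be sketched as simple arithmetic. The function below is a hypothetical helper, assuming one equal-cost path per spine node, 10 Gbps member links, and the 16-path and 16-link limits quoted earlier; it ignores oversubscription and hashing imbalance.

```python
MAX_PATHS = 16          # equal-cost paths FabricPath will use
MAX_BUNDLE_LINKS = 16   # member links per port-channel
LINK_GBPS = 10          # per-link speed assumed in the article


def edge_to_edge_gbps(spines, links_per_spine=1):
    """Nominal bandwidth from one edge switch across the spine layer.

    Each spine contributes one path; bundling several links to the same
    spine adds bandwidth without adding to the path count.
    """
    if spines < 1:
        raise ValueError("need at least one spine node")
    if not 1 <= links_per_spine <= MAX_BUNDLE_LINKS:
        raise ValueError("a port-channel holds 1 to 16 links")
    usable_paths = min(spines, MAX_PATHS)  # paths beyond 16 go unused
    return usable_paths * links_per_spine * LINK_GBPS


# Four spines with single uplinks, then the same spines with 2-link bundles.
print(edge_to_edge_gbps(4))      # 40
print(edge_to_edge_gbps(4, 2))   # 80
```

Doubling the links per bundle doubles bandwidth while the path count stays at four, which is why port-channeling is the way to keep growing once the 16-path limit is reached.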
The other configuration steps are to specify which VLANs are connected across the FabricPath "red zone" above, and to configure a low root bridge priority on the FabricPath switches, making them all equal as root bridge. In effect, the switches and red links above form one giant root bridge switch, interconnecting whatever edge switches are not shown at the bottom of the diagram. The following diagram may visually suggest that better:
Concerning TRILL design, a small percentage of what I've seen has diagrams like the above. The rest seem to be based on Radia Perlman's RBridge concept, which I would describe as "oatmeal with raisins": a gluey blob of Layer 2 oatmeal with RBridge "raisins" scattered throughout. For various flows, different RBridges forward between VLANs. How you troubleshoot that sort of design is what puzzles me. You have a mix of Layer 2 and encapsulated routing, where it might be challenging to identify which device encapsulates a given flow, and where troubleshooting requires lucid thinking and a good understanding of both Layer 2 forwarding and TRILL. So maybe those lumpy diagrams are just conceptual and nobody really intends to do TRILL that way?

Brocade does have pictures that look mighty familiar (and structured): http://www.brocade.com/company/news-events/newsletters/BA1209/0912_technology_showcase.html. Juniper doesn't like TRILL, but shows a structured diagram as well, in http://www.juniper.net/us/en/local/pdf/whitepapers/2000408-en.pdf.

Congestion is easily managed in the above diagram, in the sense of monitoring a relatively small set of links between spine and edge, and adding bandwidth where needed. Load balancing should take care of unevenness, unless there are small numbers of flows of vastly different magnitude.

Migrating to FabricPath

One of the drawbacks to Juniper's QFabric is that it is apparently all-or-nothing. You can start with a small QFabric and then expand. If you buy it and don't like it, what's your alternative?

I see FabricPath as being incremental. You can migrate vPC edge pairs to FabricPath one pair at a time. So you might try something like a FabricPath to a pod with two Nexus 5500s and some servers, and then gradually dial up the size of the FabricPath domain.

There was a good talk at CiscoLive 2012 on this topic.
It has a lot of diagrams, includes a couple of things I hadn't thought about (not that I'd worked through a FabricPath migration in detail), and includes cutover timing information so you can plan how long each step should take. The presentation can be found at https://ciscolive365.com/connect/search.ww#loadSearch%searchPhrase=fabricpath&searchType=session&tc=0&value(profileItem_10017)=10173. It includes topics like moving your vPC peer link from M1 ports to F1 ports to support FabricPath and vPC+. (That was session BRKDCT-2202. Also, session BRKDCT-2081 may be of interest for more fundamentals, e.g. how FabricPath works.)

In general, CiscoLive 365 (Virtual) sessions are at https://ciscolive365.com/connect/search.ww#loadSearch%searchPhrase=&searchType=session&tc=0. Registration is free this year, as far as I know. And the San Diego CiscoLive presentations do seem to already be posted!

Comments (13)
written by Adam Raffe, March 18, 2013
One of the considerations if you are looking at FabricPath for DCI is how your multi-destination trees will be built. You will probably find that your broadcast / multicast / unknown unicast traffic is hairpinned between your sites due to the fact that the MDT root will reside on one site or the other (see http://adamraffe.com/2013/03/1...erconnect/).
It's also a bit more difficult to do FHRP localisation with FabricPath compared to OTV.

written by Craig Johnson, March 18, 2013
I commented on the other blog above - FHRP localization is possible today with vPC+ on the edges, as vPC+ will forward L3 to anything with the HSRP MAC as destination. Ingress needs GSLB or LISP or something.
Also, it is entirely possible to cause an unknown unicast storm over FabricPath - very bad for DCI. Can't do that with OTV.

written by Craig Johnson, March 18, 2013
Normally you are right - FP would be confused by the vMAC. vPC+ changes this a little - the outbound vMAC still lives only on one side, but vPC+ will intercept local traffic destined for a remote HSRP vMAC (it installs a hardware rule - similar to what happens with regular vPC when the HSRP active is on the other peer). Anycast HSRP (coming soon!) is another solution to this if you don't want vPC+.
LISP is just one solution to the ingress problem. Really, you just need anything that can track server placement on one side or the other of the DCI and instantiate a /32 for wherever that server is, so that you will get the closest inbound route. LISP accomplishes this by tracking the IP on a local router, but lots of IP mobility solutions would work. Internet LISP is different from LISP in the enterprise, IMO. You can do a lot of cool things that don't require provider cooperation. Different blog post.
written by Adam Raffe, March 19, 2013
I had a chat with Craig about this earlier - I've made a small change to the post on my blog to make it slightly more accurate :-) So to summarise:
- You can do FHRP localisation on the 5K using mismatched passwords (you can't use port or VLAN ACLs for this purpose on the 5K).
- You can't do FHRP localisation with FabricPath if you have other FP switches in the domain which aren't running L3 (such as at a third site). This is due to the MAC learning confusion mentioned above.
- FHRP localisation with FabricPath on the 7K isn't currently supported.

written by Hans, March 20, 2013
Thanks for all the input from all. It's pretty clear to me now the benefits of using OTV as opposed to FP for a DCI.
written by RonB, May 17, 2013
Am I correct in assuming that the SVIs are located on the spine nodes, as well as FHRP? Or, are they located on the next hop from the spine nodes, connected via vPC+?
Thanks in advance for the clarification. -rb

written by Peter Welcher, May 18, 2013
I believe either way. I have a mental note to look for Cisco discussion of pros/cons of each approach. I believe I've seen something (Networkers last year?) indicating that ARP table size is a consideration. So Nexus with M1 card "off to the side" or ASR1K might have advantages compared to say putting the SVI's on the F1/F2 card that's doing FabricPath.












I've seen gradually increasing interest in Cisco's FabricPath technology, so it seems time to talk about designing for FabricPath. I'm going to provide some opinions and overview, and then point you at some CiscoLive 2012 presentations. I see little point to re-inventing details that have been done well elsewhere, and do hope that I'm helping by pointing out resources people might not be aware of.





I can't see any particular reason why not if the fibre infrastructure is sufficient between data centers.