Last month we took a look at how easy it is to configure Frame Relay. Now let's talk about the design issues in Frame Relay. Design becomes particularly important as you scale up your Frame Relay network.
A little disclaimer: there's more to be said than I can possibly cover in this article, so you should take a look at the Cisco Internetworking Design Guide, on UniverCD. The Cisco Design course (CID), which I teach, also has sections on WAN, Frame Relay, and ATM design.
Small Frame Relay networks of 10-20 sites are often built with a hub-and-spoke or star topology.
Figure 1: Star Topology
In such networks, Permanent Virtual Circuits (PVC's) are provisioned to a central site. This doesn't provide redundancy. But access lines are usually considered to be the component most likely to fail. And redundant access lines are costly, particularly if diversity is desired. If redundant lines are in the same trench, conduit, or overhead lines, they're vulnerable to both being taken out in an accident. Within the provider network, circuits are becoming fairly robust.
An alternative that doesn't seem to be getting much use is the double star topology, with PVC's to two hubs for redundancy or load balancing.
With small networks, the main design issue is what approach to use in configuration. I've been recommending that people use subinterfaces, since otherwise split horizon may become a problem. If using IP with the IGRP routing protocol, split horizon is disabled on Frame Relay interfaces, at least in recent IOS releases. So IP routing with IGRP works correctly whether or not you use subinterfaces. Even so, creating the subinterfaces is good discipline, leaves an obvious place to add other features, and allows per-PVC access lists, bandwidth statements, descriptions, etc. It also ensures that the IP addressing scheme won't have to change at some future date to match subinterfaces added in support of other features such as bridging, IPX, or AppleTalk. So use subinterfaces.
With larger Frame Relay networks, the major design constraint is bandwidth. First, you have to ensure the access lines have enough bandwidth to handle the data being sent from other sites. Let's stick with the star topology a moment longer. If 10 sites have a CIR (Committed Information Rate) of 28 Kbps, the central hub site may have 280 Kbps arriving on its access line. If the sites can burst to 56 Kbps, the central site might burst to 560 Kbps. Usually we figure bursts won't come from all remote sites at the same time, and provide the central site with an intermediate capacity. Depending on local rate structure, access lines may well be either 56K or T1, with no point to anything in between. If so, we might go with a T1 from the Frame Relay service provider to the central site, figuring there will be traffic growth anyway.
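The hub arithmetic above is easy to redo for other site counts and rates. Here is a minimal Python sketch. The function name is mine, and the 1536 Kbps figure is the usual T1 payload rate (24 channels of 64 Kbps), which I am assuming is the relevant number for your access line.

```python
def hub_load_kbps(site_count, cir_kbps, burst_kbps):
    """Return (committed, peak) aggregate load arriving at the hub, in Kbps."""
    committed = site_count * cir_kbps   # every site sending at its CIR
    peak = site_count * burst_kbps      # every site bursting at once (worst case)
    return committed, peak


T1_PAYLOAD_KBPS = 1536  # assumption: 24 x 64 Kbps channels, framing excluded

committed, peak = hub_load_kbps(site_count=10, cir_kbps=28, burst_kbps=56)
print(f"All sites at CIR:   {committed} Kbps")
print(f"All sites bursting: {peak} Kbps")
print(f"Worst case as share of a T1: {peak / T1_PAYLOAD_KBPS:.0%}")
```

Even the worst case leaves plenty of room on a T1, which is why the jump from 56K straight to T1 is usually an easy decision.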
But that's not the only way bandwidth enters into things.
Frame Relay is not a broadcast medium. But a router participating in dynamic routing needs to send updates to its peers, at the other end of the PVC's emanating from itself. There are really only two ways to do this. One is with switch multicast, where the router sends on a special DLCI and the switch simulates a broadcast. There is a standard for this, but I have not heard of a switch or service provider who implements this. The other approach is what's currently used: the router duplicates the broadcast and sends it directly to each intended recipient.
One effect is on the CPU: there is some use of the CPU in duplicating the broadcast packet(s). One copy per peer router. A 10 packet routing update going to 50 peer routers results in 500 outgoing packets.
These packets might hit the outbound serial interface queue at one time. This can demonstrably tie up the queue (and interface) for a while, obstructing normal traffic. A smarter strategy might be to dribble out a few packets at a time. Cisco has a broadcast queue that can be used for this.
Another effect is serialization delay: the packets may take a while to feed out the serial interface, since emitting the bits on a slow link takes time.
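To get a feel for the numbers, here is a rough Python sketch of the replication count and of how long the resulting burst occupies the line. The 10 packets and 50 peers come from the example above. The 500-byte packet size and the 56 Kbps line speed are my assumptions for illustration, and the sketch ignores Frame Relay framing overhead.

```python
def replicated_packets(packets_per_update, peers):
    """Copies the router must send: one copy of each packet per peer."""
    return packets_per_update * peers


def serialization_seconds(total_bytes, line_bps):
    """Time needed to clock the given number of bytes onto the line."""
    return total_bytes * 8 / line_bps


PACKET_BYTES = 500   # assumed average size of an update packet
LINE_BPS = 56_000    # assumed access line speed

copies = replicated_packets(packets_per_update=10, peers=50)
burst_seconds = serialization_seconds(copies * PACKET_BYTES, LINE_BPS)

print(f"{copies} packets, about {burst_seconds:.1f} seconds on the wire")
```

Under these assumptions the burst takes longer to send than the interval between typical routing updates, which is the kind of result that should send you back to the drawing board.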
A related effect is on bandwidth. The outgoing copies may consume a fair amount of bandwidth.
Various Cisco documents suggest that bandwidth is an important sizing criterion. Total bandwidth consumed by routing updates and other overhead activity should be less than 20% of the link bandwidth. Otherwise resources and time are wasted outputting these overhead packets.
Some estimation is thus in order in designing a Frame Relay network of any size. How chatty is the routing protocol? How often do updates go out? How big are they? Are there hellos? Keepalives?
One way to estimate overhead bandwidth is with a spreadsheet. For each source of overhead packets, list the bytes per update and how often the updates are sent. Divide and scale the result to obtain bits per second (bps). For example, suppose 100 IP RIP routes result in 4 packets. Each packet has a 38 byte header plus 20 bytes per route, so one update is 38 * 4 + 100 * 20 = 2152 bytes. Say there are 30 peer Frame Relay routers. That's 64560 bytes every 30 seconds, or 17216 bps. Do this on a 56K line and about 30% of the bandwidth is just IP RIP updates -- not good!
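If you would rather script it than build a spreadsheet, the same calculation can be sketched in Python. The function names are mine. The 25 routes per packet is the RIP limit that turns 100 routes into 4 packets, and the 20% threshold is the sizing guideline mentioned earlier. Treat this as a back-of-the-envelope estimate, not a traffic model.

```python
def update_bytes(routes, routes_per_packet, header_bytes, bytes_per_route):
    """Size of one full routing update, headers included."""
    packets = -(-routes // routes_per_packet)  # ceiling division
    return packets * header_bytes + routes * bytes_per_route


def overhead_bps(bytes_per_update, peers, interval_seconds):
    """Bandwidth used sending one update to every peer each interval."""
    return bytes_per_update * peers * 8 / interval_seconds


LINE_BPS = 56_000
OVERHEAD_LIMIT = 0.20  # guideline: keep overhead under 20% of the link

rip_bytes = update_bytes(routes=100, routes_per_packet=25,
                         header_bytes=38, bytes_per_route=20)
rip_bps = overhead_bps(rip_bytes, peers=30, interval_seconds=30)
share = rip_bps / LINE_BPS

print(f"{rip_bytes} bytes per update, {rip_bps:.0f} bps, {share:.0%} of the line")
print("over the limit" if share > OVERHEAD_LIMIT else "within the limit")
```

Add one call to overhead_bps per source of overhead (routing protocol, SAP, keepalives, polling) and sum the results before comparing against the limit.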
The nastiest consumer of bandwidth may be Novell IPX SAP updates. Each SAP packet carries only 7 services, so a long service list turns into many packets and a lot of bandwidth.
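A quick Python sketch shows how fast SAP adds up. Only the 7 services per packet comes from the text above. The rest are my assumptions, so check them against your own network: 64 bytes per service entry, 32 bytes of IPX header and SAP operation field per packet, a full update every 60 seconds, and a hypothetical network with 200 advertised services and 30 peers. Frame Relay encapsulation overhead is ignored.

```python
SERVICES_PER_PACKET = 7    # from the text
BYTES_PER_SERVICE = 64     # assumption: size of one SAP service entry
PACKET_OVERHEAD = 32       # assumption: 30-byte IPX header + 2-byte SAP operation
SAP_INTERVAL_SECONDS = 60  # assumption: periodic SAP update interval


def sap_update_bytes(services):
    """Bytes in one full SAP update, split into packets of 7 services."""
    packets = -(-services // SERVICES_PER_PACKET)  # ceiling division
    return packets * PACKET_OVERHEAD + services * BYTES_PER_SERVICE


def sap_bps(services, peers):
    """Bandwidth used sending the full SAP table to every peer."""
    return sap_update_bytes(services) * peers * 8 / SAP_INTERVAL_SECONDS


# Hypothetical network: 200 services, 30 Frame Relay peers
load = sap_bps(services=200, peers=30)
print(f"SAP overhead: {load:.0f} bps, {load / 56_000:.0%} of a 56K line")
```

With these made-up inputs, SAP alone comes close to filling a 56K line, which is why SAP filtering or a less chatty protocol gets so much attention in IPX designs.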
Suppose you add up all the numbers and too much bandwidth is being consumed by overhead activities. What are the alternatives?
Well, more bandwidth is the easy one. Another choice is a less chatty protocol: IGRP, EIGRP, or OSPF instead of RIP, and NLSP or EIGRP for IPX instead of Novell RIP/SAP. Static routing is the ultimate in low bandwidth, and also in high administrative hassle. Cisco's snapshot routing (fairly new) is a possible alternative. In reviewing protocol "chattiness" don't make assumptions: the topology may alter things.
A less obvious alternative is fewer peers. This introduces more routers, perhaps on a common LAN backbone. Or use a hierarchical structure, with access routers tied via X.25 or Frame Relay to a regional hub router, and Frame Relay tying these back to backbone routers at HQ. There is additional latency from passing packets through multiple routers, also from multiple serial/Frame Relay hops. But the resulting design is scalable and manageable. The Cisco UniverCD Design text discusses alternative topologies, as does the CID course.
Recently I saw a question on the Internet, asking whether a Cisco router could handle 500 or 1000 PVC's at the hub of a Frame Relay star. Interesting question! One reply was that the access line had better be a T3. If the remote sites have 56K access lines, T3 is in the right ballpark. Some other questions come to mind:
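The T3 ballpark is easy to check. This little Python sketch compares the sum of the remote access lines against the T3 line rate of 44.736 Mbps. It assumes the worst case, with every remote site sending at full line rate at once, and the function name is mine.

```python
T3_KBPS = 44_736  # T3 (DS3) line rate in Kbps


def aggregate_kbps(site_count, access_kbps):
    """Total traffic if every remote site fills its access line at once."""
    return site_count * access_kbps


for sites in (500, 1000):
    total = aggregate_kbps(sites, 56)
    print(f"{sites} sites x 56K = {total} Kbps, {total / T3_KBPS:.0%} of a T3")
```

At 500 sites the worst case fits within a T3. At 1000 sites it does not, so the design would be counting on the remote sites not all being busy at the same moment.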
What is the overhead bandwidth, per the calculations described above? Assume 1 LAN outboard of each remote router. What if there are another 100 LAN segments somewhere? Static routing in the hub, default routes at the remote sites?
The CPU impact of pseudo-broadcast is a small amount per PVC. Multiply by roughly 1000 PVC's and you might have a serious impact. One approach: add DLCI's and plot the CPU load, to extrapolate to 1000 PVC's. Before assuming anything, I'd want to talk to some Cisco engineers who had actually tried a configuration like this. Maybe the CPU is ok but packet buffers or RAM are a concern?
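The plot-and-extrapolate idea can be sketched as a straight-line fit in Python. The measurements below are made-up placeholders, not real router data; substitute CPU readings you have actually collected. The sketch also assumes load grows linearly with the number of PVC's, which may well stop being true long before 1000.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate(xs, ys, target_x):
    """Predict y at target_x from a straight line fitted to (xs, ys)."""
    slope, intercept = fit_line(xs, ys)
    return slope * target_x + intercept


# Made-up placeholder measurements, NOT real router data
pvc_counts = [100, 200, 300]
cpu_percent = [5.0, 10.0, 15.0]

estimate = extrapolate(pvc_counts, cpu_percent, 1000)
print(f"Extrapolated CPU load at 1000 PVCs: {estimate:.0f}%")
```

If the measured points bend upward instead of following a line, take that as a warning that the extrapolated figure is optimistic.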
But more than that, is this good modular, scalable design? It certainly has a single point of failure! If the remote sites go to 128K, how do you scale up the central site? How do you upgrade technology? How do you provide redundancy, another jumbo star?
Since I never claim to have all the answers (just the questions), I'd better change the subject.
Redundancy is a design concern to Frame Relay users. Dial backup, either over modems or ISDN, is a common solution. ISDN has the advantage of using a single 25xx BRI port at a remote site, and MBRI on a 4000 or PRI on a 7000. It also saves playing around with modems, nobody's idea of fun. Dial backup is now available on a per-PVC basis with subinterfaces (IOS release 10.3).
Using older releases or using the NBMA model (non-broadcast multi-access) may require floating static routes, that is, static routes with a high administrative distance. Triggering ISDN dialing on a native BRI interface with an older IOS release may also require this. The idea behind use of a floating static route: it remains inactive as long as dynamic routing provides a route. When a PVC goes down, the dynamic route eventually goes away, and then the floating static route kicks in, sending traffic out the backup interface. This traffic can then be used with dial-on-demand routing (DDR) to trigger modem or ISDN dialing. A neat trick!
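The selection logic behind the trick can be modeled in a few lines of Python. This is a toy model of the idea, not how IOS is implemented. The administrative distance of 100 is the IGRP default, 200 is simply an arbitrary value I picked above it, and the interface names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    source: str
    out_interface: str
    admin_distance: int


def best_route(candidates):
    """Pick the candidate with the lowest administrative distance, or None."""
    return min(candidates, key=lambda r: r.admin_distance, default=None)


# Hypothetical routes to the same destination
igrp = Route("IGRP", "Serial0.1 (Frame Relay PVC)", 100)
floating = Route("floating static", "BRI0 (ISDN backup)", 200)

# PVC up: the dynamic route wins and the static route stays inactive
print(best_route([igrp, floating]).out_interface)

# PVC down: the IGRP route has aged out, so the floating static takes over
print(best_route([floating]).out_interface)
```

Once the floating static route is the one in use, traffic heads for the backup interface, and that traffic is what DDR treats as the trigger to dial.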
A final part of design is planning for network management. The Cisco routers support the Frame Relay DTE MIB (RFC 1315). This allows for a good amount of per-PVC information to be collected. Do allow bandwidth to cover your polling!
In conclusion, design is the harder part of Frame Relay. Careful design will ensure that your Frame Relay networks work as intended. The above suggests some of the issues to consider.
8/95, figure updated 1/99
Copyright (C) 1995, Peter J. Welcher