Configuring Cisco Optimized Edge Routing (OER)
Tuesday, 07 February 2006 11:26
Introduction

This month's article is a follow-up to last month's, where we looked at the basic ideas of Cisco Optimized Edge Routing. In this month's article we'll look at the configuration I used in my lab, and at some actual show command output for OER. We'll also talk briefly about some of the more advanced capabilities of OER. We do need to review a little of the information in last month's article. I've tried to keep the amount of repeated material to a minimum.

Last month's article: Basics of Cisco Optimized Edge Routing (OER).
OER links in last month's article: Links in Basics of Cisco Optimized Edge Routing (OER).

Lab Topology

This was discussed at length in last month's article. As a memory refresh, here are the high points:
Other lab info: the lab routers both have static default routes pointing at the Linksys' inside interface (192.168.1.1). These are redistributed into EIGRP. Both do PAT of inside lab addresses to the address of their external interface (192.168.3 and 4, respectively). The Linksys does PAT again, to whatever address it picked up from Comcast (my ISP) via DHCP. The lab 2600 acts as DHCP server for the inside lab network.

I can wirelessly connect to the Linksys or to a WAP inside the lab network, so I made sure I was connected to the lab network (with appropriate static routes on my PC). I also made sure my PC was set with the OER router as its only default gateway, since normally my lab DHCP supplies both router inside addresses as default gateways to the PC.

Testing OER

I tested OER failover using several web browser windows to observe connectivity or non-connectivity. The web sites used: www.netcraftsmen.net, www.cnn.com, www.wjla.com, and www.abc.com. Nslookup results for these:
To cause a failure, I disallowed the external VLAN on the trunk from the lab switch to the OER router. This caused packet black-holing while leaving the interface up. Since I was using static default routing, there was no way for normal routing to discover the outage. OER to the rescue!

Normal OER with two routers uses redistribution of the OER static routes. I had to deliberately NOT do that in this lab, to avoid a routing loop when I induced VLAN failure as described above.

Test Results

After I induced the packet black-holing, the web pages would not refresh for a while, until OER re-routed the outbound traffic. I eventually saw most OER prefixes shift correctly to the other interface, the one that still worked. This was using passive monitoring, via NetFlow data.

Some prefixes did not shift. I believe this was due to changes of host addresses, for example with www.cnn.com. These sites do DNS load balancing, so that at different times the hostname resolves to different addresses. In particular, CNN appears to be using several /24 blocks, so when DNS name resolution switched to a new block, OER would have to go through its multi-minute learning and holddown cycle all over again. Specifying this prefix or an aggregate CNN prefix manually would presumably have helped with this. I did not test this, as time has been a bit tight for me lately.

I also tried active monitoring, to see if failover went faster. My non-quantitative sense is that it did go faster. OER appears to distribute the workload somewhat, causing it to determine that different prefixes are out of policy at different times.

Note that I could not ping www.cnn.com from the OER router, which meant that OER could not find an in-policy route. Using a web GET would have been a better active monitor. But do you want to have to tune your OER monitors this way? Maybe for crucial sites you would do such tuning if you had to?
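The DNS load-balancing effect described above can be checked from a PC rather than from the router. What follows is a minimal Python sketch, not part of the lab setup: it collapses whatever addresses a hostname resolves to into /24 prefixes, which is the granularity OER was learning at in this lab. The function names are made up for illustration, and the host addresses in the usage note are hypothetical; only the two /24 blocks are taken from the show output in this article.

```python
import ipaddress
import socket


def to_slash24(addresses):
    """Collapse IPv4 host addresses to the sorted list of /24 prefixes containing them."""
    return sorted(
        {ipaddress.ip_network(f"{addr}/24", strict=False) for addr in addresses}
    )


def resolve_ipv4(hostname):
    """Return the IPv4 addresses the local resolver currently gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Hypothetical host addresses inside two /24 blocks seen in the OER output.
    hosts = ["64.236.16.20", "64.236.16.52", "64.236.24.12"]
    for prefix in to_slash24(hosts):
        print(prefix)
```

Calling `resolve_ipv4` on a hostname every few minutes and passing the accumulated results to `to_slash24` would show whether a site moves between /24 blocks, and therefore how many prefixes OER would have to learn (or how wide a manually configured aggregate would need to be).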
Note that I am not condemning OER here, just noting that the defaults aren't going to work well with some sites.

Configuring OER

In working with OER, I learned about at least two gotchas. Gotchas:
Relevant portions of the configuration from my lab OER router:
Show Output

Here is captured routing information:

rtr1841#show ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is 192.168.1.1 to network 0.0.0.0

     64.0.0.0/24 is subnetted, 1 subnets
S       64.236.42.0 [1/0] via 10.20.77.1
     10.0.0.0/8 is variably subnetted, 8 subnets, 2 masks
... <snip> ...
C       10.20.2.0/24 is directly connected, FastEthernet0/0.2
C       10.20.3.0/24 is directly connected, FastEthernet0/0.3
C       10.20.1.0/24 is directly connected, FastEthernet0/0.1
C       10.20.77.0/24 is directly connected, FastEthernet0/0.77
D       10.20.254.1/32 [90/156160] via 10.20.77.1, 00:28:44, FastEthernet0/0.77
                       [90/156160] via 10.20.3.1, 00:28:44, FastEthernet0/0.3
                       [90/156160] via 10.20.1.1, 00:28:44, FastEthernet0/0.1
C       10.20.254.15/32 is directly connected, Loopback0
S    216.250.231.0/24 [1/0] via 192.168.1.1
S    198.133.219.0/24 [1/0] via 192.168.1.1
     63.0.0.0/24 is subnetted, 1 subnets
S       63.208.176.0 [1/0] via 10.20.77.1
C    192.168.1.0/24 is directly connected, FastEthernet0/1.192
S*   0.0.0.0/0 [1/0] via 192.168.1.1
               [1/0] via 10.20.77.1

The static routes to 10.20.77.1 (except for 0/0) were inserted by OER upon detecting the other route was down (via 192.168.1.1 directly). The routes with next hop 192.168.1.1 are the ones that did not fail over (e.g. CNN, as discussed above).

Here is sample output from the "show oer master prefix" command. This is from a different time than the above capture, so they won't match up. I edited the table to cut down its size by about 10 rows.
rtr1841#show oer mast prefix
OER Prefix Statistics:
 Pas - Passive, Act - Active, S - Short term, L - Long term,
 Dly - Delay (ms), Los - Packet Loss (packets-per-million),
 Un - Unreachable (flows-per-million),
 E - Egress, I - Ingress, Bw - Bandwidth (kbps), N - Not applicable
 U - unknown, * - uncontrolled, + - control more specific, @ - active probe all

Prefix                   State  Time Curr BR         CurrI/F     Protocol
                      PasSDly  PasLDly   PasSUn   PasLUn  PasSLos  PasLLos
                      ActSDly  ActLDly   ActSUn   ActLUn      EBw      IBw
--------------------------------------------------------------------------------
66.35.208.0/24        DEFAULT*   444 U               U
64.154.80.0/24        HOLDDOWN   223 10.20.254.15    Fa0/0.77    STATIC
                            U        U        0        0        0        0
                            U        U        0        0        1        0
64.158.176.0/24      INPOLICY*     0 10.20.254.15    Fa0/0.77    U
                            U       28        0        0        0        0
                            U       28        0        0        0        0
63.208.176.0/24      INPOLICY*    @3 10.20.254.15    Fa0/0.77    U
                           28       35        0        0        0        0
                           37       26        0        0        2        0
209.62.176.0/24       DEFAULT*   477 U               U
64.236.24.0/24        DEFAULT*    47 U               U
64.236.16.0/24        DEFAULT*    75 10.20.254.15    Fa0/1.192   U
                            U       42  1000000   648648        0        0
                            U        U        0  1000000        1        0
64.236.42.0/24       INPOLICY*     0 10.20.254.15    Fa0/0.77    U
                            U        U        0        0        0        0
                            U       52        0        0        0        0
69.25.43.0/24        INPOLICY*     0 10.20.254.15    Fa0/0.77    U
                            U       64        0        0        0        0
                            U       64        0        0        0        0
63.208.55.0/24        DEFAULT*    35 10.20.254.15    Fa0/1.192   STATIC
                            U        U  1000000  1000000        0        0
                            U        U        0        0        1        0
65.214.39.0/24        DEFAULT*   504 U               U
198.87.215.0/24       DEFAULT*   456 U               U
64.233.167.0/24       DEFAULT*   @55 10.20.254.15    Fa0/0.77    U
                            U        U        0        0        0        0
                            U        U        0        0        1        0
64.12.174.0/24        DEFAULT*    28 U               U
64.236.22.0/24        DEFAULT*    57 U               U
216.250.230.0/24     INPOLICY*     0 10.20.254.15    Fa0/0.77    U
                            U        U        0        0        0        0
                           26       26        0        0        1        0
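Each controlled prefix in this output occupies a block of three rows (the prefix line, then the passive statistics, then the active statistics), while prefixes OER is not yet controlling get a single short row. The output is easier to scan once it is split into one record per prefix. The Python sketch below is one way to do that, assuming the column order given in the header above. The function and field names are illustrative, and the parser has only been reasoned through against the layout shown here, so treat it as a starting point rather than a general parser for this command.

```python
import re

# A token such as 64.236.16.0/24 starts a new record.
PREFIX_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}/\d{1,2}$")

# Column names copied from the header of the show command output.
PASSIVE = ["PasSDly", "PasLDly", "PasSUn", "PasLUn", "PasSLos", "PasLLos"]
ACTIVE = ["ActSDly", "ActLDly", "ActSUn", "ActLUn", "EBw", "IBw"]


def parse_oer_prefixes(text):
    """Split 'show oer master prefix' text into one dict per prefix.

    Works on whitespace-separated tokens, so it does not matter whether the
    three rows of a block are on separate lines or run together.
    """
    records, current = [], None
    for tok in text.split():
        if set(tok) == {"-"}:          # skip the dashed separator line
            continue
        if PREFIX_RE.match(tok):
            current = [tok]
            records.append(current)
        elif current is not None:      # ignore legend/header before first prefix
            current.append(tok)

    parsed = []
    for rec in records:
        row = {"prefix": rec[0], "state": rec[1], "time": rec[2]}
        # Full block: prefix, state, time, border, interface, protocol + 12 stats.
        if len(rec) == 18:
            row["border"], row["interface"], row["protocol"] = rec[3:6]
            row.update(zip(PASSIVE, rec[6:12]))
            row.update(zip(ACTIVE, rec[12:18]))
        parsed.append(row)
    return parsed


def unreachable_prefixes(rows):
    """Prefixes whose short-term passive or active unreachable count is non-zero."""
    hits = []
    for row in rows:
        values = (row.get("PasSUn", "U"), row.get("ActSUn", "U"))
        if any(v.isdigit() and int(v) > 0 for v in values):
            hits.append(row["prefix"])
    return hits
```

Feeding the captured text to `parse_oer_prefixes` and then to `unreachable_prefixes` gives a quick list of the prefixes still reporting short-term unreachables, which in this capture are the ones still pointed at Fa0/1.192.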
The "Pas" or "Act" indicates passive or active measurements. The table uses "S" for short-term, "L" for long-term measurements. Where you see Fa0/0.77 above, the prefix is being routed via the alternative path. Where you see Fa0/1.192, the traffic is being routed directly to the exit router at 192.168.1.1 (which was down at this point).

HOLDDOWN indicates that a new prefix has been learned and the current next hop is being tested for policy compliance. INPOLICY indicates that the route is in policy compliance. When the next hop for INPOLICY is an interface that is a known problem (like Fa0/1.192), this presumably indicates that the alternative route monitoring indicated similar failure rates. Note that when the next hop is via Fa0/1.192, there are usually many short-term (S) and long-term (L) unreachables, visible as big numbers four columns over (in the second row in the block of 3 rows for that prefix).

I haven't found a great reference on interpreting this show command output. The command reference is enough to get you started.

Here is a sample of output from the "show oer master prefix detail" command, edited for brevity to show part of one prefix block:

rtr1841#show oer master pre det

Here is a table describing some of the fields in the above show command output:
From the history records at the bottom of each block, you can see what OER thinks the monitoring measurement results are. In the above, you can see that Fa0/1.192 experienced 1 unreachable, whereas routing via Fa0/0.77 transmitted more bytes.

The above sample is from passive monitoring. Here is one from active monitoring:

Prefix: 64.158.176.0/24
   State: INPOLICY*    Time Remaining: 0
   Policy: Default

   Most recent data per exit
   Border          Interface     PasSDly  PasLDly  ActSDly  ActLDly
  *10.20.254.15    Fa0/0.77            0        0       22       22
   10.20.254.15    Fa0/1.192           0        0        0        0

   Latest Active Stats on Current Exit:
   Type   Target           TPort  Attem  Comps   DSum    Min    Max    Dly
   echo   64.158.176.214       N      2      2     43     19     24     21

Prefix performance history records
 Current index 4, S_avg interval(min) 5, L_avg interval(min) 60

Age       Border          Interface       OOP/RteChg Reasons
Pas: DSum  Samples  DAvg  PktLoss  Unreach   Ebytes   Ibytes   Pkts  Flows
Act: Dsum Attempts  DAvg    Comps  Unreach
00:03:39  10.20.254.15    Fa0/0.77
        0        0     0        0        0        0        0      0      0
       43        2    21        2        0
00:05:00  10.20.254.15    Fa0/0.77
        0        0     0        0        0       80        0      2      2
       43        2    21        2        0
00:06:09  10.20.254.15    Fa0/0.77
        0        0     0        0        0        0        0      0      0
       47        2    23        2        0
00:10:17  10.20.254.15    Fa0/0.77
        0        0     0        0        0     3318        0     18      2
       47        2    23        2        0
--------------------------------------------------------------------------------

Note the differences: data under "latest active stats" plus numbers under the "Act" rather than the "Pas" columns.

Fancier Variations on OER

OER works by re-routing prefixes to use an alternative exit. But OER also supports link policies, which can be used for load balancing. With a link policy, OER gradually uses prefix re-routing to shift traffic off a link until the link conforms with the configured policy.

OER Application Aware Routing is an enhancement introduced in Cisco IOS Release 12.4(2)T. It allows very fine-grained control based on some rather than all traffic to a prefix. The policy is applied using an extended access list, so among other things it can take into account:
As I read the documentation, OER AAR is doing two things:
Another recent OER feature is GRE over IPsec tunnel optimization, which is supported as of 12.3(11)T. The GRE tunnel interface is configured as an external interface. You can then use OER to select between using a WAN interface and the VPN tunnel interface. Or between two tunnel interfaces (e.g. with dual Internet providers). You can filter prefixes based on the VPN site prefixes, so as to optimize VPN traffic to remote VPN sites.

Dead Box / Link Detection

I've been thinking about the issue of dead box detection for a while. This started a while back when I observed that a lot of boxes such as firewalls, content switches, and encryption engines don't seem to be very smart about failover. They may react to one link going down but not another one going down. What one wants is failover if any of several interfaces goes down.

But more than that, such devices often only react to local outages. So if a connecting box starts behaving oddly, but the link stays up, they do not fail over. If you start getting an asymmetric flow, they probably do not react (example: "zombie switch" where the link is up, and packets are black-holed in one direction, with no return traffic).

Some boxes (for example Cisco content switches) allow you to configure monitors to be more aware of outages or odd conditions. Configuring this on a box by box basis can be tricky. Doing the failure analysis to make sure you've covered the various failure modes can get quite complex. ("Takes a CCIE to build, maintain, and troubleshoot" can be an indication that something is overly complex.)

I find myself wondering whether inbound OER (from Internet routers or their neighborhood towards servers) might help with such situations. It would require a different design: having alternative "stovepipes" of firewall/switch/content switch, with OER in effect selecting between stovepipes. This is contrary to current design, which tends to do things like HSRP wherever possible.
I think of HSRP and most failover mechanisms as providing single box alternatives. My concern with HSRP and similar approaches is that I've seen or heard anecdotes that indicate that boxes do not fail over when they "should". My thought is that treating a "stovepipe" as a single unit that either passes traffic bi-directionally or does not might provide simpler failover.

On the other hand, I'm not at all sure using OER in this fashion would be a good idea. Among the drawbacks: much slower failover. And some additional complexity.

Perhaps OER might be better applied for choosing between two alternative "inbound complexes" of devices, where each complex has box redundancy within it. I've seen this sort of design at e-commerce sites with multiple Internet connections.

Summary

Please let me know if you're using OER and are impressed by it (positively OR negatively). I'd also be interested in hearing about non-standard OER setups, where you're using it "creatively".

The documentation has some brief comments about OER and multicast traffic. Due to the multicast RPF check, some interesting things might happen to multicast in the presence of OER. On the other hand, how likely are we to be using multicast over the Internet?

OER links in last month's article: Links in Basics of Cisco Optimized Edge Routing (OER)

As often is the case, I have some questions that don't seem to be answered by the documentation. If you happen to be testing OER or quizzing your Cisco SE and find the answers, please let me know! Unanswered questions:
Your comments, questions, and suggestions for future articles are of course welcome! See below to decipher Pete's email address.
Dr. Peter J. Welcher (CCIE #1773, CCSI #94014) is a Senior Consultant with Chesapeake NetCraftsmen. NetCraftsmen is a high-end consulting firm and Cisco Premier Partner dedicated to quality consulting and knowledge transfer. NetCraftsmen has eleven CCIE's (4 of whom are double-CCIE's, R&S and Security). NetCraftsmen has expertise including large network high-availability routing/switching and design, VoIP, QoS, MPLS, network management, security, IP multicast, and other areas. See http://www.netcraftsmen.net for more information about NetCraftsmen. Pete's links start at http://www.netcraftsmen.net/about-us/bios/staff-articles-and-blogs/pete-welcher.html . New articles will be posted under the Articles link. Questions, suggestions for articles, etc. can be sent to Pete by e-mail. 2/7/2006













