Introduction

This month's article is a follow-up to last month's, where we looked at the basic ideas of Cisco Optimized Edge Routing (OER). In this month's article we'll look at the configuration I used in my lab and at some actual show command output for OER. We'll also talk briefly about some of OER's more advanced capabilities. We do need to review a little of the information in last month's article; I've tried to keep the amount of repeated material to a minimum.

Last month's article: http://www.netcraftsmen.net/welcher/papers/oer01.html. OER links in last month's article: http://www.netcraftsmen.net/welcher/papers/oer01.html#links.
Lab Topology

This was discussed at length in last month's article. As a memory refresh, here are the high points:
Other lab info: both lab routers have static default routes pointing at the Linksys' inside interface (192.168.1.1). These are redistributed into EIGRP. Both routers do PAT of inside lab addresses to the address of their external interface (192.168.3 and 4, respectively). The Linksys does PAT again, to whatever address it picked up from Comcast (my ISP) via DHCP. The lab 2600 acts as DHCP server for the inside lab network.

I can connect wirelessly either to the Linksys or to a WAP inside the lab network, so I made sure I was connected to the lab network (with appropriate static routes on my PC). I also made sure the OER router was my PC's only default gateway, since my lab DHCP normally supplies both routers' inside addresses as default gateways to the PC.

Testing OER
I tested OER failover using several web browser windows to observe connectivity or the lack of it. The web sites I used were www.netcraftsmen.net, www.cnn.com, www.wjla.com, and www.abc.com. Nslookup results for these:
To cause a failure, I disallowed the external VLAN on the trunk from the lab switch to the OER router. This black-holed packets while leaving the interface up. Since I was using static default routing, normal routing had no way to discover the outage. OER to the rescue!

Normal OER with two routers uses redistribution of the OER static routes. I had to deliberately NOT do that in this lab, to avoid a routing loop when I induced the VLAN failure described above.

Test Results

After I induced the packet black-holing, the web
pages would not refresh for a while, until OER re-routed the outbound traffic. I eventually saw most OER prefixes shift correctly to the other interface, the one that still worked. This was using passive monitoring, via NetFlow data.

Some prefixes did not shift. I believe this was due to changes of host addresses, for example with www.cnn.com. These sites do DNS load balancing, so the hostname resolves to different addresses at different times. In particular, CNN appears to be using several /24 blocks, so when DNS name resolution switched to a new block, OER would have to go through its multi-minute learning and holddown cycle all over again. Manually specifying this prefix, or an aggregate CNN prefix, would presumably have helped. I did not test this, as time has been a bit tight for me lately.

I also tried active monitoring, to see if failover went faster. My non-quantitative sense is that it did. OER appears to distribute the workload somewhat, so it determines that different prefixes are out of policy at different times.

Note that I could not ping www.cnn.com from the OER router, which meant that OER could not find an in-policy route. A web GET would have been a better active monitor. But do you want to have to tune your OER monitors this way? Perhaps for crucial sites you would, if you had to. I am not condemning OER here, just noting that the defaults won't work well with some sites.

Configuring OER

In working with OER, I learned about at least two gotchas.

Gotchas:
Relevant portions of the configuration from my lab OER router:
Show Output

Here is captured routing information:

rtr1841#show ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is 192.168.1.1 to network 0.0.0.0

     64.0.0.0/24 is subnetted, 1 subnets
S       64.236.42.0 [1/0] via 10.20.77.1
     10.0.0.0/8 is variably subnetted, 8 subnets, 2 masks
... <snip> ...
C       10.20.2.0/24 is directly connected, FastEthernet0/0.2
C       10.20.3.0/24 is directly connected, FastEthernet0/0.3
C       10.20.1.0/24 is directly connected, FastEthernet0/0.1
C       10.20.77.0/24 is directly connected, FastEthernet0/0.77
D       10.20.254.1/32 [90/156160] via 10.20.77.1, 00:28:44, FastEthernet0/0.77
                       [90/156160] via 10.20.3.1, 00:28:44, FastEthernet0/0.3
                       [90/156160] via 10.20.1.1, 00:28:44, FastEthernet0/0.1
C       10.20.254.15/32 is directly connected, Loopback0
S    216.250.231.0/24 [1/0] via 192.168.1.1
S    198.133.219.0/24 [1/0] via 192.168.1.1
     63.0.0.0/24 is subnetted, 1 subnets
S       63.208.176.0 [1/0] via 10.20.77.1
C    192.168.1.0/24 is directly connected, FastEthernet0/1.192
S*   0.0.0.0/0 [1/0] via 192.168.1.1
               [1/0] via 10.20.77.1

The static routes to 10.20.77.1 (except for 0/0) were inserted by OER upon detecting that the other route (directly via 192.168.1.1) was down. The routes with next hop 192.168.1.1 are the ones that did not fail over (e.g. CNN, as discussed above).

Here is sample output from the "show oer master prefix" command.
This is from a different time than the above capture, so the two won't match up. I edited the table to cut its size by about 10 rows.

rtr1841#show oer mast prefix
OER Prefix Statistics:
 Pas - Passive, Act - Active, S - Short term, L - Long term, Dly - Delay (ms),
 Los - Packet Loss (packets-per-million), Un - Unreachable (flows-per-million),
 E - Egress, I - Ingress, Bw - Bandwidth (kbps), N - Not applicable
 U - unknown, * - uncontrolled, + - control more specific, @ - active probe all

Prefix             State     Time Curr BR        CurrI/F    Protocol
                   PasSDly   PasLDly   PasSUn    PasLUn    PasSLos   PasLLos
                   ActSDly   ActLDly   ActSUn    ActLUn    EBw       IBw
--------------------------------------------------------------------------------
66.35.208.0/24     DEFAULT*  444  U              U
64.154.80.0/24     HOLDDOWN  223  10.20.254.15   Fa0/0.77   STATIC
                   U         U         0         0         0         0
                   U         U         0         0         1         0
64.158.176.0/24    INPOLICY* 0    10.20.254.15   Fa0/0.77   U
                   U         28        0         0         0         0
                   U         28        0         0         0         0
63.208.176.0/24    INPOLICY* @3   10.20.254.15   Fa0/0.77   U
                   28        35        0         0         0         0
                   37        26        0         0         2         0
209.62.176.0/24    DEFAULT*  477  U              U
64.236.24.0/24     DEFAULT*  47   U              U
64.236.16.0/24     DEFAULT*  75   10.20.254.15   Fa0/1.192  U
                   U         42        1000000   648648    0         0
                   U         U         0         1000000   1         0
64.236.42.0/24     INPOLICY* 0    10.20.254.15   Fa0/0.77   U
                   U         U         0         0         0         0
                   U         52        0         0         0         0
69.25.43.0/24      INPOLICY* 0    10.20.254.15   Fa0/0.77   U
                   U         64        0         0         0         0
                   U         64        0         0         0         0
63.208.55.0/24     DEFAULT*  35   10.20.254.15   Fa0/1.192  STATIC
                   U         U         1000000   1000000   0         0
                   U         U         0         0         1         0
65.214.39.0/24     DEFAULT*  504  U              U
198.87.215.0/24    DEFAULT*  456  U              U
64.233.167.0/24    DEFAULT*  @55  10.20.254.15   Fa0/0.77   U
                   U         U         0         0         0         0
                   U         U         0         0         1         0
64.12.174.0/24     DEFAULT*  28   U              U
64.236.22.0/24     DEFAULT*  57   U              U
216.250.230.0/24   INPOLICY* 0    10.20.254.15   Fa0/0.77   U
                   U         U         0         0         0         0
                   26        26        0         0         1         0
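The Los and Un columns are scaled per million rather than given as percentages: Los counts lost packets per million packets and Un counts unreachable flows per million flows, so an Un value of 1000000 means every observed flow was unreachable. As a reading aid, here is a small Python sketch that converts such values (for example the 1000000 and 648648 figures shown for prefixes routed via Fa0/1.192) to percentages. The helper name is made up for illustration and is not part of any Cisco tool; it treats the legend's "U" (unknown) and "N" (not applicable) markers as missing data.

```python
from typing import Optional, Union


def per_million_to_percent(value: Union[int, str]) -> Optional[float]:
    """Convert an OER per-million figure (Los or Un column) to a percentage.

    Hypothetical helper for reading "show oer master prefix" output.
    Returns None for the legend's "U" (unknown) and "N" (not applicable).
    """
    if isinstance(value, str):
        value = value.strip()
        if value in ("U", "N"):
            return None
        value = int(value)
    # 1,000,000 per million corresponds to 100 percent.
    return value / 10_000


if __name__ == "__main__":
    # Sample values copied from the prefix table; "U" means unknown.
    for raw in ["1000000", "648648", "0", "U"]:
        pct = per_million_to_percent(raw)
        print(raw, "->", "unknown" if pct is None else f"{pct:.2f}%")
```

Run as a script, this prints 100.00% for 1000000, 64.86% for 648648, 0.00% for 0, and "unknown" for U.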
The "Pas" or "Act" indicates passive or active measurements, and "S" and "L" indicate short-term and long-term measurements. Where you see Fa0/0.77 above, the prefix is being routed via the alternative path. Where you see Fa0/1.192, the traffic is being routed directly to the exit router at 192.168.1.1 (which was down at this point). HOLDDOWN indicates that a new prefix has been learned and the current next hop is being tested for policy compliance. INPOLICY indicates that the route complies with the policy. When the next hop for an INPOLICY prefix is an interface with a known problem (like Fa0/1.192), this presumably means that monitoring of the alternative route showed similar failure rates.

Note that when the next hop is via Fa0/1.192, there are usually many short-term (S) and long-term (L) unreachables, visible as big numbers four columns over (in the second row of the 3-row block for that prefix).

I haven't found a great reference on interpreting this show command output. The command reference is enough to get you started.

Here is a sample of output from the "show oer master prefix detail" command, edited for brevity to show part of one prefix block:

rtr1841#show oer master pre det

Here is a table describing some of the fields in the above show command output:
From the history records at the bottom of each block, you can see what OER thinks the monitoring measurement results are. In the above, you can see that Fa0/1.192 experienced 1 unreachable, whereas routing via Fa0/0.77 transmitted more bytes.

The above sample is from passive monitoring. Here is one from active monitoring:

Prefix: 64.158.176.0/24
   State: INPOLICY*   Time Remaining: 0
   Policy: Default

   Most recent data per exit
   Border           Interface    PasSDly  PasLDly  ActSDly  ActLDly
  *10.20.254.15     Fa0/0.77     0        0        22       22
   10.20.254.15     Fa0/1.192    0        0        0        0

   Latest Active Stats on Current Exit:
   Type  Target          TPort  Attem  Comps  DSum  Min  Max  Dly
   echo  64.158.176.214  N      2      2      43    19   24   21

   Prefix performance history records
   Current index 4, S_avg interval(min) 5, L_avg interval(min) 60

   Age       Border         Interface  OOP/RteChg Reasons
   Pas: DSum  Samples   DAvg  PktLoss  Unreach  Ebytes  Ibytes  Pkts  Flows
   Act: Dsum  Attempts  DAvg  Comps    Unreach
   00:03:39  10.20.254.15   Fa0/0.77
        0     0         0     0        0        0       0       0     0
        43    2         21    2        0
   00:05:00  10.20.254.15   Fa0/0.77
        0     0         0     0        0        80      0       2     2
        43    2         21    2        0
   00:06:09  10.20.254.15   Fa0/0.77
        0     0         0     0        0        0       0       0     0
        47    2         23    2        0
   00:10:17  10.20.254.15   Fa0/0.77
        0     0         0     0        0        3318    0       18    2
        47    2         23    2        0
--------------------------------------------------------------------------------

Note the differences: data under "Latest Active Stats", plus numbers under the "Act" rather than the "Pas" columns.

Fancier Variations on OER

OER works by re-routing prefixes to use an alternative exit. But OER also supports link policies, which can be used for load balancing. With a link policy, OER gradually uses prefix re-routing to shift traffic off a link until the link conforms to the configured policy.

OER Application Aware Routing (AAR) is an enhancement introduced in Cisco IOS Release 12.4(2)T. It allows very fine-grained control based on some, rather than all, of the traffic to a prefix. The policy is applied using an extended access list, so among other things it can take into account:
As I read the documentation, OER AAR is doing two things:
Another recent OER feature is GRE over IPsec tunnel optimization, which is supported as of 12.3(11)T. The GRE tunnel interface is configured as an external interface. You can then use OER to select between a WAN interface and the VPN tunnel interface, or between two tunnel interfaces (e.g. with dual Internet providers). You can filter prefixes based on the VPN site prefixes, so as to optimize VPN traffic to remote VPN sites.

Dead Box / Link Detection

I've been thinking about the issue of dead box detection for a while. It started when I observed that many boxes, such as firewalls, content switches, and encryption engines, don't seem to be very smart about failover. They may react to one link going down but not to another. What one wants is failover if any of several interfaces goes down.

But more than that, such devices often only react to local
outages. So if a connecting box starts behaving oddly but the link stays up, they do not fail over. If you start getting an asymmetric flow, they probably do not react (example: a "zombie switch", where the link is up but packets are black-holed in one direction, with no return traffic). Some boxes (for example, Cisco content switches) allow you to configure monitors to be more aware of outages or odd conditions. Configuring this on a box-by-box basis can be tricky. Doing the failure analysis to make sure you've covered the various failure modes can get quite complex. ("Takes a CCIE to build, maintain, and troubleshoot" can be an indication that something is overly complex.)

I find myself wondering whether inbound OER (from Internet routers or their neighborhood towards servers) might help with such situations. It would require a different design: alternative "stovepipes" of firewall/switch/content switch, with OER in effect selecting between stovepipes. This is contrary to current design practice, which tends to use HSRP wherever possible. I think of HSRP and most failover mechanisms as providing single-box alternatives. My concern with HSRP and similar approaches is that I've seen, or heard anecdotes indicating, that boxes do not fail over when they "should". My thought is that treating a "stovepipe" as a single unit that either passes traffic bi-directionally or does not
might provide simpler failover.

On the other hand, I'm not at all sure using OER in this fashion would be a good idea. Among the drawbacks: much slower failover, and some additional complexity. Perhaps OER would be better applied to choosing between two alternative "inbound complexes" of devices, where each complex has box redundancy within it. I've seen this sort of design at e-commerce sites with multiple Internet connections.

Summary

Please let me know if you're using OER and are impressed by it (positively OR negatively). I'd also be interested in hearing about non-standard OER setups, where you're using it "creatively".

The documentation has some brief comments about OER and multicast traffic. Due to the multicast RPF check, some interesting things might happen to multicast in the presence of OER. On the other hand, how likely are we to be using multicast over the Internet?

OER links in last month's article: http://www.netcraftsmen.net/welcher/papers/oer01.html#links.

As is often the case, I have some questions that don't seem to be answered by the documentation. If you happen to be testing OER or quizzing your Cisco SE and find the answers, please let me know!
Unanswered questions:
Your comments, questions, and suggestions for future articles are of course welcome! See below to decipher Pete's email address.
Dr. Peter J. Welcher (CCIE #1773, CCSI #94014, CCIP) is a
Senior Consultant with Chesapeake NetCraftsmen. NetCraftsmen is a
high-end consulting firm and Cisco Premier Partner dedicated to quality
consulting and knowledge transfer. NetCraftsmen has ten CCIEs, with
expertise including large network high-availability routing/switching
and design, VoIP, QoS, MPLS, IPSec VPN, wireless LAN and
bridging, network management, security, IP multicast, and other
areas. See
http://www.netcraftsmen.net for more information about
NetCraftsmen. Pete's links start at
http://www.netcraftsmen.net/welcher . New articles will be posted
under the Articles link. Questions, suggestions for articles, etc. can
be sent to pjw
<at> netcraftsmen <dot> net (formatted this
way to fool email harvesting software).
2/7/2006