Configuring Cisco Optimized Edge Routing (OER)


Introduction

This month's article is a follow-up to last month's, which covered the basic ideas of Cisco Optimized Edge Routing. This month we'll look at the configuration I used in my lab and at some actual show command output for OER. We'll also talk briefly about some of the more advanced capabilities of OER. We do need to review a little of last month's material, but I've tried to keep the repetition to a minimum.

Last month's article: Basics of Cisco Optimized Edge Routing (OER).

OER links in last month's article:  Links in Basics of Cisco Optimized Edge Routing (OER).

Lab Topology

This was discussed at length in last month's article. As a memory refresh, here are the high points:

  • I configured OER on only one router, matching one of the expected ways in which OER might be used.
  • My two OER router outbound interfaces were certain vlan subinterfaces on two trunks. One went to the LAN connecting my lab exit points to my home Linksys router and the Internet. The other went to another lab router (2600) with a connection to the home LAN.
  • See the following diagram.
[Figure: lab diagram]

Other lab info: the lab routers both have static default routes pointing at the Linksys' inside interface (192.168.1.1). These are redistributed into EIGRP. Both do PAT of inside lab addresses to the address of their external interface (192.168.3 and 4, respectively).  The Linksys does PAT again, to whatever address it picked up from Comcast (my ISP) via DHCP.  The lab 2600 acts as DHCP server for the inside lab network.

I can wirelessly connect to the Linksys or to a WAP inside the lab network, so I made sure I was connected to the lab network (with appropriate static routes on my PC). I also made sure my PC was set with the OER router as its only default gateway, since normally my lab DHCP supplies both router inside addresses as default gateways to the PC.

Testing OER

I tested OER failover using several web browser windows to observe connectivity or non-connectivity. The web sites used: www.netcraftsmen.net, www.cnn.com, www.wjla.com, and www.abc.com. Nslookup results for these:

Website                 Sample IP addresses from "nslookup"
www.netcraftsmen.net    63.208.176.22
www.cnn.com             64.236.16.20, 64.236.16.52, 64.236.16.84, 64.236.16.116,
                        64.236.24.12, 64.236.24.20, 64.236.24.28, 64.236.29.120
www.wjla.com            216.250.230.21
www.abc.com             199.181.132.250

To cause a failure, I disallowed the external VLAN on the trunk from the lab switch to the OER router. This caused packet black-holing while leaving the interface up. Since I was using static default routing, there was no way for normal routing to discover the outage. OER to the rescue!

Normal OER with two routers uses redistribution of the OER static routes. I had to deliberately NOT do that in this lab, to avoid a routing loop when I induced VLAN failure as described above.

Test Results

After I induced the packet black holing, the web pages would not refresh for a while, until OER re-routed the outbound traffic. I eventually saw most OER prefixes shift correctly to the other interface, the one that still worked. This was using passive monitoring, via NetFlow data.

Some prefixes did not shift. I believe this was due to changes of host addresses, for example with www.cnn.com. These sites do DNS load balancing, so that at different times the hostname resolves to different addresses.  In particular, CNN appears to be using several /24 blocks, so when DNS name resolution switched to a new block, OER would have to go through its multi-minute learning and holddown cycle all over again. Specifying this prefix or an aggregate CNN prefix manually would presumably have helped with this. I did not test this, as time has been a bit tight for me lately.
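To make the CNN observation concrete, here is a small Python sketch that groups the sample www.cnn.com addresses from the nslookup table into /24 blocks (the size of the prefixes in my OER tables) and computes the smallest single prefix covering them. The function names are my own, and the aggregate covers only the addresses I happened to sample. It is not necessarily a prefix CNN actually owns or announces, so treat it as a starting point for a manual prefix, not as a recommendation.

```python
import ipaddress

# Sample addresses for www.cnn.com, copied from the nslookup table above.
CNN_ADDRESSES = [
    "64.236.16.20", "64.236.16.52", "64.236.16.84", "64.236.16.116",
    "64.236.24.12", "64.236.24.20", "64.236.24.28", "64.236.29.120",
]


def group_by_slash24(addresses):
    """Map each /24 network to the sample hosts that fall inside it."""
    groups = {}
    for addr in addresses:
        net = ipaddress.ip_network(f"{addr}/24", strict=False)
        groups.setdefault(net, []).append(addr)
    return groups


def covering_supernet(networks):
    """Smallest single prefix that contains every network given."""
    nets = sorted(networks)
    candidate = nets[0]
    # Widen the first network one bit at a time until it covers all of them.
    while not all(net.subnet_of(candidate) for net in nets):
        candidate = candidate.supernet()
    return candidate


groups = group_by_slash24(CNN_ADDRESSES)
for net, hosts in sorted(groups.items()):
    print(net, len(hosts), "host(s)")
print("aggregate:", covering_supernet(groups))
```

The sampled addresses fall into three separate /24 blocks, which is consistent with OER having to relearn a prefix each time DNS handed out an address from a different block.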

I also tried active monitoring, to see if failover went faster. My non-quantitative sense is that it did go faster. OER appears to distribute the workload somewhat, causing it to determine that different prefixes are out of policy at different times. Note that I could not ping www.cnn.com from the OER router, which meant that OER could not find an in-policy route. Using a web GET would have been a better active monitor. But do you want to have to tune your OER monitors this way? Maybe for crucial sites you would do such tuning if you had to? Note that I am not condemning OER here, just noting that the defaults aren't going to work well with some sites.

Configuring OER

In working with OER, I learned about at least two gotchas:
  • Documented: VLAN interfaces can only be internal.
  • Live labwork (and somewhat obvious): If you're using learned prefixes, you need some live TCP traffic. No traffic, no prefixes, no learning, then no OER activity.
Another implementation thought is that you want to shake out your routing and NAT and so on before adding the complexities of OER. I would recommend verifying external connectivity, failing the main link, and seeing if the backup default route also provides connectivity. If  you have two edge routers, test to make sure things work when either external link is failed. After doing this, then add OER.

Relevant portions of the configuration from my lab OER router:

Configuration commands, each block followed by my comments:

ip cef
!
interface FastEthernet0/0
 ...
 ip route-cache flow
!
ip flow-top-talkers
 top 30
 sort-by bytes
 match source address 0.0.0.0 0.0.0.0

Comments: Setting up to use NetFlow. May not be necessary but I wanted to be able to use show commands to look at NetFlow data anyway.

key chain oerchain
 key 1
  key-string 7 141817191F012939213C

Comments: Security is pretty important -- you don't want an outsider messing with your exit routing!

oer master
 logging
 !
 border 10.20.254.15 key-chain oerchain
  interface FastEthernet0/1.192 external
  interface FastEthernet0/0.77 external
  interface FastEthernet0/0.1 internal
  interface FastEthernet0/0.2 internal
  interface FastEthernet0/0.3 internal
 !
 learn
  delay
  protocol tcp
  protocol udp
  periodic-interval 3
  monitor-period 1
 mode route control
 mode monitor passive

Comments: Most of the configuration of policy is on the Master Controller. You need to identify the border router(s) and the participating interfaces. The inside interfaces are used for NetFlow statistics and passive monitoring. If you want OER to learn prefixes, you have to configure that, as in the "learn" block of configuration above. You also need to specify the monitoring mode (passive or active) and the route mode (control or observe).

oer border
 local Loopback0
 master 10.20.254.15
 key-chain oerchain

Comments: The configuration of Border Routers is fairly simple: tell them who to talk to (the Master Controller), what their local identity is, and how to authenticate.

router eigrp 1
 network 10.0.0.0
 auto-summary

Comments: NOTE: I had NO redistribution of static routes due to the special topology I was working with. If you have two OER routers, you probably do need to be doing redistribution.
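Since the pieces of an OER configuration have to agree with each other, a quick offline sanity check can catch simple mistakes before you paste anything into a router. The following Python sketch is a hypothetical helper of my own, not a Cisco tool. It checks only two things: that every key chain referenced is defined, and that there are at least two external interfaces and one internal interface, which is my reading of the OER documentation's minimums. The indentation in the embedded configuration is my assumption about the command nesting.

```python
# Trimmed copy of the lab configuration shown above (NetFlow and EIGRP omitted).
CONFIG = """\
key chain oerchain
oer master
 border 10.20.254.15 key-chain oerchain
  interface FastEthernet0/1.192 external
  interface FastEthernet0/0.77 external
  interface FastEthernet0/0.1 internal
  interface FastEthernet0/0.2 internal
  interface FastEthernet0/0.3 internal
oer border
 local Loopback0
 master 10.20.254.15
 key-chain oerchain
"""


def check_oer_config(text):
    """Return a list of problems found; an empty list means the checks passed."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    defined = {w[2] for w in lines if w[:2] == ["key", "chain"] and len(w) == 3}
    referenced = set()
    roles = []
    for w in lines:
        # "key-chain NAME" appears on the border line and under "oer border".
        if "key-chain" in w[:-1]:
            referenced.add(w[w.index("key-chain") + 1])
        # OER interface lines carry a role: "interface NAME external|internal".
        if w[0] == "interface" and len(w) == 3:
            roles.append(w[2])
    problems = []
    for name in sorted(referenced - defined):
        problems.append(f"key chain '{name}' is referenced but not defined")
    if roles.count("external") < 2:
        problems.append("fewer than two external interfaces")
    if roles.count("internal") < 1:
        problems.append("no internal interface")
    return problems


print(check_oer_config(CONFIG) or "no problems found")
```

A check like this says nothing about whether OER will actually behave as you want. It only catches typos and omissions in the parts of the configuration that must match.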

Show Output

Here is captured routing information:

rtr1841#show ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route

Gateway of last resort is 192.168.1.1 to network 0.0.0.0

64.0.0.0/24 is subnetted, 1 subnets
S       64.236.42.0 [1/0] via 10.20.77.1
10.0.0.0/8 is variably subnetted, 8 subnets, 2 masks
... <snip> ...
C       10.20.2.0/24 is directly connected, FastEthernet0/0.2
C       10.20.3.0/24 is directly connected, FastEthernet0/0.3
C       10.20.1.0/24 is directly connected, FastEthernet0/0.1
C       10.20.77.0/24 is directly connected, FastEthernet0/0.77
D       10.20.254.1/32
[90/156160] via 10.20.77.1, 00:28:44, FastEthernet0/0.77
[90/156160] via 10.20.3.1, 00:28:44, FastEthernet0/0.3
[90/156160] via 10.20.1.1, 00:28:44, FastEthernet0/0.1
C       10.20.254.15/32 is directly connected, Loopback0
S    216.250.231.0/24 [1/0] via 192.168.1.1
S    198.133.219.0/24 [1/0] via 192.168.1.1
63.0.0.0/24 is subnetted, 1 subnets
S       63.208.176.0 [1/0] via 10.20.77.1
C    192.168.1.0/24 is directly connected, FastEthernet0/1.192
S*   0.0.0.0/0 [1/0] via 192.168.1.1
[1/0] via 10.20.77.1

The static routes to 10.20.77.1 (except for 0/0) were inserted by OER upon detecting the other route was down (via 192.168.1.1 directly).  The routes with next hop 192.168.1.1 are the ones that did not fail over (e.g. CNN, as discussed above).
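The reason the OER-inserted statics take effect without touching the default route is ordinary longest-prefix matching: a /24 static beats 0.0.0.0/0. Here is a minimal Python sketch of that lookup, using a few entries from the capture above. The lookup function is my own illustration, and it models only the first next hop listed for the default route.

```python
import ipaddress

# A few entries from the "show ip route" capture above: (prefix, next hop).
ROUTES = [
    ("0.0.0.0/0", "192.168.1.1"),         # static default (first next hop only)
    ("63.208.176.0/24", "10.20.77.1"),    # inserted by OER
    ("64.236.42.0/24", "10.20.77.1"),     # inserted by OER
    ("216.250.231.0/24", "192.168.1.1"),  # did not fail over
]


def lookup(destination, routes=ROUTES):
    """Return (matching prefix, next hop) using longest-prefix match."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    # The default route matches everything, so matches is never empty here.
    best = max(matches, key=lambda m: m[0].prefixlen)
    return str(best[0]), best[1]


print(lookup("63.208.176.22"))    # www.netcraftsmen.net
print(lookup("199.181.132.250"))  # www.abc.com, no specific route in the capture
```

Traffic to www.netcraftsmen.net follows the OER static to 10.20.77.1, while a destination with no OER route still follows the default toward 192.168.1.1.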

Here is sample output from the "show oer master prefix" command. This is from a different time than the above capture, so they won't match up. I edited the table to cut down its size by about 10 rows.

rtr1841#show oer mast prefix

OER Prefix Statistics:
Pas - Passive, Act - Active, S - Short term, L - Long term, Dly - Delay (ms),
Los - Packet Loss (packets-per-million), Un - Unreachable (flows-per-million),
E - Egress, I - Ingress, Bw - Bandwidth (kbps), N - Not applicable
U - unknown, * - uncontrolled, + - control more specific, @ - active probe all

Prefix                  State     Time Curr BR         CurrI/F         Protocol
PasSDly  PasLDly   PasSUn   PasLUn  PasSLos  PasLLos
ActSDly  ActLDly   ActSUn   ActLUn      EBw      IBw
--------------------------------------------------------------------------------
66.35.208.0/24       DEFAULT*      444 U               U
64.154.80.0/24       HOLDDOWN      223 10.20.254.15    Fa0/0.77        STATIC
U        U        0        0        0        0
U        U        0        0        1        0
64.158.176.0/24     INPOLICY*        0 10.20.254.15    Fa0/0.77        U
U       28        0        0        0        0
U       28        0        0        0        0
63.208.176.0/24     INPOLICY*       @3 10.20.254.15    Fa0/0.77        U
28       35        0        0        0        0
37       26        0        0        2        0
209.62.176.0/24      DEFAULT*      477 U               U
64.236.24.0/24       DEFAULT*       47 U               U
64.236.16.0/24       DEFAULT*       75 10.20.254.15    Fa0/1.192       U
U       42  1000000   648648        0        0
U        U        0  1000000        1        0
64.236.42.0/24      INPOLICY*        0 10.20.254.15    Fa0/0.77        U
U        U        0        0        0        0
U       52        0        0        0        0
69.25.43.0/24       INPOLICY*        0 10.20.254.15    Fa0/0.77        U
U       64        0        0        0        0
U       64        0        0        0        0
63.208.55.0/24       DEFAULT*       35 10.20.254.15    Fa0/1.192       STATIC
U        U  1000000  1000000        0        0
U        U        0        0        1        0
65.214.39.0/24       DEFAULT*      504 U               U
198.87.215.0/24      DEFAULT*      456 U               U
64.233.167.0/24      DEFAULT*      @55 10.20.254.15    Fa0/0.77        U
U        U        0        0        0        0
U        U        0        0        1        0
64.12.174.0/24       DEFAULT*       28 U               U
64.236.22.0/24       DEFAULT*       57 U               U
216.250.230.0/24    INPOLICY*        0 10.20.254.15    Fa0/0.77        U
U        U        0        0        0        0
26       26        0        0        1        0


Here are the key fields in the above show command output:

Output Field    Description
Prefix          IP prefix in OER table
State           State of the prefix
Curr BR         Current Border Router
Curr I/F        Current outgoing interface
Dly             Delay (milliseconds)
EBw             Egress bandwidth
IBw             Ingress bandwidth

The "Pas" or "Act" indicates passive or active measurements. The table uses "S" for short-term, "L" for long-term measurements.

Where you see Fa0/0.77 above, the prefix is being routed via the alternative path. Where you see Fa0/1.192, the traffic is being routed directly to the exit router at 192.168.1.1 (which was down at this point). HOLDDOWN indicates that a new prefix has been learned and the current next hop is being tested for policy compliance. INPOLICY indicates that the route is in policy compliance. When the next hop for INPOLICY is an interface that is a known problem (like Fa0/1.192), this presumably indicates that the alternative route monitoring showed similar failure rates.

Note that when the next hop is via Fa0/1.192, there are usually many  short-term (S) and long-term (L) unreachables, visible as big numbers four columns over (in the second row in the block of 3 rows for that prefix). I haven't found a great reference on interpreting this show command output. The command reference is enough to get you started.
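Picking the unreachables out of this output by eye gets tedious, so here is a rough Python sketch that parses a few of the three-row blocks from the capture above and flags prefixes with non-zero passive unreachables. The parser is my own and assumes the layout shown here: a prefix line, then a passive row, then an active row, with prefixes that have no data appearing as a single line. The column positions come from the header in the output, and other IOS releases may format this differently.

```python
import re

# A subset of the "show oer master prefix" capture above.
SAMPLE = """\
64.154.80.0/24       HOLDDOWN      223 10.20.254.15    Fa0/0.77        STATIC
U        U        0        0        0        0
U        U        0        0        1        0
64.236.16.0/24       DEFAULT*       75 10.20.254.15    Fa0/1.192       U
U       42  1000000   648648        0        0
U        U        0  1000000        1        0
63.208.55.0/24       DEFAULT*       35 10.20.254.15    Fa0/1.192       STATIC
U        U  1000000  1000000        0        0
U        U        0        0        1        0
66.35.208.0/24       DEFAULT*      444 U               U
216.250.230.0/24    INPOLICY*        0 10.20.254.15    Fa0/0.77        U
U        U        0        0        0        0
26       26        0        0        1        0
"""

PREFIX_RE = re.compile(r"^\d+\.\d+\.\d+\.\d+/\d+\s")


def parse_prefix_table(text):
    """Split the output into one record per prefix, keeping its stats rows."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if PREFIX_RE.match(line):
            records.append({"prefix": fields[0], "state": fields[1],
                            "interface": fields[4], "rows": []})
        elif records:
            records[-1]["rows"].append(fields)
    return records


def flag_unreachable(records):
    """Return (prefix, interface, (PasSUn, PasLUn)) where either is non-zero."""
    flagged = []
    for rec in records:
        if not rec["rows"]:
            continue  # no measurements for this prefix
        passive = rec["rows"][0]  # PasSDly PasLDly PasSUn PasLUn PasSLos PasLLos
        unreach = (int(passive[2]), int(passive[3]))
        if any(unreach):
            flagged.append((rec["prefix"], rec["interface"], unreach))
    return flagged


for prefix, interface, unreach in flag_unreachable(parse_prefix_table(SAMPLE)):
    print(prefix, interface, unreach)
```

Since Un is reported in flows-per-million, a value of 1000000 means every observed flow was unreachable. In this sample, both flagged prefixes are on Fa0/1.192.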

Here is a sample of output from the "show oer master prefix detail" command, edited for brevity to show part of one prefix block:

rtr1841#show oer master pre det
... <snip> ...
Prefix: 63.208.176.0/24
State: HOLDDOWN    Time Remaining: 137    
Policy: Default

Most recent data per exit
Border          Interface         PasSDly  PasLDly  ActSDly  ActLDly
*10.20.254.15    Fa0/0.77               31       31        0        0
10.20.254.15    Fa0/1.192               0        0        0        0

Latest Active Stats on Current Exit:
Type     Target          TPort Attem Comps    DSum     Min     Max     Dly

Prefix performance history records
Current index 8, S_avg interval(min) 5, L_avg interval(min) 60

Age       Border          Interface       OOP/RteChg Reasons                 
Pas: DSum  Samples  DAvg  PktLoss  Unreach   Ebytes   Ibytes     Pkts    Flows
Act: Dsum Attempts  DAvg    Comps  Unreach
00:00:10  10.20.254.15    Fa0/0.77                                           
60        2    30        0        0    20313        0       48        2
0        0     0        0        0
00:01:29  10.20.254.15    Fa0/0.77        Unreachable/Monitor                
64        2    32        0        0    20439        0       54        8
0        0     0        0        0
00:03:03  10.20.254.15    Fa0/1.192                                          
0        0     0        0        1      144        0        3        1
0        0     0        0        0
--------------------------------------------------------------------------------
... <snip> ...

 

Here is a table describing some of the fields in the above show command output:

Output Field                          Description
Prefix                                IP prefix in OER table
State                                 State of the prefix
Time remaining                        Time remaining in current prefix learning cycle.
Policy                                OER state of the prefix (default, in-policy, out-of-policy, choose, holddown)
Most recent data per exit             Recent statistics. The asterisk (*) indicates the current exit.
Latest active stats                   Active probe statistics for the current exit.
Type                                  Type of active probe
Prefix performance history records    Historical statistics. Updated every minute, stored for 1 hour.

From the history records at the bottom of each block, you can see what OER thinks the monitoring measurement results are.  In the above, you can see that Fa0/1.192 experienced 1 unreachable, whereas routing via Fa0/0.77 transmitted more bytes.

The above sample is from passive monitoring. Here is one from active monitoring:

Prefix: 64.158.176.0/24
State: INPOLICY*   Time Remaining: 0
Policy: Default

Most recent data per exit
Border          Interface         PasSDly  PasLDly  ActSDly  ActLDly
*10.20.254.15    Fa0/0.77                0        0       22       22
10.20.254.15    Fa0/1.192               0        0        0        0

Latest Active Stats on Current Exit:
Type     Target          TPort Attem Comps    DSum     Min     Max     Dly
echo     64.158.176.214      N     2     2      43      19      24      21

Prefix performance history records
Current index 4, S_avg interval(min) 5, L_avg interval(min) 60

Age       Border          Interface       OOP/RteChg Reasons
Pas: DSum  Samples  DAvg  PktLoss  Unreach   Ebytes   Ibytes     Pkts    Flows
Act: Dsum Attempts  DAvg    Comps  Unreach
00:03:39  10.20.254.15    Fa0/0.77
0        0     0        0        0        0        0        0        0
43        2    21        2        0
00:05:00  10.20.254.15    Fa0/0.77
0        0     0        0        0       80        0        2        2
43        2    21        2        0
00:06:09  10.20.254.15    Fa0/0.77
0        0     0        0        0        0        0        0        0
47        2    23        2        0
00:10:17  10.20.254.15    Fa0/0.77
0        0     0        0        0     3318        0       18        2
47        2    23        2        0
--------------------------------------------------------------------------------

Note the differences: data under "latest active stats" plus numbers under the "Act" rather than the "Pas" columns.
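One thing you can check from the history records is how the DAvg column relates to DSum. In both samples above, DAvg matches DSum divided by the number of samples (passive) or attempts (active), with the fraction dropped. That relationship is my inference from the numbers shown, not something I found in the documentation, and since attempts and completions are both 2 in the active records, the data cannot tell us which of the two is the divisor. A short Python check:

```python
# (DSum, samples or attempts, DAvg as printed), from the history records above.
PASSIVE = [(60, 2, 30), (64, 2, 32)]
ACTIVE = [(43, 2, 21), (47, 2, 23)]


def delay_average(dsum, count):
    """Integer average in milliseconds; 0 when there were no samples."""
    return dsum // count if count else 0


for dsum, count, printed in PASSIVE + ACTIVE:
    computed = delay_average(dsum, count)
    print(f"DSum={dsum} count={count} computed={computed} printed={printed}")
```

The same arithmetic fits the "Latest Active Stats" line, where a DSum of 43 over 2 probes is shown as a Dly of 21, and the Min and Max of 19 and 24 add up to 43.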

Fancier Variations on OER

OER works by re-routing prefixes to use an alternative exit. But OER also supports link policies, which can be used for load balancing. With a link policy, OER gradually uses prefix re-routing to shift traffic off a link until the link conforms with the configured policy.
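To illustrate the idea of a link policy, here is a toy Python model of shifting prefixes off an exit until it conforms. This is emphatically not OER's actual algorithm, which I have not seen documented in detail. The function name, the traffic figures, the link capacity, and the "heaviest prefix first" rule are all assumptions made up for illustration.

```python
def rebalance(prefix_kbps, link_capacity_kbps, max_utilization):
    """Pick prefixes to move off an exit until its load meets the policy.

    Returns (prefixes to move, remaining load in kbps).
    """
    limit = link_capacity_kbps * max_utilization
    remaining = dict(prefix_kbps)
    moved = []
    # Hypothetical rule: move the heaviest prefixes first, so that fewer
    # routes have to change.
    by_load = sorted(prefix_kbps.items(), key=lambda kv: kv[1], reverse=True)
    for prefix, _kbps in by_load:
        if sum(remaining.values()) <= limit:
            break
        moved.append(prefix)
        del remaining[prefix]
    return moved, sum(remaining.values())


# Made-up per-prefix egress loads on one exit (kbps).
loads = {
    "63.208.176.0/24": 400,
    "64.236.16.0/24": 900,
    "216.250.230.0/24": 300,
}
moved, remaining = rebalance(loads, link_capacity_kbps=1500, max_utilization=0.75)
print("move:", moved, "remaining kbps:", remaining)
```

With these made-up numbers, the link carries 1600 kbps against a limit of 1125 kbps, and moving the single heaviest prefix is enough to bring it into policy.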

OER Application Aware Routing is an enhancement introduced in Cisco IOS Release 12.4(2)T. It allows very fine-grained control based on some rather than all traffic to a prefix. The policy is applied using an extended access list, so among other things it can take into account:

  • Source and/or destination IP
  • Source and/or destination port number
  • DSCP, IP Precedence, or TOS value
  • Protocol flags
  • Protocol type and number

As I read the documentation, OER AAR is doing two things:

  1. Conducting per-prefix measurements filtered by the extended ACL
  2. Policy-based routing based on the results of such monitoring
In the first release of OER Application Aware Routing, the Border Routers must be one hop from each other.

Another recent OER feature is GRE over IPsec tunnel optimization, which is supported as of 12.3(11)T. The GRE tunnel interface is configured as an external interface. You can then use OER to select between a WAN interface and the VPN tunnel interface, or between two tunnel interfaces (e.g. with dual Internet providers). You can filter prefixes based on the VPN site prefixes, so as to optimize VPN traffic to remote VPN sites.

Dead Box / Link Detection

I've been thinking about the issue of dead box detection for a while.  This started a while back when I observed that a lot of boxes such as firewalls, content switches, and encryption engines don't seem to be very smart about failover. They may react to one link going down but not another one going down. What one wants is failover if any of several interfaces goes down.

But more than that, such devices often only react to local outages. So if a connecting box starts behaving oddly, but the link stays up, they do not fail over. If you start getting an asymmetric flow, they probably do not react (example: "zombie switch" where the link is up, and packets are black-holed in one direction, with no return traffic). Some boxes (for example Cisco content switches) allow you to configure monitors to be more aware of outages or odd conditions. Configuring this on a box by box basis can be tricky.  Doing the failure analysis to make sure you've covered the various failure modes can get quite complex. ("Takes a CCIE to build, maintain, and troubleshoot" can be an indication that something is overly complex.)

I find myself wondering whether inbound OER (from Internet routers or their neighborhood towards servers) might help with such situations.  It would require a different design: having alternative "stovepipes" of firewall/switch/content switch, with OER in effect selecting between stovepipes. This is contrary to current design, which tends to do things like HSRP wherever possible. I think of HSRP and most failover mechanisms as providing single box alternatives.  My concern with HSRP and similar approaches is that I've seen or heard anecdotes that indicate that boxes do not fail over when they "should". My thought is that treating a "stovepipe" as a single unit that either passes traffic bi-directionally or does not might provide simpler failover.

On the other hand, I'm not at all sure using OER in this fashion would be a good idea.  Among the drawbacks: much slower failover.  And some additional complexity. Perhaps OER  might be better applied for choosing between two alternative "inbound complexes" of devices, where each complex has box redundancy within it. I've seen this sort of design at e-commerce sites with multiple Internet connections.

Summary

Please let me know if you're using OER and are impressed by it (positively OR negatively). I'd also be interested in hearing about non-standard OER setups, where you're using it "creatively".

The documentation has some brief comments about OER and multicast traffic.  Due to the multicast RPF check, some interesting things might happen to multicast in the presence of OER. On the other hand, how likely are we to be using multicast over the Internet?

OER links in last month's article: Links in Basics of Cisco Optimized Edge Routing (OER)

As often is the case, I have some questions that don't seem to be answered by the documentation. If you happen to be testing OER or quizzing your Cisco SE and find the answers, please let me know! Unanswered questions:

  1. If you have two routers, when OER stuffs the static route for a prefix, does it only do it on the exit Border Router, or does it do it on both? (Conjecture: probably just on the exit Border Router, so that redistribution "pulls" packets for that prefix to the correct BR).
  2. How can one provide for master controller redundancy or failover? (If you think about it, two active "boss (master controller) routers" stuffing  static routes into one BR would be a Bad Thing).
  3. What is the full set of issues in multicast-OER co-existence?

Your comments, questions, and suggestions for future articles are of course welcome! See below to decipher Pete's email address.

 

Dr. Peter J. Welcher (CCIE #1773, CCSI #94014) is a Senior Consultant with Chesapeake NetCraftsmen. NetCraftsmen is a high-end consulting firm and Cisco Premier Partner dedicated to quality consulting and knowledge transfer. NetCraftsmen has eleven CCIEs (four of whom are double CCIEs, R&S and Security). NetCraftsmen has expertise including large network high-availability routing/switching and design, VoIP, QoS, MPLS, network management, security, IP multicast, and other areas. See http://www.netcraftsmen.net for more information about NetCraftsmen. New articles will be posted under the Articles link. Questions, suggestions for articles, etc. can be sent to the author by email.

2/7/2006
Copyright (C)  2006  Peter J. Welcher