|Protecting a Cisco Catalyst 6500 Switch Against Layer 2 Loops|
Should you be concerned about a Layer 2 loop in your campus network? My answer is absolutely. In this article, I explain why you should take steps in protecting your campus network. This article also provides practical steps in reducing the effects of a Layer 2 loop on a Cisco Catalyst 6500. Before delving into details, let’s first define what a Layer 2 loop is. A Layer 2 loop occurs in a campus network when more than one Layer 2 forwarding path exists between two given switches. In this scenario, a switch that receives a broadcast frame sends it to all its trunk ports and access ports (same VLAN). In the presence of a loop, when campus switches forward broadcast frames to all their ports, this creates an amplification phenomenon for broadcast frames trapped indefinitely within the loop. This phenomenon is also known as a broadcast storm. It leads to an exhaustion of bandwidth, and CPU overutilization due to the presence of large volumes of broadcast frames. A broadcast storm brings a network to an unusable state, and in certain cases network administrators may lose the capability to access devices by console.
The Spanning Tree Protocol (STP) was designed to ensure loop-free Layer 2 topologies. Despite the use of STP, some situations can create Layer 2 loops such as wiring mistakes, misconfigured hosts (bridged interfaces), switch configuration mistakes, and loss of BPDU keepalives.
You may have a good campus network design, but human errors are still possible. It is critical to look beyond what can be prevented, and ask yourself the following question. If a Layer 2 loop were to occur, would my campus network be able to sustain it? If the answer to this question is no, or if you are not sure, I encourage you to explore a suggested solution presented in this article to mitigate the impact of a Layer 2 loop.
I got involved in exploring solutions to protect the Catalyst 6500 because one of our customers had experienced Layer 2 issues, and they wanted to leverage Cisco existing tools to alleviate the effects of Layer 2 loops on their Cisco Catalyst 6500 distribution switches. Their campus network consists of a three-layer model with Layer 2 connections between access and distribution switches, and Layer 3 connections between distribution and core switches.
This article presents the steps I took to develop a solution to protect the Catalyst 6500 in the presence of a Layer 2 loop. These steps include the following:
Four switches were used in a lab environment to simulate a Layer 2 loop, as illustrated in the figure below. VLAN 10 was created as a user VLAN to simulate communication between two users’ PCs.
The configuration shown below was implemented on all switches to create a Layer 2 loop, specifically a spanning tree loop on VLAN 10.
Distribution-Sw1(config)no spanning-tree VLAN 10.
Within one minute after issuing the command above, the CPU utilization had risen to 99% for all four switches. To display the current CPU utilization, the command “show processes cpu” was used, as shown below.
Distribution-Sw1# show processes cpu sorted | exclude 0.00%__0.00%__0.00%
With the CPU reaching 99%, routing adjacencies were flapping, and communication within the user VLAN 10 between PC1 and PC2 was no longer possible. A set of measurements were taken and the results are shown on the figure below.
To develop a good understanding of the type of traffic responsible for high CPU utilization, it was essential to monitor the traffic processed by the control plane.
The figure below illustrates the monitoring context for analyzing traffic that is processed by the Catalyst 6500 control plane. What is needed is a Switched Port Analyzer (SPAN), a host running a network protocol analyzer, and an Ethernet cable.
The configuration shown below defines port Gigabit 5/24 as a spanned port for CPU traffic. The traffic processed by the Route Processor (RP) and the Switch Processor (SP) is duplicated on Gigabit 5/24.
Distribution-Sw1(config)#monitor session 2 type local
The configuration shown below checks the status of a created SPAN.
Distribution-Sw1# show monitor session 2
It is important to note that after the CPU SPAN is created, it defaults to “Admin Disabled”. To make it operational a “no shut” command is needed on the SPAN.
Distribution-Sw1(config)#monitor session 2
After the “no shut” command is issued, the SPAN is put in Admin Enabled mode as shown in the output below.
Distribution-Sw1# show monitor session 2
Now that the Layer 2 loop had been simulated, and that the traffic processed by the control plane was being monitored, the next step was to select what tools, or combination of tools could be used to mitigate the impact of the Layer 2 loop. The possible tools considered for the mitigation were:
Control Plane Policing is a feature in Cisco routers and switches that enable administrators to configure QoS policies to protect the control plane against reconnaissance, denial-of-service (DoS) attacks, and other scenarios that can lead to exhaustion of CPU resources.
To limit the impact on the switches CPU, control plane policing was applied to limit traffic that could potentially affect CPU utilization and to protect other traffic such as routing against CPU resource starvation. Control plane policing uses the same class-map and policy-map commands that you may be familiar with when you configure Quality of Service. The process of creating and applying control plane policing consists of the following three steps:
The traffic targeted for classification included: EIGRP, HSRP, SSH, SNMP, TACACS, DHCP, IGMP, and PIM. I created the following class-map:
class-map match-all class-eigrp
I created a policy-map named copp-policy to apply policy restriction to traffic processed by the control place. As an example, for dhcp traffic I created the following policy.
In the policy statement above, the switch processes DHCP traffic up to a threshold of 32000 bits/sec, but any excess above this threshold is dropped by switch.
The values used for policing were based on measurement of actual traffic in the operational network(using the SPAN port). Below is a complete of list of policing statements applied to classified traffic.
After the creation of the policy map, I applied it to the control plane as shown below
After applying the policy map to distribution switch control planes, the CPU utilization was reduced from 99% to an average of 92% as shown in the figure below.
With control policing applied, communication was now possible within user VLAN 10 between PC1 and PC2. Despite the CPU utilization reduction, intermittent packet losses were observed.
To alleviate further the impact of the Layer 2 loop, the next step was to use an additional tool. Given the nature of Layer 2 loops, large volumes of broadcast and multicast traffic get amplified, as explained earlier, and storm control was a logical selection for the tool to use.
Traffic storm control is a feature in Cisco switches that can be used to monitor broadcast, multicast, and unicast traffic levels entering a given interface over a 1-second interval. Traffic gets dropped during the monitoring interval when configured thresholds are exceeded.
After observing the amount of multicast and broadcast traffic in the operational network (using the SPAN port) it was estimated that 5% of the total bandwidth was sufficient to accommodate all broadcast and multicast traffic. The following configuration was applied to all switch trunk ports to set the storm control threshold to 5%.
Distribution-Sw1(config)#int range f0/21 - 24
Combining control policing with storm control resulted in a significant improvement of the CPU utilization as shown in the figure below.
Communication within the user VLAN 10 between PC1 and PC2 became normal with no packet losses. Although the CPU was less than 40% on average, CPU spikes for up to 95% were observed for short duration of time. An observation of the traffic processed by distribution switch control planes revealed large amount of BPDU, VTP, STP, and ARP traffic. To reduce the volume of this traffic, the hardware rate-limiting tool available on the Catalyst 6500 was selected as the next improvement step.
The Cisco Catalyst 6500 switches with Supervisor 720 or Supervisor 32 engine provide hardware rate limiters that can be used to limit specific Layer 2 and Layer 3 traffic that are processed by the control plane. The advantage of hardware rate limiters is that their use does not impact CPU utilization.
The following configuration was implemented on distribution switches. The selected parameters were based on measurement of actual traffic in the operational network (using the SPAN port).
Distribution-Sw1(config)#mls rate-limit layer2 pdu 620 10
The command “mls rate-limit layer2 pdu 620 10” rate limits Layer 2 PDU protocol packets (including BPDUs, DTP, PAgP, CDP, STP, and VTP packets) to a maximum of 620 packets per second with a burst of 10 packets per seconds.
The command “mls qos protocol ARP police 64000 2000” rate limits ARP packets to a maximum of 64000 bps with a burst of 2000 bps
By combining control policing, storm control, and hardware rate-limiting, the CPU utilization was reduced to an average of 5% with a maximum peak of 28% as shown in the figure below.
With such a reduction of CPU utilization, no perceptible impact was observed on the performance of switches and the user VLAN 10 despite the presence of the Layer 2 loop.
Results from All Scenarios
The figure below depicts the results from all scenarios. This diagram provides a representation that enables a visual comparison of the results achieved for these scenarios.
The following are some basic recommendations that can help reduce the risks of occurrence of Layer 2 loops:
The results that I have presented show that by combining control plane policing, storm control, and hardware rate limiters, significant and consistent reduction of CPU utilization can be achieved in the presence of Layer 2 loops. Configuration parameters used to define control plane policing, storm control, and hardware rate limiters were based on a 72-hour measurement of traffic processed by switch control planes in the operational network using the SPAN port. These results are based on tests performed in lab settings, but the conclusions reached are representative of what can be achieved in operational environments.
Even in a well-designed campus network, Layer 2 loops may occur due to wiring mistakes, misconfigured hosts (bridged interfaces), switch configuration mistakes, and loss of BPDU keepalives. By implementing mitigating tools presented in this article, you can prevent the Catalyst 6500 from being overwhelmed in the presence of a Layer 2 loop.
You should follow Cisco design guidelines in campus network design to minimize risks of spanning tree loop occurrence. One important goal in the campus network design is to minimize, as much as possible, the span of broadcast domains by reducing VLANs to a single wiring closet, and using high-speed Layer 3 switching for the core layer instead Layer 2 switching. By reducing the scope of broadcast domains within a campus, a shorter diameter can be obtained for spanning-tree, which inherently reduces the risk and scope of Layer 2 loops.
1. Protecting the Cisco Catalyst 6500 Series Switches Against Denial-Of-Service Attacks
2. Configuring Control Plane Policing (CoPP)
3. Protecting Cisco Catalyst 6500 Series Switches Using Control Plane Policing, Hardware Rate Limiting, and Access-Control Lists
4. Configuring Traffic-Storm Control