Case Study: IPSec VPN Performance



Peter J. Welcher and Israel Engle

Introduction

The last two articles looked at EAP and 802.1x, and at the issue of verifying user identity before granting any access to the network. I had hoped to show you how to get user login-driven dynamic VLAN's working, for wired switches and for wireless access points, but ran out of room. We'll save that and NAC for when I have  more time to work on the article and do those topics justice. 

In the meantime, I've been having fun working hands-on with various customers. At one collegiate site we tested and built initial configuration templates. Staff and students are now busily deploying switches, with WAP's to follow. At another, we've been testing IPSec VPN throughput in routers, to reduce risks in a deployment currently underway. The main concern was verifying throughput. My host and lab partner, Israel, agreed to serve as co-author of this article. Since I'm doing the writeup, any errors are strictly mine.

About the Network Design

Israel has built a large Cisco wireless bridged infrastructure in Dorchester County, Maryland, benefitting from ownership of towers and flat terrain. There have been some adventures along the way, but you'll have to talk to him about that. Israel is a reference account for Cisco wireless bridges.

The current project starts with trying to secure traffic across the wired and wireless core infrastructure. Let's simplify the story by saying there are (or will be) two groups using the shared infrastructure, schools and everybody else. The plan is to run IPSec across the core, schools to Board of Education HQ, and county to County HQ.

The main wireless network uses 802.11a for speed. The links to schools and buildings use 802.11b. We are figuring maximum throughput on 802.11b at about 6 Mbps total at best, due to overhead. See also the following links:

http://www.cisco.com/warp/public/779/smbiz/community/ent_select_right_wireless.html

http://keenansystems.com/store/80211b_throughput_test_nl-2511-cd-plus_nl-5354cb_orinoco.htm

There are about 10-12 sites going back to each HQ site. The topology is classic hub and spoke. So that means 60-72 Mbps at the central locations. Most traffic goes to servers at the central location(s), or to the Internet, so we need not be concerned about site-to-site traffic, which is very low.
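The sizing arithmetic above is simple enough to sketch. The function and constant names here are made up for illustration; the 6 Mbps and 10-12 site figures are the ones quoted in the text, and the calculation assumes the worst case where every spoke runs flat out at the same time.

```python
# Rough hub-and-spoke sizing: assume every spoke link delivers its full
# best-case 802.11b throughput to the hub at the same moment.
PER_SITE_MBPS = 6        # best-case usable 802.11b throughput (from the text)
SITE_COUNTS = (10, 12)   # low and high estimates of sites per HQ (from the text)


def hub_load_mbps(sites: int, per_site_mbps: float = PER_SITE_MBPS) -> float:
    """Worst-case aggregate traffic arriving at the hub, in Mbps."""
    return sites * per_site_mbps


for sites in SITE_COUNTS:
    print(f"{sites} sites -> {hub_load_mbps(sites):.0f} Mbps at the hub")
```

Real traffic will be well under this ceiling most of the time, but the ceiling is what the head-end device has to survive.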

This design is an example of what I'm calling "compartmentalized networking", for lack of a better term.  This occurs whenever you have distinct user populations that have to be kept separate, usually for security reasons. These groups use a common less-trusted network. If you think of hallways and badge card readers that let you into some rooms and not others, that's the present way of doing this.  You use access lists (ACL's) to control the exits from the shared network. As long as you're thorough, that works.

I believe there is a design trade-off to be made in general situations like  this. IPSec is useful for Internet access, moderate numbers of users doing WLAN access, etc. The limiting factor with IPSec is throughput. You still end up paying for the hardware to do accelerated encryption, and while the prices keep dropping, it still isn't cheap. For high-speed throughput  in a compartmentalized network, there's MPLS VPN. That has the drawback of requiring bigger routers or switches (6500 + 720 engine), and also of technical complexity. So far, the shops that need the performance seem to be the ones with the big boxes to support  MPLS anyway. An intermediate ground is GRE tunnels. You'll be seeing more about automatic mGRE tunnels and the Wireless LAN Services Module shortly. 

Where this leaves us with the present design is on slightly treacherous ground. MPLS VPN's are pretty clearly out: too costly, too complex. We could perhaps have used GRE tunnels but desired higher security and better confidentiality. IPSec devices are usually sized to WAN links. IPSec throughput is currently more suited to remote access users than to wired networks. The approximate speeds shown above are on the low end of LAN speeds. Please do not attempt this with larger numbers of users or higher speeds!

The design is site-to-site VPN. So VPN Concentrators are not appropriate. We then looked at the various PIX and router models, to determine how to deploy the IPSec.

Factors Loading an IPSec Router

There are a wide variety of things that can trip you up when you're  trying to figure out reliable IPSec performance.  See the SRND link below for a good flow chart relating to IPSec network planning.

Our testing tool was a Fluke hand-held tester, the Fluke OneTouch with the Internet Throughput Option (ITO). It can be set up to send ICMP ping packets to/from a specified address. The ICMP packets get replied to (mostly), so it is a two-way flow. That effectively doubles the encryption load on the router. Another way to think of this is that you pay to encrypt or decrypt, i.e. you have to add what goes in to what goes out to figure throughput in terms of impact on the router or PIX with accelerator module. When considering measurements, you need to know if the streams were one-way or two-way. In our design, the WLAN links are half duplex, so all the throughput figures represent the maximum total throughput, i.e. they do not need to be doubled due to full duplex links.
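To make the "add what goes in to what goes out" point concrete, here is a minimal sketch. The function name and the 2 Mbps figure are hypothetical, picked only for illustration; they are not measurements from our lab.

```python
def crypto_load_mbps(outbound_mbps: float, inbound_mbps: float) -> float:
    """Traffic the crypto engine must process: encrypt outbound plus decrypt inbound."""
    return outbound_mbps + inbound_mbps


# Hypothetical example: a 2 Mbps ping stream.
two_way = crypto_load_mbps(2.0, 2.0)   # every echo request gets a reply
one_way = crypto_load_mbps(2.0, 0.0)   # no replies come back

print(f"two-way flow: {two_way} Mbps through the crypto engine")
print(f"one-way flow: {one_way} Mbps through the crypto engine")
```

This is why a published Mbps figure means little until you know whether it counts one direction or both.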

So what are the factors to consider?

  • Total bits per second, as noted above.
  • Size of the average packet. The more packets per second, the more separate calls to the acceleration hardware there  probably are. Our testing confirmed that 64B packets are the worst case for a given number of Mbps. Real-world traffic rarely consists solely of 64B packets.
  • Number of GRE tunnels.  There's both tunnel and routing protocol overhead to consider at the head end, which has a lot of tunnels coming into it. See the SRND link below for some more information along these lines, e.g. the need to do an IPSec SPI lookup before decoding.
  • Fragmentation. Large packets may need fragmentation. If this is done before encryption, you get good performance. If the encrypted packet is fragmented, the fragments need to be reassembled before decryption. This is process switched and burdens the CPU. We chose to use GRE MTU so that any fragmentation occurs before the encryption.
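The packet size factor above comes down to packets per second: for a fixed bit rate, smaller packets mean many more packets for the router and accelerator to handle. A quick sketch, with an illustrative 3.5 Mbps rate and packet sizes chosen as assumptions (the calculation ignores any framing overhead):

```python
def packets_per_second(mbps: float, packet_bytes: int) -> float:
    """Packets per second needed to carry `mbps` using packets of `packet_bytes`."""
    return mbps * 1_000_000 / (packet_bytes * 8)


RATE_MBPS = 3.5  # illustrative rate, not a recommendation

for size in (64, 256, 512, 1518):
    pps = packets_per_second(RATE_MBPS, size)
    print(f"{size:>5} B packets at {RATE_MBPS} Mbps: {pps:8.0f} pps")
```

At the same bit rate, 64B packets need roughly 24 times as many packets per second as 1518B packets, which is why they are the worst case.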

Published Specifications

There are several sets of numbers floating around as to Cisco IPSec performance. I happen to like the slightly aged document, the IPSec site-to-site VPN SRND,
http://www.cisco.com/application/pdf/en/us/guest/netsol/ns142/c649/ccmigration_09186a00801e12ca.pdf. I note that "Data Only" seems to have been added to the title, presumably because any-to-any connectivity is preferable for VoIP. For the latest Dynamic Multipoint VPN design info, see also http://www.cisco.com/warp/public/732/Tech/security/ipsec/docs/dmvpntechnical.pdf.

Please be aware that IPSec performance can be very design and configuration dependent.  So please don't take any numbers you see here or below and extrapolate. You really need to check performance in your environment with your configuration and features.

We (I) had decided to adapt the GRE approach in the above SRND. I like it because it's simple, and because there's a reasonable way to scale up performance if the central site device becomes a bottleneck. In general we prefer routers for applications like this. They have more flexibility and options, and if the design changes, you can generally use routers in other ways. PIX are great, especially in classic firewalling situations. But it may be a bit harder to re-purpose a pile of PIX than a pile of routers.

I've reproduced some key performance numbers from the SRND, below. Note that some of these were tested by Cisco using older hardware, and the newer hardware performs better. In general this is why Cisco has become rather uptight about publishing performance specs: people read only the numbers and think performance is uncompetitive, or people don't take time to understand the assumptions, haul off and buy the hardware, then discover it doesn't perform as needed. Perhaps we should sum that up as YMMV -- Your Mileage May Vary.

The following figures are for around 50-55% CPU utilization with 240 tunnels coming into a head end. A good bit of the CPU loading comes from the number of tunnels, not the IPSec traffic. The traffic mix included small VoIP packets, making the figures mildly conservative. There was no fragmentation occurring. The document does not appear to indicate whether the flows were one-way or two-way, i.e. what the Mbps represents (aggregate traffic or one way).

Head-End Model                                              Mbps    CPU%
Cisco 7200 VXR with NPE-400, SA-ISA hardware accelerator    40      50-55
7200 w/ NPEG1, SA-VAM                                       45      80
3745, AIM-HPII [43 tunnels]                                 11      50
PIX 535, VAC Plus                                           68      50

And the figures for "branch models":

Branch Model          Mbps    CPU%
3660, AIM-HP          16      60-65
3620, AIM-MP          1.8     60-65
2691, AIM-EPII        4.9     26
2651, AIM-BP          2.8     60-65
2621, AIM-BP          2.4     60-65
2611, AIM-BP          2.0     60-65
1750, VPN Module      2.6     60-65
3745, AIM-HPII        16.5    32
3725, AIM-EPII        16.7    60
831                   1.2     85

From another reliable source, we heard that the 1711 can do about  4 Mbps of 256 B packets at full CPU. For other devices tested and more complete results, see the URL above.

By the way, we also have a couple of things "up our sleeve" if we need more performance:

  • Some of the traffic, perhaps 50%, is Internet traffic. There's really no reason to encrypt this traffic.
  • The Internet traffic is going to be throttled by the size of the Internet link(s), and will be mostly from the Internet towards users. Self-limiting!
  • We're running 10 to 12 tunnels, not 240.

Our Test Results

We did a number of measurements and examination of the routers, initially for our own understanding. Here are sample measurements for 1711 and 3725 model routers. We plan on actually using a 7200 model for the HQ central site router.

1518-byte packets:

Packet size   Packets/Second   Mbps    Percent CPU (1711)   Percent CPU (3725)
1518          100              1.22    23                   3
1518          200              2.44    50                   7
1518          300              3.66    76                   11
1518          400              4.88    100                  14

64-byte packets:

Packet size   Packets/Second   Kbps    Percent CPU (1711)   Percent CPU (3725)
64            100              57.6    18                   2
64            200              115     32                   4
64            300              173     46                   7
64            400              230     63                   9
64            800              460     99                   16

Here are the graphs of that data:

[Figure: CPU Percent, 1518B packets]

[Figure: CPU Percent, 64B packets]
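A note on the rate columns in the tables above: the Mbps and Kbps figures appear to be consistent with the frame size plus 8 extra bytes per frame, which would match the Ethernet preamble and start-of-frame delimiter. That is an inference from the numbers, not something we confirmed against the tester documentation, so treat the constant in this sketch as an assumption.

```python
PREAMBLE_BYTES = 8  # assumption: preamble + start-of-frame delimiter counted per frame


def offered_bps(frame_bytes: int, pps: int, overhead_bytes: int = PREAMBLE_BYTES) -> int:
    """Bits per second for `pps` frames of `frame_bytes`, plus per-frame overhead."""
    return (frame_bytes + overhead_bytes) * 8 * pps


for frame_bytes, pps in ((1518, 100), (1518, 400), (64, 100), (64, 800)):
    bps = offered_bps(frame_bytes, pps)
    print(f"{frame_bytes:>5} B x {pps:>3} pps = {bps / 1_000_000:.4f} Mbps")
```

Under that assumption the computed values line up with the table entries to within rounding.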

We used the following commands to watch things. It takes quite a while for five-minute averages to show the full loading. CPU load stabilizes before that point. The command "show proc cpu | inc five" was useful for just getting CPU load. The one-minute average is what's reported above. The command "show int tunn 0 | inc Kbit" was useful for checking what the router thought the throughput was. The command "show crypto engine accel status" and variants were mildly interesting for sanity checking and so on.
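If you are collecting these readings repeatedly, it can help to pull the numbers out of the CPU summary line with a script. The sketch below assumes the usual IOS summary line format; the sample line is made up for illustration and is not captured from our routers, and the function and key names are hypothetical.

```python
import re

# Matches the summary line printed by "show proc cpu" (assumed format).
CPU_LINE = re.compile(
    r"five seconds: (\d+)%/(\d+)%; one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cpu_line(line: str) -> dict:
    """Extract the CPU percentages from one summary line."""
    match = CPU_LINE.search(line)
    if match is None:
        raise ValueError(f"unrecognized CPU line: {line!r}")
    five_sec, interrupt, one_min, five_min = map(int, match.groups())
    return {
        "five_seconds": five_sec,
        "five_seconds_interrupt": interrupt,
        "one_minute": one_min,
        "five_minutes": five_min,
    }


# Made-up sample line, for illustration only.
sample = "CPU utilization for five seconds: 52%/31%; one minute: 50%; five minutes: 38%"
print(parse_cpu_line(sample))
```

The one-minute value is the one to log if you want numbers comparable to the tables in this article.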

My co-author Israel gathered some data for a follow-up experiment, varying the packet size while holding the data rate constant. One would expect the CPU load to go up as the packet size drops. Having said that, one expects the CPU to depend on some mix of the Mbps rate (constant) and the packet/second rate (non-linear). The following charts show this, varying the frame size with approximately constant bit rate. No, I haven't attempted to fit a formula to the data.

[Figure: CPU Load at 500 Mbps]

[Figure: CPU Load at Approx 3.5 Mbps]
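For anyone who does want to try fitting a formula, here is one way to start. This is a rough sketch only: it uses the nine 1711 readings from the tables in "Our Test Results" (not Israel's follow-up data, which appears here only as charts), and it assumes a simple linear model with no constant term, CPU% = a * pps + b * Mbps. Nine one-minute averages are far too few to trust the coefficients, and the model ignores saturation near 100%.

```python
# (packets/sec, Mbps, 1711 CPU %) from the tables in "Our Test Results".
SAMPLES = [
    (100, 1.22, 23), (200, 2.44, 50), (300, 3.66, 76), (400, 4.88, 100),
    (100, 0.0576, 18), (200, 0.115, 32), (300, 0.173, 46),
    (400, 0.230, 63), (800, 0.460, 99),
]


def fit_cpu_model(samples):
    """Least-squares fit of cpu ~= a * pps + b * mbps (assumed model, no intercept)."""
    spp = sum(p * p for p, _, _ in samples)
    spm = sum(p * m for p, m, _ in samples)
    smm = sum(m * m for _, m, _ in samples)
    spc = sum(p * c for p, _, c in samples)
    smc = sum(m * c for _, m, c in samples)
    # Solve the 2x2 normal equations with Cramer's rule.
    det = spp * smm - spm * spm
    a = (spc * smm - spm * smc) / det
    b = (spp * smc - spm * spc) / det
    return a, b


a, b = fit_cpu_model(SAMPLES)
print(f"CPU% ~= {a:.3f} * pps + {b:.2f} * Mbps")
for pps, mbps, cpu in SAMPLES:
    print(f"{pps:>4} pps, {mbps:>6} Mbps: measured {cpu:>3}, model {a * pps + b * mbps:6.1f}")
```

Both coefficients come out positive, which at least agrees with the intuition that the router pays per packet and per bit. Do not extrapolate from it.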


Missing Packets

Overall, we were surprised at the ability of the routers to take a lot of traffic and keep working, even under extreme CPU loading. Not that we recommend doing so. It might be a good idea to think about rate limiting to protect stability and keep CPU below, say, 80 or 90 percent.

We did crank up the throughput to where we figured the 1711 had to be overwhelmed, either the CPU or the IPSec hardware accelerator. We started looking for where the packets were being  dropped. It turns out that the drops showed up as input drops on the GRE tunnel and on the VLAN interface. Upon consideration, that made sense. Normal fast/CEF switching was receiving the packets and trying to queue them. If either the CPU or accelerator card was having trouble keeping up, the input queue would gradually fill. When it became too full, a drop would occur.
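The queue behavior described above is easy to see in a toy model. This is not how IOS is implemented; it is a minimal fixed-rate FIFO sketch, and the arrival rates, service rates, and queue limit of 75 packets are hypothetical values chosen for illustration.

```python
def simulate_input_queue(arrivals_per_tick, serviced_per_tick, queue_limit, ticks):
    """Return (final queue depth, total drops) for a fixed-rate FIFO queue."""
    depth = 0
    drops = 0
    for _ in range(ticks):
        depth += arrivals_per_tick
        if depth > queue_limit:
            # Queue is full: everything over the limit is dropped on input.
            drops += depth - queue_limit
            depth = queue_limit
        # The CPU or accelerator drains what it can this tick.
        depth = max(0, depth - serviced_per_tick)
    return depth, drops


# Hypothetical numbers: packets per tick, queue limit in packets.
keeps_up = simulate_input_queue(10, 12, 75, 100)
falls_behind = simulate_input_queue(12, 10, 75, 100)

print("engine keeps up:     depth, drops =", keeps_up)
print("engine falls behind: depth, drops =", falls_behind)
```

Even a small shortfall in service rate fills the queue after a while, and from then on every tick produces drops. That matches the gradual onset we saw on the tunnel and VLAN interfaces.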

With the higher data rates and 1711 CPU up around 95-100%, we did notice occasional (every 3-5 minutes?) momentary drops of the EIGRP neighbor relationship. This might have been due to the CPU being too busy to send EIGRP hellos, but may also have been due to delayed receipt caused by a full input queue.

Summary

We hope this excursion into a real world setting was interesting to you.

IPSec can be useful in many settings. But you do have to know how to scale it up. And you do need to do your homework as far as throughput.

Who knows what next month's article will bring? But you can influence it by sending me email with your suggestions for topics.



Dr. Peter J. Welcher (CCIE #1773, CCSI #94014, CCIP) is a Senior Consultant with Chesapeake NetCraftsmen. NetCraftsmen is a high-end consulting firm and Cisco Premier Partner dedicated to quality consulting and knowledge transfer. NetCraftsmen has eight CCIE's, with expertise including large network high-availability routing/switching and design, VoIP, QoS, MPLS, IPSec VPN, wireless LAN and bridging,  network management, security, IP multicast, and other areas. See http://www.netcraftsmen.net for more information about NetCraftsmen. Pete's links start at http://www.netcraftsmen.net/welcher . New articles will be posted under the Articles link. Questions, suggestions for articles, etc. can be sent to pjw <at> netcraftsmen <dot> net.

Israel Engle is the IT Director for the Technology Department of Dorchester County, Maryland. He enjoys forming a technology vision and budget, communicating that to the decision-makers, then implementing it. His current deployment is used by Cisco as a model reference for their wireless bridge product line. Israel has an MCSE and knows Cisco wireless technology inside-out. He is rapidly learning more about Cisco routers and security services. If you spot a car cruising around with a mobile router in it, that may be him, testing for next year's project.

6/15/2004

Copyright (C)  2004  Peter J. Welcher