Introduction
In our Check Point for Beginners series, we cover the basic topics related to working with the various Check Point security offerings and products. Some of these topics are more challenging than others; performance-related concerns definitely fall into this category.
How do we identify performance issues? What is the terminology related to performance? How do we locate a specific performance-related issue? What tools are available to help us in this task?
We intend to answer all of these questions in our new CP4B series covering Performance Best Practices.
Terminology
The first step is to define some terms we will be using when talking about performance.
Usually, when someone complains about performance degradation, the word “slow” is used. The Internet is slow. The web application is slow. Mail is slow. Yet, this “slowness” needs to be described in well-defined technical terms instead of vague pronouncements. The following terms need to be used and quantified: bandwidth, throughput, and latency. Two other essential terms, packet loss and jitter, are covered later.
So, how do we define these terms? What do they mean, and when do we use them to further quantify “slowness”?
Performance Terms
Bandwidth – The maximum number of bits per second that can be transmitted and received over a medium. Most modern networks are full-duplex which means that both sides of the link can send and receive simultaneously. Full-duplex communication became common with the introduction of network switches. Network hubs predated switches and required half-duplex communication, where only one station could send traffic at a time. If more than one station attached to a hub tried to send at the exact same moment, a collision would occur. Collisions do not occur on a full-duplex network. On a firewall, the amount of bandwidth directly available is dictated by the link speed of the network interfaces on the firewall. Note however that while a firewall’s external interface facing the Internet might have a link speed of 10Gbps, the available upstream bandwidth to the Internet may well be lower, such as 1Gbps or 2Gbps.
Bandwidth (LANs) – Typically the transmit bandwidth and receive bandwidth are symmetrical, which means that both transmit and receive operate at the same speed. 1 Gigabit and 10 Gigabit speeds (or higher) for an organization’s LAN are typical.
Bandwidth (Internet) – Connections to and from the Internet can also have symmetrical bandwidth, but consumer-grade Internet connectivity will tend to have asymmetric bandwidth. The download/receive speed is always faster than the upload/transmit speed in this case. Examples: Digital Subscriber Lines (DSL) and Cable Modems.
Throughput – The actual amount of data that can be sent and received through a network. The observed throughput will always be lower than the total bandwidth due to the Ethernet frame, IP packet, and TCP segment overhead required to send data through the network. As an example, to send 1460 bytes of data through a network, an additional 66 bytes of frame/packet/segment overhead is required, making the final transmitted frame size 1526 bytes. To send a single byte of data all by itself, approximately 66 bytes of overhead are still required (see the sketch after the following list). Note that the amount of processing overhead for a packet containing only 1 data byte vs. 1500 bytes of data differs less than you might suspect. On a firewall, total available throughput depends on a number of factors, including:
- Overall core/CPU utilization
- Individual core/CPU utilization
- Rate of new connections per second
- Total number of Packets per Second (PPS)
- What features or “blades” are enabled on the firewall and how they are configured. Examples include Application Control, URL Filtering, and the various Threat Prevention blades.
- Whether traffic needs to be encrypted or decrypted by the firewall with IPSec and/or TLS.
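To make the frame-overhead arithmetic concrete, here is the sketch referenced above: a minimal Python illustration (not Check Point code) that assumes the 66 bytes consist of the Ethernet preamble/SFD, Ethernet header, FCS, IP header, and TCP header, and ignores the inter-frame gap, TCP options, and VLAN tags:

```python
# Illustrative sketch of the overhead arithmetic from the text, not
# Check Point code. Per-frame overhead: preamble+SFD (8) + Ethernet
# header (14) + FCS (4) + IP header (20) + TCP header (20) = 66 bytes.
# (The 12-byte inter-frame gap, TCP options, and VLAN tags are ignored.)

PER_FRAME_OVERHEAD = 8 + 14 + 4 + 20 + 20  # = 66 bytes

def wire_efficiency(payload_bytes: int) -> float:
    """Fraction of transmitted bytes that is actual application data."""
    return payload_bytes / (payload_bytes + PER_FRAME_OVERHEAD)

for payload in (1, 64, 512, 1460):
    frame = payload + PER_FRAME_OVERHEAD
    print(f"{payload:>4} data bytes -> {frame:>4}-byte frame, "
          f"{wire_efficiency(payload):.1%} efficient")
# 1460 data bytes -> a 1526-byte frame, ~95.7% efficient
#    1 data byte  -> a   67-byte frame,  ~1.5% efficient
```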
Latency – The round-trip delay incurred by the intervening network links and devices between two communicating endpoints. This is usually expressed in milliseconds (1/1000 of a second) and can be measured by utilities such as ping (a rough measurement sketch follows the list below). Low latency is desirable, while high latency can reduce throughput for some network protocols. Typical latency within a single VLAN/subnet is usually < 5ms, inter-VLAN traffic in an organization can normally incur up to 30 ms, and Internet communication will typically incur a latency of at least 30-100+ms. Increased latency for traffic passing through a network or firewall can be caused by many factors, among them:
- Congestion Latency – an intervening network between the endpoints is fully utilizing all its bandwidth, and packets must wait their turn in a firewall queue to be transmitted. Check Point firewalls usually employ First-In First-Out (FIFO) for transmission of packets.
- Processing Latency – even if a packet is successfully queued on the firewall for transmission, the firewall's CPU may be so busy that packets are not transmitted promptly even when network bandwidth is available.
- Quality of Service (QoS) or Traffic Shaping – the firewall is enforcing traffic prioritization, and some packets are deliberately delayed to make way for higher-priority traffic
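As a rough alternative to ping (for example, when ICMP is blocked), latency can be approximated from user space by timing a TCP handshake, which completes in about one round trip. A minimal Python sketch; the host example.com and port 443 are placeholders:

```python
# Rough user-space latency probe (illustrative only): time a TCP
# handshake, which completes in about one network round trip.
# "example.com" and port 443 are placeholder targets.
import socket
import time

def tcp_rtt_ms(host: str, port: int = 443, timeout: float = 2.0) -> float:
    start = time.perf_counter()
    # connect() returns once the SYN/SYN-ACK exchange has completed
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0

samples = [tcp_rtt_ms("example.com") for _ in range(5)]
print(f"min/avg/max RTT: {min(samples):.1f}/"
      f"{sum(samples) / len(samples):.1f}/{max(samples):.1f} ms")
```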
Throughput vs. Latency: the best way to establish the difference between throughput and latency is to look at a real-world example: Internet connectivity provided by satellites orbiting the earth. Typically employed in rural areas where terrestrial wired Internet is not otherwise available, satellite providers such as HughesNet advertise impressive-sounding bandwidth capabilities such as 25Mbps download and 3Mbps upload Internet speeds via satellite. However, the minimum round-trip delay incurred by using a satellite link is 500ms! This value may be even higher depending on congestion in other parts of the terrestrial network. While that bandwidth may be plenty for watching something like Netflix, which is essentially one-way video, it will be an absolute disaster for any kind of real-time online game that requires fast reflexes, such as a First-Person Shooter (FPS). A 500ms (½ second) delay during an audio or video conference will be noticeable but is probably workable. Any application that sends data, then must wait for some kind of acknowledgment before sending any more data, will seem slow and unresponsive. So the bottom line is this: even with seemingly large amounts of bandwidth, high latency can cripple usable throughput for certain applications.
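A quick back-of-the-envelope calculation shows why. Ignoring TCP window scaling (which raises the cap but not the principle), a sender can have at most one receive window of data in flight per round trip, so achievable throughput is bounded by window size divided by RTT no matter how fast the link is. A minimal Python sketch using the classic 64 KB window:

```python
# Back-of-the-envelope sketch: a TCP sender can have at most one receive
# window of data "in flight" per round trip, so throughput is capped at
# window / RTT regardless of the link's advertised bandwidth.

def window_limited_mbps(window_bytes: int, rtt_ms: float) -> float:
    return (window_bytes * 8) / (rtt_ms / 1000.0) / 1_000_000

# Classic 64 KB window over the 500ms satellite link described above:
print(f"{window_limited_mbps(65535, 500):.2f} Mbps")  # ~1.05, far below 25Mbps
# The same window over a 30ms terrestrial path:
print(f"{window_limited_mbps(65535, 30):.2f} Mbps")   # ~17.48
```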
Packet Loss – Loss of packets in transit; with a protocol such as TCP, lost packets will need to be retransmitted after a brief delay, which substantially reduces throughput (a sketch quantifying this follows the list below). Packet loss can be caused by:
- Packet Corruption – Packets may be damaged in transit by faulty cabling, electromagnetic interference, or other physical network problems. Examples of corruption include Cyclic Redundancy Check (CRC) errors and framing errors.
- Buffering Miss on a router – A packet arrives at an intermediate router, but there is no space in the input queue to store the packet; the packet is lost. This is called a Receive Drop (RX-DRP) on a Check Point firewall.
- Quality of Service (QoS) or Traffic Shaping – the firewall is enforcing bandwidth consumption limits, and packets are deliberately dropped (instead of just delayed) to make the sending system slow down.
- Traffic was explicitly dropped by a firewall Access Control policy or Threat Prevention protection.
- Traffic was routed incorrectly, or no route was available at all.
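To quantify how much loss hurts, the well-known Mathis et al. approximation models steady-state TCP throughput as MSS/RTT × 1/√p for a loss rate p. This is a simplified model rather than a measurement (the constant factor of roughly 1.22 is omitted below), but it shows why even a fraction of a percent of loss substantially reduces throughput:

```python
# Simplified Mathis et al. model: throughput <= (MSS / RTT) * (1 / sqrt(p)).
# A model, not a measurement; the ~1.22 constant factor is omitted.
import math

def mathis_mbps(mss_bytes: int, rtt_ms: float, loss_rate: float) -> float:
    rate_bps = (mss_bytes * 8) / (rtt_ms / 1000.0) / math.sqrt(loss_rate)
    return rate_bps / 1_000_000

# 1460-byte MSS over a 30ms path at increasing loss rates:
for p in (0.0001, 0.001, 0.01):
    print(f"loss {p:.2%}: ~{mathis_mbps(1460, 30, p):.1f} Mbps")
# loss 0.01%: ~38.9 Mbps / loss 0.10%: ~12.3 Mbps / loss 1.00%: ~3.9 Mbps
```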
Jitter – Variation in round-trip delay between communicating systems. Even with high latency, as long as it is consistent, delay-sensitive connections such as voice and video can adapt to the latency and continue to work acceptably. The congestion control algorithm utilized by TCP can also readily adapt to a high-latency situation, but only if the latency is fairly consistent. Wild swings in inter-packet delay (jitter) will most noticeably cause issues with voice and video connections, which will skip or freeze. Even if a large amount of bandwidth is available, high jitter will substantially lower the usable throughput across the network (a jitter-estimator sketch follows the list below). Causes of jitter include:
- Different paths being taken by traffic through the network – some paths congested and some not
- Network path being shared with “bursty” traffic that intermittently consumes large amounts of bandwidth
- High CPU load on a firewall or router
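Jitter can be quantified with the running interarrival-jitter estimator defined for RTP in RFC 3550, which smooths the absolute change in per-packet transit time. The Python sketch below uses made-up transit times to show the key point from above: a steady 500ms path scores near-zero jitter, while a path swinging between 20ms and 80ms does not:

```python
# Running interarrival-jitter estimator from RTP (RFC 3550): smooths the
# absolute change in per-packet transit time with a gain of 1/16.
# The transit-time values below are made-up illustrations.

def update_jitter(jitter: float, prev_transit_ms: float, transit_ms: float) -> float:
    d = abs(transit_ms - prev_transit_ms)
    return jitter + (d - jitter) / 16.0

for label, transits in (("steady 500ms", [500] * 8),
                        ("20ms/80ms swings", [20, 80] * 4)):
    jitter, prev = 0.0, transits[0]
    for t in transits[1:]:
        jitter, prev = update_jitter(jitter, prev, t), t
    print(f"{label}: jitter ~{jitter:.1f} ms")
# steady 500ms: ~0.0 ms; 20ms/80ms swings: ~21.8 ms (and still climbing)
```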
The TCP Congestion Control Algorithm – Most traffic on the Internet today uses TCP.
TCP ensures reliable delivery of data between communicating systems and also guarantees that data reaches its destination in the order it was originally sent. When used for a connection that has a large amount of data to send, TCP starts slowly and sends data at a faster and faster rate until packet loss (or high jitter) is encountered. Once that happens, TCP “backs off” its send rate slightly, trying to find the ideal rate that allows data to be sent as fast as possible without congesting the network (a toy sketch of this behavior follows). While intervening devices such as routers and firewalls simply pass the IP packets containing TCP through them, it is still important to understand how TCP attempts to consume network bandwidth through a firewall. TCP-oriented traffic encountering high levels of jitter in the network may even retransmit packets unnecessarily, wasting bandwidth.
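The ramp-up-then-back-off behavior described above is the classic additive-increase/multiplicative-decrease (AIMD) pattern. The toy simulation below is not how any real TCP stack is implemented (the 10% loss probability is an arbitrary assumption), but it reproduces the characteristic sawtooth of the congestion window:

```python
# Toy AIMD simulation (no real TCP stack works exactly like this):
# additive increase of +1 segment per RTT, multiplicative decrease
# (halving) on a loss event. The 10% loss chance is arbitrary.
import random

random.seed(1)
cwnd = 10.0  # congestion window, in segments
for rtt in range(1, 21):
    if random.random() < 0.10:        # pretend this RTT saw a loss
        cwnd = max(cwnd / 2.0, 1.0)   # multiplicative decrease ("back off")
        event = "loss -> halve"
    else:
        cwnd += 1.0                   # additive increase
        event = "acked -> +1"
    print(f"RTT {rtt:2d}: cwnd {cwnd:5.1f} ({event})")
```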
Our mission
Firewalls are a big part of our networking environment. When optimizing network performance on firewalls, our goal is to ensure that the firewall provides low jitter, high throughput, and low round-trip times for packets while still inspecting those packets for security threats.
About the author
The Performance Optimization series is written for you by Timothy Hall.