Introduction
In our first article, “Basic Terms: General Networking Performance”, we covered the basic concepts related to performance. In this second article, we build on those terms to describe the technologies added to Check Point firewalls over the years to address various performance limitations. As available Internet and network bandwidth increased, Check Point sought new technological solutions to maintain optimal firewall performance.
From 1993 to 2003, new Check Point firewall software versions focused primarily on adding features to complement the Stateful Inspection paradigm at the heart of a Check Point firewall. All traffic inspection was handled on the firewall by a single kernel driver known as the INSPECT driver (also sometimes called the “INSPECT engine” or the “firewall kernel”). This single driver handled practically all aspects of Check Point firewall inspection, such as rule matching, packet inspection, and Network Address Translation (NAT). The performance of a single INSPECT driver was almost always sufficient given the relatively limited Internet bandwidth available during this period.
Interface Bonding (802.3ad)
Interface Bonding (802.3ad) – First introduced in 2000 as an industry standard (not devised by Check Point) and later enhanced by the Link Aggregation Control Protocol (LACP), 802.3ad allows aggregating multiple physical network interfaces into one logical interface for purposes of redundancy and increased available interface bandwidth. This technique is used to avoid the “Congestion Latency” that occurs when a single interface’s bandwidth becomes fully saturated; it also provides link redundancy in the event of a NIC hardware or configuration failure. Check Point gateways provide full support for 802.3ad/LACP, which is fully interoperable with other vendors such as Cisco and Juniper. Prior to being ratified as an industry standard, most networking vendors devised similar proprietary schemes to aggregate interfaces together; Cisco’s version of this was called “Cisco EtherChannel”.
802.3ad/LACP is provided directly by the Gaia Linux-based OS and as such is not technically part of the Check Point-authored firewall software such as the INSPECT driver. Interface Bonding is not enabled by default; it must be configured by the firewall administrator.
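The load-sharing behavior of a bond can be pictured as a per-flow hash: each packet’s layer 3/4 tuple is hashed, and the result selects which physical member carries that flow. The following Python sketch is purely illustrative (it is not Check Point or Linux kernel code, and the hash function and interface names are hypothetical), but it shows why one flow always stays on one member while different flows spread across the links:

```python
# Illustrative sketch of a "layer3+4"-style transmit hash for an 802.3ad
# bond: hash the flow tuple, then pick a member link. Not actual kernel
# or Check Point code; member names and hash choice are hypothetical.
import zlib

def select_bond_member(src_ip, dst_ip, src_port, dst_port, members):
    """Map a flow's L3/L4 tuple onto one bond member interface.

    All packets of one flow hash to the same member, preserving packet
    order within the flow while spreading distinct flows across links.
    """
    key = f"{src_ip}-{dst_ip}-{src_port}-{dst_port}".encode()
    return members[zlib.crc32(key) % len(members)]

members = ["eth1", "eth2"]  # two physical NICs aggregated into one bond
flow_a = select_bond_member("10.1.1.5", "192.0.2.10", 40000, 443, members)
# The same flow always maps to the same member link:
assert flow_a == select_bond_member("10.1.1.5", "192.0.2.10", 40000, 443, members)
```

Note a consequence visible in the sketch: a single flow can never exceed the bandwidth of one member link; only aggregate throughput across many flows benefits from bonding.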
SecureXL
SecureXL – Introduced in 2004, SecureXL was a new kernel driver called SIM (which stands for SecureXL Implementation Module) that worked underneath the existing INSPECT driver. The SIM driver was a simplified, highly optimized subset of the INSPECT driver’s functions that provided three main performance advantages:
- The ability to completely process certain types of traffic all in the optimized SIM driver itself, without ever needing to send it to the more overhead-intensive INSPECT driver at all. This capability is known as “Packet Acceleration” or “Throughput Acceleration” and is still used today on modern Check Point gateways. This helps reduce CPU load and latency, thus increasing throughput through the firewall.
- The ability to “cache” certain rulebase lookup results and apply them to future new connections, thus avoiding a computationally expensive rulebase lookup operation in the INSPECT driver. This capability is known as “Session Rate Acceleration” or “Accept Templates” and is still used today on modern Check Point gateways. This helps reduce CPU load on the firewall and reduces jitter as well.
- The ability to offload traffic processing to a dedicated hardware accelerator card, if present on the system, thus saving valuable main CPU overhead that would otherwise be expended by SIM or INSPECT. This hardware offload feature is rarely used today on Check Point security gateways. When utilized, it helps reduce CPU load and latency, thus increasing overall throughput through the firewall.
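The “Accept Templates” idea in the second bullet can be made concrete with a small sketch. This is a conceptual model only, not the actual SIM driver: the key point is that the cached verdict is keyed without the source port, so later connections that differ only in source port skip the expensive rulebase lookup entirely.

```python
# Conceptual model of SecureXL "Session Rate Acceleration" / "Accept
# Templates" (not the real SIM driver): cache a rulebase verdict keyed on
# (src, dst, dst_port) so subsequent connections differing only in source
# port never trigger the expensive INSPECT rulebase lookup.

def full_rulebase_lookup(src, dst, dport):
    # Stand-in for the computationally expensive INSPECT rulebase match.
    return "accept" if dport in (80, 443) else "drop"

class TemplateCache:
    def __init__(self):
        self.templates = {}
        self.lookups = 0  # count of expensive lookups actually performed

    def match(self, src, dst, sport, dport):
        key = (src, dst, dport)          # source port deliberately excluded
        if key not in self.templates:    # template miss: do the full lookup
            self.lookups += 1
            self.templates[key] = full_rulebase_lookup(src, dst, dport)
        return self.templates[key]       # template hit: reuse cached verdict

cache = TemplateCache()
for sport in range(1025, 1035):          # ten connections, same tuple bar sport
    cache.match("10.1.1.5", "192.0.2.10", sport, 443)
print(cache.lookups)  # → 1
```

Ten new connections cost only one rulebase lookup, which is exactly why this feature raises the sustainable connection setup rate rather than per-packet throughput.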
SecureXL is enabled by default on all Check Point firewalls. Up through and including version R80.10 released in 2017, SecureXL only received relatively minor capability and feature updates. However, with the release of version R80.20 in 2018, SecureXL was heavily overhauled with many longstanding limitations removed. This seismic event dramatically increased the performance advantages provided by SecureXL.
CoreXL
CoreXL – Introduced in 2009 with version R70, CoreXL allows more than one INSPECT driver to exist at the same time on a single Check Point security gateway. This capability allows Check Point gateways to utilize multiple CPUs (“cores”) simultaneously to dramatically increase throughput through the firewall. CoreXL helps avoid packet latency caused by “Processing Latency” discussed in the prior chapter. Each separate INSPECT driver is usually executing on its own dedicated core and is referred to as a “kernel instance” or “firewall worker”. CoreXL is enabled by default on all Check Point firewalls with at least 2 physical cores present.
A secondary benefit of CoreXL is the ability to have more than one instance of SecureXL running simultaneously on separate cores as well. Under CoreXL, each separate instance of SecureXL is referred to as a “Secure Network Dispatcher” or SND. On firewalls with at least four cores, each individual core is dedicated to one and only one of these functions; this division of operations is frequently referred to as the CoreXL “split”. As an example, on an 8-core firewall there will by default be two cores dedicated to SND functions and six cores dedicated to INSPECT operations (a 2/6 “split”).
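The dispatching side of this arrangement can be sketched as a hash from connection tuple to firewall worker. The sketch below is a rough illustration under stated assumptions (the hash and the 2/6 split figures come from the example above; Check Point’s actual dispatch logic is not public in this form): what matters is that every packet of a given connection lands on the same INSPECT instance, so per-connection state never has to be shared between workers.

```python
# Rough model of the CoreXL idea: an SND core hashes each connection to
# one of several firewall worker (kernel instance) cores, so all packets
# of a connection are inspected by the same INSPECT instance. The hash
# and worker count are illustrative, not Check Point's actual logic.
import zlib

WORKERS = 6   # e.g. the six INSPECT instances of a default 2/6 split

def dispatch(src_ip, dst_ip, sport, dport):
    key = f"{src_ip}:{sport}->{dst_ip}:{dport}".encode()
    return zlib.crc32(key) % WORKERS   # index of the chosen worker

# Every packet of a given connection is dispatched to the same worker:
w = dispatch("10.1.1.5", "192.0.2.10", 40000, 443)
assert w == dispatch("10.1.1.5", "192.0.2.10", 40000, 443)
assert 0 <= w < WORKERS
```

Keeping a connection pinned to one worker is the design choice that lets the INSPECT instances run largely independently, which is where the near-linear scaling with core count comes from.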
Multi-Queue
Multi-Queue – Fully integrated into the Check Point software in version R76, released in 2013 (but available as a hotfix earlier), this feature became necessary with the introduction of 10Gbps and faster interfaces. On a Check Point gateway, the cores dedicated to SND functions have one other critical responsibility not mentioned earlier: taking Ethernet frames that have arrived at a NIC port and passing them up to the SIM driver for handling; this process is called a “Soft Interrupt Request” (SoftIRQ). A memory storage area known as a “ring buffer” holds incoming frames that have arrived from a NIC but have not yet been processed by SoftIRQ and handed up to SIM. If frames arrive from the NIC at a high enough rate, the ring buffer can fill up completely before SoftIRQ can “empty” any frames from it. Any new frames that arrive while the ring buffer is completely full are discarded, as they have nowhere to go; this frame loss is referred to as a “buffering miss” or “receive drop” (RX-DRP). Note that the word “drop” in this context does not refer to a drop action in a security policy or any other kind of enforcement action; it is simply an overlap in terminology.
Prior to the advent of Multi-Queue, only one SND core could empty the ring buffer of a particular NIC port. So even if eight SND cores were allocated but only two very busy 10Gbps+ interfaces were present, only two of the SND cores could actually be used to empty the network interface ring buffers (one for each interface); the other six SND cores could not help at all and simply sat idle.
When Multi-Queue is enabled for an interface, all SND cores are now eligible to empty the ring buffer of that interface; Multi-Queue is essentially calling in “reinforcements” (all the other SND cores) for help to ensure the interface’s ring buffer is emptied fast enough to avoid “buffering misses” and “receive drops”. This helps avoid packet loss caused by the “Buffering Miss” issue introduced in an earlier chapter. On Check Point firewalls utilizing the older 2.6.18 Gaia kernel (usually version R80.30 and earlier), Multi-Queue must be enabled manually by the firewall administrator on busy interfaces as needed, and Multi-Queue cannot be enabled on more than 5 interfaces simultaneously due to an OS limitation. On Check Point firewalls using the newer Gaia 3.10 kernel (R80.40+), Multi-Queue is enabled on all interfaces by default except for the defined “management interface” of the firewall.
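The ring-buffer overflow described above can be modeled in a few lines. This is a toy simulation, not kernel code, and the ring size and arrival pattern are invented for illustration: frames arrive in bursts of two, but a single SND core drains only one frame per burst, so once the ring fills, every excess frame is counted as an RX-DRP.

```python
# Toy model of the RX ring buffer / "receive drop" behavior described
# above (not kernel code; sizes and rates are invented for illustration).
# Frames arrive faster than SoftIRQ drains the ring; once the ring is
# full, newly arriving frames are discarded and counted as RX-DRP.
from collections import deque

RING_SIZE = 4
ring = deque()
rx_drp = 0

def nic_receive(frame):
    global rx_drp
    if len(ring) >= RING_SIZE:
        rx_drp += 1          # buffering miss: frame has nowhere to go
    else:
        ring.append(frame)

def softirq_drain(n):
    for _ in range(min(n, len(ring))):
        ring.popleft()       # hand a frame up to SIM for processing

# Ten frames arrive in bursts of two, but the lone SND core servicing
# this NIC only drains one frame per burst:
for burst in range(5):
    for f in range(2):
        nic_receive(f"frame-{burst}-{f}")
    softirq_drain(1)

print(rx_drp)  # → 2
```

Multi-Queue attacks exactly the drain side of this imbalance: with several SND cores emptying the same interface’s queues, the drain rate rises to match the arrival rate and the RX-DRP counter stops climbing.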
Simultaneous Multithreading
Simultaneous Multithreading (SMT – also sometimes called “Hyperthreading”) – An Intel processor extension that doubles the number of available processing cores by creating two logical threads of execution on each physical core. This increases the amount of CPU power available and helps avoid “Processing Latency”. Introduced to various server-class Intel processor architectures starting in 2002, SMT received formal Check Point support on firewall appliances commencing with the R77 software release in 2014. Initially called “HyperSPECT” upon release, the feature is now simply called “SMT” by Check Point. SMT is enabled by default on all Check Point appliances whose underlying Intel processors support it, and on the vast majority of firewalls it increases overall CPU performance by approximately 30%. A Check Point firewall appliance with SMT enabled appears to have twice as many cores available as there are physical cores in the hardware. For example, if SMT is enabled on a firewall with 8 physical cores, 16 logical cores appear to be present; the default CoreXL “split” in this case would be 2/14.
Support: Check Point Appliances vs. Open Hardware/Servers - Note that except for SMT, all the features mentioned in this chapter are supported both on Check Point-branded firewall appliances and on so-called “open hardware” or “open servers”. These latter two terms refer to the use of commodity Intel-based hardware (Dell, HP, Lenovo, etc.) instead of Check Point-branded appliances for the security gateway.
About the author
The Performance Optimization Series is written for you by Timothy Hall.