Introduction
SecureXL is enabled by default on all Check Point firewalls. Up through and including version R80.10, released in 2017, SecureXL only received relatively minor capability and feature updates. However, with the release of version R80.20 in 2018, SecureXL was heavily overhauled and many longstanding limitations were removed. This document therefore assumes your firewall is running at least version R80.20.
What is SecureXL? It is the ability of the firewall to process traffic more efficiently using less CPU, thus decreasing latency and increasing throughput. Prior to the advent of SecureXL in the early 2000s, all traffic passing through the firewall was sent through the full INSPECT engine for handling; this is known as the F2F (Forward to Firewall) Path and is sometimes referred to as the “slowpath”. In addition, a matching rule had to be found in the Access Control rulebase for every new connection that attempted to start through the firewall.
Introduced in the early 2000s, SecureXL was a new kernel driver called SIM (which stands for SecureXL Implementation Module) that works underneath the existing INSPECT engine. The SIM driver is a simplified, highly optimized subset of the INSPECT engine’s functions. It should be emphasized that even when traffic is “accelerated” and fully handled by the SIM driver, the level of security enforced is in no way reduced.
SecureXL Packet/Throughput Acceleration
Imagine that the firewall is an assembly-line based factory. Packets try to get from one side of the factory (the entrance) to the other side (the exit). At each stop along the assembly line a laborer must look at the packet and decide whether it needs to be discarded because it is defective. If it isn’t defective, the laborer might need to make slight modifications to the packet in some cases, and then passes it on to the next laborer on the line.
When SecureXL is enabled, there are five different “assembly lines” of varying length through the firewall, with a different number of “laborers” along each line.
- Accelerated Path (sometimes called “fastpath” or SXL) – 8 laborers
- F2V (Forward to Virtual Machine) Path – 12 laborers
- PSLXL Path (also called the “Medium Path” or “Passive Streaming”) – 20 laborers
- CPASXL Path (also called “Active Streaming”) – 35 laborers
- Firewall Path (also called “slowpath”, “non-accelerated”, or F2F) – 55 laborers
When SecureXL is disabled (or is not present), all packets must go through the Firewall Path (F2F), which is the longest and least efficient path. In this case a packet that really only needs 20 laborers to handle it for proper inspection must still go through all 55 of the laborers in the F2F line anyway. Even though the other 35 laborers on the F2F line are irrelevant to this packet and merely glance at it before passing it on to the next laborer without needing to do anything or even take a closer look, even that quick “glance” still takes time and resources, and increases latency. Wouldn’t it have been more efficient to process the packet on a line with only the minimum number of laborers needed to inspect that packet (20)?
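The arithmetic in this analogy can be written out as a minimal Python sketch. The laborer counts are the illustrative figures from the list above, not measured CPU costs, and the names LABORERS and wasted_glances are hypothetical, chosen only for this example.

```python
# Laborer counts per path, taken from the factory analogy above.
# These are illustrative figures from the analogy, not measured CPU costs.
LABORERS = {
    "SXL": 8,
    "F2V": 12,
    "PSLXL": 20,
    "CPASXL": 35,
    "F2F": 55,
}


def wasted_glances(needed_path: str, actual_path: str) -> int:
    """Laborers who handle the packet without being needed to inspect it."""
    return max(LABORERS[actual_path] - LABORERS[needed_path], 0)


# SecureXL disabled: a packet that only needs the PSLXL line still rides F2F.
print(wasted_glances("PSLXL", "F2F"))    # 55 - 20 = 35
# SecureXL enabled: the same packet is handled on the PSLXL line itself.
print(wasted_glances("PSLXL", "PSLXL"))  # 0
```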
Ideally, we’d like to see as much traffic as possible being handled in the more efficient paths with the fewest number of laborers such as SXL (8) and F2V (12), and avoid the paths with the most laborers such as CPASXL (35) and F2F (55). On a real firewall what determines how much traffic is handled in each of these five paths? There are two main factors:
- How many firewall features or “blades” are enabled. Examples of “blades” would be Application Control, URL filtering, IPS, Anti-virus, etc.
- How the enabled blades are configured
Generally speaking, the more blades that are enabled on the firewall, the less traffic will be handled in the most efficient paths/lines such as SXL and F2V. While which blades are enabled is usually dictated by an organization’s security policy and even regulatory requirements, the specific configuration of the various blades opens many possible avenues for performance tuning. The authoritative work in this area is the book “Max Power 2020: Check Point Firewall Performance Optimization” by Timothy C. Hall.
The expert mode command fwaccel stats -s can be used to view the percentages of traffic in each of the paths (“lines”) on a live firewall. This command is safe to run during production. It is not unusual to see at least 50% of the total traffic on a real-world firewall be processed in the PSLXL Path.
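To get a feel for why the split between paths matters, the percentages can be combined with the laborer counts from the analogy. The sketch below is only an illustration: the percentages are invented for this example and are not output from fwaccel stats -s, and the function name average_laborers is hypothetical.

```python
from typing import Mapping

# Laborer counts from the factory analogy (illustrative, not measured costs).
LABORERS = {"SXL": 8, "F2V": 12, "PSLXL": 20, "CPASXL": 35, "F2F": 55}


def average_laborers(path_share: Mapping[str, float]) -> float:
    """Weighted average laborers per packet, given each path's share in percent."""
    total = sum(path_share.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"path shares must sum to 100, got {total}")
    return sum(LABORERS[path] * share for path, share in path_share.items()) / 100.0


# Hypothetical split, invented for this example (not real firewall output).
example = {"SXL": 30.0, "F2V": 0.0, "PSLXL": 55.0, "CPASXL": 5.0, "F2F": 10.0}
print(average_laborers(example))

# SecureXL disabled: everything goes through the F2F line.
print(average_laborers({"F2F": 100.0}))
```

With this made-up split the average packet passes far fewer laborers than it would if every packet took the F2F line, which is the effect the path percentages are meant to reveal.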
SecureXL Session Rate Acceleration/Accept Templates
Whenever a packet arrives at the entrance to the factory, the firewall looks at the packet’s attributes to see if it corresponds to a known approved connection that was started earlier and accepted. If it does, the packet is allowed to proceed into the factory for further inspection (where it might still be dropped by one of the laborers eventually). An area of memory on the firewall known as the “Connections State Table” tracks all the accepted connections currently passing through the firewall.
But what if the packet isn’t associated with an existing connection stored in the Connections State Table? The most likely case is that this packet is starting a new stream of packets that will be classified as a new, separate connection. This first packet must be sent through the longest path/line (F2F) to find a matching rule in the Access Control rule base; the matching rule will either accept or drop the packet. If the packet is accepted by a matching rule, it continues through the factory for further processing, and a new entry is created in the Connections State Table referencing this new connection. If the packet matches a rule that is configured to drop the packet, it is discarded and not allowed to proceed through the factory; in this case no update is made to the Connections State Table.
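The decision flow described above can be sketched in a few lines of Python. This is a heavily simplified model under stated assumptions: the rules match only on destination port and protocol, the rule base is walked top-down until the first match, and a plain set stands in for the Connections State Table. All names (FiveTuple, Rule, RULEBASE, handle_packet) are hypothetical and do not correspond to Check Point internals.

```python
from typing import NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int  # IP protocol number, e.g. 6 = TCP, 17 = UDP


class Rule(NamedTuple):
    dst_port: Optional[int]  # None matches any destination port
    proto: Optional[int]     # None matches any protocol
    action: str              # "accept" or "drop"


# Toy policy (assumption): allow HTTPS, drop everything else.
RULEBASE = [
    Rule(dst_port=443, proto=6, action="accept"),
    Rule(dst_port=None, proto=None, action="drop"),  # cleanup rule
]

connections = set()  # stand-in for the Connections State Table


def rulebase_lookup(pkt: FiveTuple) -> str:
    """Walk the rule base top-down; this models the expensive F2F step."""
    for rule in RULEBASE:
        if rule.dst_port in (None, pkt.dst_port) and rule.proto in (None, pkt.proto):
            return rule.action
    return "drop"


def handle_packet(pkt: FiveTuple) -> str:
    if pkt in connections:
        return "existing connection"
    if rulebase_lookup(pkt) == "accept":
        connections.add(pkt)  # record the newly accepted connection
        return "accepted, connection recorded"
    return "dropped, table unchanged"


web = FiveTuple("10.0.0.5", "203.0.113.10", 40001, 443, 6)
print(handle_packet(web))  # first packet: rule base lookup, then recorded
print(handle_packet(web))  # later packet: found in the connections table
print(handle_packet(FiveTuple("10.0.0.5", "203.0.113.10", 40002, 23, 6)))
```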
Note that an Access Control rule base lookup in the F2F path is one of the most computationally expensive operations that a firewall has to perform; we would like to try to avoid this large amount of overhead whenever we can to save CPU processing time and decrease latency. The “Accept Templates Table” discussed in the next paragraph is part of a mechanism to essentially “cache” prior rulebase lookups, and avoid the overhead of a full rule base lookup in the F2F path.
When a packet is accepted by a rule base lookup in the F2F path as discussed earlier, SIM (SecureXL) makes a few notes about the attributes of that first packet that is starting a new connection. The “notes” taken are stored by SIM/SecureXL in an area of memory known as the “Accept Template Table”. The attributes noted would be the following:
- Source IP address
- Destination IP address
- Source Port Number
- Destination Port Number
- IP Protocol Number
Now suppose another packet arrives at the factory entrance. The firewall looks in the Connections State Table and determines that this packet is not associated with an existing approved connection. Normally the packet would be sent to the F2F path for a rule base lookup (thus consuming CPU time and increasing latency), but before doing so the Accept Template Table is consulted. Does this packet have similar attributes to a prior packet that was accepted? The firewall looks in the Accept Template Table at the attributes of prior connections that were allowed by an F2F rule base lookup. If most attributes of this new packet match an entry in the Accept Template Table, the connection is immediately accepted and a full-fledged rule base lookup in the F2F path is avoided, thus reducing latency and CPU overhead. The Connections State Table is then updated with the attributes of the new connection that was matched by an Accept Template.
But what if there is no template matching this packet in the Accept Template Table? The packet is sent to the F2F path for a full Access Control rule base lookup.
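The template mechanism behaves much like a cache placed in front of the rule base lookup, and can be sketched as follows. The text only says that “most attributes” must match; this sketch assumes that means every noted attribute except the source port, which is an assumption made for illustration. The class and function names are hypothetical, and the rule base is reduced to a callable so the number of full lookups can be counted.

```python
from typing import Callable, NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int


def template_key(pkt: FiveTuple) -> tuple:
    # Assumption: "most attributes" means everything except the source port.
    return (pkt.src_ip, pkt.dst_ip, pkt.dst_port, pkt.proto)


class Firewall:
    def __init__(self, rulebase_lookup: Callable[[FiveTuple], bool]):
        self.rulebase_lookup = rulebase_lookup  # expensive F2F lookup
        self.connections = set()  # stand-in for the Connections State Table
        self.templates = set()    # stand-in for the Accept Template Table
        self.full_lookups = 0     # how many times the rule base was consulted

    def handle(self, pkt: FiveTuple) -> str:
        if pkt in self.connections:
            return "existing connection"
        if template_key(pkt) in self.templates:
            self.connections.add(pkt)  # accepted without a rule base lookup
            return "accepted by template"
        self.full_lookups += 1
        if self.rulebase_lookup(pkt):
            self.connections.add(pkt)
            self.templates.add(template_key(pkt))  # take "notes" for next time
            return "accepted by rule base"
        return "dropped"


fw = Firewall(lambda pkt: pkt.dst_port == 443)  # toy policy: allow HTTPS only
a = FiveTuple("10.0.0.5", "203.0.113.10", 40001, 443, 6)
b = a._replace(src_port=40002)  # same attributes apart from the source port

print(fw.handle(a))     # accepted by rule base
print(fw.handle(b))     # accepted by template
print(fw.full_lookups)  # 1
```

In this sketch the second connection never reaches the rule base: only one full lookup is performed for two accepted connections, which is the saving that Accept Templates aim for.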
Throughput Acceleration vs. Accept Templates
While Packet/Throughput Acceleration and Session Rate Acceleration/Accept Templates are both considered part of SecureXL, it is important to understand that they are two separate functions with different goals, and different applicable techniques for performance tuning. While the purpose of Packet/Throughput Acceleration is to reduce CPU overhead and latency as each packet is processed by sending packets to the most efficient path (“factory line”), Session Rate Acceleration/Accept Templates seeks to avoid whenever possible one of the most computationally expensive operations undertaken by a Check Point firewall: finding a matching rule for a new connection in the Access Control policy.
The expert mode commands fwaccel stats -s and fwaccel stat can be used to view the percentages of traffic that are matching templates, and how much of the Access Control rulebase can potentially have its traffic templated. These commands are safe to run during production.
Multi-CPU Capability
The introduction of a separate SIM driver to handle some traffic alongside the INSPECT engine allowed more than one CPU to be simultaneously utilized by the firewall (assuming multiple CPUs were present), thus reducing latency while improving throughput. However, no more than two CPUs could be actively used on the firewall with this arrangement, so if there were three or more CPUs available, the additional ones could not be utilized to further increase firewall throughput. This was because only one instance of the INSPECT engine and one instance of the SIM driver could exist on a single firewall.
In the next chapter we will cover CoreXL, which allows multiple instances of SIM and INSPECT on the firewall to potentially use as many CPUs (“cores”) as are available in the hardware to dramatically reduce latency and increase firewall throughput. CoreXL essentially allows multiple “factories” to run simultaneously on the firewall; the more physical cores that are available in the firewall hardware, the more “factories” can be created on the firewall.
About the author
The Performance Optimization Series is written for you by Timothy Hall.