CoreXL was introduced in version R70 in 2009 and is enabled by default on all Check Point firewalls.
What is CoreXL? The ability of the firewall to utilize multiple cores simultaneously. When we use the term “core” here, for all intents and purposes it means a Central Processing Unit (CPU) manufactured by a company such as Intel or AMD. Over the years CPU manufacturers kept increasing the CPU clock rate (measured in Megahertz and Gigahertz) to boost performance, but eventually a thermal limit was reached. Instead of continuing to go “upwards” with the clock rate to improve performance, CPU manufacturers started going “outward” with multiple cores/CPUs clocked at slower but more sustainable speeds. Most of the CPU performance increases touted by the manufacturers over the last 10-15 years are a result of packing more and more cores into a single system.
CoreXL allows all available cores on the firewall to be utilized simultaneously, thus increasing throughput and decreasing latency. Prior to the introduction of CoreXL, only two cores could be utilized: one for SecureXL/sim and one for the INSPECT module.
CoreXL Components and the “Split”
CoreXL allows the creation of multiple instances of the INSPECT module, which we will call “Firewall Workers”; Check Point also sometimes calls these “Firewall Instances”. Each Firewall Worker is essentially a separate “factory” (to borrow an analogy from the prior article) that can inspect and process traffic on its own dedicated core.
CoreXL also allows multiple instances of SecureXL to exist; these are called Secure Network Dispatchers (SNDs).
On firewall hardware with four or more cores, each individual core is assigned one (and only one) of these functions: either SND or Firewall Worker. So essentially all the available cores are “split” between SND and Firewall Worker functions. On R80.30 and earlier firewalls, this split is static and does not change unless the firewall administrator changes it; a reboot is required after changing the split to make it take effect. On R80.40 and later firewalls, the split can be dynamic and automatically adjusted by the firewall as traffic loads dictate; more on this later.
The following table (taken from sk98737: ATRG: CoreXL) shows the default CoreXL split for a system with 8 cores, which is 2 SND cores and 6 Firewall Workers:
CoreXL Static Split (R80.30 and earlier)
On every new firewall using R80.30 or earlier, a static core split is assigned by default. As an example, on firewall hardware with 8 cores there will by default be two SND instances on two cores and six Firewall Workers on the remaining six cores, called a “2/6 split”. This assignment is static and can only change if the firewall administrator manually changes it and reboots the firewall.
CoreXL Dynamic Split (R80.40+)
Starting in R80.40, the split of SNDs and Firewall Workers can be adjusted automatically in response to traffic loads through the firewall. This function is called “Dynamic Split” and is NOT enabled by default; it is further detailed in sk164155: Dynamic Split for CoreXL. Out of the box, an R80.40 system will employ a static split unless Dynamic Split is enabled by the administrator.
CoreXL Default Split Values
The following table (taken from sk98737: ATRG: CoreXL) shows what the default static splits will be “out of the box”; the exact split is dictated by the total number of available cores:
So for a system with 32 cores, the CoreXL split would be 4/28 (4 SND cores and 28 Firewall Workers).
Note that in R80.10 and later no more than 40 cores can be utilized by CoreXL, even if there are more than 40 cores available in the hardware. This limit can be exceeded by using the User-Space Firewall (USFW) feature, which is described in the next article.
CoreXL: Adjusting the Split
Under what circumstances might we want to adjust the default static split of SND vs. Firewall Workers? The most common adjustment is assigning more cores as SNDs, thus reducing the number of Firewall Workers. Example: an 8-core firewall with a 2/6 split (2 SNDs, 6 Firewall Workers) is experiencing terrible performance and users are complaining. Keep in mind that the lowest-numbered cores are typically assigned as SNDs, and the highest-numbered cores are assigned as Firewall Workers. Using the Linux tool “top”, the individual CPU loads appear as follows:
- Core 0 (SND): 99% utilized, 1% idle
- Core 1 (SND): 100% utilized, 0% idle
- Core 2 (FW Worker #5): 27% utilized, 73% idle
- Core 3 (FW Worker #4): 37% utilized, 63% idle
- Core 4 (FW Worker #3): 22% utilized, 78% idle
- Core 5 (FW Worker #2): 25% utilized, 75% idle
- Core 6 (FW Worker #1): 29% utilized, 71% idle
- Core 7 (FW Worker #0): 29% utilized, 71% idle
Clearly the 2 SND cores are totally overloaded, while the cores assigned as Firewall Workers are relatively idle. Keep in mind that SND and Firewall Worker cores have completely different functions and capabilities, and cannot “help” each other should one set of them become overloaded.
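The reasoning above can be sketched in a few lines of Python. This is only an illustration: the utilization figures are copied from the example above, while the function names and the `busy`/`idle` thresholds are our own assumptions and not values published by Check Point.

```python
from statistics import mean

# Per-core utilization (%) from the example above, in core order 0..7.
# Role labels follow the convention that the lowest-numbered cores are SNDs.
cores = [
    ("SND", 99), ("SND", 100),
    ("FW", 27), ("FW", 37), ("FW", 22), ("FW", 25), ("FW", 29), ("FW", 29),
]

def average_load(cores, role):
    """Mean utilization of all cores holding the given role."""
    return mean(util for r, util in cores if r == role)

def suggest(cores, busy=85.0, idle=50.0):
    """Suggest a split change. The thresholds are hypothetical, for illustration only."""
    snd = average_load(cores, "SND")
    fw = average_load(cores, "FW")
    if snd >= busy and fw <= idle:
        return "more SNDs"
    if fw >= busy and snd <= idle:
        return "more Firewall Workers"
    return "leave split"

print(average_load(cores, "SND"))  # 99.5
print(suggest(cores))              # more SNDs
```

On a real firewall you would want to watch these numbers over a representative busy period rather than relying on a single snapshot.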
The split is changed with the cpconfig command by setting the number of Firewall Workers; any cores not assigned as Firewall Workers become SNDs. To change from our current 2/6 split to a 4/4 split, we use cpconfig to reduce the number of Firewall Workers from 6 to 4 and then reboot the system. Once the system finishes rebooting, we once again measure CPU loads and see this:
- Core 0 (SND): 41% utilized, 59% idle
- Core 1 (SND): 35% utilized, 65% idle
- Core 2 (SND): 37% utilized, 63% idle
- Core 3 (SND): 40% utilized, 60% idle
- Core 4 (FW Worker #3): 44% utilized, 56% idle
- Core 5 (FW Worker #2): 50% utilized, 50% idle
- Core 6 (FW Worker #1): 58% utilized, 42% idle
- Core 7 (FW Worker #0): 58% utilized, 42% idle
Clearly CPU load is much more evenly distributed, and latency (and user complaints!) has decreased dramatically.
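To put a number on "more evenly distributed", here is a small Python sketch using the utilization figures from the two listings above. The helper names `split` and `spread` are hypothetical, and the max-minus-min spread is just one simple way to express imbalance; it is not a metric reported by the firewall itself.

```python
def split(total_cores, firewall_workers):
    """Return (SNDs, Firewall Workers), assuming every core holds exactly one role."""
    if not 0 < firewall_workers < total_cores:
        raise ValueError("need at least one SND and one Firewall Worker")
    return total_cores - firewall_workers, firewall_workers

def spread(loads):
    """Gap between the busiest and the least busy core, in percentage points."""
    return max(loads) - min(loads)

# Utilization (%) for cores 0..7, before and after the change.
before = [99, 100, 27, 37, 22, 25, 29, 29]
after = [41, 35, 37, 40, 44, 50, 58, 58]

print(split(8, 6), spread(before))  # (2, 6) 78
print(split(8, 4), spread(after))   # (4, 4) 23
```

The spread drops from 78 to 23 percentage points, which matches what the per-core listings show at a glance.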
CoreXL: Core Type Responsibilities
In looking at our prior example, you might be wondering exactly why the SND cores were so overloaded yet the Firewall Worker cores were relatively idle. In the prior article we covered the multiple paths (or assembly “lines”) that traffic can take through the firewall:
- Accelerated Path (sometimes called “fastpath” or SXL) – 8 laborers
- F2V (Forward to Virtual Machine) Path – 12 laborers
- PSLXL Path (also called the “Medium Path” or “Passive Streaming”) – 20 laborers
- CPASXL Path (also called “Active Streaming”) – 35 laborers
- Firewall Path (also called “slowpath”, “non-accelerated”, or F2F) – 55 laborers
Of the five paths listed, only traffic in the Accelerated Path (first in the list) is handled primarily by an SND core; the remaining four paths are processed by a Firewall Worker. In our prior example, a very high percentage of traffic was being handled in the Accelerated Path, thus overloading the SND cores and requiring the addition of more of them.
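The path-to-core relationship just described can be written down as a simple lookup. The Python sketch below is illustrative only: the mapping comes from the paragraph above, but the traffic mix is invented for the example, and a share of traffic is not the same thing as a share of CPU load.

```python
from collections import Counter

# Which core type primarily handles each path, per the description above.
PATH_HANDLER = {
    "Accelerated Path": "SND",
    "F2V Path": "Firewall Worker",
    "PSLXL Path": "Firewall Worker",
    "CPASXL Path": "Firewall Worker",
    "Firewall Path": "Firewall Worker",
}

def share_by_core_type(traffic_share):
    """Sum the percentage of traffic handled by each core type."""
    totals = Counter()
    for path, pct in traffic_share.items():
        totals[PATH_HANDLER[path]] += pct
    return dict(totals)

# Hypothetical traffic mix (percent of traffic per path), not measured data.
mix = {"Accelerated Path": 70, "PSLXL Path": 20, "Firewall Path": 10}
result = share_by_core_type(mix)
print(result)  # {'SND': 70, 'Firewall Worker': 30}
```

A mix this heavily weighted toward the Accelerated Path is the kind of situation in which the SND cores become the bottleneck.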
SND cores also have another very important responsibility: a feature called the Dynamic Dispatcher, which attempts to keep the load balanced among the Firewall Workers. The Dynamic Dispatcher is enabled by default in R80.10+ and you should never need to disable it. In addition, SND cores receive packets from the Network Interface Card (NIC) hardware. Overloaded SND cores may cause packet loss and necessitate the use of “Multi-Queue”, which is described in the next article.
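To give a feel for what "keeping the load balanced" means, here is a deliberately simplified least-loaded dispatch sketch in Python. It is NOT Check Point's actual Dynamic Dispatcher algorithm, which is not described in this article; the function name, the load figures, and the fixed per-connection cost are all assumptions made for illustration.

```python
def dispatch(worker_loads, connection_cost=5):
    """Give one new connection to the least-loaded worker and return its index.

    Simplified illustration only: the load values and the fixed cost per
    connection are hypothetical.
    """
    target = min(range(len(worker_loads)), key=worker_loads.__getitem__)
    worker_loads[target] += connection_cost
    return target

# Hypothetical load figures for four Firewall Workers.
loads = [44, 50, 58, 58]
chosen = [dispatch(loads) for _ in range(3)]

print(chosen)  # [0, 0, 1]
print(loads)   # [54, 55, 58, 58]
```

New connections flow to the quieter workers first, so the gap between workers narrows over time.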
About the author
The Performance Optimization Series is written for you by Timothy Hall.