I have now developed a tool that automatically checks most of the points in this article.
Use this tool to quickly get an overview of the status of all your gateways with a single CLI command, "eview".
It shows the most important performance-relevant information for all gateways, which is briefly summarized in this article:
Easy View Tool - View System Info for All Gateways Simultaneously
SecureXL is a software acceleration product installed on Security Gateways. Performance Pack uses SecureXL technology and other network acceleration techniques to deliver wire-speed performance for Security Gateways. The SecureXL device minimizes the number of connections that are processed by the INSPECT driver. SecureXL accelerates connections in two ways.
SecureXL is implemented either in software or in hardware:
- SAM cards on Check Point 21000 appliances
- Falcon cards (new in R80.20) on different appliances
Tuning Tip: From R80.20 on, SecureXL is always enabled and can no longer be disabled completely.
|Tip 2 - SecureXL Connection Templates
Connection Templates accelerate the speed at which a connection is established by matching a new connection to a set of attributes. When a new connection matches the Connection Template (old name "Accept Template"), subsequent connections are established without performing a rule match and are therefore accelerated. Connection Templates are generated from active connections according to policy rules.
Tuning Tip: Accept Templates are enabled by default.
|Tip 3 - SecureXL NAT Templates
Using SecureXL Templates for NAT traffic is critical to achieving a high session rate for NAT. NAT Templates are supported for Static NAT and Hide NAT and use the existing SecureXL Templates mechanism.
Tuning Tip: Enable NAT Templates depending on the situation.
|Tip 4 - SecureXL Drop Templates
Drop Templates are part of the Optimized Drops feature in R76 and above. A heavy load of traffic that should be dropped increases the Security Gateway's resource consumption. Note that in some cases SecureXL Drop Templates are not created even though this option was checked in SmartDashboard.
Tuning Tip: Enable Drop Templates depending on the situation.
|Tip 5 - SecureXL Fast Acceleration
The Fast Acceleration (picture 1 green) feature lets you define trusted connections to allow bypassing deep packet inspection on R80.20 JHF103 and above gateways. This feature significantly improves throughput for these trusted high volume connections and reduces CPU consumption.
Rules created on the gateway CLI let the matching connections bypass the SecureXL PSLXL path and be handled in the fast path instead.
Tuning Tip: Use this function to exclude IP addresses or networks from deep inspection.
|Tip 6 - SecureXL Penalty Box
The SecureXL penalty box is a mechanism that performs an early drop of packets arriving from suspected sources. This mechanism is supported starting in R75.40VS.
The purpose of this feature is to allow the Security Gateway to cope better under high load, possibly caused by a DoS/DDoS attack.
A client that sends packets that are dropped by the firewall rulebase, or that violates the IPS policy, is reported to this mechanism. If a client is reported frequently, it is put in a penalty box. Any packet arriving from this IP address is then dropped by the Performance Pack at a very early stage.
Tuning Tip: Use the SecureXL penalty box if you have DDoS attacks.
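The reporting and early-drop behavior described above can be modeled roughly as follows. This is only an illustrative Python sketch, not Check Point's implementation: the class name, the number of reports, the time window, and the penalty duration are all hypothetical values chosen for the example.

```python
from collections import defaultdict, deque


class PenaltyBoxModel:
    """Toy model of an early-drop penalty box (all thresholds are hypothetical)."""

    def __init__(self, max_reports=5, window_s=10.0, penalty_s=60.0):
        self.max_reports = max_reports
        self.window_s = window_s
        self.penalty_s = penalty_s
        self._reports = defaultdict(deque)  # source IP -> timestamps of recent reports
        self._boxed_until = {}              # source IP -> time at which the penalty ends

    def report_violation(self, src_ip, now):
        """Record that a packet from src_ip was dropped by the rulebase or IPS."""
        reports = self._reports[src_ip]
        reports.append(now)
        # Forget reports that fall outside the sliding time window.
        while reports and now - reports[0] > self.window_s:
            reports.popleft()
        # A frequently reported client is put in the penalty box.
        if len(reports) >= self.max_reports:
            self._boxed_until[src_ip] = now + self.penalty_s

    def should_drop_early(self, src_ip, now):
        """Return True while src_ip is in the penalty box."""
        until = self._boxed_until.get(src_ip)
        if until is None:
            return False
        if now >= until:
            del self._boxed_until[src_ip]
            return False
        return True
```

The point of the model is the order of the checks: `should_drop_early` is a cheap lookup that would run before any rulebase or IPS work, which is what lets a gateway cope better under a flood from a few sources.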
SIM Affinity is the association of a particular network interface with a CPU core (either 'Automatic' (default), or 'Static' / 'Manual'). Interfaces are bound to CPU cores via the SMP IRQ affinity setting. SIM Affinity in Automatic mode may make poor decisions on multi-core platforms. In addition, some multi-core hardware platforms are unable to assign IRQs in a way that uses all the CPU cores efficiently.
Tuning Tip: In special cases the SIM affinity should be set manually.
sk61962 - SMP IRQ Affinity on Check Point Security Gateway
sk33250 - Automatic SIM Affinity on Multi-Core CPU Systems
Performance Tuning R80.30 Administration Guide – Affinity Settings
CoreXL is a performance-enhancing technology for Security Gateways on multi-CPU-core processing platforms. CoreXL enhances Security Gateway performance by enabling the processing CPU cores to concurrently perform multiple tasks.
CoreXL provides almost linear scalability of performance, according to the number of processing CPU cores on a single machine. The increase in performance is achieved without requiring any changes to management or to network topology.
On a Security Gateway with CoreXL enabled, the Firewall kernel is replicated multiple times. Each replicated copy, or FW instance, runs on one processing CPU core. These FW instances handle traffic concurrently, and each FW instance is a complete and independent FW inspection kernel. When CoreXL is enabled, all the FW kernel instances in the Security Gateway process traffic through the same interfaces and apply the same security policy.
|Tip 9 - CoreXL - Dynamic split of CoreXL FW and CoreXL SND
Dynamic split of CoreXL automatically changes the assignment of CoreXL SNDs and CoreXL firewall workers without a reboot in R80.40 and above. Assume the CoreXL SNDs are overloaded: a mathematical formula is used to decide that a further CoreXL SND should be added. In this case, one CoreXL firewall worker no longer receives new connections, and new connections are distributed to the other CoreXL firewall workers. Once no more connections are running through this CoreXL firewall worker, its core is used for a new CoreXL SND instance. It also works the other way round.
- Adding and removing a CoreXL firewall worker
- Adding and removing a CoreXL SND
- Balance between CoreXL SND and CoreXL firewall worker
- GAIA 3.10 kernel
- only Check Point appliances with 8 cores or more
Tuning Tip: Use this function from R80.40 on appliances with 8 cores or more.
|Tip 10 - MultiCore IPsec VPN
R80.10 introduced MultiCore support for IPsec VPN. Starting with R80.10 Security Gateways, the IPsec VPN MultiCore feature allows CoreXL to inspect VPN traffic on all CoreXL FW instances. This feature is enabled by default, and disabling it is not supported.
Tuning Tip: MultiCore IPsec VPN is enabled by default on R80.x gateways.
sk104760 - VPN Core
sk105119 - Best Practices - VPN Performance
sk118097 - MultiCore Support for IPsec VPN in R80.10 and above
|Tip 11 - MultiCore Support for SSL
Introduced in R77.20, the SSL MultiCore feature improves the SSL performance of the Security Gateway. It is based on Check Point CoreXL technology, which enhances Security Gateway / VSX Gateway performance by enabling the CPU processing cores to concurrently perform multiple tasks.
Tuning Tip: MultiCore SSL is enabled by default on R80.x gateways.
sk101223 - MultiCore Support for SSL in R77.20 and above
Intel's AES New Instructions (AES-NI) is an encryption instruction set that accelerates the Advanced Encryption Standard (AES) algorithm, and with it the encryption of data, on many processor families. Better throughput can be achieved by selecting a faster encryption algorithm; for a comparison of encryption algorithm speeds, see "Relative speeds of algorithms for IPsec and SSL". AES-NI significantly improves the speed of encrypt/decrypt operations and allows one to increase AES throughput for Site-to-Site VPN, Remote Access VPN, Mobile Access, and HTTPS Interception.
The overall speed of the system depends on additional parameters. Check Point supports AES-NI on many appliances, but only when running Gaia OS with a 64-bit kernel. On these appliances AES-NI is enabled by default. AES-NI is also supported on open servers. Comprising seven new instructions, AES-NI gives your environment faster, more affordable data protection and greater security.
Tuning Tip: Enable AES-NI in the BIOS.
|Tip 13 - Firewall Priority Queues
Packets can be dropped by the Firewall when the CPU cores on which the Firewall runs are fully utilized. Such packet loss might occur regardless of the connection's type (for example, local SSH or a connection to the Security Management Server). Firewall Priority Queues are disabled by default. The Priority Queues (PrioQ) mechanism is intended to prioritize part of the traffic when packets have to be dropped because the Security Gateway is stressed (CPU is fully utilized).
Tuning Tip: Use it depending on the situation.
sk105762 - Firewall Priority Queues in R77.30 / R80.10 and above
By default, each network interface has one traffic queue handled by one CPU. You cannot use more CPU cores for acceleration than the number of interfaces handling traffic. Multi-Queue lets you configure more than one traffic queue for each network interface, so that more than one CPU core can be used for acceleration per interface. Multi-Queue is relevant only if SecureXL is enabled. In R80.40 and R81, Multi-Queue is enabled by default on all supported interfaces.
Tuning Tip: Enable multi-queueing on 10/40/100 Gbit/s interfaces.
|Tip 15 - Dynamic Dispatcher
CoreXL is a performance-enhancing technology for Security Gateways on platforms with multiple CPU cores. CoreXL enhances Security Gateway performance by enabling the processing CPU cores to concurrently perform multiple tasks.
On a Security Gateway with CoreXL enabled, the Firewall kernel is replicated multiple times. Each replicated copy, or Firewall instance, runs on one processing CPU core. These Firewall instances handle traffic concurrently, and each Firewall instance is a complete and independent Firewall inspection kernel. When CoreXL is enabled, all the Firewall kernel instances in the Security Gateway process traffic through the same interfaces and apply the same security policy.
The CoreXL software architecture includes the Secure Network Distributor (SND). The SND is responsible for:
- Processing incoming traffic from the network interfaces
- Securely accelerating authorized packets (if SecureXL is running)
- Distributing non-accelerated packets or Medium Path packets among CoreXL FW kernel instances - this functionality is also referred to as dispatcher
Traffic received on network interface cards (NICs) is directed to a processing core running the SND.
The dispatcher is executed when a packet should be forwarded to a CoreXL FW instance (in Slow path and Medium path - see sk98737 for details) and is in charge of selecting the CoreXL FW instance that will inspect the packet.
In R77.20 and lower versions, traffic distribution between CoreXL FW instances is statically based on Source IP addresses, Destination IP addresses, and the IP 'Protocol' type. Therefore, there are possible scenarios where one or more CoreXL FW instances would handle more connections, or perform more processing on the packets forwarded to them, than the other CoreXL FW instances.
This may lead to a situation, where the load is not balanced across the CPU cores, on which the CoreXL FW instances are running.
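The imbalance described above can be illustrated with a small Python sketch. It assumes a simple CRC32-based hash over source IP, destination IP and protocol; the real distribution function is not described in this article, so the hash, the function names and the sample addresses are purely hypothetical.

```python
import zlib
from collections import Counter


def static_instance(src_ip, dst_ip, proto, num_instances):
    """Pick a CoreXL FW instance from source IP, destination IP and protocol only.

    The CRC32-based hash is an assumption made for illustration.
    """
    key = f"{src_ip}|{dst_ip}|{proto}".encode()
    return zlib.crc32(key) % num_instances


def distribute(connections, num_instances):
    """Count how many connections each instance receives."""
    load = Counter({i: 0 for i in range(num_instances)})
    for src_ip, dst_ip, proto, _src_port, _dst_port in connections:
        load[static_instance(src_ip, dst_ip, proto, num_instances)] += 1
    return load


# 1000 TCP connections between the same two hosts, differing only in source port.
backup_traffic = [("10.1.1.10", "10.2.2.20", 6, 40000 + n, 443) for n in range(1000)]
load = distribute(backup_traffic, num_instances=4)
print(load)  # one instance carries all 1000 connections
```

Because the ports are not part of the key, every connection between the same pair of hosts lands on the same instance, no matter how many instances are available.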
Tuning Tip: Use Dynamic Dispatcher depending on the situation.
sk105261 - CoreXL Dynamic Dispatcher in R77.30 / R80.10 and above
|Tip 16 - SMT (Hyper Threading)
Hyper Threading Technology is a form of Simultaneous Multithreading Technology (SMT) introduced by Intel. Architecturally, a processor with Hyper-Threading technology consists of two logical processors per core, each of which has its own processor architectural state. Each logical processor can be individually halted, interrupted or directed to execute a specified thread, independently from the other logical processor sharing the same physical core.
SMT (also called HyperThreading or HT) is a feature that is supported on Check Point appliances running Gaia OS. When enabled, SMT doubles the number of logical CPUs on the Security Gateway, which enhances physical processor utilization. When SMT is disabled, the number of logical CPUs equals the number of physical cores.
SMT improves performance up to 30% on NGFW software blades such as IPS, Application & URL Filtering and Threat Prevention by increasing the number of CoreXL FW instances based on the number of logical CPUs.
Tuning Tip: Enable SMT on appliances and disable SMT on open servers.
|Tip 17 - HTTPS Interception vs. SNI
With enabled HTTPS interception:
If HTTPS interception is enabled, the Host parameter from the HTTP header can be used for the URL, because the traffic is analyzed by active streaming. Check Point Active Streaming (CPAS) allows the data to be changed; the gateway plays the role of a “man in the middle”. CPAS breaks the HTTPS connection (and others) into two parts using its own stack, which means the gateway is responsible for all the stack work (dealing with options, retransmissions, timers, etc.). An application registers with CPAS when a connection starts and supplies callbacks for the event handler and the read handler.
Without enabled HTTPS interception (SNI is used):
If HTTPS interception is disabled, SNI is used to recognize the virtual URL for Application Control and URL Filtering. This is less resource intensive than HTTPS interception.
Tuning Tip: Prefer SNI to HTTPS interception, if you only use application control and url filtering.
|Tip 18 - Network Interfaces and Server Hardware
Use only certified hardware for open servers and network cards. Prevent network and packet errors on the network cards.
Tuning Tip: Use supported hardware only and avoid network card issues.
|Tip 19 - Interface Errors
RX-ERR: Should be zero. Caused by cabling problem, electrical interference, or a bad port. Examples: framing errors, short frames/runts, late collisions caused by duplex mismatch.
Tuning Tip: As a first and easy step, check for a duplex mismatch.
RX-OVR: Should be zero. Overrun in NIC hardware buffering. Solved by using a higher-speed NIC, bonding multiple interfaces, or enabling Ethernet Flow Control (controversial).
Tuning Tip: Use higher-speed NICs or bond interfaces.
RX-DRP: Should be less than 0.1% of RX-OK. Caused by a network ring buffer overflow in the Gaia kernel due to the inability of SoftIRQ to empty the ring buffer fast enough. Solved by allocating more SND/IRQ cores in CoreXL (always the first step), enabling Multi-Queue, or as a last resort increasing the ring buffer size.
Tuning Tip: Use more SND/IRQ cores in CoreXL.
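The three rules of thumb above can be turned into a small check. The following Python sketch assumes you have already read the RX-OK, RX-ERR, RX-DRP and RX-OVR counters for an interface (for example from `netstat -ni`); the function names and the wording of the findings are hypothetical.

```python
def rx_drop_ratio(rx_ok, rx_drp):
    """Return RX-DRP as a fraction of RX-OK (0.0 if nothing was received)."""
    if rx_ok <= 0:
        return 0.0
    return rx_drp / rx_ok


def check_interface(rx_ok, rx_err, rx_drp, rx_ovr, max_drop_ratio=0.001):
    """Return a list of findings based on the rules of thumb above."""
    findings = []
    if rx_err > 0:
        findings.append("RX-ERR is not zero: check cabling, port and duplex settings")
    if rx_ovr > 0:
        findings.append("RX-OVR is not zero: consider a faster NIC or bonding")
    # 0.001 corresponds to the 0.1% limit for RX-DRP relative to RX-OK.
    if rx_drop_ratio(rx_ok, rx_drp) >= max_drop_ratio:
        findings.append("RX-DRP is 0.1% of RX-OK or more: consider more SND/IRQ cores")
    return findings
```

For example, 500 drops on 1,000,000 received packets is a ratio of 0.05% and produces no finding, while 2,000 drops (0.2%) does.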
|Tip 20 - Elephant Flows (Heavy Connections)
In computer networking, an elephant flow (heavy connection) is a continuous flow, set up by TCP or another protocol, that is extremely large in total bytes when measured over a network link. Elephant flows, though not numerous, can occupy a disproportionate share of the total bandwidth over a period of time. The term goes back to the observation that a small number of flows carry the majority of Internet traffic, while the remainder consists of a large number of flows that carry very little Internet traffic (mice flows).
All packets associated with that elephant flow must be handled by the same firewall worker core (CoreXL instance). Packets could be dropped by Firewall when CPU cores, on which Firewall runs, are fully utilized. Such packet loss might occur regardless of the connection's type.
What typically produces heavy connections:
- System backups
- Database backups
- VMWare sync.
Evaluation of heavy connections (elephant flows)
A first indication is a high CPU load on one core while all other cores have a normal CPU load. This can be displayed very nicely with "top". Now one core has 100% CPU usage. What can we do? sk105762 describes how to activate "Firewall Priority Queues". This feature allows the administrator to monitor the heavy connections that consume the most CPU resources without interrupting the normal operation of the Firewall. After enabling this feature, the relevant information is available in the CPView Utility. The system saves heavy connection data for the last 24 hours, and CPDiag has a matching collector that uploads this data for diagnostic purposes.
Heavy connection flow system definition on Check Point gateways:
- Specific instance CPU is over 60%
- Suspected connection lasts more than 10s
- Suspected connection utilizes more than 50% of the total work the instance does. In other words, connection CPU utilization must be > 30%
Tuning Tip: Check for heavy connections depending on the situation.
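The heavy connection definition above can be written as a simple predicate, which also shows where the 30% figure comes from (50% of an instance running at 60% CPU). This is a minimal Python sketch; the function and parameter names are hypothetical, only the thresholds are taken from the list above.

```python
def is_heavy_connection(instance_cpu_pct, connection_share_pct, duration_s):
    """Apply the three criteria listed above to one suspected connection.

    instance_cpu_pct:     CPU utilization of the CoreXL FW instance (0-100)
    connection_share_pct: share of the instance's work caused by the connection (0-100)
    duration_s:           how long the connection has been running, in seconds
    """
    return (
        instance_cpu_pct > 60
        and duration_s > 10
        and connection_share_pct > 50
    )


def connection_cpu_pct(instance_cpu_pct, connection_share_pct):
    """CPU utilization attributable to the connection itself."""
    return instance_cpu_pct * connection_share_pct / 100


print(connection_cpu_pct(60, 50))  # 30.0, the lower bound mentioned in the definition
```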
In “Kernel Mode Firewall” (KMFW), the maximum number of running cores is limited to 40 because of the Linux/Intel limitation of 2GB kernel memory, and because the CoreXL architecture needs to load a large driver (~42MB) dozens of times (once per CPU core, up to 40 times). Newer platforms with more than 40 cores, e.g. the 23900 appliance or open servers, are therefore not fully utilized.
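The arithmetic behind this limit can be checked quickly. The Python sketch below uses only the approximate figures quoted above (a driver of about 42MB, a 2GB kernel memory limit, at most 40 instances); it illustrates the order of magnitude and does not account for whatever else occupies kernel memory.

```python
DRIVER_SIZE_MB = 42          # approximate size of one driver copy, from the text
KERNEL_MEMORY_MB = 2 * 1024  # 2GB kernel memory limit, from the text
MAX_INSTANCES = 40           # KMFW limit, from the text


def driver_memory_mb(instances, driver_size_mb=DRIVER_SIZE_MB):
    """Kernel memory consumed by loading the driver once per instance."""
    return instances * driver_size_mb


used = driver_memory_mb(MAX_INSTANCES)
print(used, KERNEL_MEMORY_MB - used)  # 1680 368
```

With 40 copies, the driver alone takes roughly 1.6GB of the 2GB, so there is little room left for further instances.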
The solution to this problem is to run the firewall in the user mode of the Linux operating system.
USFW stands for “User Space Firewall” and UMFW for “User Mode Firewall”; both names refer to the same mode, which is based on proven VSX code. This mode was introduced in R80.10.
From a performance point of view, I could not see any differences between UMFW and KMFW. I did notice that the process fwk0_dev_0 generates a very high CPU load in UMFW. My guess is that fwk0_dev_0 acts as the liaison between the multiple fwk firewall worker processes (the FW instance threads that take care of the packet processing) and the single fwmod kernel driver instance, and that it is also the process for the high-priority cluster thread.
If you want to change the mode from UMFW to KMFW, this can be done by changing the registry parameter FwIsUsermode with the cpprod_util command. In UMFW the FW instances are threads of fwk0_dev_0, so by default top shows the CPU utilization of all threads under the main thread. top also has an option to show the utilization per thread.
Tuning Tip: R80.10 to R80.30: With fewer than 35 cores use KMFW, and with more than 35 cores use UMFW.
An interesting point in performance tuning is the BIOS settings. Here we have to distinguish between open servers and appliances.
With Check Point appliances, the BIOS settings are set correctly and we don't have to do anything. sk120915 provides the list of Check Point appliances and the available BIOS versions. If there are problems, the TAC can change settings on the appliance.
The situation is different with open servers. Here the BIOS settings are described in the HCL if necessary.
In principle, various BIOS settings can be performed on Open Server for the following points. The names of the settings may be different depending on the hardware and processor generation.
Here is an overview of the most important BIOS points:
- Intel Turbo Boost Technology (old name Turbo Mode)
- Intel SpeedStep settings
- Energy/Performance Bias
- Energy saving settings
- Minimum Processor Idle Power C-States
- Minimum Processor Idle Power Package C-States
- Hyperthreading (SMT) settings (It is only supported from R80.40 on open servers)
- X2APIC Support
- AES-NI Support
Tuning Tip: Enable the correct BIOS settings.
|Tip 23 - Management Data Plane Separation
Management Data Plane Separation allows a security gateway to have isolated management and data networks. The network system of each domain (plane) is independent and includes interfaces, routes, sockets, and processes. This has the performance advantage that some processes run separately from the firewall core daemon. Thus it reduces the load on the firewall processes, e.g. during the policy installation.
The management plane is a domain whose purpose is to access, provision, and monitor the gateway. This includes:
- Routing separation
- Resource Separation
- Access: SSH, FTP, and more
- Provisioning: Policy installation, GAIA Portal, RestAPI's, and more
- Monitoring: Logs, SNMP, and more
When resource separation is enabled, the security gateway separates the management instance. Here is an example:
[Table columns: CPU core 0 through CPU core 7]
Tuning Tip: Enable MDPS if possible.
|Tip 24 - CPU Spike Detective
The CPU Spike Detective is a tool, running only on Gaia with the 3.10 kernel, that monitors the system CPU usage and checks for CPU utilization spikes. It was introduced in R80.40 JHF 69.
How does the spike detective work:
A CPU core is considered "spiked" when these conditions are met:
- CPU utilization is over 80% (this threshold is configurable)
- CPU utilization of the specific CPU core is at least 1.5 times higher than the entire system average usage (this threshold is configurable).
This ensures that a highly utilized system (for example, during performance testing) will not report all CPU cores as "spiked".
A thread/process is considered as "spiked" if it meets the below conditions:
- Running on a "spiked" CPU core
- Utilization is over 70% (this threshold is configurable)
- Utilization is at least 1.5 times higher than the system average (this threshold is configurable)
Tuning Tip: Use CPU Spike Detective to monitor the system CPU usage.
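The spike conditions described above can be sketched in a few lines of Python. The default thresholds (80%, 70%, factor 1.5) are the ones quoted in this tip; the function names and the way the usage values are passed in are hypothetical and do not reflect the tool's actual interface.

```python
def is_spiked_core(core_pct, system_avg_pct, min_pct=80.0, ratio=1.5):
    """A core is 'spiked' if it is over min_pct and at least ratio times the system average."""
    return core_pct > min_pct and core_pct >= ratio * system_avg_pct


def is_spiked_thread(thread_pct, on_spiked_core, system_avg_pct, min_pct=70.0, ratio=1.5):
    """A thread/process is 'spiked' only if it also runs on a spiked core."""
    return (
        on_spiked_core
        and thread_pct > min_pct
        and thread_pct >= ratio * system_avg_pct
    )


def find_spiked_cores(core_usage):
    """Return the indexes of spiked cores, given per-core utilization in percent."""
    if not core_usage:
        return []
    avg = sum(core_usage) / len(core_usage)
    return [i for i, pct in enumerate(core_usage) if is_spiked_core(pct, avg)]


print(find_spiked_cores([95, 20, 25, 30]))  # [0]
print(find_spiked_cores([95, 96, 97, 98]))  # []
```

The second call shows the effect of the 1.5x rule: when every core is busy, the average is high as well, so no single core is flagged.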
➜ CCSM Elite, CCME, CCTE