HeikoAnkenbrand
Champion

R81 - Top 25 Gateway Tuning Tips

 

Tip - Use this new tool

I have developed a tool that automatically checks most of the points in this article.
With a single CLI command, "eview", it gives a quick overview of the status of all your gateways.
It shows the most important performance-relevant information for all gateways, which is briefly summarized in this article:

Easy View Tool - View System Info for All Gateways Simultaneously

Tip 1 - SecureXL

SecureXL is a software acceleration product installed on Security Gateways. Performance Pack uses SecureXL technology and other network acceleration techniques to deliver wire-speed performance for Security Gateways. The SecureXL device minimizes the connections that are processed by the INSPECT driver. SecureXL accelerates connections in two ways.

SecureXL is implemented either in software or in hardware. Hardware implementations include:

  •       SAM cards on Check Point 21000 appliances
  •       Falcon cards (new in R80.20) on different appliances

Tuning Tip: From R80.20 onward, SecureXL is always enabled and can no longer be disabled completely.

sk98722 - SecureXL for R80.10 and below 
sk98348 - Best Practices - Security Gateway Performance
sk32578 - SecureXL Mechanism 
sk153832 - SecureXL for R80.20 and above 

R80.x - Security Gateway Architecture (Logical Packet Flow)
R80.x - Security Gateway Architecture (Acceleration Card Offloading)


Tip 2 - SecureXL Connection Templates

Connection Templates accelerate connection establishment by matching a new connection against a set of attributes. When a connection matches a Connection Template (old name "Accept Template"), subsequent connections with the same attributes are established without performing a rule match and are therefore accelerated. Connection Templates are generated from active connections according to policy rules.
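The idea can be illustrated with a small Python sketch. This is only a model, not the SecureXL implementation: the class names are made up, and the attribute set used as the template key (source, destination, destination port, protocol, without the source port) is an assumption for illustration.

```python
from typing import NamedTuple


class Connection(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int
    proto: str


def template_key(conn: Connection) -> tuple:
    # Assumption: the template ignores the client source port, so
    # repeated connections to the same service share one template.
    return (conn.src, conn.dst, conn.dport, conn.proto)


class TemplateTable:
    def __init__(self) -> None:
        self.templates = {}     # template key -> cached rule decision
        self.rule_matches = 0   # how often the full rule match had to run

    def _rule_match(self, conn: Connection) -> str:
        # Stand-in for the full rulebase lookup (hypothetical: always accepts).
        self.rule_matches += 1
        return "accept"

    def establish(self, conn: Connection) -> str:
        key = template_key(conn)
        if key in self.templates:
            # Template hit: no rule match is needed.
            return self.templates[key]
        action = self._rule_match(conn)
        if action == "accept":
            self.templates[key] = action
        return action


table = TemplateTable()
table.establish(Connection("10.0.0.5", 40001, "192.0.2.10", 443, "tcp"))
table.establish(Connection("10.0.0.5", 40002, "192.0.2.10", 443, "tcp"))
print(table.rule_matches)  # prints 1: the second connection used the template
```

Only the first connection pays for the rule match; the second one differs only in its source port and is served from the template.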

Tuning Tip: Accept Templates are enabled by default.

sk32578 - SecureXL Mechanism
Performance Tuning R80.30 Administration Guide - Connection Templates
R80.x - Security Gateway Architecture (Logical Packet Flow)

 

Tip 3 - SecureXL NAT Templates

Using SecureXL Templates for NAT traffic is critical to achieve high session rate for NAT. SecureXL Templates are supported for Static NAT and Hide NAT using the existing SecureXL Templates mechanism.

Tuning Tip: Enable NAT Templates depending on the situation.

sk71200 - SecureXL NAT Templates
R80.x - Security Gateway Architecture (Logical Packet Flow)

 

Tip 4 - SecureXL Drop Templates

The Optimized Drops feature is available in R76 and above. A heavy load of traffic that should be dropped increases the Security Gateway's resource consumption. Be aware that SecureXL Drop Templates may not be created even though this option is checked in SmartDashboard.

Tuning Tip: Enable Drop Templates depending on the situation.

sk90861 - Optimized Drops feature in R76 and above 
sk66402 - SecureXL Drop Templates 

R80.x - Security Gateway Architecture (Logical Packet Flow)

 

Tip 5 - SecureXL Fast Acceleration

The Fast Acceleration (picture 1 green) feature lets you define trusted connections to allow bypassing deep packet inspection on R80.20 JHF103 and above gateways. This feature significantly improves throughput for these trusted high volume connections and reduces CPU consumption.

The CLI of the gateway can be used to create rules that allow you to bypass the SecureXL PSLXL path to route all connections through the fast path.
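To make the idea of a "trusted connection" rule concrete, here is a minimal Python sketch of how such a rule could be matched. It is only a model: the rule fields (source network, destination network, destination port, protocol number) and the function name are assumptions for illustration and are not taken from sk156672.

```python
import ipaddress
from typing import NamedTuple


class FastAccelRule(NamedTuple):
    src: str    # source network in CIDR notation
    dst: str    # destination network in CIDR notation
    dport: int  # destination port
    proto: int  # IP protocol number (6 = TCP, 17 = UDP)


def is_fast_accelerated(rules, src: str, dst: str, dport: int, proto: int) -> bool:
    """Return True if the connection matches any trusted-connection rule."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for rule in rules:
        if (src_ip in ipaddress.ip_network(rule.src)
                and dst_ip in ipaddress.ip_network(rule.dst)
                and dport == rule.dport
                and proto == rule.proto):
            return True
    return False


# Hypothetical example: backup traffic from one subnet to a backup server.
rules = [FastAccelRule("10.10.0.0/24", "10.20.0.5/32", 445, 6)]
print(is_fast_accelerated(rules, "10.10.0.17", "10.20.0.5", 445, 6))
print(is_fast_accelerated(rules, "10.99.0.17", "10.20.0.5", 445, 6))
```

Connections that match would skip deep inspection, so such rules should stay as narrow as possible.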

Tuning Tip: Use this function to exclude IP addresses or networks from deep inspection.

sk156672 - SecureXL Fast Accelerator (fw fast_accel) for R80.20 and above

R80.x - Performance Tuning Tip - SecureXL Fast Accelerator (fw ctl fast_accel)

Tip 6 - SecureXL Penalty Box

The SecureXL penalty box is a mechanism that performs an early drop of packets arriving from suspected sources. This mechanism is supported starting in R75.40VS.

The purpose of this feature is to allow the Security Gateway to cope better under high load, possibly caused by a DoS/DDoS attack.

A client that sends packets that are dropped by the firewall rulebase, or that violates the IPS policy, is reported to this mechanism. If a client is reported frequently, it is put in the penalty box. Any packet arriving from this IP address is then dropped by the performance pack at a very early stage.
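The mechanism can be modelled roughly as follows. This Python sketch is an illustration only: the class name and the three thresholds (number of reports, observation window, penalty time) are hypothetical values, not the defaults of the real penalty box.

```python
from collections import defaultdict


class PenaltyBox:
    def __init__(self, max_reports: int = 3, window_s: float = 10.0,
                 penalty_s: float = 60.0) -> None:
        # All three values are hypothetical, chosen only for this example.
        self.max_reports = max_reports
        self.window_s = window_s
        self.penalty_s = penalty_s
        self.reports = defaultdict(list)  # source IP -> report timestamps
        self.boxed_until = {}             # source IP -> end of penalty time

    def report(self, src: str, now: float) -> None:
        """Record a rulebase drop or IPS violation caused by this source."""
        recent = [t for t in self.reports[src] if now - t < self.window_s]
        recent.append(now)
        self.reports[src] = recent
        if len(recent) >= self.max_reports:
            self.boxed_until[src] = now + self.penalty_s

    def should_drop(self, src: str, now: float) -> bool:
        """Early drop decision, taken before any further inspection."""
        return now < self.boxed_until.get(src, 0.0)


box = PenaltyBox()
for t in (0.0, 1.0, 2.0):
    box.report("203.0.113.7", now=t)
print(box.should_drop("203.0.113.7", now=3.0))    # inside the penalty time
print(box.should_drop("203.0.113.7", now=100.0))  # penalty time has expired
```

The point of the design is that the cheap lookup in `should_drop` replaces the expensive rulebase and IPS processing for sources that have already misbehaved.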

Tuning Tip: Use the SecureXL penalty box if you have DDoS attacks.

sk74520 - What is the SecureXL penalty box mechanism for offending IP addresses?
R80.x - Performance Tuning Tip - DDoS „fw sam“ vs. „fwaccel dos“

 

Tip 7 - SIM Affinity

SIM Affinity is the association of a particular network interface with a CPU core (either 'Automatic' (default), or 'Static' / 'Manual'). Interfaces are bound to CPU cores via the SMP IRQ affinity setting. SIM Affinity in Automatic mode may make poor decisions on multi-core platforms. In addition, some multi-core hardware platforms are unable to assign IRQs in a way that uses all CPU cores efficiently.

Tuning Tip: In special cases the SIM affinity should be set manually.

sk61962 - SMP IRQ Affinity on Check Point Security Gateway 
sk33250 - Automatic SIM Affinity on Multi-Core CPU Systems 
Performance Tuning R80.30 Administration Guide – Affinity Settings 

Tip 8 - CoreXL

CoreXL is a performance-enhancing technology for Security Gateways on multi-CPU-core processing platforms. CoreXL enhances Security Gateway performance by enabling the processing CPU cores to concurrently perform multiple tasks.

CoreXL provides almost linear scalability of performance, according to the number of processing CPU cores on a single machine. The increase in performance is achieved without requiring any changes to management or to network topology.

On a Security Gateway with CoreXL enabled, the Firewall kernel is replicated multiple times. Each replicated copy, or FW instance, runs on one processing CPU core. These FW instances handle traffic concurrently, and each FW instance is a complete and independent FW inspection kernel. When CoreXL is enabled, all the FW kernel instances in the Security Gateway process traffic through the same interfaces and apply the same security policy.

sk98737 – CoreXL 
sk98348 - Best Practices - Security Gateway Performance
R80.x - Security Gateway Architecture (Logical Packet Flow)
R80.x - Security Gateway Architecture (Content Inspection)

 

Tip 9 - CoreXL - Dynamic split of CoreXL FW and CoreXL SND

In R80.40 and above, Dynamic Split changes the assignment of CoreXL SNDs and CoreXL firewall workers automatically and without a reboot. Assume the CoreXL SNDs are overloaded: a mathematical formula determines that a further CoreXL SND should be added. A CoreXL firewall worker then stops receiving new connections, which are distributed to the other CoreXL firewall workers. Once no connections are running through this worker any more, its core is used for a new CoreXL SND instance. The same works the other way round.

  • Adding and removing a CoreXL firewall worker
  • Adding and removing a CoreXL SND
  • Balance between CoreXL SND and CoreXL firewall worker
  • GAIA 3.10 kernel
  • only Check Point appliances with 8 cores or more
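The drain-and-reassign step described above can be sketched in Python. This is only a model under stated assumptions: the formula that decides when to rebalance is not public and is not modelled here, all names are hypothetical, and choosing the least loaded worker for new connections is my assumption.

```python
class Core:
    """One CPU core that acts either as an SND or as a firewall worker."""

    def __init__(self, core_id: int, role: str) -> None:
        self.core_id = core_id
        self.role = role          # "snd" or "worker"
        self.draining = False     # True while the worker is being emptied
        self.connections = 0      # connections currently handled


def start_drain(core: Core) -> None:
    """Stop sending new connections to a worker that should become an SND."""
    if core.role != "worker":
        raise ValueError("only a firewall worker can be drained")
    core.draining = True


def pick_worker(cores: list) -> Core:
    """Pick a worker for a new connection, skipping draining workers.

    Assumption: the least loaded remaining worker is chosen.
    """
    candidates = [c for c in cores if c.role == "worker" and not c.draining]
    return min(candidates, key=lambda c: c.connections)


def finish_drain(core: Core) -> bool:
    """Turn a drained worker into an SND once its last connection is gone."""
    if core.draining and core.connections == 0:
        core.role = "snd"
        core.draining = False
        return True
    return False
```

A typical sequence would be `start_drain(worker)`, then new connections go through `pick_worker(...)` until the old ones have ended, and finally `finish_drain(worker)` hands the core over to the SND role.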

Tuning Tip: Use this function from R80.40 on appliances with 8 cores or more.

No SK is available yet.
R80.40 - Dynamic split of CoreXL

 

Tip 10 - MultiCore IPsec VPN

R80.10 introduced MultiCore support for IPsec VPN. Starting with R80.10 Security Gateways, the IPsec VPN MultiCore feature allows CoreXL to inspect VPN traffic on all CoreXL FW instances. This feature is enabled by default, and disabling it is not supported.

Tuning Tip: MultiCore IPsec VPN is enabled by default on R80.x gateways.

sk104760 - VPN Core 
sk105119 - Best Practices - VPN Performance 
sk118097 - MultiCore Support for IPsec VPN in R80.10 and above 

Tip 11 - MultiCore Support for SSL

Introduced in R77.20, the SSL MultiCore feature improves the SSL performance of the Security Gateway. It is based on Check Point CoreXL technology, which enhances Security Gateway / VSX Gateway performance by enabling the CPU processing cores to concurrently perform multiple tasks.

Tuning Tip: MultiCore SSL is enabled by default on R80.x gateways.

sk101223 - MultiCore Support for SSL in R77.20 and above 

Tip 12 - AES-NI

Intel's AES New Instructions (AES-NI) is an encryption instruction set that speeds up the Advanced Encryption Standard (AES) algorithm and accelerates data encryption in many processor families. Better throughput can be achieved by selecting a faster encryption algorithm; for a comparison of encryption algorithm speeds, see sk73980 (Relative speeds of algorithms for IPsec and SSL). AES-NI significantly improves the speed of encrypt and decrypt operations and increases AES throughput for Site-to-Site VPN, Remote Access VPN, Mobile Access, and HTTPS Interception.

The overall speed of the system depends on additional parameters. Check Point supports AES-NI on many appliances, but only when running Gaia OS with a 64-bit kernel. On these appliances AES-NI is enabled by default. AES-NI is also supported on open servers. Comprising seven new instructions, AES-NI provides faster data protection.

Tuning Tip: Enable AES-NI in the BIOS.

sk73980 - Relative speeds of algorithms for IPsec and SSL
R80.x - Performance Tuning Tip - AES-NI

 

Tip 13 - Firewall Priority Queues

The Firewall can drop packets when the CPU cores on which it runs are fully utilized. Such packet loss can occur regardless of the connection type (for example, a local SSH session or a connection to the Security Management Server). The Priority Queues (PrioQ) mechanism is intended to prioritize part of the traffic when packets must be dropped because the Security Gateway is stressed (CPU fully utilized). Firewall Priority Queues are disabled by default.

Tuning Tip: Use it depending on the situation.

sk105762 - Firewall Priority Queues in R77.30 / R80.10 and above 

Tip 14 - Multi-Queue

By default, each network interface has one traffic queue handled by one CPU core, so you cannot use more CPU cores for acceleration than the number of interfaces handling traffic. Multi-Queue lets you configure more than one traffic queue for each network interface, so that more than one CPU core per interface is used for acceleration. Multi-Queue is relevant only if SecureXL is enabled. In R80.40 and R81, Multi-Queue is enabled by default on all supported interfaces.

Tuning Tip: Enable multi-queueing on 10/40/100 Gbit/s interfaces.

Performance Tuning R80.30 Administration Guide – Multi-Queue
R80.x - Performance Tuning Tip - Multi Queue
R81.x  - Multi Queue (what is new) 

 

Tip 15 - Dynamic Dispatcher

CoreXL is a performance-enhancing technology for Security Gateways on platforms with multiple CPU cores. CoreXL enhances Security Gateway performance by enabling the processing CPU cores to concurrently perform multiple tasks.

On a Security Gateway with CoreXL enabled, the Firewall kernel is replicated multiple times. Each replicated copy, or Firewall instance, runs on one processing CPU core. These Firewall instances handle traffic concurrently, and each Firewall instance is a complete and independent Firewall inspection kernel. When CoreXL is enabled, all the Firewall kernel instances in the Security Gateway process traffic through the same interfaces and apply the same security policy.

The CoreXL software architecture includes the Secure Network Distributor (SND). The SND is responsible for:

  • Processing incoming traffic from the network interfaces
  • Securely accelerating authorized packets (if SecureXL is running)
  • Distributing non-accelerated packets or Medium Path packets among CoreXL FW kernel instances - this functionality is also referred to as dispatcher

Traffic received on network interface cards (NICs) is directed to a processing core running the SND.

The dispatcher is executed when a packet should be forwarded to a CoreXL FW instance (in Slow path and Medium path - see sk98737 for details) and is in charge of selecting the CoreXL FW instance that will inspect the packet.

In R77.20 and lower versions, traffic distribution between CoreXL FW instances is statically based on Source IP addresses, Destination IP addresses, and the IP 'Protocol' type. Therefore, there are possible scenarios where one or more CoreXL FW instances would handle more connections, or perform more processing on the packets forwarded to them, than the other CoreXL FW instances.

This may lead to a situation, where the load is not balanced across the CPU cores, on which the CoreXL FW instances are running.
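Why static distribution can end up unbalanced is easy to show with a small Python sketch. The hash used here (CRC32 over source IP, destination IP and protocol) is purely an assumption for illustration; the real selection function is not documented in this article.

```python
import zlib
from collections import Counter


def static_instance(src: str, dst: str, proto: int, instances: int) -> int:
    """Pick a CoreXL FW instance from source IP, destination IP and protocol.

    Hypothetical hash; only the choice of input fields follows the text.
    """
    key = f"{src}|{dst}|{proto}".encode()
    return zlib.crc32(key) % instances


# 1000 TCP connections between the same two hosts. The source ports differ,
# but ports are not part of the key, so every connection lands on the
# same instance while the other three stay idle.
load = Counter(
    static_instance("10.1.1.10", "10.2.2.20", 6, instances=4)
    for _sport in range(1000)
)
print(load)
```

A busy backup server or proxy behind a single IP address therefore loads one core only, which is the situation the Dynamic Dispatcher is meant to avoid.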

Tuning Tip: Use Dynamic Dispatcher depending on the situation.

sk105261 - CoreXL Dynamic Dispatcher in R77.30 / R80.10 and above 

Tip 16 - SMT (Hyper Threading)

Hyper Threading Technology is a form of Simultaneous Multithreading Technology (SMT) introduced by Intel. Architecturally, a processor with Hyper-Threading technology consists of two logical processors per core, each of which has its own processor architectural state. Each logical processor can be individually halted, interrupted or directed to execute a specified thread, independently from the other logical processor sharing the same physical core.

SMT (also called HyperThreading or HT) is a feature that is supported on Check Point appliances running Gaia OS. When enabled, SMT doubles the number of logical CPUs on the Security Gateway, which enhances physical processor utilization. When SMT is disabled, the number of logical CPUs equals the number of physical cores.

SMT improves performance up to 30% on NGFW software blades such as IPS, Application & URL Filtering and Threat Prevention by increasing the number of CoreXL FW instances based on the number of logical CPUs.

Tuning Tip: Enable SMT on appliances and disable SMT on open servers.

sk93000 - SMT (HyperThreading) Feature Guide
R80.x - Performance Tuning Tip - SMT (Hyper Threading)

 

Tip 17 - HTTPS Interception vs. SNI

With enabled HTTPS interception:

If HTTPS interception is enabled, the Host parameter from the HTTP header can be used for the URL, because the traffic is analyzed by active streaming. Check Point Active Streaming (CPAS) allows data to be changed; the gateway plays the role of a "man in the middle". CPAS breaks the HTTPS connection (and others) into two parts using its own stack, which means the gateway is responsible for all the stack work (dealing with options, retransmissions, timers, etc.). An application registers with CPAS when a connection starts and supplies callbacks for the event handler and the read handler.

Without enabled HTTPS interception (SNI is used):

If HTTPS interception is disabled, SNI is used to recognize the virtual URL for Application Control and URL Filtering. This is less resource intensive than HTTPS interception.

Tuning Tip: Prefer SNI to HTTPS interception if you only use Application Control and URL Filtering.

sk108202 - HTTPS Inspection 
URL Filtering using SNI for HTTPS websites.pdf
R80.20 - SNI vs. enabled HTTPS Interception 

 

Tip 18 - Network Interfaces and Server Hardware

Only use certified hardware for open servers and network cards. Prevent network and packet errors on the network cards.

Tuning Tip: Use supported hardware only and avoid network card issues.

HCL
R80.x - Performance Tuning Tip - Intel Hardware

 

Tip 19 - Interface Errors

 

RX-ERR: Should be zero. Caused by a cabling problem, electrical interference, or a bad port. Examples: framing errors, short frames/runts, late collisions caused by duplex mismatch.

Tuning Tip: As a first and easy step, check for a duplex mismatch.

RX-OVR: Should be zero. Overrun in NIC hardware buffering. Solved by using a higher-speed NIC, bonding multiple interfaces, or enabling Ethernet Flow Control (controversial).

Tuning Tip: Use higher-speed NICs or bond interfaces.

RX-DRP: Should be less than 0.1% of RX-OK. Caused by a network ring buffer overflow in the Gaia kernel because SoftIRQ cannot empty the ring buffer fast enough. Solved by allocating more SND/IRQ cores in CoreXL (always the first step), enabling Multi-Queue, or, as a last resort, increasing the ring buffer size.

Tuning Tip: Use more SND/IRQ cores in CoreXL.
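The three thresholds can be checked mechanically once the counters have been read from the interface statistics. The following Python sketch only encodes the rules of thumb from this tip; the function name and the wording of the findings are my own, and how you collect the counters is left open.

```python
def check_interface(rx_ok: int, rx_err: int, rx_drp: int, rx_ovr: int) -> list:
    """Return a list of findings for one interface (empty list = healthy)."""
    findings = []
    if rx_err > 0:
        findings.append("RX-ERR > 0: check cabling, port and duplex settings")
    if rx_ovr > 0:
        findings.append("RX-OVR > 0: use a higher-speed NIC or bond interfaces")
    # RX-DRP should stay below 0.1% of RX-OK. Integer arithmetic avoids
    # rounding issues: rx_drp / rx_ok >= 0.001  <=>  rx_drp * 1000 >= rx_ok.
    if rx_drp > 0 and rx_drp * 1000 >= rx_ok:
        findings.append("RX-DRP >= 0.1% of RX-OK: add SND/IRQ cores, "
                        "then consider Multi-Queue")
    return findings


# Hypothetical counter values for two interfaces.
print(check_interface(rx_ok=1_000_000, rx_err=0, rx_drp=500, rx_ovr=0))
print(check_interface(rx_ok=1_000_000, rx_err=0, rx_drp=2_500, rx_ovr=0))
```

In the first call the drop rate is 0.05% and nothing is reported; in the second it is 0.25% and the RX-DRP finding is returned.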

sk61962 - SMP IRQ Affinity on Check Point Security Gateway 
sk33250 - Automatic SIM Affinity on Multi-Core CPU Systems 
Performance Tuning R80.30 Administration Guide – Multi-Queue
R80.x - Performance Tuning Tip - Multi Queue


 

Tip 20 - Interface (Heavy Connections)

In computer networking, an elephant flow (heavy connection) is a continuous flow, set up by TCP or another protocol, that is extremely large in total bytes as measured over a network link. Elephant flows, though not numerous, can occupy a disproportionate share of the total bandwidth over a period of time. The term comes from the observation that a small number of flows carry the majority of Internet traffic, while the remainder consists of a large number of flows that carry very little traffic (mice flows).

All packets associated with that elephant flow must be handled by the same firewall worker core (CoreXL instance). Packets could be dropped by Firewall when CPU cores, on which Firewall runs, are fully utilized. Such packet loss might occur regardless of the connection's type.

What typically produces heavy connections:

  • System backups
  • Database backups
  • VMWare sync.

Evaluation of heavy connections (elephant flows)

A first indication is a high CPU load on one core while all other cores have a normal CPU load. This can be displayed very nicely with "top". So a core is at 100% CPU usage; what can we do now? sk105762 describes how to activate "Firewall Priority Queues". This feature allows the administrator to monitor the heavy connections that consume the most CPU resources without interrupting the normal operation of the Firewall. After enabling this feature, the relevant information is available in the CPView Utility. The system saves heavy connection data for the last 24 hours, and CPDiag has a matching collector which uploads this data for diagnosis purposes.

Heavy connection flow system definition on Check Point gateways:

  • Specific instance CPU is over 60%
  • Suspected connection lasts more than 10s
  • Suspected connection utilizes more than 50% of the total work the instance does. In other words, the connection's own CPU utilization must be > 30% (50% of 60%)
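The three criteria, and where the 30% comes from, can be written down as a short Python sketch. The function names are made up for this example; only the thresholds are taken from the list above.

```python
def connection_cpu_pct(instance_cpu_pct: float, share_of_instance_pct: float) -> float:
    """CPU used by one connection, given the instance load and the
    connection's share of the work that instance does."""
    return instance_cpu_pct * share_of_instance_pct / 100


def is_heavy_connection(instance_cpu_pct: float, share_of_instance_pct: float,
                        duration_s: float) -> bool:
    """Apply the three criteria from the list above."""
    return (instance_cpu_pct > 60
            and duration_s > 10
            and share_of_instance_pct > 50)


# At the thresholds: 50% of an instance running at 60% CPU is 30% CPU.
print(connection_cpu_pct(60, 50))  # 30.0

# Hypothetical backup stream: 70% of an instance that runs at 90% CPU for 45 s.
print(is_heavy_connection(90, 70, 45))  # True
```

A connection on a lightly loaded instance is never reported as heavy, no matter how large its share is, because the first criterion already fails.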

Tuning Tip: Check for heavy connections depending on the situation.

sk105762 - Firewall Priority Queues in R77.30 / R80.10 and above

R80.x - Performance Tuning Tip - Elephant Flows (Heavy Connections) 

 

Tip 21 - KMFW vs. UMFW

In "Kernel Mode Firewall" (KMFW), the maximum number of running cores is limited to 40 because of the Linux/Intel limitation of 2GB kernel memory, and because the CoreXL architecture needs to load a large driver (~42MB) dozens of times (once per CPU, up to 40 times). Newer platforms with more than 40 cores, e.g. the 23900 or open servers, are therefore not fully utilized.
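A quick back-of-the-envelope calculation in Python shows how close 40 driver copies come to the 2GB limit. The two constants are the figures quoted above; the assumption that the remaining memory is needed by the rest of the kernel is mine.

```python
DRIVER_MB = 42          # approximate size of one firewall driver copy (from the text)
KERNEL_LIMIT_MB = 2048  # 2 GB kernel memory limit (from the text)


def driver_footprint_mb(instances: int) -> int:
    """Kernel memory taken by the replicated firewall driver alone."""
    return instances * DRIVER_MB


used = driver_footprint_mb(40)
print(used, "MB of", KERNEL_LIMIT_MB, "MB")  # 1680 MB of 2048 MB
print(f"{used / KERNEL_LIMIT_MB:.0%}")       # 82%
```

With 40 instances the driver copies alone already take about 82% of the kernel memory, which presumably leaves just enough room for everything else the kernel has to hold.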

The solution to this problem is a firewall that runs in the user mode of the Linux operating system.

USFW ("User Space Firewall"), also called UMFW ("User Mode Firewall"), is based on proven VSX code. This mode was introduced in R80.10.

From a performance point of view I could not see any differences between UMFW and KMFW. I noticed that the process fwk0_dev_0 generates a very high CPU load in UMFW. My guess is that fwk0_dev_0 acts as the liaison between the multiple fwk firewall worker processes (the fw instance threads that take care of packet processing) and the single fwmod kernel driver instance, and that it also hosts the high priority cluster thread.

If you want to change the mode from UMFW to KMFW, this can be done by changing the registry parameter FwIsUsermode with the cpprod_util command. In UMFW the fw instances are threads of fwk0_dev_0, so by default top shows the CPU utilization of all threads under the main thread. Top also has an option to present the utilization per thread.

 

Tuning Tip: R80.10 to R80.30: With fewer than 35 cores use KMFW and with more than 35 cores use UMFW.

sk149973 - How to enable USFW (User-Space Firewall) on a 23900 appliance

R80.x - Performance Tuning Tip – User Mode Firewall vs. Kernel Mode Firewall

 

Tip 22 - BIOS

BIOS settings are an interesting point in performance tuning. Here we have to distinguish between open servers and appliances.

With Check Point appliances the BIOS settings are set correctly and we don't have to do anything. This article (sk120915)  provides the list of Check Point appliances and the available BIOS versions. If there are problems, the TAC can make settings on the appliance.

The situation is different with Open Server. Here the BIOS settings are described in the HCL's if necessary.

In principle, various BIOS settings can be performed on Open Server for the following points. The names of the settings may be different depending on the hardware and processor generation.

Here is an overview of the most important BIOS points:

  • Intel Turbo Boost Technology (old name Turbo Mode) 
  • Intel SpeedStep settings
  • Energy/Performance Bias:
    • Memory Speed
    • CPU Speed
  • Energy saving settings
    • Minimum Processor Idle Power C-States
    • Minimum Processor Idle Power Package C-States
  • Hyperthreading (SMT) settings (It is only supported from R80.40 on open servers)
  • X2APIC Support
  • AES-NI Support

Tuning Tip: Enable the correct BIOS settings

 

R80.x - Performance Tuning Tip - BIOS

 
Tip 23 - Management Data Plane Separation

Management Data Plane Separation allows a security gateway to have isolated management and data networks. The network system of each domain (plane) is independent and includes interfaces, routes, sockets, and processes. This has the performance advantage that some processes run separately from the firewall core daemon. Thus it reduces the load on the firewall processes, e.g. during the policy installation.

The management plane is a domain whose purpose is to access, provision, and monitor the gateway. This includes:
  - Routing separation
  - Resource separation
    - Access: SSH, FTP, and more
    - Provisioning: Policy installation, GAIA Portal, REST APIs, and more
    - Monitoring: Logs, SNMP, and more

When resource separation is enabled, the security gateway separates the management instance onto its own CPU core. Here is an example:

CPU core 0    Mgmt instance
CPU core 1    SND
CPU core 2    SND
CPU core 3    CoreXL instance
CPU core 4    CoreXL instance
CPU core 5    CoreXL instance
CPU core 6    CoreXL instance
CPU core 7    CoreXL instance


Tuning Tip: Enable MDPS if possible.

sk138672 - Management Data Plane Separation
R80.x - Performance Tuning Tip - Management Data Plane Separation (R80.30 kernel 3.10 and JHF 136+)

 

Tip 24 - CPU Spike Detective

The CPU Spike Detective is a tool that runs only on Gaia OS with kernel 3.10. It monitors the system CPU usage and checks for CPU utilization spikes. The tool was introduced in R80.40 JHF 69.

How does the spike detective work:

A spike in a CPU core utilization is considered when these conditions are met:

- CPU utilization is over 80% (this threshold is configurable)
- CPU utilization of the specific CPU core is at least 1.5 times higher than the entire system average usage (this threshold is configurable).

This ensures that a highly utilized system (for example, during a performance testing) will not detect all CPU cores as "spiked".

A thread/process is considered as "spiked" if it meets the below conditions:
- Running on a "spiked" CPU core
- Utilization is over 70% (this threshold is configurable)
- Utilization is at least 1.5 times higher than the system average (this threshold is configurable)
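Both sets of conditions can be expressed as two small Python functions. This is only a sketch of the rules listed above, not the tool's code: the function and parameter names are my own, and the defaults are the configurable thresholds mentioned in the text.

```python
def core_is_spiked(core_util: float, system_avg: float,
                   threshold: float = 80.0, ratio: float = 1.5) -> bool:
    """A core is 'spiked' if it is busy in absolute terms AND clearly
    busier than the system as a whole."""
    return core_util > threshold and core_util >= ratio * system_avg


def thread_is_spiked(thread_util: float, on_spiked_core: bool, system_avg: float,
                     threshold: float = 70.0, ratio: float = 1.5) -> bool:
    """A thread/process is 'spiked' if it runs on a spiked core and meets
    its own two utilization conditions."""
    return (on_spiked_core
            and thread_util > threshold
            and thread_util >= ratio * system_avg)


# One hot core on an otherwise quiet system (average 40%): a spike.
print(core_is_spiked(95, system_avg=40))  # True

# Performance test, all cores at 95% (average 95%): not a spike.
print(core_is_spiked(95, system_avg=95))  # False
```

The second call shows why the relative condition exists: on a uniformly loaded system 95% is not 1.5 times the average, so no core is flagged.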

Tuning Tip: Use CPU Spike Detective to monitor the system CPU usage.

SK166454 - CPU Spike Detective
R80.x - Performance Tuning Tip - CPU Spike Detective  (R80.40 JHF69+)

 

➜ CCSM Elite, CCME, CCTE ➜ www.checkpoint.tips