HeikoAnkenbrand

R81 - Top 25 Gateway Tuning Tips

 

Tip  - use this new tool

I have developed a tool that automatically checks most of the points in this article.
Use it to get a quick status overview of all your gateways with a single CLI command, "eview".
It shows the most important performance-related information for all gateways, which is briefly summarized in this article:

Easy View Tool - View System Info for All Gateways Simultaneously

Tip 1 - SecureXL

SecureXL is a software acceleration product installed on Security Gateways. Performance Pack uses SecureXL technology and other network acceleration techniques to deliver wire-speed performance for Security Gateways. The SecureXL device minimizes the number of connections that are processed by the INSPECT driver. SecureXL accelerates connections in two ways.

SecureXL is implemented either in software or in hardware:

  •       SAM cards on Check Point 21000 appliances
  •       Falcon cards (new in R80.20) on different appliances

Tuning Tip: From R80.20 SecureXL is always enabled and can no longer be disabled completely.

sk98722 - SecureXL for R80.10 and below 
sk98348 - Best Practices - Security Gateway Performance
sk32578 - SecureXL Mechanism 
sk153832 - SecureXL for R80.20 and above 

R80.x - Security Gateway Architecture (Logical Packet Flow)
R80.x - Security Gateway Architecture (Acceleration Card Offloading)


Tip 2 - SecureXL Connection Templates

Connection Templates accelerate the speed at which a connection is established by matching a new connection to a set of attributes. Once a connection has generated a Connection Template (old name "Accept Template"), subsequent connections that match the template are established without performing a rule match and are therefore accelerated. Connection Templates are generated from active connections according to policy rules.
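
To make the idea concrete, here is a minimal Python sketch of template matching. It is not SecureXL's real data structure: the chosen attributes (source IP, destination IP, destination port and protocol, with the source port ignored) and all class and function names are assumptions made purely for illustration.

```python
from typing import NamedTuple, Set


class Conn(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str


class TemplateKey(NamedTuple):
    # Assumed template attributes: everything except the source port.
    src_ip: str
    dst_ip: str
    dst_port: int
    proto: str


def template_key(conn: Conn) -> TemplateKey:
    return TemplateKey(conn.src_ip, conn.dst_ip, conn.dst_port, conn.proto)


class TemplateTable:
    """Toy model of a connection template table (hypothetical)."""

    def __init__(self) -> None:
        self.templates: Set[TemplateKey] = set()
        self.rule_matches = 0  # how often the full rule match had to run

    def accept(self, conn: Conn) -> str:
        key = template_key(conn)
        if key in self.templates:
            return "template"      # matched a template: no rule match needed
        self.rule_matches += 1     # first connection: full rule match
        self.templates.add(key)    # template generated from this connection
        return "rulebase"
```

In this toy model a second connection from the same client to the same service, using a different source port, is answered from the template table and never reaches the rule match.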

Tuning Tip: Accept Templates are enabled by default.

sk32578 - SecureXL Mechanism
Performance Tuning R80.30 Administration Guide - Connection Templates
R80.x - Security Gateway Architecture (Logical Packet Flow)

 

Tip 3 - SecureXL NAT Templates

Using SecureXL Templates for NAT traffic is critical to achieve high session rate for NAT. SecureXL Templates are supported for Static NAT and Hide NAT using the existing SecureXL Templates mechanism.

Tuning Tip: Enable NAT Templates depending on the situation.

sk71200 - SecureXL NAT Templates
R80.x - Security Gateway Architecture (Logical Packet Flow)

 

Tip 4 - SecureXL Drop Templates

The Optimized Drops feature is available in R76 and above. A heavy load of traffic that should be dropped increases the Security Gateway's resource consumption. Note that SecureXL Drop Templates may not be created even though this option was checked in SmartDashboard.

Tuning Tip: Enable Drop Templates depending on the situation

sk90861 - Optimized Drops feature in R76 and above 
sk66402 - SecureXL Drop Templates 

R80.x - Security Gateway Architecture (Logical Packet Flow)

 

Tip 5 - SecureXL Fast Acceleration

The Fast Acceleration feature lets you define trusted connections that bypass deep packet inspection on gateways running R80.20 JHF103 and above. This feature significantly improves throughput for these trusted high-volume connections and reduces CPU consumption.

The gateway CLI can be used to create rules that bypass the SecureXL PSLXL path and route the matching connections through the fast path.

Tuning Tip: Use this function to exclude IP addresses or networks from deep inspection.

sk156672 - SecureXL Fast Accelerator (fw fast_accel) for R80.20 and above

R80.x - Performance Tuning Tip - SecureXL Fast Accelerator (fw ctl fast_accel)

Tip 6 - SecureXL Penalty Box

The SecureXL penalty box is a mechanism that performs an early drop of packets arriving from suspected sources. This mechanism is supported starting in R75.40VS.

The purpose of this feature is to allow the Security Gateway to cope better under high load, possibly caused by a DoS/DDoS attack.

A client that sends packets that are dropped by the firewall rulebase or performs violations of the IPS policy is reported to this mechanism. If a client is reported frequently, it would be put in a penalty box. Any packet arriving from this IP address would be dropped by the performance pack at a very early stage.
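
The reporting logic described above can be sketched in a few lines of Python. This is only a toy model of the idea: the thresholds (`max_reports`, `window_s`, `penalty_s`) and the class itself are hypothetical and are not Check Point defaults or APIs.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class PenaltyBox:
    """Toy model: box a source IP that is reported too often."""

    def __init__(self, max_reports: int = 5, window_s: float = 10.0,
                 penalty_s: float = 60.0) -> None:
        # All three values are hypothetical, chosen only for illustration.
        self.max_reports = max_reports
        self.window_s = window_s
        self.penalty_s = penalty_s
        self._reports: Dict[str, Deque[float]] = defaultdict(deque)
        self._boxed_until: Dict[str, float] = {}

    def report(self, ip: str, now: float) -> None:
        """Record one violation (rulebase drop or IPS violation) for ip."""
        reports = self._reports[ip]
        reports.append(now)
        # Forget reports that fell out of the sliding window.
        while reports and now - reports[0] > self.window_s:
            reports.popleft()
        if len(reports) > self.max_reports:
            self._boxed_until[ip] = now + self.penalty_s

    def should_drop(self, ip: str, now: float) -> bool:
        """Early-drop check performed before any further inspection."""
        return now < self._boxed_until.get(ip, 0.0)
```

The point of the design is that `should_drop` is a single dictionary lookup, so packets from a boxed address cost almost nothing compared with a full rulebase or IPS evaluation.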

Tuning Tip: Use the SecureXL penalty box if you have DDoS attacks

sk74520 - What is the SecureXL penalty box mechanism for offending IP addresses?
R80.x - Performance Tuning Tip - DDoS „fw sam“ vs. „fwaccel dos“

 

Tip 7 - SIM Affinity

SIM Affinity is the association of a particular network interface with a CPU core, in either 'Automatic' (default) or 'Static' / 'Manual' mode. Interfaces are bound to CPU cores via the SMP IRQ affinity setting. SIM Affinity in Automatic mode may make poor decisions on multi-core platforms. In addition, some multi-core hardware platforms are unable to assign IRQs in a way that uses all CPU cores efficiently.

Tuning Tip: In special cases the SIM affinity should be set manually.

sk61962 - SMP IRQ Affinity on Check Point Security Gateway 
sk33250 - Automatic SIM Affinity on Multi-Core CPU Systems 
Performance Tuning R80.30 Administration Guide – Affinity Settings 

Tip 8 - CoreXL

CoreXL is a performance-enhancing technology for Security Gateways on multi-CPU-core processing platforms. CoreXL enhances Security Gateway performance by enabling the processing CPU cores to concurrently perform multiple tasks.

CoreXL provides almost linear scalability of performance, according to the number of processing CPU cores on a single machine. The increase in performance is achieved without requiring any changes to management or to network topology.

On a Security Gateway with CoreXL enabled, the Firewall kernel is replicated multiple times. Each replicated copy, or FW instance, runs on one processing CPU core. These FW instances handle traffic concurrently, and each FW instance is a complete and independent FW inspection kernel. When CoreXL is enabled, all the FW kernel instances in the Security Gateway process traffic through the same interfaces and apply the same security policy.

sk98737 – CoreXL 
sk98348 - Best Practices - Security Gateway Performance
R80.x - Security Gateway Architecture (Logical Packet Flow)
R80.x - Security Gateway Architecture (Content Inspection)

 

Tip 9 - CoreXL - Dynamic split of CoreXL FW and CoreXL SND

Dynamic split of CoreXL automatically changes the assignment of CoreXL SNDs and CoreXL firewall workers without a reboot in R80.40 and above. For example, if the CoreXL SNDs are overloaded, a mathematical formula determines that a further CoreXL SND should be added. In this case one CoreXL firewall worker no longer receives new connections, and new connections are distributed to the other CoreXL firewall workers. Once no more connections are running through this CoreXL firewall worker, its core is used for a new CoreXL SND instance. The mechanism also works the other way round.

  • Adding and removing a CoreXL firewall worker
  • Adding and removing a CoreXL SND
  • Balance between CoreXL SND and CoreXL firewall worker
  • GAIA 3.10 kernel
  • only Check Point appliances with 8 cores or more

Tuning Tip: Use this function from R80.40 on appliances with 8 cores or more.

No SK is available yet.
R80.40 - Dynamic split of CoreXL

 

Tip 10 - MultiCore IPsec VPN

R80.10 introduced MultiCore support for IPsec VPN. Starting with R80.10 Security Gateways, the IPsec VPN MultiCore feature allows CoreXL to inspect VPN traffic on all CoreXL FW instances. This feature is enabled by default, and disabling it is not supported.

Tuning Tip: MultiCore IPsec VPN is enabled by default on R80.x gateways.

sk104760 - VPN Core 
sk105119 - Best Practices - VPN Performance 
sk118097 - MultiCore Support for IPsec VPN in R80.10 and above 

Tip 11 - MultiCore Support for SSL

Introduced in R77.20, SSL MultiCore feature improves SSL performance of Security Gateway. SSL MultiCore feature is based on Check Point CoreXL technology, which enhances Security Gateway / VSX Gateway performance by enabling the CPU processing cores to concurrently perform multiple tasks.

Tuning Tip: MultiCore SSL is enabled by default on R80.x gateways.

sk101223 - MultiCore Support for SSL in R77.20 and above 

Tip 12 - AES-NI

Intel's AES New Instructions (AES-NI) is an encryption instruction set that accelerates the Advanced Encryption Standard (AES) algorithm on many processor families. Better throughput can be achieved by selecting a faster encryption algorithm; for a comparison of encryption algorithm speeds, see sk73980 (Relative speeds of algorithms for IPsec and SSL). AES-NI significantly improves the speed of encrypt/decrypt operations and increases AES throughput for Site-to-Site VPN, Remote Access VPN, Mobile Access, and HTTPS Interception.

The general speed of the system depends on additional parameters. Check Point supports AES-NI on many appliances, only when running Gaia OS with 64-bit kernel. On these appliances AES-NI is enabled by default. AES-NI is also supported on Open Servers. Comprised of seven new instructions, AES-NI gives your environment faster, more affordable data protection and greater security.

Tuning Tip: Enable AES-NI in the BIOS.

sk73980 - Relative speeds of algorithms for IPsec and SSL
R80.x - Performance Tuning Tip - AES-NI

 

Tip 13 - Firewall Priority Queues

Packets can be dropped by the Firewall when the CPU cores on which the Firewall runs are fully utilized. Such packet loss might occur regardless of the connection's type (for example, local SSH or a connection to the Security Management Server). The Firewall Priority Queues are disabled by default. The Priority Queues (PrioQ) mechanism is intended to prioritize part of the traffic when packets must be dropped because the Security Gateway is stressed (CPU is fully utilized).

Tuning Tip: Use it depending on the situation.

sk105762 - Firewall Priority Queues in R77.30 / R80.10 and above 

Tip 14 - Multi-Queue

By default, each network interface has one traffic queue handled by one CPU. You cannot use more CPU cores for acceleration than the number of interfaces handling traffic. Multi-Queue lets you configure more than one traffic queue for each network interface, so that more than one CPU core per interface is used for acceleration. Multi-Queue is relevant only if SecureXL is enabled. In R80.40 and R81, Multi-Queue is enabled by default on all supported interfaces.

Tuning Tip: Enable multi-queueing on 10/40/100 Gbit/s interfaces.

Performance Tuning R80.30 Administration Guide – Multi-Queue
R80.x - Performance Tuning Tip - Multi Queue
R81.x  - Multi Queue (what is new) 

 

Tip 15 - Dynamic Dispatcher

CoreXL is a performance-enhancing technology for Security Gateways on platforms with multiple CPU cores. CoreXL enhances Security Gateway performance by enabling the processing CPU cores to concurrently perform multiple tasks.

On a Security Gateway with CoreXL enabled, the Firewall kernel is replicated multiple times. Each replicated copy, or Firewall instance, runs on one processing CPU core. These Firewall instances handle traffic concurrently, and each Firewall instance is a complete and independent Firewall inspection kernel. When CoreXL is enabled, all the Firewall kernel instances in the Security Gateway process traffic through the same interfaces and apply the same security policy.

The CoreXL software architecture includes the Secure Network Distributor (SND). The SND is responsible for:

  • Processing incoming traffic from the network interfaces
  • Securely accelerating authorized packets (if SecureXL is running)
  • Distributing non-accelerated packets or Medium Path packets among CoreXL FW kernel instances - this functionality is also referred to as dispatcher

Traffic received on network interface cards (NICs) is directed to a processing core running the SND.

The dispatcher is executed when a packet should be forwarded to a CoreXL FW instance (in Slow path and Medium path - see sk98737 for details) and is in charge of selecting the CoreXL FW instance that will inspect the packet.

In R77.20 and lower versions, traffic distribution between CoreXL FW instances is statically based on Source IP addresses, Destination IP addresses, and the IP 'Protocol' type. Therefore, there are possible scenarios where one or more CoreXL FW instances would handle more connections, or perform more processing on the packets forwarded to them, than the other CoreXL FW instances.

This may lead to a situation, where the load is not balanced across the CPU cores, on which the CoreXL FW instances are running.
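
The difference between the two approaches can be sketched in Python. The static variant below hashes source IP, destination IP and protocol as described above; CRC32 is only a stand-in, since the real hash function is not described here. The dynamic variant simply picks the least loaded instance for a new connection, which is the general idea behind the Dynamic Dispatcher. Both function names are hypothetical.

```python
import zlib
from typing import List


def static_dispatch(src_ip: str, dst_ip: str, proto: int, instances: int) -> int:
    """Pick an instance from a hash of (source IP, destination IP, protocol).

    CRC32 is an assumption standing in for the real hash function.
    """
    key = f"{src_ip}|{dst_ip}|{proto}".encode()
    return zlib.crc32(key) % instances


def dynamic_dispatch(loads: List[float]) -> int:
    """Pick the currently least loaded instance for a new connection."""
    return min(range(len(loads)), key=loads.__getitem__)
```

With the static variant, every connection between the same two hosts using the same protocol lands on the same instance regardless of how busy it is, which is exactly how the imbalance described above arises.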

Tuning Tip: Use Dynamic Dispatcher depending on the situation.

sk105261 - CoreXL Dynamic Dispatcher in R77.30 / R80.10 and above 

Tip 16 - SMT (Hyper Threading)

Hyper Threading Technology is a form of Simultaneous Multithreading Technology (SMT) introduced by Intel. Architecturally, a processor with Hyper-Threading technology consists of two logical processors per core, each of which has its own processor architectural state. Each logical processor can be individually halted, interrupted or directed to execute a specified thread, independently from the other logical processor sharing the same physical core.

SMT (also called HyperThreading or HT) is a feature that is supported on Check Point appliances running Gaia OS. When enabled, SMT doubles the number of logical CPUs on the Security Gateway, which enhances physical processor utilization. When SMT is disabled, the number of logical CPUs equals the number of physical cores.

SMT improves performance up to 30% on NGFW software blades such as IPS, Application & URL Filtering and Threat Prevention by increasing the number of CoreXL FW instances based on the number of logical CPUs.

Tuning Tip: Enable SMT on appliances and disable SMT on open server.

sk93000 - SMT (HyperThreading) Feature Guide
R80.x - Performance Tuning Tip - SMT (Hyper Threading)

 

Tip 17 - HTTPS Interception vs. SNI

With enabled HTTPS interception:

If HTTPS interception is enabled, the Host parameter from the HTTP header can be used for the URL because the traffic is analyzed by active streaming. Check Point Active Streaming (CPAS) allows data to be changed; the gateway plays the role of a "man in the middle". CPAS breaks the HTTPS connection (and others) into two parts using its own stack, which means the gateway is responsible for all the stack work (dealing with options, retransmissions, timers, etc.). An application registers with CPAS when a connection starts and supplies callbacks for the event handler and the read handler.

Without enabled HTTPS interception (SNI is used):

If HTTPS interception is disabled, SNI is used to recognize the virtual URL for Application Control and URL Filtering. This is less resource-intensive than HTTPS interception.

Tuning Tip: Prefer SNI to HTTPS interception if you only use Application Control and URL Filtering.

sk108202 - HTTPS Inspection 
URL Filtering using SNI for HTTPS websites.pdf
R80.20 - SNI vs. enabled HTTPS Interception 

 

Tip 18 - Network Interfaces and Server Hardware

Only use certified hardware for open servers and network cards. Prevent network and packet errors on the network cards.

Tuning Tip: Use supported hardware only and avoid network card issues.

HCL
R80.x - Performance Tuning Tip - Intel Hardware

 

Tip 19 - Interface Errors

 

RX-ERR: Should be zero.  Caused by cabling problem, electrical interference, or a bad port.  Examples: framing errors, short frames/runts, late collisions caused by duplex mismatch.

Tuning Tip:  As a first and easy step, check for a duplex mismatch.

RX-OVR: Should be zero.  Overrun in NIC hardware buffering.  Solved by using a higher-speed NIC, bonding multiple interfaces, or enabling Ethernet Flow Control (controversial). 

Tuning Tip:  Use higher-speed NICs or bond interfaces

RX-DRP: Should be less than 0.1% of RX-OK.  Caused by a network ring buffer overflow in the Gaia kernel due to the inability of SoftIRQ to empty the ring buffer fast enough.  Solved by allocating more SND/IRQ cores in CoreXL (always the first step), enabling Multi-Queue, or as a last resort increasing the ring buffer size.

Tuning Tip:  Use more SND/IRQ cores in CoreXL
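
The three rules of thumb are easy to check in a script. The following Python sketch assumes you already have the four counters for an interface as integers (for example, copied from the RX columns of `netstat -ni`); the function names are hypothetical.

```python
from typing import List


def rx_drp_ratio(rx_ok: int, rx_drp: int) -> float:
    """Return RX-DRP as a fraction of RX-OK (0.0 if nothing was received)."""
    if rx_ok <= 0:
        return 0.0
    return rx_drp / rx_ok


def check_interface(rx_ok: int, rx_err: int, rx_drp: int, rx_ovr: int) -> List[str]:
    """Return the counters that violate the rules of thumb above."""
    findings = []
    if rx_err > 0:                            # RX-ERR should be zero
        findings.append("RX-ERR")
    if rx_ovr > 0:                            # RX-OVR should be zero
        findings.append("RX-OVR")
    if rx_drp_ratio(rx_ok, rx_drp) >= 0.001:  # RX-DRP should be < 0.1% of RX-OK
        findings.append("RX-DRP")
    return findings
```

For example, 500 drops on 1,000,000 received frames is 0.05% and passes, while 2,000 drops is 0.2% and gets flagged.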

sk61962 - SMP IRQ Affinity on Check Point Security Gateway 
sk33250 - Automatic SIM Affinity on Multi-Core CPU Systems 
Performance Tuning R80.30 Administration Guide – Multi-Queue
R80.x - Performance Tuning Tip - Multi Queue


 

Tip 20 - Interface (Heavy Connections)

In computer networking, an elephant flow (heavy connection) is an extremely large (in total bytes) continuous flow set up by TCP or another protocol, measured over a network link. Elephant flows, though not numerous, can occupy a disproportionate share of the total bandwidth over a period of time. Observations have shown that a small number of flows carry the majority of Internet traffic, while the remainder consists of a large number of flows that carry very little Internet traffic (mice flows).

All packets associated with that elephant flow must be handled by the same firewall worker core (CoreXL instance). Packets could be dropped by Firewall when CPU cores, on which Firewall runs, are fully utilized. Such packet loss might occur regardless of the connection's type.

What typically produces heavy connections:

  • System backups
  • Database backups
  • VMWare sync.

Evaluation of heavy connections (elephant flows)

A first indication is a high CPU load on one core while all other cores have a normal CPU load. This can be displayed nicely with "top". Suppose a core now shows 100% CPU usage: what can we do? sk105762 describes how to activate "Firewall Priority Queues". This feature allows the administrator to monitor the heavy connections that consume the most CPU resources without interrupting the normal operation of the Firewall. After enabling this feature, the relevant information is available in the CPView Utility. The system saves heavy connection data for the last 24 hours, and CPDiag has a matching collector which uploads this data for diagnosis purposes.

Heavy connection flow system definition on Check Point gateways:

  • Specific instance CPU is over 60%
  • Suspected connection lasts more than 10s
  • Suspected connection utilizes more than 50% of the total work the instance does. In other words, connection CPU utilization must be > 30%  
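
The three criteria, and the arithmetic behind the "> 30%" remark (60% instance load times a 50% share), can be written down as a short Python sketch. The function and parameter names are hypothetical; only the thresholds come from the list above.

```python
def is_heavy_connection(instance_cpu_pct: float,
                        duration_s: float,
                        share_of_instance_pct: float) -> bool:
    """Apply the three criteria from the list above to one connection."""
    return (instance_cpu_pct > 60.0            # instance CPU is over 60%
            and duration_s > 10.0              # connection lasts more than 10s
            and share_of_instance_pct > 50.0)  # more than half of the instance's work


def connection_cpu_pct(instance_cpu_pct: float,
                       share_of_instance_pct: float) -> float:
    """CPU used by the connection itself, in percent of one core."""
    return instance_cpu_pct * share_of_instance_pct / 100.0
```

At the boundary, an instance at 60% with a connection doing 50% of its work gives `connection_cpu_pct(60.0, 50.0) == 30.0`, which is where the 30% figure comes from.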

Tuning Tip: Check for heavy connections depending on the situation.

sk105762 - Firewall Priority Queues in R77.30 / R80.10 and above

R80.x - Performance Tuning Tip - Elephant Flows (Heavy Connections) 

 

Tip 21 - KMFW vs. UMFW

In “Kernel Mode Firewall” KMFW, the maximum number of running cores is limited to 40 because of the Linux/Intel limitation of 2GB kernel memory, and because CoreXL architecture needs to load a large driver (~42MB) dozens of times (according to the CPU number, and up to 40 times). Newer platforms that contain more than 40 cores e.g., 23900 or open server are not fully utilized.

The solution of the problem is a firewall in the user mode of the Linux operating system.

USFW “User Space Firewall” or UMFW stands for “User Mode Firewall”, and it is based on proven VSX code. This mode was introduced in R80.10.

From a performance point of view I could not see any differences between UMFW and KMFW. I noticed that the process fwk0_dev_0 generates a very high CPU load in the UMFW. My guess as to the purpose of the fwk0_dev_0 is that it acts as the liaison between the multiple fwk firewall worker processes (fw instance thread that takes care for the packet processing) and the single fwmod kernel driver instance and the process for high priority cluster thread.

If you want to change the mode from UMFW to KMFW this can be done by changing the registry parameter FwIsUsermode by cpprod_util command. In UMFW the fw instances are threads of the fwk0_dev_0 so by default the top shows all the threads cpu utilization under the main thread. Top has the option to present the utilization per thread as well.

 

Tuning Tip: R80.10 to R80.30: With fewer than 35 cores use KMFW, and with more than 35 cores use UMFW.

sk149973 - How to enable USFW (User-Space Firewall) on a 23900 appliance

R80.x - Performance Tuning Tip – User Mode Firewall vs. Kernel Mode Firewall

 

Tip 22 - BIOS

BIOS settings are an interesting point in performance tuning. Here we have to distinguish between open servers and appliances.

With Check Point appliances, the BIOS settings are set correctly and we don't have to do anything. This article (sk120915) provides the list of Check Point appliances and the available BIOS versions. If there are problems, the TAC can change settings on the appliance.

The situation is different with open servers. Here the BIOS settings are described in the HCL, if necessary.

In principle, various BIOS settings can be performed on Open Server for the following points. The names of the settings may be different depending on the hardware and processor generation.

Here is an overview of the most important BIOS points:

  • Intel Turbo Boost Technology (old name Turbo Mode) 
  • Intel SpeedStep settings
  • Energy/Performance Bias:
    • Memory Speed
    • CPU Speed
  • Energy saving settings
    • Minimum Processor Idle Power C-States
    • Minimum Processor Idle Power Package C-States
  • Hyperthreading (SMT) settings (It is only supported from R80.40 on open servers)
  • X2APIC Support
  • AES-NI Support

Tuning Tip: Enable the correct BIOS settings

 

R80.x - Performance Tuning Tip - BIOS

 
Tip 23 - Management Data Plane Separation

Management Data Plane Separation allows a security gateway to have isolated management and data networks. The network system of each domain (plane) is independent and includes interfaces, routes, sockets, and processes. This has the performance advantage that some processes run separately from the firewall core daemon. Thus it reduces the load on the firewall processes, e.g. during the policy installation.

The management plane is a domain whose purpose is to access, provision, and monitor the gateway. This includes:
  • Routing separation
  • Resource separation
    • Access: SSH, FTP, and more
    • Provisioning: Policy installation, Gaia Portal, REST APIs, and more
    • Monitoring: Logs, SNMP, and more

When resource separation is enabled, the security gateway will separate the management instance. Here is an example:

CPU core 0    Mgmt instance
CPU core 1    SND
CPU core 2    SND
CPU core 3    CoreXL instance
CPU core 4    CoreXL instance
CPU core 5    CoreXL instance
CPU core 6    CoreXL instance
CPU core 7    CoreXL instance


Tuning Tip: Enable MDPS if possible.

sk138672 - Management Data Plane Separation
R80.x - Performance Tuning Tip - Management Data Plane Separation (R80.30 kernel 3.10 and JHF 136+)

 

Tip 24 - CPU Spike Detective

The CPU Spike Detective is a tool running only on Gaia OS 3.10 that monitors the system CPU usage and checks for CPU utilization spikes. This tool is introduced starting from R80.40 JHF 69.

How does the spike detective work:

A spike in a CPU core utilization is considered when these conditions are met:

- CPU utilization is over 80% (this threshold is configurable)
- CPU utilization of the specific CPU core is at least 1.5 times higher than the entire system average usage (this threshold is configurable).

This ensures that a highly utilized system (for example, during a performance testing) will not detect all CPU cores as "spiked".

A thread/process is considered as "spiked" if it meets the below conditions:
- Running on a "spiked" CPU core
- Utilization is over 70% (this threshold is configurable)
- Utilization is at least 1.5 times higher than the system average (this threshold is configurable)
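
The per-core condition can be expressed in a few lines of Python. This is only a sketch of the rule as stated above, with the 80% threshold and the 1.5 factor as defaults; the function name is hypothetical and this is not the tool's actual implementation.

```python
from typing import List


def spiked_cores(core_util_pct: List[float],
                 threshold_pct: float = 80.0,
                 ratio: float = 1.5) -> List[int]:
    """Return the indexes of CPU cores that count as "spiked".

    A core is spiked when its utilization is over threshold_pct and at
    least ratio times the average utilization of all cores.
    """
    if not core_util_pct:
        return []
    average = sum(core_util_pct) / len(core_util_pct)
    return [i for i, util in enumerate(core_util_pct)
            if util > threshold_pct and util >= ratio * average]
```

With utilizations of 95, 20, 25 and 20 percent, only core 0 is reported. With all four cores above 90 percent nothing is reported, because no core stands out from the system average; this is the behavior described for a uniformly loaded system.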

Tuning Tip: Use CPU Spike Detective to monitor the system CPU usage.

sk166454 - CPU Spike Detective
R80.x - Performance Tuning Tip - CPU Spike Detective  (R80.40 JHF69+)

 


➜ CCSM Elite, CCME, CCTE
Harry_Morgan
Contributor

Very interesting overview.

Do you have any information about network issues and RX errors?

Timothy_Hall
Champion
Champion

I call the three RX error counters the "Dark Triad" in my Max Power book, more detailed error counters per interface can be accessed by running ethtool -S (interface) from expert mode.  To quickly summarize:

  • RX-ERR: Should be zero.  Caused by cabling problem, electrical interference, or a bad port.  Examples: framing errors, short frames/runts, late collisions caused by duplex mismatch.
  • RX-OVR: Should be zero.  Overrun in NIC hardware buffering.  Solved by using a higher-speed NIC, bonding multiple interfaces, or enabling Ethernet Flow Control (controversial). 
  • RX-DRP: Should be less than 0.1% of RX-OK.  Caused by a network ring buffer overflow in the Gaia kernel due to the inability of SoftIRQ to empty the ring buffer fast enough.  Solved by allocating more SND/IRQ cores in CoreXL (always the first step), enabling Multi-Queue, or as a last resort increasing the ring buffer size.

 

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
HeikoAnkenbrand
Champion Champion
Champion

Thanks @Timothy_Hall,

I'm adding this to the article.


Hugo_vd_Kooij
Advisor

As long as RX-DRP is growing you have a performance issue you need to tackle.

It can be as simple as upping the RX ring buffers to 1024 on a machine you have upgraded over and over again. Those defaults used to be rather small in the past. An in-place upgrade will not change them.

Other steps might be more complicated and require a good understanding of the network and the traffic through the firewall.

<< We make miracles happen while you wait. The impossible jobs take just a wee bit longer. >>
Timothy_Hall
Champion
Champion

Maybe, as long as RX-DRP is less than 0.1% of RX-OK generally you don't need to do any tuning.  If piling up >0.1% RX-DRP the general steps to follow are:

1) Allocate more SND/IRQ cores

2) Enable Multi-Queue

3) Increasing ring buffer size is usually a last resort, and probably indicates an under-powered firewall

 

Diego_Salazar
Employee Alumnus
Employee Alumnus
Good evening Timothy.

Any reason why increasing the buffer size would be the last resort? Would this lead to other issues or secondary effects?
I have a customer with massive RX drops, and I recommended increasing it to the maximum allowed by the hardware, 4096 according to ethtool.
Timothy_Hall
Champion
Champion

RX-DRPs are caused when a network interface's ring buffer has frames coming into it from the NIC faster than the Gaia SoftIRQ routine can empty it.  The real problem is not enough CPU resources are available to empty the ring buffer fast enough. Increasing the ring buffer size incurs more overhead for ring buffer processing, increases jitter, and under full load can cause a nasty effect called Bufferbloat. I always noticed that cranking the ring buffer size, while it reduced RX-DRPs, would cause what I could only quantify as a "choppiness" in heavy network traffic flows.  Example: FTP/HTTP download speeds bounce around constantly and TCP never seems to settle down into a constant rate.  I finally ran across the Bufferbloat term when researching my first book and realized this was the term describing the choppiness I was seeing.

A RX-DRP rate of <0.1% is acceptable and does not require any tuning.

The correct way to remediate this when using the Gaia 2.6.18 kernel is:

1) Reduce number of kernel instances defined in cpconfig, thus allocating more SND/IRQ cores.

2) If sim affinity -l shows that the problematic interface has been allocated its very own SND/IRQ core (it is not sharing it with any other interfaces) yet RX-DRP is still >0.1% enable Multi-Queue on the interface.  This will tend to happen around 4-5 Gbps on a 10Gbps interface.

3) If even with Multi-Queue enabled RX-DRPs are still >0.1%, allocate more SND/IRQ cores.  If this cannot be done without reducing the number of firewall worker instances to a level that would cause them to be overwhelmed, then and only then should you increase ring buffer size.

On Gaia kernel 3.10 Multi-Queue is enabled by default on all interfaces except the management interface, so only step 3 would normally be needed.
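
The order of the steps above can be summarized as a small decision function. This Python sketch is a simplification of the procedure, not an official algorithm: the function name and the boolean inputs are hypothetical, and it assumes you have already measured the RX-DRP ratio for the interface.

```python
def next_tuning_step(rx_drp_ratio: float,
                     dedicated_snd_core: bool,
                     multi_queue_enabled: bool,
                     can_spare_worker_core: bool) -> str:
    """Suggest the next step for an interface, following the order above.

    rx_drp_ratio is RX-DRP divided by RX-OK (0.001 means 0.1%).
    """
    if rx_drp_ratio < 0.001:
        return "no tuning needed"
    if not dedicated_snd_core:
        # Step 1: fewer kernel instances, more SND/IRQ cores.
        return "allocate more SND/IRQ cores"
    if not multi_queue_enabled:
        # Step 2: the interface has its own core and still drops.
        return "enable Multi-Queue"
    if can_spare_worker_core:
        # Step 3: more SND/IRQ cores while the workers can cope.
        return "allocate more SND/IRQ cores"
    # Only when nothing else is possible.
    return "increase ring buffer size"
```

On Gaia kernel 3.10, where Multi-Queue is already on, the function effectively starts at the third branch.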

One other corner case: each ring buffer slot is sized to hold one Ethernet frame employing the standard 1500 MTU.  If Jumbo frames are in use, a single 9000 byte frame will take up 6 ring buffer slots!  So watch out for that...
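
The slot arithmetic is simple enough to check in Python. This sketch assumes, as stated above, that one slot holds one 1500-byte frame; the function names are hypothetical.

```python
import math


def slots_per_frame(frame_bytes: int, slot_bytes: int = 1500) -> int:
    """Number of ring buffer slots one frame occupies."""
    return math.ceil(frame_bytes / slot_bytes)


def frames_per_ring(ring_slots: int, frame_bytes: int,
                    slot_bytes: int = 1500) -> int:
    """How many frames of the given size fit into the ring buffer."""
    return ring_slots // slots_per_frame(frame_bytes, slot_bytes)
```

A 9000-byte jumbo frame needs 6 slots, so a ring buffer of 1024 slots holds only 170 such frames instead of 1024 standard ones.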

 

Harald_Hansen
Advisor
Advisor

I'm a bit late to the party, though CP has some notes regarding interface counters in Gaia with the 3.10 kernel in sk166424.

This change has cost a lot of troubleshooting hours, since they now count these factors into RX-DRP as well:

  • Bad VLAN tags
  • Packets received with unknown or unregistered protocols
  • IPv6 frames when the server is configured only for ipv4

 

Danny
Champion Champion
Champion

It‘s noteworthy that Check Point introduces some auto-tuning in R80.40

Timothy_Hall
Champion
Champion

Yep I've referred to this new R80.40 feature unofficially as "Dynamic Core Split Adjustment", but it is also worth noting R80.30 with Gaia kernel 3.10 has Multi-Queue automatically enabled on all interfaces that support it except for the management interface.  This will definitely help keep individual SND/IRQ cores from getting overloaded.

 

HeikoAnkenbrand
Champion Champion
Champion

More read here:

R80.40 - Dynamic split of CoreXL 

 


Tsvika_Gilman
Contributor

Can you tell me exactly how the Dynamic Dispatcher works?

Timothy_Hall
Champion
Champion

It helps balance the load on the Firewall Worker cores (kernel instances) by directing new connections to the least loaded worker, and replaces a simple hash function that was the default in R77.30 and earlier.  Dynamic Dispatcher is enabled by default in R80.10+ and you definitely want to leave it on, although it is somewhat limited in its ability to deal with elephant flows/heavy connections.  More info:

sk105261: CoreXL Dynamic Dispatcher in R77.30 / R80.10 and above

 

James_T_Kirk
Participant

Do you have information here?

rolf
Participant

 

Will „Dynamic split of CoreXL“ already be available with a JHF below R80.30?

Hugo_vd_Kooij
Advisor

Don't expect a Jumbo Hotfix to change architecture.

Which means: No

 

HeikoAnkenbrand
Champion Champion
Champion

Added elephant flows (heavy connections)!


Holm_Klein
Explorer

Great job @HeikoAnkenbrand

Niroyec_Yerusha
Participant

👍

HeikoAnkenbrand
Champion Champion
Champion

Added KMFW vs. UMFW


Ilan_Missalla
Participant

Great work!

HeikoAnkenbrand
Champion Champion
Champion

UMFW/ USFW update


HeikoAnkenbrand
Champion Champion
Champion

Update: Tip 21 - KMFW vs. UMFW


ReinerS
Participant

That's an interesting overview. But there are a few more points like BIOS settings and other parameters.

Timothy_Hall
Champion
Champion

You aren't allowed to change BIOS settings on Check Point appliances, however for open hardware BIOS tuning was covered in the second and third editions of my book:

 


Firewall Open Hardware BIOS Tuning

If utilizing open hardware instead of a Check Point appliance for your firewall, it is important to make several BIOS configuration adjustments to ensure optimal performance. On the other hand, all Check Point firewall appliances have already had their BIOS settings tuned appropriately at the factory; only Check Point TAC knows the BIOS password for the appliances anyway and they will not disclose it. Updating the BIOS of a Check Point firewall appliance is extremely rare (and only Check Point TAC can actually perform the upgrade), but the following SK reference is included for sake of completeness: sk120915: Check Point Appliances BIOS Firmware versions map.

 

If using an open hardware firewall, it may be possible to change the number of cores per CPU that are presented to the Gaia hardware via the BIOS. In the server’s BIOS, configure an appropriate number of cores per CPU such that the total number of cores presented by the underlying hardware matches the total number of licensed cores. For more information about how this setting impacts Check Point CoreXL licensing, see section “The Trial License Core Crunch” in Chapter 6.

 

The following are general recommendations for open hardware firewalls; specific BIOS setting names will vary slightly between hardware vendors, so you may need to do a little research and exploration to find the relevant settings to adjust. If using open hardware for the firewall that has multiple power supplies, a power supply problem can actually cause system CPUs to be placed in a disabled state which is absolutely disastrous from a performance perspective. See the following SK for an explanation of this rare but nasty corner case: sk103348: Output of 'cat /proc/cpuinfo' command and of 'top' command show only one CPU core on multi-processor Open Server.

In general any BIOS setting that permits the CPU frequency to vary from its base processor speed (either faster or slower) should be disabled. While the Gaia operating system itself performs just fine with varying CPU frequencies, portions of the Check Point firewall code such as the Secure Network Distributor (SND), Dynamic Dispatcher, and ClusterXL assume that all cores are equal at all times in regards to clock speed and processing power.

 

If firewall CPU clock speeds vary in the slightest these features will perform in a sub-optimal fashion; CPU clock speed adjustments can take many forms but here is a sampling of those that should be disabled on open hardware firewalls:

 

  • Intel Turbo Boost/Turbo Mode

  • Intel SpeedStep

  • Energy Saving: P-States, C-States, & HPC Optimizations

  • SMT/Hyperthreading (only supported on Check Point firewall appliances)

  • IO Non Posted Prefetching

  • X2APIC Support

  • Dynamic Power Capping Function

  • Intel Virtualization Technology **

Other relevant BIOS settings to check:

  • CPU and Memory Speed: Maximum Performance

  • Memory Channel Mode: Independent

  • Thermal/Fan Mode: Maximum Performance

  • AES-NI Support: Enable

Joachim_Zint1
Employee Employee
Employee
Wow, great overview!
yilmac_g
Participant
TomTom
Participant

👍

HeikoAnkenbrand
Champion Champion
Champion

Now with R80.40 updates.


