Nikolaos_Liakop
Explorer

CP latency during policy installation

Hello,

 

We are experiencing the following issue.

We have a pair of ClusterXL CP5000 series devices that are managed by an external SMS and have a dedicated sync interface.

During policy installation we see high latency. With a continuous ping running from a LAN PC, the RTT to 8.8.8.8 rises from 40 ms to as much as 2000 ms for 10-20 packets, and 2-3 packets are dropped (ICMP request timeouts). This happens whether we ping an Internet host (8.8.8.8) or any internal subnet / VLAN routed by the CP cluster, and only while the policy is being installed.

A ping to 8.8.8.8 issued from the firewall itself also shows packet loss, again only during policy installation.

Once the policy installation has finished, everything goes back to normal, i.e. low RTTs and no packet drops at all.
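To quantify the symptom instead of watching the ping window, the saved ping output can be summarized. The following is only a minimal sketch: the function name `summarize_ping`, the 500 ms spike threshold, and the matched strings (Windows-style "Request timed out." and Linux `ping -O` "no answer yet") are assumptions for illustration, not something taken from this thread.

```python
import re
from typing import Dict, Iterable

# Matches "time=40ms", "time=41.2 ms" and "time<1ms" in ping reply lines.
RTT_RE = re.compile(r"time[=<]([\d.]+)\s*ms")


def summarize_ping(lines: Iterable[str], spike_ms: float = 500.0) -> Dict[str, float]:
    """Count replies, RTT spikes and timeouts in saved ping output."""
    replies = spikes = timeouts = 0
    worst = 0.0
    for line in lines:
        match = RTT_RE.search(line)
        if match:
            rtt = float(match.group(1))
            replies += 1
            worst = max(worst, rtt)
            if rtt >= spike_ms:  # spike threshold is an arbitrary assumption
                spikes += 1
        elif "timed out" in line.lower() or "no answer yet" in line.lower():
            timeouts += 1
    return {"replies": replies, "spikes": spikes,
            "timeouts": timeouts, "worst_ms": worst}


if __name__ == "__main__":
    # Made-up sample lines, only to show the expected input format.
    sample = [
        "Reply from 8.8.8.8: bytes=32 time=40ms TTL=117",
        "Reply from 8.8.8.8: bytes=32 time=2000ms TTL=117",
        "Request timed out.",
    ]
    print(summarize_ping(sample))
```

Running it on one capture taken during a policy installation and one taken outside of it gives two sets of numbers that can be compared directly.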

During policy installation I get many CUL (Cluster Under Load) start and stop notifications, and the CPU reaches 100%. I don't think that is a cause for concern, since the CPU has spiked in most if not all of the policy installations I have seen in my career.

We have also tried all of the connection persistence options (Keep all connections, Rematch connections) and the behavior is still the same.

What else could we check?

Regards

16 Replies
Chris_Atkinson
Employee

Hi Nikolaos,

Which version and JHF are deployed? Have you considered upgrading to R81 (sk169096: Accelerated Install Policy For Access Control Policy)?

How is the CPU load normally? Can you also provide the output of "fwaccel stat" from Expert mode?

 

CCSM R77/R80/ELITE
Timothy_Hall
Legend

Sounds like CPU saturation; the trick will be determining whether it is on your SND or Worker/Instance cores.  ICMP always goes F2F, so it has to go through both types of cores.  Policy installation causes heavy CPU load on the workers. If you run top, do they all max out during policy installation while the SNDs are relatively idle?
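As an alternative to watching top by eye, per-core utilization can be computed from two snapshots of /proc/stat taken a few seconds apart during the policy installation. This is a rough sketch under assumptions: the function names are made up for illustration, it relies on the standard Linux /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal), and it does not know which cores are SNDs and which are workers, so that mapping has to come from the gateway's own CoreXL affinity configuration.

```python
from typing import Dict, Tuple


def parse_proc_stat(text: str) -> Dict[str, Tuple[int, int]]:
    """Return {core: (total_ticks, idle_ticks)} for each per-core line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 ..."; skip the aggregate "cpu" line.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            values = [int(v) for v in fields[1:]]
            # Idle time = idle + iowait (iowait is absent on very old kernels).
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            # Sum user..steal only; guest time is already counted in user.
            cores[fields[0]] = (sum(values[:8]), idle)
    return cores


def core_busy_percent(before: str, after: str) -> Dict[str, float]:
    """Busy percentage per core between two /proc/stat snapshots."""
    first, second = parse_proc_stat(before), parse_proc_stat(after)
    busy = {}
    for core, (total2, idle2) in second.items():
        if core not in first:
            continue
        total1, idle1 = first[core]
        delta_total = total2 - total1
        if delta_total <= 0:
            continue  # no ticks elapsed, nothing to report
        delta_idle = idle2 - idle1
        busy[core] = 100.0 * (delta_total - delta_idle) / delta_total
    return busy
```

A typical use would be to read /proc/stat, sleep for a few seconds, read it again, and pass both strings to `core_busy_percent`. Cores that stay near 100% while the others are idle point to where the saturation is.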

I will probably need the outputs of Super Seven and enabled_blades to comment further.

https://community.checkpoint.com/t5/Scripts/S7PAC-Super-Seven-Performance-Assessment-Commands/td-p/4...

 

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Nikolaos_Liakop
Explorer

Take 125.

CPU seems to be OK outside of policy installation. I have considered running fwaccel off and installing the policy again with SecureXL disabled to see what the impact would be. I also thought about doubling the TX buffer of the NICs.

Timothy_Hall
Legend

I doubt disabling SecureXL will help improve performance during a policy load on that code version (the full automatic restart of SecureXL every time the policy is installed went away in R80.20), but it is worth a try.

 

Enlarging interface ring buffers is generally a last resort and can actually make things worse. Please provide Super Seven and enabled_blades output for further recommendations.

Nikolaos_Liakop
Explorer

Thank you Timothy for the recommendation.

One question though: when do you want me to run the s7 script? During peak hours, or during policy installation? The behavior seems to appear only during the policy installation phase.

Timothy_Hall
Legend

You can run it any time as long as the firewall has been active (i.e. not rebooted) for several days and has undergone several policy installations; the cumulative counters should provide enough info.  I may ask you to run it during the slow period if the results are inconclusive, but would prefer not to cause issues in your network with a policy install unless absolutely necessary.

Nikolaos_Liakop
Explorer

Timothy_Hall
Legend

After looking through your outputs, you must have a model 5100-5400, as there are only two cores present, which results in a 2/2 CoreXL split with overlap between the cores.  Nothing major is jumping out at me other than that the firewall is pretty busy with limited core resources.  I definitely do NOT recommend turning SecureXL off.

It would appear that the firewall is undersized for what you are attempting to do. Please provide outputs from these commands, run at any time:

enabled_blades

free -m

Depending on how many blades you have enabled, more RAM *might* help, assuming the model supports being upgraded, as policy installs are a very memory- and CPU-intensive process.  The fact that setting "keep all connections" (which substantially reduces CPU load during policy install) did not appear to improve the issue implies a shortage of memory during this process.

Nikolaos_Liakop
Explorer

[Expert@ΧΧΧΧΧ_CP1:0]# enabled_blades

fw vpn cvpn urlf av appi ips identityServer anti_bot ThreatEmulation mon

[Expert@ΧΧΧΧΧ_CP1:0]#

 

The device is CP5400

Timothy_Hall
Legend

Almost certainly a memory shortage based on how many blades you have enabled (and possibly CPU saturation during policy load). Please provide the output of free -m.  The 5400 comes with 8GB of RAM but can be upgraded to 32GB.

PhongNN
Contributor

I encountered a case similar to yours. It took us almost a month to get R&D involved. After that we had to change a parameter:

CP_INSTALL_POLICY_MT_MODE 

The default value is 0, we changed to 2 with command:

cpprod_util FwSetParam CP_INSTALL_POLICY_MT_MODE 2

Everything returned to normal after this command. You can find it in Jumbo Hotfix Take 93 under PRJ-41619, PMTR-87160.

Timothy_Hall
Legend

Did R&D happen to explain what changing this parameter does?  I've never seen it before.  Does it enable a Multi-Threaded policy install mode?

 

Edit: Ah I see it now.  Interesting.

 

PRJ-41619,
PMTR-87160

Security Management

UPDATE: To reduce policy installation time in large environments (that have many instances), policy can be installed in batches.

  • Each batch contains several instances that install the policy at the current iteration. By default, the batch size is set to "0" (off).

  • To enable it, run a CLI command "cpprod_util FwSetParam CP_INSTALL_POLICY_MT_LIMIT val" and set the value >0.

Sergei_Shir
Employee

This parameter "CP_INSTALL_POLICY_MT_MODE" is described in https://support.checkpoint.com/results/sk/sk182653

Tobias_Moritz
Advisor

@Sergei_Shir: Thank you for that sk!

However, there is a discrepancy between the JHF Release Notes and the sk regarding the default setting 0.

The JHF Release Notes say:

 

  • When set to "0": the feature is disabled, all non-global instances will be included in the batch.

The SK says:

  • 0 is the default value. Security Gateway installs the policy on groups of CoreXL Firewall instances, where each group contains from 50% of all CoreXL Firewall instances to a maximum of 35 CoreXL Firewall instances.

That is not the same.

Can you explain how it really works with default setting 0? And maybe update documentation so that it matches?

Thank you!

 

Sergei_Shir
Employee

The SK contains the correct explanation.

(1)

0 is the default value. Security Gateway installs the policy on groups of CoreXL Firewall instances, where each group contains from 50% (Total Number / 2) of all CoreXL Firewall instances to a maximum of 35 CoreXL Firewall instances.

Meaning:

If there are 10 instances, then the default group size will be 5 (because 10 / 2 < 35)

If there are 20 instances, then the default group size will be 10 (because 20 / 2 < 35)

If there are 40 instances, then the default group size will be 20 (because 40 / 2 < 35)

If there are 72 instances, then the default 3 groups will be 35 + 35 + 2 (because 72 / 2 > 35)

If there are 100 instances, then the default 3 groups will be 35 + 35 + 30 (because 100 / 2 > 35)

(2)

If you configure a non-zero group size value, then:

If the group size value is 4, and there are 40 instances, then the 10 groups will be 4 + 4 + ... + 4

If the group size value is 10, and there are 40 instances, then the 4 groups will be 10 + 10 + 10 + 10

If the group size value is 50, and there are 100 instances, then the 3 groups will be 35 + 35 + 30
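The grouping rules in (1) and (2) can be sketched as a small function. This is an illustration of the arithmetic in the examples above, not Check Point's implementation: the function name `policy_install_groups` is made up, the cap of 35 is applied to configured values because the last example (size 50, 100 instances) implies it, and the rounding for an odd number of instances (floor, minimum 1) is an assumption the thread does not cover.

```python
from typing import List

MAX_GROUP = 35  # upper bound per group, as quoted from the SK


def policy_install_groups(instances: int, group_size: int = 0) -> List[int]:
    """Return the batch sizes in installation order."""
    if instances < 1:
        raise ValueError("instances must be >= 1")
    if group_size < 0:
        raise ValueError("group_size must be >= 0")
    if group_size == 0:
        # Default: half of all instances. Flooring odd counts is an assumption.
        size = max(1, instances // 2)
    else:
        size = group_size
    size = min(size, MAX_GROUP)  # never more than 35 instances per group

    groups = []
    remaining = instances
    while remaining > 0:
        batch = min(size, remaining)
        groups.append(batch)
        remaining -= batch
    return groups


if __name__ == "__main__":
    for count in (10, 20, 40, 72, 100):
        print(count, policy_install_groups(count))
    print(policy_install_groups(40, 4))
    print(policy_install_groups(100, 50))
```

For the two-instance 5400 discussed earlier in the thread, the default under these rules would be two groups of one instance each.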

Tobias_Moritz
Advisor

Thank you!
