High CPU spike in GW during access policy installation

0x41cipher

Recently, we have been facing high CPU spikes on our GWs during policy installations. Whenever an access policy is installed, average CPU utilization on the active GW briefly reaches 100% (for 2-5 seconds). As a result, other processes running in the background are killed. VPN tunnels went down, I assume because VPND was restarted due to the spike.

We investigated whether this had anything to do with traffic, but even during off-peak hours the spike appeared on the active GW when we installed an access policy. We also observed general Internet and RA VPN connectivity outages in addition to IPsec tunnel outages.

Core dumps were created on the GW for usim_x86, which I believe is a process responsible for SecureXL. These were probably also caused by the CPU spike. We have already opened a TAC case, and they are analyzing the issue.

I would like to know whether such a high CPU spike on the GW is usual during policy installation. Even in a lab setup I created, I observed 100% CPU utilization when a policy was installed. Is this normal, expected behavior, or am I going wrong somewhere?

We are using two 9100 appliances as security gateways in an HA setup, running R81.20 JHF Take 89.

Also, is it possible to control which processes are restarted when a CPU spike occurs? I believe processes are killed due to a CPU race condition. As we are running critical services over IPsec tunnels, is it possible to assign a high priority to the VPND process and prevent it from being killed during a CPU spike?

Thanks in advance.

Chris_Atkinson

Strongly suggest troubleshooting this further with TAC.

Is your connection persistence keep or rematch?

If you switch from UPPAK to KPPAK, does the issue persist?

CCSM R77/R80/ELITE

Timothy_Hall

A CPU and memory utilization spike is expected during a policy installation on the gateway, but it should not disrupt traffic other than a brief rise in latency. If you have a large number of CoreXL Firewall Worker Instances, you may want to configure batching of policy installations to avoid such a large spike all at once; see sk182653: How to install Security Policy on groups of CoreXL Firewall instances at the same time

usim core dumps are definitely not normal; they indicate that UPPAK is in use, which is the default on your Quantum Force 9100. Do the usim core dump times correlate with policy installations? Processes should not be dying due to policy installations.
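One way to check that correlation is to line up the two sets of timestamps. The following is a minimal Python sketch, not a Check Point tool: the function name `correlate`, the 5-minute window, and the sample timestamps are all assumptions for illustration. You would substitute the core dump file times and the policy installation times taken from your own gateway and management logs.

```python
from datetime import datetime, timedelta


def correlate(install_times, dump_times, window=timedelta(minutes=5)):
    """Pair each core dump with the latest policy install that
    precedes it by at most `window`, or with None if there is none."""
    installs = sorted(install_times)
    matches = []
    for dump in sorted(dump_times):
        preceding = [t for t in installs
                     if timedelta(0) <= dump - t <= window]
        matches.append((dump, preceding[-1] if preceding else None))
    return matches


# Hypothetical timestamps, for illustration only.
installs = [datetime(2025, 1, 14, 10, 0, 5),
            datetime(2025, 1, 14, 15, 30, 0)]
dumps = [datetime(2025, 1, 14, 10, 0, 40),
         datetime(2025, 1, 14, 12, 10, 0)]

for dump, install in correlate(installs, dumps):
    if install is None:
        print(f"{dump}  no policy install in the preceding window")
    else:
        delay = (dump - install).total_seconds()
        print(f"{dump}  {delay:.0f}s after install at {install}")
```

If most dumps land within seconds of an install, that is worth passing on to TAC; dumps with no nearby install would point to a separate trigger.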

If you have a very large number of connections that need to be rematched (the Connection Persistence gateway setting), that will cause a CPU spike, as Chris mentioned. I'd try these:

1) Set Connection Persistence to "Keep all connections" and install policy twice. Any better?

2) Disable UPPAK from cpconfig (requires reboot) and try policy installation again.

I will be discussing UPPAK extensively during my CPX Vegas speech.

Attend my 60-minute "Be your Own TAC: Part Deux" Presentation
Exclusively at CPX 2025 Las Vegas Tuesday Feb 25th @ 1:00pm

Jan_Kleinhans

Hello,

We have a similar issue. When we install policy, we see packet loss on UDP traffic after the installation finishes. This disrupts VPN tunnels and traffic such as VoIP and Teams. We have had a case open for this issue for more than 100 days. Some of the problems are supposed to be resolved in T96, so you can try that. Switching to KPPAK, as Timothy Hall suggested, will probably help as well.

We are using a 19200 cluster with VSX.

Regards,

Jan

Lesley

Let's start with a basic health check.

What does hcp -r all report?

https://support.checkpoint.com/results/sk/sk171436

