Recently, we have been seeing high CPU spikes on the GWs during policy installation. Whenever an access policy is installed, average CPU utilization on the active GW reaches 100% for a brief moment (2-5 seconds). As a result, other processes running in the background are killed. VPN tunnels went down, which I assume happened because VPND was restarted due to the spike.
We investigated whether this was related to traffic load, but even during off-peak hours the spike appeared on the active GW whenever we installed an access policy. In addition to the IPsec tunnel outages, we also observed general Internet and RA VPN connectivity outages.
Core dumps were created for usim_x86 on the GW, which I believe is a process related to SecureXL. This may also have been caused by the CPU spike. We have already opened a TAC case, and they are currently analyzing the issue.
I would like to know whether such a high CPU spike on the GW is usual during policy installation. When I built a lab setup, I observed 100% CPU utilization there as well whenever a policy was installed. Is this normal, expected behavior, or am I going wrong somewhere?
We are using two 9100 appliances as SGWs in an HA setup, running R81.20 JHF Take 89.
Also, is it possible to control which processes are restarted when a CPU spike occurs? I believe processes are killed due to a CPU race condition. Since we run critical services over IPsec tunnels, is it possible to assign a high priority to the VPND process and prevent it from being killed during a CPU spike?
Thanks in advance.