BigHec
Contributor

CPU Spike due to "fw_full" and "unknown" top consumer

Hi All,

I have a gateway cluster running R81.20 with the latest hotfixes installed (T26).

Recently, the CPU usage on the gateways has been spiking for a few seconds at a time. According to the spike detective logs, the top consumers during the spikes on 5 and 6 November were "fw_full" and "unknown".

Active Gateway:

Screenshot 2023-11-06 155229.png

 

Standby Gateway:

Screenshot 2023-11-06 155350.png

Does anyone have any idea what could be causing this?

 

Thank you very much. Appreciate it.

8 Replies
Chris_Atkinson
Employee

After checking that debugs aren't enabled (sk172047), I would recommend engaging TAC to review the issue.

They will request more information from spike detective / cpinfo / cpview / HCP as relevant to isolating the problem further.

CCSM R77/R80/ELITE
Jim_Holmes
Employee Alumnus

When you open the TAC case, please have this information ready and upload it to the case. It will save time, because you will be asked for it anyway. Collect cpinfos from all devices (the affected gateway, the healthy gateway, and management).

Aka, Chillyjim
BigHec
Contributor

@Chris_Atkinson & @Jim_Holmes 

Thank you for the suggestion. I will try disabling the debugs first, and if the issue still persists I will open a TAC case for this.

 

Appreciate it.

BigHec
Contributor

@Jim_Holmes & @Chris_Atkinson 

I found in the fwd.elg file that an update process called "ciu_cmd_kss_commit_set_updater_cur_dir" runs every 4 hours, and it is causing the CPU utilization spikes. However, the spike does not occur every time the process runs, only some of the time.

Screenshot 2023-11-07 165013.png

Does anyone know what this process is and what it does?
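
One way to check whether the spikes really line up with the update runs is to compare the two sets of timestamps. Below is a minimal Python sketch, assuming the update times from fwd.elg and the spike times have already been extracted into datetime objects (the parsing is left out because the exact log formats are not shown here). The function name, the 2-minute window, and the sample timestamps are all made up for illustration.

```python
from datetime import datetime, timedelta


def spikes_near_updates(update_times, spike_times, window=timedelta(minutes=2)):
    """For each update run, list the spikes that started within `window` of it.

    `window` is an arbitrary tolerance chosen for this sketch.
    """
    result = {}
    for update in update_times:
        result[update] = [s for s in spike_times if abs(s - update) <= window]
    return result


# Hypothetical timestamps, not taken from the real logs.
updates = [datetime(2023, 11, 6, hour, 0) for hour in (0, 4, 8, 12)]
spikes = [
    datetime(2023, 11, 6, 4, 1),
    datetime(2023, 11, 6, 9, 30),
    datetime(2023, 11, 6, 12, 0, 30),
]

matches = spikes_near_updates(updates, spikes)
hit_count = sum(1 for found in matches.values() if found)
print(f"{hit_count} of {len(updates)} update runs had a spike nearby")
```

Update runs with an empty list are the ones that did not produce a spike, which is the "only sometimes" behaviour described above.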

 

 

 

Timothy_Hall
Legend

Almost certainly this: sk174347: Software blade updates may cause single CPU spikes

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
PetterD
Collaborator

Hi!

I'm experiencing a similar issue myself with R81.20 Take 26: random "fw_full" spikes that cause VSX/VS failover.
It would be interesting to hear whether you were able to solve the issue 🙂

CCSM / CCSE / CCVS / CCTE
BigHec
Contributor

Hi,

I opened a TAC case and TAC is still investigating this issue. According to them, this is an issue that occurs in R81.20 cluster environments. In my case the CPU spikes every 4 hours, consistently on the Standby member and less often on the Active member. So far TAC has only come back with an SK stating that the spikes occur during IPS/Application package updates and are normal, expected behaviour.

I need to know why the Standby member spikes more than the Active member, when the Active member is the one handling the daily traffic.
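
For anyone who wants to compare the two members in numbers, here is a rough Python sketch that measures how many of the gaps between consecutive spikes fall close to a multiple of 4 hours. It assumes the spike times per member are already available as datetime objects; the function name, the 5-minute tolerance, and the sample timestamps are hypothetical and not taken from my gateways.

```python
from datetime import datetime, timedelta


def share_on_cadence(spike_times, period=timedelta(hours=4),
                     tolerance=timedelta(minutes=5)):
    """Fraction of gaps between consecutive spikes close to a multiple of `period`."""
    ordered = sorted(spike_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return 0.0

    def on_cadence(gap):
        # Ignore gaps shorter than one period; they cannot be a full cycle.
        if gap < period - tolerance:
            return False
        remainder = gap % period
        return min(remainder, period - remainder) <= tolerance

    return sum(on_cadence(g) for g in gaps) / len(gaps)


day = datetime(2023, 11, 6)
# Hypothetical spike times for each cluster member.
standby_spikes = [day + timedelta(hours=h, minutes=m)
                  for h, m in ((0, 0), (4, 1), (8, 0), (12, 2))]
active_spikes = [day + timedelta(hours=h, minutes=m)
                 for h, m in ((4, 0), (12, 1), (14, 0))]

standby_share = share_on_cadence(standby_spikes)
active_share = share_on_cadence(active_spikes)
print(f"standby: {standby_share:.2f}, active: {active_share:.2f}")
```

A share near 1.0 on one member and a much lower share on the other would at least put a number on the difference when discussing it with TAC.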

No fix has been provided yet. The investigation is still in progress.

https://support.checkpoint.com/results/sk/sk174347 (SK given by TAC)

RamGuy239
Advisor

I had a similar issue with a customer, but it was never fully confirmed whether this was the root cause of our problem. With this customer, these spikes resulted in downtime. Sometimes the active member would just stall for a short time without triggering a failover, affecting all traffic passing the gateway. Other times it would fail over, but then the standby member would stall for a short time, causing the same issue with traffic passing the gateway.

All debugs pointed to software blade updates causing the spikes when the stalling occurred. We did receive a hotfix for it, but the issue continued even with the hotfix applied. Sadly, the troubleshooting with TAC took so long that we reverted from R81.20 to R81.10. The problems occurred with R81.20 JHF Take 8, Take 10, Take 14 and Take 24, which was the latest JHF before we decided to revert. The issue has yet to happen on R81.10 for this customer.

 

It is difficult to draw any conclusions from this. I wouldn't expect R81.10 and R81.20 to behave differently when updating software blades, but there might be other differences in the software that cause this customer's installation to have issues during these updates on R81.20 and not on R81.10. Who knows.

Certifications: CCSA, CCSE, CCSM, CCSM ELITE, CCTA, CCTE, CCVS, CCME
