R80.40 100% CPU
Hi mates
Has anyone run into performance issues after upgrading to R80.40?
At two of my customers I hit the same performance issue after upgrading the cluster from R80.xx to R80.40 with the latest take.
In both cases I had to downgrade to the previous version, because these are critical environments where I could not wait for a TAC investigation.
That is why I'm sharing my findings here.
In both cases the active cluster member suddenly shows several CPUs at 100% usage.
When that happens the gateway is unresponsive, and the top output shows high usage for "watchdog" daemons.
After reverting to the previous version, gateway performance is as expected.
Accepted Solutions
Hi @ggiordano,
Could you share the output of the following commands?
top (press 1)
fwaccel stats -s
fw ctl affinity -l
cpwd_admin list
more /var/log/messages | grep -B 2 -A 5 error
cpinfo -y all
Is this an open server or an appliance?
PS:
I am also running many ClusterXL clusters on R80.40 without problems.
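For anyone who wants to grab these outputs in one go before a rollback, here is a minimal sketch of a collection script. It assumes expert mode on the gateway. The function names `run_and_log` and `collect_diagnostics` and any output path are made up for illustration, and `top -b -n 1` is a batch-mode snapshot, not the interactive per-CPU view you get by pressing 1.

```shell
#!/bin/bash
# Sketch only: gather the outputs requested above into a single file.
# run_and_log and collect_diagnostics are illustrative names, not Check Point tools.

run_and_log() {
    # $1 = output file, remaining arguments = command line to run
    local out="$1"; shift
    {
        echo "===== $* ====="
        if command -v "$1" >/dev/null 2>&1; then
            # Capture stderr too, and note a failure instead of aborting
            "$@" 2>&1 || echo "(exited with status $?)"
        else
            echo "(command not found: $1)"
        fi
        echo
    } >> "$out"
}

collect_diagnostics() {
    # $1 = output file (truncated first)
    local out="$1"
    : > "$out"
    run_and_log "$out" top -b -n 1        # batch snapshot, no per-CPU breakdown
    run_and_log "$out" fwaccel stats -s
    run_and_log "$out" fw ctl affinity -l
    run_and_log "$out" cpwd_admin list
    run_and_log "$out" grep -B 2 -A 5 error /var/log/messages
    run_and_log "$out" cpinfo -y all
}
```

Calling `collect_diagnostics` with a file name of your choice writes each command's output under its own header, so the file can be attached to a TAC case as is. Commands that are missing on the box are noted in the file and do not stop the run.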
Hi,
First, a few questions.
What hardware are you upgrading?
What version are you upgrading from?
Also, when you say ‘watchdog’ daemons, are you referring to any of the daemons monitored by watchdog? Or are you referring to ‘cpwd’ running at 100%?
Did you collect any other log files that you could share?
Seems suspicious either way. I’ve upgraded countless clusters to R80.40 without a hitch.
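To help answer the "which watchdog?" question, here is a rough sketch that filters a process listing for anything named watchdog or cpwd above a CPU threshold, along with its scheduling class and real-time priority. It assumes the `ps` on the box accepts the standard procps format fields `pid,cls,rtprio,pcpu,comm`. The function name `list_suspects` is hypothetical. Note that `ps` reports CPU as an average over the process lifetime, so the numbers will not match the instantaneous values in top.

```shell
#!/bin/bash
# Sketch only: read `ps -eo pid,cls,rtprio,pcpu,comm` output on stdin and
# print rows whose command name contains "watchdog" or "cpwd" and whose
# %CPU is at or above the threshold ($1, default 50).
# list_suspects is an illustrative name.

list_suspects() {
    local min_cpu="${1:-50}"
    # NR > 1 skips the header row; column 4 is %CPU, column 5 is the command
    awk -v min="$min_cpu" \
        'NR > 1 && $5 ~ /watchdog|cpwd/ && $4 + 0 >= min + 0 { print }'
}
```

A typical call would be `ps -eo pid,cls,rtprio,pcpu,comm | list_suspects 50`. A CLS value of FF or RR with a numeric RTPRIO would indicate a real-time scheduled process.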
Hi
The upgrade was performed from R80.10.
In one case the cluster is based on 15600 appliances, and in the other case on 5600 appliances.
When the issue occurred, the top output showed 2 "watchdog" processes at 100%.
Unfortunately I didn't collect any log files.
The messages log file showed errors about GNAT not being able to de-allocate resources. Those errors were mitigated by disabling the GNAT feature, but that didn't fix the performance issue.
Unfortunately I cannot provide the output: I downgraded the cluster to R80.30 because the business impact was very high.
I also sometimes see watchdog at 100% on R80.40 Take 91.
It looks like we have a CIFS connection causing high load in IPS. But why is watchdog at 100%?
Wow, it is strange that the watchdog is eating CPU like that, and at real-time priority no less, which will constantly kick other processes (like fw_worker_X) off the CPU and cause astronomical amounts of context switching, degrading performance. The watchdog is a Gaia/Linux program (not Check Point product code) that ensures the system has not hung by running a series of sanity tests and writing to the /dev/watchdog file (called "kicking the watchdog") at least once a minute. If it fails to perform this write in a timely fashion, the system is assumed to be hung: the watchdog barks, then bites, which forcibly reboots the system at the hardware level.
What kind of hardware are you using? I can't see why watchdog would need so much CPU unless there is some kind of hardware issue. You should get TAC involved on this pronto, as the watchdog is most certainly NOT a process you want to have issues with: it can affect the stability of the system or even its ability to gracefully recover from a hard hang. If a system is hard hung and the watchdog does not bark then bite to reset it, the only recourse is physically pulling the power plug.
Perhaps the watchdog is chasing its tail after someone spiked its water dish with Red Bull or something. 🙂
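To make the "kick or get bitten" logic concrete, here is a toy simulation of the deadline check. It is only a sketch under stated assumptions: it uses an ordinary file of your choosing to hold the last kick time and never touches the real /dev/watchdog (on Linux, opening that device typically arms the timer, so it is not something to experiment with on a production gateway). The function names `kick` and `is_expired` are hypothetical, and the 60 second timeout is just the "once a minute" figure from the post above.

```shell
#!/bin/bash
# Toy simulation only: models the watchdog deadline with a plain file.
# kick and is_expired are illustrative names; nothing here talks to
# /dev/watchdog or to real watchdog hardware.

kick() {
    # $1 = state file; record the current time as the last successful kick
    date +%s > "$1"
}

is_expired() {
    # $1 = state file, $2 = timeout in seconds, $3 = optional "now" (epoch)
    # Returns 0 (true) when the last kick is older than the timeout,
    # i.e. the point where a real watchdog would reset the machine.
    local last now
    last=$(cat "$1")
    now="${3:-$(date +%s)}"
    [ $((now - last)) -gt "$2" ]
}
```

In this model a healthy system calls `kick` well inside the timeout, so `is_expired` stays false. If whatever does the kicking stalls for longer than the timeout, `is_expired` turns true, which corresponds to the hardware-level reboot described above.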
Exclusively at CPX 2025 Las Vegas Tuesday Feb 25th @ 1:00pm
Thanks for this answer.
It's an OpenServer ("IBM System x3650 M4: -[7915J6G]-").
TAC is already involved. I have sent TAC this discussion 😉
I figured you must have been on open hardware; would be very surprised to see this kind of watchdog issue on a Check Point gateway appliance.
The original post was from ggiordano. He has been using appliances.
I am experiencing the same problem, with watchdog eating up 100% CPU after upgrading my 23800 HA setup from R80.40 to R81.10.
The active member always has intermittent SIC connectivity to the Management Server. Once we fail over and the standby member becomes active, the same SIC issue begins to manifest there.
We push policy only to the standby member on its own, then fail over and push the policy package to the other member as well.
It's painstaking.
@Fameen I suggest you start a new discussion about your issue. The version and symptoms are quite different from this post.
