Hi all,
Recently, a client of mine ran into an issue where one of their Full HA firewalls suddenly hangs. This has happened more than twice within a few months.
The 1st hang happened around 30 May 2023, and the firewall worked fine as usual after a reboot. We ran the hardware diagnostics via the "diagMain" command and the result showed OK. We then opened a case with TAC, and TAC suggested installing the latest recommended Jumbo Hotfix (Take 197), as it resolves some memory-related issues. We installed Jumbo Hotfix Take 197 on both Full HA members successfully on 22 July 2023.
The 2nd hang happened on 1 August 2023, on the same firewall, while it was acting as the active firewall and management (which it had been since 22 July 2023). The firewall came back up after a reboot, and the diagnostic result (via diagMain) again showed all OK. However, we monitored it for more than 30 minutes and noticed that its CPU utilization was inconsistent (the Java process sometimes goes above 100%). Moreover, we tried to move the active management
Hence, I would like to ask for your advice: can a hang that has happened more than twice within a few months be grounds for an RMA?
Thank you.
Dear All,
Latest update: the firewalls have been working fine since the RMA (around 3 months now).
Hence, I believe RMA was the solution for this kind of issue.
Please comment if you have any thoughts.
Thank you.
This certainly warrants a TAC case for further investigation.
Andy
If you say so, I will open a case with TAC for further investigation and hopefully get an RMA for that unit.
Well, it's simply my logical suggestion, based on all the details you have given.
Regards,
Andy
Yes @the_rock, I understand that it is your suggestion, and I'm sorry about my wording in the previous post.
I have already opened a case with TAC on this. One thing I did not expect is that the problematic firewall's CPU went back to normal.
However, that is not a good thing, as it makes it hard for us to identify the cause of the hang.
Maybe have a look at the below... TAC gave this to one of my colleagues yesterday for a somewhat similar issue.
Andy
Something to try anyway...
Thanks for that, I will look into it.
The SK actually tells you what to look for, so if those errors match, it's definitely related.
Andy
Forgot to mention: TAC, I'm fairly sure, will ask you to run the hardware diag tool, so be prepared for that.
Andy
I ran the hardware diagnostics at that time (after rebooting the firewall when the hang happened), and the result showed all components OK, which I did not expect.
Meanwhile, do you have any idea whether the disk test in the hardware diagnostics is able to test an SSD? That firewall uses an SSD.
About SSD, I'm not 100% certain; you may need to verify with TAC, unless the hardware diag tool results show it clearly.
Andy
Thanks for your reply. I will ask TAC about this.
Which management blades are enabled here?
Also is there a reason that Management isn't deployed separately for this cluster?
Hi Chris, the management server part has only the normal management server blade enabled, and no other blades (e.g. SmartEvent, Endpoint Management Server).
This Full HA setup was built a long time ago, and moving to a distributed deployment would mean the customer has to bear the cost of another Security Management Server license, although I understand that distributed mode is recommended.
I assume you are running R80.40? This version will be EOL at the start of 2024.
Are you sure it is hardware related? It could still be software that causes the crash.
I would recommend upgrading the firewalls anyway; R81.20 is now the recommended version for installation.
My second piece of feedback is that the active gateway is running both the fw part and the mgmt part.
Why not move the mgmt part over to the second backup firewall?
This could help with the load of the system.
Hi Lesley. Thanks for your feedback; I have read through your suggestions.
We can't simply upgrade the customer to R81.20, for certain reasons.
I understand that software can sometimes cause a crash, but I have no idea how to find out which software causes it.
Yes, firewall member 1 is active for both management and firewall.
Last but not least, there is one more odd thing, although I don't really think it causes this issue: the hang happened a few hours after we configured the remote backup and scheduled backup settings.
If using Management HA, the active node should be the GW and the standby node the active management, for performance reasons.
This does not sound like a HW issue but like a performance issue. I would advise you to investigate that further.
Hi @_Val_ , I'm working with TAC on this.
Hi all, unfortunately TAC didn't find anything related to the issue in the files we provided (CPU spike detective, HCP report, and cpinfo file).
Hence, we need to monitor and wait until the issue happens again.
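For anyone who wants to keep a record while waiting for the issue to recur, here is a rough Python sketch that picks out processes above a CPU threshold from the text output of `top -b -n 1` (standard procps batch mode). It only parses text; the function name `find_hot_processes`, the threshold, and the sample output are made up for illustration and are not taken from the affected firewall. It also assumes a Python 3 interpreter is available wherever the output is analyzed. Note that `top` normally reports per-process CPU relative to a single core, so a multi-threaded process such as Java can show more than 100% on a multi-core box without that being an error by itself.

```python
from typing import List, Tuple

# Made-up sample in the usual `top -b -n 1` layout, for illustration only.
SAMPLE = """\
top - 10:00:00 up 1 day,  1:00,  1 user,  load average: 2.10, 1.80, 1.50
Tasks: 200 total,   2 running, 198 sleeping,   0 stopped,   0 zombie

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1234 admin     20   0 4000000 900000  10000 S 135.0 11.2  12:34.56 java
 5678 admin     20   0  500000  50000   5000 S   3.0  0.6   0:10.00 fwd
"""


def find_hot_processes(top_output: str,
                       threshold: float = 100.0) -> List[Tuple[int, str, float]]:
    """Return (pid, command, cpu_percent) for rows at or above threshold."""
    hot = []
    cpu_col = None
    cmd_col = None
    for line in top_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column header row tells us where %CPU and COMMAND sit.
        if fields[0] == "PID" and "%CPU" in fields and "COMMAND" in fields:
            cpu_col = fields.index("%CPU")
            cmd_col = fields.index("COMMAND")
            continue
        # Skip the summary lines that come before the header row.
        if cpu_col is None or len(fields) <= cmd_col:
            continue
        try:
            pid = int(fields[0])
            cpu = float(fields[cpu_col])
        except ValueError:
            continue  # not a process row
        if cpu >= threshold:
            hot.append((pid, " ".join(fields[cmd_col:]), cpu))
    return hot


if __name__ == "__main__":
    for pid, command, cpu in find_hot_processes(SAMPLE):
        print(f"{pid} {command} {cpu}")
```

Running the capture on a schedule and storing the flagged rows with a timestamp would give TAC something to correlate against the next hang, but how and where to schedule it is left open here.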
I would definitely consider the SK I gave you, as we had a similar issue with a customer recently.
Hi @the_rock, unfortunately, I checked the cpinfo of the problematic firewall via DiagnosticsView and did not find any of the symptoms mentioned in SK180437.
Let's wait and see whether the issue happens again, as I have switched the active firewall role back to the problematic firewall.
From my experience: a customer had a cluster (not Management HA) with one node showing an issue at certain time intervals. HW diag was OK, and the memory leak procedure and other suggestions from TAC did not give any clue. Finally, we managed to get a demo appliance as a replacement for the node with the issue, and the customer put it into production. After the issue did not reoccur for a longer time, CP finally gave an RMA for the faulty appliance...
Understood @LeeBingKang. Please keep us posted on how it goes.
Andy
Hi All
So far, what I have done is a fresh install and backup restore on the problematic machine.
The reason for this action: my team and I suspect that some files in the Gaia OS may be faulty.
Why we suspect that: if the issue were in the database, the other member should face the same hang, since full sync makes sure both members have the same database. However, the other member works fine even when it becomes the active management and firewall.
That's a fair assessment, BUT... here is the question. The backup you restored, when was it taken? Was the FW working properly when that backup was generated?
Andy
When the issue occurs, all services (management and firewall) go to member 2. After rebooting member 1, it works fine, and we failed the firewall over to member 1 to see if the issue could be replicated (unfortunately, we couldn't replicate it).
With that, I failed the firewall over from member 1 to member 2 and took a backup after the failover. I then used that backup for the restore after the fresh install.
Hi guys,
The hang happened again, and this time Check Point support really did proceed with an RMA for that box. I will update here again once there is any action.
That's good news, hopefully the RMA solves the issue... fingers crossed.
Andy
Yea, hopefully...
Meanwhile, I will accept RMA as the solution if everything is still working fine around 1 month after swapping in the RMA unit.