CheckMates: Products > Quantum > Security Gateways
Question regarding RMA of a firewall whose diagnostic tool shows all OK
Hi all,
Recently, one of my clients faced an issue in which one of their Full HA firewalls suddenly hangs. This has happened at least twice within a few months.
The first hang occurred around 30 May 2023, and the firewall worked normally again after a reboot. We ran the hardware diagnostics with the command "diagMain", and the result showed everything OK. We then opened a case with TAC, and TAC suggested installing the latest recommended Jumbo Hotfix (Take 197), as it resolves some memory-related issues. We successfully installed Jumbo Hotfix Take 197 on both Full HA members on 22 July 2023.
The second hang occurred on 1/8/2023, on the same firewall, while it was acting as both the active firewall and the active management (which it had been since 22 July 2023). The firewall came back up after a reboot, and the diagnostic result (via diagMain) again showed all OK. However, after monitoring it for more than 30 minutes, we noticed that its CPU utilization was erratic, with the Java process at times exceeding 100%. Moreover, we tried to move the active management
Hence, I would like to ask for your advice: can a hang that has recurred like this within a few months be grounds for an RMA?
Thank you.
Tags: rma
Accepted Solutions
Dear All,
Latest update: the firewalls have been working fine for around 3 months since the RMA.
Hence, I believe RMA is the solution for this kind of issue.
Please feel free to comment if you have anything to add.
Thank you.
This certainly warrants a TAC case for further investigation.
Andy
If you say so, I will open a TAC case for further investigation and hopefully get an RMA for that unit.
Well, it's simply my logical suggestion, based on all the details you have given.
Regards,
Andy
Yeah @the_rock, I understand that it is your suggestion, and I'm sorry about the wording in my previous post.
I have already opened a TAC case on this, and, unexpectedly, the problematic firewall's CPU has returned to normal.
However, that is not entirely good news, as it also makes the cause of the hang harder to identify.
Maybe have a look at the below... TAC gave this to one of my colleagues yesterday for a somewhat similar issue.
Andy
Something to try anyway...
Thanks for that; I will look into it.
The SK actually tells you what to look for, so if those errors match, it's definitely related.
Andy
Forgot to mention: I'm fairly sure TAC will ask you to run the hardware diagnostic tool, so be prepared for that.
Andy
I ran the hardware diagnostics at the time (after rebooting the firewall when the hang occurred), and, unexpectedly, the result showed all components OK.
Meanwhile, do you know whether the disk test in the hardware diagnostics can test an SSD? That firewall uses an SSD.
About the SSD, I'm not 100% certain; you may need to verify with TAC, unless the hardware diagnostic tool results show it clearly.
Andy
Thanks for your reply. I will ask TAC about this.
Which management blades are enabled here?
Also, is there a reason that Management isn't deployed separately for this cluster?
Hi Chris, on the management server side only the standard Management Server blade is enabled; no other blades are enabled (e.g., SmartEvent, Endpoint Management Server).
This Full HA setup was built a long time ago, and moving to a distributed deployment would require the customer to bear the cost of another Security Management Server license, although I understand that a distributed deployment is recommended.
I assume you are running R80.40? This version will be EOL at the start of 2024.
Are you sure it is hardware related? It could still be software causing the crash.
I would recommend updating the firewalls anyway; R81.20 is now recommended for installation.
My second point is that the active gateway is running both the firewall and the management.
Why not move the management to the second, backup firewall?
This could help with the load on the system.
If you like this post, please give a thumbs up (kudo)! 🙂
Hi Lesley, thanks for your feedback; I have read through your suggestions.
We can't simply upgrade the customer to R81.20, for certain reasons.
I understand that software can sometimes cause a crash, but I don't know how to find out whether software is the cause here.
Yes, firewall member 1 is active for both management and firewall.
Last but not least, there is another odd thing, although I don't really think it caused this issue: the hang happened a few hours after we configured the remote backup and the scheduled backup settings.
If using Management HA, the active cluster node should be the active gateway, and the standby node should hold the active management, for performance reasons.
This does not sound like a HW issue but like a performance issue. I would advise you to investigate that further.
Hi @_Val_ , I'm working with TAC on this.
Hi all, unfortunately TAC didn't find anything related to the issue in the files we provided (CPU Spike Detective output, HCP report, and cpinfo file).
So we need to keep monitoring and wait until the issue happens again.
I would definitely consider the SK I gave you, as we had a similar issue with a customer recently.
Hi @the_rock, unfortunately, I checked the cpinfo of the problematic firewall via diagnosticsview and did not find any of the symptoms mentioned in SK180437.
Let's wait and see whether the issue happens again now that I have switched the active role back to the problematic firewall.
From my experience: a customer had a cluster (not Management HA) with one node showing an issue at certain time intervals. HW diag was OK, and the memory-leak procedure and other suggestions from TAC gave no clue. Finally, we managed to get a demo appliance to replace the node with the issue, and the customer put it into production. After the issue had not recurred for a long time, CP finally granted an RMA for the faulty appliance...
Understood, @LeeBingKang. Please keep us posted on how it goes.
Andy
Hi All
So far, I have done a fresh install and a backup restore on the problematic machine.
The reason: my team and I suspect that some files in the GAIA OS may have issues.
Why we suspect that: if the issue were in the database, the other member should be facing the same hang, since Fullsync makes sure both members have the same database. However, the other member works fine even when it becomes the active management and firewall.
That's a fair assessment, BUT... here is the question: when was the backup you restored taken? Was the FW working properly when that backup was generated?
Andy
When the issue occurred, all services (management and firewall) failed over to member 2. After we rebooted member 1, it worked fine, and we failed the firewall over to member 1 to see whether the issue could be replicated (unfortunately, it couldn't).
I then failed the firewall over from member 1 to member 2 and took a backup after the failover. I used that backup for the restore after the fresh install.
Hi guys,
The hang happened again, and this time Check Point support did proceed with an RMA for that box. I will post an update here once there is any progress.
That's good news; hopefully the RMA solves the issue... fingers crossed.
Andy
Yeah, hopefully...
Meanwhile, I will accept the RMA as the solution if everything is still working fine around 1 month after the replacement with the RMA unit.
