LeeBingKang
Advisor

Question regarding RMA of a firewall for which the diagnostic tool shows all OK

Hi all,

 

Recently, a client of mine faced an issue in which one of their FULL HA firewalls suddenly hangs, and this has happened more than twice within a few months.

 

The 1st hang occurred around 30 May 2023, and the firewall worked fine as usual after a reboot. We then ran hardware diagnostics via the command "diagMain" and the result showed OK. We opened a case with TAC, and TAC suggested installing the latest recommended Jumbo Hotfix (Take 197), as it resolves some memory-related issues. We then installed Jumbo Hotfix Take 197 successfully on both FULL HA firewall members on 22 July 2023.

 

The 2nd hang occurred on 1 August 2023, on the same firewall, while it was acting as the active firewall and management (since 22 July 2023). The firewall came back up after a reboot, and the diagnostic result (via the command diagMain) again showed all OK. However, we monitored it for more than 30 minutes and noticed that its CPU utilization was inconsistent (at times reaching more than 100% for the Java process). Moreover, we tried to move the active management 

 

Hence, I would like to ask for your advice: can the fact that this hang has happened more than twice within a few months be grounds for an RMA?

 

Thank you.

 

 

 

0 Kudos
1 Solution

Accepted Solutions
LeeBingKang
Advisor

Dear All,

 

Latest update: the firewalls have been working fine since the RMA (around 3 months).

 

Hence, I believe RMA is the solution for this kind of issue.

 

Please give comments if you guys have any.

 

Thank you.


41 Replies
the_rock
Legend
Legend

This certainly warrants a TAC case for further investigation.

Andy

0 Kudos
LeeBingKang
Advisor

If you say so, I will open a case with TAC for further investigation and hopefully get an RMA for that unit.

the_rock
Legend
Legend

Well, it's simply my logical suggestion, based on all the details you have given.

Regards,

Andy

0 Kudos
LeeBingKang
Advisor

Yes @the_rock, I understand that it is your suggestion, and I'm sorry about my wording in the previous post.

I have already opened a case with TAC on this, and one thing I did not expect is that the problematic firewall's CPU has come back to normal.

However, that is not a good thing, as it makes it harder for us to identify the cause of the hang.

0 Kudos
the_rock
Legend
Legend

Maybe have a look at below...TAC gave this to one of my colleagues yesterday for somewhat similar issue.

Andy

 

Something to try anyway...

https://support.checkpoint.com/results/sk/sk180437

0 Kudos
LeeBingKang
Advisor

Thanks for that; I will look into it.

0 Kudos
the_rock
Legend
Legend

The SK actually tells you what to look for, so if those errors match, it's definitely related.

Andy

0 Kudos
the_rock
Legend
Legend

Forgot to mention: TAC, I'm fairly sure, will ask you to run the hardware diag tool, so be prepared for that.

Andy

https://support.checkpoint.com/results/sk/sk97251

0 Kudos
LeeBingKang
Advisor

I ran the hardware diagnostics at that time (after rebooting the firewall when the hang occurred), and the result showed all components as OK, which I did not expect.

Meanwhile, do you know whether the disk test in the hardware diagnostics is able to test an SSD? That firewall uses an SSD.

0 Kudos
the_rock
Legend
Legend

About the SSD, I'm not 100% certain; you may need to verify with TAC, unless the hardware diag tool results show it clearly.

Andy

0 Kudos
LeeBingKang
Advisor

Thanks for your reply.  I will ask TAC about this.

Chris_Atkinson
Employee Employee
Employee

Which management blades are enabled here?

Also, is there a reason that Management isn't deployed separately for this cluster?

CCSM R77/R80/ELITE
0 Kudos
LeeBingKang
Advisor

Hi Chris, the management server part only has the normal management server blade enabled; no other blades are enabled (e.g. SmartEvent, endpoint management server).

 

This FULL HA setup was built a long time ago, and moving to a distributed deployment would require the customer to bear the cost of another security management server license, although I understand that distributed mode is recommended.

0 Kudos
Lesley
Leader Leader
Leader

I assume you are running R80.40? This version will be EOL at the start of 2024. 

Are you sure it is hardware related? It still can be software that causes the crash. 

I would recommend to update the firewalls anyway, R81.20 is now recommended for installation. 

Second feedback is that the active gateway is running the fw part and mgmt part.

Why not move the mgmt part over to the second backup firewall? 

This could help with the load of the system. 

-------
If you like this post please give a thumbs up(kudo)! 🙂
0 Kudos
LeeBingKang
Advisor

Hi Lesley. Thanks for your feedback; I have read through your suggestions.

We can't simply upgrade the customer to R81.20, for certain reasons.

I understand that software can sometimes cause a crash, but I have no idea how to find out which software is causing it.

Yes, firewall member 1 is active for both management and firewall.

Last but not least, there is one more odd thing, although I don't really think it caused this issue: the hang happened a few hours after we configured the remote backup and scheduled backup settings.

0 Kudos
G_W_Albrecht
Legend Legend
Legend

If using Management HA, the active node should be the GW and the standby node should be the active management, for performance reasons.

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist
0 Kudos
_Val_
Admin
Admin

This does not sound like a HW issue but like a performance issue. I would advise you to investigate that further.

0 Kudos
LeeBingKang
Advisor

Hi @_Val_ , I'm working with TAC on this.

LeeBingKang
Advisor

Hi all, unfortunately TAC didn't find anything related to the issue in the files provided (CPU spike detective, HCP report, and cpinfo file).

Hence, we need to monitor and wait until the issue happens again.
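
While waiting for the issue to recur, one way to keep a record of CPU-heavy processes is to sample the process list periodically and log anything above a threshold. The following is only a minimal sketch, not a Check Point tool: it assumes a Python 3.7+ interpreter and a Linux `ps` that supports `-eo pid,pcpu,comm`, and the function names and the 90% threshold are hypothetical choices for illustration. Note that `ps` reports %CPU averaged over the lifetime of a process, and a multi-threaded process on a multi-core machine can legitimately show more than 100%.

```python
import subprocess


def parse_ps(text):
    """Parse `ps -eo pid,pcpu,comm` output into (pid, cpu_percent, command) tuples."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # ignore malformed lines
        pid, pcpu, comm = parts
        try:
            rows.append((int(pid), float(pcpu), comm.strip()))
        except ValueError:
            continue  # ignore lines whose numeric columns do not parse
    return rows


def busy_processes(rows, threshold=90.0):
    """Return rows at or above the threshold, highest CPU first.

    The 90% default is an arbitrary illustrative value, not a vendor recommendation.
    """
    hits = [r for r in rows if r[1] >= threshold]
    return sorted(hits, key=lambda r: r[1], reverse=True)


def sample(threshold=90.0):
    """Take one sample from the local `ps` (lifetime-average %CPU per process)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return busy_processes(parse_ps(out), threshold)
```

Calling `sample()` from a scheduled job and appending the result to a log file with a timestamp would give a timeline to compare against the moment of the next hang. Because `ps` averages over process lifetime, short spikes may be smoothed out; the same parsing approach could be pointed at another tool's output if instantaneous values are needed.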

the_rock
Legend
Legend

I would definitely consider the SK I gave you, as we had a similar issue with a customer recently.

0 Kudos
LeeBingKang
Advisor

Hi @the_rock, unfortunately, I checked the cpinfo of the problematic firewall via diagnosticsview and did not find any of the symptoms mentioned in SK180437.

 

Let's wait and see if the issue happens again now that I have switched the active firewall role to the problematic firewall.

 

0 Kudos
G_W_Albrecht
Legend Legend
Legend

From my experience: a customer had a cluster (not Management HA) with one node showing an issue at certain time intervals. HW diag was OK, and the memory leak procedure and other suggestions from TAC did not give any clue. Finally, we managed to get a demo appliance as a replacement for the node with the issue, and the customer put it into production. After the issue did not reoccur for a longer time, CP finally granted an RMA for the faulty appliance...

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist
the_rock
Legend
Legend

Understood @LeeBingKang. Please keep us posted on how it goes.

Andy

LeeBingKang
Advisor

Hi All

 

So far, what I have done is a fresh install and backup restore on the problematic machine.

The reason for this action: my team and I suspect that some files in the Gaia OS may be faulty.

Why we suspect that: if the issue were in the database, the other member should be facing the same hang issue, as full sync makes sure both members have the same database. However, the other member works fine even when it becomes the active management and firewall.

0 Kudos
the_rock
Legend
Legend

That's a fair assessment, BUT... here is the question. The backup you restored, when was it taken? Was the FW working properly when that backup was generated?

Andy

0 Kudos
LeeBingKang
Advisor

When the issue occurs, all services (management and firewall) fail over to member 2. After rebooting member 1, it worked fine, and we failed the firewall over to member 1 to see if the issue could be replicated (unfortunately, we couldn't replicate it).

With that, I failed the firewall over from member 1 to member 2 and took a backup after the failover. I then used that backup for the restore after the fresh install.

0 Kudos
LeeBingKang
Advisor

Hi guys,

 

The hang issue happened again, and this time Check Point support is proceeding with an RMA for that box. I will update here again once there is any progress.

 

 

the_rock
Legend
Legend

That's good news, hopefully the RMA solves the issue... fingers crossed.

Andy

0 Kudos
LeeBingKang
Advisor

Yea, hopefully...

 

Meanwhile, I will accept RMA as the solution if everything is still working fine around 1 month after the unit is replaced with the RMA unit.

0 Kudos
