prisciltetchou1
Explorer

R80.40 VSX ClusterXL down after failover

Hello All, 

Please I need your help. 

We have a cluster of 2 VSX gateways running Gaia R80.40.

After a failover, one of the members is stuck in the "Down" state. From the attached screenshot, it seems that the fwd process has crashed.

We ran cpstop / cpstart on the firewall, but the fwd process did not restart.

In the fwd logs, I can see this line, which I cannot interpret: Unable to open '/vs0/dev/fw6v0': Connection refused.

Could someone please help?

0 Kudos
1 Solution

Accepted Solutions
Chris_Atkinson
Employee

Very possibly. The following fix in R80.40 JHF T198 may be relevant; you could verify this with TAC.

PRJ-44434,PMTR-89908 - ClusterXL

UPDATE: Improved the fullsync time after reboot in large scale environments. Refer to sk180742

https://support.checkpoint.com/results/sk/sk180742

CCSM R77/R80/ELITE

0 Kudos
15 Replies
AkosBakos
Leader

Hi @prisciltetchou1 

Something may be wrong with the fwkern.conf file. Did you make any changes to the file?

https://support.checkpoint.com/results/sk/sk92810

Akos

----------------
\m/_(>_<)_\m/
0 Kudos
prisciltetchou1
Explorer

Hello @AkosBakos

Thanks for your reply. 

The file has not been modified. 

But I will compare the file of the faulty firewall with that of the active one. 
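
Such a comparison could be sketched as follows. This is only a minimal example, assuming the fwkern.conf file from each member has already been copied to one machine; the function name `diff_conf` and the local file names in the usage note are hypothetical, not anything provided by Check Point.

```python
import difflib
import sys
from pathlib import Path


def diff_conf(path_a: str, path_b: str) -> list[str]:
    """Return unified-diff lines between two text files.

    Blank lines and trailing whitespace are ignored so that only
    meaningful differences between the two copies are reported.
    """
    def load(path: str) -> list[str]:
        text = Path(path).read_text()
        return [line.rstrip() for line in text.splitlines() if line.strip()]

    return list(
        difflib.unified_diff(
            load(path_a), load(path_b),
            fromfile=path_a, tofile=path_b, lineterm="",
        )
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: pass the two local copies as arguments.
    for diff_line in diff_conf(sys.argv[1], sys.argv[2]):
        print(diff_line)
```

For example, it could be run as `python3 diff_conf.py fwkern_faulty.conf fwkern_active.conf` (hypothetical file names). An empty output would mean the two copies carry the same settings.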

0 Kudos
AkosBakos
Leader

Hi @prisciltetchou1 

Maybe it was modified earlier, but the reboot happened only today, which is why the problem arose only today.

Akos

----------------
\m/_(>_<)_\m/
0 Kudos
prisciltetchou1
Explorer

We have had the problem for months, since a failover, and we had already rebooted the firewall before.

Yesterday we just did another reboot.  

0 Kudos
AkosBakos
Leader

I see. Could you move or rename the fwkern.conf file (only as a test) and then reboot again? If the member comes up as Standby, we have found the root cause.

Akos

----------------
\m/_(>_<)_\m/
0 Kudos
Chris_Atkinson
Employee

Which JHF version is installed on this machine, and have you reviewed the issue with TAC?

Note that R80.40 became EOL in April of this year, so please start planning your upgrade to a supported version.

CCSM R77/R80/ELITE
0 Kudos
prisciltetchou1
Explorer

Hello @Chris_Atkinson

The JHF version is Take 180 on both firewalls.

Do you think upgrading to R81.20 or updating the JHF could solve the issue?

0 Kudos
Chris_Atkinson
Employee

Very possibly. The following fix in R80.40 JHF T198 may be relevant; you could verify this with TAC.

PRJ-44434,PMTR-89908 - ClusterXL

UPDATE: Improved the fullsync time after reboot in large scale environments. Refer to sk180742

https://support.checkpoint.com/results/sk/sk180742

CCSM R77/R80/ELITE
0 Kudos
prisciltetchou1
Explorer

Hello All, 

I have not yet been in touch with TAC; our local partner is still trying to open a case for us.

In the meantime, I just noticed that the licenses for Anti-Bot and Anti-Virus have expired on the faulty firewall.

Could that be the problem?

0 Kudos
Chris_Atkinson
Employee

License status of these blades is not a factor for ClusterXL.

I would patch the firewalls with the latest recommended JHF. It is your call whether you want prior validation from TAC.

CCSM R77/R80/ELITE
0 Kudos
prisciltetchou1
Explorer

Hello, 

Thank you @Chris_Atkinson for your response. Installing JHF T211 solved the issue.

But after the upgrade from R80.40 to R81.20 JHF T89, we noticed that the MAC address of the firewalls' management interface had changed. Is this normal? What could be the reason?

Thanks,

0 Kudos
Chris_Atkinson
Employee

Is the Management interface configured as a bond? If so, its MAC address might change if the slaves came up in a different order.
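
To illustrate the idea, here is a minimal sketch. It assumes standard Linux bonding with default settings, where the bond inherits the MAC address of the first slave that joins it; whether this is exactly what happened on these gateways is something to confirm with TAC. The function name, interface names and MAC addresses are hypothetical (the addresses come from the range reserved for documentation).

```python
def bond_mac(slaves_in_join_order: list[tuple[str, str]]) -> str:
    """Return the MAC a bond would inherit under the stated assumption.

    Each entry is (interface_name, mac_address), listed in the order
    the slaves joined the bond. The first slave's MAC is used.
    """
    if not slaves_in_join_order:
        raise ValueError("a bond needs at least one slave interface")
    _first_name, first_mac = slaves_in_join_order[0]
    return first_mac


# Hypothetical slave interfaces with documentation-range MAC addresses.
ETH_A = ("eth_a", "00:00:5e:00:53:01")
ETH_B = ("eth_b", "00:00:5e:00:53:02")

if __name__ == "__main__":
    # Same two slaves, different join order, different bond MAC.
    print("before:", bond_mac([ETH_A, ETH_B]))
    print("after: ", bond_mac([ETH_B, ETH_A]))
```

With the same two slaves, the bond's MAC differs depending only on which one joined first, which is how a reboot or upgrade could change it without any configuration change.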

CCSM R77/R80/ELITE
0 Kudos
prisciltetchou1
Explorer

I am sorry, but I am not sure I understand your response.

Could you please be more explicit? If you have a link or document that explains the process, I would be grateful if you could share it.

The environment is a cluster of 2 VSX firewalls with 1 VS each. The management interface is not a bond.

0 Kudos
Chris_Atkinson
Employee

What hardware / appliance is used? Do you have VMAC configured?

It seems sk98219 doesn't apply in your case. I would follow up with TAC if the problem persists or is creating an issue.

https://support.checkpoint.com/results/sk/sk98219

CCSM R77/R80/ELITE
0 Kudos
prisciltetchou1
Explorer

Hi @Chris_Atkinson

The SK definitely applies to my case. I made a mistake: the Mgmt interface is on a bond.

Thank you very much for your valuable help!

Thanks to all who took the time to help me!

0 Kudos
