Solved: Re: Weird problem when replacing 5600 cluster with...

JonSnow1 · ‎2023-12-05

Hi all

I need to replace our 5600 gateway cluster with a new 6600 cluster with as little downtime as possible. Ideally we want established connections to survive, but if that is not possible we need at least one firewall available to handle new connections. I followed the advice in "Solved: Replace/Upgrade Cluster - Check Point CheckMates", but I ran into a weird issue, as follows:

I disconnected the 5600-standby firewall and connected the new 6600-standby firewall in its place.

I established SIC and installed the policy on the cluster, and the policy installed fine.

I ran cphaprob stat on each gateway. The 5600 showed as "Active Attention" [due to the mismatch in CPUs], and the 6600 showed as "Standby". I checked the OSPF routes on the 6600 and it had learned all the correct routes, so I thought it was safe to proceed with the failover.

I ran "clusterXL_admin down" on the 5600 and the 6600 status went to "Active". It initially appeared to be processing traffic fine, but I then noticed that all outbound traffic to the internet was being dropped, and the logs showed these drops simply as "Rulebase internal error", with no other information. I had an Endpoint Security VPN connection from my laptop established before the failover. It survived the failover and I could still reach the internal network, but any new outbound connections to the internet were showing in the logs as "Rulebase internal error" and failing to connect.

As I had not come across this error before, I failed back to the 5600 gateway and traffic went back to normal. I then brought the 5600-standby back online.

Can anyone see where I went wrong with this, or offer any advice as to how I should troubleshoot it if it happens again?

Bob_Zimmerman · ‎2023-12-05

There's not a good way to do this. A 5600 has a Core i5-4590S with four cores, and a 6600 has a Core i5-9500E with six cores. They can't synchronize with the default settings.

You could manually set the 6600 to the same CoreXL config as the 5600, and sync should work, but it wouldn't be supported (so the support call center wouldn't help you), and you would be leaving ~30% performance on the table.

The only supported way to do this is to take a hard outage. Configure both new members. Shut down both old members. Bring up both new members. Establish trust between the management and the new members. Push policy.
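
As a rough illustration of where a figure like ~30% can come from, here is a minimal sketch of the core-count arithmetic. It assumes throughput scales with the number of cores in use, which is a simplification, and the function name idle_core_pct is made up for this example. Note also that the CoreXL firewall instance count is not the same thing as the physical core count; the instances actually configured on each member can be checked with fw ctl multik stat.

```shell
#!/bin/sh
# Rough arithmetic only: percentage of cores left unused when a larger
# appliance is limited to the core count of a smaller one.
# Assumption: performance scales with cores in use (a simplification).
# idle_core_pct is a hypothetical helper name, not a Check Point command.
idle_core_pct() {
    total="$1"   # cores on the new appliance (six on the 6600, per the post)
    used="$2"    # cores it is limited to (four, matching the 5600)
    # Integer arithmetic: the fractional part is truncated.
    echo $(( (total - used) * 100 / total ))
}

idle_core_pct 6 4   # prints 33
```

Two of six cores idle is about a third, which is in line with the ~30% estimate.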

Chris_Atkinson · ‎2023-12-05

Strictly speaking this isn't a supported procedure.

With that said, could you please supply some further information? As a minimum to start: what blades, versions and Jumbos are involved?

R81.10 JHF T110...

PRJ-46113,PRHF-28489: Security Gateway

In rare scenarios, the Security Gateway may drop the traffic after "Rulebase Internal Error" which occurs during policy installation.

CCSM R77/R80/ELITE

JonSnow1 · ‎2023-12-05

Hi Chris

Blades = Firewall, IPSec VPN, Mobile Access

Versions = R81.20 on 5600 and 6600

Jumbos = the 5600s have no HFAs [fresh install and upgrade image]; the 6600s have HFA Take 26 [Blink image]

JonSnow1 · ‎2023-12-05

Hi Chris

As you mentioned, this isn't really a supported way of doing this. Could you confirm what the correct procedure is to replace an R81.20 cluster of 5600 appliances with an R81.20 cluster of 6600 appliances with the same IP addresses, please? Is there any way to do it with at least one gateway online to process new connections?

Chris_Atkinson · ‎2023-12-05

@Bob_Zimmerman captured it well. Unfortunately, outside of Maestro and maybe ElasticXL (future), clustering of different models isn't expected to be seamless or supported.

Martijn · ‎2023-12-05

Hi,

Is the naming of the interfaces the same on the 5600 and 6600 appliances?

Did you run a simple debug such as 'fw ctl zdebug + drop' to get more information about those drops?
Or, even better, a full kernel debug should show detailed info about those drops.

Regards,
Martijn
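
If the failover is attempted again, the drop debug mentioned above could be captured to a file and searched afterwards. The following is only a sketch under stated assumptions: the helper names capture_drops and search_drops, the output path, and the search pattern are all made up for this example, and the capture has to be run in expert mode on the gateway and stopped with Ctrl+C.

```shell
#!/bin/sh
# Sketch only. capture_drops and search_drops are hypothetical helper names;
# the default output path is an arbitrary choice.

# Run the drop debug suggested above and keep a copy of its output.
# Stop it with Ctrl+C once the failing traffic has been reproduced.
capture_drops() {
    out="${1:-/var/log/drop_debug.txt}"
    if ! command -v fw >/dev/null 2>&1; then
        echo "fw not found: run this on the gateway in expert mode" >&2
        return 1
    fi
    fw ctl zdebug + drop 2>&1 | tee "$out"
}

# Case-insensitive search of a saved capture for a pattern of interest.
search_drops() {
    file="$1"
    pattern="$2"
    grep -i -- "$pattern" "$file"
}
```

For example, run capture_drops on the newly active member right after the failover, reproduce one failing outbound connection, stop the capture, then run search_drops /var/log/drop_debug.txt "rulebase" to pull out any matching lines.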

JonSnow1 · ‎2023-12-05

Hi Martijn

Yes, the naming of the interfaces is the same [eth1 to eth8]. Unfortunately I didn't run the debugs.
