Scott_Paisley
Collaborator

Gateway Cluster Failover procedure

Hi

Years ago we had a third-party support vendor managing our Check Point firewalls, and they used to fail over traffic in a cluster by downing one of the members.

On a support call with Check Point about an unrelated issue, the Check Point engineer advised us not to do that. Instead, we were told to change the member priority in the cluster object in management and install policy, and we have been using that process ever since.

Recently, on another support call with a different set of Check Point engineers, we were told that this process is not recommended for cluster failover.

What method do you all use, and is there a documented, recommended process we should be following?

Thanks

Chris_Atkinson
Employee

Each has its merits depending on the scenario you're working in. What's the context of the event that warrants the failover?

Scott_Paisley
Collaborator

We use it whenever we want traffic to pass through the other box in a cluster, whether for maintenance or troubleshooting.

Tal_Paz-Fridman
Employee

Hi 

I don't know if this is what you were looking for, but the official documentation has a section on "How to Initiate Cluster Failover":

https://sc1.checkpoint.com/documents/R81.10/WebAdminGuides/EN/CP_R81.10_ClusterXL_AdminGuide/Topics-...

It also leads to a Best Practice SK:

Best Practices - Manual fail-over in ClusterXL

https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

Scott_Paisley
Collaborator

Thanks

That matches what we heard from the support engineers this week.

I guess my other question now is: were we the only ones who changed the member priority and pushed policy as a way of moving traffic between cluster members?

Chris_Atkinson
Employee

It can depend on the resolver groups involved; in some organisations, not everyone has direct CLI access to a firewall cluster member.

As you're probably aware, the other method also needs an extra step to return the node to a standby state.

rrbranco
Contributor

I think it would also be worth taking the cluster type into consideration, for instance VRRP versus ClusterXL.

the_rock
Authority

I agree 100% with what @Tal_Paz-Fridman said in this thread. I have been doing it the same way for years and never had an issue. Essentially, if you run clusterXL_admin down on the current active member, it stops being active and the peer takes over; when you later run clusterXL_admin up, it stays standby. So in my opinion, it is definitely the safest way to do a failover. I'm not sure what the different engineers told you, but I am positive this is the recommended Check Point process.
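A minimal sketch of that sequence, assuming it runs in Expert mode on the member that is currently Active. clusterXL_admin and cphaprob state are the real commands discussed here; the wrapper functions (run, failover_out, failover_back) and the DRY_RUN switch are hypothetical, added only so the steps can be reviewed before anything is executed.

```shell
#!/bin/sh
# Hypothetical wrapper around the manual ClusterXL failover steps.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0
# in Expert mode on the Active member to actually execute them.

DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1: move traffic off this member.
failover_out() {
    run cphaprob state          # confirm this member is currently Active
    run clusterXL_admin down    # take this member out of service; the peer takes over
    run cphaprob state          # confirm the peer is now Active
}

# Step 2: after maintenance, return this member to the cluster.
# With the "Maintain current active Cluster Member" recovery setting it
# should come back as Standby; with "Switch to higher priority Cluster
# Member" it may become Active again.
failover_back() {
    run clusterXL_admin up
    run cphaprob state          # check the final member states
}
```

Keeping the dry run as the default is a deliberate choice: printing the commands first lets you confirm you are on the right member before taking it down.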

Magnus-Holmberg
Advisor

What happens if you're using "Switch to higher priority Cluster Member" as the member recovery setting? When you run clusterXL_admin up, I think it switches over at that point (not 100% sure).

https://www.youtube.com/c/MagnusHolmberg-NetSec
Scott_Paisley
Collaborator

Thanks for all the replies. They prompted another question.

If I admin down one box in the cluster, what happens if the other box then fails?

Perhaps I should have mentioned that traffic might stay on the other member for an extended period?

the_rock
Authority

That is a very unlikely scenario. I mean, if you were using, say, ISP Redundancy and you downed one link to test the second one, what are the chances that the second one went down right after? That's literally less than a 1% possibility... highly unlikely.

Arne_Boettger
Contributor

Well, it depends... clusterXL_admin down creates a failed cluster resource, making the cluster member "less healthy" than the other.

Smaller failures on the other member, such as an interface going down, should not make the downed member active again, but if the other member fails completely (crashes or reboots), the downed member still becomes active.

the_rock
Authority

I know some people use cpstop as well, but personally I'm not a big fan of doing it that way, since it removes the currently installed policy. I would definitely follow what @Tal_Paz-Fridman suggested; besides, it is the vendor-recommended method.

Tal_Paz-Fridman
Employee

You can run cpstop with specific flags that maintain the policy and connection table:

sk113045 https://supportcenter.checkpoint.com/supportcenter/portal?action=portlets.SearchResultMainAction&eve...

[Expert@HostName:0]# cpstop -fwflag -proc

Running this command will stop Check Point daemons and Security Servers, while maintaining the active Security Policy running in the Check Point kernel. Rules with generic Allow/Reject/Drop actions, based on services, will continue to work.

Or if you want to load the Default Filter:

[Expert@HostName:0]# cpstop -fwflag -default

Running this command will stop Check Point daemons and Security Servers. The active Security Policy running in the Check Point kernel will be replaced with the Default Filter policy.

Also see:

https://sc1.checkpoint.com/documents/R81.10/WebAdminGuides/EN/CP_R81.10_CLI_ReferenceGuide/Topics-CL...

cpstop -fwflag -proc 

  • Shuts down Check Point processes

  • Keeps the currently loaded kernel policy

  • Maintains the Connections table, so that after you run the cpstart command, you do not experience dropped packets because they are "out of state"
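A minimal sketch of the stop/start cycle with either flag, assuming Expert mode on the member being worked on. cpstop -fwflag -proc, cpstop -fwflag -default and cpstart come from the posts above; the wrapper functions (run, cpstop_cycle) and the DRY_RUN switch are hypothetical and only print the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical wrapper around the cpstop flags quoted above.
# DRY_RUN=1 (the default) prints each command instead of running it.

DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop and restart Check Point processes.
#   proc    -> keep the kernel policy and the Connections table while stopped
#   default -> replace the kernel policy with the Default Filter while stopped
cpstop_cycle() {
    mode="$1"
    case "$mode" in
        proc|default) ;;
        *) echo "usage: cpstop_cycle proc|default" >&2; return 1 ;;
    esac
    run cpstop -fwflag "-$mode"
    run cpstart
}
```

For example, `cpstop_cycle proc` prints the cpstop -fwflag -proc and cpstart commands; anything other than proc or default is rejected rather than passed through to cpstop.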

the_rock
Authority

Thanks for that. I wasn't aware of that SK, but it's helpful!

Andy
