TRajkumar
Contributor

Delay when standby member comes up

Hi CheckMates,

Today I saw a new issue on ClusterXL.

Environment: Distributed deployment

Version: R81.20

Hotfix: 84

Cluster members: 2 Check Point appliances

When I do a cluster failover, the secondary member takes at least 10 minutes to start processing traffic. During that time all of our services are down; after 10 minutes everything works fine. I did not observe any drops in the logs (SmartConsole).

However, the cluster state shows the active/standby states correctly, with no delay on that part.

Kindly help me sort out this new problem.

 

Thanks

Rajkumar T

 

9 Replies
the_rock
Legend

Can you please send the output of the commands below when this happens?

Andy

**********************

 

cphaprob roles

cphaprob state

cphaprob -a if

cphaprob -i list

cphaprob -l list

cphaprob syncstat

 

********************************
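
The commands listed above could be gathered in one pass right after a failover. The following is only a minimal sketch, not an official Check Point tool: `collect_outputs` is a hypothetical helper name, the output path in the usage comment is an assumption, and the script is assumed to run in expert mode (bash) on each cluster member.

```shell
#!/bin/bash
# Sketch: append the output of several commands to one file, each with a
# timestamped header, so the state right after a failover can be shared.
# collect_outputs is a hypothetical helper, not part of Gaia or ClusterXL.

collect_outputs() {
    local out_file="$1"
    shift
    local cmd
    for cmd in "$@"; do
        {
            echo "===== $(date '+%Y-%m-%d %H:%M:%S') $cmd ====="
            # Unquoted on purpose: "cphaprob -a if" must split into command + arguments.
            $cmd 2>&1
            echo
        } >> "$out_file"
    done
}

# Usage on a cluster member (the path is an assumption, adjust as needed):
# collect_outputs "/var/log/failover_$(hostname)_$(date +%Y%m%d_%H%M%S).txt" \
#     "cphaprob roles" "cphaprob state" "cphaprob -a if" \
#     "cphaprob -i list" "cphaprob -l list" "cphaprob syncstat"
```

Running it on both members immediately after the failover test gives two files whose timestamps can be lined up against the start of the outage.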

 

Personally, I have never seen such an issue, even back in R55.

Chris_Atkinson
Employee

Is there any dynamic routing involved or are there issues with stale ARP entries?

Does the issue occur regardless of which member is active or standby?

CCSM R77/R80/ELITE
TRajkumar
Contributor

Hi Chris,

There is no dynamic routing.

emmap
Employee

Have you run any tcpdumps and/or traffic captures to see if the packets are reaching the gateway during the outage period?
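
A capture along these lines could show whether ARP and test traffic actually reach the newly active member during the outage. This is a minimal sketch under assumptions: `capture_failover_traffic` is a hypothetical helper name, and `eth1` and `192.0.2.10` in the usage comment are placeholders for the real cluster interface and a real test host.

```shell
#!/bin/bash
# Sketch: watch ARP plus traffic to/from one test host on a given interface.
# -nn  no name or port resolution
# -e   print link-level headers, so source and destination MACs are visible
# The function name, interface and IP address are illustrative assumptions.

capture_failover_traffic() {
    local iface="$1"
    local test_ip="$2"
    tcpdump -nn -e -i "$iface" "arp or host $test_ip"
}

# Usage on the newly active member, stop with Ctrl+C:
# capture_failover_traffic eth1 192.0.2.10
```

If no packets for the test host show up on the new active member while the outage lasts, the problem is more likely upstream (switch or ARP caches) than in the firewall policy.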

Timothy_Hall
Legend

Sounds like a Gratuitous ARP issue (Gratuitous ARP is the default setting). Do you have VMAC set on the cluster object? That should help, but if you still experience a 10-12 second delay upon failover even after setting VMAC, you'll need to set portfast (NOT disable STP) on the switch ports the firewalls are connected to.

If everything is working properly, upon failover you should see the following traffic behavior:

Catastrophic Failover (active completely dies/crashes): Outage of up to 2.5 seconds

Non-Catastrophic Failover (active interface failure, clusterXL_admin down, etc.): Outage of up to 300 milliseconds
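
To compare a real failover against the figures above, the outage can be roughly timed from a client behind the cluster. The following is a minimal sketch, assuming a Linux client with the standard `ping` utility; `watch_outage` is a hypothetical helper name and `192.0.2.10` is a placeholder target. With one probe per second it cannot resolve sub-second outages, but it is enough to tell a few seconds from several minutes.

```shell
#!/bin/bash
# Sketch: send one ping per second and report when replies stop and resume.
# watch_outage is a hypothetical helper; resolution is about one second.

watch_outage() {
    local target="$1"
    local probes="$2"
    local lost=0
    local i
    for ((i = 0; i < probes; i++)); do
        if ping -c 1 -W 1 "$target" > /dev/null 2>&1; then
            if [ "$lost" -gt 0 ]; then
                echo "$(date '+%H:%M:%S') reply restored after $lost lost probes"
                lost=0
            fi
        else
            if [ "$lost" -eq 0 ]; then
                echo "$(date '+%H:%M:%S') first lost probe"
            fi
            lost=$((lost + 1))
        fi
        sleep 1
    done
}

# Usage: start this, then trigger the failover; 900 probes cover about 15 minutes.
# watch_outage 192.0.2.10 900
```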

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
the_rock
Legend

@TRajkumar 

Actually, @Timothy_Hall makes a very valid point. Can you check whether the option below is enabled?

Andy

 

Screenshot_1.png

TRajkumar
Contributor

Dear Timothy,

Thanks, I will try this.

 

Thanks

Rajkumar T

the_rock
Legend

Try toggling that option, install policy, then run a failover test and see what happens. If there is no change, maybe open a TAC case to investigate further.

Andy

Lesley
Leader

Share the output of fw tab -t connections -s from both members, taken at the same time.

This will show whether the connections are synced.
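
Because the comparison only makes sense when both snapshots are taken close together, it may help to stamp each one with the member name and time. A minimal sketch, assuming expert mode on each member; `snapshot_connections` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print the member name and current time, then the connections table summary.
# snapshot_connections is a hypothetical helper, not a Check Point command.

snapshot_connections() {
    echo "member: $(hostname)  time: $(date '+%Y-%m-%d %H:%M:%S')"
    fw tab -t connections -s
}

# Usage: run on both members within the same few seconds and compare the entry counts.
# snapshot_connections
```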

