Hi all,
This is urgent and any comments are more than welcome.
Environment:
- Two gateways (9100 appliances), R81.20 Take 99, ClusterXL in High Availability mode, no preemption
- 6 physical interfaces grouped into 3 bond interfaces
- 3 802.3ad bond interfaces: WAN (eth1-01, eth1-02), LAN (eth1-03, eth1-04), Sync (eth1, eth2)
- Failover is configured to trigger if any one bond interface becomes unavailable
- Bond advanced settings are left at their defaults
- The additional NICs run at 10 Gbps
I wanted to verify that failover works as configured.
To test it, I unplugged cables one at a time and checked the cluster state with # cphaprob state.
After that, I unplugged all the cables belonging to one bond interface to bring that bond down,
and observed whether failover occurred.
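To put numbers on the state changes seen with `cphaprob state` during this kind of cable-pull test, a small poller can timestamp every change of the local member's state instead of relying on repeated manual runs. This is a minimal sketch, assuming Python 3.7+ is available in expert mode on the member and that the local row of `cphaprob state` contains `(local)` followed by a state word such as ACTIVE, STANDBY or DOWN; the regex, the function names and the 0.5 s interval are illustrative choices, not Check Point tooling.

```python
import re
import shutil
import subprocess
import time
from typing import Iterable, List, Optional, Tuple

# Assumed row layout: "1 (local)  10.0.0.1  100%  ACTIVE  gw1". Adjust if your version differs.
LOCAL_ROW = re.compile(r"\(local\).*?\b(ACTIVE|STANDBY|DOWN|READY|INIT)\b", re.IGNORECASE)


def local_state(output: str) -> Optional[str]:
    """Return the local member's state word from `cphaprob state` text, or None."""
    for line in output.splitlines():
        match = LOCAL_ROW.search(line)
        if match:
            return match.group(1).upper()
    return None


def transitions(samples: Iterable[Tuple[float, str]]) -> List[Tuple[float, str, str]]:
    """Reduce (timestamp, state) samples to (timestamp, old_state, new_state) changes."""
    changes = []
    previous = None
    for ts, state in samples:
        if previous is not None and state != previous:
            changes.append((ts, previous, state))
        previous = state
    return changes


def poll(interval: float = 0.5, duration: float = 60.0) -> List[Tuple[float, str]]:
    """Run `cphaprob state` every `interval` seconds for `duration` seconds."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        result = subprocess.run(["cphaprob", "state"], capture_output=True, text=True)
        samples.append((time.monotonic(), local_state(result.stdout) or "UNKNOWN"))
        time.sleep(interval)
    return samples


if __name__ == "__main__":
    if shutil.which("cphaprob") is None:
        print("cphaprob not found; run this in expert mode on a cluster member")
    else:
        samples = poll()
        start = samples[0][0] if samples else 0.0
        for ts, old, new in transitions(samples):
            print(f"+{ts - start:6.1f}s  {old} -> {new}")
```

Start it on each member, pull the cables, and compare the printed offsets. The resolution is limited by the polling interval plus however long each `cphaprob state` call takes.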
Most of the tests went well.
However, I observed that the standby member occasionally went down for a few seconds
when only one member interface of a bond interface was unplugged.
It quickly returned to the standby state.
When I forced a failover by unplugging all cables from the physical members of a bond interface,
it took the cluster nearly 20 seconds to change its state from Active to Down and then to Standby.
This is not what I expected.
My hypothesis is that the LACP link failover timing and the ClusterXL failover timing interact badly.
Your thoughts?
Saitoh
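For context on the timing hypothesis above: in IEEE 802.1AX (formerly 802.3ad), an LACP port discards its partner's information after three missed LACPDU periods, so the configured LACP rate sets how long a silent partner can go unnoticed. These are the standard's timer values, not measurements from this cluster.

```latex
T_{\text{timeout}} = 3\,T_{\text{periodic}}, \qquad
T_{\text{timeout}}^{\text{fast}} = 3 \times 1\,\text{s} = 3\,\text{s}, \qquad
T_{\text{timeout}}^{\text{slow}} = 3 \times 30\,\text{s} = 90\,\text{s}
```

Pulling a cable is normally detected through loss of link (MII status) within the bond's polling interval rather than through these timeouts, so the LACP timers matter mainly when the link stays up but LACPDUs stop arriving.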
Hey Saitoh,
First off, if it's urgent, I suggest calling TAC and opening a case to see if you can speak with someone. Second, would you mind sending the output of the commands below from both members?
cphaprob roles
cphaprob state
cphaprob -a if
cphaprob -i list
cphaprob -l list
cphaprob syncstat
Andy
How many members do you have in the LACP bond?
What does cat /proc/net/bonding/bond<X> say?
Is everything OK? What is the "churned" state?
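The 802.3ad fields in that file can be checked mechanically. Below is a heuristic sketch, assuming the Linux bonding driver's usual field names (`Partner Mac Address`, `Aggregator ID`, `MII Status`, and on newer kernels `Actor Churn State` / `Partner Churn State`); older kernels may omit the churn lines, in which case those checks are simply skipped. An all-zero partner MAC, up members sitting in different aggregators, or a churned state all suggest that LACP negotiation with the switch is not completing. The function names are hypothetical.

```python
import sys
from typing import Dict, List, Tuple

ZERO_MAC = "00:00:00:00:00:00"


def parse_bonding(text: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Split a /proc/net/bonding dump into bond-level fields and one dict per slave."""
    bond: Dict[str, str] = {}
    slaves: List[Dict[str, str]] = []
    current = bond
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # blank lines and headers such as "802.3ad info"
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = {"name": value}
            slaves.append(current)
        else:
            current.setdefault(key, value)  # keep the first value seen for a key


    return bond, slaves


def diagnose(text: str) -> List[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    bond, slaves = parse_bonding(text)
    warnings = []
    if bond.get("Partner Mac Address") == ZERO_MAC:
        warnings.append("partner MAC is all zeros: no LACP partner (is a port-channel configured on the switch?)")
    up = [s for s in slaves if s.get("MII Status") == "up"]
    agg_ids = {s["Aggregator ID"] for s in up if "Aggregator ID" in s}
    if len(agg_ids) > 1:
        warnings.append(f"up slaves are in different aggregators: {sorted(agg_ids)}")
    for s in slaves:
        for field in ("Actor Churn State", "Partner Churn State"):
            if s.get(field) == "churned":
                warnings.append(f"{s['name']}: {field} is churned")
    return warnings


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} /proc/net/bonding/bond<X>")
    else:
        with open(sys.argv[1]) as handle:
            problems = diagnose(handle.read())
        print("\n".join(problems) if problems else "no LACP problems flagged")
```

Running it against a saved copy of the file from each member (the script name you save it under is up to you) gives a quick yes/no on whether the switch is actually answering LACP.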
Dear @the_rock,
Thanks for your comments.
The outputs are listed below, with some details masked for privacy.
I cannot access the cluster because it is in production, so I collected them in a lab environment with the same scenario.
Any further comments would be much appreciated!
I can confirm those settings are the same as the screenshot shows.
This might need further investigation... did you open a TAC case for it?
Andy
Dear @AkosBakos,
Thanks for your comments. Each bond has two physical member interfaces.
I will check them in the morning. I cannot reach the cluster right now because it is in production in a data center.
Would you mind telling me what concerns you?
I am not used to dealing with LACP, so I would really appreciate it if you could enlighten me.
How quickly are you plugging/unplugging the cables, and how is portfast configured here?
Dear @Chris_Atkinson,
I am sorry, I forgot to mention it. It is set to fast.
The cables were unplugged and plugged back in at a normal pace:
not rushed, but not slow either.
What kind of handling could lead to this behaviour, for example?
P.S.
It was always the standby member that went down, never the active one.
Dear all who helped me figure out the problem,
I have finished the investigation and identified what causes this behaviour.
It was simply a switch problem: the switches were not configured for LACP! 😞
No port-channel on the switches, so no surprise. Nothing technical to intrigue you all. Sadge.
Anyway, my wholehearted thanks go to @the_rock, @Chris_Atkinson, and @AkosBakos,
who showed me how to check the 802.3ad status.
Thanks to you all, I was able to sort out the problem, and I have asked the admin to configure the switches.
Saitoh
Great job @saitoh
Andy