Hey guys,
Just wanted to run this by the team to see if anyone has an idea or suggestion. Essentially, my colleague and I were helping a client with a cutover from Fortinet to new CP 9100 appliances, and all was going well until we enabled the bond interface on both members. It all came up fine, but then we noticed clustering was broken, and no matter what we tried, we could not get it to work.
Since we were unable to get TAC on the phone, we tried a bunch of things ourselves, such as changing the bond type on the interface to active-backup and round robin. Round robin did work for the cluster, but the bonded VLANs were still showing as down.
Since we eventually had to roll back, we now want to figure out in the lab why this failed. The switches are Aruba, and all the config we verified seems correct. Just wondering if someone has had this problem before and, if so, how did you solve it?
We tried cphastop; cphastart, disabling/re-enabling the cluster and a reboot, with no luck... It's a bit tricky to tell at this point whether this is a CP or an Aruba switch issue...
Below are some screenshots of the config:
CCP mode: Manual (Unicast)
Required interfaces: 2
Required secured interfaces: 1
Interface Name: Status:
eth1 UP
Sync (S) UP
bond1.154 (LS) DOWN
bond1.120 (LS) DOWN
maas_tunnel (P) DOWN (382.7 secs)
Note: For more information on bond interfaces, use the command:
cphaprob show_bond [<bond_name>]
S - sync, HA/LS - bond type, LM - link monitor, P - probing
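To spot the failing interfaces quickly in a longer output, the `cphaprob -a if` table can be parsed and everything reported as DOWN listed. This is a hypothetical helper (the function names are mine, not a Check Point tool) and it assumes the row layout shown above (`name [(flags)] STATUS`); other Gaia versions may format the table differently.

```python
import re

# Assumed row layout of `cphaprob -a if`, e.g. "eth1 UP", "bond1.154 (LS) DOWN",
# "maas_tunnel (P) DOWN (382.7 secs)" or "Mgmt Non-Monitored".
ROW = re.compile(r"^(\S+)(?:\s+\(([A-Z/]+)\))?\s+(UP|DOWN|Non-Monitored)\b")


def parse_cluster_interfaces(output: str) -> list[tuple[str, str, str]]:
    """Return (interface, flags, status) for every status row in the output."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            name, flags, status = match.groups()
            rows.append((name, flags or "", status))
    return rows


def down_interfaces(output: str) -> list[str]:
    """Names of interfaces the cluster reports as DOWN."""
    return [name for name, _, status in parse_cluster_interfaces(output) if status == "DOWN"]


SAMPLE = """Interface Name:      Status:
eth1                 UP
Sync (S)             UP
bond1.154 (LS)       DOWN
bond1.120 (LS)       DOWN
maas_tunnel (P)      DOWN (382.7 secs)
"""

print(down_interfaces(SAMPLE))  # ['bond1.154', 'bond1.120', 'maas_tunnel']
```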
Thanks as always for the help, I really appreciate it!
Did you check your bonding? Maybe your LACP channel is not up. What's in this file:
cat /proc/net/bonding/bondxx
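If you want to compare that file across members or over time, a small parser helps. This is only a sketch: it assumes the `key: value` layout of the bonding driver output, files everything after a `Slave Interface:` line under that slave, and prefixes the fields of the `details actor/partner lacp pdu:` sub-sections with `actor `/`partner `. The function names are hypothetical.

```python
from pathlib import Path


def parse_bonding(text: str) -> tuple[dict[str, str], dict[str, dict[str, str]]]:
    """Split /proc/net/bonding/<bond> into bond-level and per-slave fields."""
    bond: dict[str, str] = {}
    slaves: dict[str, dict[str, str]] = {}
    current = bond  # fields go to the bond until the first "Slave Interface:"
    prefix = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("details ") and line.endswith("lacp pdu:"):
            # "details actor lacp pdu:" / "details partner lacp pdu:"
            prefix = line.split()[1] + " "
            continue
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # headings such as "802.3ad info" or free-text notes
        if key == "Slave Interface":
            current = slaves.setdefault(value, {})
            prefix = ""
            continue
        current[prefix + key] = value
    return bond, slaves


def read_bond(name: str = "bond1") -> tuple[dict[str, str], dict[str, dict[str, str]]]:
    """Read and parse the live file on the gateway (Expert mode)."""
    return parse_bonding(Path("/proc/net/bonding", name).read_text())
```

On the gateway, `bond, slaves = read_bond("bond1")` would then give you fields such as `slaves["eth3"]["Aggregator ID"]` or `slaves["eth3"]["partner system mac address"]`.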
This is what I see now, Wolfgang, but keep in mind that although the interface is enabled, there is nothing connected to it atm:
[Expert@PER_FW_02:0]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
Use RxHash: 0
MII Status: down
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: ca:33:60:f9:51:cb
bond bond1 has no active aggregator
Slave Interface: eth3
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: 00:1c:7f:c8:04:0d
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
system priority: 65535
system mac address: ca:33:60:f9:51:cb
port key: 0
port priority: 255
port number: 1
port state: 69
details partner lacp pdu:
system priority: 65535
system mac address: 00:00:00:00:00:00
oper key: 1
port priority: 255
port number: 1
port state: 1
Slave Interface: eth4
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: 00:1c:7f:c8:04:0f
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
system priority: 65535
system mac address: ca:33:60:f9:51:cb
port key: 0
port priority: 255
port number: 2
port state: 69
details partner lacp pdu:
system priority: 65535
system mac address: 00:00:00:00:00:00
oper key: 1
port priority: 255
port number: 1
port state: 1
[Expert@PER_FW_02:0]#
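The `port state` values in that output are the LACP state octet defined in IEEE 802.1AX, one flag per bit. The decoder below is a small sketch for reading them (the helper name is mine). Decoded, the actor's 69 means Activity, Aggregation and Defaulted, and the partner's 1 means only Activity; together with the all-zero partner system MAC, that is what you would expect when no LACPDUs have been received from the switch, which fits the fact that nothing was plugged in at the time. A port that is actually bundled would normally also show Synchronization, Collecting and Distributing (for example 61).

```python
# Bit positions of the LACP actor/partner state octet (IEEE 802.1AX).
LACP_STATE_BITS = [
    "LACP_Activity",    # bit 0: 1 = active LACP, 0 = passive
    "LACP_Timeout",     # bit 1: 1 = short (fast) timeout
    "Aggregation",      # bit 2: link may be aggregated
    "Synchronization",  # bit 3: port is in sync with the partner
    "Collecting",       # bit 4: port is collecting incoming frames
    "Distributing",     # bit 5: port is distributing outgoing frames
    "Defaulted",        # bit 6: using default partner info (no LACPDUs received)
    "Expired",          # bit 7: receive machine is in the EXPIRED state
]


def decode_lacp_state(state: int) -> list[str]:
    """Return the names of the flags set in an 8-bit LACP port state."""
    if not 0 <= state <= 0xFF:
        raise ValueError("LACP port state is a single octet")
    return [name for bit, name in enumerate(LACP_STATE_BITS) if state & (1 << bit)]


for label, value in (("actor", 69), ("partner", 1)):
    print(label, value, decode_lacp_state(value))
# actor 69 ['LACP_Activity', 'Aggregation', 'Defaulted']
# partner 1 ['LACP_Activity']
```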
Hi @the_rock ,
The key is here: Actor Churn State: churned
https://support.checkpoint.com/results/sk/sk169760
I spent a lot of time earlier debugging this, and on the Cisco ACI side the port was in a "suspended" state.
Did you detach and reattach the cables one by one?
Hey Akos,
Thanks for that, I never really noticed it. Now, here is my question... yes, we did bounce the ports the other night, but I'm wondering: is this more of a generic thing or just with Cisco switches? In the client's case, they have Aruba.
Hi Bro,
"One of the possible causes for a suspended port channel is the issue outlined in sk115516 with mismatching Aggregator IDs."
Long story short: if you have bond1 (members: eth1, eth2), unplug them, wait a few seconds, then plug them back in simultaneously.
I don't think this relates only to Cisco devices.
Akos
Thanks a lot for this, man, I truly appreciate it. Let us check everything in a few hours and I will update with what we discover during the test. The great thing is that we actually have 4 devices in our lab that another client purchased, so since it's a brand-new config, it's perfect for validating all this.
And as I see the IDs are different, which is not a good thing according to the spellbooks 🙂
Slave Interface: eth3
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: 00:1c:7f:c8:04:0d
Slave queue ID: 0
Aggregator ID: 1
Slave Interface: eth4
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: 00:1c:7f:c8:04:0f
Slave queue ID: 0
Aggregator ID: 2
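To make that comparison repeatable, here is a sketch that pulls the per-slave `Aggregator ID` out of `/proc/net/bonding/bond1` output and flags a split. As far as I understand the Linux bonding driver, these IDs are assigned locally, and slaves only share an aggregator when the LACP partner information they receive matches, so different IDs usually point at the switch side (ports not in the same LAG, or not presenting as one LACP system). The function names are hypothetical.

```python
def aggregator_ids(text: str) -> dict[str, int]:
    """Map each slave in a /proc/net/bonding file to its Aggregator ID."""
    ids: dict[str, int] = {}
    slave = None  # bond-level "Aggregator ID" lines (before any slave) are ignored
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if not sep:
            continue
        if key == "Slave Interface":
            slave = value
        elif key == "Aggregator ID" and slave is not None:
            ids[slave] = int(value)
    return ids


def aggregator_problems(text: str) -> list[str]:
    """Flag slaves split across aggregators, or a bond with no active aggregator."""
    problems = []
    ids = aggregator_ids(text)
    if len(set(ids.values())) > 1:
        pairs = ", ".join(f"{slave}={agg}" for slave, agg in sorted(ids.items()))
        problems.append(f"slaves split across aggregators: {pairs}")
    if "has no active aggregator" in text:
        problems.append("bond has no active aggregator")
    return problems


SAMPLE = """bond bond1 has no active aggregator
Slave Interface: eth3
Aggregator ID: 1
Slave Interface: eth4
Aggregator ID: 2
"""

print(aggregator_problems(SAMPLE))
# ['slaves split across aggregators: eth3=1, eth4=2', 'bond has no active aggregator']
```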
I see what you mean, BUT... where do you even set that up? In Gaia you can only set the bond group ID, which is 1 on both cluster members; I don't see this aggregator ID anywhere... is that on the switch side?
Hi,
Yes, the switch side. I am 100% sure the network team will understand this.
Akos
K, sounds good.
I will mention this to my colleague as well, thanks for bringing it up!
Hi Andy, regards from Colombia. I hope that perhaps we can talk again at some point. I also wish you an excellent December.
I've been analyzing your issue since this morning. If you'd like, could you share the trunk configuration of Aruba Core 1 and 2? I had a similar issue about two years ago on a VSX cluster, and maybe this scenario has the same root cause. I would also suggest checking the CRC errors on the Check Point interfaces that are part of the port channel (sar -n EDEV).
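`sar -n EDEV` reports error rates per interface. If you only want the cumulative counters for the bond members, stock Linux also exposes them under `/sys/class/net/<iface>/statistics/` (for example `rx_crc_errors`); I am assuming Gaia's kernel exposes them the same way, so treat this as a sketch rather than a supported tool. Taking two snapshots and diffing them shows whether CRC errors are still increasing. The helper names are hypothetical.

```python
from pathlib import Path

# Standard Linux per-interface counters; availability may vary per NIC driver.
COUNTERS = ("rx_crc_errors", "rx_frame_errors", "rx_errors")


def error_counters(ifaces: list[str], base: str = "/sys/class/net") -> dict[str, dict[str, int]]:
    """Snapshot the error counters that exist for each interface."""
    snapshot: dict[str, dict[str, int]] = {}
    for iface in ifaces:
        stats = Path(base) / iface / "statistics"
        snapshot[iface] = {
            name: int((stats / name).read_text().strip())
            for name in COUNTERS
            if (stats / name).is_file()
        }
    return snapshot


def counter_delta(
    before: dict[str, dict[str, int]], after: dict[str, dict[str, int]]
) -> dict[str, dict[str, int]]:
    """Per-interface increase of each counter between two snapshots."""
    return {
        iface: {name: value - before.get(iface, {}).get(name, 0) for name, value in counters.items()}
        for iface, counters in after.items()
    }
```

Usage would be along the lines of `before = error_counters(["eth3", "eth4"])`, waiting a minute, taking `after` the same way and printing `counter_delta(before, after)`. A growing `rx_crc_errors` count usually points at the physical layer (cabling, optics) rather than at LACP.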
Regards
JS
Hey mate,
Greetings to beautiful Colombia! Sadly, I don't believe we have access to it, but my colleague may ask the customer for access. Do you recall what the issue could have been?
Hey, Andy mate,
I always remember you, thanks. I love my country. Let me explain in more detail: when we were finally able to check the configurations on the network core appliance from the security team’s side, we found that the bonds were down because the VLANs (dot1q) were not being propagated correctly on the Core Switch. Typically, the same VLANs seen on the Check Point Cluster should also appear on the port-channel or LAG (1 and 2 in this case for the Aruba appliance), but when I encountered the issue, those VLANs were missing on the trunk.
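The check described above boils down to a set difference: every VLAN that exists as a `bond1.<vid>` interface on the cluster must be allowed on the matching LAG trunk. A sketch, assuming the switch's allowed-VLAN list can be copied as a comma-separated string with ranges (e.g. `120,140,150-154`); the exact switch CLI syntax, the trunk value in the example and the helper names are assumptions, while the required VLAN IDs are the ones listed in this thread.

```python
import re


def parse_vlan_list(spec: str) -> set[int]:
    """Expand '120,140,150-154' style lists into a set of VLAN IDs."""
    vlans: set[int] = set()
    for part in spec.replace(" ", "").split(","):
        if not part:
            continue
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            vlans.update(range(low, high + 1))
        else:
            vlans.add(int(part))
    return vlans


def bond_vlans(cluster_output: str, bond: str = "bond1") -> set[int]:
    """VLAN IDs of '<bond>.<vid>' interfaces mentioned in cphaprob output."""
    pattern = rf"\b{re.escape(bond)}\.(\d+)\b"
    return {int(vid) for vid in re.findall(pattern, cluster_output)}


def missing_on_trunk(required: set[int], allowed_spec: str) -> set[int]:
    """VLANs the firewall needs that the switch trunk does not carry."""
    return required - parse_vlan_list(allowed_spec)


required = {120, 140, 150, 152, 154}  # VLANs listed in this thread
print(sorted(missing_on_trunk(required, "120,150")))  # hypothetical trunk -> [140, 152, 154]
```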
Regards
JS
My colleague will give more details, but it appears we did replicate and fix it in the lab. It seems that lag 1 needed to be connected to fw 1 and lag 2 to fw 2, and that did work fine.
Here is what we see now in the lab, which is 100% right:
FW1:
[Expert@V-M-R1-FW-1:0]# cphaprob -a if
CCP mode: Manual (Unicast)
Required interfaces: 4
Required secured interfaces: 1
Interface Name: Status:
Sync (S) UP
Mgmt Non-Monitored
eth8 UP
bond1.120 (LS) UP
bond1.150 (LS) UP
S - sync, HA/LS - bond type, LM - link monitor, P - probing
Virtual cluster interfaces: 3
eth8 10.0.254.60
bond1.120 10.19.120.1
bond1.150 10.19.150.1
[Expert@V-M-R1-FW-1:0]# cphaprob state
Cluster Mode: High Availability (Active Up) with IGMP Membership
ID Unique Address Assigned Load State Name
1 (local) 169.254.1.1 100% ACTIVE poc1
2 169.254.1.2 0% STANDBY poc2
Active PNOTEs: None
Last member state change event:
Event Code: CLUS-114904
State change: ACTIVE(!) -> ACTIVE
Reason for state change: Reason for ACTIVE! alert has been resolved
Event time: Thu Dec 11 15:56:44 2025
Cluster failover count:
Failover counter: 0
Time of counter reset: Tue Dec 9 10:22:43 2025 (reboot)
[Expert@V-M-R1-FW-1:0]#
[Expert@V-M-R1-FW-1:0]#
******************************
Fw2:
[Expert@V-M-R1-FW-2:0]# cphaprob -a if
CCP mode: Manual (Unicast)
Required interfaces: 4
Required secured interfaces: 1
Interface Name: Status:
Sync (S) UP
Mgmt Non-Monitored
eth8 UP
bond1.120 (LS) UP
bond1.150 (LS) UP
S - sync, HA/LS - bond type, LM - link monitor, P - probing
Virtual cluster interfaces: 3
eth8 10.0.254.60
bond1.120 10.19.120.1
bond1.150 10.19.150.1
[Expert@V-M-R1-FW-2:0]# cphaprob state
Cluster Mode: High Availability (Active Up) with IGMP Membership
ID Unique Address Assigned Load State Name
1 169.254.1.1 100% ACTIVE poc1
2 (local) 169.254.1.2 0% STANDBY poc2
Active PNOTEs: None
Last member state change event:
Event Code: CLUS-114802
State change: DOWN -> STANDBY
Reason for state change: There is already an ACTIVE member in the cluster (member 1)
Event time: Thu Dec 11 15:56:53 2025
Cluster failover count:
Failover counter: 0
Time of counter reset: Tue Dec 9 10:22:43 2025 (reboot)
[Expert@V-M-R1-FW-2:0]#
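When comparing the two outputs, the things to confirm are that each member sees exactly one ACTIVE member and that both members agree on the states. A sketch that checks this from the `cphaprob state` member table, assuming the row layout shown above (`ID [(local)] address load% STATE name`); the helper names are hypothetical, and transitional states such as `ACTIVE(!)` are deliberately not counted as ACTIVE.

```python
import re

# Assumed member-table row, e.g. "1 (local) 169.254.1.1 100% ACTIVE poc1".
ROW = re.compile(r"^(\d+)\s+(?:\(local\)\s+)?(\d{1,3}(?:\.\d{1,3}){3})\s+(\d+)%\s+(\S+)\s+(\S+)$")


def member_states(output: str) -> dict[str, str]:
    """Map member name -> state from one member's `cphaprob state` output."""
    states: dict[str, str] = {}
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            _id, _address, _load, state, name = match.groups()
            states[name] = state
    return states


def ha_problems(*outputs: str) -> list[str]:
    """Check each member's view for exactly one ACTIVE and that all views agree."""
    views = [member_states(output) for output in outputs]
    problems = []
    for number, view in enumerate(views, start=1):
        active = sorted(name for name, state in view.items() if state == "ACTIVE")
        if len(active) != 1:
            problems.append(f"view {number}: expected one ACTIVE member, found {active}")
    if any(view != views[0] for view in views[1:]):
        problems.append("members disagree on the cluster state")
    return problems


FW1 = """1 (local)  169.254.1.1  100%  ACTIVE   poc1
2          169.254.1.2  0%    STANDBY  poc2"""
FW2 = """1          169.254.1.1  100%  ACTIVE   poc1
2 (local)  169.254.1.2  0%    STANDBY  poc2"""

print(ha_problems(FW1, FW2))  # []
```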
Btw, we will do some more tests on Friday as well, just to see if we can get to the bottom of why this was failing in the first place.
What is the "native" (untagged) vlan on the bonds?
Are the interfaces otherwise cabled properly and each firewall definitely using its own lag as the diagram shows?
Hey Chris,
Yes, they were definitely cabled correctly and as per the diagram. Since we could not work with TAC (no one called us back in time), we had to roll back to the FortiGates, but we will try again in January. The VLANs are 120, 140, 150, 152 and 154. We have guys who are super knowledgeable about Aruba switches, so I will work with one of my colleagues on Thursday to see if we can replicate this in the lab with an Aruba 4000 switch.
Review the config in the context of this perhaps:
sk120684: No connectivity over VLAN interfaces configured on a Bond interface on Check Point Security Gateway
Thanks Chris. I will check with my colleague tomorrow when we do a remote session about this. Since I don't have access to the switch (and even if I did, truth be told, I would not have a good idea what to even look for), let's see how far we get with the lab testing.
Hey Chris,
Good morning from Canada : - )
I will update the thread once we do the remote session today... let's see if we can figure something out.
Just a quick update: we were able to make this work with Aruba switches and CP 9700 appliances in the lab, but we will do some more tests to be on the safe side and make sure it functions as it should.
This is how we fixed the issue in the lab:
If eth3 and eth4 are part of the bond:
sw1, lag1 -> connected to eth3 of fw1
sw2, lag1 -> connected to eth4 of fw1
sw2, lag2 -> connected to eth3 of fw2
sw1, lag2 -> connected to eth4 of fw2
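The working layout can be written down as data and checked before the next cutover window. The rule this sketch encodes (my reading of the fix, not an official requirement) is that every LAG must terminate on exactly one firewall, and every firewall's bond members must land in exactly one LAG. It assumes sw1 and sw2 act as one multi-chassis pair (for example an Aruba VSX pair) so that a LAG can span both switches; the names are hypothetical.

```python
from collections import defaultdict

# (switch, lag, firewall, firewall_port) as cabled in the lab fix above.
CABLING = [
    ("sw1", "lag1", "fw1", "eth3"),
    ("sw2", "lag1", "fw1", "eth4"),
    ("sw2", "lag2", "fw2", "eth3"),
    ("sw1", "lag2", "fw2", "eth4"),
]


def cabling_problems(links: list[tuple[str, str, str, str]]) -> list[str]:
    """Each LAG must reach one firewall, and each firewall must use one LAG."""
    firewalls_per_lag: dict[str, set[str]] = defaultdict(set)
    lags_per_firewall: dict[str, set[str]] = defaultdict(set)
    for _switch, lag, firewall, _port in links:
        firewalls_per_lag[lag].add(firewall)
        lags_per_firewall[firewall].add(lag)
    problems = []
    for lag, firewalls in sorted(firewalls_per_lag.items()):
        if len(firewalls) > 1:
            problems.append(f"{lag} is shared by {sorted(firewalls)}")
    for firewall, lags in sorted(lags_per_firewall.items()):
        if len(lags) > 1:
            problems.append(f"{firewall} bond is split across {sorted(lags)}")
    return problems


print(cabling_problems(CABLING))  # []
```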
Special THANKS to my colleague Rahul for helping with this. We are pretty confident everything will work as expected next cutover window.