Hello.
Environment:
CPAP-SG3100-NGTP x2
Product version: Check Point Gaia R80.20
OS build: 101
OS kernel version: 2.6.18-92cpx86_64
OS edition: 64-bit
I have a cluster configuration with two appliances.
The memory usage of Unit 2 became high, so I rebooted it.
I had to disconnect all of Unit 2's cables for the reboot, so I pulled them all out.
After rebooting, I reconnected one HA cable (eth4), but the link repeatedly went up and down for about 15 minutes.
I would like to know whether this is expected behavior.
Network Interfaces:
Mgmt Private (Non-Monitored)
bond0
eth1
eth2
eth3
eth4 HA-1
eth5 HA-2
Unit 2 /var/log/messages:
May 17 09:36:35 2023 XXXXXXXXX-02 kernel: igb: eth4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
May 17 09:36:36 2023 XXXXXXXXX-02 kernel: bonding: bond0: link status up for interface eth4, enabling it in 200 ms.
May 17 09:36:36 2023 XXXXXXXXX-02 kernel: bonding: bond0: link status definitely up for interface eth4.
May 17 09:36:36 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-114205-2: State change: ACTIVE! -> STANDBY | Reason: Member state has been changed due to higher priority of remote cluster member 1 in PRIMARY-UP cluster
May 17 09:36:36 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-214904-2: Remote member 1 (state LOST -> ACTIVE) | Reason: Reason for ACTIVE! alert has been resolved
May 17 09:36:36 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-110200-2: State change: STANDBY -> DOWN | Reason: Interface eth1 is down (disconnected / link down)
May 17 09:36:37 2023 XXXXXXXXX-02 kernel: igb: eth4 NIC Link is Down
May 17 09:36:38 2023 XXXXXXXXX-02 kernel: bonding: bond0: link status down for idle interface eth4, disabling it in 200 ms.
May 17 09:36:38 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-110200-2: State remains: DOWN | Reason: Previous problem resolved, Interface bond0 is down (disconnected / link down)
May 17 09:36:38 2023 XXXXXXXXX-02 kernel: bonding: bond0: link status definitely down for interface eth4, disabling it
May 17 09:36:38 2023 XXXXXXXXX-02 xpand[5070]: Configuration changed from localhost by user admin by the service dbset
May 17 09:36:39 2023 XXXXXXXXX-02 kernel: [fw4_1];check_other_machine_activity: Update state of member id 0 to DEAD, didn't hear from it since 381.2 and now 384.2
May 17 09:36:40 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-216400-2: Remote member 1 (state ACTIVE -> LOST) | Reason: Timeout Control Protocol packet expired member declared as DEAD
May 17 09:36:40 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-116505-2: State change: DOWN -> ACTIVE(!) | Reason: All other machines are dead (timeout), Interface bond0 is down (disconnected / link down)
May 17 09:36:40 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-100102-2: Failover member 1 -> member 2 | Reason: Available on member 1
May 17 09:36:40 2023 XXXXXXXXX-02 xpand[5070]: Configuration changed from localhost by user admin by the service dbset
May 17 09:36:40 2023 XXXXXXXXX-02 kernel: igb: eth4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
May 17 09:36:40 2023 XXXXXXXXX-02 kernel: bonding: bond0: link status up for interface eth4, enabling it in 200 ms.
May 17 09:36:41 2023 XXXXXXXXX-02 kernel: bonding: bond0: link status definitely up for interface eth4.
May 17 09:36:41 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-110205-2: State remains: ACTIVE! | Reason: Interface eth1 is down (disconnected / link down)
May 17 09:36:41 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-110205-2: State change: ACTIVE! -> DOWN | Reason: Interface bond0 is down (disconnected / link down)
May 17 09:36:41 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-214904-2: Remote member 1 (state LOST -> ACTIVE) | Reason: Reason for ACTIVE! alert has been resolved
May 17 09:36:41 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-110200-2: State remains: DOWN | Reason: Previous problem resolved, Interface eth1 is down (disconnected / link down)
May 17 09:36:41 2023 XXXXXXXXX-02 kernel: bonding: bond0: link status down for idle interface eth4, disabling it in 200 ms.
May 17 09:36:41 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-110200-2: State remains: DOWN | Reason: Previous problem resolved, Interface bond0 is down (disconnected / link down)
May 17 09:36:42 2023 XXXXXXXXX-02 kernel: bonding: bond0: link status definitely down for interface eth4, disabling it
May 17 09:36:42 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-110200-2: State remains: DOWN | Reason: Previous problem resolved, Interface eth1 is down (disconnected / link down)
May 17 09:36:42 2023 XXXXXXXXX-02 xpand[5070]: Configuration changed from localhost by user admin by the service dbset
May 17 09:36:43 2023 XXXXXXXXX-02 kernel: igb: eth4: igb_setup_mrqc: Setting Legacy RSS (Asymmetric
May 17 09:36:44 2023 XXXXXXXXX-02 kernel: [fw4_1];check_other_machine_activity: Update state of member id 0 to DEAD, didn't hear from it since 386.2 and now 389.2
May 17 09:36:44 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-216400-2: Remote member 1 (state ACTIVE -> LOST) | Reason: Timeout Control Protocol packet expired member declared as DEAD
May 17 09:36:44 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-116505-2: State change: DOWN -> ACTIVE(!) | Reason: All other machines are dead (timeout), Interface eth1 is down (disconnected / link down)
May 17 09:36:44 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-100102-2: Failover member 1 -> member 2 | Reason: Available on member 1
May 17 09:36:45 2023 XXXXXXXXX-02 kernel: igb: eth4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
May 17 09:36:45 2023 XXXXXXXXX-02 kernel: bonding: bond0: link status up for interface eth4, enabling it in 200 ms.
May 17 09:36:45 2023 XXXXXXXXX-02 kernel: bonding: bond0: link status definitely up for interface eth4.
May 17 09:36:46 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-110205-2: State change: ACTIVE! -> DOWN | Reason: Interface eth1 is down (disconnected / link down)
May 17 09:36:46 2023 XXXXXXXXX-02 kernel: [fw4_1];CLUS-214904-2: Remote member 1 (state LOST -> ACTIVE) | Reason: Reason for ACTIVE! alert has been resolved
May 17 09:36:47 2023 XXXXXXXXX-02 kernel: igb: eth4 NIC Link is Down
Just so you are aware: R80.20 has been out of support for a long time.
Before anything else, please check that you do not have another Check Point cluster on the same network. Verify that IGMP snooping is disabled on the switch for all cluster ports, and also make sure your bond is correctly configured on both the Check Point and network sides.
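If it helps, the bond can be sanity-checked quickly from the Check Point side; a minimal sketch (bond0 as listed in your interface list; first command from expert mode, second from clish):
cat /proc/net/bonding/bond0    # bond mode, per-slave MII status and LACP partner details
show bonding groups            # the configured bonding groups as Gaia sees them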
Thanks for the reply.
I know the version I'm using is no longer supported.
But I don't have a test environment, so I can't upgrade immediately.
No other CP products were used in this system.
The network uses IGMP to allow multicast communication.
Disable IGMP snooping on the switch side and see if it makes a difference. Also, post your bond config from the Check Point side here.
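For example, on a Cisco IOS switch it would look something like the below (syntax varies by vendor, and VLAN 20 is only a placeholder for your sync VLAN):
configure terminal
no ip igmp snooping vlan 20
end
show ip igmp snooping vlan 20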
Unit 1:
Network Interface Information:
bond0  Type: Bond  IPv4 Address: 192.168.20.5  Subnet mask: 255.255.255.252
bond0  Interfaces: eth4, eth5  Operation Mode: 802.3ad  Transmit Hash Policy: Layer 3+4  LACP Rate: Slow
show configuration:
add bonding group 0
add bonding group 0 interface eth4
add bonding group 0 interface eth5
set bonding group 0 mode 8023AD
set bonding group 0 lacp-rate slow
set bonding group 0 mii-interval 100
set bonding group 0 down-delay 200
set bonding group 0 up-delay 200
set bonding group 0 xmit-hash-policy layer3+4
set interface bond0 state on
set interface bond0 mtu 1500
set interface bond0 ipv4-address 192.168.20.5 mask-length 30
Unit 2:
192.168.20.6
The device is in production and cannot be changed immediately.
There is no link up/down now.
I would like to know why the link went up and down for about 15 minutes.
Your best bet is to check the SmartConsole logs, as well as the /var/log/messages* files.
You can also try something like the below (example from my lab):
grep -i DOWN /var/log/messages*
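To focus on the cluster transitions and NIC flaps specifically, a narrower pattern should work too (assuming the same log format as your excerpt):
grep -E 'CLUS-|NIC Link is' /var/log/messages*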
Andy
Check if the virtual MAC option is enabled on the cluster object properties under ClusterXL, and also send the output of the commands below from both members.
Andy
cphaprob roles
cphaprob state
cphaprob syncstat
cphaprob list
cphaprob -a if
Thanks for the reply.
> Check if virtual MAC option is enabled on cluster properties object under clusterxl
I checked as you advised, but "Use Virtual MAC" is not checked.
The cphaprob information is below:
XXXXXXXXX-01> cphaprob roles
ID Role
1 (local) Master
2 Non-Master
XXXXXXXXX-01> cphaprob state
Cluster Mode: High Availability (Primary Up) with IGMP Membership
ID Unique Address Assigned Load State Name
1 (local) 192.168.20.5 100% ACTIVE XXXXXXXXX-01
2 192.168.20.6 0% STANDBY XXXXXXXXX-02
Active PNOTEs: None
Last member state change event:
Event Code: CLUS-114904
State change: ACTIVE(!) -> ACTIVE
Reason for state change: Reason for ACTIVE! alert has been resolved
Event time: Wed May 17 09:52:38 2023
Last cluster failover event:
Transition to new ACTIVE: Member 2 -> Member 1
Reason: Member state has been changed due to higher priority of remote cluster member 1 in PRIMARY-UP cluster
Event time: Thu Jun 3 13:53:07 2021
Cluster failover count:
Failover counter: 24
Time of counter reset: Fri Dec 13 15:33:47 2019 (reboot)
XXXXXXXXX-01> cphaprob syncstat
Delta Sync Statistics
Sync status: OK
Drops:
Lost updates................................. 0
Lost bulk update events...................... 0
Oversized updates not sent................... 0
Sync at risk:
Sent reject notifications.................... 0
Received reject notifications................ 0
Sent messages:
Total generated sync messages................ 33006557
Sent retransmission requests................. 0
Sent retransmission updates.................. 0
Peak fragments per update.................... 1
Received messages:
Total received updates....................... 14217563
Received retransmission requests............. 0
Queue sizes (num of updates):
Sending queue size........................... 512
Receiving queue size......................... 256
Fragments queue size......................... 50
Timers:
Delta Sync interval (ms)..................... 100
Reset on Thu Jun 3 13:53:07 2021 (triggered by fullsync).
XXXXXXXXX-01> cphaprob list
There are no pnotes in problem state
XXXXXXXXX-01> cphaprob -a if
CCP mode: Automatic
Required interfaces: 4
Required secured interfaces: 1
eth1 UP non sync(non secured), unicast
eth2 UP non sync(non secured), unicast
eth3 UP non sync(non secured), unicast
Mgmt Non-Monitored non sync(non secured)
bond0 UP sync(secured), unicast, bond Load Sharing
Virtual cluster interfaces: 3
eth1 172.29.13X.3X
eth2 172.29.12X.17X
eth3 172.29.12X.25X
XXXXXXXXX-02> cphaprob roles
ID Role
1 Master
2 (local) Non-Master
XXXXXXXXX-02> cphaprob state
Cluster Mode: High Availability (Primary Up) with IGMP Membership
ID Unique Address Assigned Load State Name
1 192.168.20.5 100% ACTIVE XXXXXXXXX-01
2 (local) 192.168.20.6 0% STANDBY XXXXXXXXX-02
Active PNOTEs: None
Last member state change event:
Event Code: CLUS-114802
State change: DOWN -> STANDBY
Reason for state change: There is already an ACTIVE member in the cluster (member 1)
Event time: Wed May 17 10:29:13 2023
Last cluster failover event:
Transition to new ACTIVE: Member 2 -> Member 1
Reason: Member state has been changed due to higher priority of remote cluster member 1 in PRIMARY-UP cluster
Event time: Thu Jun 3 13:53:07 2021
Cluster failover count:
Failover counter: 24
Time of counter reset: Fri Dec 13 15:33:47 2019 (reboot)
XXXXXXXXX-02> cphaprob syncstat
Delta Sync Statistics
Sync status: OK
Drops:
Lost updates................................. 0
Lost bulk update events...................... 0
Oversized updates not sent................... 0
Sync at risk:
Sent reject notifications.................... 0
Received reject notifications................ 0
Sent messages:
Total generated sync messages................ 245550
Sent retransmission requests................. 0
Sent retransmission updates.................. 0
Peak fragments per update.................... 1
Received messages:
Total received updates....................... 466544
Received retransmission requests............. 0
Queue sizes (num of updates):
Sending queue size........................... 512
Receiving queue size......................... 256
Fragments queue size......................... 50
Timers:
Delta Sync interval (ms)..................... 100
Reset on Wed May 17 09:33:55 2023 (triggered by fullsync).
XXXXXXXXX-02> cphaprob list
There are no pnotes in problem state
XXXXXXXXX-02> cphaprob -a if
CCP mode: Automatic
Required interfaces: 4
Required secured interfaces: 1
eth1 UP non sync(non secured), unicast
eth2 UP non sync(non secured), unicast
eth3 UP non sync(non secured), unicast
Mgmt Non-Monitored non sync(non secured)
bond0 UP sync(secured), unicast, bond Load Sharing
Virtual cluster interfaces: 3
eth1 172.29.13X.3X
eth2 172.29.12X.17X
eth3 172.29.12X.25X
Based on that output, all looks right to me. Just a small suggestion: when it comes to the sync interface, I always tell people to use something from the 169.254.x.x subnet, as that is totally non-routable and there is literally zero chance any of those IPs would be used elsewhere in your network.
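For example, re-addressing the sync bond would look something like the below in clish; just a sketch that mirrors your existing /30, and changing the sync network on a live cluster needs a maintenance window:
set interface bond0 ipv4-address 169.254.1.1 mask-length 30
set interface bond0 ipv4-address 169.254.1.2 mask-length 30
(the first line on member 1, the second on member 2)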
Anyway, having said that, what you sent looks right. Did you confirm the link state on the firewall side for the bond interface? What about the switch?
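Something like the below would confirm it from the firewall side (expert mode; eth4/eth5 as in your bond, and show_bond assuming the R80.20 cphaprob syntax):
cphaprob show_bond bond0    # ClusterXL's view of the bond and its slave states
ethtool eth4                # negotiated speed/duplex and 'Link detected' per slave
ethtool eth5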
Andy
The syslog entries would seem to indicate that the link integrity (green light) on the interface was repeatedly lost for a short period (<1sec), which then caused ClusterXL to mark the interface as down. An interface outage of this short duration is generally caused by a loose cable or speed/duplex negotiation flap. If you haven't rebooted the gateway since the incident, the output of ethtool -S eth4 may shed some light. The logs on the switch around the time of the flap might be helpful too.
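For example, to pull just the error counters out of the long statistics output (counter names vary by driver, so treat the pattern as a starting point):
ethtool -S eth4 | grep -iE 'err|crc|drop|fail'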
I don't think an STP issue on the switch would actually drop link integrity when it stops forwarding traffic due to a possible bridging loop on that switchport, and I don't think switch broadcast suppression/storm control would drop the link either, unless it triggered some kind of errdisable; but I could be wrong.
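If the switch is a Cisco, errdisable is easy to rule out with the below (vendor-specific; other vendors have equivalents):
show interfaces status err-disabled
show errdisable recovery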