yesok
Participant

Server interface configuration for high availability interfaces

Hi everyone,

I'm trying to configure network bonding on a Linux server (Debian 12) that connects to a Check Point SMB Gateway cluster with High Availability. I'm experiencing interface issues and need clarification on the expected configuration.

Current Setup:

Check Point Side:

  • SMB Gateway in HA cluster mode
  • LAN8 interface configured with bond-mode: 802.3ad (LACP)
  • Two members in the cluster

Linux Server Side:

  • Debian 12 with two interfaces (ens6f0, ens6f1)
  • Both cables connected to the same Check Point cluster (LAN8)
  • Bond0 configured in 802.3ad (LACP) mode

Current /etc/network/interfaces:

 
auto bond0
iface bond0 inet static
    address x.x.x.x
    netmask 255.255.255.0
    gateway x.x.x.1
    bond-slaves ens6f0 ens6f1
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate slow
    bond-xmit-hash-policy layer2+3
    bond-updelay 200
    bond-downdelay 200
    dns-nameservers 1.1.1.1
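
I apply the configuration and check the bond state with (assuming classic ifupdown on Debian 12; adjust to your setup):

ifdown bond0; ifup bond0          # or: systemctl restart networking
cat /proc/net/bonding/bond0       # per-slave MII status, LACP partner MAC, aggregator IDs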

The Problem:

  • With only ens6f0 in bond0: Everything works fine, LAN8 stays UP
  • When adding ens6f1 to bond0: The Check Point LAN8 interface goes DOWN
  • Removing ens6f1 from bond0: LAN8 comes back UP immediately

Bond status shows:

 
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
...
Slave Interface: ens6f0
MII Status: up
Aggregator ID: 1

Slave Interface: ens6f1
MII Status: up
Aggregator ID: 2  ← Different aggregator!

LACP partner detection:

 
Partner Mac Address: 00:00:00:00:00:00  ← No LACP partner detected

My Questions:

  1. Is LACP (802.3ad) the correct mode for connecting to a Check Point SMB HA cluster, or should I use active-backup instead?

  2. On the Check Point side, what interface configuration is expected when a server uses bonding? Should LAN8 be configured differently for HA setups?

  3. Should both server NICs connect to the same physical Check Point device, or one to each cluster member?

What I've Tried:

  • Single interface (ens6f0) works perfectly
  • LACP bonding causes LAN8 to go down on the second member
  • Both interfaces show different Aggregator IDs
  • No LACP partner MAC detected

Environment:

  • Quantum Spark 2560 Appliance (HA cluster)
  • Debian 12 server with a 4-port NIC (using ens6f0 and ens6f1)
  • Gateway: 10.2.0.1 (Check Point LAN8)

Any guidance on the proper configuration would be greatly appreciated!

Thanks in advance.

12 Replies
the_rock
MVP Diamond

Hey @yesok 

See if the post I made about this recently helps. I know that one was not SMB but regular Gaia 9100 firewalls; still, it might be related.

https://community.checkpoint.com/t5/Security-Gateways/Issue-with-9100-clusterXL-with-bond-interface-...

Best,
Andy
emmap
MVP Gold CHKP

Hi

The two appliances in the HA cluster are not presenting a multi-channel bond, so trying to use LACP like that across the two appliances will not work. They are presenting two separate LACP bonds with one interface in each down to the server, but the server is presenting a single LACP bond with two interfaces back to the appliances. This won't work. 

Ideally you need a switching layer here so that the gateways can do their HA independently of the server's interface teaming. I don't think there's a good way to do it without a switch between them. If you set it as an active-standby bond on the server and no bond on the appliances, set the appliance HA to be 'primary up' and then set the primary interface on the server to be the one facing the primary appliance, you'll have some way to manage it... but it won't be perfect. If the primary appliance goes into a 'down' state without that LAN8 interface going down, the server won't know that it should be using the other interface in the bond. Check Point HA isn't designed to be used without a switch on a set of HA interfaces.
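
As a rough sketch of that active-standby layout on the Debian side (same ifenslave-style options as in the original post; addresses are placeholders, and option names should be verified against your ifupdown/ifenslave version), the bond stanza would look something like:

auto bond0
iface bond0 inet static
    address x.x.x.x
    netmask 255.255.255.0
    gateway x.x.x.1
    bond-slaves ens6f0 ens6f1
    # active-backup instead of 802.3ad; ens6f0 faces the 'primary up' appliance
    bond-mode active-backup
    bond-primary ens6f0
    bond-miimon 100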

Vincent_Bacher
MVP Silver

I definitely wouldn't go without a switch interlink for LACP bonding either. If it absolutely has to be done without one, I'd first try active-backup, and on the Linux side I'd also configure ARP monitoring. However, I can't actually look up any links right now, since I'm a bit limited. Maybe someone else can do that for us.
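
In the ifenslave-style syntax used earlier in the thread, that would roughly mean swapping bond-miimon for ARP probes in the bond0 stanza, e.g. (the target address is a placeholder for a gateway/cluster member IP; verify the option names against your ifenslave version):

    bond-mode active-backup
    bond-arp-interval 1000
    bond-arp-ip-target x.x.x.1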

and now to something completely different - CCVS, CCAS, CCTE, CCCS, CCSM elite
yesok
Participant

Thank you both for your responses. Understanding how HA interfaces behave on Spark was what we needed first; that now lets us choose the best configuration mode. We'll proceed with testing active-backup mode with an ARP probe and verify that it works.

Vincent_Bacher
MVP Silver

https://manpages.debian.org/testing/ifupdown-ng/interfaces-bond.5.en.html

Something for Debian, but as said, I would strongly recommend using a switch.

and now to something completely different - CCVS, CCAS, CCTE, CCCS, CCSM elite
yesok
Participant

We attempted this approach using active-backup mode with ARP probes directed to one of the member IP addresses, but the interface on gw2 still remains down. Unfortunately, installing a switch is not an option for us. As an alternative, we will create a Link Aggregation interface with 2 member interfaces, where only one will be physically connected. We will keep you updated on the results.

gw2> cphaprob -a if


CCP mode: Manual (Unicast)
Required interfaces: 3
Required secured interfaces: 1

Interface Name: Status:

lo Non-Monitored
WAN UP
LAN1 Non-Monitored
LAN6 Non-Monitored
LAN8 (P) Inbound: DOWN (83331.8 secs)
Outbound: DOWN (506.6 secs)
SYNCBOND (S-HA) UP

S - sync, HA/LS - bond type, LM - link monitor, P - probing

 

Linux active-backup bond configuration

cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v6.1.0-23-amd64

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: ens6f0 (primary_reselect always)
Currently Active Slave: ens6f0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP Missed Max: 2
ARP IP target/s (n.n.n.n form): x.x.x.x  ← gw1 interface IP
NS IPv6 target/s (xx::xx form):

Slave Interface: ens6f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 6c:fe:54:8c:55:08
Slave queue ID: 0

Slave Interface: ens6f1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 6c:fe:54:8c:55:09
Slave queue ID: 0

emmap
MVP Gold CHKP

It makes sense that it's down on GW2, as it won't be able to see anything on that link. It will be trying to ARP for IPs and ping them but if it's directly connected to the server's backup link, it won't be getting anything back at all. The clustering just isn't designed to work in such a situation. 

sigal
Employee

Hi,
If LAN8 is monitored (from the cluster's perspective) or has a virtual IP, then you must have a switch between the server and the Spark cluster. In that case there is no point defining a bond on the server.

If you need a bond on the Spark side, then the configuration should be identical on both cluster members, e.g. LAN7+LAN8 on both gateways. Naturally, this configuration also requires a switch between the cluster and the server.

Thanks.

the_rock
MVP Diamond

100% true. We had the exact same scenario in the post I referenced.

Best,
Andy
Vincent_Bacher
MVP Silver

Oh dear, I made a mistake in my thinking. That's right. Then there's probably no way around using a switch, unless you want to resort to the competition from Sunnyvale, because there you simply set two interfaces without bonding, and you're done.

config system ha
set hbdev <primary_interface>
set backup-hbdev <backup_interface>
end
and now to something completely different - CCVS, CCAS, CCTE, CCCS, CCSM elite
yesok
Participant

I solved this issue by installing keepalived on Linux. Here's my working configuration:

Configuration

/etc/keepalived/keepalived.conf

global_defs {
    router_id XXXX
    enable_script_security
    script_user root
}

# Monitor Firewall 1 via ens6f0
vrrp_script check_fw1 {
    script "/usr/bin/fping -I ens6f0 -c 2 -t 500 x.x.x.x"
    interval 3
    weight -50
    fall 2
    rise 2
}

# Monitor Firewall 2 via ens6f1
vrrp_script check_fw2 {
    script "/usr/bin/fping -I ens6f1 -c 2 -t 500 x.x.x.y"
    interval 3
    weight -30
    fall 2
    rise 2
}

vrrp_instance GW_FAILOVER {
    state MASTER
    interface ens6f0
    virtual_router_id 51
    priority 100
    advert_int 1
    
    virtual_ipaddress {
        169.254.1.1/32 dev ens6f0
    }
    
    track_script {
        check_fw1
        check_fw2
    }
    
    notify_master "/etc/keepalived/use_fw1.sh"
    notify_backup "/etc/keepalived/use_fw2.sh"
}

/etc/keepalived/use_fw1.sh

#!/bin/bash
logger -t KEEPALIVED "Using FW1 via ens6f0"
ip route replace default via x.x.x.x dev ens6f0 metric 10

/etc/keepalived/use_fw2.sh

#!/bin/bash
logger -t KEEPALIVED "Switching to FW2 via ens6f1"
ip route replace default via x.x.x.y dev ens6f1 metric 20

Then make the scripts executable and restart keepalived:

chmod +x /etc/keepalived/*.sh
systemctl restart keepalived

How It Works

The solution uses weighted priorities to determine failover:

  • Priority 100 (initial): Both firewalls reachable → Use FW1
  • Priority 70: FW2 unreachable (-30) → Stay on FW1
  • Priority 50: FW1 unreachable (-50) → Switch to FW2 (BACKUP state)
  • Priority 20: Both unreachable (-80) → FAULT state

When an interface goes down, the fping check automatically fails (can't ping through a down interface), triggering the appropriate failover.
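
A quick way to watch what keepalived is doing (standard commands; nothing specific to this setup):

journalctl -t KEEPALIVED -f    # notify-script messages ("Using FW1...", "Switching to FW2...")
ip route show default          # which gateway/interface is currently active
systemctl status keepalived    # service status

To exercise the failover, make the FW1 address unreachable and confirm the default route moves to ens6f1, then moves back once FW1 is reachable again.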

the_rock
MVP Diamond

That's amazing, thanks for letting us know @yesok... great work!

Best,
Andy
