Vladimir
Champion

Forcing Gratuitous ARP (G-ARP) from ClusterXL with vMAC

While there are a few existing threads discussing G-ARP, the solutions provided there do not seem to work for this situation.

I also think that this scenario is encountered often enough to have its own thread.

 

The scenario is a pending HA cluster hardware swap. The goal is to avoid the 4-hour ARP cache expiration problem.

arping does not work for vMAC.

Nor, it seems, does "fw ctl set int test_arp_refresh 1".

Tested as follows (public IPs are fake, R81.10):

Expected G-ARP packet capture on the connected router, provoked by running "arping -c 4 -A -I eth4 200.100.0.2" on one of the cluster members:

 

root@router:/home/vyos# tcpdump -ni eth1 -c4 broadcast and arp and arp[6:2] == 2

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes

12:52:09.345639 ARP, Reply 200.100.0.2 is-at 08:00:27:f5:0e:39, length 46

12:52:10.345408 ARP, Reply 200.100.0.2 is-at 08:00:27:f5:0e:39, length 46

12:52:11.346263 ARP, Reply 200.100.0.2 is-at 08:00:27:f5:0e:39, length 46

12:52:12.346336 ARP, Reply 200.100.0.2 is-at 08:00:27:f5:0e:39, length 46

4 packets captured

4 packets received by filter

0 packets dropped by kernel

root@router:/home/vyos#
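
For anyone unfamiliar with the capture filter used here: "arp[6:2] == 2" matches the 2-byte opcode at offset 6 of the ARP header, where 2 means "reply". Below is a minimal standard-library Python sketch of the same check, plus the "sender IP equals target IP" test that makes a reply gratuitous. The function names and the sample bytes are made up for illustration; the target hardware address in the sample is assumed to be broadcast, since the tcpdump output does not show that field.

```python
import socket
import struct

# IPv4-over-Ethernet ARP payload layout (28 bytes):
# htype, ptype, hlen, plen, opcode, sender MAC, sender IP, target MAC, target IP
ARP_FORMAT = "!HHBBH6s4s6s4s"


def parse_arp(payload: bytes) -> dict:
    """Decode the first 28 bytes of an ARP payload; trailing padding is ignored."""
    _, _, _, _, op, sha, spa, tha, tpa = struct.unpack(ARP_FORMAT, payload[:28])

    def fmt_mac(raw: bytes) -> str:
        return ":".join(f"{b:02x}" for b in raw)

    return {
        "op": op,
        "sender_mac": fmt_mac(sha),
        "sender_ip": socket.inet_ntoa(spa),
        "target_mac": fmt_mac(tha),
        "target_ip": socket.inet_ntoa(tpa),
    }


def is_gratuitous_reply(arp: dict) -> bool:
    """Opcode 2 (reply) with sender IP equal to target IP."""
    return arp["op"] == 2 and arp["sender_ip"] == arp["target_ip"]


# Hypothetical sample modelled on the capture: 200.100.0.2 is-at 08:00:27:f5:0e:39
sample = struct.pack(
    ARP_FORMAT, 1, 0x0800, 6, 4, 2,
    bytes.fromhex("080027f50e39"), socket.inet_aton("200.100.0.2"),
    b"\xff" * 6, socket.inet_aton("200.100.0.2"),
)
print(parse_arp(sample), is_gratuitous_reply(parse_arp(sample)))
```

The router-side capture reports length 46 while the member-side capture reports 28. That is most likely just Ethernet minimum-frame padding as seen by the receiver, which is why the sketch only reads the first 28 bytes.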

 

 

Output of the same on the cluster member's interface connected to the router (same on both members):

 

[Expert@CPCM1:0]# tcpdump -ni eth4 -c4 broadcast and arp and arp[6:2] == 2

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth4, link-type EN10MB (Ethernet), capture size 262144 bytes

12:53:30.011708 ARP, Reply 200.100.0.2 is-at 08:00:27:f5:0e:39, length 28

12:53:31.012288 ARP, Reply 200.100.0.2 is-at 08:00:27:f5:0e:39, length 28

12:53:32.013320 ARP, Reply 200.100.0.2 is-at 08:00:27:f5:0e:39, length 28

12:53:33.013524 ARP, Reply 200.100.0.2 is-at 08:00:27:f5:0e:39, length 28

4 packets captured

4 packets received by filter

0 packets dropped by kernel

[Expert@CPCM1:0]#

 

 

I presume we should expect to see the same, but with the vMAC, when using "fw ctl set int test_arp_refresh 1".

 

 

But when we run it:

 

[Expert@CPCM1:0]# fw ctl set int test_arp_refresh 1

[Expert@CPCM1:0]#

 

 

We do not see anything:

 

[Expert@CPCM1:0]# tcpdump -ni eth4 -c4 broadcast and arp and arp[6:2] == 2

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth4, link-type EN10MB (Ethernet), capture size 262144 bytes

 

 

When executing failover on the active cluster member:

 

[Expert@CPCM1:0]# cphaprob stat

 

Cluster Mode:   High Availability (Active Up) with IGMP Membership

 

ID         Unique Address  Assigned Load   State          Name                 

 

1 (local)  192.168.255.2   100%            ACTIVE         CPCM1

2          192.168.255.3   0%              DOWN           CPCM2

 

 

Active PNOTEs: None

 

Last member state change event:

   Event Code:                 CLUS-114904

   State change:               ACTIVE(!) -> ACTIVE

   Reason for state change:    Reason for ACTIVE! alert has been resolved

   Event time:                 Wed Aug 11 12:26:42 2021

 

Last cluster failover event:

   Transition to new ACTIVE:   Member 2 -> Member 1

   Reason:                     ADMIN_DOWN PNOTE

   Event time:                 Wed Aug 11 11:21:20 2021

 

Cluster failover count:

   Failover counter:           2

   Time of counter reset:      Wed Aug 11 10:29:31 2021 (reboot)

 

 

[Expert@CPCM1:0]#

 

 

With vMAC configured:

 

[Expert@CPCM1:0]# cphaprob -a if

 

CCP mode: Manual (Unicast)

Required interfaces: 3

Required secured interfaces: 1

 

 

Interface Name:      Status:

 

eth0                 UP

eth3 (S)             UP

eth4                 UP

 

S - sync, LM - link monitor, HA/LS - bond type

 

Virtual cluster interfaces: 2

 

eth0            10.0.0.1         VMAC address: 00:1C:7F:00:33:61

eth4            200.100.0.1         VMAC address: 00:1C:7F:00:33:61

 

[Expert@CPCM1:0]#
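
For scripting purposes, the interface/VIP/vMAC triplets can be pulled out of this output. The following is only a rough sketch that assumes the "Virtual cluster interfaces" lines always look like the ones above; the regular expression and the function name parse_vmacs are my own and have not been checked against other versions.

```python
import re

# Matches lines such as:
#   eth4            200.100.0.1         VMAC address: 00:1C:7F:00:33:61
VMAC_LINE = re.compile(
    r"^(?P<iface>\S+)\s+(?P<vip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"VMAC address:\s*(?P<vmac>(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})$"
)


def parse_vmacs(text: str) -> list:
    """Return (interface, cluster VIP, vMAC) tuples found in 'cphaprob -a if' output."""
    entries = []
    for line in text.splitlines():
        match = VMAC_LINE.match(line.strip())
        if match:
            entries.append((match["iface"], match["vip"], match["vmac"].lower()))
    return entries


# Sample text copied from the output above
sample_output = """\
Virtual cluster interfaces: 2

eth0            10.0.0.1         VMAC address: 00:1C:7F:00:33:61

eth4            200.100.0.1         VMAC address: 00:1C:7F:00:33:61
"""

for iface, vip, vmac in parse_vmacs(sample_output):
    print(iface, vip, vmac)
```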

 

 

Weirder yet, I am not seeing G-ARP on either cluster member even when the failover succeeds:

 

 

[Expert@CPCM1:0]# clusterXL_admin down

This command does not survive reboot. To make the change permanent, run either the 'set cluster member admin {down|up} permanent' command in Gaia Clish, or the 'clusterXL_admin {down|up} -p' command in Expert mode

Setting member to administratively down state ...

Member current state is DOWN

[Expert@CPCM1:0]#

 

 

We are not seeing G-ARP packets on either cluster member (contrary to what was implied in the CheckMates thread https://community.checkpoint.com/t5/Security-Gateways/How-to-send-G-ARP-manually/m-p/69914):

 

[Expert@CPCM1:0]# tcpdump -ni eth4 -c4 broadcast and arp and arp[6:2] == 2

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth4, link-type EN10MB (Ethernet), capture size 262144 bytes

 

[Expert@CPCM2:0]# tcpdump -ni eth4 -c4 broadcast and arp and arp[6:2] == 2

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth4, link-type EN10MB (Ethernet), capture size 262144 bytes

 

Meanwhile, the actual failover does take place successfully:

 

[Expert@CPCM1:0]# cphaprob stat

 

Cluster Mode:   High Availability (Active Up) with IGMP Membership

 

ID         Unique Address  Assigned Load   State          Name                 

 

1 (local)  192.168.255.2   0%              DOWN           CPCM1

2          192.168.255.3   100%            ACTIVE         CPCM2

 

 

Active PNOTEs: ADMIN

 

Last member state change event:

   Event Code:                 CLUS-111400

   State change:               ACTIVE -> DOWN

   Reason for state change:    ADMIN_DOWN PNOTE

   Event time:                 Wed Aug 11 13:08:48 2021

 

Last cluster failover event:

   Transition to new ACTIVE:   Member 1 -> Member 2

   Reason:                     ADMIN_DOWN PNOTE

   Event time:                 Wed Aug 11 13:08:47 2021

 

Cluster failover count:

   Failover counter:           3

   Time of counter reset:      Wed Aug 11 10:29:31 2021 (reboot)

 

 

[Expert@CPCM1:0]#

 

 

[Expert@CPCM2:0]# cphaprob stat

 

Cluster Mode:   High Availability (Active Up) with IGMP Membership

 

ID         Unique Address  Assigned Load   State          Name                 

 

1          192.168.255.2   0%              DOWN           CPCM1

2 (local)  192.168.255.3   100%            ACTIVE         CPCM2

 

 

Active PNOTEs: None

 

Last member state change event:

   Event Code:                 CLUS-114704

   State change:               STANDBY -> ACTIVE

   Reason for state change:    No other ACTIVE members have been found in the cluster

   Event time:                 Wed Aug 11 13:08:47 2021

 

Last cluster failover event:

   Transition to new ACTIVE:   Member 1 -> Member 2

   Reason:                     ADMIN_DOWN PNOTE

   Event time:                 Wed Aug 11 13:08:47 2021

 

Cluster failover count:

   Failover counter:           3

   Time of counter reset:      Wed Aug 11 10:29:31 2021 (reboot)

 

 

[Expert@CPCM2:0]#
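
The before/after comparison of "cphaprob stat" can be automated when repeating this test. Here is a minimal sketch that assumes the member table keeps the column layout shown above; the regular expression and the helper names are hypothetical, and the sample text is trimmed from the outputs in this post.

```python
import re

# Matches member rows such as:
#   1 (local)  192.168.255.2   100%            ACTIVE         CPCM1
MEMBER_LINE = re.compile(
    r"^(?P<id>\d+)\s+(?:\(local\)\s+)?(?P<addr>\d+(?:\.\d+){3})\s+"
    r"(?P<load>\d+)%\s+(?P<state>\S+)\s+(?P<name>\S+)$"
)


def member_states(text: str) -> dict:
    """Map member name to its reported state."""
    states = {}
    for line in text.splitlines():
        match = MEMBER_LINE.match(line.strip())
        if match:
            states[match["name"]] = match["state"]
    return states


def active_member(text: str):
    """Return the single ACTIVE member's name, or None if there is not exactly one."""
    active = [name for name, state in member_states(text).items()
              if state.startswith("ACTIVE")]
    return active[0] if len(active) == 1 else None


BEFORE = """\
ID         Unique Address  Assigned Load   State          Name

1 (local)  192.168.255.2   100%            ACTIVE         CPCM1
2          192.168.255.3   0%              DOWN           CPCM2
"""

AFTER = """\
ID         Unique Address  Assigned Load   State          Name

1 (local)  192.168.255.2   0%              DOWN           CPCM1
2          192.168.255.3   100%            ACTIVE         CPCM2
"""

print(active_member(BEFORE), "->", active_member(AFTER))
```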


 

It would be great to hear from someone who has tackled this issue successfully in the field.

Thank you,

Vladimir

Timothy_Hall
Champion

This is possible using a tool called scapy, which is not included with Gaia but can be downloaded here: https://scapy.net/

CAUTION: Adding extra Linux software to Gaia is UNSUPPORTED and may result in very bad things.  You have been warned.

Scapy uses Python and libpcap, both of which are built into Gaia, so it doesn't need to be compiled with gcc or anything like that.  The following was performed on an R81 gateway with Gaia 3.10 in the Shadow Peak lab. Here is the scapy code broken out from the screenshot:

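from scapy.all import *

# MAC and IP address to advertise in the forged gratuitous ARP
SPOOF_MAC = '02:02:02:02:02:02'
BCAST_MAC = 'ff:ff:ff:ff:ff:ff'
SPOOFIPADDR = '192.0.2.222'

# op=2 is an ARP reply; psrc equal to pdst makes it gratuitous
send(ARP(op=2,psrc=SPOOFIPADDR,pdst=SPOOFIPADDR,hwsrc=SPOOF_MAC,hwdst=BCAST_MAC))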

[Screenshot: scapy.png]

And the resulting forged packet as seen in cppcap:

[Screenshot: scapy_tcpdump.png]

 

Vladimir
Champion

Thank you, Tim. I've seen references to scapy before, but it is nice to actually see it working.

Theoretically, snapshotting a pre-deployed cluster member, using scapy, and then reverting to the snapshot should do the trick.

I doubt that banking and insurance clients would go for a third-party tool, though.

It would be nice to have Check Point provide a native solution for this issue, as it would make transitions from competing solutions less of a headache. Not to mention simply enabling the vMAC feature.
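
For what it is worth, the same kind of frame can in principle be built without scapy, using only the Python standard library and a Linux AF_PACKET raw socket. The sketch below is untested on Gaia and just as unsupported as any other custom tooling; whether ClusterXL lets such a frame out of the interface is an assumption I have not verified. The function names are made up, and the example values are the cluster VIP and vMAC from the "cphaprob -a if" output in the original post.

```python
import socket
import struct


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' (or dash-separated) to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", "").replace("-", ""))


def build_garp_frame(ip: str, mac: str) -> bytes:
    """Build a broadcast Ethernet frame carrying a gratuitous ARP reply for ip/mac."""
    src = mac_to_bytes(mac)
    bcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    eth = bcast + src + struct.pack("!H", 0x0806)
    # ARP reply (opcode 2) with sender IP equal to target IP
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1, 0x0800, 6, 4, 2,
        src, socket.inet_aton(ip),
        bcast, socket.inet_aton(ip),
    )
    return eth + arp


def send_garp(iface: str, ip: str, mac: str, count: int = 4) -> None:
    """Send the frame 'count' times on iface. Linux only, requires root."""
    frame = build_garp_frame(ip, mac)
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((iface, 0))
        for _ in range(count):
            sock.send(frame)


# Build (but do not send) a frame for the cluster VIP and vMAC on eth4
frame = build_garp_frame("200.100.0.1", "00:1C:7F:00:33:61")
print(frame.hex())
```

Actually transmitting would be a call along the lines of send_garp("eth4", "200.100.0.1", "00:1C:7F:00:33:61"), run as root on the member that should own the vMAC.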

Vladimir
Champion

Also, sk50840 suggests that a G-ARP is sourced from the VMAC:

"With VMAC mode, the G-ARP packet is the only packet which is sourced from the Virtual MAC address. Upon failover, the connected switch learns the port location of the new active member using the MAC Learning process and the switch updates it's CAM table with the new port location. It learns the new location (switch port) of the VMAC, and directs the frames to the new location. In this way, the router does not need to do any work to result in the failover of traffic."

Would be nice to get some clarity from Check Point as to what happened to it...
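
To make the MAC learning step described in the sk50840 quote concrete, here is a toy Python model of a switch CAM table. It is purely illustrative: the class, the port numbers, and the idea that each member sits on its own switch port are assumptions for the example, not anything taken from the SK.

```python
class ToySwitch:
    """Very small model of MAC learning: remember the last port each source MAC was seen on."""

    def __init__(self):
        self.cam = {}

    def receive(self, src_mac: str, ingress_port: int) -> None:
        # Learning step: any frame sourced from src_mac updates its port location
        self.cam[src_mac.lower()] = ingress_port

    def egress_port(self, dst_mac: str):
        # Lookup step: None means unknown destination (a real switch would flood)
        return self.cam.get(dst_mac.lower())


VMAC = "00:1C:7F:00:33:61"

switch = ToySwitch()
switch.receive(VMAC, ingress_port=1)   # G-ARP sourced from the vMAC by the active member (port 1, hypothetical)
before = switch.egress_port(VMAC)

switch.receive(VMAC, ingress_port=2)   # after failover, G-ARP from the new active member (port 2, hypothetical)
after = switch.egress_port(VMAC)

print(before, after)
```

If no frame sourced from the vMAC arrives from the new active member, the table keeps pointing at the old port until the entry ages out, which is why the missing G-ARP matters here.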
