Vladimir
Champion

Forcing Gratuitous ARP (G-ARP) from ClusterXL with vMAC

While there are a few existing threads discussing G-ARP, the solutions provided there do not seem to work for this situation.

I also think that this scenario is encountered often enough to deserve its own thread.

 

The scenario is a pending HA cluster hardware swap. The goal is to avoid the 4-hour ARP cache expiration problem.

arping does not work for vMAC.

Nor, it seems, does "fw ctl set int test_arp_refresh 1".

Tested as follows (public IPs are fake, R81.10):

Expected G-ARP packet capture on the connected router, provoked by “arping -c 4 -A -I eth4 200.100.0.2” run from one of the cluster members:

 

root@router:/home/vyos# tcpdump -ni eth1 -c4 broadcast and arp and arp[6:2] == 2

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes

12:52:09.345639 ARP, Reply 200.100.0.2 is-at 08:00:27:f5:0e:39, length 46

12:52:10.345408 ARP, Reply 200.100.0.2 is-at 08:00:27:f5:0e:39, length 46

12:52:11.346263 ARP, Reply 200.100.0.2 is-at 08:00:27:f5:0e:39, length 46

12:52:12.346336 ARP, Reply 200.100.0.2 is-at 08:00:27:f5:0e:39, length 46

4 packets captured

4 packets received by filter

0 packets dropped by kernel

root@router:/home/vyos#

 

 

Output of the same on the cluster member's interface connected to the router (same on both members):

 

[Expert@CPCM1:0]# tcpdump -ni eth4 -c4 broadcast and arp and arp[6:2] == 2

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth4, link-type EN10MB (Ethernet), capture size 262144 bytes

12:53:30.011708 ARP, Reply 200.100.0.2 is-at 08:00:27:f5:0e:39, length 28

12:53:31.012288 ARP, Reply 200.100.0.2 is-at 08:00:27:f5:0e:39, length 28

12:53:32.013320 ARP, Reply 200.100.0.2 is-at 08:00:27:f5:0e:39, length 28

12:53:33.013524 ARP, Reply 200.100.0.2 is-at 08:00:27:f5:0e:39, length 28

4 packets captured

4 packets received by filter

0 packets dropped by kernel

[Expert@CPCM1:0]#

 

 

I presume we should expect to see the same, but with the vMAC, when using "fw ctl set int test_arp_refresh 1".
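
To make "the same, but with the vMAC" concrete, here is a minimal Python sketch of what such a frame would contain. It only builds the bytes and does not send anything. The function name `build_garp_reply` is hypothetical, the vMAC and cluster IP are the ones reported by "cphaprob -a if" in this post, and setting the target MAC to broadcast is an assumption (implementations differ on that field).

```python
import socket
import struct


def build_garp_reply(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP reply for (mac, ip)."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    # Ethernet header: destination (broadcast), source (vMAC), EtherType 0x0806 (ARP)
    ether = broadcast + mac_bytes + struct.pack("!H", 0x0806)
    # ARP header: htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, op=2 (reply)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    # Gratuitous: sender IP equals target IP.
    # Assumption: target MAC is set to broadcast; some senders use other values.
    arp += mac_bytes + ip_bytes + broadcast + ip_bytes
    return ether + arp


frame = build_garp_reply("00:1C:7F:00:33:61", "200.100.0.1")
print(len(frame) - 14)  # ARP payload length without the Ethernet header: 28
print(frame.hex())
```

The 28-byte payload corresponds to the "length 28" tcpdump reports on the cluster member. The router shows 46 because short Ethernet frames are padded on the wire.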

 

 

But when we run it:

 

[Expert@CPCM1:0]# fw ctl set int test_arp_refresh 1

[Expert@CPCM1:0]#

 

 

We are not seeing anything:

 

[Expert@CPCM1:0]# tcpdump -ni eth4 -c4 broadcast and arp and arp[6:2] == 2

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth4, link-type EN10MB (Ethernet), capture size 262144 bytes

 

 

When executing failover on the active cluster member:

 

[Expert@CPCM1:0]# cphaprob stat

 

Cluster Mode:   High Availability (Active Up) with IGMP Membership

 

ID         Unique Address  Assigned Load   State          Name                 

 

1 (local)  192.168.255.2   100%            ACTIVE         CPCM1

2          192.168.255.3   0%              DOWN           CPCM2

 

 

Active PNOTEs: None

 

Last member state change event:

   Event Code:                 CLUS-114904

   State change:               ACTIVE(!) -> ACTIVE

   Reason for state change:    Reason for ACTIVE! alert has been resolved

   Event time:                 Wed Aug 11 12:26:42 2021

 

Last cluster failover event:

   Transition to new ACTIVE:   Member 2 -> Member 1

   Reason:                     ADMIN_DOWN PNOTE

   Event time:                 Wed Aug 11 11:21:20 2021

 

Cluster failover count:

   Failover counter:           2

   Time of counter reset:      Wed Aug 11 10:29:31 2021 (reboot)

 

 

[Expert@CPCM1:0]#

 

 

With vMAC configured:

 

[Expert@CPCM1:0]# cphaprob -a if

 

CCP mode: Manual (Unicast)

Required interfaces: 3

Required secured interfaces: 1

 

 

Interface Name:      Status:

 

eth0                 UP

eth3 (S)             UP

eth4                 UP

 

S - sync, LM - link monitor, HA/LS - bond type

 

Virtual cluster interfaces: 2

 

eth0            10.0.0.1         VMAC address: 00:1C:7F:00:33:61

eth4            200.100.0.1         VMAC address: 00:1C:7F:00:33:61

 

[Expert@CPCM1:0]#

 

 

Weirder yet, I am not seeing G-ARP on either cluster member even when the failover succeeds:

 

 

[Expert@CPCM1:0]# clusterXL_admin down

This command does not survive reboot. To make the change permanent, run either the 'set cluster member admin {down|up} permanent' command in Gaia Clish, or the 'clusterXL_admin {down|up} -p' command in Expert mode

Setting member to administratively down state ...

Member current state is DOWN

[Expert@CPCM1:0]#

 

 

We are not seeing G-ARP packets on either cluster member (contrary to what was implied in the CheckMates thread https://community.checkpoint.com/t5/Security-Gateways/How-to-send-G-ARP-manually/m-p/69914):

 

[Expert@CPCM1:0]# tcpdump -ni eth4 -c4 broadcast and arp and arp[6:2] == 2

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth4, link-type EN10MB (Ethernet), capture size 262144 bytes

 

[Expert@CPCM2:0]# tcpdump -ni eth4 -c4 broadcast and arp and arp[6:2] == 2

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth4, link-type EN10MB (Ethernet), capture size 262144 bytes
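
One thing worth ruling out is the capture filter itself. "arp[6:2] == 2" matches only ARP replies (opcode 2), but a gratuitous ARP can also be sent as a request (opcode 1); RFC 5227 describes the announcement in request form. Whether ClusterXL uses the request form is not confirmed anywhere in this post, so treat this as a hypothesis. The sketch below (Python, with hypothetical helper names `arp_payload` and `classify_arp`) shows that both forms are gratuitous, since sender IP equals target IP, while only one of them would pass the filter used above.

```python
import socket
import struct


def arp_payload(op, sender_mac, sender_ip, target_mac, target_ip):
    """Build a 28-byte Ethernet/IPv4 ARP payload (no Ethernet header)."""
    header = struct.pack("!HHBBH", 1, 0x0800, 6, 4, op)
    return (
        header
        + bytes.fromhex(sender_mac.replace(":", ""))
        + socket.inet_aton(sender_ip)
        + bytes.fromhex(target_mac.replace(":", ""))
        + socket.inet_aton(target_ip)
    )


def classify_arp(payload):
    """Return (opcode, is_gratuitous) for an Ethernet/IPv4 ARP payload."""
    if len(payload) < 28:
        raise ValueError("ARP payload too short")
    htype, ptype, hlen, plen, op = struct.unpack("!HHBBH", payload[:8])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        raise ValueError("not an Ethernet/IPv4 ARP payload")
    sender_ip = payload[14:18]  # same bytes as tcpdump's arp[14:4]
    target_ip = payload[24:28]  # same bytes as tcpdump's arp[24:4]
    return op, sender_ip == target_ip


vmac = "00:1C:7F:00:33:61"
vip = "200.100.0.1"
announce_request = arp_payload(1, vmac, vip, "00:00:00:00:00:00", vip)
announce_reply = arp_payload(2, vmac, vip, "ff:ff:ff:ff:ff:ff", vip)

for name, payload in (("request form", announce_request), ("reply form", announce_reply)):
    op, gratuitous = classify_arp(payload)
    # Only opcode 2 would pass the capture filter "arp[6:2] == 2"
    print(name, "opcode:", op, "gratuitous:", gratuitous, "passes filter:", op == 2)
```

If that hypothesis is relevant, a capture filter that compares sender and target IP instead of the opcode (for example "arp and arp[14:4] == arp[24:4]") should in principle catch both forms. That filter has not been tested in this setup.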

 

Meanwhile, the failover itself completes successfully:

 

[Expert@CPCM1:0]# cphaprob stat

 

Cluster Mode:   High Availability (Active Up) with IGMP Membership

 

ID         Unique Address  Assigned Load   State          Name                 

 

1 (local)  192.168.255.2   0%              DOWN           CPCM1

2          192.168.255.3   100%            ACTIVE         CPCM2

 

 

Active PNOTEs: ADMIN

 

Last member state change event:

   Event Code:                 CLUS-111400

   State change:               ACTIVE -> DOWN

   Reason for state change:    ADMIN_DOWN PNOTE

   Event time:                 Wed Aug 11 13:08:48 2021

 

Last cluster failover event:

   Transition to new ACTIVE:   Member 1 -> Member 2

   Reason:                     ADMIN_DOWN PNOTE

   Event time:                 Wed Aug 11 13:08:47 2021

 

Cluster failover count:

   Failover counter:           3

   Time of counter reset:      Wed Aug 11 10:29:31 2021 (reboot)

 

 

[Expert@CPCM1:0]#

 

 

[Expert@CPCM2:0]# cphaprob stat

 

Cluster Mode:   High Availability (Active Up) with IGMP Membership

 

ID         Unique Address  Assigned Load   State          Name                 

 

1          192.168.255.2   0%              DOWN           CPCM1

2 (local)  192.168.255.3   100%            ACTIVE         CPCM2

 

 

Active PNOTEs: None

 

Last member state change event:

   Event Code:                 CLUS-114704

   State change:               STANDBY -> ACTIVE

   Reason for state change:    No other ACTIVE members have been found in the cluster

   Event time:                 Wed Aug 11 13:08:47 2021

 

Last cluster failover event:

   Transition to new ACTIVE:   Member 1 -> Member 2

   Reason:                     ADMIN_DOWN PNOTE

   Event time:                 Wed Aug 11 13:08:47 2021

 

Cluster failover count:

   Failover counter:           3

   Time of counter reset:      Wed Aug 11 10:29:31 2021 (reboot)

 

 

[Expert@CPCM2:0]#
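
For anyone repeating this test, comparing member states before and after the failover is easier with the member table parsed out of the "cphaprob stat" output. A minimal Python sketch, assuming the R81.10 column layout shown above; `parse_members` is a hypothetical helper, not a Check Point tool.

```python
import re

# One member row: ID, optional "(local)", unique address, load %, state, name
MEMBER_RE = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<local>\(local\))?\s*"
    r"(?P<addr>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<load>\d+)%\s+(?P<state>\S+)\s+(?P<name>\S+)\s*$"
)


def parse_members(text):
    """Extract member rows from 'cphaprob stat' output as a list of dicts."""
    members = []
    for line in text.splitlines():
        match = MEMBER_RE.match(line)
        if match:
            members.append(
                {
                    "id": int(match.group("id")),
                    "local": match.group("local") is not None,
                    "addr": match.group("addr"),
                    "load": int(match.group("load")),
                    "state": match.group("state"),
                    "name": match.group("name"),
                }
            )
    return members


sample = """\
ID         Unique Address  Assigned Load   State          Name

1          192.168.255.2   0%              DOWN           CPCM1
2 (local)  192.168.255.3   100%            ACTIVE         CPCM2
"""

members = parse_members(sample)
# startswith() also covers the "ACTIVE(!)" state
active = [m["name"] for m in members if m["state"].startswith("ACTIVE")]
print(active)
```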


 

It would be great to hear from someone who has tackled this issue successfully in the field.

Thank you,

Vladimir
