Re: Traffic to secondary member of ClusterXL is dropped using VxLan

dphonovation · ‎2022-12-02

I have the following:

Member1-Site1: 10.10.171.2/24 Member1-Site2: 10.20.171.2/24

Member2-Site1: 10.10.171.3/24 Member2-Site2: 10.20.171.3/24

VIP: 10.10.171.1 VIP: 10.20.171.1

Site-to-site tunnel 1 encryption domain: 10.11.171.0/24. Site1 has a Cluster VIP here of 10.11.171.1

Site-to-site tunnel 2 encryption domain: 10.12.171.0/24. Site2 has a Cluster VIP here of 10.12.171.1

Across that IPsec tunnel I have a Check Point native VxLan interface pointed back at the opposite cluster:

Member1-Site1: 172.31.0.2/29 Member1-Site2: 172.31.0.5/29

Member2-Site1: 172.31.0.3/29 Member2-Site2: 172.31.0.6/29

VxLan VIP Site1: 172.31.0.1 VxLan VIP Site2: 172.31.0.4

Remote addr: 10.12.171.1 Remote addr: 10.11.171.1

I then have a route from Site1: route 10.20.171.0/24 via 172.31.0.4

And a route from Site2 back: route 10.10.171.0/24 via 172.31.0.1

This works perfectly. I can reach all hosts on 10.10.171.0/24 or 10.20.171.0/24 from either side - except for traffic headed to the standby member in the ClusterXL on the destination net.

Can anyone shed light on why this might be the case?
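As a quick sanity check on the addressing above, here is a minimal Python sketch (not anything Check Point provides) that confirms every VxLan address from the post sits inside one 172.31.0.0/29 and that both static routes point at a cluster VIP. The variable names are made up for illustration; the addresses are copied from the post.

```python
import ipaddress

# VxLan transit network and addresses, copied from the post.
vxlan_net = ipaddress.ip_network("172.31.0.0/29")
vxlan_vips = ["172.31.0.1", "172.31.0.4"]
vxlan_members = ["172.31.0.2", "172.31.0.3", "172.31.0.5", "172.31.0.6"]

# Static routes from the post: destination network -> next hop.
routes = {
    "10.20.171.0/24": "172.31.0.4",  # configured on Site1
    "10.10.171.0/24": "172.31.0.1",  # configured on Site2
}

usable_hosts = set(vxlan_net.hosts())  # .1 through .6 for a /29


def on_link(addr: str) -> bool:
    """True if addr is a usable host address inside the VxLan /29."""
    return ipaddress.ip_address(addr) in usable_hosts


all_on_link = all(on_link(a) for a in vxlan_vips + vxlan_members)
next_hops_are_vips = all(hop in vxlan_vips for hop in routes.values())

print("all VxLan addresses inside", vxlan_net, ":", all_on_link)
print("next hops are cluster VIPs:", next_hops_are_vips)
print("broadcast address:", vxlan_net.broadcast_address)
```

Both checks pass for the values given, so the addressing itself is consistent. Note that each route's next hop is a VIP, which only the active member of the far cluster owns.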

the_rock · ‎2022-12-02

If you run a simple zdebug, what do you see? Also, if you run ip r g x.x.x.x (the IP you are trying to reach), does the output look the same as for an IP that does work?

Andy

dphonovation · ‎2022-12-02

Just saw this in zdebug. A clue!

@;1464977;[vs_0];[tid_0];[fw4_0];fw_log_drop_ex: Packet proto=6 10.10.171.4:44698 -> 10.20.171.3:18192 dropped by fwha_ccl_inbound_late_do Reason: Dropping dynamic routing packet forwarded to wrong member.;
@;1465051;[vs_0];[tid_0];[fw4_0];fw_log_drop_ex: Packet proto=6 10.10.171.12:55319 -> 10.20.171.3:8443 dropped by fwha_ccl_inbound_late_do Reason: Dropping dynamic routing packet forwarded to wrong member.;

Setting fwha_forw_packet_to_not_active to 1 doesn't seem to help either (tried on all members).
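To see at a glance which destinations hit this drop when there are many such lines, a minimal Python sketch along these lines might help. The regular expression is fitted only to the two drop lines quoted above, so it is an assumption that other zdebug messages follow the same layout; `parse_drop` is a made-up helper name.

```python
import re
from collections import Counter

# Pattern fitted only to the two drop lines quoted in this thread;
# other zdebug message formats may not match.
DROP_RE = re.compile(
    r"Packet proto=(?P<proto>\d+) "
    r"(?P<src>[\d.]+):(?P<sport>\d+) -> (?P<dst>[\d.]+):(?P<dport>\d+) "
    r"dropped by (?P<func>\S+) Reason: (?P<reason>.*?);?$"
)

SAMPLE = """
@;1464977;[vs_0];[tid_0];[fw4_0];fw_log_drop_ex: Packet proto=6 10.10.171.4:44698 -> 10.20.171.3:18192 dropped by fwha_ccl_inbound_late_do Reason: Dropping dynamic routing packet forwarded to wrong member.;
@;1465051;[vs_0];[tid_0];[fw4_0];fw_log_drop_ex: Packet proto=6 10.10.171.12:55319 -> 10.20.171.3:8443 dropped by fwha_ccl_inbound_late_do Reason: Dropping dynamic routing packet forwarded to wrong member.;
"""


def parse_drop(line: str):
    """Return the drop fields as a dict, or None if the line does not match."""
    m = DROP_RE.search(line.strip())
    return m.groupdict() if m else None


# Keep only lines that matched, then count drops per (destination, function).
drops = [d for d in map(parse_drop, SAMPLE.splitlines()) if d]
by_dst = Counter((d["dst"], d["func"]) for d in drops)

for (dst, func), count in by_dst.items():
    print(f"{count} drop(s) to {dst} in {func}")
```

For the two lines above, both drops are for traffic destined to 10.20.171.3 (the standby member at Site2) and come from the same function.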

the_rock · ‎2022-12-02

That would appear to be something routing related, for sure. What is the output of ip route get for the IP you are testing, on both members?

dphonovation · ‎2022-12-02

On Site 1 FW1:

[Expert@cp-fw1-site1:0]# ip r g 10.20.171.3
10.20.171.3 via 172.31.0.4 dev vxlan7 src 172.31.0.2
cache
[Expert@cp-fw1-site1:0]# ip r g 10.20.171.2
10.20.171.2 via 172.31.0.4 dev vxlan7 src 172.31.0.2

On Site 1 FW2:

[Expert@cp-fw2-site1:0]# ip r g 10.20.171.3
10.20.171.3 via 172.31.0.4 dev vxlan7 src 172.31.0.3
cache
[Expert@cp-fw2-site1:0]# ip r g 10.20.171.2
10.20.171.2 via 172.31.0.4 dev vxlan7 src 172.31.0.3
cache

On Site 2 FW1:

[Expert@cp-fw1-site2:0]# ip r g 10.10.171.3
10.10.171.3 via 172.31.0.1 dev vxlan7 src 172.31.0.5
cache
[Expert@cp-fw1-site2:0]# ip r g 10.10.171.2
10.10.171.2 via 172.31.0.1 dev vxlan7 src 172.31.0.5

On Site 2 FW2:

[Expert@cp-fw2-site2:0]# ip r g 10.10.171.3
10.10.171.3 via 172.31.0.1 dev vxlan7 src 172.31.0.6
cache
[Expert@cp-fw2-site2:0]# ip r g 10.10.171.2
10.10.171.2 via 172.31.0.1 dev vxlan7 src 172.31.0.6
cache
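For comparing these outputs across members without eyeballing them, a small Python sketch like the following could be used. It only handles the "dst via X dev Y src Z" form shown above, and the hostname-to-site split is an assumption based on the prompts in the pasted output.

```python
import re

# Matches only the "dst via X dev Y src Z" form seen in the output above.
ROUTE_RE = re.compile(
    r"^(?P<dst>\S+) via (?P<via>\S+) dev (?P<dev>\S+) src (?P<src>\S+)"
)

# One line per member, copied from the output above (standby target only).
outputs = {
    "cp-fw1-site1": "10.20.171.3 via 172.31.0.4 dev vxlan7 src 172.31.0.2",
    "cp-fw2-site1": "10.20.171.3 via 172.31.0.4 dev vxlan7 src 172.31.0.3",
    "cp-fw1-site2": "10.10.171.3 via 172.31.0.1 dev vxlan7 src 172.31.0.5",
    "cp-fw2-site2": "10.10.171.3 via 172.31.0.1 dev vxlan7 src 172.31.0.6",
}

parsed = {host: ROUTE_RE.match(line).groupdict() for host, line in outputs.items()}


def site(host: str) -> str:
    """Assumed naming scheme: the site is the last dash-separated token."""
    return host.rsplit("-", 1)[1]


# Collect the (next hop, device) pairs seen per site; members of one
# cluster should agree on these and differ only in their source address.
next_hops = {}
for host, route in parsed.items():
    next_hops.setdefault(site(host), set()).add((route["via"], route["dev"]))

consistent = all(len(pairs) == 1 for pairs in next_hops.values())
print("next hop/device consistent per site:", consistent)
for s, pairs in sorted(next_hops.items()):
    print(s, "->", sorted(pairs))
```

Both members at each site agree on next hop and device, so the routing tables themselves do not explain the drop.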

the_rock · ‎2022-12-02

That seems correct. I also found the SK below, but you already said you changed the value. Let's see what others have to say.

https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

Also, just wondering, if you compare the traceroute of working and non-working one, where is it failing?

Andy

dphonovation · ‎2022-12-03

Well, the zdebug drop is being shown on the active member of the opposite site.

dphonovation · ‎2022-12-03

And these are the traceroutes:

cp-mgmt-site1> traceroute 10.20.171.2
traceroute to 10.20.171.2 (10.20.171.2), 30 hops max, 40 byte packets
1 10.10.171.2 (10.10.171.2) 2.053 ms 1.682 ms 2.024 ms
2 10.20.171.2 (10.20.171.2) 20.622 ms 20.580 ms 20.607 ms
cp-mgmt-site1> traceroute 10.20.171.3
traceroute to 10.20.171.3 (10.20.171.3), 30 hops max, 40 byte packets
1 10.10.171.2 (10.10.171.2) 2.360 ms 1.796 ms 2.323 ms
2 * * *
3 * * *

Meanwhile, the other side's active member is logging the aforementioned drops.

What's weird is that the security gateways can ping the standby fine, but I think this is due to an auto NAT rule.

dphonovation · ‎2022-12-03

There is something about routing to the vxlan interface from the standby. Oddly, the standby member can ping both active/standby on the other side. But it cannot ping the management server (10.10.171.4):

In this case, FW2 at Site 2 (in standby) is trying to reach a CP MGMT box on the other side via ping.

FW2 at Site 2 is responding with ICMP unreachable from the IP of its member on the Clustered VxLan interface.

[Expert@cp-fw2-site2:0]# ifconfig vxlan7
vxlan7 Link encap:Ethernet HWaddr 0E:61:40:26:DB:26
inet addr:172.31.0.6 Bcast:172.31.0.7 Mask:255.255.255.248
UP BROADCAST RUNNING MULTICAST MTU:8000 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:2897 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:81240 (79.3 KiB)
[Expert@cp-fw2-site2:0]# ip r g 10.10.171.4
10.10.171.4 via 172.31.0.1 dev vxlan7 src 172.31.0.6
cache
[Expert@cp-fw2-site2:0]# ping -c 1 172.31.0.6
PING 172.31.0.6 (172.31.0.6) 56(84) bytes of data.
64 bytes from 172.31.0.6: icmp_seq=1 ttl=64 time=0.079 ms
--- 172.31.0.6 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.079/0.079/0.079/0.000 ms
[Expert@cp-fw2-site2:0]# ping 10.10.171.4
PING 10.10.171.4 (10.10.171.4) 56(84) bytes of data.
From 172.31.0.6 icmp_seq=1 Destination Host Unreachable

Meanwhile, every other host on the same network as fw2-site2 (which uses fw1, the active member that owns the default gateway VIP) CANNOT ping the standby gateway on the other side, but can ping the mgmt server.
