Hi All,
At a customer site, I have created an R80.30 ClusterXL cluster with Jumbo Hotfix Take 155, which is working fine. Everything looks OK when checking the cluster with 'cphaprob stat', 'cphaprob -l list' and 'cphaprob -a if'. The connection table is also synced, so the cluster seems healthy.
But when we perform a fail-over with 'clusterXL_admin down' on the active member, we lose connections on one specific VLAN. On the other interfaces and VLANs, no problems are reported when we perform a fail-over.
Our first impression was that the layer 3 devices in that network do not act on the gratuitous ARP (G-ARP) being sent. But when I manually send a G-ARP into the network, all connections via that VLAN are restored, so the devices do respond to it. I used the following to send the G-ARP:
echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind   # 0=off, 1=on
arping -c 4 -A -I eth3 10.10.10.10
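For repeated tests, the two commands above could be wrapped in a small helper. This is only a sketch: `send_garp` and the `DRY_RUN` switch are names made up for illustration, and it uses the same `arping` options as above. It also puts `ip_nonlocal_bind` back to its previous value afterwards.

```shell
#!/bin/bash
# Sketch: send gratuitous ARP for a cluster VIP, then restore ip_nonlocal_bind.
# send_garp and DRY_RUN are hypothetical names; eth3 and 10.10.10.10 are the
# interface and address used in the manual test above.

send_garp() {
    local ifname="$1" vip="$2" count="${3:-4}"
    local knob=/proc/sys/net/ipv4/ip_nonlocal_bind

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: only print what would be executed, change nothing.
        echo "echo 1 > $knob"
        echo "arping -c $count -A -I $ifname $vip"
        return 0
    fi

    local old rc
    old=$(cat "$knob")        # remember the current setting
    echo 1 > "$knob"          # allow using an address that is not local
    arping -c "$count" -A -I "$ifname" "$vip"
    rc=$?
    echo "$old" > "$knob"     # put the setting back
    return "$rc"
}
```

Running `DRY_RUN=1 send_garp eth3 10.10.10.10` only prints the commands; without `DRY_RUN` it needs root and actually sends the G-ARP.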
We have a computer running Wireshark in that VLAN. When we perform a fail-over with 'clusterXL_admin down', we do not see any G-ARP packets. When we send the G-ARP manually, we do see these packets in Wireshark.
I have checked for known issues with ARP or G-ARP in the jumbo hotfixes, but I cannot find anything.
Has anyone seen this before? It is very strange because it only happens on one VLAN.
Regards,
Martijn