Kaspars Zibarts

CMA IPs not responding to ARP broadcasts

Discussion created by Kaspars Zibarts on Jan 21, 2018

Just thought I would share this in case someone else is running Cisco VXLAN in their network and has an MDS.

 

We ran into an issue where, after the MDS was started and all processes came up, none of the CMA virtual IPs were responding to the gateways or to the HA MDS, whilst the main MDS IP worked just fine. Note that all IPs (CMAs and MDS) are in the same subnet, so only L2 is involved.

 

As it turned out, this is a known Cisco bug (CSCvg07375 - IR EVPN: BUM traffic gets dropped on ingress Leaf after route change in Underlay) affecting broadcast packets in the VLAN. Due to this bug, ARP broadcast packets may not reach all hosts in the VLAN if they are connected to different switches. We are on version nxos.7.0.3.I7.2 and it still does not work, even though Cisco says it was fixed in a previous version.

 

To give you a simple example: our primary MDS is connected to one switch and the HA MDS to another, both in the same L2 subnet / VLAN. The primary MDS was stopped and started whilst doing mds_backup with the -s flag, meaning all virtual CMA IPs were removed and then re-added. After that we were not able to ping the CMAs on the primary MDS from the HA box.

 

A quick manual fix is to run arping from the affected MDS for each CMA. This will populate the local switch MAC-IP table, and the switch will forward the entry to all relevant VXLAN switches.

   arping -c 2 -s <ip_of_dead_CMA> <default_gw_IP>
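
With many CMAs, the same arping can be looped over a list of VIPs. A minimal sketch, assuming the VIPs are kept one per line in a file you maintain yourself. The file, the function name arping_cma_vips and the DRY_RUN switch are all hypothetical and not part of any Check Point tooling:

```shell
# Sketch only: send the arping shown above once per CMA VIP.
# Assumptions (hypothetical, not Check Point tooling):
#   $1 = file listing one CMA IP per line, $2 = default gateway IP,
#   DRY_RUN=1 prints the commands instead of running them.
arping_cma_vips() {
    vip_file="$1"
    gw="$2"
    while read -r ip || [ -n "$ip" ]; do
        # skip blank lines and comment lines
        case "$ip" in ''|'#'*) continue ;; esac
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "arping -c 2 -s $ip $gw"
        else
            arping -c 2 -s "$ip" "$gw"
        fi
    done < "$vip_file"
}
```

Running it once with DRY_RUN=1 shows exactly which arping commands would be sent before anything touches the network.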

 

If you want a slightly more permanent solution, you may (at your own risk!) modify the script that brings up the CMA virtual IPs:

   $MDSDIR/scripts/fwvirtualon

 

I changed the Linux part as follows (the added lines are the ones under the "Additional ARP" comment):

   

else if ( "X$_OS" == "XLinux" ) then
   $MDSDIR/scripts/get_netmask $ip
   if ($status) then
      set netmask = `/sbin/ifconfig $inter | awk ' $1 == "inet" { split ($4,x,":"); print x[2] } '`
   else
      set netmask = `cat $MDSDIR/tmp/netmask`
   endif
   if ( ! ($?noecho) ) echo /sbin/ifconfig "$inter":$i $ip netmask $netmask
      /sbin/ifconfig "$inter":$i $ip netmask $netmask
      # Check the ifconfig result here, before arping overwrites $status
      if ($status) exit 2

      #Additional ARP to force Cisco VXLAN switch to learn CMA VIP
      set def_gw = `/bin/grep "routed:instance:default:static:default:gateway:address:" /config/db/initial | /bin/awk -F: '{print $8}' | /bin/awk '{print $1}'`
      /sbin/arping -c 2 -s $ip $def_gw

    if ( ! ($?noecho) ) then
      echo Virtual IP address "$inter":$i $ip added
    endif
endif
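
The gateway lookup used in the added lines can be tried on its own before touching fwvirtualon. A sketch under assumptions: the function name extract_default_gw is hypothetical, and the sample line in the comment is only an assumed shape inferred from the grep/awk pipeline (the address as the 8th colon-separated field, possibly followed by extra tokens), not copied from a real system. Check your own /config/db/initial.

```shell
# Sketch only: the same extraction as the grep/awk pipeline in the script,
# wrapped in a function so it can be run by hand.
# Assumed input shape (inferred from the pipeline, not from a real system):
#   routed:instance:default:static:default:gateway:address:11.22.33.254 t
extract_default_gw() {
    db_file="${1:-/config/db/initial}"
    grep "routed:instance:default:static:default:gateway:address:" "$db_file" \
        | awk -F: '{print $8}' \
        | awk '{print $1}'
}
```

If this prints nothing on your system, the arping in the modified script has no target to send to.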

 

You can read more about this script in sk44773.

Now, when running the mdsstart script, you will notice an additional arping for each CMA VIP:

 

[Expert@mdms02:0]# mdsstart
Starting cpWatchDog
Starting CPM Server ...
[1] 30395
CPM Server is running.
Start Search Infrastructure...
index mode was set to true
startsearch: dbsync does not run on Multi-Domain Security Management
cpwd_admin:
Process SOLR started successfully (pid=31122)
Starting RFL ...
cpwd_admin:
Process RFL started successfully (pid=31139)
Starting SmartView ...
cpwd_admin:
Process SMARTVIEW started successfully (pid=31210)
Start Log Indexer...
cpwd_admin:
Process INDEXER started successfully (pid=31731)
Start SmartLog Server...
cpwd_admin:
Process SMARTLOG_SERVER started successfully (pid=31998)

Starting Domain Management Servers
Adding Virtual IPs
.ARPING 11.22.33.254 from 11.22.33.134 eth0
Unicast reply from 11.22.33.254 [00:08:E3:FF:FD:90] 1.046ms
.ARPING 11.22.33.254 from 11.22.33.122 eth0
Unicast reply from 11.22.33.254 [00:08:E3:FF:FD:90] 0.976ms
.ARPING 11.22.33.254 from 11.22.33.238 eth0
Unicast reply from 11.22.33.254 [00:08:E3:FF:FD:90] 40.113ms
.ARPING 11.22.33.254 from 11.22.33.162 eth0
Unicast reply from 11.22.33.254 [00:08:E3:FF:FD:90] 1.107ms
.ARPING 11.22.33.254 from 11.22.33.116 eth0

...

Initialize starting of Domain Management Servers: 1 out of 18
Initialize starting of Domain Management Servers: 2 out of 18
Initialize starting of Domain Management Servers: 3 out of 18
Initialize starting of Domain Management Servers: 4 out of 18
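
With many CMAs it is easy to miss a VIP whose arping got no answer. A rough sketch for checking saved mdsstart output, assuming the arping lines look exactly like the ones above. The function name vips_without_reply is hypothetical:

```shell
# Sketch only: list CMA VIPs whose arping got no "Unicast reply" line.
# Assumes the output format shown above; reads saved mdsstart output on stdin.
vips_without_reply() {
    awk '
        /ARPING/ {
            # a new probe starts: report the previous one if it had no reply
            if (src != "" && !seen) print src
            src = $4        # 4th field is the source (CMA) IP
            seen = 0
            next
        }
        /Unicast reply/ { seen = 1 }
        END { if (src != "" && !seen) print src }
    '
}
```

For example, save the mdsstart output to a file and run vips_without_reply with that file on stdin; any IP it prints is a candidate for a manual arping.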

 

Not perfect, but a workaround whilst waiting for Cisco to come back.

 
