I'm looking into a strange failover issue with a pair of 5200 gateways. The config is as follows:
The gateways are clustered using ClusterXL. A single port on each is configured as the WAN interface (no bonding), and these ports connect to a Cisco switch running in Layer 2 mode. The single internet router is also connected to this switch. A single port on each is configured for the LAN (again no bonding), and these ports connect to a Cisco switch running in Layer 3 mode (the network core switch). The gateways are running R81.20 Take 92, but this issue has persisted through several JHFs and was already present back on Gaia R80.30.
All inbound and outbound internet traffic flows through these gateways. There are several externally available services, including several web servers and a third-party remote access server. These are configured in what seems to me an unusual way. There are two objects for each service, one with the internal address and one with the external address, and a pair of manual NAT rules for each object pair. The external addresses are then all added as alias addresses on the external interface in Gaia.
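To make the NAT layout above concrete, here is a minimal Python model of one "object pair plus two manual rules" setup. It assumes (not stated above) that the first rule of each pair translates the inbound destination from external to internal and the second translates the outbound source from internal to external. The class, function names and addresses are hypothetical placeholders, not the real configuration. Note that it only models address translation; it says nothing about which cluster member answers ARP for the external aliases after a failover.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class StaticNatPair:
    """One published service: an internal object and its external twin."""
    name: str
    internal: IPv4Address
    external: IPv4Address


def translate_inbound(pairs: list[StaticNatPair], dst: IPv4Address) -> IPv4Address:
    # Assumed rule 1 of each pair: destination external -> internal.
    for pair in pairs:
        if dst == pair.external:
            return pair.internal
    return dst  # no rule matched: address left untranslated


def translate_outbound(pairs: list[StaticNatPair], src: IPv4Address) -> IPv4Address:
    # Assumed rule 2 of each pair: source internal -> external.
    for pair in pairs:
        if src == pair.internal:
            return pair.external
    return src  # no rule matched: address left untranslated


# Hypothetical addresses (RFC 1918 / RFC 5737 ranges), not from the real setup.
PAIRS = [
    StaticNatPair("web1", IPv4Address("10.0.10.11"), IPv4Address("203.0.113.11")),
    StaticNatPair("ras", IPv4Address("10.0.10.20"), IPv4Address("203.0.113.20")),
]

print(translate_inbound(PAIRS, IPv4Address("203.0.113.11")))  # 10.0.10.11
print(translate_outbound(PAIRS, IPv4Address("10.0.10.20")))   # 203.0.113.20
```

Since the translation is identical on both members, a model like this suggests the rules themselves should not change behaviour on failover, which is one reason to look at ARP and the alias addresses as well.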
In normal operation, with the primary active, everything works fine. However, when we fail over to the secondary, the web servers and the remote access service become inaccessible from outside (although you can still telnet to the RAS box!). Outbound traffic is unaffected, and failing back resolves the issue. The problem appears and disappears almost instantly.
I suspect either NAT or ARP. However, switching the cluster to virtual MAC mode and rebooting the internet router to clear its ARP table had no effect.
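One way to test the ARP theory during the downtime window is to capture the internet router's ARP cache right after failing over (for example by copying the relevant lines of `show ip arp` by hand) and compare each external alias address against the MAC you expect the newly active member to answer with. The sketch below assumes you have already turned that output into a simple IP-to-MAC dictionary. The helper names, addresses and MACs are hypothetical, and which MAC counts as "expected" (the member's physical MAC or a cluster virtual MAC) is an assumption you would need to confirm for this cluster's mode.

```python
import re
from typing import Optional


def normalize_mac(mac: str) -> str:
    # Accept Cisco dotted (0050.56aa.0001) or colon-separated (00:50:56:aa:00:01) forms.
    return re.sub(r"[^0-9a-f]", "", mac.lower())


def check_arp(arp_table: dict[str, str], external_ips: list[str],
              expected_mac: str) -> dict[str, Optional[str]]:
    """Return external IPs whose router ARP entry is missing or not the expected MAC."""
    problems: dict[str, Optional[str]] = {}
    want = normalize_mac(expected_mac)
    for ip in external_ips:
        seen = arp_table.get(ip)
        if seen is None or normalize_mac(seen) != want:
            problems[ip] = seen  # None means the router has no entry at all
    return problems


# Hypothetical router ARP cache captured just after failing over to the secondary.
router_arp = {
    "203.0.113.11": "0050.56aa.0001",  # still the old (primary) MAC
    "203.0.113.20": "0050.56aa.0002",  # already points at the secondary
}
secondary_mac = "00:50:56:aa:00:02"

print(check_arp(router_arp, ["203.0.113.11", "203.0.113.20", "203.0.113.30"], secondary_mac))
# {'203.0.113.11': '0050.56aa.0001', '203.0.113.30': None}
```

An entry that still shows the old MAC points at stale ARP (for example no gratuitous ARP for the aliases), while a missing entry suggests nothing is answering ARP for that address on the new active member.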
I need to arrange some out-of-hours downtime to troubleshoot this on site, because failing over disconnects my remote session.
Any suggestions would be welcome 🙂
Thanks