I'm hoping someone can clarify my diagnosis here!
I'm working on what should be a simple upgrade: a VMware management server and a pair of 5200 gateways in an active/standby ClusterXL cluster, going from R80.30 to R81.20.
Built a new management server VM and copied the config over using Migrate_Server, no problems, so onto the cluster. CPUSE fresh install on the standby member, activated MVC, pushed the access policy, checked the Gaia config, and all looked good.
However, first problem: this gateway could not contact the Check Point download server for updates. After a bit of troubleshooting I found it could talk to anything internal but nothing external, so for example I could ping an internal server but not 8.8.8.8. More troubleshooting: I checked the routing table using the CLI command ip route show and spotted that there was no default route showing. Checked the static routes in Gaia, and the default route was showing correctly there. Rebooted, no change.
Deleted the default route in Gaia and re-added it, and it then showed in the ip route show output and everything appeared to be OK again: I could ping external addresses and contact the update servers. More testing and rebooting, and all still appeared to be OK. The only explanation I can come up with is that the default route entry was somehow corrupted.
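For anyone wanting to repeat the routing check, here is a minimal sketch of how I would test for a default route in the kernel table. The function name has_default_route is just something made up for illustration, and it assumes the usual ip route show text format, where the default route line starts with "default" (or 0.0.0.0/0).

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the routing table text
# supplied on stdin contains an IPv4 default route line.
# Assumes `ip route show` style output, one route per line.
has_default_route() {
    grep -Eq '^(default|0\.0\.0\.0/0)[[:space:]]'
}

# Intended use on the gateway in expert mode (not run here):
#   if ip route show | has_default_route; then
#       echo "default route present in kernel table"
#   else
#       echo "default route MISSING from kernel table"
#   fi
```

This only looks at the kernel routing table, so it says nothing about what is configured in Gaia. Comparing the two is the whole point, since in my case Gaia showed the route and the kernel did not.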
So now I was ready to do a manual failover to allow the other member to be upgraded.
Ran clusterXL_admin down and instantly all network traffic stopped; ran clusterXL_admin up and everything flowed normally again. Nothing abnormal in the logs.
I'm assured that the failover worked last time, but I must admit I didn't test it before I started.
Ran admin down again, traffic stopped (running ping verifies this instantly), and ran a packet capture for a short time on the upgraded gateway before running admin up again.
Looking at the capture file, it shows all packets outgoing to the internet as malformed due to incorrect packet lengths; the internal ones are OK.
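To make "incorrect packet lengths" concrete, this is a rough sketch of the comparison involved: the frame length reported by the capture against the IPv4 Total Length header field. The function name check_ip_length is hypothetical, and it assumes untagged Ethernet II frames with a 14-byte header (add 4 bytes for an 802.1Q tag). Both numbers can be read from tcpdump -e -v output when replaying a capture file.

```shell
#!/bin/sh
# Hypothetical helper: compare the on-wire frame length with the
# IPv4 Total Length field.
#   $1 = frame length in bytes (Ethernet header included, no FCS)
#   $2 = IPv4 Total Length header field
# Assumption: untagged Ethernet II, so the L2 header is 14 bytes.
check_ip_length() {
    frame_len=$1
    ip_len=$2
    payload=$((frame_len - 14))   # bytes available after the Ethernet header
    if [ "$ip_len" -eq "$payload" ]; then
        echo "ok"
    elif [ "$ip_len" -lt "$payload" ]; then
        echo "padded"             # e.g. short frames padded to the 60-byte minimum
    else
        echo "bad-length"         # IP header claims more data than the frame holds
    fi
}

check_ip_length 98 84     # typical ICMP echo frame
check_ip_length 60 40     # small TCP segment padded to minimum frame size
check_ip_length 98 1500   # IP length larger than the frame
```

The "bad-length" case is the kind of thing a protocol analyzer would flag as malformed, though this sketch says nothing about why the lengths disagree.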
I have now also deleted and re-added all static routes, but that didn't help.
I've never seen this before. My working theory is that this is not related to the upgrade, but to something external on the WAN interface side, so a faulty port or cable maybe?