We have had an issue since the beginning with our ESXi gateways where policy push randomly fails with the error: "Installation failed. Reason: TCP connectivity failure ( port = 18191 )( IP = 10.90.45.30 )[error no. 10]". Sometimes a policy push works fine, and the next minute we get the error. It doesn't matter when we push, or whether we push only the Access policy or only the Threat policy; it's a roll of the dice whether it will work.
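To put a number on "randomly", something like the following could be run from the Management server. It is only a rough sketch: the function name `probe_tcp` and its parameters are made up for illustration, and a bare TCP connect is much lighter than a real policy push, so it may well succeed even when a push would fail.

```python
import socket
import time


def probe_tcp(host, port, attempts=20, timeout=3.0, pause=0.0):
    """Try to open a TCP connection `attempts` times; return the number of failures."""
    failures = 0
    for _ in range(attempts):
        try:
            # create_connection raises OSError (including timeouts) on failure.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            failures += 1
        time.sleep(pause)
    return failures
```

For example, `probe_tcp("10.90.45.30", 18191, attempts=100, pause=1.0)` would report how many of 100 connection attempts to the gateway from the error message failed.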
We have six separate ESXi hosts with one gateway on each, making up three separate clusters. This issue happens on three of the six gateways. In one cluster, if you set the "backup" gateway to primary, it pushes 100% of the time. Another cluster works 100% of the time regardless of which member is primary. Comparing the configuration across all of them, we cannot see any real difference. We suspected TSO or something similar, but had no luck there. Management isn't the source of the issue, because it can push to some of the systems 100% of the time. There's no firewall between Management and the gateways.
Also, you cannot transfer large files from the gateways: anything over about 10MB fails with an error like "Incorrect MAC received on packet" in WinSCP and similar tools. We sometimes get a similar error with PuTTY. Note that MAC here means "message authentication code" in the SSH sense.
All of the systems that work are Dell. The rest are HP, plus one Fujitsu. We've dug into the BIOS looking for relevant NIC settings, etc., but can't find anything.
All of the VMs use VMXNET3. We have Promiscuous mode, MAC address changes and Forged transmits enabled on the Port Groups. Switching to E1000 NICs doesn't help.
Using ethtool -S nic# we see that the Tx Queue "ring full" counter is > 0 and the Rx Queue "pkts rx out of buf" counter is > 0 on all gateways that have issues. On the working gateways those values are 0.
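For comparing gateways quickly, a small parser along these lines could pull just those two counters out of saved ethtool -S output. This is a sketch under assumptions: the helper name `find_nonzero_counters` is hypothetical, the sample text is invented to show the shape of the input rather than copied from a real gateway, and the "Tx Queue#: N" / "Rx Queue#: N" header layout may differ between driver versions.

```python
import re

# Counter names as they appear in the output described above.
WATCHED = ("ring full", "pkts rx out of buf")


def find_nonzero_counters(ethtool_output, watched=WATCHED):
    """Return (queue, counter, value) tuples for watched counters above zero."""
    hits = []
    queue = None
    for raw in ethtool_output.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue
        name, _, value = line.rpartition(":")
        name, value = name.strip(), value.strip()
        # Assumed header format: "Tx Queue#: 0" or "Rx Queue#: 0".
        if re.fullmatch(r"[TR]x Queue#", name):
            queue = f"{name[:2]} queue {value}"
            continue
        if name in watched and value.isdigit() and int(value) > 0:
            hits.append((queue, name, int(value)))
    return hits


# Invented sample input, for illustration only.
SAMPLE = """\
NIC statistics:
     Tx Queue#: 0
       ucast pkts tx: 1200
       ring full: 17
     Rx Queue#: 0
       ucast pkts rx: 3400
       pkts rx out of buf: 42
     Tx Queue#: 1
       ring full: 0
"""
```

Feeding it the captured output from each gateway gives an empty list for a healthy NIC and one entry per affected queue otherwise.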