So, I encountered a weird cluster issue today.
Two clusters of two members each, running on VMware ESXi on Dell servers (I don't remember the model, but they are new).
Interfaces: the uplink is 10 Gb, the internal interfaces are 40 Gb and sync is 1 Gb. They use Intel cards (not the default Broadcom ones, which I've heard have bad drivers for Check Point).
Management is a Smart-1 appliance.
Both management and gateways run R80.10.
The license is for vSEC.
One cluster is set up as ClusterXL HA, but the cluster is down because the interfaces don't match between the members. This should be easy to fix.
The other cluster is also set up as ClusterXL HA and shows all OK in the dashboard, but cphaprob state shows Load Sharing Unicast. When I look at the traffic, it is indeed doing load sharing.
Where do I begin and what the hell happened here? ^^,
Today I was going to troubleshoot a customer's cluster with sync problems and stateful inspection issues, where return traffic is dropped on TCP high ports.
I looked over the cluster and pretty soon found that the two members had non-matching interfaces. Also, the Hardware and Version were wrong in the dashboard cluster object. From what I know, having EthX on one side and EthY on the other and then joining them in one cluster interface will break the cluster, but what effect does selecting the wrong hardware and version have? Here it is set to OpenServer when it's really a virtual gateway (vSEC), and to R80 when it's really R80.10.
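To make the interface mismatch concrete, here is a minimal sketch of how the two members' interface lists could be compared. The function name and the interface names are made up for illustration; they are not taken from the real gateways.

```python
def compare_member_interfaces(member_a, member_b):
    """Return the interface names that exist on only one of the two members."""
    a, b = set(member_a), set(member_b)
    return {"only_on_a": sorted(a - b), "only_on_b": sorted(b - a)}


# Hypothetical interface lists, e.g. collected by hand from each gateway.
gw1_interfaces = ["eth0", "eth1", "eth2", "eth3"]
gw2_interfaces = ["eth0", "eth1", "eth2", "eth4"]

diff = compare_member_interfaces(gw1_interfaces, gw2_interfaces)
print(diff)
```

Anything that shows up in either list is an interface that exists on one member but has no same-named partner on the other.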
Also, it turned out that the TCP high-ports issue described above was on another, identical cluster which looked healthy from the dashboard. It also had OpenServer and R80 set on the cluster object. Judging from that, the way those parameters are set here doesn't cause problems. Or does it? When I looked at the cluster object, it clearly stated ClusterXL HA. Just to check how it was doing, I SSH'd to the gateways and ran cphaprob state. To my surprise it reported Load Sharing Unicast! The other member reported the very same. My client had told me just moments before that not only did they have these drops, but packets from the same session also appeared on both gateways, causing further problems. That shouldn't be possible in an HA setup, I thought. But here was proof that it could indeed be the case, because the cluster was doing load sharing under the hood!
I'm so puzzled about this. I did some searching tonight but couldn't find anybody else who has encountered the same behavior.
Even if you don't have an answer, I'm glad you read this and hope you had a laugh. But if you do have a clue or any relevant questions, please let me know.
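For anyone who wants to check several gateways for the same mismatch, here is a rough sketch that compares the mode configured on the cluster object with what a member reports. It assumes the output of cphaprob state contains a "Cluster Mode:" line; the sample text below is only my approximation of that output, not a capture from these gateways, and the function names are hypothetical.

```python
import re


def reported_cluster_mode(cphaprob_output):
    """Return the text after 'Cluster Mode:', or None if no such line exists."""
    match = re.search(r"^\s*Cluster Mode:\s*(.+?)\s*$", cphaprob_output, re.MULTILINE)
    return match.group(1) if match else None


def mode_family(mode_text):
    """Reduce a mode description to 'LS', 'HA' or 'unknown'."""
    text = mode_text.lower()
    if "load sharing" in text:
        return "LS"
    if "high availability" in text:
        return "HA"
    return "unknown"


def mode_matches(configured, reported):
    """True only when both sides are recognised and belong to the same family."""
    families = (mode_family(configured), mode_family(reported))
    return "unknown" not in families and families[0] == families[1]


# Assumed sample output; the exact layout on a real gateway may differ.
sample = """
Cluster Mode:   Load Sharing (Unicast)

Number     Unique Address  Assigned Load   State
"""

reported = reported_cluster_mode(sample)
print(reported, mode_matches("High Availability", reported))
```

In my case the configured side would be High Availability and the reported side Load Sharing, so the check comes back False on both members.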