Re: GCP Cloudguard GW manual-failover

AkosBakos · ‎2024-11-25

Hi Mates,

I have some questions about HA failover in the cloud.

Is there anybody here who is an expert in GCP?

Akos

----------------
\m/_(>_<)_\m/

Nir_Shamir · ‎2024-11-25

Ask away 🙂

AkosBakos · ‎2024-11-25

Thanks, great!

So I have a basic setup:

Internet -> HA CloudGuard cluster -> client inside.

I initiate traffic, e.g. an SSH session to the internet. If I run clusterXL_admin down on the active member, the SSH session disconnects.

It seems the connection does not sync to the standby member.

Are there any issues around this?

Akos

----------------
\m/_(>_<)_\m/

Nir_Shamir · ‎2024-11-25

Cluster HA in Google Cloud works like a regular cluster, but there are some Google-side things you need to check before and after the failover.

First, on the internal VPC the default route should point to the ACTIVE member. I guess that works, otherwise you wouldn't have any connection.

When you fail over, our GW sends an API call to Google Cloud, which tells it to change the default route to the new ACTIVE member.

So first check whether this happens. Also check whether "Private Google API access" is enabled on the external VPC subnet.
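
A rough way to check both points from a machine with the gcloud CLI installed and authenticated. This is only a sketch: the function names are made up for illustration, and the subnet and region in the usage comment are placeholders, not values from this thread. The first function lists every 0.0.0.0/0 route with its network and next-hop instance; the second prints the Private Google Access flag of a subnet.

```shell
#!/bin/sh
# Sketch only: function names, EXT_SUBNET and REGION are placeholders.

# List all default routes with their network and next-hop instance.
list_default_routes() {
  gcloud compute routes list \
    --filter='destRange="0.0.0.0/0"' \
    --format='table(name,network.basename(),nextHopInstance.basename(),priority)'
}

# Print the Private Google Access flag of a subnet.
# $1 = subnet name, $2 = region
private_google_access() {
  gcloud compute networks subnets describe "$1" --region "$2" \
    --format='value(privateIpGoogleAccess)'
}

# Example (run before and after the failover and compare the next hop):
#   list_default_routes
#   private_google_access EXT_SUBNET REGION
```

If the next-hop instance of the internal default route does not change to the new ACTIVE member after the failover, the API call most likely did not go through.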

AkosBakos · ‎2024-11-25

Yes, it is set.

----------------
\m/_(>_<)_\m/

AkosBakos · ‎2024-11-25

I've attached a basic topology.

----------------
\m/_(>_<)_\m/

AkosBakos · ‎2024-11-25

A small clarification:

"It seems the connection does not sync to the standby member."

In case of a manual failover (clusterXL_admin down), the packet flow changes. The outgoing packets flow through the active member, but the reply packets in this session flow through the standby member.

This causes asymmetric routing.

Akos

----------------
\m/_(>_<)_\m/

Nir_Shamir · ‎2024-11-25

When you fail over, check the health check on the LB screen.

Does it see the correct member as healthy? (That should be the ACTIVE member.)

Also, have you configured everything needed to use an LB?

You need to monitor TCP port 8117 and make sure this kernel parameter is set on the GWs:

fw ctl get int cloud_balancer

It should return 8117.

If not, set it:

fw ctl set -f int cloud_balancer 8117
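
The check-and-set above could be wrapped in a small script to run in expert mode on each member. This is a sketch, not an official procedure: the function names are made up, and it assumes "fw ctl get int" prints its result as "name = value".

```shell
#!/bin/sh
# Sketch only: assumes "fw ctl get int" prints "<name> = <value>".

EXPECTED_PORT=8117

# Print the current value of the cloud_balancer kernel parameter.
current_cloud_balancer() {
  fw ctl get int cloud_balancer | awk '{print $NF}'
}

# Set the parameter only when it differs from the expected port.
ensure_cloud_balancer() {
  current=$(current_cloud_balancer)
  if [ "$current" = "$EXPECTED_PORT" ]; then
    echo "cloud_balancer already $EXPECTED_PORT"
  else
    echo "cloud_balancer is '$current', setting $EXPECTED_PORT"
    fw ctl set -f int cloud_balancer "$EXPECTED_PORT"
  fi
}

# Example: call ensure_cloud_balancer on both cluster members.
```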

Rivka-Strilitz · ‎2024-11-25

It sounds like you might be using the nic0 external IP for SSH. Could that be the case?
This is the IP that gets switched between members during failover which is why the SSH connection gets lost.
Try connecting via SSH using NIC1 instead.

If that's not the case, then I think Nir's suggestion is a great place to start the investigation.

AkosBakos · ‎2024-11-25

"It sounds like you might be using the nic0 external IP for SSH. Could that be the case?"

No, because I NAT to the external IP of the load balancer in the VPC.

Akos

----------------
\m/_(>_<)_\m/
