Connection with gateway cluster is lost from the primary Management Server

_ladislao_ · ‎2023-05-16

Hi everyone!

The current infrastructure (R80.40) is set up as follows: two management servers in an HA pair, a cluster of two gateways (6000 appliances; active and standby), and a standalone gateway. Previously, the HQ SMS failed over to the DR SMS. We had to build a fresh instance of the HQ SMS, re-enable the HA pair (syncing from the DR to the new HQ), and then swap the roles back (HQ as primary, DR as secondary).

As of now, the HQ SMS tries to communicate with the cluster through the direct IP of each cluster member and runs into the following issue:

The DR SMS (secondary) does not have any connectivity issues and talks to the cluster member through a management IP (e.g., 192.168.2.1) configured under the Network Management tab:

SIC is working properly (Trust Established for the HQ SMS), and logs keep arriving from the cluster at the HQ SMS on port 257. However, whenever the SIC status is tested, a policy is pushed, or interfaces are fetched, SmartConsole goes idle and throws errors saying that there is no connection between the SMS and the cluster (e.g., on port 18191).

Given that the Network Management configuration is identical between HQ and DR, is there a way to specify, on the HQ SMS side, the management IP (the same one the DR side uses) as the address for communicating with the cluster?

PhoneBoy · ‎2023-05-16

The correct IP to use should be pushed as part of a successful policy install.
If for some reason this isn't working correctly, you may need to unload the policy on the gateway (using fw unloadlocal), then push policy.
As this will unload the security policy and likely drop traffic, this will need to be done during a maintenance window.

Otherwise, I recommend engaging the TAC: https://help.checkpoint.com

_ladislao_ · ‎2023-05-17

Thank you for the feedback! We previously pushed a policy from the newly created HQ SMS to the cluster, but that didn't help. Maybe, as you mentioned, unloading the policy from the cluster first is what's needed.

JFYI, the cluster members have their IPs set to 10.0.0.1 and 10.0.0.2. Both have an identical set of interfaces configured under the Network Management tab. One of those interfaces is the management interface, with the IPs set to 192.168.1.1 (node-1) and 192.168.1.2 (node-2). The HQ SMS is now trying to communicate directly with 10.0.0.1 and 10.0.0.2, while the DR SMS uses 192.168.1.1 and 192.168.1.2. So, in order to push any policy, we have to open a direct route to 10.0.0.1 and 10.0.0.2, which was never needed before because both SMS were communicating through the management interface. Now only the DR follows the desired approach, while the HQ tries to connect through the direct IP addresses of the cluster members.

PhoneBoy · ‎2023-05-17

Curious why having the routes there is an issue, especially if it resolves the issue?

_ladislao_ · ‎2023-05-18

I assume it's because the owner of the infrastructure was using that dedicated management IP on the cluster before the HQ failover. Now they do not want to change the network and open a direct route to the cluster members for the HQ SMS only, since the DR SMS still communicates through the cluster management IP.

PhoneBoy · ‎2023-05-18

It could also be an intermediary firewall or similar that's blocking access.
Have you done a tcpdump to see if the traffic is sent/received correctly on port 18191 on the source and target?
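Alongside a capture, a quick TCP connect test run from the HQ SMS (or a host on the same network path) can show whether port 18191 answers on each candidate address. The following is only a minimal sketch, assuming Python 3 is available there; the helper names probe and probe_all are made up for illustration, and the addresses are the ones quoted in this thread. Keep in mind that the gateway policy may treat connections from a host other than the SMS differently, so the result only reflects the path from wherever it is run.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connect and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host (or a device in between) actively rejected the connection.
        return "refused"
    except socket.timeout:
        # No answer at all: often a missing route or a silent drop on the path.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "No route to host" or "Network is unreachable".
        return f"error: {exc}"


def probe_all(hosts, port=18191, timeout=3.0):
    """Probe the same TCP port on several hosts; returns {host: outcome}."""
    return {host: probe(host, port, timeout) for host in hosts}
```

For example, probe_all(["10.0.0.1", "10.0.0.2", "192.168.1.1", "192.168.1.2"]) compares the direct member IPs with the management-interface IPs on port 18191. A "timeout" on one set and "open" on the other would point to routing or an intermediary device rather than to SIC itself.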

the_rock · ‎2023-05-18

I agree with @PhoneBoy. Please run tcpdump and fw monitor and follow the packet flow. If you need help with filters, let us know and I can send you some good examples.

You can also refer to this website that my colleague made ages ago to help with captures on various vendor firewalls -> https://www.tcpdump101.com

Andy

the_rock · ‎2023-05-16

In my experience, 9 times out of 10 this turns out to be a routing problem. Maybe do a basic route check on both cluster members, say ip r g 8.8.8.8 or ip r g mgmt_Ip_address, and make sure it's correct.
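If the same check has to be repeated on several machines, the output of ip route get can be parsed to compare the next hop, egress interface, and source address side by side. This is just a rough sketch, assuming Python 3.7+ and the iproute2 ip command are present on the box; the function names parse_route_get and route_to are hypothetical, and the sample output in the comment is illustrative, not taken from this environment.

```python
import subprocess


def parse_route_get(output):
    """Pull next hop, egress interface and source IP out of `ip route get` output.

    Illustrative input: "192.168.1.1 via 10.1.1.254 dev eth0 src 10.1.1.5 uid 0"
    A directly connected destination has no "via", so that field stays None.
    """
    tokens = output.split()
    fields = {"via": None, "dev": None, "src": None}
    # Walk adjacent token pairs; a keyword is followed by its value.
    for key, value in zip(tokens, tokens[1:]):
        if key in fields and fields[key] is None:
            fields[key] = value
    return fields


def route_to(destination):
    """Run `ip route get <destination>` and return the parsed fields."""
    result = subprocess.run(
        ["ip", "route", "get", destination],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the kernel has no route
    )
    return parse_route_get(result.stdout)
```

Running route_to() with the SMS address on each cluster member, and with each member address on the SMS, shows which interface and source IP the reply path would actually use.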

Andy

_ladislao_ · ‎2023-05-17

Thank you for the advice, but we checked the low-level routing configuration on both the DR and HQ SMS (they are identical and were previously synced from the DR to the HQ) and didn't find any problems. The ip route output contains all the necessary entries, and since logs are being sent from the cluster to the HQ SMS, I think there is at least some connectivity established between the appliances.

minhth · ‎2024-03-07

Have you fixed this yet? I've got the same error and am still looking for a solution. Thanks
