samir-brkic
Participant

Advice Needed on ISP Redundancy and VPN Stability in Check Point HA Setup

Hello,

We are currently managing a Check Point cluster configured for High Availability (HA) with two members and are encountering an issue related to VPN stability. I would appreciate your advice on best practices to ensure optimal operation for our setup.

Current Configuration:

  • Data Centers: The Check Point nodes are deployed across two different data centers.
  • Sync and Internal Ports: These ports are connected through two separate switches, which are interlinked to ensure connectivity between the data centers.
  • External Ports: Each Check Point node has an external port connected directly to redundant ISP routers provided by a single ISP. The ISP manages failover on their end, and the ISP routers in each data center are interconnected to maintain redundancy.

Issue Description:

We are experiencing issues with Site-to-Site VPN connections dropping after a standby node reboot. Specifically, the Site-to-Site connections become non-functional, and we need to manually reset them using the command vpn tu with option "0" to re-establish the connections. This command serves as a workaround, but we are looking for a more permanent solution.

During our analysis, we considered that the issue might be related to the physical connection to the ISP routers. However, we could not find best practices for ISP redundancy in setups where multiple ISP routers are used within a single ISP's network. The official documentation primarily covers redundancy with two separate ISPs.

Any insights or recommendations you could provide regarding this issue would be greatly appreciated.

Thank you for your assistance!

Best regards,

Samir

6 Replies
Duane_Toler
Advisor

Are you using the new "Active Active" mode, with one node in each geographically separated location and different subnets on the interfaces? If so, the VPN blade isn't supported in that configuration.

If not, what mode of ClusterXL are you using?

 

samir-brkic
Participant

@Duane_Toler 

Thank you for your feedback. As mentioned in the description, we are currently managing a Check Point cluster configured for High Availability (HA) with two members, so we are using Active-Standby mode, not Active-Active.

Thanks again for your assistance!

Duane_Toler
Advisor

Can you make a quick diagram of your topology?  Depending on how you have interfaces configured and connected, this may (or may not) be part of your issue.  My suspicion is that you are losing path reachability, or peer adjacency, during a failover and you need to use something like BGP between your gateways and ISP routers.  A quick diagram will help answer that with more certainty.

 

samir-brkic
Participant

This is a quick diagram of the cluster.

Timothy_Hall
Legend

Can you please be more specific about exactly when the VPN tunnel drops relative to the timing of the reboot of the standby?  In other words, does the tunnel stop working on the active as soon as the standby is dropped and loses link to the switch (less likely), or does the tunnel die at about the time the standby fetches the latest policy and attempts to rejoin the cluster (more likely)? 

If the former, that would suggest some kind of network issue, perhaps with STP on the switch or possibly routing table re-convergence.  If the latter, I know at one point that a gateway joining a cluster after a reboot would fetch the policy directly from the active member.  There was also some extra logic added recently to a cluster policy installation so I'm wondering if the "policy sync" operation when the rebooted member attempts to join may be disrupting your VPN.  Questions:

0) IKEv1 or IKEv2?  Is this an interoperable VPN between Check Point and some other third-party device, or a homogeneous VPN?

1) I assume that reinstalling the policy (a full install, not an accelerated one; see sk169096) to the cluster when both members are working does not cause a VPN disruption?

2) Is the checkbox keep_IKE_SAs set under Global Properties...Advanced...Configure...VPN Advanced Properties...VPN IKE Properties?  If not, all IKE SAs will be cleared upon policy installation, and the early termination of these IKE SAs has been known to hang tunnels, especially in an interoperable scenario.  However, the IPsec SAs should be maintained, so existing tunnel connectivity should continue to work for up to 60 minutes.

3) Any logs about the VPN failing?  Invalid SA?  No response from peer?  Invalid ID?
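One way to answer the timing question above is to note three timestamps (when the tunnel dropped, when the standby went down, and when it tried to rejoin) and see which event the drop sits closest to. The following is only a rough sketch of that comparison: `classify_drop`, its labels, and the 30-second window are made up for illustration and are not part of any Check Point tooling.

```python
from datetime import datetime, timedelta


def classify_drop(drop, standby_down, standby_rejoin, window_s=30):
    """Label a tunnel drop by the standby event it is nearest to.

    All three arguments are datetime objects taken from whatever source
    is available (console notes, logs). window_s is an arbitrary
    tolerance chosen for this sketch.
    """
    window = timedelta(seconds=window_s)
    # Distance in time from the drop to each candidate event.
    gaps = {
        "standby_down": abs(drop - standby_down),
        "standby_rejoin": abs(drop - standby_rejoin),
    }
    label, gap = min(gaps.items(), key=lambda item: item[1])
    # Only attribute the drop to an event if it falls inside the window.
    return label if gap <= window else "unrelated"
```

A drop labeled `standby_down` would point toward the network-side explanation (STP, routing re-convergence), while `standby_rejoin` would point toward the policy fetch when the member rejoins.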

samir-brkic
Participant

Thank you for your detailed response and questions. Here are the specifics based on your queries:

  • The Site-to-Site VPN stops working immediately, even if I run cpstop on the standby member, so it does not appear to be directly related to the cluster joining after a reboot.

To answer your questions:

  1. We are using IKEv1. This is a Site-to-Site VPN between two Check Point clusters, both managed by the same management server.

  2. No, installing the policy (even a full install) does not affect the VPN; the connections remain stable during policy installations.

  3. Yes, the checkbox "keep_IKE_SAs" is checked under Global Properties > Advanced > Configure > VPN Advanced Properties > VPN IKE Properties.

  4. Collecting logs has been challenging because these are both production environments. When the issue occurs, I usually need to apply the workaround immediately (vpn tu) to restore the connection, so I haven’t been able to capture logs during the failure. I will try to arrange a maintenance window, run VPN debug, and replicate the issue to gather more detailed logs.
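Since the workaround has to be applied right away, a small probe left running on a host behind the local cluster could at least record exactly when traffic through the tunnel stops and resumes. This is a minimal sketch, assuming some TCP service is reachable through the tunnel; the host and port passed to `watch` are placeholders, and none of this replaces a proper VPN debug on the gateways.

```python
import socket
import time
from datetime import datetime


def probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def transitions(samples):
    """Reduce (timestamp, reachable) samples to the points where state changes.

    The first sample is always kept, since it establishes the initial state.
    """
    changes = []
    previous = None
    for stamp, reachable in samples:
        if reachable != previous:
            changes.append((stamp, reachable))
            previous = reachable
    return changes


def watch(host, port, interval=1.0, rounds=60):
    """Probe repeatedly and print a timestamped line on each state change."""
    previous = None
    for _ in range(rounds):
        reachable = probe(host, port)
        if reachable != previous:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} reachable={reachable}")
            previous = reachable
        time.sleep(interval)
```

Calling something like `watch("10.0.0.10", 443)` (a hypothetical address on the remote side) before running cpstop on the standby would give a timestamp for the drop that can be lined up against the gateway logs later.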

I appreciate your assistance and will look into arranging the debug session to provide more insights.

Thanks again for your help!

