dgrenfell
Contributor

Has anyone been able to have redundant VPN tunnels with AWS using vti's?

I have 2 site-to-site VPN tunnels going out to AWS, but I can't seem to force a failover to verify redundancy is working. We have a cluster of 2 19100 appliances, so I know redundancy would work if we lost a gateway, but for some reason the steps I have taken to force a failover for the tunnels don't seem to work. I have performed the following:

- Logged into GAIA and disabled the vti interface (vpnt2 in this case) and pushed policy

- When logged into the active gateway and looking at the tunnel list, the tunnel associated with the vti interface I had disabled still shows as connected

- After deleting the SAs for the gateway on the AWS end of this tunnel, it still showed connected, no matter how many times I repeated those actions

The vendor on the AWS end said the tunnel never went down, and they were seeing traffic flowing in and out of their server, so that attempt was a bust. I then got CP on a conference call with us, and the ONLY way we could get it to "fail over" was to remove the gateway associated with the vti from the community. Even then, the same symptoms were still present (i.e., the tunnel still showed connected), and it was only when the tunnel negotiation timer ran out that it FINALLY showed disconnected (after pushing policy the AWS side finally went down, but it took approximately 60 seconds). When we ran fw monitor, we saw that traffic on our end was still being sent out the tunnel that was apparently down, so it just broke things, and we had to revert.

TL;DR: Am I missing something here?

Here is my configuration:

- Cluster of 2 19100 Check Point appliances running R81.20 with JHF Take 76

- 2 vti interfaces pointing to their respective AWS gateways, using addressing provided by AWS

- A star community consisting of our cluster as the satellite gateway and the 2 AWS gateways as the center

- Both AWS gateways set with empty (encryption domain) groups to facilitate the route-based configuration (instructions provided by AWS and CP TAC)

- Static routes set on both vti's, using priorities of 1 and 2 (1 being the primary tunnel and 2 the secondary) so the gateways know which vti to "prefer" when sending traffic out

- Directional rules set up in Smart Console to allow the traffic that is to be accepted

The site-to-site VPN IS working; I just can't seem to force a failover from one tunnel to the other.
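For context, static routes with priorities like the ones described above would look roughly like this in Gaia clish (the destination network here is hypothetical; verify the exact syntax against the Gaia R81.20 admin guide):

```
set static-route 10.50.0.0/16 nexthop gateway logical vpnt1 priority 1 on
set static-route 10.50.0.0/16 nexthop gateway logical vpnt2 priority 2 on
save config
```

With two nexthops at different priorities, Gaia installs the priority-1 nexthop and should fall back to priority 2 only when the first is considered unusable.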

 

Any thoughts? Am I missing anything? Let me know if I need to show or explain anything further. Thanks all!

 


Accepted Solutions
dgrenfell
Contributor

Welp, I guess it took upgrading to JHF Take 118 to get this failover thing to work. So yay! Ha. Thanks to all who chimed in to help me with this, it is greatly appreciated.


56 Replies
Chris_Atkinson
MVP Gold CHKP

Can you describe further how the static routes are configured? Was there a particular guide you followed?

CCSM R77/R80/ELITE
dgrenfell
Contributor

Sure! I have the static routes set up in GAIA for both gateways like this:

[image: static route configuration in Gaia]

 

As far as the guide is concerned, AWS sent me one as a text file, but I also looked at several threads on here when something in their guide didn't quite make sense. In the end I got the tunnels up; it's just the redundancy aspect that's not working.

 

Chris_Atkinson
MVP Gold CHKP

You don't appear to be using the ping/monitor option...

What does the active routing table look like when the VTI is disabled?

CCSM R77/R80/ELITE
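The ping/monitor option referenced above is enabled per static route in Gaia clish. A sketch, with a hypothetical destination network (behavior with unnumbered VTIs may differ, so verify against the Gaia R81.20 admin guide):

```
set static-route 10.50.0.0/16 ping on
```

When enabled, Gaia pings the route's nexthop and withdraws the route if the nexthop stops responding, which is what allows the priority-2 route to take over.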
dgrenfell
Contributor

Hey Chris, I see what you mean. Will the ping/monitor option do anything in this situation? Also, the route table looks as I'd expect and shows the available vti as the route to AWS.

the_rock
MVP Platinum

Do you have a simple diagram?

Best,
Andy
dgrenfell
Contributor

I do actually. I scrubbed all private information, but here is a basic diagram of the flow. We have 2 ISP routers that connect out to the outside world, with the firewall cluster having a VIP between the two routers. The firewall cluster shares 2 vti's with AWS; the traffic gets encrypted within our network, sent through the vti, and then out one of the ISP routers (whichever has the better path through BGP at the time). We don't have any dynamic routing on the firewall cluster, as that's all handled at the ISP routers, hence the static routes.

[image2: network diagram]

 

the_rock
MVP Platinum

Thanks mate! Let me see if I can lab this up when I'm back from vacation.

Best,
Andy
dgrenfell
Contributor

Sounds good sir! I have another call with CP today, so I'll see what we find out. I'll update this thread if I have any new information afterward. 

the_rock
MVP Platinum

Appreciated!

Best,
Andy
the_rock
MVP Platinum

I had done this with Azure, but I suspect it would be similar on AWS.

Best,
Andy
dgrenfell
Contributor

I have a tunnel with Azure for another vendor. It's not redundant, but man, it was SO much easier to set up and has never had any issues, unlike this one with AWS. I might need you to share some pointers on how you got that to work with Azure so I can see if the same can be applied here with AWS.

the_rock
MVP Platinum

There are some differences, yes!

Best,
Andy
Duane_Toler
MVP Silver

You likely will need Dead Peer Detection (DPD) enabled, which is configured via Permanent Tunnels and a GuiDBedit change. It's in the R81.20 Site to Site VPN admin guide, pages 138-140.

You have a much older Jumbo HFA than is current; Jumbo HFA Take 89 includes a fix for DPD, and each Jumbo since Take 76 has numerous VPN-related fixes that may be of interest to you.

https://sc1.checkpoint.com/documents/Jumbo_HFA/R81.20/R81.20/R81.20-List-of-all-Resolved-Issues.htm?...
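For reference, the GuiDBedit change for DPD is typically the following (per Check Point's DPD documentation, sk97746; field location and values may vary by version, so verify before editing):

```
Table:  Network Objects > network_objects
Object: <the relevant gateway object>
Field:  tunnel_keepalive_method
Value:  dpd
```

Save the change in GuiDBedit and install policy afterward for it to take effect.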

 

--
Ansible for Check Point APIs series: https://www.youtube.com/@EdgeCaseScenario and Substack
the_rock
MVP Platinum

I believe DPD is automatically enabled when you check the permanent tunnel option in the VPN community object.

Best,
Andy
dgrenfell
Contributor

Yep, you are correct sir. 

dgrenfell
Contributor

Hey Duane! I do have DPD enabled and verified it in the GuiDBedit tool, but yes, I realize we're on an older HFA. We rolled back because we ran into an issue with R82.10 that basically corrupted our database to the point we couldn't push policy to our cluster; even CP had issues helping us, so we just reverted to an old snapshot. I'll have to defer to my colleagues on that, as it's a bit of a touchy subject right now.

Duane_Toler
MVP Silver

Yikes, I've been there. I'm surprised TAC couldn't give you an answer. There's a curious fix in Jumbo HFA Take 84 that smells a little like what you're describing. Subsequent HFAs have dozens of fixes for management and some database issues, too.

https://sc1.checkpoint.com/documents/Jumbo_HFA/R81.20/R81.20/Take_84.htm?tocpath=Previously%20Releas...

You can generally run the gateways and management server on different HFAs (depending on what features you're trying to enable; some require parity).  If this is strictly a gateway issue (and sounds like it is), you can apply a more recent HFA there to see if it helps resolve the issue.  Hopefully you can get an opportunity to try a more recent HFA on the management server soon!

 

--
Ansible for Check Point APIs series: https://www.youtube.com/@EdgeCaseScenario and Substack
the_rock
MVP Platinum

FWIW, I always install the latest Jumbos; it seems pretty safe to do so lately, at least in my opinion.

Best,
Andy
dgrenfell
Contributor

Well, nothing new to add here. We removed one of the tunnels from the community (in this case tunnel 2, as that's the tunnel that's been handling traffic since the beginning) and pushed policy. Application access broke, yet the tunnel still showed connected. AWS reported their pings were lost once the policy pushed, and a few seconds later they saw the tunnel go down, but not on our end. The CP tech kept trying to delete the SAs for the gateway of the tunnel we removed from the community, but that also didn't work. In the past I thought just disabling the vti interface would take the tunnel down, but it didn't, which makes sense, as it's the cluster's external interface that builds the IPsec tunnel, so that was a learning experience. Since we're on R81.20 JHF Take 76, the only conclusion CP came up with is to upgrade to JHF Take 118, since that's the highest Take recommended for R81.20. We're going to give that a whirl and see what happens, but I'm sadly not optimistic. I just don't see why the cluster can't let go of the tunnel; it's like the re-negotiation clock is the be-all and end-all here.
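For anyone following along, the SA inspection and deletion described above is done with the vpn tu tool on the gateway. A sketch of the relevant commands (available options vary by version; check the CLI reference for yours):

```
vpn tu tlist    # list active tunnels and their peers (R80.10 and later)
vpn tu          # interactive menu, with options to delete IKE/IPsec SAs for a given peer
```

Note that deleting SAs only tears down the current negotiation; if traffic still matches the tunnel, the gateways will simply renegotiate.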

the_rock
MVP Platinum

I learned something a while ago when it comes to disabling a VPN tunnel: just remove the CP fw/cluster from the desired community, push policy, and that's it.

Best,
Andy
dgrenfell
Contributor

Hmm, that's not a terrible idea, and definitely something I hadn't thought of. I'll give it a try, just to make sure redundancy works in that regard.

the_rock
MVP Platinum

As our boss would say, "I'm not just a pretty face, I got ideas too", to which we reply "Who's the pretty face?" 🙂

Anywho, definitely worth a try!

Best,
Andy
dgrenfell
Contributor

Unfortunately, that didn't work for me. It took down access to the applications, but in the CLI it still showed that tunnel as connected. However, the AWS side showed it was disconnected. So back to the drawing board.

the_rock
MVP Platinum

And you pushed policy afterwards?

Best,
Andy
dgrenfell
Contributor

Of course. 😉

the_rock
MVP Platinum

K just making sure 🙂

Best,
Andy
dgrenfell
Contributor

Yup, we're running the gateways on R81.20 and the SMS on R82.10, but you're right, it was just a gateway issue. Oddly enough, it worked for 6 months and then BAM! Nope! Had the same thing happen on a different cluster, but strangely we were able to resolve that one with a reboot of the cluster (the same fix didn't help with our enterprise cluster, unfortunately). Since I work in local government, things run a bit slower here, so we'll see where we go with the HFA situation. Thanks for the links and input, I appreciate it sir!

the_rock
MVP Platinum

Since you mentioned government, as a kid growing up in eastern Europe, I always thought government there was useless, but I learned having been living in Canada for 25 years and been to literally 99% of the planet that governments are USELESS everywhere, no offence 😂😂😂😂

Best,
Andy
dgrenfell
Contributor

LOL None taken. Thankfully it's just at the city level, but I'm just a network engineer, so I don't get involved in the uselessness of the political side of things. Love Canada btw!
