isuckatthis
Contributor

VPN community routing not working

Hello smarter than I people!

I have a Check Point to Check Point VPN, and we're using a community. The traffic gets to the Check Point FW (the VPN concentrator) but is not routed across the VPN.

How do I troubleshoot this?

The encryption domain on the remote side shows the correct subnets.

 

I've attached images showing the community, logs showing the traffic is not encrypted, and a traceroute showing the FW sending the traffic back to the router, then the router back to the FW, then the FW back to the router until the end of time (or TTL expires).

 

I have no idea how to troubleshoot this. Help! 😞 

 
 

Attachments: routing.png, log.png, trace.png

Zee
Contributor

I have actually tried every encryption type and method, and I agree with you, but I could not find any errors in the logs or in the debugs. TAC could not find anything in the files we shared with them either. It's strange that we keep seeing Phase 1 issues in tunnel monitoring, and only between these two CP FWs.

the_rock
MVP Gold

Did you try IKEv1 with AES-128/MD5?

Best,
Andy
Zee
Contributor

I did not try MD5, but I tried other combinations of IKEv1/IKEv2 with AES-128 and AES-256.

the_rock
MVP Gold

Needless to say, MD5 should never be used, but I would try it for this purpose, just as a test.

Best,
Andy
Zee
Contributor

I will try

the_rock
MVP Gold

Also, as far as logs go, search for any IP related to that tunnel. Don't even use src or dst, just the IP.

Best,
Andy
Duane_Toler
MVP Silver

If you see nothing with that log filter (blade:VPN and action:reject) then you need to look for traffic to or from the remote peer IP.

src:192.0.2.1 or dst:192.0.2.1

 Replace the IP with the IP of your remote peer.  You ought to have some packets and logs back and forth.  If you don't, then you have a larger problem with network routing and path selection.

In the tcpdump output on either gateway, do you have IKE packets (UDP port 500 or 4500) from the remote peer?  You said you had a tcpdump capture you gave to TAC.  Did it have IKE packets between the peers?  Verify you have the correct network path to the remote peer going the way you expect:

ip route get 192.0.2.1

Again, replace the IP with that of the remote peer.  Is the outgoing interface what you expect it to be?  The "dev ethX" should be your external-facing interface on both gateways.
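If you want to script that check across both gateways, here is a minimal Python sketch. It assumes the usual `ip route get` output shape, where the egress interface follows the `dev` keyword. The sample line, the `egress_interface` helper, and the choice of `eth1` as the external interface are illustrative assumptions, not output taken from this thread.

```python
import re
from typing import Optional


def egress_interface(route_get_output: str) -> Optional[str]:
    """Return the interface named after 'dev' in `ip route get` output, if any."""
    match = re.search(r"\bdev\s+(\S+)", route_get_output)
    return match.group(1) if match else None


# Hypothetical sample line; real output varies by system and iproute2 version.
sample = "192.0.2.1 via 203.0.113.1 dev eth1 src 203.0.113.10"

EXTERNAL_IF = "eth1"  # assumption: eth1 is the external-facing interface

if egress_interface(sample) == EXTERNAL_IF:
    print("peer is reached via the external interface")
else:
    print("peer is NOT reached via the external interface")
```

In practice you would feed the function the real command output captured on each gateway instead of the sample string.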

 

--
Ansible for Check Point APIs series: https://www.youtube.com/@EdgeCaseScenario and Substack
the_rock
MVP Gold

@Zee  Do any logs show up if you search only by community name?

Best,
Andy
Zee
Contributor

From Site A to Site B, I see FW1_CPRID (TCP/18208) with an encryption failure ("Clear text packet should be encrypted"), and ESP (50) with "Packet is dropped because an IPsec SA associated with the SPI on the received IPsec packet could not be found".

From Site B to Site A, I only see ESP (50) in drops, with the reason "Decryption Failed", "(IKE) Authentication Failure", or "Packet is dropped because an IPsec SA associated with the SPI on the received IPsec packet could not be found".

The timestamps do not exactly match the times when I see the Phase 1 issue in Grafana or tunnel monitoring. I see these errors even when the tunnel is up.
IKE (UDP/500) packets are accepted.

the_rock
MVP Gold

I seriously have a gut feeling that some of the information observed might not be 100% accurate, i.e., it shows Phase 1 as up when it could actually be down. I suggest tearing down that community and building a brand new one from scratch, ONLY for those 2 gateways. It does not matter if it's star or mesh.

Best,
Andy
Zee
Contributor

I already removed these gateways from the mesh community and configured a star community, but the issue is still there. Site A is our HQ, so it's in a mesh with the other gateways and in a star with Site B.

 

Duane_Toler
MVP Silver

Sanity check:  Do your gateways have their time clocks set to a known good NTP time source?  Good time sources include time.windows.com and ntp.checkpoint.com

the_rock
MVP Gold

Excellent point @Duane_Toler 

Best,
Andy
Zee
Contributor

Hi,
We are using our own NTP servers. Only the time zone was different when I checked, but we use the same NTP servers across our network.

Duane_Toler
MVP Silver

The CPRID log, specifically, is because that traffic is part of the implied rules and should not be part of a VPN.  Do you have your gateways' "main IP" (the one in the gateway properties, General) set to the gateway's public IP or to an internal interface IP?  It should be set to the external public IP (for this reason).

I'd say this situation needs a VPN and IKE debug to see what is amiss.  I'm sure TAC did this for you, tho.

 

 

the_rock
MVP Gold

I would say debugs are always a must in this case.

Best,
Andy
the_rock
MVP Gold

Fair enough! Happy to do a remote session next week when I'm back from vacation...you can message me directly.

Best,
Andy
Zee
Contributor

I am not allowed to do a remote session, but I would definitely appreciate your help discussing the issue when you are back, if possible.

the_rock
MVP Gold

Feel free to message me directly and describe the design; something might "trigger" :-).

Best,
Andy
Duane_Toler
MVP Silver

You can't do this.

 

You have overlapping subnets on both sides.  Your log screenshot shows traffic is "Accepted", when it should show "Encrypted".  You need to fix your VPN domains.  You also need to fix your firewall routing tables; both gateways have routing issues.  Your R1 router must also have a 10.0.0.0/8 route with next-hop of the firewall.  That's why you are seeing ping-pong traffic in your traceroute.

 

On one gateway, you have 10.59.0.0/16 being sent out via eth4 interface, which includes the 10.59.78.0/24 subnet of the other gateway.  Your external interface is eth1, however.  The firewall won't encrypt packets that are destined towards an internal interface ("internal" is defined as "not-external").

As @Lesley pointed out, you also have one side set for route-based VPN with a VTI, and the other is not.  The other gateway has a 10.0.0.0/8 route going via the VTI, but it has local interfaces within that supernet.  This also won't be very reliable, and it will cause "strange effects" for your network if you do this.
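To make the overlap concrete, here is a small sketch using Python's standard `ipaddress` module with the prefixes mentioned above. The variable names are illustrative only. It checks containment between the prefixes and does not model anything about how the gateway itself handles them.

```python
from ipaddress import ip_network

hub_route = ip_network("10.59.0.0/16")     # routed out eth4 on one gateway
spoke_subnet = ip_network("10.59.78.0/24") # subnet behind the other gateway
vti_route = ip_network("10.0.0.0/8")       # supernet routed via the VTI

# The spoke subnet sits entirely inside the /16 that points at eth4.
print(spoke_subnet.subnet_of(hub_route))

# Both of those prefixes sit inside the 10.0.0.0/8 supernet on the VTI.
print(hub_route.subnet_of(vti_route))
print(spoke_subnet.overlaps(vti_route))
```

All three checks print `True`, which is the overlap being described: every less-specific route here also covers the remote gateway's subnet.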

Your route table screenshots are blocking the names of the gateways, so we can't see which gateway is which.

 

the_rock
MVP Gold

Hm...I know what you mean, though I'm fairly sure this would work, as long as supernet is set to false, which it seems to be now.

Best,
Andy
isuckatthis
Contributor

> You have overlapping subnets on both sides.

From a traditional routing perspective, I don't understand the problem with this. They're different subnets. They overlap, but that's why routers use the most specific mask when matching traffic for a routing decision. The network at the remote site is an extension of the main utility network.

> Your log screenshot shows traffic is "Accepted", when it should show "Encrypted".  You need to fix your VPN domains.

I don't understand what's wrong with them. I understand the VPN domains are the policy used to direct traffic over the VPN.
Example: domain1 = 10.0.1.0/24   domain2 = 10.0.2.0/24
src: 10.0.1.10 dst: 10.0.2.20 = traffic goes across the VPN
src: 10.0.1.10 dst: 10.0.3.30 = traffic does not go across the VPN
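The mental model in that example can be written as a short Python sketch with the standard `ipaddress` module. The names `DOMAIN_1`, `DOMAIN_2`, and `matches_vpn_domains` are made up for illustration. It captures only the source/destination domain match described here, not the route lookup or any other step the gateway performs.

```python
from ipaddress import ip_address, ip_network

DOMAIN_1 = [ip_network("10.0.1.0/24")]  # local encryption domain (example)
DOMAIN_2 = [ip_network("10.0.2.0/24")]  # remote encryption domain (example)


def in_domain(ip: str, domain) -> bool:
    """True if the address falls inside any network of the domain."""
    addr = ip_address(ip)
    return any(addr in net for net in domain)


def matches_vpn_domains(src: str, dst: str) -> bool:
    """Domain match only: source in domain1 and destination in domain2."""
    return in_domain(src, DOMAIN_1) and in_domain(dst, DOMAIN_2)


print(matches_vpn_domains("10.0.1.10", "10.0.2.20"))  # True
print(matches_vpn_domains("10.0.1.10", "10.0.3.30"))  # False
```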



> You also need to fix your firewall routing tables; both gateways have routing issues.  Your R1 router must also have a 10.0.0.0/8 route with next-hop of the firewall.  That's why you are seeing ping-pong traffic in your traceroute.

Close, it's a 0/0, and I agree with what you're saying. I'm expecting the FW to receive the traffic from the router and send it over the VPN because the src and dst of the packets match the VPN domains. Instead, the FW is saying, "screw you VPN community, you mean nothing to me, I don't even know you man, you're dead to me, I'm just going to use this /16 and route traffic back to where it came from"

I don't think I'm understanding what people are telling me. I probably need to better understand order of operations on CP. As it stands right now, I 100% believe that the traffic arrives in a firewall, some part of the process is to match the traffic against the VPN community and if the source and destination match, it will choose that tunnel and forward the traffic. Obviously I'm wrong and it's possible everyone is telling me this but it's not clicking.

How do you tell a FW to forward traffic over a VPN built with a community?

 

> On one gateway, you have 10.59.0.0/16 being sent out via eth4 interface, which includes the 10.59.78.0/24 subnet of the other gateway.  Your external interface is eth1, however.  The firewall won't encrypt packets that are destined towards an internal interface ("internal" is defined as "not-external").

From a routing perspective, that seems fine to me; I don't understand the problem. Often, I have summary routes and then more specific routes. Maybe I want to traffic engineer, so I'll use more specific routes to take certain links. I've never encountered an issue.

 

> As @Lesley pointed out, you also have one side set for route-based VPN with a VTI, and the other is not.  The other gateway has a 10.0.0.0/8 route going via the VTI, but it has local interfaces within that supernet.  This also won't be very reliable, and it will cause "strange effects" for your network if you do this.

This is not fully accurate. The remote side has two tunnels. One terminates on an ASA and is a VTI. Unless you think this is causing all my problems, we don't need to worry about that. With this design, I'm again assuming longest mask match wins the routing decision and also assuming the VPN domains handle everything with routing; that's what I was taught.
Simplified, I was taught that the two FW peers receive information from the VPN communities and, based on src and dst, will encrypt and send the traffic across the tunnel. That's not happening, so I was taught wrong or misunderstood.

 

 

A lot to digest here, thanks man! 😊

Duane_Toler
MVP Silver

This is gonna get really nitty-gritty detailed, and more than you're asking, but the core answer for "how does the firewall know to do VPN?":

* There is a function inside the VPN code called "VPN tagging"

* This function only gets triggered *AFTER* a packet passes through the routing table (via a "route lookup layer") and begins to egress an external interface

* VPN tagging (in general) only occurs on an external-facing interface (your eth1)

* The VPN tagging function does a bunch of work to figure out what kind of VPN to use (domain-based, route-based; and this is the part where domain-based VPNs override route-based VPNs)

* The VPN peer must also be reachable out an external interface to trigger the IKE process.

* Remote VPN domains and peers must NOT be within the internal network (including anti-spoofing configurations on the interface).  That's because "VPN" by definition is "something external to the local gateway".

The VPN domain (encryption domain) does tell the local firewall what to encapsulate toward the remote peer, but the problem is that a packet is not going towards a remote peer.  The packet is simply returning back to your local network.

Yes, more-specific routes win, but you don't have a more-specific route for 10.59.78.0/24.

Go to the command line of your firewall and run "ip route get 10.59.78.2" and it will show you "via 10.59.40.9 dev eth4" when you (likely) want it to go via eth1.

 

See the attached screenshot of your routing table.

 
 

If you must have this routing table, then you need to add a more-specific static route for 10.59.78.0/24 via 10.57.0.9.  This will trigger the VPN code and start IKE towards the remote peer.  Hence, the packets will be "tagged" for VPN; "untagged" packets are passed in plain text.  Plain text packets are logged in the traffic log as an "Accept" log type.  "VPN tagged" packets are logged as "Encrypt" (or "Decrypt", if the packet is being received).
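Here is a toy Python model of that behavior, purely as a sketch. It assumes a one-entry route table taken from the values discussed above, treats eth1 as the only external interface, and assumes 10.57.0.9 is reached via eth1. The `lookup` and `log_action` names are invented for illustration, and this is not how the gateway code is actually implemented.

```python
from ipaddress import ip_address, ip_network

EXTERNAL_IFS = {"eth1"}                          # assumption: eth1 is external
REMOTE_VPN_DOMAIN = [ip_network("10.59.78.0/24")]

# (prefix, next hop, egress interface), simplified from the discussion above
ROUTES = [
    (ip_network("10.59.0.0/16"), "10.59.40.9", "eth4"),
]


def lookup(dst, routes):
    """Longest-prefix match: pick the matching route with the longest mask."""
    addr = ip_address(dst)
    candidates = [r for r in routes if addr in r[0]]
    return max(candidates, key=lambda r: r[0].prefixlen) if candidates else None


def log_action(dst, routes):
    """Route lookup first; tag for VPN only when egressing an external interface."""
    route = lookup(dst, routes)
    if route is None:
        return "no route"
    _, _, dev = route
    in_domain = any(ip_address(dst) in net for net in REMOTE_VPN_DOMAIN)
    return "Encrypt" if dev in EXTERNAL_IFS and in_domain else "Accept"


print(log_action("10.59.78.2", ROUTES))  # Accept (leaves via eth4, untagged)

# Add the more-specific static route suggested above.
fixed = ROUTES + [(ip_network("10.59.78.0/24"), "10.57.0.9", "eth1")]
print(log_action("10.59.78.2", fixed))   # Encrypt (leaves via eth1, tagged)
```

The point of the sketch is the ordering: the encryption domain is only consulted after the route lookup has already chosen an external egress interface.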

 

(hint: you can do this same route trick for Remote Access VPN office-mode subnets, and once a route is in the FIB of the gateway, you can select it for redistribution to dynamic routing protocol)

 

Lesley
MVP Gold

Hmm, it looks like one firewall is using route-based VPN and the other one domain-based VPN.

vpnt20 is route-based. You have to pick between them: either configure route-based on both or domain-based on both.

-------
Please press "Accept as Solution" if my post solved it 🙂
the_rock
MVP Gold

Based on the last screenshot, it looks like it would be route-based? It all shows 0.0.0.0/0.

Best,
Andy
isuckatthis
Contributor

More context for you; I didn't include it for simplicity's sake.
The spoke has two VPNs.
1) vpnt20 - goes to a Cisco ASA
2) Checkpoint community VPN magic

vpnt20 to the ASA is working with no issues. I have not considered an issue on the spoke because the hub isn't forwarding to the spoke. As for the peer having a 0.0.0.10 IP, it's a dynamic IP because we're using Starlink for transport at this remote site. All it has is power, basically. Well, power and a clear view of the sky.


The hub only has one VPN, using the checkpoint community.

Duane_Toler
MVP Silver

Then you *definitely* can't do this.  You have 10.0.0.0/8 out that VTI to the Cisco gateway.  However, you are also trying to do a domain-based VPN to your other gateway, also with networks within the 10.0.0.0/8 supernet.  Domain-based VPNs take priority over route-based VPNs (the VTI).  You are risking breaking the VTI to the Cisco gateway.  The reason it hasn't broken yet is because the hub hasn't seen packets to it (yet).

You *CAN* do route-based VTIs on all of the gateways and control packet pathing in the route tables.  This would be your safest option.  Be very careful and very judicious about domain-based VPNs with route-based VPNs.

Your first (and most significant) issue is the routing and networking on your hub gateway.  Having an interface configured like eth4 (the /29 subnet) in addition to a larger network route (10.59.0.0/16) out the same interface will also cause weird effects making debugging/troubleshooting difficult.  If you want eth4 to be a transit VLAN/subnet, then make it a real transit on its own transit subnet.

With the VPN domains configured like you have them, your logs will also be showing various VPN errors.

 

isuckatthis
Contributor

> Then you *definitely* can't do this.  You have 10.0.0.0/8 out that VTI to the Cisco gateway.  However, you are also trying to do a domain-based VPN to your other gateway, also with networks within the 10.0.0.0/8 supernet.  Domain-based VPNs take priority over route-based VPNs (the VTI).

I want 10.59.0.0/16 to go on the domain-based VPN and everything else in 10.0.0.0/8 to go through the VTI. That's not possible?

> You are risking breaking the VTI to the Cisco gateway.  The reason it hasn't broken yet is because the hub hasn't seen packets to it (yet).
>
> You *CAN* do route-based VTIs on all of the gateways and control packet pathing in the route tables.  This would be your safest option.  Be very careful and very judicious about domain-based VPNs with route-based VPNs.

Maybe we have to re-design and scrap the VPN domains; it sounds like that's your recommendation.

> Your first (and most significant) issue is the routing and networking on your hub gateway.  Having an interface configured like eth4 (the /29 subnet) in addition to a larger network route (10.59.0.0/16) out the same interface will also cause weird effects making debugging/troubleshooting difficult.  If you want eth4 to be a transit VLAN/subnet, then make it a real transit on its own transit subnet.

It is a transit link. I don't have an IP in 10.59.0.0/16 configured on eth4. Taking the quoted text above at face value, I am hearing you say, "you cannot have 10.0.0.1/29 configured on your firewall and 10.0.0.2/29 configured on your router with a route on the FW of 10.59.0.0/16 next hop 10.0.0.2". That doesn't make sense; that's how routing works. What am I missing here?

> With the VPN domains configured like you have them, your logs will also be showing various VPN errors.

No VPN errors at all, unless I'm looking at the wrong logs.

Lesley
MVP Gold

You have overlap in the routing as well: the 10/8 is routed to the ASA route-based tunnel. Route-based tunnels have priority over domain-based tunnels.

isuckatthis
Contributor

Are you saying that the decision making process is this:

Delineated routing tables

Section 1 (route based VPN) of the table is reviewed first. If a route is found that matches the destination, it's automatically used.
Section 2 (policy based VPN) of the table is reviewed ONLY if there is NO route found in section 1, including no 0/0

 

Do I understand correctly?
