Re: Knowledge and Solutions to 3 long time bugs (Get Interfaces, RIM failure and VPN Community Routing)

Egenity

Since I operate a Check Point centric consulting practice, I am exposed to many client environments at different levels of complexity and size. Recently, some recurring problems (bugs) that I have run into repeatedly over the years, and that have never been fixed, have surfaced again across different clients. These issues are also not covered in the support knowledge base, or at least not completely.

In an effort to help others navigate around these issues, I am going to throw what I know out here:

BUG #1: "Get Interfaces" operation

When working with routed VPN structures, there is a need to create virtual tunnel interfaces (VTIs). Frustratingly, there is no method within SmartConsole to manually add VTIs to the topology. You are forced to use the "Get Interfaces" function on the object to retrieve the VTIs. This sounds workable, but the operation (as of R82 and lower) will destroy your existing topology/interface configuration in subtle ways. AND, the larger the interface list, the more difficult it is to recover.

NOTE: This is the "Get Interfaces without Topology" action. The "Get Interfaces with Topology" variant has the same problems, but also generates objects (which are almost never desired).

The operation has 4 issues:

Randomly turns off Anti-Spoofing configuration on some interfaces
Interface security zone designations are randomly removed
The operation will randomly drop digits off interface IP configuration fields. This is not obvious, unless you carefully comb through the cluster IP information and correct things. Mostly, I see it butcher the cluster VIPs.
On cluster based objects, it changes the interface names back to natural OS names. This is not an issue for most, and quite understandable, but some environments use the interface name (at cluster level) for something more friendly, such as "WAN Transit", "SDWAN" or "External Internet". This is more of an annoyance than anything, as it requires the names to be re-entered.

Overall, the Get Interfaces function is very unreliable and requires double checking everything each time it is used. For VTI environments, it guarantees breaking something each time an interface is added. R82.10 does add new functionality in VPN communities to manage VTIs automatically, but that is a totally different approach and still doesn't address this major issue.

SmartConsole should simply be outfitted with the ability to add VTI interfaces to the gateway/cluster topology manually. Why this is missing is a mystery.
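One way to make the double checking less painful is to snapshot the interface list before and after the operation and diff the two. Below is a minimal Python sketch, assuming the interface data has already been exported into lists of dictionaries by whatever means you prefer. The field names and sample values are illustrative assumptions, not a confirmed schema, so adjust them to match your export.

```python
from typing import Any, Dict, List

# Fields to compare per interface. These names are assumptions for illustration.
WATCHED_FIELDS = ("ipv4-address", "ipv4-mask-length", "anti-spoofing", "security-zone")


def diff_topology(before: List[Dict[str, Any]], after: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable differences between two interface snapshots."""
    old = {iface["name"]: iface for iface in before}
    new = {iface["name"]: iface for iface in after}
    findings = []

    # Interfaces that disappeared (this also catches friendly names being reset).
    for name in sorted(old.keys() - new.keys()):
        findings.append(f"{name}: missing after Get Interfaces (removed or renamed)")

    # Interfaces that appeared (expected for a freshly added VTI).
    for name in sorted(new.keys() - old.keys()):
        findings.append(f"{name}: new interface")

    # Interfaces present in both snapshots: compare the watched fields.
    for name in sorted(old.keys() & new.keys()):
        for field in WATCHED_FIELDS:
            was, now = old[name].get(field), new[name].get(field)
            if was != now:
                findings.append(f"{name}: {field} changed from {was!r} to {now!r}")
    return findings


# Hypothetical sample data: eth0 loses a digit, Anti-Spoofing and its zone.
before = [
    {"name": "eth0", "ipv4-address": "203.0.113.10", "ipv4-mask-length": 24,
     "anti-spoofing": True, "security-zone": "ExternalZone"},
    {"name": "eth1", "ipv4-address": "10.255.255.1", "ipv4-mask-length": 24,
     "anti-spoofing": True, "security-zone": "InternalZone"},
]
after = [
    {"name": "eth0", "ipv4-address": "203.0.113.1", "ipv4-mask-length": 24,
     "anti-spoofing": False, "security-zone": None},
    {"name": "eth1", "ipv4-address": "10.255.255.1", "ipv4-mask-length": 24,
     "anti-spoofing": True, "security-zone": "InternalZone"},
    {"name": "vpnt1", "ipv4-address": "169.254.0.1", "ipv4-mask-length": 30,
     "anti-spoofing": False, "security-zone": None},
]

findings = diff_topology(before, after)
for line in findings:
    print(line)
```

Anything reported other than the new VTI is a candidate for manual correction before policy is installed.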

BUG #2: Route Injection Mechanism (RIM) Failure

Somewhere in R81, there was a new feature added to exclude the External IP address of a gateway from the encryption domain. This is a fantastic feature, as you really don't want to use the external IP address of a gateway for encryption. You want to be able to manage and/or troubleshoot a gateway directly with SSH, ping, etc., without relying on a S2S tunnel. Most 3rd party products also default to this behavior and enforce it, as it makes total sense.

However, if you turn this option ON, it will break the Check Point Tunnel-Test functionality and cause the Route Injection Mechanism (RIM) to consider your S2S tunnel dead (even though it is operating properly). As a result, the kernel routes are NOT injected, and your dynamic routing protocol (BGP, OSPF, etc.) does not advertise the networks reachable through the VPN community.

The specific issue with Tunnel-Test is discussed in the support knowledge base, but nowhere is the RIM implication ever mentioned or considered (nor intended). I have encountered this multiple times in client environments.

BUG #3: Route Based VPN Community Setting ("VPN Routing" section)

This issue is quite nasty and was recently experienced on an R81.20 JHF Take 120 gateway cluster.

Example setup:

Site-to-Site VPN community "Remote-Sites" which is normal policy based VPN in star topology (encryption domains) containing both the remote and central gateways
Site-to-Site VPN community "Cloud-Backbone" which is route based VPN utilizing VTIs in star topology containing the central gateway and cloud gateways
Remote site with topology of 10.10.10.0/24
Central site with topology of 10.255.255.0/24
Cloud gateways are irrelevant, just illustrating the two VPN communities
Cloud-Backbone community has VPN Routing section setting of "To center and to other satellites through center"

Symptom:

Remote site can communicate with central site network (via Remote-Sites community) 10.255.255.0/24 for the most part. However, certain target IPs within the range generate a drop, observed in firewall logs AND on kernel debug with the error "According to the policy the packet should not have been decrypted"

Example:

Remote host 10.10.10.10 can communicate normally with everything in 10.255.255.0/24, except for specific IP addresses 10.255.255.100 and 10.255.255.101, which generate the drop with error -- YES, as crazy as that sounds, two IP addresses within the /24 subnet fail, while every other target works fine.
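Because only a handful of targets inside the subnet are affected, it helps to tally drops per destination rather than eyeball the logs. Below is a minimal Python sketch, assuming the log entries have already been reduced to (source, destination, action) tuples. That record layout and the sample entries are hypothetical, chosen only to mirror the example above.

```python
import ipaddress
from collections import Counter
from typing import Dict, Iterable, Tuple


def dropped_targets(records: Iterable[Tuple[str, str, str]], subnet: str) -> Dict[str, int]:
    """Count drops per destination IP, limited to destinations inside `subnet`."""
    net = ipaddress.ip_network(subnet)
    drops: Counter = Counter()
    for _src, dst, action in records:
        if action == "drop" and ipaddress.ip_address(dst) in net:
            drops[dst] += 1
    return dict(drops)


# Hypothetical sample records: (source, destination, action)
records = [
    ("10.10.10.10", "10.255.255.20", "accept"),
    ("10.10.10.10", "10.255.255.100", "drop"),
    ("10.10.10.10", "10.255.255.101", "drop"),
    ("10.10.10.10", "10.255.255.100", "drop"),
    ("10.10.10.10", "10.20.0.5", "drop"),  # outside the subnet, ignored
]

affected = dropped_targets(records, "10.255.255.0/24")
print(affected)
```

A short list of affected hosts inside an otherwise working /24 is the signature of this issue.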

Solution:

The only way to fix this is to revert the Cloud-Backbone community VPN Routing section setting back to the default "To Center Only" - NOTE: This setting is not in the community experiencing the issue, it just seems to trigger it.

NOW, this setting is not vital to the design and operation but was randomly discovered as the root cause of the issue described, which took many hours of troubleshooting to overcome.

→ CCSE, CCTE, CCME

the_rock

I read your post very carefully, some super valid points. I will say though, in my personal experience, as far as point 1, I ALWAYS do get interfaces without topology, no issues. Point 2, can't really comment, as I have not done that in who knows how long. As far as point 3, I have not had any issues with it either. I usually assign empty groups for both enc domains and that seems to work well.

Happy New Year!

Best,
Andy

Egenity

I should have clarified on Bug #1, it is the Get Interfaces WITHOUT Topology action. I will edit the post for better accuracy.

Bug #3 is quite a strange one, looks to be purely a random issue that will probably take quite the effort to trap and fix.

→ CCSE, CCTE, CCME

the_rock

I feel like for point 3, it also depends on what vendor it is, as in my experience, I never had that problem with Azure. When it comes to AWS, it happened once, but it was something on their end. If it's, say, PAN or Fortinet, it always works for me, no problems.

Best,
Andy

Egenity

On Bug #3, it seems to have nothing to do with the routed VPN community operation, but the setting in that community triggers the issue in the normal policy based community (all Check Point gateways).

→ CCSE, CCTE, CCME

the_rock

That makes sense.

Best,
Andy
