What does Sticky Decision Function DO?
I've only just realized I don't know the answer to this after many years with the product.
Without SDF, the following happens:
- Connection 5-tuple -> hash function -> last 8(?) bits determine bucket -> connection processed by the firewall that owns that bucket (rough sketch below)
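(Here is roughly how I picture that pipeline, with a stand-in hash - the real hash function and bit selection are ClusterXL internals I can't see, so this is purely illustrative:)

```python
import hashlib

def pick_member(src_ip, dst_ip, proto, src_port, dst_port, n_members=2):
    """Hash the connection 5-tuple and map it onto one of n_members buckets."""
    five_tuple = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.md5(five_tuple).digest()  # stand-in for the real (internal) hash
    bucket = digest[-1]                        # e.g. the "last 8 bits" -> one byte
    return bucket % n_members                  # cluster member that owns this bucket

# The forward and (post-NAT) return directions hash independently, so without
# extra synced state the return packet can land on the other member:
print(pick_member("10.1.1.5", "203.0.113.7", 6, 51515, 443))
print(pick_member("203.0.113.7", "192.0.2.10", 6, 443, 51515))
```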
WITH SDF, what changes? We know in particular that...
- Acceleration is neutralized
- It copes better with NAT (though the docs oddly say static manual NAT only)
So, best guess....it does ONE of...
- Attempts to process NAT rules on mid-TCP packet before determining a bucket
- Relies on a synced state table to actually check the table, and only then decides whether it is the right node to process the packet to completion
- Tries to optimize the source port in NAPT so that it goes in the right bucket for return traffic (unlikely)
- Some unknowable combo of the above.
Not only is the documentation terrible for this (it talks about what SDF might do for you without any hint of how), but no one even seems to have talked about this. Google turns up nothing. https://community.checkpoint.com/people/dwelccfe6e688-522c-305c-adaa-194bd7a7becc (or anyone) - can you give us a definitive answer?
Thanks!!
Greg
I suspect the mechanism is much simpler than you're thinking it is.
In a load sharing configuration, both gateways will see traffic, but only one of them should actually process the traffic.
Some of that processing happens in SecureXL, which may not be able to deal effectively with a situation where multiple gateways process the same traffic.
Enabling SDF forces everything to the slow path, where deeper inspection can be done and the "other gateways" can more easily ignore the appropriate traffic.
Tim Hall might have a better answer.
When SDF is enabled, an additional hash is applied to each sync'ed connection table entry indicating which cluster member should handle all packets associated with that connection in both the forward and return direction. You can see a passing reference to this here: sk65133: Connections Table Format
This extra hash ensures the same cluster member always handles all packets associated with a connection. In a Load Sharing scenario, asymmetric handling of packets associated with a single connection is not the end of the world, but it will cause a slight delay upon new connection initiation. An example of how a Load Sharing cluster successfully deals with asymmetrically handled traffic is Flush and ACK: sk100226: Cluster Flush and Ack (FnA) mechanism support for ICMP
While workarounds like FnA can be used with most types of traffic asymmetrically traversing a Load Sharing cluster, certain connections/tunnels that "terminate" at the firewall itself pose a special problem which SDF is designed to solve. At its most basic level, there can be a race condition between Load Sharing cluster members in which asymmetric return traffic for a new connection "outruns" the state sync update between cluster members. When an outrun occurs SDF ensures the packet is always handled by the same cluster member and not dropped.
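Conceptually you can picture the ownership idea something like the sketch below. To be clear, the names and logic are just my own illustration, not the actual connections-table format (see sk65133 for that), and the normalization here glosses over NAT - which is part of why the owner tag has to live in the synced entry rather than being recomputed from each packet:

```python
import hashlib

# Synced between cluster members: normalized flow key -> owning member id
conn_table = {}

def normalize(src, dst, proto, sport, dport):
    """Fold the forward and return directions of a flow onto one key."""
    a, b = sorted([(src, sport), (dst, dport)])
    return (a, b, proto)

def sticky_owner(key, n_members=2):
    """Deterministic 'extra hash' deciding which member owns the connection."""
    return hashlib.md5(repr(key).encode()).digest()[0] % n_members

def handle_packet(my_id, src, dst, proto, sport, dport):
    key = normalize(src, dst, proto, sport, dport)
    owner = conn_table.setdefault(key, sticky_owner(key))  # recorded on first packet
    if owner == my_id:
        return "inspect and forward"       # same member handles both directions
    return "leave for the owning member"   # avoids the asymmetric-return race
```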
I'd speculate that SecureXL cannot be used at all with SDF for at least one of the following reasons:
1) SecureXL does not support the additional SDF hash info in its separately-maintained connections table (i.e. fwaccel conns)
2) There is a chance of a race condition "outrun" in the notification mechanism between the "main" connections state table in INSPECT/F2F and the SecureXL connections table on the same cluster member.
3) SecureXL separately calculates its own tables on each cluster member; these SecureXL tables are not directly sync'ed between cluster members
Just to be clear when I use the term "race condition" in this context I mean a traffic handling problem and NOT a security vulnerability; felt the need to throw that in there due to all the Meltdown/Spectre hand-wringing.
Edit: SDF is gone in R80.20+ with the rework of SecureXL, and has been replaced by the "Cluster Correction Layer" mechanism, which is fully compatible with SecureXL as described here: sk169154: Asymmetric Connections in ClusterXL R80.20 and Higher
--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com
Thanks Tim!
I should have listed that as one of my options - on the basis that it's the obvious and correct way to do it. I do think it's a shame that source ports can't be chosen under NAPT in a way that ends up with the same SecureXL bucket, but c'est la vie - it wasn't anticipated. And perhaps it will never be upgraded? It was always an oddity that NAPT was never properly dealt with in SecureXL.
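For what it's worth, here's the kind of thing I had in mind - entirely hypothetical, reusing the stand-in bucket hash from my earlier sketch; as far as I know nothing like this exists in the product:

```python
import hashlib

def pick_member(src_ip, dst_ip, proto, src_port, dst_port, n_members=2):
    """Same illustrative bucket hash as in my earlier sketch."""
    t = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return hashlib.md5(t).digest()[-1] % n_members

def pick_napt_source_port(my_id, server_ip, server_port, hide_ip, proto=6,
                          candidates=range(10000, 60000)):
    """Pick a hide-NAT source port whose *return* 5-tuple hashes back to my_id."""
    for port in candidates:
        # return traffic would arrive as server_ip:server_port -> hide_ip:port
        if pick_member(server_ip, hide_ip, proto, server_port, port) == my_id:
            return port
    raise RuntimeError("no suitable source port in range")

print(pick_napt_source_port(my_id=0, server_ip="203.0.113.7",
                            server_port=443, hide_ip="192.0.2.10"))
```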
Is it fair to say that SecureXL is not that big of a deal these days? I imagine 97% of the firewall load is in higher-level inspection, and optimizing simple packet shuffling is an irrelevance.
Last question if I may. The reason I ended up trying to get to the bottom of this this week is a customer whose gateway is fragile on new policy pushes and won't work with SDF at all. I'm pretty sure I've tied the problem down to an external VLAN on a native interface, i.e. an IP on a native trunk plus a VLAN, which I will remove in the next change window. I can see how this would affect any traffic arriving on the wrong interface. Can you possibly confirm....
- Forward inbound traffic works fine because the hash is reliable and it doesn't matter that forwarding (pivot) is broken - except maybe after a policy push? How could a policy push affect the hash function?
- "fw accel stat" shows awaiting policy, but all inbound traffic is working. Does that mean that pivot is working correctly on the internal return traffic interface? I would imagine that it is and that fw accel stat is simply reporting the result because of partial working.
- SDF mode fails completely and is much worse. Why? Surely it should be much more resilient and not require the broken forwarding?
Just a thought... it wouldn't be a bad idea to enhance and fix SecureXL for NAPT so that it works better in SDN scenarios. I would imagine there are many places where you would want to push a template down to the SDN fabric rather than forwarding a packet through the FW, including a reliable hash for LB. I don't think SecureXL ought to be written off.
Hi Greg,
As Dameon observed, SecureXL is most definitely still a big deal and is very much worth your time to tune up. There are some exciting changes to SecureXL coming that will be available at some point; I can't say much more than that except for this: if you attend CPX, there is one particular table in the Technology room that you'll most definitely want to visit. When this change becomes available it will trigger a big addendum for my book; I was hoping to include a separate chapter covering it in the second edition, but the timing did not work out.
You most definitely should not mix native/untagged and tagged traffic on the same physical interface in any kind of Load Sharing ClusterXL implementation, as doing so is not supported and will cause performance problems. I've seen sites get away with it in ClusterXL HA, but in both editions of my book I still strongly recommend against mixing tagged and untagged traffic on an interface.
Hmm, just noticed a revision note dated 2 April 2020 at the bottom of sk153832: ATRG: SecureXL for R80.20 and higher. So it looks like enabling the Sticky Decision Function no longer kills all SecureXL acceleration in R80.20+?
Just stumbled over this update in sk42359: SecureXL and Sticky Decision Function in ClusterXL Load Sharing mode - looks like SDF is gone in R80.20+:
Starting from R80.20, Sticky Decision Function (SDF) was completely removed. Connection stickiness is achieved via other methods that don’t require disabling SecureXL.
Is it fair to say that SecureXL is not that big of a deal these days? I imagine 97% of the firewall load is in higher-level inspection, and optimizing simple packet shuffling is an irrelevance.
SecureXL is still very relevant as I believe it is needed to achieve the acceleration benefits of Medium Path/PXL.
This is a very detailed discussion - I would suggest sk31533: ClusterXL Load Sharing Sticky Decision Function for Passive FTP for an explanation of how SDF works, as well as sk65486 for the features that are unsupported when SDF is in use...
Does anybody know what those methods are?
Connection stickiness is achieved via other methods that don’t require disabling SecureXL.
Someone knows. We have introduced a new mechanism called the Cluster Correction Layer (CCL). Quoting from the ClusterXL Admin Guide:
Cluster Correction Layer (CCL)
Proprietary Check Point mechanism that deals with asymmetric connections in Check Point cluster. The CCL provides connections stickiness by "correcting" the packets to the correct cluster member. In most cases, the CCL makes the correction from the CoreXL SND. In some cases (like Dynamic Routing, or VPN), the CCL makes the correction from the Firewall or SecureXL.
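To contrast it with SDF conceptually (this is just an illustration, not the actual implementation - the field names and the forward_to_member callback are made up, not Check Point internals):

```python
def ccl_handle(my_id, flow_key, packet, conn_table, forward_to_member):
    """CCL-style handling: instead of every member agreeing up front who owns a
    flow (the SDF approach), the member that receives a mis-delivered packet
    'corrects' it by pushing it to the owner recorded in the synced entry."""
    owner = conn_table.get(flow_key)
    if owner is None or owner == my_id:
        return "inspect locally"              # new flow, or this member owns it
    forward_to_member(owner, packet)          # the correction, e.g. from the SND
    return f"corrected to member {owner}"
```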
An SK has been created describing the Cluster Correction Layer - pretty slick: sk169154: Asymmetric connections in ClusterXL R80.20 and higher