What does Sticky Decision Function DO?
I've only just realized I don't know the answer to this after many years with the product.
Without SDF, the following happens:
- Connection 5-tuple -> hash function -> last 8(?) bits determine bucket -> connection processed by the firewall that owns that bucket (rough sketch below)
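(Here is roughly how I picture that pipeline, with a stand-in hash - the real hash function and bit selection are ClusterXL internals I can't see, so this is purely illustrative:)

```python
import hashlib

def pick_member(src_ip, dst_ip, proto, src_port, dst_port, n_members=2):
    """Hash the connection 5-tuple and map it onto one of n_members buckets."""
    five_tuple = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.md5(five_tuple).digest()  # stand-in for the real (internal) hash
    bucket = digest[-1]                        # e.g. the "last 8 bits" -> one byte
    return bucket % n_members                  # cluster member that owns this bucket

# The forward and (post-NAT) return directions hash independently, so without
# extra synced state the return packet can land on the other member:
print(pick_member("10.1.1.5", "203.0.113.7", 6, 51515, 443))
print(pick_member("203.0.113.7", "192.0.2.10", 6, 443, 51515))
```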
WITH SDF, what changes? We know in particular that...
- Acceleration is neutralized
- It copes better with NAT (though the docs oddly say static manual NAT only)
So, best guess....it does ONE of...
- Attempts to process NAT rules on mid-TCP packet before determining a bucket
- Relies on a synced state table to actually check the table, and only then decides whether it is the right node to process the packet to completion
- Tries to optimize the source port in NAPT so that it goes in the right bucket for return traffic (unlikely)
- Some unknowable combo of the above.
Not only is the documentation terrible for this (it talks about what SDF might do for you without any hint of how), but no one even seems to have talked about this. Google turns up nothing. https://community.checkpoint.com/people/dwelccfe6e688-522c-305c-adaa-194bd7a7becc (or anyone) - can you give us a definitive answer?
Thanks!!
Greg
I suspect the mechanism is much simpler than you're thinking it is.
In a load sharing configuration, both gateways will see traffic, but only one of them should actually process the traffic.
Some of that processing happens in SecureXL, which may not be able to deal effectively with a situation where multiple gateways process the same traffic.
Enabling SDF forces everything to the slow path, where deeper inspection can be done and the "other gateways" can more easily ignore the appropriate traffic.
Tim Hall might have a better answer.
When SDF is enabled, an additional hash is applied to each sync'ed connection table entry indicating which cluster member should handle all packets associated with that connection in both the forward and return direction. You can see a passing reference to this here: sk65133: Connections Table Format
This extra hash ensures the same cluster member always handles all packets associated with a connection. In a Load Sharing scenario, asymmetric handling of packets associated with a single connection is not the end of the world, but it will cause a slight delay upon new connection initiation. An example of how a Load Sharing cluster successfully deals with asymmetrically handled traffic is Flush and ACK: sk100226: Cluster Flush and Ack (FnA) mechanism support for ICMP
While workarounds like FnA can be used with most types of traffic asymmetrically traversing a Load Sharing cluster, certain connections/tunnels that "terminate" at the firewall itself pose a special problem which SDF is designed to solve. At its most basic level, there can be a race condition between Load Sharing cluster members in which asymmetric return traffic for a new connection "outruns" the state sync update between cluster members. When an outrun occurs SDF ensures the packet is always handled by the same cluster member and not dropped.
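Conceptually you can picture the ownership idea something like the sketch below. To be clear, the names and logic are just my own illustration, not the actual connections-table format (see sk65133 for that), and the normalization here glosses over NAT - which is part of why the owner tag has to live in the synced entry rather than being recomputed from each packet:

```python
import hashlib

# Synced between cluster members: normalized flow key -> owning member id
conn_table = {}

def normalize(src, dst, proto, sport, dport):
    """Fold the forward and return directions of a flow onto one key."""
    a, b = sorted([(src, sport), (dst, dport)])
    return (a, b, proto)

def sticky_owner(key, n_members=2):
    """Deterministic 'extra hash' deciding which member owns the connection."""
    return hashlib.md5(repr(key).encode()).digest()[0] % n_members

def handle_packet(my_id, src, dst, proto, sport, dport):
    key = normalize(src, dst, proto, sport, dport)
    owner = conn_table.setdefault(key, sticky_owner(key))  # recorded on first packet
    if owner == my_id:
        return "inspect and forward"       # same member handles both directions
    return "leave for the owning member"   # avoids the asymmetric-return race
```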
I'd speculate that SecureXL cannot be used at all with SDF for at least one of the following reasons:
1) SecureXL does not support the additional SDF hash info in its separately-maintained connections table (i.e. fwaccel conns)
2) There is a chance of a race condition "outrun" in the notification mechanism between the "main" connections state table in INSPECT/F2F and the SecureXL connections table on the same cluster member.
3) SecureXL separately calculates its own tables on each cluster member; these SecureXL tables are not directly sync'ed between cluster members
Just to be clear when I use the term "race condition" in this context I mean a traffic handling problem and NOT a security vulnerability; felt the need to throw that in there due to all the Meltdown/Spectre hand-wringing.
Edit: SDF is gone in R80.20+ with the rework of SecureXL, and has been replaced by the "Cluster Correction Layer" mechanism, which is fully compatible with SecureXL as described here: sk169154: Asymmetric Connections in ClusterXL R80.20 and Higher
--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com
Thanks Tim!
I should have listed that as one of my options - on the basis that it's the obvious and correct way to do it. I do think it's a shame that source ports can't be chosen under NAPT in a way that ends up with the same SecureXL bucket, but c'est la vie - it wasn't anticipated. And perhaps it will never be upgraded? It was always an oddity that NAPT was never properly dealt with in SecureXL.
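For what it's worth, here's the kind of thing I had in mind - entirely hypothetical, reusing the stand-in bucket hash from my earlier sketch; as far as I know nothing like this exists in the product:

```python
import hashlib

def pick_member(src_ip, dst_ip, proto, src_port, dst_port, n_members=2):
    """Same illustrative bucket hash as in my earlier sketch."""
    t = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return hashlib.md5(t).digest()[-1] % n_members

def pick_napt_source_port(my_id, server_ip, server_port, hide_ip, proto=6,
                          candidates=range(10000, 60000)):
    """Pick a hide-NAT source port whose *return* 5-tuple hashes back to my_id."""
    for port in candidates:
        # return traffic would arrive as server_ip:server_port -> hide_ip:port
        if pick_member(server_ip, hide_ip, proto, server_port, port) == my_id:
            return port
    raise RuntimeError("no suitable source port in range")

print(pick_napt_source_port(my_id=0, server_ip="203.0.113.7",
                            server_port=443, hide_ip="192.0.2.10"))
```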
Is it fair to say that SecureXL is not that big of a deal these days? I imagine 97% of the firewall load is in higher-level inspection, and optimizing simple packet shuffling is an irrelevance.
Last question if I may. The reason I ended up trying to get to the bottom of this this week is a customer whose gateway is fragile on new policy pushes and won't work with SDF at all. I'm pretty sure I've tied the problem down to an external VLAN on a native interface, i.e. an IP on a native trunk plus a VLAN, which I will remove in the next change window. I can see how this would affect any traffic arriving on the wrong interface. Can you possibly confirm....
- Forward inbound traffic works fine because the hash is reliable and it doesn't matter that forwarding (pivot) is broken - except maybe after a policy push? How could a policy push affect the hash function?
- "fw accel stat" shows awaiting policy, but all inbound traffic is working. Does that mean that pivot is working correctly on the internal return traffic interface? I would imagine that it is and that fw accel stat is simply reporting the result because of partial working.
- SDF mode fails completely and is much worse. Why? Surely it should be much more resilient and not require the broken forwarding?
Just a thought... it wouldn't be a bad idea to enhance and fix SecureXL for NAPT so that it works better in SDN scenarios. I would imagine there are many places where you would want to push a template down to the SDN fabric rather than forwarding a packet through the FW, including a reliable hash for LB. I don't think SecureXL ought to be written off.
Hi Greg,
As Dameon observed, SecureXL is most definitely still a big deal and is very much worth your time to tune up. There are some exciting changes to SecureXL coming that will be available at some point; I can't say much more than that except for this: if you attend CPX, there is one particular table in the Technology room that you'll most definitely want to visit. When this change becomes available it will trigger a big addendum for my book; I was hoping to include a separate chapter covering it in the second edition, but the timing did not work out.
You most definitely should not mix native/untagged and tagged traffic on the same physical interface in any kind of Load Sharing ClusterXL implementation, as doing so is not supported and will cause performance problems. I've seen sites get away with it in ClusterXL HA, but in both editions of my book I still strongly recommend against mixing tagged and untagged traffic on an interface.
Hmm, just noticed a revision note dated 2 April 2020 at the bottom of sk153832: ATRG: SecureXL for R80.20 and higher. So it looks like enabling the Sticky Decision Function no longer kills all SecureXL acceleration in R80.20+?
Just stumbled over this update in sk42359: SecureXL and Sticky Decision Function in ClusterXL Load Sharing mode - looks like SDF is gone in R80.20+:
Starting from R80.20, Sticky Decision Function (SDF) was completely removed. Connection stickiness is achieved via other methods that don’t require disabling SecureXL.
Is it fair to say that SecureXL is not that big of a deal these days? I imagine 97% of the firewall load is in higher-level inspection, and optimizing simple packet shuffling is an irrelevance.
SecureXL is still very relevant as I believe it is needed to achieve the acceleration benefits of Medium Path/PXL.
This is a very detailed discussion - I would suggest sk31533: ClusterXL Load Sharing Sticky Decision Function for Passive FTP for an explanation of how SDF works, as well as sk65486 for the features that are unsupported when SDF is in use...
Does anybody know what those methods are?
Connection stickiness is achieved via other methods that don’t require disabling SecureXL.
Someone knows. We have introduced a new mechanism called the Cluster Correction Layer (CCL). Quoting from the ClusterXL Admin Guide:
Cluster Correction Layer (CCL)
Proprietary Check Point mechanism that deals with asymmetric connections in Check Point cluster. The CCL provides connections stickiness by "correcting" the packets to the correct cluster member. In most cases, the CCL makes the correction from the CoreXL SND. In some cases (like Dynamic Routing, or VPN), the CCL makes the correction from the Firewall or SecureXL.
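To contrast it with SDF conceptually (this is just an illustration, not the actual implementation - the field names and the forward_to_member callback are made up, not Check Point internals):

```python
def ccl_handle(my_id, flow_key, packet, conn_table, forward_to_member):
    """CCL-style handling: instead of every member agreeing up front who owns a
    flow (the SDF approach), the member that receives a mis-delivered packet
    'corrects' it by pushing it to the owner recorded in the synced entry."""
    owner = conn_table.get(flow_key)
    if owner is None or owner == my_id:
        return "inspect locally"              # new flow, or this member owns it
    forward_to_member(owner, packet)          # the correction, e.g. from the SND
    return f"corrected to member {owner}"
```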
An SK has been created describing the Cluster Correction Layer - pretty slick: sk169154: Asymmetric connections in ClusterXL R80.20 and higher