Scottc98
Participant

Bridge interface -Gateway performance and flow question

I have worked with Check Point gateways for about six years now, but all of my deployments have been Layer 3 clusters. I am now working with a customer who runs L2 bridge interfaces in a ClusterXL setup, since they mainly use the gateways for threat inspection only (IPS/Anti-Bot/Anti-Virus).

I have two main questions that I am hoping I can get some clarity on:

Performance:

- When it comes to overall throughput performance ratings, how does bridging the interfaces affect the calculation?

     Example:   Switch 1G copper => eth1 CP bridged to eth2 => router

If you max out this flow, does it count as 1G of throughput or 2G? As I understand it, the bridge acts like a 'switch'. Data between the switch and the router through the CP GW would only ever see 1G of throughput, but does the CP count it as 2G in bridge mode, meaning a gateway would need to be sized with that understanding before any future purchases?

 

Flow:

Using the same topology:

 Example:   Switch 1G copper => eth1 CP bridged to eth2 => router

The switch/router pair has carried VLAN tags since its initial deployment. At that time, those VLANs didn't really need to talk to each other; that changed recently because of some remote management issues. I 'think' we are being hit by sk172204 and are going to schedule a window to disable SecureXL to confirm. There are no policy rules blocking these flows, and everything works fine when we bypass the CP altogether.

If it is this SK, running with SecureXL disabled doesn't sound like a good idea and would presumably cause a performance hit (correct?).

The SK mentions that a feature enhancement can be requested and says the limitation affects R80.20 through R81. Does anyone know whether it is still present in R81.10, or whether a fix made it into that release? I can't find any docs saying yes or no, but since there was a 'feature enhancement' option, maybe R&D wrapped it in.

Just trying to gather all of the info before I may have to tell my customer that we have an architectural design issue.

 

Thanks in advance 

 

3 Replies
Timothy_Hall
Champion

I would think that bridge mode would actually be somewhat less intensive than routed mode, since the Ethernet NIC driver does not have to strip the Ethernet framing on ingress or build new framing for transmission on egress. It just switches an existing Ethernet frame from one interface to another, assuming it passes inspection, so I doubt you would need to double the throughput numbers for bridged vs. routed mode. What I can't remember is whether bridged F2F traffic takes the full iIoO path through the INSPECT engine, or perhaps just iI (I suppose I could run an fw monitor on a system using bridging to determine this).

Regarding the second question, you have raised a good point. I'm guessing that bridged traffic can be handled by SecureXL to some degree, but how that might differ from routed mode is unclear. What I would suggest is running fwaccel stats -s prior to disabling SecureXL and looking at the F2F/slowpath percentage. If it is already high, disabling SecureXL will probably have minimal performance impact. If the F2F percentage is low, however, disabling SecureXL will have a much higher CPU impact (especially on your worker cores), since everything will go F2F/slowpath once SecureXL is disabled.
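To make the "check F2F before disabling SecureXL" step concrete, here is a minimal sketch of pulling the F2F percentage out of the stats output. The sample text below is illustrative only; the exact line layout of fwaccel stats -s varies by version, so capture your own output on the gateway (fwaccel stats -s > /tmp/fwaccel_stats.txt) and adjust the pattern if needed.

```shell
# Illustrative sample of `fwaccel stats -s` output -- replace with a real
# capture from the gateway; field layout is an assumption here.
cat > /tmp/fwaccel_stats.txt <<'EOF'
Accelerated conns/Total conns : 1200/1500 (80%)
Accelerated pkts/Total pkts   : 900000/1000000 (90%)
F2F pkts/Total pkts           : 100000/1000000 (10%)
PXL pkts/Total pkts           : 0/1000000 (0%)
EOF

# Extract the F2F percentage. A high value means most traffic is already
# slowpath, so disabling SecureXL should hurt less; a low value means
# disabling it will push everything to the workers.
f2f=$(awk -F'[(%]' '/^F2F/ {print $2; exit}' /tmp/fwaccel_stats.txt)
echo "F2F share: ${f2f}%"
```

Comparing that number before and after the change window should tell you how much of the load was actually being accelerated.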

Let us know what you find out.  😀

 

New 2021 IPS/AV/ABOT Self-Guided Video Series
now available at http://www.maxpowerfirewalls.com
Scottc98
Participant

Thanks @Timothy_Hall. On the second point, I'll pass the fwaccel stats check along for the test window change ticket.

For the first one, the question came about around a set of 3600 GWs that had a bunch of BGP drops between the router and core switch. The TAC engineer was monitoring the load on the GW and mentioned that he saw about 2.2-2.5 Gb of throughput, which looked to be exceeding the rated capacity of the box.

What threw us off was 'how' that would be possible with this single bridge setup. The only other connections are the management interface (which generates log traffic and such, but not 1G worth) and the sync interface for the cluster.

My customer's gateways are managed through a few different aggregators, so I need to jump through a few hoops to get the exact TAC ticket info/details.

The only general explanation we could come up with, if those observations were accurate, was that the bridge interfaces count twice here (with MGMT & sync making up the other ~0.5 Gb at the time).
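The "counts twice" theory is easy to sanity-check with back-of-the-envelope arithmetic. This sketch assumes the reported figure sums traffic seen on every interface, so a 1G flow crossing the bridge is counted once on eth1 and again on eth2; the ~0.5 Gb for mgmt + sync is the rough estimate from the thread, not a measured value.

```shell
# Assumption: reported throughput = sum of traffic across all interfaces,
# so one 1G flow through the bridge appears on both bridge members.
bridge_line_rate=1000   # Mbit/s, seen on eth1 AND again on eth2
mgmt_sync=500           # Mbit/s, rough estimate for mgmt + sync traffic

total=$(( 2 * bridge_line_rate + mgmt_sync ))
echo "Expected reported throughput: ${total} Mbit/s"
```

Under that assumption the math lands right in the observed 2.2-2.5 Gb range, which is consistent with the double-counting explanation without proving it.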

 

 

Timothy_Hall
Champion

The 3600 only has 4 cores, and under the best conditions with only the Firewall blade enabled it is supposed to do 3.3 Gbps according to its data sheet. What features do you have enabled? (Run the command enabled_blades.) I'd say getting 2.2-2.5 Gbps is pretty darn good if you have anything enabled other than Firewall.

I would assume that the connections crossing the 3600 were numerous and diverse, and that you were not making a single elephant-flow-style connection with iperf or something like that, as that processing cannot be distributed among multiple worker cores until R81, where the pipeline-based processing paths are active.

So you should have a static 1/3 split by default, with Core 0 handling SND duties and Cores 1-3 handling INSPECT/Firewall Worker duties, assuming you are not running R81 or later, which uses Dynamic Split. Next time the box is bumping up against the 2.2-2.5 Gbit barrier, take a look at individual core utilization. Which CPUs are hitting 100%? Just the SND one? Some or all of the workers? All of them? Do network RX-DRPs increment in the output of netstat -ni during this period? This exercise is a matter of figuring out where the bottleneck is, and what (if anything) can be done about it via tuning.
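A small sketch of the RX-DRP part of that checklist, assuming the standard netstat -ni column layout (RX-DRP is the sixth column). The interface table below is made-up sample data; on the gateway you would run netstat -ni twice during the busy period and compare counters instead.

```shell
# Made-up sample of `netstat -ni` output; capture real output on the box.
cat > /tmp/netstat_ni.txt <<'EOF'
Iface   MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth1   1500   0  9912345      0   1287      0  9554321      0      0      0 BMRU
eth2   1500   0  9554000      0      0      0  9911000      0      0      0 BMRU
EOF

# Flag any interface with a non-zero RX-DRP counter (column 6); a counter
# that keeps incrementing under load points at the SND/ring buffer side.
awk 'NR>1 && $6 > 0 {print $1 " is dropping frames: RX-DRP=" $6}' /tmp/netstat_ni.txt
```

Pairing this with per-core CPU figures (e.g. from top with per-CPU view) narrows down whether the SND core or the workers are the bottleneck.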

 
