Solved: R80.x Performance Tuning Tip - Elephant Flows (Heavy Connections)

HeikoAnkenbrand · ‎2019-12-02

Elephant Flow (Heavy Connections)

In computer networking, an elephant flow (heavy connection) is a continuous flow, set up by TCP or another protocol, that is extremely large in total bytes as measured over a network link. Elephant flows, though not numerous, can occupy a disproportionate share of the total bandwidth over a period of time. The term goes back to the observation that a small number of flows carry the majority of Internet traffic, while the remainder consists of a large number of flows that each carry very little traffic (mice flows).

All packets associated with an elephant flow must be handled by the same Firewall worker core (CoreXL instance). The Firewall may drop packets when the CPU cores it runs on are fully utilized. Such packet loss can occur regardless of the connection's type.

What typically produces heavy connections:

System backups
Database backups
VMware sync

More interesting articles:

- R80.x Architecture and Performance Tuning - Link Collection
- Article list (Heiko Ankenbrand)

Evaluation of heavy connections

The big question is: how do you find elephant flows on an R80 gateway?

Tip 1
Evaluation of heavy connections (elephant flows)

A first indication is a high CPU load on one core while all other cores show a normal load. This can be displayed very nicely with "top". OK, so one core is at 100% CPU usage. What can we do now? SK105762 describes how to activate "Firewall Priority Queues". This feature allows the administrator to monitor the heavy connections that consume the most CPU resources without interrupting the normal operation of the Firewall. After enabling this feature, the relevant information is available in the CPView utility. The system saves heavy connection data for the last 24 hours, and CPDiag has a matching collector that uploads this data for diagnostic purposes.
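
The "one busy core, all others normal" pattern described above can be sketched as a small check over per-core load figures. This is only an illustration: the function name find_hot_core, its arguments and the example thresholds (95% and 50%) are assumptions for the sketch, not values from Check Point, and the loads have to be read off "top" or a similar tool by hand.

```shell
#!/bin/sh
# Illustration only: find_hot_core is a hypothetical helper, not a Check Point tool.
# Usage: find_hot_core HOT_PCT NORMAL_PCT LOAD0 LOAD1 ...
# Prints the index of the single core whose load is >= HOT_PCT while every
# other core is <= NORMAL_PCT. Prints nothing if the pattern does not match.
find_hot_core() {
    hot=$1
    normal=$2
    shift 2
    idx=0
    hot_idx=""
    hot_count=0
    busy_others=0
    for load in "$@"; do
        if [ "$load" -ge "$hot" ]; then
            hot_idx=$idx
            hot_count=$(( hot_count + 1 ))
        elif [ "$load" -gt "$normal" ]; then
            busy_others=$(( busy_others + 1 ))
        fi
        idx=$(( idx + 1 ))
    done
    if [ "$hot_count" -eq 1 ] && [ "$busy_others" -eq 0 ]; then
        echo "$hot_idx"
    fi
}

# Example: core 1 is at 100% while the other cores are nearly idle.
find_hot_core 95 50 12 100 18 9
```

If several cores are busy at the same time, the function prints nothing, because that pattern points to general load rather than a single heavy connection.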

Heavy connection flow system definition on Check Point gateways:

The CPU load of a specific instance is over 60%
The suspected connection lasts more than 10 seconds
The suspected connection accounts for more than 50% of the total work the instance does. In other words, the connection's CPU utilization must be > 30% (50% of 60%)
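
The three criteria above can be read as a simple check. The sketch below only applies the quoted thresholds to three numbers; it is not Check Point's implementation, it does not query the gateway, and the function name is_heavy_conn and its arguments are hypothetical.

```shell
#!/bin/sh
# Illustration only: applies the thresholds quoted above to three integers.
# Usage: is_heavy_conn INSTANCE_CPU_PCT DURATION_SEC SHARE_PCT
is_heavy_conn() {
    instance_cpu=$1   # CPU load of the CoreXL instance, in percent
    duration=$2       # how long the connection has been suspected, in seconds
    share=$3          # connection's share of the instance's work, in percent

    if [ "$instance_cpu" -gt 60 ] && [ "$duration" -gt 10 ] && [ "$share" -gt 50 ]; then
        echo "heavy"
    else
        echo "normal"
    fi
}

# At the boundary, 60% instance load times a 50% share gives 30% connection CPU.
echo "$(( 60 * 50 / 100 ))"

is_heavy_conn 85 42 70   # all three criteria met
is_heavy_conn 85 5 70    # too short-lived
```

All three conditions have to hold at once, which is why a short burst on a busy instance is not reported as a heavy connection.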

CLI Commands

Tip 2
Enable the monitoring of heavy connections.

To enable the monitoring of heavy connections that consume high CPU resources:

# fw ctl multik prioq 1

# reboot

Tip 3
Find heavy connections on the gateway with "print_heavy_conn"

On the system itself, heavy connection data is accessible using the command:

# fw ctl multik print_heavy_conn

Tip 4
Find heavy connections on the gateway with cpview

# cpview

Then navigate to: CPU > Top-Connection > InstancesX

Links

sk105762 - Firewall Priority Queues in R77.30 / R80.10 and above

➜ CCSM Elite, CCME, CCTE ➜ www.checkpoint.tips

Josef_Pecher · ‎2019-12-03

Hi @HeikoAnkenbrand,

Thank you for all the interesting articles about Performance Tuning you wrote.

You could write a book out of this link collection 😀.

R80.x Architecture and Performance Tuning - Link Collection

Paul_Erez · ‎2019-12-03

Hi @HeikoAnkenbrand,

This article helped me a lot.

I followed the steps and actually found a database backup connection. The connection caused about 70% CPU load on one core. We have now limited the bandwidth of the connection via QoS.

Best Regards

Paul

Niroyec_Yerusha · ‎2019-12-04

We were able to identify a very similar problem.

thx

Patricia_OSulli · ‎2019-12-05

We also had the problem with the elephant flows. This is a good way to find them quickly and easily.

HeikoAnkenbrand · ‎2019-12-05

In the past years I had always been looking for a solution to find elephant flows. Check Point has built in a good solution.

Gaurav_B_ · ‎2019-12-10

I just tried that. This is a very interesting way to find elephant flows.

Thanks

Delia_Pele · ‎2019-12-13

👌

Dirk_Wisbey · ‎2020-01-07

We have several connections with 5-7% utilization.

What can we do here?

Timothy_Hall · ‎2020-01-08

So glad you asked this question. 🙂

I will be speaking at CPX New Orleans and Vienna on the CheckMates track with a presentation called "Big Game Hunting: Elephant Flows" that will go through how to track down elephant flows (a.k.a. heavy connections), all the different remediation options, and the pros and cons of each. PhoneBoy will be delivering this presentation for me at CPX Bangkok because I'll be very busy that week, with, uh, something else...

Attend my Gateway Performance Optimization R81.20 course
CET (Europe) Timezone Course Scheduled for July 1-2

Igor_Szkaradkie · ‎2020-01-15

This is an interesting approach to detecting heavy connections. I checked this after reading the article and could identify some systems that were causing problems. We have now created QoS rules to limit the bandwidth. That worked well.

Martin_Raska · ‎2020-01-16

Guys,

If you have a problem with elephant flows, you may try this:

SecureXL Fast Accelerator (fw fast_accel) for R80.20 and above - sk156672

Josh_Dillig · ‎2020-01-21

Do we have to enable PrioQ to use the "fw ctl multik print_heavy_conn" command? The article suggests it, but the Tip numbers aren't necessarily execution steps.

Also, is this supported on R77.30 and R76SP.50?

CCMA

Timothy_Hall · ‎2020-01-21

Priority Queues must be in mode 1 (Evaluator-only) to use that command; mode 1 is the default on a firewall that does not have USFW enabled. I'll be speaking about this very topic in detail at CPX New Orleans and Vienna.

Support for fw ctl multik print_heavy_conn was added in R80.20; I doubt it can be backported into earlier releases since I'm pretty sure it relies on the major changes introduced to SecureXL in R80.20.

uror · ‎2020-03-03

This does not work with R80.40.

HeikoAnkenbrand · ‎2020-03-03

R80.40 gateways use USFW by default.

Unfortunately this is no longer possible with R80.40 in USFW. @Timothy_Hall has already described this well for R80.20.

Martin_Raska · ‎2020-03-04

Could someone explain why the FW was moved from kernel space to user space by default? What is the benefit, apart from allocating more memory when you have more cores? What will be impacted, and what is behind it? Thanks

_Val_ · ‎2020-03-04

That was discussed here in several posts, I think.

In a nutshell, with more than 48 cores, kernel mode cannot utilise them all. To allow CoreXL to use more cores on high-performance boxes, User Mode is the only option. Plus, user mode adds stability: if an FWK instance crashes, it does not affect the whole machine.

VSX has been running User Mode FWK instances for ages, actually.

HeikoAnkenbrand · ‎2020-03-04

Hi @Martin_Raska,

In “Kernel Mode Firewall” KMFW, the maximum number of running cores is limited to 40 because of the Linux/Intel limitation of 2GB kernel memory, and because CoreXL architecture needs to load a large driver (~42MB) dozens of times (according to the CPU number, and up to 40 times). Newer platforms that contain more than 40 cores e.g., 23900 or open server are not fully utilized.
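
A quick back-of-the-envelope calculation shows why about 40 instances is the ceiling. The sketch below assumes that 2GB means 2048 MB and takes the ~42MB driver size at face value; both are approximations, and the function name driver_footprint_mb is hypothetical.

```shell
#!/bin/sh
# Illustration only: rough arithmetic with the approximate figures quoted above.
# Usage: driver_footprint_mb DRIVER_MB INSTANCES
driver_footprint_mb() {
    echo "$(( $1 * $2 ))"
}

kernel_limit_mb=2048                      # assumption: 2GB = 2048 MB
used_mb=$(driver_footprint_mb 42 40)      # ~42 MB driver loaded 40 times

echo "driver copies: ${used_mb} MB of ${kernel_limit_mb} MB"
echo "left for everything else: $(( kernel_limit_mb - used_mb )) MB"
```

With these figures the driver copies alone take 1680 MB, which leaves 368 MB of kernel memory for everything else.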

The solution to this problem is to run the firewall in the user mode of the Linux operating system.

Gaia version / kernel / cores	Firewall mode	Checked on
R80.30 kernel 3.10 more than 35* cores	UMFW is enabled	HP DL 380 G10, 2 * Platinum 8180M processor with 28 cores = 56 cores
R80.30 kernel 3.10 less than 35* cores	KMFW is enabled	HP DL 380 G10, 1 * Platinum 8180M processor with 28 cores
R80.30 kernel 2.6	KMFW is enabled	VMware with 30 cores and with 46 cores
R80.40 (default 3.10 kernel)	UMFW is enabled by default	VMware with 4 cores

To make sure that UMFW is activated, run the following command:

# cpprod_util FwIsUsermode

1 = User Mode Firewall
0 = Kernel Mode Firewall
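
For scripts, the return value can be translated into a readable label. The sketch below is a minimal example: the helper fw_mode_label is hypothetical, and the real command is only called where cpprod_util actually exists, i.e. on a gateway.

```shell
#!/bin/sh
# Illustration only: fw_mode_label is a hypothetical helper.
# Usage: fw_mode_label VALUE
# Translates the value printed by "cpprod_util FwIsUsermode" into a label.
fw_mode_label() {
    case "$1" in
        1) echo "User Mode Firewall" ;;
        0) echo "Kernel Mode Firewall" ;;
        *) echo "unknown" ;;
    esac
}

# Only call the real command where it is available.
if command -v cpprod_util >/dev/null 2>&1; then
    fw_mode_label "$(cpprod_util FwIsUsermode)"
fi
```

Any value other than 0 or 1 is reported as "unknown" rather than guessed at.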

For more information or to change the mode, read more in my article here:

R80.x - Performance Tuning Tip – User Mode Firewall vs. Kernel Mode Firewall

Martin_Raska · ‎2020-03-05

Thanks Heiko, I will ask differently: what is the difference between FW code running in kernel mode and in user mode, apart from memory allocation?

HristoGrigorov · ‎2020-03-05

Kernel mode - faster, direct access to hardware, but in case of a crash everything goes down

User mode - slower, limited access to hardware, but in case of a crash only the app crashes

Also, writing and maintaining code in kernel mode is often a pure nightmare compared to user mode. With current hardware, performance really does not suffer that much if you do it well in user mode.

Martin_Raska · ‎2020-03-05

That is what I was missing. Thanks

TheGrave · ‎2020-05-28

I'm not so convinced performance is acceptable in user mode. My bet is that once the kernel limitations are tackled, Check Point will be crawling back to KMFW.

Not to mention that if you put traffic through a FW with 40 cores and it can't handle it in kernel mode, your design is obviously wrong or the software processing it is pure crap. Load balancing has existed for ages.

argur_007 · ‎2020-05-08

Does that also exist for UMFW?

_Val_ · ‎2020-05-08

Kernel or User Mode, Elephant Flows are problematic in both cases

Timothy_Hall · ‎2020-05-08

True, but there are much better tools for detection and remediation of elephant flows when in kernel mode. With USFW enabled, detection and remediation tools for elephant flows are quite limited, but based on a recent conversation I learned that Check Point is working on closing that capability gap as we speak. My CPX 2020 presentation summarizes all this here:

https://community.checkpoint.com/fyrhh23835/attachments/fyrhh23835/member-exclusives/430/4/Cloud%20T...

Also the Solution Center has a new feature available that allows the processing of a single elephant flow to be spread across multiple Firewall Worker instances, but this capability is not mainlined yet. This feature was alluded to at the end of my CPX presentation above.

TheGrave · ‎2020-05-28

You sure this is the correct file? I'm either blind or not seeing what you are referring to.

Luis_Miguel_Mig · ‎2020-06-12

Is there any way to detect elephant flows in fast path in R77.20 or earlier?

I have made the following summary from reading your posts, but I am missing how to capture elephant flows in the fast path in R77.20 or earlier.

Is this summary below correct? Am I missing anything?

- In R77.20 or earlier, you can detect elephant flows with:

* F2F traffic: with /proc/cpkstats/fw_worker_x_stats with or without cpview
* Any traffic: enabling accounting in a number of rules and looking at smartlog.

- Between R77.30 and R80.40:

* you can still use the above options
* Any traffic: priority queues and connection load tracking - cpview and smartlog
"fw ctl multik prioq 1"

- Between R80.20 Take 47 and R83.X

* you can still use all the above

* Any traffic: there is a new elephant flow detection mechanism for kernel mode
"fw ctl multik print_heavy_conn"

Timothy_Hall · ‎2020-06-12

I don't have an R77.20 gateway handy to test, but if the elephant flows are in the fastpath, the fw_worker stats will not show them.

Accounting is supported directly by SecureXL/fastpath and should work.

I don't think the fwaccel conns command will help much for finding elephant flows in the fastpath but give it a shot. To my knowledge there are no direct elephant flow detection mechanisms in R77.20.

I can't remember if cpview has these screens and whether they will show elephant flows in the fastpath in R77.20, but look for these screens in cpview:

Network...Top Connections
CPU...Top Connections
Advanced...CoreXL...Instances...FW-Instance#...Top FW-Lock consumers

You can also try using the CPMonitor (sk103212: Traffic analysis using the 'CPMonitor' tool) and connstat (sk85780: How to use the 'connstat' utility) tools as described in my CPX 2020 presentation here:

https://community.checkpoint.com/fyrhh23835/attachments/fyrhh23835/member-exclusives/432/3/CPX_Big_G...

Amoli · ‎2021-01-05

This no longer works with a 3.10 kernel.
