xiro
Contributor

81.20 Performance/CPU issue

Hi Guys, 

I'll try posting here; maybe some of you have already had a similar experience.

A customer has been having performance issues via VPN for quite some time now. We were able to identify multiple issues in the network. One of our proactive actions was to update the firewall to the current R81.20 Take 53.

Afterwards, during further troubleshooting, I noticed even odder behavior:
When transferring a file via SMB from a server behind the Check Point gateway, the throughput starts at 30 MB/s, drops to 2 MB/s after a few seconds, and stays there for a few minutes. At the same time one CPU core rises to 90%+. After those few minutes, the data rate jumps back up to ~25-30 MB/s and the CPU drops to approximately 70%.

After opening an SR and talking to TAC, they proposed changing the ring buffer sizes (RX from 256 to 1024, TX from 1024 to 2048).
Interfaces are 1G copper. They also proposed tuning CPAS parameters (max burst, ssthresh, queue_size_factor).
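For reference, a minimal sketch of how the ring buffer sizes can be checked and changed from expert mode. The ethtool usage is standard Linux; the clish rx-ringsize/tx-ringsize syntax is written from memory, so treat it as an assumption and verify it against the Gaia admin guide for your version:

# Show current ring buffer sizes (eth2 used here only as an example interface)
ethtool -g eth2

# Temporary change from expert mode (not persistent across reboot)
ethtool -G eth2 rx 1024 tx 2048

# Persistent change via Gaia clish (assumed syntax, verify before use)
clish -c "set interface eth2 rx-ringsize 1024"
clish -c "set interface eth2 tx-ringsize 2048"
clish -c "save config"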

After changing the ring buffer sizes two days ago, performance was great: a constant 85-90 MB/s and reasonable CPU.

Unfortunately, today we are again seeing similar behavior. Not exactly the same as before, but clearly a massive impact from packet drops, and the CPU is back at ~90% during the transfer.

Implementing the CPAS change on top of that didn't help.
In other threads I read that the default buffer values should be fine for 1G interfaces.

Do you guys have any other ideas about what the cause could be?

[Attachment: 2.png]

[Attachment: 1.png]

Lesley
Advisor

It depends on what the customer is complaining about: is it only file transfers, or other things as well?

Or are you 'testing' with file transfer to see what speed the customer is getting?

SMB traffic will be inspected; better to share which blades are enabled on this gateway.

My guess is that the Anti-Virus blade together with IPS is causing the slower speeds.

-------
If you like this post please give a thumbs up(kudo)! 🙂
xiro
Contributor

- it's other stuff too (originally raised as an RDP issue)

- file transfer is our best way to reproduce it

- blades are IPS, AV, AB - the client has a global exception in the TP ruleset, and the first measure was to disable the AV/AB blades (didn't help)

 

What seems strange to me is the massive initial increase in speed after altering the ring buffer sizes.

I think the issue could be related to that...

 

[Attachment: 3.png]

Lesley
Advisor

Try fastaccel: https://support.checkpoint.com/results/sk/sk156672

If the traffic does not hit the fast_accel rule, it means it is still being touched by the blades.

That would mean the exceptions you have made are not working.
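As a rough sketch of what that could look like from expert mode, based on sk156672 (the IP addresses and port are placeholders for your test client and SMB server; please double-check the exact syntax in the SK):

# Make sure the fast_accel feature is enabled
fw ctl fast_accel enable

# Accelerate SMB (TCP/445) between the test client and the file server (placeholder IPs)
fw ctl fast_accel add 10.1.1.10 10.2.2.20 445 6

# Verify the rule exists and check whether it is actually being hit
fw ctl fast_accel show_table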

What version and take are you running? 
-------
If you like this post please give a thumbs up(kudo)! 🙂
the_rock
Legend

Good call.

John-Haynes
Participant

Make sure that the rules are using the predefined "smb" service and not "microsoft-ds".  Like Lesley said, it's probably an IPS check or AV scanning on SMB that could be slowing it down.  When going through the IPS checks, make sure to look for protections labelled both "smb" and "cifs".

 

When in doubt, try fast_pathing the traffic and see how it works.  You could also try looking at hcp when testing and hope you get lucky.  Make sure to enable the test for that:  hcp --enable-test "Protections Impact"

 

I ran into an issue where AV scanning for SMB was not enabled but it was running.  Even the am_protections_override.C had everything defined correctly.  The fix was to reset all of the protection definitions.

xiro
Contributor

The test client in question has an "any any" rule in place and therefore uses "microsoft-ds", because that seems to be the service that matches by default. The rule is placed at position 1 to rule out any ruleset-related issues.

As described above (additional screenshot attached), the speed increased to a great level after tweaking the ring buffer sizes. Unfortunately, the problem came back after 2 days. During that time, no changes were made to the policy or any other settings. As mentioned in the answer above, AB/AV were already disabled, with no change in behavior.

This is what it looks like in cpview during a transfer:

[Attachment: 4.png]
Chris_Atkinson
Employee

Which gateway model is this?

S2S or RA VPN and which algorithms are used?

Is MSS clamping enabled?

CCSM R77/R80/ELITE
xiro
Contributor

- it's a 5900 Cluster

- We already analyzed the VPN and everything that comes with it, including MTU sizes etc., but the problem in question here is not VPN-related, since this is a drill-down test from a client attached directly to the transfer network of the firewall, accessing a server in a DMZ.

I assume we may have drops somewhere on the provider side that cause the low speeds over VPN, but as long as the firewall behaves as shown in the screenshots above, all my end-to-end testing and capturing at different spots in the path is in vain, because the Check Point gateway is causing so many drops.

Timothy_Hall
Champion

Based on the snippets I've seen I'd speculate it is IPS that is slowing you down and driving up the CPU (due to the high PM utilization), although it sounds like you may have multiple issues since it looks like you are seeing high CPU on your dispatchers too which shouldn't be caused by IPS.  I doubt changing the ring buffer sizes actually helped. 

Next time you start seeing slow SMB transfers and/or high CPU utilization, run ips off -n on the gateway from expert mode, then start a series of *new* SMB transfers between new IP addresses.  Are they substantially improved for as long as IPS is off?  Don't forget to re-enable IPS when you are done testing with ips on.
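A rough sketch of that test sequence from expert mode (only the commands named above are used; keep in mind IPS stays off until you turn it back on):

# Temporarily disable IPS as suggested above
ips off -n

# ...start a series of *new* SMB transfers between new IP addresses and compare throughput...

# Re-enable IPS once the test is done
ips on

# Optionally confirm which blades report as enabled afterwards
enabled_blades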

Beyond that, please post the output of s7pac, ideally run while the system is experiencing slow transfers.  Also please provide the output of the following additional commands from expert mode:

enabled_blades

dynamic_balancing -p

mq_mng -o -v

connection_pipelining status
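If it helps, here is a small sketch to capture all of those outputs into a single file while a transfer is slow (assumes expert mode; the output file path is just an example):

OUT=/var/log/perf_diag_$(date +%Y%m%d_%H%M).txt
{
  echo "== enabled_blades =="
  enabled_blades
  echo "== dynamic_balancing -p =="
  dynamic_balancing -p
  echo "== mq_mng -o -v =="
  mq_mng -o -v
  echo "== connection_pipelining status =="
  connection_pipelining status
} > "$OUT" 2>&1
echo "Output written to $OUT"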

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
xiro
Contributor

Hi Timothy,

Unfortunately, deactivating IPS didn't help (same as AV/AB).

Please find attached the outputs from the commands you listed above. 
During my research I also stumbled upon https://support.checkpoint.com/results/sk/sk181860 - which describes a similar issue when many interfaces are configured (in our case we use all 8 Ethernet interfaces), but the part about "get_dev2ifn_hash" doesn't match.

[Attachment: 5.png]

Oh, and just to mention: we are aware of an issue where eth2 (leading to the servers) sometimes peaks at 1G, therefore leading to drops. But that is a scenario that occurs rarely (e.g. 3x for 5 minutes on Thursday). The behavior here is currently occurring all the time, even when the throughput through the whole firewall is <800M (as you can see in the screenshot above).

We are currently looking into either bonding eth2, or upgrading the FW with 10G cards (unfortunately they're EOS).

But that's something we'll address in the next few days - nevertheless it doesn't seem to be related to this issue, which is persistent & reproducible at any time.
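In case it helps with the bonding option, a rough Gaia clish sketch for aggregating eth2 with a second port. The group number, member interfaces and LACP mode are assumptions for illustration only; member interfaces must have no IP address assigned before being added, and the switch side needs matching configuration - please verify against the Gaia admin guide:

add bonding group 1
add bonding group 1 interface eth2
add bonding group 1 interface eth3
set bonding group 1 mode 8023AD
save config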

 

Timothy_Hall
Champion

Core 1, which you referenced in your last screenshot along with sk181860, is generally a Firewall Worker instance on a 16-core SMT gateway and not an SND, unless Dynamic Split had added another SND when you took that particular screenshot.

So, looking through everything, here is what I see:

1) The act of changing the ring buffer size briefly reinitializes the interface, and I suspect that is what influenced the speed to increase, not the buffer size increase itself.  From expert mode try ifdown ethX;ifup ethX when you are seeing the slow performance - does that suddenly speed it up for a while?  If it does, that could suggest some kind of issue with the SND code, as Multi-Queue seems to be doing a very good job of spreading traffic around the SND cores as much as possible, but the interface reset will cause MQ to reinitialize there too, so it can't be ruled out.  igb (which is implementing MQ here) is a rather old driver (5.3.5.20), and will tentatively be updated to 5.12.3 for R82.

2) Hyperflow/pipelining is working well on your system, but SMB is not eligible for it until (hopefully) R82.  So an SMB elephant flow will be stuck on one core and top out performance-wise, and there is not much you can do about it.

3) Make sure the Anti-Virus blade is NOT scanning SMB traffic in the relevant TP profile.  This was mentioned earlier in the thread.  The "files:" line of fw stat -b AMW can be used to check this.

4) There has also been a longstanding problem with SMB/CIFS traffic ending up in the Medium Path although it should be in the fastpath.  This was supposed to be fixed in R81.20 Jumbo HFA Take 43+, which you have.  You may need to adjust your policy to ensure that APCL/Threat Prevention is not trying to deep inspect SMB traffic; failing that, try fast_accel'ing the SMB traffic into the fastpath.  See sk156672: SecureXL Fast Accelerator (fw fast_accel) for R80.20 and above.  The command fw ctl multik print_heavy_conn can help you find the attributes of SMB elephant flows needed to set up a fast_accel rule (a condensed command sketch follows after this list).

5) You have a very high percentage of fully-accelerated traffic (Accelerated pkts), yet really only one physical core acting as an SND with SMT enabled on core 0 and its sibling 8.  Dynamic Split can add more SNDs if the Firewall Worker instances aren't very busy, but given the features you have enabled I'd say they are probably busy quite a bit of the time.  In my experience, once SND utilization goes north of 75% with SMT enabled, performance starts to degrade quite markedly as the two SND instances "bang into" each other trying to reach the physical core; Firewall Worker instances do benefit about 30% from SMT, but SNDs under heavy load most definitely do not.  This may be one of those cases where turning off SMT and going back to 8 full cores and the initial 2/6 split (subject to change by Dynamic Split of course) would probably be a good idea.  This was also mentioned earlier in the thread and is your final option if the four options above don't solve the issue.

6) There are some network errors, including rather rare TX errors, but there are so few compared to the total number of frames that it is not worth worrying about at this point.
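For points 1, 3, 4 and 6, here is a condensed sketch of the checks described above, run from expert mode (the interface name and grep pattern are illustrative only):

# 1) Reset the interface during a slow transfer and note whether speed recovers;
#    also check the NIC driver and version in use
ifdown eth2; ifup eth2
ethtool -i eth2

# 3) Check whether Anti-Virus is still scanning files (look at the "files:" line)
fw stat -b AMW

# 4) Identify elephant flows (source/destination/port) as candidates for a fast_accel rule
fw ctl multik print_heavy_conn

# 6) Review interface error/drop counters
netstat -ni
ethtool -S eth2 | grep -iE "err|drop|miss"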

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
xiro
Contributor

ad 1) You are right. I did not try resetting the interface, but I did fail over to the standby member (which has no changes to the ring buffer sizes) - and voila: the speed for the test was a constant ~90-110 MB/s. After a few minutes it went back down again.

After failing over back and forth, I couldn't reproduce it any more (for at least ~20 minutes), so I need to try again later. Not sure when it will hit again. (I think after changing the ring buffer sizes initially, it also took around 2 hours to re-occur.)

The strange thing is that this worked for years and suddenly came up a few weeks ago. (Initially experienced as slow VPN performance, but the reason for it seems to be drops on the Check Point.)

Updating from R81.10 to R81.20 Take 53 a few days ago didn't resolve the issue.

 

ad 2) That is strange, because as mentioned above the issue started out of the blue. Traffic has been transferred over this firewall for years; I'm not sure why it suddenly shouldn't be able to handle it. In my opinion, that's not an amount of traffic that should be worrying at all.

Also: a 100k firewall cluster that can't push through more than ~3 MB/s for a 4 GB file transfer - I mean... the first thing the customer will ask me is: "How can we replace this thing with a Forti?" - and I'd absolutely understand him.


ad 3) As mentioned, AB/AV and IPS were already disabled, without any effect. Also, from the gateway's perspective this is traffic going from DMZ to DMZ, so AB or HTTPS Inspection shouldn't even apply at all.

ad 4) The problem is that it is general performance, not only SMB. As mentioned above, the problem was initially reported as "VPN users can't use RDP". SMB then became our easiest test method, since it was also affected. We have now nailed down that, independent of the internet connection & VPN gateway, a client directly in front of the Check Point gateway has the same problem. So it is our best & easiest way to reproduce it. Therefore, fast_accel'ing specific connections unfortunately isn't a solution for all of our problems. 😞

 

ad 5) Not sure if I should start tuning HT, since I'm still working on this issue with TAC.

But do I understand this correctly: my SNDs are 0 & 8, but the cores that are peaking are random ones: 1, 13, 4, etc.?

Do you think that disabling SMT would really help? If so, I'd try to arrange a MW.

Anything special to take care of?

 

Thank you for all your efforts!

Lesley
Advisor

Pulling traffic via fast_accel in this case might not be a solution, but rather an indication of what the problem could be.

You posted that you excluded this traffic from AB, AV and IPS. Actually, that is not true (except maybe for IPS, via ips off). I would recommend temporarily disabling the blade altogether: disable it on the firewall gateway object and push policy. Do this only if you are not able to get the traffic into fast_accel. If the counter for the fast_accel rule does not increase, it means traffic is not going through it and will still be inspected. I can explain how to make a proper exception via the rulebase, but it is a bit complex, so it is probably better to turn off the blade for a short period to see the result.

Second remark: you are now testing with a Windows share. In this case the problems are VPN-related, so maybe you are now solving two different issues. Why not test SMB via the VPN tunnel? Solving issues with VPN clients is a different approach than getting more speed out of Windows share copies. For example, the encryption algorithm could have a huge impact on VPN clients.

I would not be surprised if 3DES is being used for the VPN clients. This has a serious performance impact, as stated in: https://support.checkpoint.com/results/sk/sk73980

If a user logs in, you can see what it uses in the SmartConsole logs. Or check here: SmartConsole -> Global properties -> Remote Access -> VPN auth -> edit. Post what you have for P1 and P2.

 

Here is another SK with a general VPN performance guide:

https://support.checkpoint.com/results/sk/sk105119

-------
If you like this post please give a thumbs up(kudo)! 🙂
xiro
Contributor

Thanks, I'll try this - but (un)fortunately, since my failovers yesterday, the problem currently isn't reproducible. I will check again later and try to implement your suggestions.

Regarding VPN: as mentioned, I'd exclude it as a source of the problem. The VPN does not terminate on the Check Point; it terminates on a separate Cisco firewall, which is then connected to the Check Point.

When the Check Point doesn't show this SMB/CPU issue, like now, the end-to-end performance through the VPN is also fine (10 MB/s instead of 700 KB/s).

 

Lesley
Advisor

Hmm, that changes the story. If the VPN is not terminated on the Check Point, it cannot do a lot of the inspection. The traffic will be encrypted and most IPS inspection cannot be done on it. You could still try to put it through fast_accel, since you have the source network as a scope.

For CIFS performance (not related to the VPN issue), it can really help to use fast_accel.

-------
If you like this post please give a thumbs up(kudo)! 🙂
xiro
Contributor

Hmm, not sure if you understood my explanation correctly: the VPN gateway's inside interface is connected to a DMZ of the Check Point gateway. The VPN gateway's outside interface is connected directly to the Internet.
Therefore, the VPN is terminated there; traffic is decrypted and then routed through the Check Point gateway in cleartext.

So, to exclude the VPN part, we simply placed a test client in the same network as the inside interface of the VPN gateway and are testing with it. (Traffic comes in on the Check Point the same way it would come from a VPN client, just with different TCP settings/MSS etc.)

But since we see approximately the same amount of drops etc., it is a proper test, and yes, theoretically the Check Point can inspect it.

Timothy_Hall
Champion

In response to your numbered list:

1) A sudden increase in throughput caused by a failover is highly suspicious and tells me you don't have IPS turned off like you think you do.  Unless you have correctly defined a blade-based exception or null profile, IPS enforcement via passive streaming will continue.  Streaming inspection is not synced between cluster members, so upon failover any connections being actively or passively streamed by default will be allowed to continue and go fastpath, thus suddenly speeding up.  Here is a screenshot from a lab exercise in my Gateway Performance Optimization Course discussing this:

[Attachment: ipsfail.jpg]

2) You've got something wrong in your policy and internal traffic is being much more heavily inspected than it should.  Start by unchecking all TP blades, and in any APCL/URLF policy layer make sure you are using object "Internet" (not All_Internet) in the destination of EVERY SINGLE rule involving APCL/URLF.  Do you have any policy install warnings about this?  Also if you are using Custom Application/Site objects in your APCL/URLF policy and the "*" wildcard is in use, the Pattern Matcher (PM) code will drive up the CPU which you showed in one of your screenshots.  IPS also uses the PM extensively.

Also make sure your cluster object's topology is completely and correctly defined and the External interface is properly set, so that object "Internet" is properly matched.  Do you have the checkbox "interface leads to DMZ" set on any interfaces?  That will cause them to be treated as External and their traffic to be more heavily inspected.  These are the main tips covered in my course for Access Control, but there are many more.

3) If IPS/AV/ABOT appear in the output of enabled_blades, they are NOT off.  Completely uncheck them on the cluster object (a quick CLI verification sketch follows after this list).

4) Yes, overall general performance will be heavily impacted by #2.

5) I'd try to adjust your policy first before considering disabling SMT.  However, the more you describe your problem, the more it sounds like you are running into bottlenecks caused by all traffic of a single connection (SMB, VPN) only being able to be processed by one core (Hyperflow notwithstanding, for certain operations).  SMT hurts you in this scenario.
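As a quick way to sanity-check point 3 from the gateway itself after pushing policy (both commands were already used earlier in the thread):

# Should no longer list the IPS/AV/Anti-Bot blades once they are unchecked and policy is installed
enabled_blades

# Double-check that Anti-Virus file inspection is really idle (see the "files:" line)
fw stat -b AMW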

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Chris_Atkinson
Employee

HyperFlow support for SMB traffic will come with R82. However, End of Engineering for the 5900 is approaching.

CCSM R77/R80/ELITE
Alex-
Advisor

The 5800 and 5900 are outliers in the 5000 series, as they're the only ones with hyper-threaded Xeon processors; the others have either 2 or 4 physical cores with i-series CPUs. I believe at some point the 5900 was retired, leaving only the 5800 with such a configuration available.

I've seen a few specific issues happening on them, and never on the other appliances with "classical" cores running the same blades.

One of the issues was solved by disabling Hyper-Threading to go back to 8 physical cores. Luckily for that deployment, the reduced CPU capacity was still enough to handle the traffic until the issue got fixed by a hotfix somewhere in R81.10.

the_rock
Legend

Chris brings up an excellent point. This usually boils down to MSS clamping / MTU.

Andy

