RS_Daniel
Advisor

High CPU

Hello CheckMates,

I am facing a case with a customer who has very high CPU usage, constantly at 95-100%, on a 5100 HA cluster. When I run top, the processes that consume the most CPU are rad, fw_worker_1, fw_worker_0 and fwd, each at around 20% on average, so that distribution looks normal to me. The problem is that the throughput of this cluster is around 200 Mbps, and according to the datasheet the 5100 appliance's capacity is 1 Gbps. I understand the appliance has only two cores and that many configuration variables could cause this behavior, but it is quite difficult to tell the customer that 100% CPU is normal at only one-fifth of the datasheet throughput. I am attaching the s7pac output in case I am missing something, so someone can give it a look and provide some help. Thanks in advance!!!

[Expert@hostname:0]# enabled_blades
fw vpn cvpn urlf av appi ips identityServer anti_bot

top+c.PNG

Regards

 

 

PhoneBoy
Admin

Is this a full HA cluster (management also on cluster members)?
Also, eth1/2/3 have a bunch of errors on them, which might explain a few things as well.

the_rock
Legend

I had a customer once on base R80 with this sort of issue, and we fixed it by disabling CoreXL from cpconfig, rebooting, re-enabling it, and rebooting again. I'm not sure if that simply did a CoreXL "reset", but we never saw the problem again. Not saying it would fix it in your case, but it's worth a try. Any idea when this started happening?

Timothy_Hall
Champion

That box is way, WAY overloaded.  200Mbps seems about right for the blades you have enabled given it is a 5100 with a dual-core Celeron.  The network cabling is clean, and the RX-DRP/RX-OVR drops are being caused by a lack of CPU cycles available for SoftIRQ and subsequent ring buffer overruns.
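To put the RX-DRP and RX-OVR counters in proportion, one option is to compare them with RX-OK per interface. Below is a minimal Python sketch, assuming the standard Linux `netstat -ni` table layout (a header row starting with `Iface` and containing `RX-OK`, `RX-DRP` and `RX-OVR`). It locates columns by header name because some net-tools versions print an extra `Met` column. The function name `rx_drop_ratios` is hypothetical, not a Check Point tool.

```python
from __future__ import annotations


def rx_drop_ratios(netstat_text: str) -> dict[str, float]:
    """Return (RX-DRP + RX-OVR) / RX-OK per interface from `netstat -ni` text.

    Columns are located by header name, so the optional "Met" column that
    some net-tools versions print does not shift the results.
    """
    rows = [line.split() for line in netstat_text.splitlines() if line.strip()]
    # The header row is the one whose first field is "Iface".
    header_idx = next((i for i, cols in enumerate(rows) if cols[0] == "Iface"), None)
    if header_idx is None:
        raise ValueError("no 'Iface' header row found")
    header = rows[header_idx]
    ok_col, drp_col, ovr_col = (header.index(n) for n in ("RX-OK", "RX-DRP", "RX-OVR"))

    ratios = {}
    for cols in rows[header_idx + 1:]:
        if len(cols) <= max(ok_col, drp_col, ovr_col):
            continue  # skip truncated or unrelated lines
        rx_ok = int(cols[ok_col])
        dropped = int(cols[drp_col]) + int(cols[ovr_col])
        ratios[cols[0]] = dropped / rx_ok if rx_ok else 0.0
    return ratios
```

On the gateway, the input could come from `subprocess.run(["netstat", "-ni"], capture_output=True, text=True).stdout`. The counters are cumulative since boot, so two readings taken a minute or so apart say more about current drops than a single snapshot.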

Unlikely, but please provide the output of cpstat -f sensors os, as you may have a fan failure and CPU underclock condition, which will cause extremely high CPU load like this.

There is plenty of memory, your issue is CPU utilization.  Most likely culprit is IPS based on my experience.  Try this to help isolate the issue:

ips off

(wait 60 seconds)

(remeasure CPU load, did it drop significantly?)

ips on
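For the before/after comparison in this IPS test, it helps to sample CPU load the same way both times rather than eyeballing top. A minimal sketch, assuming Gaia exposes the standard Linux `/proc/stat` format (aggregate line `cpu user nice system idle iowait irq softirq steal ...`, in clock ticks): take two samples a few seconds apart and treat everything except idle and iowait as busy. The helper names are hypothetical.

```python
from __future__ import annotations

import time


def read_cpu_ticks(path: str = "/proc/stat") -> list[int]:
    """Return the aggregate "cpu" counters (in clock ticks) from /proc/stat."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):  # aggregate line, not cpu0/cpu1
                return [int(v) for v in line.split()[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def busy_percent(before: list[int], after: list[int]) -> float:
    """Percentage of CPU time spent busy between two samples.

    Only the first eight fields (user nice system idle iowait irq softirq
    steal) are summed; guest/guest_nice are already included in user/nice.
    idle and iowait count as not busy; irq and softirq count as busy.
    """
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return 100.0 * (total - idle) / total if total > 0 else 0.0


def sample_busy_percent(interval: float = 5.0) -> float:
    """Average CPU busy percentage over `interval` seconds."""
    first = read_cpu_ticks()
    time.sleep(interval)
    return busy_percent(first, read_cpu_ticks())
```

Run `sample_busy_percent()` with IPS on, run `ips off`, wait the 60 seconds, sample again, then `ips on`. SoftIRQ time is counted as busy here, which matters because the RX drops above are attributed to SoftIRQ starvation.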

Since it is a 2-core box you could try flat-out disabling CoreXL and leaving it off for a 1/1 split instead of the default 2/2 but given your 92% PXL doing so will almost certainly make things much worse.

You may be able to get some gains with APCL/URLF/AV/ABOT policy tuning, but I think this box is so far over the red line it wouldn't make much of a difference.  Some updated hardware is probably in order.

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
genisis__
Leader

Tim, the datasheet implies this should be able to handle 1Gbps of TP (Threat Prevention) traffic (yeah right!), so realistically, if we assume 50% of that figure is what you get in real-world use, the 5100 should be at an estimated 45-50% CPU utilisation.

Personally, I think Check Point needs to change the throughput figures in their datasheets to reflect real-world scenarios, such as:

- Firewall policy with 600 rules that are not optimised

- 100 NAT rules

- 60% traffic going through medium path

- 10% traffic going through slow path

- 20% traffic going through fast path

- all blades enabled, including HTTPS inspection (what's the point otherwise?)

I also think Check Point really needs to rethink their hardware offering, i.e., offload HTTPS inspection to a dedicated hardware module, like FortiGate. This would allow better throughput figures with HTTPS inspection on lower-end devices. I mean, who is going to run HTTPS inspection on a 6200 or even a 6400? Realistically, to use HTTPS inspection on today's platforms you are talking about 6700 appliances or above, probably getting 300-400Mbps throughput with HTTPS inspection and blades enabled (I have not tested this, so I could be talking complete rubbish).

In a way, Check Point themselves are acknowledging this, given that the sizing tool does not list HTTPS inspection and you actually have to get the SE to confirm the correct sizing... food for thought, guys.

RS_Daniel
Advisor

Hello,

Thank you all for your replies!! Here is the requested info:

Is this a full HA cluster (management also on cluster members)? No, it is a regular ClusterXL cluster, centrally managed.

[Expert@hostname:0]# cpstat -f sensors os

Temperature Sensors
----------------------------------------------
|Name       |Value|Unit   |Type       |Status|
----------------------------------------------
|Intake Temp|28.00|Celsius|Temperature|     0|
|Outlet Temp|29.00|Celsius|Temperature|     0|
|CPU Temp   |30.50|Celsius|Temperature|     0|
----------------------------------------------

Fan Speed Sensors
--------------------------------------
|Name       |Value  |Unit|Type|Status|
--------------------------------------
|System Fan1|6250.00|RPM |Fan |     0|
|System Fan2|6250.00|RPM |Fan |     0|
|System Fan3|5818.50|RPM |Fan |     0|
--------------------------------------

Voltage Sensors
---------------------------------
|Name |Value|Unit|Type   |Status|
---------------------------------
|VCore|1.72 |Volt|Voltage|     0|
|+12V |11.93|Volt|Voltage|     0|
|3.3V |3.31 |Volt|Voltage|     0|
|VDIMM|1.50 |Volt|Voltage|     0|
|+5V  |5.09 |Volt|Voltage|     0|
|VBAT |3.15 |Volt|Voltage|     0|
---------------------------------
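When the same sensor check has to be repeated across several gateways, the pipe-delimited tables printed by `cpstat -f sensors os` can be scanned programmatically. A minimal sketch, assuming the layout shown above (a `|Name|Value|Unit|Type|Status|` header and one row per sensor) and assuming a non-zero Status marks a sensor outside its normal range; `abnormal_sensors` is a hypothetical helper.

```python
from __future__ import annotations


def abnormal_sensors(cpstat_text: str) -> list[tuple[str, str]]:
    """Return (name, status) for every sensor row whose Status is not "0".

    Assumption: Status 0 means the sensor is within its normal range.
    """
    flagged = []
    for line in cpstat_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip section titles and dashed separators
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 5 or cells[0] == "Name":
            continue  # skip header rows and anything that is not a sensor row
        name, _value, _unit, _type, status = cells
        if status != "0":
            flagged.append((name, status))
    return flagged
```

An empty list for the output above would match the reading that the sensors are fine.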

 

Turning off IPS did not make any difference; the CPU stayed at literally 100% the whole time.

What version of Check Point are you running and with what JHFA? R80.40 with JHFA Take 125.

I think I'll try the policy optimization, but based on the IPS test, I don't think it will be enough.

genisis__
Leader

Can you give us the output of "enabled_blades"? Example:

# enabled_blades
fw vpn urlf av appi ips anti_bot content_awareness mon

I'm also running R80.40 with JHFA125, but on a 6400, and don't experience this issue. Are you running any VPNs? Can you also confirm the gateway is not running as a proxy?
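Since `enabled_blades` prints a single space-separated line, comparing the output from the original post with the example here is a plain set difference. A small Python sketch; the function name is hypothetical and both strings are copied verbatim from this thread.

```python
from __future__ import annotations


def blade_diff(ours: str, theirs: str) -> tuple[set[str], set[str]]:
    """Return (blades only in `ours`, blades only in `theirs`)."""
    a, b = set(ours.split()), set(theirs.split())
    return a - b, b - a


# Both strings are copied verbatim from this thread.
op_5100 = "fw vpn cvpn urlf av appi ips identityServer anti_bot"
example = "fw vpn urlf av appi ips anti_bot content_awareness mon"

only_op, only_example = blade_diff(op_5100, example)
print("only on the 5100:", sorted(only_op))          # ['cvpn', 'identityServer']
print("only in the example:", sorted(only_example))  # ['content_awareness', 'mon']
```

The two lists share fw, vpn, urlf, av, appi, ips and anti_bot, so the blade sets overlap heavily.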

 

Also, a silly thing: can you run 'fw ctl debug 0' to make sure no debugging is running?

Timothy_Hall
Champion

Sensors are good, the box is just overloaded.  My statement concerning 200Mbps being about right as the ultimate capacity of a 5100 with the given blades is just my opinion independent of what the data sheet says.  The fact that disabling IPS didn't seem to have any effect would appear to confirm that the box is just way oversubscribed.

Good suggestion from @genisis__ to make sure debugging is disabled.

genisis__
Leader

What version of Check Point are you running, and with what JHFA?

RS_Daniel
Advisor

Hello,

Thank you @Timothy_Hall and @genisis__ for your suggestions. I have checked that debugging is disabled on the box. I agree the box is just overloaded; we are running cpsizeme to recommend a hardware upgrade to the customer.

However, the fact that the numbers are so different from what we find in the datasheets is not great. As partners, we use datasheets to size new deployments, and of course we apply a 50% margin, but in this case the difference is 1 to 5!
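The size of that gap can be written out explicitly. A minimal sketch using only the figures quoted in this thread (1 Gbps datasheet figure, a 50% partner margin, roughly 200 Mbps observed at around 100% CPU); the function name is hypothetical.

```python
from __future__ import annotations


def sizing_gap(datasheet_mbps: float, margin: float, observed_mbps: float) -> dict[str, float]:
    """Compare observed throughput with the raw and margin-derated datasheet figures.

    `margin` is the fraction knocked off the datasheet number (0.5 = 50%).
    """
    derated_mbps = datasheet_mbps * (1 - margin)
    return {
        "derated_mbps": derated_mbps,
        "observed_vs_datasheet": observed_mbps / datasheet_mbps,
        "observed_vs_derated": observed_mbps / derated_mbps,
    }


gap = sizing_gap(datasheet_mbps=1000, margin=0.5, observed_mbps=200)
print(gap)  # {'derated_mbps': 500.0, 'observed_vs_datasheet': 0.2, 'observed_vs_derated': 0.4}
```

Even after the 50% margin, the box delivered only 40% of the planned figure while saturated; covering this deployment would have needed a margin of about 80% (1 - 0.2).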

cpview.PNG

Maybe the datasheets should specify some additional conditions to take into consideration when sizing. Thank you all again for your help and comments.

Regards

genisis__
Leader

This is my point as well. I've got to the point where I now insist that Check Point set up a real-life scenario and confirm the figures, or suggest the correct model to use and back that commercially.

We all know to take the figures with a pinch of salt and assume 50% less, but realistically, why? Check Point should publish real-world figures based on a typical scenario, as I have suggested in this thread.

The impact (and this is what I have seen) is that customers move to another vendor rather than stick with Check Point, purely because they lose that trust.

I hope Check Point read this thread, take the lessons learned and do something positive with it.

 
