Muazzam
Contributor

High CPU utilization on one gateway core.

Environment:
Management: R80.20 T47
Gateway clusters: R77.30 T216
Hardware: 13800

There are two 10G interfaces. The CPU tied to the ingress interface spikes to 100% during busy times. Typical bandwidth (from cpview) is around 2 Gbps, and concurrent connections during peak times are about 500K. Over 80% of the processing is done by SecureXL (SXL). fw ctl multik stat shows an equal distribution of load among all workers. MQ is currently not enabled.
The egress interface shows the same amount of traffic, but I hardly see spikes there; it typically stays at 60% or below during peak.

Total of 20 CPUs with a 14/6 split. The workers never show high load: they normally stay around 20% (while the ingress CPU is spiking to 100%) and around 5% otherwise. This looks like a typical case where the dispatcher is loaded but the workers are not. One recommendation is to move 2 workers to SND and enable MQ. That makes sense to me, but I have another, similar cluster with much less load (bandwidth around 1 Gbps and 200K concurrent connections) that shows the same symptoms.

My questions are: Why does the CPU tied to the egress interface not show high load?
What else can I check to determine what is actually causing the load on that CPU?
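A quick way to gather the numbers above in one pass: the command names below are the commonly documented CoreXL/SecureXL checks (double-check them against your version), and the wrapper simply skips any tool that isn't present on the box.

```shell
#!/bin/sh
# Convenience wrapper: run each check if the tool exists, otherwise note it.
# Command names are the commonly documented ones -- verify on your own take.
run() { command -v "${1%% *}" >/dev/null 2>&1 && $1 || echo "skipped: $1"; }

run "fwaccel stats -s"        # accelerated vs. F2F packet percentages
run "fw ctl multik stat"      # per-worker connection and CPU distribution
run "fw ctl affinity -l -r"   # which cores own which interfaces and workers
run "netstat -ni"             # RX-DRP counters on the busy interfaces
run "cpstat os -f multi_cpu"  # per-core utilization snapshot
```

On a non-gateway box every Check Point command is reported as skipped, so the script is safe to keep in a home directory and rerun during peak.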

4 Replies
Maarten_Sjouw
Champion
This will be because most of the sessions are initiated from the side of the interface where you see the peak in CPU usage. When you enable cpmq on that interface, the load should be divided over more cores.
Regards, Maarten
Timothy_Hall
Champion

When you say a 14/6 split, do you mean 14 workers and 6 SNDs?  Normally the number of SNDs comes first (at least that's how I write it, so your split is 6/14).  When you say 80% of traffic is handled by SecureXL, I assume you mean 80%+ is displayed on the "Accelerated pkts" (not "Accelerated conns") line in the output of fwaccel stats -s?

Even if an SND core occasionally spikes, that does not necessarily mean you need to do anything.  However, if you run netstat -ni and the RX-DRP rate for one or both busy interfaces is >0.1%, then you most certainly do need to do something about it.  If you have only two busy interfaces, the 6 SNDs you have now are plenty; you just need to enable Multi-Queue on those interfaces so the traffic load from your busy interfaces can spread across all 6 SNDs instead of being handled by only two of them (one per interface).
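That 0.1% rule of thumb is easy to script. Here is a rough sketch against made-up counters shaped like `netstat -ni` output (the interface names and numbers are hypothetical, and the column order can differ between versions, so adjust the awk field numbers if needed):

```shell
#!/bin/sh
# Hypothetical counters shaped like `netstat -ni` output; on a live gateway
# you would pipe the real command instead of this sample.
sample='Iface   MTU  RX-OK      RX-ERR RX-DRP  RX-OVR TX-OK      TX-ERR TX-DRP TX-OVR
eth1-01 1500 2000000000 0      3000000 0      900000000  0      0      0
eth1-02 1500 950000000  0      200     0      1900000000 0      0      0'

# RX-DRP rate = RX-DRP / RX-OK, flagged when it crosses the 0.1% threshold
echo "$sample" | awk 'NR > 1 {
    rate = ($5 / $3) * 100
    printf "%s RX-DRP: %.4f%% %s\n", $1, rate, (rate > 0.1 ? "<-- investigate" : "(ok)")
}'
```

With the sample numbers above, eth1-01 prints `0.1500% <-- investigate` while eth1-02 prints `0.0000% (ok)`.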

When a high percentage of traffic is accelerated, most of the intensive inspection processing happens on the SND core handling the bulk of the inbound traffic based on your traffic patterns.  If you have SMT/Hyperthreading enabled (/sbin/cpuinfo) with such a large amount of fully-accelerated traffic, you may want to consider disabling it and adjusting your split to 4/6 (but try enabling MQ first before thinking about disabling SMT).  SMT/Hyperthreading actually hurts performance if a large amount of traffic is fully accelerated due to contention between separate SND/IRQ threads fighting each other for the same physical core under high load.
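If /sbin/cpuinfo isn't handy, a generic-Linux approximation of the same check (logical CPUs vs. unique physical cores from /proc/cpuinfo) looks like this; the /proc/cpuinfo layout varies by platform, so treat it as a sketch rather than the gateway's own tool:

```shell
#!/bin/sh
# Count logical CPUs and unique (physical id, core id) pairs; if there are
# more logical CPUs than physical cores, SMT/Hyperthreading is likely ON.
logical=$(grep -c '^processor' /proc/cpuinfo)
physical=$(awk -F: '/^physical id/ {p=$2} /^core id/ {print p "-" $2}' /proc/cpuinfo | sort -u | wc -l)

if [ "$logical" -gt "$physicalal:-0}" ] 2>/dev/null || [ "$logical" -gt "$physical" ] && [ "$physical" -gt 0 ]; then
    smt="likely ON"
else
    smt="likely OFF (or not reported)"
fi
echo "logical CPUs: $logical, physical cores: $physical, SMT: $smt"
```

On platforms that don't report `physical id`/`core id` (some virtualized guests), the script falls back to "not reported" rather than guessing.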

 

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Muazzam
Contributor

Thank you Tim.

FYI: I ran the "Super Seven" commands on the gateway.

Yes, it is a 6/14 split. When I ran "fwaccel stats -s", the Accelerated pkts/Total pkts line showed 83%.

SMT is disabled (not ON).

Almost no drops on "netstat -ni".

Timothy_Hall
Champion
(Accepted Solution)

Sounds like your gateway is handling the load just fine, wouldn't hurt to turn on MQ for your busy interfaces though to provide some extra headroom.

 

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
