CPU at 100%

The setup is brand new and runs R80.10 in an HA cluster (per state VS) with VSX. Only one VS is running so far, for the simple task of browsing the Internets (to access funny cat videos, what else could be worth it?).

The cluster is composed of two 15600 appliances. The VS has 10 cores allocated.

We are talking about 3500 users.

A couple of times this week, we experienced high CPU usage on at least one of the cores. High enough to make the IPS go into bypass for about ten minutes each time.

The effect was that a lot of users were unable to browse the Internet (and so could not access the aforementioned funny cat videos, which made people sad). Their requests timed out.

TAC and PS haven't got anything so far.

I was wondering if someone here had any hint on where to look for clues?

7 Replies

It would be helpful if you could provide a bit more information.

As to IPS, do you, perchance, have all protections enabled in "Prevent" mode?

In the Virtual System Cluster Properties, the IPS Activation Mode is at "Detect only"

Keep in mind that detect will use more CPU than prevent. When you prevent something, that's the end of it, but with detect the traffic will just pass through all valid signatures.

Next to that, correct me if I'm wrong, but I understood that Dynamic dispatching should be available in R80.10 VSX as well?

Regards, Maarten

As Vladimir wrote, we have more questions than answers. cpview can help you determine the heavy connections. Which process takes 100%? What about SecureXL optimization?


And what about the CoreXL configuration?

fw ctl multik stat

...Global stats?

fw ctl pstat

Most of the steps to follow are described in sk98348: Best Practices - Security Gateway Performance
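Since the spikes only last about ten minutes, it may help to capture the output of the commands above at the moment a core is actually pegged. Below is a rough sketch of one way to do that, not a tested Check Point procedure. It assumes an expert-mode shell on the gateway with the usual Linux /proc/stat layout, and that it is run in the context of the relevant Virtual System. The function names busy_cores and capture_if_hot, the 90% threshold, and the 5-second interval are all made up for illustration.

```shell
# busy_cores BEFORE_FILE AFTER_FILE THRESHOLD
# Compares two snapshots of /proc/stat and prints "cpuN PCT" for every
# core whose busy percentage between the snapshots is >= THRESHOLD.
busy_cores() {
    awk -v limit="$3" '
        # Per-core lines only (cpu0, cpu1, ...), not the aggregate "cpu" line.
        $1 ~ /^cpu[0-9]+$/ {
            total = 0
            # Fields 2..9: user nice system idle iowait irq softirq steal.
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            idle = $5 + $6                    # idle + iowait
            if (FNR == NR) {                  # first file: store the baseline
                t0[$1] = total; i0[$1] = idle
                next
            }
            dt = total - t0[$1]; di = idle - i0[$1]
            if (dt > 0) {
                pct = int(100 * (dt - di) / dt)
                if (pct >= limit) print $1, pct
            }
        }
    ' "$1" "$2"
}

# capture_if_hot [THRESHOLD] [INTERVAL_SECONDS]
# Samples /proc/stat twice and, if any core is hot, prints a timestamp
# followed by the output of the two commands mentioned in this thread.
capture_if_hot() {
    local before after hot
    before=$(mktemp); after=$(mktemp)
    cat /proc/stat > "$before"
    sleep "${2:-5}"
    cat /proc/stat > "$after"
    hot=$(busy_cores "$before" "$after" "${1:-90}")
    rm -f "$before" "$after"
    [ -z "$hot" ] && return 0
    echo "$(date '+%F %T') hot cores: $hot"
    # Skip quietly on machines where the fw command is not present.
    if command -v fw >/dev/null 2>&1; then
        fw ctl multik stat
        fw ctl pstat
    fi
}
```

Calling capture_if_hot in a loop and appending its output to a file would give a small timeline to hand to TAC. The per-core numbers only show which core is busy, not why, so cpview is still needed to find the heavy connections.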

So far, it seems like sk61143 was the solution.

We have been having the same problem for months. R80.10 running on a cluster of 15600 appliances with VSX. IPS is not used, and traffic is mostly around 500 Mbps. From time to time one of the CPU cores hits 100%; clients mostly experience slowness and sometimes timeouts.

We are planning to increase the number of cores allocated to the VS from 8 to 10 this week, but we don't really know if it will fix the problem.

TAC hasn't been able to find a real solution other than bypassing some of the most used ports.

Has anyone experienced the same issue and found the root cause of the problem?

