Hi,
We've been having intermittent high CPU usage spikes on a 5100 cluster at one of our offices for a few months now. These spikes typically happen at the weekend (this office happens to be used mostly at weekends), and each event generally lasts only a few minutes.
When such issues occur, we typically notice the following:
- High CPU usage is visible in CPView on both cores (see attached screenshot).
- Network protocols in general seem to be affected. For example, ISP redundancy is disrupted (to the extent that we've had to disable it, as it was causing a cascade of further issues) and BGP sessions drop.
- Nothing in the logs leading up to an event points to a problem. I've checked /var/log/messages, dmesg, routed.log and routed_messages. The last of these shows the dropped BGP sessions and ISP redundancy flaps, but these are effects of the high CPU, not its cause.
- Whilst the issues generally happen on weekend afternoons, they don't occur at an exact or repeatable time, so we cannot tie them to any specific process starting.
- Before and after such events, CPU usage generally sits between 40% and 60%, so there's no sign of an impending issue.
Could you help me troubleshoot this further? I'm at a bit of a loss as to what to look at next.
Thanks,
Joe