Sure,
spike detective is reporting to /var/log/messages,
we got the following messages over and over again, especially during peak times with a lot of traffic passing through, note the fwk1_hp (according to TAC this is the high priority queue)
Jul 20 15:28:51 2022 <GWNAME> spike_detective: spike info: type: thread, thread id: 3383, thread name: fwk1_hp, start time: 20/07/22 15:28:26, spike duration (sec): 24, initial cpu usage: 100, average cpu usage: 100, perf taken: 0
Jul 20 15:28:57 2022 <GWNAME> spike_detective: spike info: type: cpu, cpu core: 5, top consumer: fwk1_hp, start time: 20/07/22 15:28:26, spike duration (sec): 30, initial cpu usage: 84, average cpu usage: 79, perf taken: 1
Jul 20 15:29:03 2022 <GWNAME> spike_detective: spike info: type: cpu, cpu core: 21, top consumer: fwk1_hp, start time: 20/07/22 15:28:56, spike duration (sec): 6, initial cpu usage: 85, average cpu usage: 85, perf taken: 0
Jul 20 15:29:03 2022 <GWNAME> spike_detective: spike info: type: thread, thread id: 3383, thread name: fwk1_hp, start time: 20/07/22 15:28:56, spike duration (sec): 6, initial cpu usage: 100, average cpu usage: 100, perf taken: 0
For the virtual router, which is normally using 5-10% cpu, this is not normal
The rest was good old detective work with top to identify the VS causing the issue (virtual router in our case).
In the VS $FWDIR/log/fwk.elg to verify that the cluster status is caused by missing CCP packets, messages like this:
State change: ACTIVE -> ACTIVE(!) | Reason: Interface Sync is down (Cluster Control Protocol packets are not received)
even though interfaces are up and you can ping the neighbour and we had enough SNDs to handle all the traffic and all in all the rest of the VS were consuming cpu in a low level, which showed most of the cores idling.
The TAC engineer used one additional command in the remote session, which i failed to note, that showed the various CPU hits for the kernel modules in percent which also displayed the fwk1_hp on top and following all that, he suggested to turn of priority queue.
To deactivate priority queue :
- fw ctl multik prioq
- Select 0 to disable the feature.
- Reboot the device.
BR,
Markus