Hello Val,
The biggest issues are with the 5000 series. Yesterday we had to fail over a 5200 firewall to the standby member and then reboot it. I also have an SR open for a 5600 series.
For no apparent reason, CPU usage started climbing from 7-8% to 80%, at which point we failed over so as not to impact the availability of the services.
This resolution is from a TAC case opened on 15/11:
"the issue stemmed from the overutilization of SNDs and out-of-order packets. Upon further investigation within SND, the identified thread function causing the issue is:
# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  ...............................................
#   90.95%  snd_c    [kernel.kallsyms]  [k] native_queued_spin_lock_slowpath
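For anyone who wants to check their own perf output for the same pattern, here is a minimal Python sketch that pulls the rows out of a perf-style report and lists the symbols above a given overhead. The function names and the 50% threshold are my own assumptions for illustration, not anything TAC provided, and the only sample row is the one quoted above. As far as I know, native_queued_spin_lock_slowpath is the kernel path taken when a spinlock is contended, so a high share there suggests threads waiting on a lock rather than doing useful work.

```python
import re

# Sample report: the single data row is copied from the TAC output quoted above.
PERF_REPORT = """\
# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  ................
# 90.95% snd_c [kernel.kallsyms] [k] native_queued_spin_lock_slowpath
"""

# One data row: overhead %, command, shared object, [mode], symbol.
# A leading '#' is tolerated because the pasted row carries one.
ROW = re.compile(
    r"^#?\s*(?P<overhead>\d+(?:\.\d+)?)%\s+"
    r"(?P<command>\S+)\s+"
    r"(?P<shared_object>\S+)\s+"
    r"\[(?P<mode>.)\]\s+"
    r"(?P<symbol>\S+)"
)


def parse_perf_rows(text):
    """Return one dict per data row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            row = match.groupdict()
            row["overhead"] = float(row["overhead"])
            rows.append(row)
    return rows


def hot_symbols(text, threshold=50.0):
    """Rows whose overhead is at or above `threshold` percent (hypothetical cutoff)."""
    return [row for row in parse_perf_rows(text) if row["overhead"] >= threshold]


if __name__ == "__main__":
    for row in hot_symbols(PERF_REPORT):
        print(f'{row["overhead"]:.2f}% {row["command"]} {row["symbol"]}')
```

Running it on a report captured before and after a config change makes it easy to see whether the spinlock share actually drops.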
The TAC engineer recommended a few config changes and tweaks:
1) Enable the drop template on the cluster
2) Enable Dynamic Balancing on both members. - Not applicable to the 5200/5400, as they both have only 2 cores. For the 5600 series we are going to assign an additional core to SND (2 for SND and 2 for FW).
3) Prioritize the placement of the most frequently used rules at the top of the policy list. - I don't think this is an issue, as this FW's CPU load is usually between 7% and 15%.
4) If feasible, expedite port 1024 in the fast path. - This is where I believe the TAC engineer was wrong: cpview showed 24% CPU utilization for TCP:1024, and he took that to mean high traffic on this port. As I understand it, TCP:1024 in cpview means usage on ports above TCP 1024.
Basically, we still don't know what triggers this strange behavior on the 5000 series on R81.20. It was rock stable on R80.40.
"root cause of the issue as SND and out-of-order packets" Why? What is causing this behavior on R81.20? Are the appliances too old for R81.20? I know they are EOS in Dec 2025.
Thanks!
@the_rock - that's good to know!