Item 1: I used a second logging server besides the management server to have a backup of the logs. In CPVIEW I saw that it was the top connection on one CPU instance. I removed the second logging server from the configuration yesterday. SmartView Monitor shows the positive impact on the average CPU utilization: we have at least 10% less average CPU usage.

Item 3: Please provide the output of fwaccel stat. Packet and connection acceleration look good, but with NAT templates disabled you will get a lot of F2F and worker overhead if you have a high new-connection rate; see the next item. I would caution against enabling NAT templates unless the latest Jumbo HFA has been installed on your R80.10 gateway; early implementations of NAT Templates caused SecureXL problems when enabled.

[Expert@gw2:0]# fwaccel stat
Accelerator Status : on
Accept Templates   : disabled by Firewall
                     Layer Network disables template offloads from rule #11
                     Throughput acceleration still enabled.
Drop Templates     : disabled
NAT Templates      : disabled by user
NMR Templates      : enabled
NMT Templates      : enabled
Accelerator Features  : Accounting, NAT, Cryptography, Routing, HasClock, Templates,
                        Synchronous, IdleDetection, Sequencing, TcpStateDetect, AutoExpire,
                        DelayedNotif, TcpStateDetectV2, CPLS, McastRouting, WireMode,
                        DropTemplates, NatTemplates, Streaming, MultiFW, AntiSpoofing, Nac,
                        ViolationStats, AsychronicNotif, ERDOS, McastRoutingV2, NMR, NMT,
                        NAT64, GTPAcceleration, SCTPAcceleration
Cryptography Features : Tunnel, UDPEncapsulation, MD5, SHA1, NULL, 3DES, DES, CAST,
                        CAST-40, AES-128, AES-256, ESP, LinkSelection, DynamicVPN,
                        NatTraversal, EncRouting, AES-XCBC, SHA256

Item 5: Your single SND core seems to be doing OK, so there is probably not much RX-DRP, but your bandwidth numbers under load are suspiciously close to exactly 1 Gbit, as Heiko noticed. If you are pushing a 1 Gbps interface that close to its theoretical limit, you are probably racking up overruns (RX-OVR) like crazy. Please provide the full output of netstat -ni for analysis.
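If you want to see whether drops or overruns actually accumulate during the high-CPU windows rather than just reading the lifetime totals, a simple sampling loop is enough. This is only a rough sketch using standard tools (netstat, awk, sleep); the 60-second interval and the eth* interface match are arbitrary examples, and the field positions assume the column order shown in the output below:

# Rough sketch: sample RX-DRP and RX-OVR per interface once a minute so the
# counters can be correlated with the high-CPU periods.
while true; do
    date
    netstat -ni | awk 'NR > 2 && $1 ~ /^eth/ {print $1, "RX-DRP=" $6, "RX-OVR=" $7}'
    echo
    sleep 60
done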
[Expert@gw2:0]# netstat -ni
Kernel Interface table
Iface        MTU Met       RX-OK RX-ERR RX-DRP RX-OVR       TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0        1500   0    17095121      0      0      0    52469826      0      0      0 BMRU
eth1        1500   0     4646226      0      0      0    56557289      0      0      0 BMRU
eth2        1500   0  1527129891      0      0      0  1525915667      0      0      0 BMRU
eth3        1500   0     1075040      0      0      0     1337760      0      0      0 BMRU
eth8        1500   0  1928414582      0      0      0  2558669828      0      0      0 BMRU
eth8.6      1500   0    23920437      0      0      0    18883083      0      0      0 BMRU
eth8.8      1500   0        2005      0      0      0        1985      0      0      0 BMRU
eth8.9      1500   0      368643      0      0      0      513479      0      0      0 BMRU
eth8.10     1500   0     4344709      0      0      0     2264906      0      0      0 BMRU
eth8.13     1500   0    10152927      0      0      0     9233849      0      0      0 BMRU
eth8.20     1500   0    22986074      0      0      0    23206182      0      0      0 BMRU
eth8.22     1500   0     1138885      0      0      0     1135808      0      0      0 BMRU
eth8.25     1500   0    57127017      0      0      0    41214797      0      0      0 BMRU
eth8.33     1500   0        2005      0      0      0        1985      0      0      0 BMRU
eth8.45     1500   0   104151782      0      0      0   124167249      0      0      0 BMRU
eth8.48     1500   0    11540642      0      0      0    11136870      0      0      0 BMRU
eth8.225    1500   0        2005      0      0      0        1985      0      0      0 BMRU
eth8.240    1500   0        2005      0      0      0        1985      0      0      0 BMRU
eth8.246    1500   0      681622      0      0      0      512699      0      0      0 BMRU
eth8.248    1500   0      144590      0      0      0       74329      0      0      0 BMRU
eth8.1025   1500   0  1691849598      0      0      0  2326319727      0      0      0 BMRU
eth9        1500   0  7594968671      0  40368      0  6335594446      0      0      0 BMRU
eth9.3      1500   0  2153262412      0      0      0  1855546766      0      0      0 BMRU
eth9.222    1500   0        2005      0      0      0        1985      0      0      0 BMRU
eth9.228    1500   0   973876542      0      0      0   679167951      0      0      0 BMRU
eth9.236    1500   0  4466434884      0      0      0  3799226077      0      0      0 BMRU
eth9.600    1500   0      565502      0      0      0      802153      0      0      0 BMRU
eth9.616    1500   0      506469      0      0      0      528396      0      0      0 BMRU
eth9.632    1500   0      321252      0      0      0      321352      0      0      0 BMRU
lo         16436   0       50074      0      0      0       50074      0      0      0 LRU

Item 6: You're not using SHA-384 for any of your VPNs, are you? It looks like most of your VPN traffic is fully accelerated, but using SHA-384 will cause VPN traffic to go F2F. We are still using SHA-1 for our VPN, so this shouldn't be the issue.

Item 7: Even under load it looks like your firewall workers are pretty evenly balanced by the Dynamic Dispatcher, so I doubt it is an elephant flow issue, but it is still worth investigating as Heiko mentioned. Regarding Heiko's post, I activated the monitoring and deactivated the additional logging to the second server.

Item 8: High load can be caused by an overloaded sync network in a cluster; please provide the output of fw ctl pstat. You might need to do selective synchronization for the protocols mentioned in Item 1, as the combination of suddenly high connection rates and having to state-sync them as well is a nasty double whammy on the workers.
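If you want to watch the sync health over time instead of taking a single snapshot, a small loop that pulls only the Sync block out of fw ctl pstat is enough. This is just a rough sketch; the "Sync:" match string and the -A 12 line count are guesses based on the output format shown below, so adjust them if the section is longer on your build:

# Rough sketch: print only the Sync section of fw ctl pstat once a minute,
# so retransmits and "dropped by net" can be correlated with the CPU spikes.
while true; do
    date
    fw ctl pstat | grep -A 12 "Sync:"
    echo
    sleep 60
done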
[Expert@gw2:0]# fw ctl pstat

System Capacity Summary:
  Memory used: 8% (956 MB out of 11908 MB) - below watermark
  Concurrent Connections: 26321 (Unlimited)
  Aggressive Aging is enabled, not active

Hash kernel memory (hmem) statistics:
  Total memory allocated: 1245708288 bytes in 304128 (4096 bytes) blocks using 1 pool
  Total memory bytes  used:        0   unused: 1245708288 (100.00%)   peak: 158797568
  Total memory blocks used:        0   unused:     304128 (100%)      peak:     41269
  Allocations: 2201133546 alloc, 0 failed alloc, 2200275358 free

System kernel memory (smem) statistics:
  Total memory bytes  used: 1906785160   peak: 2020497636
  Total memory bytes wasted: 3349445
    Blocking  memory  bytes used:    5669748   peak:    6996596
    Non-Blocking memory bytes used: 1901115412   peak: 2013501040
  Allocations: 58204 alloc, 0 failed alloc, 55555 free, 0 failed free
  vmalloc bytes used: 1895716872 expensive: no

Kernel memory (kmem) statistics:
  Total memory bytes  used: 757310984   peak: 923289992
  Allocations: 2201189988 alloc, 0 failed alloc
               2200330487 free, 0 failed free
  External Allocations: 40192 for packets, 100097979 for SXL

Cookies:
  1371627468 total, 0 alloc, 0 free,
  488463 dup, 2953172806 get, 24166610 put,
  1383957773 len, 7898799 cached len, 0 chain alloc,
  0 chain free

Connections:
  48945945 total, 28304124 TCP, 19894499 UDP, 738239 ICMP,
  9083 other, 1572699 anticipated, 9081 recovered, 26321 concurrent,
  102039 peak concurrent

Fragments:
  10397559 fragments, 5181881 packets, 954 expired, 0 short,
  0 large, 3 duplicates, 0 failures

NAT:
  10938250/0 forw, 5598392/0 bckw, 14600550 tcpudp,
  1935721 icmp, 2721881-880295 alloc

Sync:
  Version: new
  Status: Able to Send/Receive sync packets
  Sync packets sent:
   total : 55843064, retransmitted : 45, retrans reqs : 438, acks : 7034
  Sync packets received:
   total : 404227, were queued : 1182, dropped by net : 1729
   retrans reqs : 15, received 23774 acks
   retrans reqs for illegal seq : 0
   dropped updates as a result of sync overload: 915
  Callback statistics: handled 23657 cb, average delay : 1, max delay : 222

Item 9: Doubtful you are having memory issues, but please provide the output of free -m anyway.

[Expert@gw2:0]# free -m
             total       used       free     shared    buffers     cached
Mem:         15877       4825      11051          0        258       1446
-/+ buffers/cache:       3121      12756
Swap:        18441          0      18441

Item 10: You could be getting flooded with fragmented packets during a high-CPU event (frags always go F2F in your version); please provide the output of fwaccel stats -p.

[Expert@gw2:0]# fwaccel stats -p
F2F packets:
--------------
Violation                     Packets
--------------------  ---------------
pkt is a fragment             3884239
ICMP miss conn                4399923
TCP-other miss conn           4580451
other miss conn                181668
ICMP conn is F2Fed            4805131
UDP conn is F2Fed              637444
uni-directional viol               27
TCP state viol                4922725
bridge, src=dst                     0
sanity checks failed                0
fwd to non-pivot                    0
cluster message                     0
PXL returned F2F                 5573
chain forwarding                    0
Tmpl no-match time                  0
route change                        0
outbound zone change                0
pkt has IP options              64750
TCP-SYN miss conn           605516731
UDP miss conn                76945540
VPN returned F2F                 1884
TCP conn is F2Fed           266741774
other conn is F2Fed                 0
possible spoof viol                 0
out if not def/accl                 9
routing decision err           289323
temp conn expired                 921
broadcast/multicast                 0
partial conn                        0
cluster forward                     0
Tmpl no-match range                 0
general reason                      0
inbound zone change                 0

During the period with the high CPU load, the "TCP-other miss conn" counter rose strongly.
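To confirm which F2F reasons actually grow during such a window, rather than reading the lifetime totals, one simple option is to snapshot fwaccel stats -p before and after the event and diff the two files. A rough sketch, where the file names and the one-hour wait are arbitrary examples:

# Rough sketch: capture two snapshots around a suspected high-CPU window and compare.
fwaccel stats -p > /var/log/f2f_before.txt
sleep 3600                                   # wait through the suspected window
fwaccel stats -p > /var/log/f2f_after.txt
diff /var/log/f2f_before.txt /var/log/f2f_after.txt

The CPVIEW history below is the same kind of before/after comparison.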
#Start | CPVIEW.Advanced.SecureXL.F2F-Reasons | [25Mar2020 14:17:00]
|-----------------------------------------------------------------------
| F2F Reasons
|
| Reason                       #Packets        % out of Total
| TCP-SYN miss conn            360,182,668     57%
| TCP-other miss conn          4,339,478       0%
| TCP conn is F2Fed            183,708,905     30%

#End | CPVIEW.Advanced.SecureXL.F2F-Reasons | [26Mar2020 13:18:00]
|-----------------------------------------------------------------------
| F2F Reasons
|
| Reason                       #Packets        % out of Total
| TCP-SYN miss conn            720,184,973     20%
| TCP-other miss conn          2,366,173,695   67%
| TCP conn is F2Fed            191,575,101     30%
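A quick back-of-the-envelope on the two snapshots above, assuming the counters are cumulative and were not reset between 14:17 on 25 March and 13:18 on 26 March (a window of 23 hours and 1 minute, i.e. 82,860 seconds):

# Average F2F packet rates over the 82860-second window between the snapshots,
# assuming cumulative counters that were not reset in between.
echo "TCP-other miss conn: $(( (2366173695 - 4339478) / 82860 )) pkt/s"
echo "TCP-SYN miss conn:   $(( (720184973 - 360182668) / 82860 )) pkt/s"

That works out to roughly 28,500 packets per second missing the connection table as "TCP-other" and going F2F, plus about 4,300 SYNs per second that cannot be templated, sustained across the whole window, which lines up with the extra load landing on the firewall workers.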