<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Persistent fwFullyUtilizedDrops on 26000 appliances (R81.20) – Is disabling SMT a valid next step? in Firewall and Security Management</title>
    <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Persistent-fwFullyUtilizedDrops-on-26000-appliances-R81-20-Is/m-p/274763#M104680</link>
    <description>&lt;P class=""&gt;After extensive troubleshooting, we finally identified the root cause — and it turned out to be a monitoring configuration error on our end.&lt;/P&gt;&lt;P class=""&gt;We had assigned OID &lt;CODE class=""&gt;iso.3.6.1.4.1.2620.1.1.25.13.0&lt;/CODE&gt; (&lt;CODE class=""&gt;fwLoggedTotal&lt;/CODE&gt;) to our Zabbix item instead of &lt;CODE class=""&gt;iso.3.6.1.4.1.2620.1.1.25.26.0&lt;/CODE&gt; (&lt;CODE class=""&gt;fwFullyUtilizedDrops&lt;/CODE&gt;). We were never actually observing &lt;CODE class=""&gt;fwFullyUtilizedDrops&lt;/CODE&gt; at all — we were monitoring the total number of logged connections.&lt;/P&gt;&lt;P class=""&gt;In hindsight, the most obvious clue was right there from the beginning: the drops were completely invisible in &lt;CODE class=""&gt;zdebug drop&lt;/CODE&gt; and in SmartConsole logs. That inconsistency should have immediately pointed to a measurement problem rather than an actual performance issue — but none of us caught it.&lt;/P&gt;&lt;P class=""&gt;We want to thank Timothy Hall and Bob Zimmerman for their thorough and technically sound analysis of our CoreXL/SMT configuration. Even though it was based on a false premise, the findings regarding SND/Worker core overlap on our dual-socket setup are real and worth addressing independently.&lt;/P&gt;&lt;P class=""&gt;Lesson learned: always validate your data source before analyzing the data.&lt;/P&gt;&lt;P class=""&gt;Closing this thread as resolved.&lt;/P&gt;</description>
    <pubDate>Thu, 02 Apr 2026 13:35:09 GMT</pubDate>
    <dc:creator>CSSBE_Avenger</dc:creator>
    <dc:date>2026-04-02T13:35:09Z</dc:date>
    <item>
      <title>Persistent fwFullyUtilizedDrops on 26000 appliances (R81.20) – Is disabling SMT a valid next step?</title>
      <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Persistent-fwFullyUtilizedDrops-on-26000-appliances-R81-20-Is/m-p/274184#M104451</link>
      <description>&lt;P&gt;Hello CheckMates,&lt;/P&gt;&lt;P&gt;We are troubleshooting a persistent issue with &lt;STRONG&gt;fwFullyUtilizedDrops&lt;/STRONG&gt; on a cluster of two &lt;STRONG&gt;Check Point 26000 appliances&lt;/STRONG&gt; running &lt;STRONG&gt;R81.20 (Take 120)&lt;/STRONG&gt; in Kernel Mode (&lt;STRONG&gt;KPPAK&lt;/STRONG&gt;).&lt;/P&gt;&lt;P&gt;Our environment handles approximately &lt;STRONG&gt;350,000 packets/sec&lt;/STRONG&gt; (as reported by CPView Overview) with about &lt;STRONG&gt;200,000 active sessions&lt;/STRONG&gt;. We are observing constant &lt;STRONG&gt;fwFullyUtilizedDrops&lt;/STRONG&gt; (around 2,000 to 2,500 drops/sec during peak hours), which represents roughly 0.7% packet loss. These drops are invisible in &lt;CODE&gt;zdebug drop&lt;/CODE&gt; or&amp;nbsp;&lt;CODE&gt;zdebug + drop&lt;/CODE&gt;.&lt;/P&gt;&lt;P&gt;Here is a summary of our optimization efforts so far:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;What worked:&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Disabling the QoS blade:&lt;/STRONG&gt; This resulted in a noticeable improvement, stabilizing the CPU load and halving the drops. However, the remaining 2,500/sec is still considered abnormally high.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;What did NOT work (no impact or negative impact on drops):&lt;/STRONG&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Enabling SecureXL Accept Templates:&lt;/STRONG&gt; We optimized our policy to ensure templates are "Enabled", but the drop rate remained unchanged.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Disabling all Threat Prevention blades:&lt;/STRONG&gt; We tested running &lt;CODE&gt;fw amw unload&lt;/CODE&gt; and &lt;CODE&gt;ips off&lt;/CODE&gt;. 
While CPU usage decreased, &lt;STRONG&gt;fwFullyUtilizedDrops remained exactly the same&lt;/STRONG&gt;, suggesting the bottleneck is not at the Worker processing level.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Reducing Multi-Queue queues:&lt;/STRONG&gt; We reduced the number of queues by half on our busiest interfaces to test the correlation. The drop rate did &lt;STRONG&gt;not&lt;/STRONG&gt; increase, suggesting the issue isn't a lack of NIC queues.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Total TCP/UDP bypass via fast_accel:&lt;/STRONG&gt; We attempted a total bypass for all TCP (6) and UDP (17) traffic. The hit count was unexpectedly low (~1,500 pps, monitored with &lt;CODE&gt;fw ctl fast_accel show_table&lt;/CODE&gt;), and surprisingly, the &lt;STRONG&gt;fwFullyUtilizedDrops actually increased&lt;/STRONG&gt; during the test.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;The SMT Hypothesis:&lt;/STRONG&gt; We are now considering &lt;STRONG&gt;disabling SMT (Hyperthreading)&lt;/STRONG&gt;. Our hypothesis is that at 350,000 pps, the SND instances are suffering from resource contention or cache thrashing on the physical cores, hindering their ability &lt;STRONG&gt;to empty&lt;/STRONG&gt; the buffers and inject packets into the CoreXL queues.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Questions:&lt;/STRONG&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;In a high-PPS/KPPAK environment, has anyone seen a positive impact on &lt;CODE&gt;fwFullyUtilizedDrops&lt;/CODE&gt; by disabling SMT?&lt;/LI&gt;&lt;LI&gt;Since the SMT toggle is missing from &lt;CODE&gt;cpconfig&lt;/CODE&gt; in R81.20 (as per sk172304), is the &lt;CODE&gt;fwboot bootconf set_core_override&lt;/CODE&gt; command a safe alternative to a BIOS/TAC intervention for a production test?&lt;/LI&gt;&lt;LI&gt;Could the increase in drops during the &lt;CODE&gt;fast_accel&lt;/CODE&gt; test confirm that the SND is at its absolute physical limit for processing interrupts?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;SPAN class=""&gt;We are including the output of the s7pac command at the time of writing, in case it is useful.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class=""&gt;Finally, we are planning an upgrade to &lt;/SPAN&gt;&lt;STRONG&gt;R82.10&lt;/STRONG&gt;&lt;SPAN class=""&gt; in the coming weeks to leverage the structural benefits of &lt;/SPAN&gt;&lt;STRONG&gt;Polling mode&lt;/STRONG&gt;&lt;SPAN class=""&gt;. However, we are proceeding with caution, as our environment includes a critical Site-to-Site VPN with a partner where coordinating troubleshooting would be extremely difficult in the event of a regression.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Thank you for your time and insights.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Mar 2026 16:44:02 GMT</pubDate>
      <guid>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Persistent-fwFullyUtilizedDrops-on-26000-appliances-R81-20-Is/m-p/274184#M104451</guid>
      <dc:creator>CSSBE_Avenger</dc:creator>
      <dc:date>2026-03-27T16:44:02Z</dc:date>
    </item>
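The loss percentage quoted in the post can be sanity-checked from the SNMP data. Since fwFullyUtilizedDrops is a cumulative counter, a per-second rate has to be derived from two successive polls. A minimal sketch, using the pps and drop figures from the post (the two poll values themselves are hypothetical illustrations):

```python
# Sketch: derive a drop rate and loss percentage from two polls of a
# cumulative SNMP counter such as fwFullyUtilizedDrops.
# The 350,000 pps figure is from the post; the poll values are hypothetical.

def drop_rate(count_t0: int, count_t1: int, interval_s: float) -> float:
    """Drops per second between two polls of a cumulative counter."""
    return (count_t1 - count_t0) / interval_s

def loss_percent(drops_per_s: float, pps: float) -> float:
    """Dropped packets as a percentage of total packets per second."""
    return 100.0 * drops_per_s / pps

# Hypothetical polls 60 s apart showing 150,000 new drops:
rate = drop_rate(1_000_000, 1_150_000, 60.0)
print(rate)                          # 2500.0 drops/sec
print(loss_percent(rate, 350_000))   # ~0.71% at 350k pps
```

At 2,500 drops/sec against 350,000 pps this works out to roughly 0.7% loss, slightly below the "roughly 1%" stated in the original post.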
    <item>
      <title>Re: Persistent fwFullyUtilizedDrops on 26000 appliances (R81.20) – Is disabling SMT a valid next step?</title>
      <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Persistent-fwFullyUtilizedDrops-on-26000-appliances-R81-20-Is/m-p/274194#M104455</link>
      <description>&lt;P&gt;Turning off SMT will probably fix the problem, but not for the reasons you think.&lt;/P&gt;
&lt;P&gt;Are you using Dynamic Split?&amp;nbsp; I don't think you are: if I'm understanding your architecture correctly, the way SMT divides the cores leaves you with 10 real cores trying to execute both SND and Worker functions.&amp;nbsp; The 26000 uses a&amp;nbsp;&lt;SPAN data-sheets-root="1"&gt;2x Intel Xeon Gold 6254 CPU @ 3.10GHz (18C, 36T).&amp;nbsp; So 2 sockets, but your split appears to be set for only one socket.&amp;nbsp; With SMT, socket 1 is cores 0-35 (18 real &amp;amp; 18 siblings), socket 2 is cores 36-71 (18 real &amp;amp; 18 siblings).&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-sheets-root="1"&gt;Your SNDs are cores 0-4 and cores 36-40, which would make sense if there were a single socket, as these would be sibling cores.&amp;nbsp; But you have two sockets, so cores 0-4 are siblings with cores 18-22 (all workers), and cores 36-40 are siblings with cores 54-58 (all workers).&amp;nbsp; So you have 10 of your 36 real cores trying to perform both SND and worker functions.&amp;nbsp; &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-sheets-root="1"&gt;It's easy to see how the CoreXL queues could overflow if the worker is constantly getting clobbered by SND functions. Since the SNDs run in the kernel for&amp;nbsp;KPPAK mode, the busier the SNDs get, the more the conflicting worker instances get kicked off the sibling core to make way for the kernel, which has ultimate scheduling priority over measly USFW worker processes.&amp;nbsp; This may also cause significant traffic between NUMA nodes, further impacting performance.&amp;nbsp; Doing fast_accel's would make this situation much worse on the 10 overlapping real cores, by making the conflicting SNDs even busier than they were before.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-sheets-root="1"&gt;Check my work here, please&amp;nbsp;&lt;a href="https://community.checkpoint.com/t5/user/viewprofilepage/user-id/27871"&gt;@Bob_Zimmerman&lt;/a&gt;.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Mar 2026 21:15:03 GMT</pubDate>
      <guid>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Persistent-fwFullyUtilizedDrops-on-26000-appliances-R81-20-Is/m-p/274194#M104455</guid>
      <dc:creator>Timothy_Hall</dc:creator>
      <dc:date>2026-03-25T21:15:03Z</dc:date>
    </item>
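The core/sibling overlap described in the reply above can be worked through mechanically. A minimal sketch, assuming the enumeration Timothy Hall describes (two 18-core sockets, socket 1 = logical cores 0-35, socket 2 = 36-71, sibling offset of 18 within each socket):

```python
# Sketch of the SMT sibling mapping described in the thread, under the
# assumption of two 18-core sockets enumerated as cores 0-35 and 36-71,
# with each real core's hyperthread sibling 18 positions later in its socket.

CORES_PER_SOCKET = 18
LOGICAL_PER_SOCKET = 2 * CORES_PER_SOCKET  # 36 with SMT

def sibling(core: int) -> int:
    """Return the SMT sibling of a logical core under this enumeration."""
    socket_base = (core // LOGICAL_PER_SOCKET) * LOGICAL_PER_SOCKET
    offset = core - socket_base
    return socket_base + (offset + CORES_PER_SOCKET) % LOGICAL_PER_SOCKET

# SND cores reported in the thread: 0-4 and 36-40.
snd_cores = list(range(0, 5)) + list(range(36, 41))
print(sorted(sibling(c) for c in snd_cores))
# [18, 19, 20, 21, 22, 54, 55, 56, 57, 58] -- all Worker cores,
# so 10 real cores carry both SND and Worker work.
```

This reproduces the reply's conclusion: the SNDs' siblings land on cores 18-22 and 54-58, which are assigned to Workers, rather than on other SND cores.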
    <item>
      <title>Re: Persistent fwFullyUtilizedDrops on 26000 appliances (R81.20) – Is disabling SMT a valid next step?</title>
      <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Persistent-fwFullyUtilizedDrops-on-26000-appliances-R81-20-Is/m-p/274198#M104457</link>
      <description>&lt;P&gt;Just checked one of my dual-socket boxes (a 16200, running R82 jumbo 60), and I see cores 0-11 on socket 0, 12-23 on socket 1, 24-35 on socket 0, and 36-47 on socket 1. Pretty sure the hyperthreads are 24-47. Still agreed, this suggests something is wrong with CoreXL and dynamic split.&lt;/P&gt;
&lt;P&gt;The output of 'cpstat os -f multi_cpu -o 1 -c 5' looks like the load is relatively well spread. A drop debug should show which instance dropped the traffic, or which instance the SND was trying to send the traffic to. Is it consistent, or do the drops implicate many instances?&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Edit&lt;/STRONG&gt;: Incidentally, the 16200 I checked is under light load, and has SNDs on cores 0, 1, 12, 13, 24, 25, 36, and 37. In other words, the first two cores of every set of cores. This pattern holds for all of my non-VSX firewalls with two sockets. On a dual-socket system running VSX, I see SNDs on 0, 1, 24, and 25. On an asymmetric single-socket system (a 9300, which uses an Intel i5-13400E), I see SNDs on 0 and 1 (core 1 seems to be the hyperthread of core 0), and none on the e-core complex (12-15).&lt;/P&gt;
&lt;P&gt;Are these 26000 units running VSX or MDPS?&lt;/P&gt;</description>
      <pubDate>Thu, 26 Mar 2026 18:36:14 GMT</pubDate>
      <guid>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Persistent-fwFullyUtilizedDrops-on-26000-appliances-R81-20-Is/m-p/274198#M104457</guid>
      <dc:creator>Bob_Zimmerman</dc:creator>
      <dc:date>2026-03-26T18:36:14Z</dc:date>
    </item>
    <item>
      <title>Re: Persistent fwFullyUtilizedDrops on 26000 appliances (R81.20) – Is disabling SMT a valid next step?</title>
      <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Persistent-fwFullyUtilizedDrops-on-26000-appliances-R81-20-Is/m-p/274273#M104478</link>
      <description>&lt;DIV class=""&gt;&lt;SPAN class=""&gt;Dynamic split (Dynamic Balancing) appears to be enabled, as the command &lt;/SPAN&gt;&lt;CODE class=""&gt;dynamic_balancing -p&lt;/CODE&gt;&lt;SPAN class=""&gt; reports the status as &lt;/SPAN&gt;&lt;STRONG&gt;'on'&lt;/STRONG&gt;&lt;SPAN class=""&gt;. However, there is clearly a configuration issue because our &lt;/SPAN&gt;&lt;CODE class=""&gt;$FWDIR/conf/dynamic_split.conf&lt;/CODE&gt;&lt;SPAN class=""&gt; file shows &lt;/SPAN&gt;&lt;STRONG&gt;AUTOMATION_MODE=0&lt;/STRONG&gt;&lt;SPAN class=""&gt;, which may indicate the mechanism is effectively in a static or 'frozen' state and unable to perform automatic adjustments (although not sure what AUTOMATION=0 really means)&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;&lt;a href="https://community.checkpoint.com/t5/user/viewprofilepage/user-id/27871"&gt;@Bob_Zimmerman&lt;/a&gt;&amp;nbsp;: To answer your question, our units are&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;not running VSX or MDPS&lt;/STRONG&gt;&lt;SPAN class=""&gt;; they are configured as standard Security Gateways.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;Regarding the drops, a significant part of the mystery is that these &lt;/SPAN&gt;&lt;STRONG&gt;fwFullyUtilizedDrops&lt;/STRONG&gt;&lt;SPAN class=""&gt; are not reflected when running &lt;/SPAN&gt;&lt;CODE class=""&gt;fw zdebug drop&lt;/CODE&gt;&lt;SPAN class=""&gt; or even &lt;/SPAN&gt;&lt;CODE class=""&gt;fw ctl zdebug + drop&lt;/CODE&gt;&lt;SPAN class=""&gt;. 
Although it is understood that these drops typically occur when the CoreXL input queue is full, they remain invisible in the debug output despite SNMP reporting a consistent rate of &lt;/SPAN&gt;&lt;STRONG&gt;2,000 to 2,500 drops/sec&lt;/STRONG&gt;&lt;SPAN class=""&gt; during peak hours&lt;/SPAN&gt;&lt;SPAN class=""&gt;. At this moment, we haven't found a reliable way to capture these events or pinpoint exactly where in the datapath the discard is happening.&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;&lt;SPAN class=""&gt;We have opened a case with the TAC to disable SMT. Since this modification requires BIOS access—and the password is held exclusively by Check Point support—we are unable to perform this change ourselves. We will post an update once SMT has been disabled to let you know if the drops have subsided. Thanks again for your valuable insights!&lt;/SPAN&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 26 Mar 2026 18:30:48 GMT</pubDate>
      <guid>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Persistent-fwFullyUtilizedDrops-on-26000-appliances-R81-20-Is/m-p/274273#M104478</guid>
      <dc:creator>CSSBE_Avenger</dc:creator>
      <dc:date>2026-03-26T18:30:48Z</dc:date>
    </item>
    <item>
      <title>Re: Persistent fwFullyUtilizedDrops on 26000 appliances (R81.20) – Is disabling SMT a valid next step?</title>
      <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Persistent-fwFullyUtilizedDrops-on-26000-appliances-R81-20-Is/m-p/274763#M104680</link>
      <description>&lt;P class=""&gt;After extensive troubleshooting, we finally identified the root cause — and it turned out to be a monitoring configuration error on our end.&lt;/P&gt;&lt;P class=""&gt;We had assigned OID &lt;CODE class=""&gt;iso.3.6.1.4.1.2620.1.1.25.13.0&lt;/CODE&gt; (&lt;CODE class=""&gt;fwLoggedTotal&lt;/CODE&gt;) to our Zabbix item instead of &lt;CODE class=""&gt;iso.3.6.1.4.1.2620.1.1.25.26.0&lt;/CODE&gt; (&lt;CODE class=""&gt;fwFullyUtilizedDrops&lt;/CODE&gt;). We were never actually observing &lt;CODE class=""&gt;fwFullyUtilizedDrops&lt;/CODE&gt; at all — we were monitoring the total number of logged connections.&lt;/P&gt;&lt;P class=""&gt;In hindsight, the most obvious clue was right there from the beginning: the drops were completely invisible in &lt;CODE class=""&gt;zdebug drop&lt;/CODE&gt; and in SmartConsole logs. That inconsistency should have immediately pointed to a measurement problem rather than an actual performance issue — but none of us caught it.&lt;/P&gt;&lt;P class=""&gt;We want to thank Timothy Hall and Bob Zimmerman for their thorough and technically sound analysis of our CoreXL/SMT configuration. Even though it was based on a false premise, the findings regarding SND/Worker core overlap on our dual-socket setup are real and worth addressing independently.&lt;/P&gt;&lt;P class=""&gt;Lesson learned: always validate your data source before analyzing the data.&lt;/P&gt;&lt;P class=""&gt;Closing this thread as resolved.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Apr 2026 13:35:09 GMT</pubDate>
      <guid>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Persistent-fwFullyUtilizedDrops-on-26000-appliances-R81-20-Is/m-p/274763#M104680</guid>
      <dc:creator>CSSBE_Avenger</dc:creator>
      <dc:date>2026-04-02T13:35:09Z</dc:date>
    </item>
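The root cause in the final post, an SNMP item pointed at fwLoggedTotal instead of fwFullyUtilizedDrops, is the kind of mix-up a pre-deployment check can catch. A minimal sketch, using only the two OIDs quoted in the thread; the validation function itself is a hypothetical illustration, not a Zabbix API:

```python
# Sketch: guard against the OID mix-up described in the resolution by checking
# a monitoring item's configured OID against the counter it is meant to track.
# The two OIDs below are the ones quoted in the thread; validate_item() is a
# hypothetical helper, not part of any monitoring product's API.

CHECKPOINT_OIDS = {
    "fwLoggedTotal":        "iso.3.6.1.4.1.2620.1.1.25.13.0",
    "fwFullyUtilizedDrops": "iso.3.6.1.4.1.2620.1.1.25.26.0",
}

def validate_item(intended_counter: str, configured_oid: str) -> bool:
    """True only if the configured OID matches the intended counter."""
    return CHECKPOINT_OIDS.get(intended_counter) == configured_oid

# The misconfiguration from the thread: drops intended, log total configured.
print(validate_item("fwFullyUtilizedDrops", "iso.3.6.1.4.1.2620.1.1.25.13.0"))  # False
print(validate_item("fwFullyUtilizedDrops", "iso.3.6.1.4.1.2620.1.1.25.26.0"))  # True
```

A check like this, run against each item when a template changes, would have flagged the mismatch before weeks of tuning were spent on a counter that was never being observed.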
  </channel>
</rss>

