<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Interface Instability Causing Cluster Failover in Firewall and Security Management</title>
    <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Interface-Instability-Causing-Cluster-Failover/m-p/30589#M2428</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;These "fixes" sound more like a workaround. Proper way would be to find root cause and fix it. Are these security gateways virtual machines or servers ? Also, how are they connected to rest of the network? Is it multicast cluster ?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 05 Feb 2019 06:23:55 GMT</pubDate>
    <dc:creator>HristoGrigorov</dc:creator>
    <dc:date>2019-02-05T06:23:55Z</dc:date>
    <item>
      <title>Interface Instability Causing Cluster Failover</title>
      <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Interface-Instability-Causing-Cluster-Failover/m-p/30588#M2427</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;First-time caller.&lt;/P&gt;&lt;P&gt;We are running a clustered HA pair of 13000 gateways on R77.30, managed by an R80.10 management server. Since roughly March 2018, we have seen the gateways fail over during policy pushes. We could actually force the interfaces to fail by pushing policy: the push drove the CPU core of the associated worker to 100%, and because that core shared affinity with an interface, the interface went down. Sometimes this happened on the standby member, sometimes on the active member. To mitigate the issue in the meantime, we pushed policy outside working hours, when there was no load on the firewall, but we still saw failures. Around October 2018, the failovers became more frequent and we began working more closely with Check Point technicians, who suggested a series of fixes. We have implemented a few of them: Dynamic Dispatcher, editing the freeze state, and the CPU stability hotfix (available here: &lt;A _jive_internal="true" href="https://community.checkpoint.com/message/28542-clusterxl-improved-stability-hotfix"&gt;https://community.checkpoint.com/message/28542-clusterxl-improved-stability-hotfix&lt;/A&gt;). None of them seems to have addressed the underlying issue. After we installed the stability hotfix, the failovers during policy pushes stopped, but now the cluster fails over randomly. At this point, even our sales engineer is saying "Post on CheckMates" to see whether anyone else is having these issues.&lt;/P&gt;&lt;P&gt;I am open to suggestions, questions, and answers. Here is a high-level list of the technician's suggestions:&lt;/P&gt;&lt;OL type="1"&gt;&lt;LI&gt;CPU stability hotfix&lt;OL type="1"&gt;&lt;LI&gt;Implemented Saturday, January 26, 2019&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;LI&gt;Dynamic Dispatcher&lt;OL type="1"&gt;&lt;LI&gt;Implemented November 29, 2018&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;LI&gt;Edit freeze state&lt;OL type="1"&gt;&lt;LI&gt;Implemented Thursday, January 31, 2019&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;LI&gt;Increase CCP timers&lt;OL type="1"&gt;&lt;LI&gt;Implementation TBD&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;LI&gt;Keep all connections during policy push&lt;/LI&gt;&lt;LI&gt;Increase RX ring size&lt;OL type="1"&gt;&lt;LI&gt;Implementation TBD&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;LI&gt;Rulebase optimization&lt;OL type="1"&gt;&lt;LI&gt;Implementation TBD&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;LI&gt;IPS protections optimization&lt;OL type="1"&gt;&lt;LI&gt;Implementation TBD&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;LI&gt;Further optimizations via sk92348&lt;OL type="1"&gt;&lt;LI&gt;Implementation TBD&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 04 Feb 2019 22:34:41 GMT</pubDate>
      <guid>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Interface-Instability-Causing-Cluster-Failover/m-p/30588#M2427</guid>
      <dc:creator>Daniel_Zenczak</dc:creator>
      <dc:date>2019-02-04T22:34:41Z</dc:date>
    </item>
    <item>
      <title>Re: Interface Instability Causing Cluster Failover</title>
      <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Interface-Instability-Causing-Cluster-Failover/m-p/30589#M2428</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;These "fixes" sound more like a workaround. Proper way would be to find root cause and fix it. Are these security gateways virtual machines or servers ? Also, how are they connected to rest of the network? Is it multicast cluster ?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 05 Feb 2019 06:23:55 GMT</pubDate>
      <guid>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Interface-Instability-Causing-Cluster-Failover/m-p/30589#M2428</guid>
      <dc:creator>HristoGrigorov</dc:creator>
      <dc:date>2019-02-05T06:23:55Z</dc:date>
    </item>
    <item>
      <title>Re: Interface Instability Causing Cluster Failover</title>
      <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Interface-Instability-Causing-Cluster-Failover/m-p/30590#M2429</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Dear all,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for sharing your experience.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would like to share mine, because we are experiencing the same problems in a very similar scenario and TAC has not found the root cause.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We have an Open Server firewall cluster (HP ProLiant DL380 G8) running ClusterXL on R77.30, managed by a virtual R80.20 management server with ample resources.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is a brief summary of the problem history:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;October 2018&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We started having problems installing policies. The error message was "Operation incomplete due to timeout". The active node froze completely (we verified this through iLO; the server would not let us log in), and it never failed over to the other node.&lt;/P&gt;&lt;P&gt;We had to manually force the standby node to become active, and the only way to recover the frozen node was a forced reboot followed by reinstalling the policies.&lt;/P&gt;&lt;P&gt;These firewalls were running R77.30 with Take 317, had Dynamic Dispatcher enabled, and normally ran at 40% to 60% CPU load.&lt;/P&gt;&lt;P&gt;The problem had occurred a few times before, but toward the end of the year it became more frequent. One thing we noticed: if you pinged a host through this firewall while policy was being installed, latency rose sharply and some packets were lost.&lt;/P&gt;&lt;P&gt;TAC suggested that we install Take 344 (Ongoing) because it includes the ClusterXL stability feature.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;January 2019&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;After a week on Take 344 with no problems, and with a slight improvement in performance when installing policies, we hit the same freeze problem AGAIN.&lt;/P&gt;&lt;P&gt;The only difference this time was that failover worked correctly, so there was no loss of service.&lt;/P&gt;&lt;P&gt;Finally, TAC suggested that we follow sk31511, because they did not see any core dumps or logs related to the freeze.&lt;/P&gt;&lt;P&gt;The TAC case has been open for 4 months. It has gone through several engineers and been escalated to senior levels, and nobody has told us the root cause. The suggestion is always to upgrade to the latest Jumbo Hotfix or to R80.10, and to keep a PC connected 24 hours a day through a serial cable, with the kdb option enabled, until the next freeze.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt; &lt;SPAN style="color: #212121; background: white;"&gt;Please, if anyone has a solution, we would be very grateful.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #212121; background: white;"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #212121; background: white;"&gt;Regards&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 06 Feb 2019 18:00:56 GMT</pubDate>
      <guid>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Interface-Instability-Causing-Cluster-Failover/m-p/30590#M2429</guid>
      <dc:creator>Corporacion_Ame</dc:creator>
      <dc:date>2019-02-06T18:00:56Z</dc:date>
    </item>
  </channel>
</rss>

