<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic ClusterXL Not Automatically Failing Over in Firewall and Security Management</title>
    <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/ClusterXL-Not-Automatically-Failing-Over/m-p/54035#M4095</link>
    <description>&lt;P&gt;&lt;STRONG&gt;Appliances:&lt;/STRONG&gt; (2) 5400 16GB RAM Gaia R80.10&lt;/P&gt;&lt;P&gt;I have been experiencing this issue for over 18 months and haven't made progress with TAC. I am currently running R80.10 and was experiencing the same issue on R77.30 (my upgrade to R80.10 was an attempt to resolve it).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Description:&lt;/STRONG&gt; When physical memory consumption approaches 16GB, traffic begins to drop. Running &lt;STRONG&gt;'fw ctl zdebug drop'&lt;/STRONG&gt; reveals many &lt;STRONG&gt;'Reason: PSL Drop: TCP segment out of maximum allowed sequence.'&lt;/STRONG&gt; errors. If I'm lucky enough to catch things at this point, I can manually fail over to the standby node and the issue is immediately resolved. If I don't catch it at this stage, the primary node eventually stops passing traffic and does not automatically fail over to the standby node. I cannot get in or out of my network, and I cannot manage the gateway remotely without using the lights-out port (I added lights-out because of this issue). This cluster is in my HQ office, and all 26 remote locations are in a VPN community with it (the remote locations are 1450 appliances running R77.20.86). When this issue occurs, everyone in the company is impacted.&lt;/P&gt;&lt;P&gt;QoS definitely has an impact on this issue. With QoS enabled, memory usage climbs by about 1GB/day; with QoS disabled, it climbs by about 100MB/day. So with QoS disabled the issue occurs much less frequently, while with QoS enabled I have about a week before it occurs. In the past, after manually failing over, I would reboot the non-active node. Last week I tried something different: I failed over to the standby (cpstop &amp;amp;&amp;amp; cpstart), and once the primary showed 'standby', I failed back over. About 2 days later, after business hours, the primary stopped passing traffic and didn't fail over.&lt;/P&gt;&lt;P&gt;I find it hard to believe that I'm the only one experiencing this issue. If anyone has any ideas, I'd greatly appreciate the help.&lt;/P&gt;</description>
    <pubDate>Tue, 21 May 2019 14:04:05 GMT</pubDate>
    <dc:creator>John_Pinegar</dc:creator>
    <dc:date>2019-05-21T14:04:05Z</dc:date>
    <item>
      <title>ClusterXL Not Automatically Failing Over</title>
      <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/ClusterXL-Not-Automatically-Failing-Over/m-p/54035#M4095</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Appliances:&lt;/STRONG&gt; (2) 5400 16GB RAM Gaia R80.10&lt;/P&gt;&lt;P&gt;I have been experiencing this issue for over 18 months and haven't made progress with TAC. I am currently running R80.10 and was experiencing the same issue on R77.30 (my upgrade to R80.10 was an attempt to resolve it).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Description:&lt;/STRONG&gt; When physical memory consumption approaches 16GB, traffic begins to drop. Running &lt;STRONG&gt;'fw ctl zdebug drop'&lt;/STRONG&gt; reveals many &lt;STRONG&gt;'Reason: PSL Drop: TCP segment out of maximum allowed sequence.'&lt;/STRONG&gt; errors. If I'm lucky enough to catch things at this point, I can manually fail over to the standby node and the issue is immediately resolved. If I don't catch it at this stage, the primary node eventually stops passing traffic and does not automatically fail over to the standby node. I cannot get in or out of my network, and I cannot manage the gateway remotely without using the lights-out port (I added lights-out because of this issue). This cluster is in my HQ office, and all 26 remote locations are in a VPN community with it (the remote locations are 1450 appliances running R77.20.86). When this issue occurs, everyone in the company is impacted.&lt;/P&gt;&lt;P&gt;QoS definitely has an impact on this issue. With QoS enabled, memory usage climbs by about 1GB/day; with QoS disabled, it climbs by about 100MB/day. So with QoS disabled the issue occurs much less frequently, while with QoS enabled I have about a week before it occurs. In the past, after manually failing over, I would reboot the non-active node. Last week I tried something different: I failed over to the standby (cpstop &amp;amp;&amp;amp; cpstart), and once the primary showed 'standby', I failed back over. About 2 days later, after business hours, the primary stopped passing traffic and didn't fail over.&lt;/P&gt;&lt;P&gt;I find it hard to believe that I'm the only one experiencing this issue. If anyone has any ideas, I'd greatly appreciate the help.&lt;/P&gt;</description>
      <pubDate>Tue, 21 May 2019 14:04:05 GMT</pubDate>
      <guid>https://community.checkpoint.com/t5/Firewall-and-Security-Management/ClusterXL-Not-Automatically-Failing-Over/m-p/54035#M4095</guid>
      <dc:creator>John_Pinegar</dc:creator>
      <dc:date>2019-05-21T14:04:05Z</dc:date>
    </item>
    <item>
      <title>Re: ClusterXL Not Automatically Failing Over</title>
      <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/ClusterXL-Not-Automatically-Failing-Over/m-p/54163#M4107</link>
      <description>&lt;P&gt;I suspect the memory leaks are the real issue here. Has any work been done on the TAC case(s) around that?&lt;/P&gt;</description>
      <pubDate>Thu, 23 May 2019 01:15:51 GMT</pubDate>
      <guid>https://community.checkpoint.com/t5/Firewall-and-Security-Management/ClusterXL-Not-Automatically-Failing-Over/m-p/54163#M4107</guid>
      <dc:creator>PhoneBoy</dc:creator>
      <dc:date>2019-05-23T01:15:51Z</dc:date>
    </item>
    <item>
      <title>Re: ClusterXL Not Automatically Failing Over</title>
      <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/ClusterXL-Not-Automatically-Failing-Over/m-p/54233#M4111</link>
      <description>&lt;P&gt;We have performed memory leak tests at TAC's request and have not found anything definitive.&lt;/P&gt;</description>
      <pubDate>Thu, 23 May 2019 13:04:10 GMT</pubDate>
      <guid>https://community.checkpoint.com/t5/Firewall-and-Security-Management/ClusterXL-Not-Automatically-Failing-Over/m-p/54233#M4111</guid>
      <dc:creator>John_Pinegar</dc:creator>
      <dc:date>2019-05-23T13:04:10Z</dc:date>
    </item>
  </channel>
</rss>

