<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Unexpected cluster failover, CCP has been down for 1 second in Firewall and Security Management</title>
    <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Unexpected-cluster-failover-CCP-has-been-down-for-1-second/m-p/101995#M7997</link>
    <description>&lt;P&gt;Hi Timothy,&lt;/P&gt;&lt;P&gt;Yes, we found that our CP2 (standby cluster member) has high confd CPU usage. Our NTP config looks OK and the clocks are synchronized. We have just restarted our standby gateway, and after the restart everything looks normal.&lt;/P&gt;</description>
    <pubDate>Fri, 13 Nov 2020 15:32:32 GMT</pubDate>
    <dc:creator>MladenAntesevic</dc:creator>
    <dc:date>2020-11-13T15:32:32Z</dc:date>
    <item>
      <title>Unexpected cluster failover, CCP has been down for 1 second</title>
      <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Unexpected-cluster-failover-CCP-has-been-down-for-1-second/m-p/101936#M7992</link>
      <description>&lt;P&gt;We have two 5800 gateways (CP1 and CP2) in a High Availability cluster, currently running R80.20. On November 11 at 10:46:25 we experienced an unexpected failover from master CP1 to standby cluster member CP2. According to the cluster log, the failover reason was that CCP went down on cluster interface eth1-02.307 at 10:46:25, so the CP1 cluster state went to DOWN. One second later, eth1-02.307 recovered and the CP1 cluster state went from DOWN -&amp;gt; STANDBY.&lt;/P&gt;&lt;P&gt;This interface flapping on the CP1 gateway occurred several more times during the day, but because CP2 was by then the master, we did not experience any further cluster failovers.&lt;/P&gt;&lt;P&gt;CP1 and CP2 are interconnected over two Cisco Nexus 5600 switches (for Sync there is also a direct bonded connection between the two cluster members).&amp;nbsp; At first we suspected that something was wrong with our Nexus switches. We checked the switch physical interface status and the spanning tree events on VLAN 307, but we did not find anything related to the cluster failover. 
The Nexus switches seem to be OK; nothing happened in our switching network.&lt;/P&gt;&lt;P&gt;Here is the output from our SmartConsole log section, filtered on type:Control, showing the cluster failover log:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MladenAntesevic_0-1605220175633.png" style="width: 400px;"&gt;&lt;img src="https://community.checkpoint.com/t5/image/serverpage/image-id/8908i1C17D7A1E15B05EA/image-size/medium?v=v2&amp;amp;px=400" role="button" title="MladenAntesevic_0-1605220175633.png" alt="MladenAntesevic_0-1605220175633.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We also noticed many uc_log_suppression_set_entry errors in /var/log/messages on CP1 at the time of the failover, and again later during the day:&lt;/P&gt;&lt;P&gt;Nov 11 10:46:25 2020 CP1 kernel: [fw4_2];[10.10.57.80:61085 -&amp;gt; x.x.x.x:443] [ERROR]: uc_log_suppression_set_entry: Failed storing log data in log suppression table!&lt;/P&gt;&lt;P&gt;Nov 11 10:46:25 2020 CP1 kernel: [fw4_4];[10.10.0.254:57642 -&amp;gt; x.x.x.x:443] [ERROR]: uc_log_suppression_set_entry: Failed storing log data in log suppression table!&lt;/P&gt;&lt;P&gt;Nov 11 10:46:25 2020 CP1 kernel: [fw4_1];[10.10.57.80:61087 -&amp;gt; x.x.x.x:443] [ERROR]: uc_log_suppression_set_entry: Failed storing log data in log suppression table!&lt;/P&gt;&lt;P&gt;Nov 11 10:46:25 2020 CP1 kernel: [fw4_1];[10.10.57.80:61086 -&amp;gt; x.x.x.x:443] [ERROR]: uc_log_suppression_set_entry: Failed storing log data in log suppression table!&lt;/P&gt;&lt;P&gt;Nov 11 10:46:25 2020 CP1 kernel: [fw4_2];[10.10.57.80:61088 -&amp;gt; x.x.x.x:443] [ERROR]: uc_log_suppression_set_entry: Failed storing log data in log suppression table!&lt;/P&gt;&lt;P&gt;Nov 11 10:46:25 2020 CP1 kernel: [fw4_1];CLUS-110305-1: State change: ACTIVE -&amp;gt; ACTIVE(!) 
| Reason: Interface eth1-02.307 is down (Cluster Control Protocol packets are not received)&lt;/P&gt;&lt;P&gt;Nov 11 10:46:25 2020 CP1 kernel: [fw4_1];CLUS-110305-1: State change: ACTIVE! -&amp;gt; DOWN | Reason: Interface eth1-02.307 is down (Cluster Control Protocol packets are not received)&lt;/P&gt;&lt;P&gt;Nov 11 10:46:25 2020 CP1 kernel: [fw4_1];CLUS-214704-1: Remote member 2 (state STANDBY -&amp;gt; ACTIVE) | Reason: No other ACTIVE members have been found in the cluster&lt;/P&gt;&lt;P&gt;Nov 11 10:46:26 2020 CP1 kernel: [fw4_1];CLUS-114802-1: State change: DOWN -&amp;gt; STANDBY | Reason: There is already an ACTIVE member in the cluster (member 2)&lt;/P&gt;&lt;P&gt;Nov 11 10:46:27 2020 CP1 kernel: [fw4_1];CLUS-100102-1: Failover member 1 -&amp;gt; member 2 | Reason: Interface eth1-02.307 is down (Cluster Control Protocol packets are not received)&lt;/P&gt;&lt;P&gt;Nov 11 10:46:27 2020 CP1 monitord[12740]: Time shift detected !!!&lt;/P&gt;&lt;P&gt;Nov 11 10:46:36 2020 CP1 last message repeated 3 times&lt;/P&gt;&lt;P&gt;Nov 11 10:46:37 2020 CP1 kernel: [fw4_1];CLUS-110300-1: State change: STANDBY -&amp;gt; DOWN | Reason: Interface eth1-02.307 is down (Cluster Control Protocol packets are not received)&lt;/P&gt;&lt;P&gt;Nov 11 10:46:37 2020 CP1 kernel: [fw4_1];CLUS-114802-1: State change: DOWN -&amp;gt; STANDBY | Reason: There is already an ACTIVE member in the cluster (member 2)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Later, when CP2 became the master, we saw similar messages on CP2. Checking the cpview history, we also noticed that CPU usage was significantly higher than usual.&lt;/P&gt;&lt;P&gt;Our R80.20 cluster uses unicast CCP messages, as you can see here:&lt;/P&gt;&lt;P&gt;[Expert@CP1:0]# cphaprob -a if&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;CCP mode: Automatic&lt;/P&gt;&lt;P&gt;Required interfaces: 5&lt;/P&gt;&lt;P&gt;Required secured interfaces: 
1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;eth1-01&amp;nbsp;&amp;nbsp;&amp;nbsp; UP&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; non sync(non secured), unicast&lt;/P&gt;&lt;P&gt;eth1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; UP&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; non sync(non secured), unicast&lt;/P&gt;&lt;P&gt;eth4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Non-Monitored&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; non sync(non secured)&lt;/P&gt;&lt;P&gt;Mgmt&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Non-Monitored&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; non sync(non secured)&lt;/P&gt;&lt;P&gt;bond1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; UP&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;sync(secured), unicast, bond Load Sharing&lt;/P&gt;&lt;P&gt;eth1-02&amp;nbsp;&amp;nbsp;&amp;nbsp; UP&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; non sync(non secured), unicast&amp;nbsp; (eth1-02.7)&lt;/P&gt;&lt;P&gt;eth1-02&amp;nbsp;&amp;nbsp;&amp;nbsp; UP&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; non sync(non secured), unicast&amp;nbsp; (eth1-02.307)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What could be a 
reason for this unexpected failover? Our network seems fine, and we suspect that the root cause is somewhere on our gateways. One possibly important fact: a policy install was also under way at the time the cluster failover happened.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Nov 2020 22:30:36 GMT</pubDate>
      <guid>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Unexpected-cluster-failover-CCP-has-been-down-for-1-second/m-p/101936#M7992</guid>
      <dc:creator>MladenAntesevic</dc:creator>
      <dc:date>2020-11-12T22:30:36Z</dc:date>
    </item>
    <item>
      <title>Re: Unexpected cluster failover, CCP has been down for 1 second</title>
      <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Unexpected-cluster-failover-CCP-has-been-down-for-1-second/m-p/101990#M7996</link>
      <description>&lt;P&gt;The excessive CPU load is probably caused by monitord, due to the known issue here:&lt;/P&gt;
&lt;P&gt;&lt;A class="cp_link sc_ellipsis" href="https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&amp;amp;solutionid=sk102988&amp;amp;partition=Advanced&amp;amp;product=Security" target="_blank" rel="noopener"&gt;sk102988: 'monitord' and 'confd' processes consume 100% CPU&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;However, this is supposed to be fixed in R80.20.&amp;nbsp; Is there any chance you are actually experiencing a clock shift?&amp;nbsp; A clock shift causes monitord to recalculate its database, which eats up CPU.&amp;nbsp; Is NTP configured and verified to be working properly?&lt;/P&gt;</description>
      <pubDate>Fri, 13 Nov 2020 14:59:20 GMT</pubDate>
      <guid>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Unexpected-cluster-failover-CCP-has-been-down-for-1-second/m-p/101990#M7996</guid>
      <dc:creator>Timothy_Hall</dc:creator>
      <dc:date>2020-11-13T14:59:20Z</dc:date>
    </item>
    <item>
      <title>Re: Unexpected cluster failover, CCP has been down for 1 second</title>
      <link>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Unexpected-cluster-failover-CCP-has-been-down-for-1-second/m-p/101995#M7997</link>
      <description>&lt;P&gt;Hi Timothy,&lt;/P&gt;&lt;P&gt;Yes, we found that our CP2 (standby cluster member) has high confd CPU usage. Our NTP config looks OK and the clocks are synchronized. We have just restarted our standby gateway, and after the restart everything looks normal.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Nov 2020 15:32:32 GMT</pubDate>
      <guid>https://community.checkpoint.com/t5/Firewall-and-Security-Management/Unexpected-cluster-failover-CCP-has-been-down-for-1-second/m-p/101995#M7997</guid>
      <dc:creator>MladenAntesevic</dc:creator>
      <dc:date>2020-11-13T15:32:32Z</dc:date>
    </item>
  </channel>
</rss>

