
Interface Instability Causing Cluster Failover

Question asked by Daniel Zenczak on Feb 4, 2019
Latest reply on Feb 6, 2019 by Corporacion America Argentina

First time caller.

We are running a clustered pair of HA 13000 gateways on R77.30, managed by an R80.10 management server. Since roughly March 2018, policy pushes have been causing the gateways to fail over. We could reliably force an interface failure just by pushing policy. The CPU core running the firewall worker would spike to 100%, and because that core shared its affinity with an interface, the interface would go down with it. This sometimes happened on the standby member and sometimes on the active member. As a stopgap, we pushed policy only outside working hours, when there is no load on the firewall, but we still saw failures.

Around October 2018 the failovers became more frequent, and we started working more closely with Check Point technicians, who suggested a series of fixes. We have implemented a few of them so far: the Dynamic Dispatcher, the freeze state edit, and the CPU stability hotfix (available here: https://community.checkpoint.com/message/28542-clusterxl-improved-stability-hotfix). None of them seems to have fixed the underlying issue. After we installed the stability hotfix, the failovers during policy pushes stopped, but now the cluster fails over at random. At this point even our sales engineer is telling us to "Post on CheckMates" to see if anyone else is having these issues.


I am open to suggestions, questions, queries and answers. Here is a high-level list of the technician's suggestions.

  1. CPU stability hotfix
    1. Implemented Saturday, January 26, 2019
  2. Dynamic Dispatcher
    1. Implemented November 29, 2018
  3. Edit freeze state
    1. Implemented Thursday, January 31, 2019
  4. Increase CCP timers
    1. Implementation TBD
  5. Keep all connections during policy push
  6. Increase RX ring size
    1. Implementation TBD
  7. Rulebase optimization
    1. Implementation TBD
  8. IPS protections optimization
    1. Implementation TBD
  9. Further optimizations via SK92348
    1. Implementation TBD
