The cphaprob -l list command displays the list of tests (the technical term is "pnotes") that a cluster member runs against itself to determine whether it is what I call "impaired". The goal is to detect partial failures and report them to the other members of the cluster via CCP; whichever cluster member is in the best state, with the fewest failures, "wins" and goes active. If both cluster members experience an equal failure at the same time, nothing happens and the currently active member continues to pass traffic.
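To make the "fewest failures wins" rule concrete, here is a minimal Python sketch of the decision as described above. It is only an illustration, not Check Point's actual algorithm: the function name pick_active, the member names, and the tie-break between two non-active members are all my own assumptions.

```python
def pick_active(failures, current_active):
    """Return the member that should be active.

    failures: dict mapping member name -> number of failing pnotes.
    current_active: name of the member that is active right now.
    Hypothetical model of the rule described in the text only.
    """
    fewest = min(failures.values())
    candidates = [m for m, n in failures.items() if n == fewest]
    # Equal failures on all of the best members: nothing happens,
    # the currently active member keeps passing traffic.
    if current_active in candidates:
        return current_active
    # Assumption: among healthier non-active members, pick by name
    # just to keep the sketch deterministic.
    return sorted(candidates)[0]


# fw1 is active but has one failing pnote, fw2 has none: fw2 takes over.
print(pick_active({"fw1": 1, "fw2": 0}, "fw1"))
# Both members have the same failure: fw1 stays active.
print(pick_active({"fw1": 1, "fw2": 1}, "fw1"))
```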
Here are the tests and what they are trying to do:
Device Name: Interface Active Check
Are all monitored cluster interfaces up, and can we successfully see all other members of the cluster via CCP? Failure example: unplug a firewall's interface or put it on the wrong VLAN where it can't see the other cluster member(s). This test can also fail if all the cluster members can see only themselves on an interface, but can't see at least one other responding IP address on the VLAN with them, such as a router. This test can also fail occasionally if switches don't reliably handle the multicast CCP traffic.
Device Name: Recovery Delay
After coming back up from a reboot or crash and taking a full sync from the active member, wait a certain period of time before going standby or active. This is similar to the VRRP cold start delay and helps suppress cluster flapping if it is occurring.
Device Name: Synchronization
Can we successfully send and receive sync updates from the other member(s) on the private sync network? Failure example: reboot one member or unplug sync interface.
Device Name: Filter
Is a security policy currently loaded? Failure example: run fw unloadlocal
Device Name: routed
Is our routing process (formerly called FIB) up and running? Failure example: routed process crashes, firewall's routing tables go stale as a result if using dynamic routing protocol(s).
Device Name: cphad
Is the Check Point HA function working? I think this used to be an actual process called cphad, but it is mostly in the kernel now. Failure example: run cphastop
Device Name: fwd
Is the fwd process, which handles logs and is the parent of many firewall processes (formerly called security servers), up and running on the gateway? Failure example: the fwd process crashes or is killed by an administrator.
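If you want to check these tests from a script, one rough approach is to save the command output to a file and pick out every device whose state is not OK. The Python sketch below assumes each entry has a "Device Name:" line followed by a "Current state:" line; the SAMPLE text is made up for illustration and the exact layout may differ between versions, so treat the parsing as an assumption to verify against your own gateway.

```python
import re


def parse_pnotes(text):
    """Map each 'Device Name' to its 'Current state' (assumed layout)."""
    states = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Device Name:\s*(.+)", line)
        if m:
            name = m.group(1).strip()
            continue
        m = re.match(r"Current state:\s*(.+)", line)
        if m and name is not None:
            states[name] = m.group(1).strip()
            name = None
    return states


def problem_devices(states):
    """Return the names of devices whose state is anything but OK."""
    return [n for n, s in states.items() if s.upper() != "OK"]


# Illustrative text only, not captured from a real gateway.
SAMPLE = """\
Device Name: Synchronization
Current state: OK

Device Name: Filter
Current state: problem
"""

print(problem_devices(parse_pnotes(SAMPLE)))
```

Any device that shows up in the result is the test that is currently making the member "impaired", which tells you which of the descriptions above to start troubleshooting from.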