How to get interface status for a particular time slot

Prashan_Attanay · 2017-11-15

Hi Team,

Our Check Point cluster members suddenly change state, from Active to Standby and from Standby to Active. I checked the logs in SmartView Tracker and found this:

Record Details

cluster_info: (ClusterXL) member 1 (172.16.251.1) is down (Interface Active Check on member 1 (172.16.251.1) detected a problem (eth4.20 interface is down, 7 interfaces required, only 6 up).).

So how can I get the interface status for a particular time slot?

The interface in question is currently up.

Thanks
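To line the failover up against switch logs or /var/log/messages, it helps to pull the individual fields out of the cluster_info record quoted above. The following Python sketch is an illustration only: the regular expression is written against that single message and assumes its exact wording, and `parse_cluster_info` / `InterfaceCheckFailure` are hypothetical names, not part of any Check Point tool.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed pattern: written against the single cluster_info record in this
# thread; other ClusterXL messages may be worded differently.
_PATTERN = re.compile(
    r"member (?P<member>\d+) \((?P<ip>[\d.]+)\) is down .*?"
    r"\((?P<iface>\S+) interface is down, "
    r"(?P<required>\d+) interfaces required, only (?P<up>\d+) up\)"
)


@dataclass
class InterfaceCheckFailure:
    member: int      # cluster member number
    ip: str          # member IP as printed in the log
    interface: str   # interface reported down, e.g. a VLAN sub-interface
    required: int    # interfaces required for the member to stay up
    up: int          # interfaces that were actually up


def parse_cluster_info(message: str) -> Optional[InterfaceCheckFailure]:
    """Return the parsed fields, or None if the message does not match."""
    m = _PATTERN.search(message)
    if m is None:
        return None
    return InterfaceCheckFailure(
        member=int(m["member"]),
        ip=m["ip"],
        interface=m["iface"],
        required=int(m["required"]),
        up=int(m["up"]),
    )


if __name__ == "__main__":
    record = (
        "cluster_info: (ClusterXL) member 1 (172.16.251.1) is down "
        "(Interface Active Check on member 1 (172.16.251.1) detected a problem "
        "(eth4.20 interface is down, 7 interfaces required, only 6 up).)."
    )
    print(parse_cluster_info(record))
```

Messages that do not follow this wording simply return None rather than raising, so the function can be run over a whole log export.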

PhoneBoy · 2017-11-15

It's not necessarily an interface up/down thing.

Start with this SK: Interface flapping (down/up) in a ClusterXL environment

Prashan_Attanay · 2017-11-15

Thank you, Dameon, but I'm not able to see the solution right now.

Marco_Valenti · 2017-11-16

It does not necessarily mean that the interface went down; probably CCP packets were not heard for a period of time. In my opinion, the best way is still to check on the switch whether the link went down; that gives you 100% accuracy. Also, beware of IGMP snooping on the switch side.

Prashan_Attanay · 2017-11-16

Hi Marco,

Thank you for your reply. Can you explain what a CCP packet is?

BR

Marco_Valenti · 2017-11-16

CCP packets are the clustering packets sent over all clustered interfaces. The cluster members use them to detect when a failover between members is needed. They are sent to UDP port 8116 with a multicast MAC address, if I remember correctly.
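Marco's point is that "interface is down" can simply mean that no CCP packets were heard on that interface for a while. If you have CCP arrival times for one interface (for example, exported from a capture filtered on UDP port 8116), a sketch like the following flags the silent periods. The `max_silence` threshold is a hypothetical parameter you pick for the analysis; it is not ClusterXL's actual timeout, and `find_ccp_gaps` is an illustrative name.

```python
from typing import List, Sequence, Tuple


def find_ccp_gaps(arrivals: Sequence[float], max_silence: float) -> List[Tuple[float, float]]:
    """Return (last_heard, next_heard) pairs where consecutive CCP arrivals on
    one interface are more than max_silence seconds apart.

    arrivals: packet timestamps in seconds, e.g. exported from a capture
    filtered on UDP port 8116 (assumed input format).
    max_silence: hypothetical threshold chosen by the analyst, NOT the
    ClusterXL internal timeout.
    """
    ordered = sorted(arrivals)
    gaps = []
    # Compare each arrival with the next one and keep the long silences.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > max_silence:
            gaps.append((prev, cur))
    return gaps


if __name__ == "__main__":
    # Steady packets, then a 1.3 s silence between 0.2 s and 1.5 s.
    print(find_ccp_gaps([0.0, 0.1, 0.2, 1.5, 1.6], max_silence=0.5))
```

Here the call reports the single gap (0.2, 1.5). A gap that lines up with the failover time, while the switch shows the link as up, would support the "CCP not heard" explanation over a real link-down.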

Danny · 2017-11-16

Our ccc script provides a solution:

show routed cluster-state detailed - Show ClusterXL failover history

Additionally, CPView keeps history data that you can view with: cpview -t <timestamp>

SmartLog is also a great place to check for Cluster failovers.

Prashan_Attanay · 2017-11-16

Thank you, Danny. I checked with CPView; it didn't show the interface as down.

Danilo_Lara · 2017-11-23

I think the best way to know whether the physical interfaces went down is /var/log/messages. Grep for "down" to filter the results.

Increase the number of rotated /var/log/messages files as per sk36798.
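To narrow that grep down to the time slot in question, a small Python filter can combine the keyword match with a timestamp window. This is a sketch assuming the classic syslog prefix (e.g. `Nov 15 10:23:45 ...`); because that prefix carries no year, the year is taken from the window start, so a window spanning New Year is not handled. `lines_in_window` is a hypothetical helper, not a Check Point utility, and the window in the usage block is a placeholder.

```python
from datetime import datetime
from typing import Iterable, Iterator


def lines_in_window(lines: Iterable[str], start: datetime, end: datetime,
                    keyword: str = "down") -> Iterator[str]:
    """Yield syslog lines stamped within [start, end] that contain keyword.

    Assumes the classic 'Mmm dd HH:MM:SS' prefix (first 15 characters).
    The prefix has no year, so start.year is assumed for every line.
    """
    needle = keyword.lower()
    for line in lines:
        try:
            ts = datetime.strptime(f"{start.year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # no parsable timestamp prefix; skip the line
        if start <= ts <= end and needle in line.lower():
            yield line


if __name__ == "__main__":
    import sys
    # Placeholder window: replace with the failover time from SmartView Tracker.
    window_start = datetime(2017, 11, 15, 10, 0)
    window_end = datetime(2017, 11, 15, 10, 30)
    with open(sys.argv[1]) as log:  # e.g. /var/log/messages or a rotated copy
        for hit in lines_in_window(log, window_start, window_end):
            print(hit, end="")
```

Run it once per rotated file; with more rotations kept (per sk36798) the window can reach further back.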

Daniel_Taney · 2017-11-16

If the interface is eth4.20, I'm assuming that eth4 is configured as a trunk port from the switch up to the firewall? Are there other VLANs besides 20 defined on eth4? If there are, and those stayed up, it sounds like that VLAN had some issue. Maybe there was a spanning tree event on VLAN 20 that caused some CCP packets to be missed?

R80 CCSA / CCSE

Prashan_Attanay · 2017-11-17

Hi Daniel,

Yes, there is only one VLAN. According to CPView, all interfaces were up at that particular time.

Vladimir · 2017-11-16

Prashan,

I am not sure which version you are running, but there was an issue with R77.30 prior to Take_189, described here:

Adding a new VLAN with lowest/highest VLAN ID causes the ClusterXL member to go "Down"

which may be applicable to your situation.

Prashan_Attanay · 2017-11-17

Hi Vladimir,

It is R77.30 Take 286. By the way, thank you for the SK.

Vladimir · 2017-11-17

Please verify which mode CCP is working in: multicast or broadcast.

If it is multicast, then in some instances and with some switches, complications are caused by the way IGMP is handled.

See: Switch drops Check Point CCP packets when CCP is working in multicast mode

You can always switch to broadcast mode to eliminate the multicast as a culprit:

How to set ClusterXL Control Protocol (CCP) in Broadcast / Multicast mode in ClusterXL

Daniel_Taney · 2017-11-17

We had an issue with multicast on Cisco Nexus 7Ks with vPC, where some VLANs would intermittently go down. Changing CCP to broadcast resolved our issues.


Vladimir · 2017-11-17

Yep. When TAC troubleshoots clustering issues, this is typically one of the first steps they take.

Personally, I like multicast mode and have used it with HP ProCurve enterprise switches.

With Cisco it is hit or miss, depending on platform, IOS version, topology, etc.

I am curious whether unicast mode will be made available at some point for HA-only clusters. I believe it is the only supported mode in vSEC.

In VSLS deployments you'll need broadcast or multicast, but in HA-only clusters, unicast would work more cleanly.

PhoneBoy · 2017-11-17

I've been doing this long enough to remember the days when sync was unicast.

I believe it was in the NG timeframe that they changed sync to multicast.

That's also when ClusterXL became a thing.

Note that even if sync is unicast, you still need multicast for the floating IP address.

In public clouds, which don't support multicast at all, we have to implement the "floating IP" concept differently (using API calls to change routing tables).

Prashan_Attanay · 2017-11-17

Thanks, Vladimir

Johan_van_Somme · 2017-11-16

Hi,

We had this particular issue as well, but on physical interfaces. If eth4.20 is your sync interface (hopefully it's not, by the way), you could check using SmartLog as mentioned before.

In our case it was caused by high CPU load, which caused CCP packets to be missed. This issue doesn't appear only when you install policy, does it?

Prashan_Attanay · 2017-11-17

Yes, eth4.20 is one of our sync interfaces.

Please find the log screenshot below.

Vladimir · 2017-11-17

Can you explain why VLAN 20 needs to be tagged on eth4 for the SYNC interface if it is the only VLAN defined there?

This just seems to add complexity to your environment without any tangible benefit.

ClusterXL does support SYNC via VLAN:

"In ClusterXL, the synchronization network is supported on the lowest VLAN tag of a VLAN interface. For example, if three VLANs with tags 10, 20 and 30 are configured on interface eth1, interface eth1.10 may be used for synchronization."
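The quoted rule comes down to picking the VLAN with the numerically lowest tag on a given physical interface. A minimal sketch, assuming Gaia-style `<interface>.<tag>` names such as eth1.10; `sync_capable_vlan` is a hypothetical helper written for this illustration, not a ClusterXL command.

```python
from typing import Iterable, List, Optional, Tuple


def sync_capable_vlan(interfaces: Iterable[str], physical: str) -> Optional[str]:
    """Return the VLAN sub-interface of `physical` with the lowest tag.

    Assumes '<physical>.<tag>' naming (e.g. eth1.10). Tags are compared as
    integers, so eth1.20 is lower than eth1.100. Returns None when the
    physical interface has no VLAN sub-interfaces in the list.
    """
    prefix = physical + "."
    tagged: List[Tuple[int, str]] = []
    for name in interfaces:
        tag = name[len(prefix):]
        if name.startswith(prefix) and tag.isdecimal():
            tagged.append((int(tag), name))
    return min(tagged)[1] if tagged else None


if __name__ == "__main__":
    # The example from the quoted documentation: tags 10, 20 and 30 on eth1.
    print(sync_capable_vlan(["eth1.10", "eth1.20", "eth1.30", "eth2.5"], "eth1"))
```

With eth1.10, eth1.20 and eth1.30 it returns eth1.10, matching the quote. In this thread's setup, where eth4.20 is the only VLAN on eth4, it returns eth4.20, so the configuration is supported; the objection above is about the added complexity, not support.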

Since this is a SYNC interface in an HA environment, use a direct patch cable between the cluster members if possible.

If you have to traverse switches, configure the switch ports in access mode (if Cisco: switchport access vlan 20) and use eth4 for SYNC without a sub-interface.
