ClusterXL Down
Hello,
I currently have a three-member ClusterXL HA cluster.
A few days ago, one of the members that had been in "Standby" state went to "DOWN".
-------------------------------------------------------------------------------------------------------------------------------------
[Expert@fw2:0]# cphaprob show_failover
Last cluster failover event:
Transition to new ACTIVE: Member 1 -> Member 2
Reason: Interface Mgmt is down (Cluster Control Protocol packets are not received)
Event time: Sat Jan 13 08:30:25 2024
Cluster failover count:
Failover counter: 139
Time of counter reset: Fri Jul 28 09:33:23 2023 (reboot)
Cluster failover history (last 20 failovers since reboot/reset on Fri Jul 28 09:33:50 2023):
No.  Time:                     Transition:           CPU:  Reason:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1    Sat Jan 13 08:30:25 2024  Member 1 -> Member 2  06    Interface Mgmt is down (Cluster Control Protocol packets are not received)
2    Thu Jan 11 21:23:41 2024  Member 3 -> Member 1  14    Incorrect configuration - Local cluster member has fewer cluster interfaces configured compared to other cluster member(s)
------------------------------------------------------------------------------------------------------------------------------------
[Expert@fw2:0]# ethtool Mgmt
Settings for Mgmt:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: on (auto)
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes
-------------------------------------------------------------------------------------------------------------------------------------
What I have found is that the diagnostic commands report the "Mgmt" interface of this member as "Down", yet the interface looks normal both physically and logically (it is up and has link).
The "ethtool Mgmt" output also shows that the member detects the connected cable.
Could this error be caused by the device on the other end of the cable plugged into the Mgmt port (a switch or some other equipment)?
Regards.
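A note on the question above: "Link detected: yes" only confirms a physical carrier on the port, while the failover reason says that Cluster Control Protocol packets are not received, which is about traffic actually arriving. One way to separate the two is to look at the neighbour (ARP) state for a peer member's Mgmt address. The following is a minimal sketch, assuming `ip neigh` is available in Expert mode; the helper name `neigh_state` and the `PEER_IP` placeholder are hypothetical, and the CCP port in the last comment is the commonly documented default, not something confirmed in this thread.

```shell
#!/bin/bash
# Hypothetical helper: read `ip neigh show` output on stdin and print the
# neighbour state recorded for one IP address (the last field of its line).
# Prints MISSING when the address has no entry at all.
neigh_state() {
    local peer_ip="$1"
    awk -v ip="$peer_ip" '
        $1 == ip { print $NF; found = 1; exit }
        END { if (!found) print "MISSING" }
    '
}

# Example on the failing member (PEER_IP is a placeholder for another
# member's Mgmt address):
#   ping -c 3 "$PEER_IP"
#   ip neigh show dev Mgmt | neigh_state "$PEER_IP"
# REACHABLE, STALE or DELAY suggests layer 2 is working; FAILED or
# INCOMPLETE means ARP requests went unanswered despite the link being up.
#
# To look for CCP traffic directly (assuming the default UDP port 8116):
#   tcpdump -nni Mgmt udp port 8116
```

If the state is FAILED or INCOMPLETE while the link is up, the switch port, VLAN assignment or cabling path on the far side becomes a reasonable suspect.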
Please send the output of the commands below from that member.
Andy
cphaprob roles
cphaprob state
cphaprob -a if
cphaprob -i list
cphaprob -l list
cphaprob syncstat
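To share the output of all of these commands as text rather than screenshots, they can be run in one pass and written to a single file. This is a minimal sketch, assuming it is run in Expert mode on the affected member; the helper name `collect_outputs` and the output path are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: run each command given as an argument and write its
# output, preceded by a header naming the command, to one text file.
collect_outputs() {
    local out_file="$1"
    shift
    local cmd
    for cmd in "$@"; do
        printf '===== %s =====\n' "$cmd"
        bash -c "$cmd" 2>&1      # capture errors as well as normal output
        printf '\n'
    done > "$out_file"
}

# Example call (the file path is only an example):
#   collect_outputs /var/tmp/fw2_cluster_state.txt \
#       "cphaprob roles" "cphaprob state" "cphaprob -a if" \
#       "cphaprob -i list" "cphaprob -l list" "cphaprob syncstat"
```

A command that fails does not stop the loop, so one missing command still leaves the rest of the output in the file.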
Hello,
Here is the output of the diagnostic commands.
Thank you for your comments.
Yeah, it is definitely something with the Mgmt interface. Can you confirm that "Get Interfaces Without Topology" on the cluster object in SmartConsole works and does not return any errors?
Andy
I tried it and got the following error message.
Does this mean the firewall itself is responsible for the error?
What does SIC show?
Andy
This is what I see for SIC communication.
It differs from my other two gateways, which work fine and show "Communicating" under "Test SIC Status".
That's your issue, then. You can reset SIC without actually having to run cpstop; cpstart, which would load the initial policy anyway if you do a SIC reset:
https://korkutozcan.com/how-to-reset-sic-without-restarting-check-point-gw/
Buddy,
Isn't this type of alert due to a connectivity problem?
Thanks.
Yes, sir.
Hey,
I followed the steps at that URL, but I get the following error.
Do you think I should check something else?
I have already reset SIC from the gateway CLI, and I also reset it in SmartConsole on the object of the firewall that is failing.
You need to find out why it fails: check routes, ping, traceroute, and take some packet captures. It appears basic connectivity is not there if even SIC can't be established, and SIC is an absolute must for policy installation to work.
Andy
My ClusterXL HA cluster has three members.
I think the problem is with the switch that the management interface of each member is connected to.
Is it advisable to check the other equipment that my failed member is connected to?
---------------------------------------------------------------------------------
ACTIVE FW
[Expert@fw1:0]# ping 172.16.113.44
PING 172.16.113.44 (172.16.113.44) 56(84) bytes of data.
64 bytes from 172.16.113.44: icmp_seq=1 ttl=64 time=0.491 ms
64 bytes from 172.16.113.44: icmp_seq=2 ttl=64 time=0.176 ms
[Expert@fw1:0]# ip r g 172.16.113.44
172.16.113.44 dev Mgmt src 172.16.113.2
cache
[Expert@fw1:0]#
[Expert@fw1:0]# traceroute 172.16.113.44
traceroute to 172.16.113.44 (172.16.113.44), 30 hops max, 40 byte packets
1 172.16.113.44 (172.16.113.44) 0.634 ms 0.648 ms 0.731 ms
[Expert@fw1:0]#
---------------------------------------------------------------------------------
1st FW STANDBY
[Expert@fw3:0]# ping 172.16.113.44
PING 172.16.113.44 (172.16.113.44) 56(84) bytes of data.
64 bytes from 172.16.113.44: icmp_seq=2 ttl=64 time=0.970 ms
64 bytes from 172.16.113.44: icmp_seq=3 ttl=64 time=0.523 ms
[Expert@fw3:0]# ip r g 172.16.113.44
172.16.113.44 dev Mgmt src 172.16.113.4
cache
[Expert@fw3:0]#
---------------------------------------------------------------------------------
2nd FW STANDBY (This is the one that is failing)
[Expert@fw2:0]# ping 172.16.113.44
PING 172.16.113.44 (172.16.113.44) 56(84) bytes of data.
From 172.16.113.3 icmp_seq=20 Destination Host Unreachable
From 172.16.113.3 icmp_seq=21 Destination Host Unreachable
[Expert@fw2:0]# ip r g 172.16.113.44
172.16.113.44 dev Mgmt src 172.16.113.3
cache
[Expert@fw2:0]# traceroute 172.16.113.44
traceroute to 172.16.113.44 (172.16.113.44), 30 hops max, 40 byte packets
1 * * *
2 * * *
3 * * *
4 * * *
5 * * *
6 * * *
7 * * *
8 * * *
9 * * *
10 * * *
11 * * *
12 * * *
13 * * *
14 * * *
15 * * *
16 * * *
Thanks. 🙂
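One detail in the fw2 output is worth reading closely: the "Destination Host Unreachable" lines come "From 172.16.113.3", which is fw2's own Mgmt source address. On Linux this usually means the sending host itself gave up waiting for an ARP reply, so the failure is at layer 2 on the local segment and not a routing problem further away. The following is a minimal sketch that classifies saved ping output along those lines; the helper name `ping_verdict` and its result labels are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read ping output on stdin and classify it.
# $1 is the local source address used for the ping (see `ip r g <target>`).
ping_verdict() {
    local src_ip="$1"
    awk -v src="$src_ip" '
        /bytes from/ { ok++ }
        /Destination Host Unreachable/ {
            # An error reported by our own address means local ARP failed.
            if ($1 == "From" && $2 == src) local_unreach++
            else remote_unreach++
        }
        END {
            if (ok > 0)                  print "reachable"
            else if (local_unreach > 0)  print "local-arp-failure"
            else if (remote_unreach > 0) print "remote-unreachable"
            else                         print "no-reply"
        }
    '
}

# Example on the failing member:
#   ping -c 5 172.16.113.44 | ping_verdict 172.16.113.3
```

A "local-arp-failure" result on only one member, while the other members on the same subnet reach the target, fits the suspicion about the switch port, VLAN or cable serving that member.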
It sort of goes without saying: you should go by process of elimination, i.e. check whatever equipment is "in the picture".
Andy
For future reference, I would always recommend troubleshooting connectivity before going straight to a SIC reset. If SIC was established and you then hit a connectivity problem, resetting SIC only leaves you with both a connectivity problem and no SIC.
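A quick pre-check along these lines could be a plain TCP reachability test between the management server and the gateway before SIC is touched. This is a minimal sketch, assuming bash with `/dev/tcp` support and the `timeout` utility are available; the helper name `port_open` and the `GW_IP` placeholder are hypothetical, and the port numbers in the example are the commonly documented SIC-related ports, which should be verified for the version in use.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if a TCP connection to host:port succeeds
# within the timeout (in seconds, default 3), non-zero otherwise.
port_open() {
    local host="$1" port="$2" wait_s="${3:-3}"
    timeout "$wait_s" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Example, run from the management server towards the gateway.
# GW_IP is a placeholder; 18191 and 18211 are assumed SIC-related ports.
#   for p in 18191 18211; do
#       if port_open "$GW_IP" "$p"; then
#           echo "tcp/$p reachable"
#       else
#           echo "tcp/$p NOT reachable"
#       fi
#   done
```

If these checks already fail, a SIC reset is unlikely to help, since the same missing connectivity will block SIC from being re-established.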
For sure, 100%. Personally, that's what I always do when people have such an issue.
Andy
