Matlu
Advisor

ClusterXL Down

Hello,

I currently have a 3-member ClusterXL HA cluster.
One of the members, which was in "Standby" state, went to "DOWN" state a few days ago.

-------------------------------------------------------------------------------------------------------------------------------------

[Expert@fw2:0]# cphaprob show_failover

Last cluster failover event:
Transition to new ACTIVE: Member 1 -> Member 2
Reason: Interface Mgmt is down (Cluster Control Protocol packets are not received)
Event time: Sat Jan 13 08:30:25 2024

Cluster failover count:
Failover counter: 139
Time of counter reset: Fri Jul 28 09:33:23 2023 (reboot)


Cluster failover history (last 20 failovers since reboot/reset on Fri Jul 28 09:33:50 2023):

No. Time: Transition: CPU: Reason:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1 Sat Jan 13 08:30:25 2024 Member 1 -> Member 2 06 Interface Mgmt is down (Cluster Control Protocol packets are not received)
2 Thu Jan 11 21:23:41 2024 Member 3 -> Member 1 14 Incorrect configuration - Local cluster member has fewer cluster interfaces configured compared to other cluster member(s)


------------------------------------------------------------------------------------------------------------------------------------

[Expert@fw2:0]# ethtool Mgmt
Settings for Mgmt:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: on (auto)
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes

-------------------------------------------------------------------------------------------------------------------------------------

What I have found is that the diagnostic commands report the box's "Mgmt" interface as "Down", but the interface is physically and logically normal (up, with link).

The "ethtool Mgmt" output also shows that the box detects the connected cable ("Link detected: yes").

Could this error be caused by the equipment on the other end of the cable plugged into the Mgmt port (a switch or some other device)?

Greetings.
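The failover reason is specific: "Cluster Control Protocol packets are not received" on Mgmt. "Link detected: yes" only proves the physical link is up, so the next question is whether CCP from the peer members actually arrives on that interface. CCP uses UDP port 8116. A minimal sketch for Expert mode, assuming tcpdump is available there; the function name `ccp_capture` and the default packet count are hypothetical:

```shell
# Hypothetical helper: capture a bounded number of CCP packets (UDP 8116)
# on one interface, to see whether the peer members' CCP reaches this box.
ccp_capture() {
    local iface="${1:-Mgmt}"   # interface named in the failover reason
    local count="${2:-20}"     # stop after this many packets
    # -nn: no host/port name resolution, -i: interface, -c: packet count
    tcpdump -nni "$iface" -c "$count" udp port 8116
}
```

Running `ccp_capture Mgmt 20` on fw2 and on a healthy member, and comparing the source addresses seen, would show whether fw2 receives its peers' CCP at all. If the healthy members see each other but fw2 sees nothing, the switch port or cabling on fw2's side is a reasonable suspect.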

15 Replies
the_rock
Legend

Please send the output of the commands below from that member.

Andy

cphaprob roles

cphaprob state

cphaprob -a if

cphaprob -i list

cphaprob -l list

cphaprob syncstat
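The commands requested above can be collected in one pass so the output is easy to attach to a reply. A minimal sketch for Expert mode that uses only the `cphaprob` arguments listed above; the function name `collect_cluster_diag` and the default output path `/tmp/cluster_diag.txt` are assumptions:

```shell
# Hypothetical helper: run each cphaprob command from the list above and
# append its output, under a header line, to a single file.
collect_cluster_diag() {
    local out="${1:-/tmp/cluster_diag.txt}"   # assumed output path
    : > "$out"                                # create or truncate the file
    for args in "roles" "state" "-a if" "-i list" "-l list" "syncstat"; do
        echo "=== cphaprob $args ===" >> "$out"
        # $args is left unquoted on purpose so "-a if" becomes two arguments
        cphaprob $args >> "$out" 2>&1
        echo >> "$out"
    done
}
```

For example, `collect_cluster_diag /tmp/fw2_diag.txt` on the failing member, then attach the resulting file.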

Matlu
Advisor

Hello,

Here are the results of the diagnostic commands.

Thank you for your comments.

the_rock
Legend

Yeah, it's definitely something with the Mgmt interface. Can you confirm that "Get Interfaces Without Topology" on the cluster object in SmartConsole completes without any errors?

Andy

Matlu
Advisor

I tried it, and I got the following error message.

CL1.png

Does this mean the firewall itself is responsible for the error?

the_rock
Legend

What does SIC show?

Andy

Matlu
Advisor

I notice this in the SIC communication status:

CL2.png

This is unlike my other 2 gateways, which work fine and show "Communicating" under "Test SIC Status".

the_rock
Legend

That's your issue, then. You can reset SIC without actually having to run cpstop; cpstart, which would load the initial policy anyway if you reset SIC:

https://korkutozcan.com/how-to-reset-sic-without-restarting-check-point-gw/

Matlu
Advisor

Buddy,

Isn't this type of alert due to a connectivity problem?

Thanks.

the_rock
Legend

Yes, sir.

Matlu
Advisor

Hey,

I followed the steps at that URL, but I get the following error.

CL3.png

Do you think I should check something else?

I already reset SIC on the gateway CLI, and I also reset it on the "corrupted" firewall object in SmartConsole.

the_rock
Legend

You need to find out why it fails: check routes, ping, traceroute, and take some captures. If even SIC can't be established, it appears basic connectivity is not there, and that connectivity is an absolute must for policy installation to work.

Andy
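The checks suggested above (route lookup, ping, and the neighbor/ARP entry) can be run as a group from each member towards the same target, so the results are directly comparable. A minimal sketch, assuming the standard iproute2 `ip` and iputils `ping` available in Expert mode; `check_path` is a hypothetical name, and 172.16.113.44 is only the address the members are tested against elsewhere in this thread:

```shell
# Hypothetical helper: basic layer-3 and layer-2 checks from this member
# towards one target on the Mgmt network.
check_path() {
    local target="$1"
    printf '%s\n' '--- route ---'
    ip route get "$target"            # which interface/source would be used
    printf '%s\n' '--- ping ---'
    ping -c 3 -W 1 "$target"          # 3 probes, 1 second reply timeout each
    printf '%s\n' '--- neighbor (ARP) entry ---'
    ip neigh show "$target"           # has the target's MAC been resolved?
}
```

Compare `check_path 172.16.113.44` across all three members. For the capture part, something like `tcpdump -nni Mgmt arp or host 172.16.113.44` running on the same member while the ping runs shows whether ARP requests leave the box and whether any replies come back.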

Matlu
Advisor

My ClusterXL HA cluster has 3 members.

I think the problem is with the switch that the management interfaces of each box are connected to.

Is it advisable to check the other equipment that my failed box is connected to?

---------------------------------------------------------------------------------

ACTIVE FW

[Expert@fw1:0]# ping 172.16.113.44
PING 172.16.113.44 (172.16.113.44) 56(84) bytes of data.
64 bytes from 172.16.113.44: icmp_seq=1 ttl=64 time=0.491 ms
64 bytes from 172.16.113.44: icmp_seq=2 ttl=64 time=0.176 ms

[Expert@fw1:0]# ip r g 172.16.113.44
172.16.113.44 dev Mgmt src 172.16.113.2
cache
[Expert@fw1:0]#
[Expert@fw1:0]# traceroute 172.16.113.44
traceroute to 172.16.113.44 (172.16.113.44), 30 hops max, 40 byte packets
1 172.16.113.44 (172.16.113.44) 0.634 ms 0.648 ms 0.731 ms
[Expert@fw1:0]#

---------------------------------------------------------------------------------

1st FW STANDBY

[Expert@fw3:0]# ping 172.16.113.44
PING 172.16.113.44 (172.16.113.44) 56(84) bytes of data.
64 bytes from 172.16.113.44: icmp_seq=2 ttl=64 time=0.970 ms
64 bytes from 172.16.113.44: icmp_seq=3 ttl=64 time=0.523 ms

[Expert@fw3:0]# ip r g 172.16.113.44
172.16.113.44 dev Mgmt src 172.16.113.4
cache
[Expert@fw3:0]#

---------------------------------------------------------------------------------

2nd FW STANDBY (This is the one that is failing)

[Expert@fw2:0]# ping 172.16.113.44
PING 172.16.113.44 (172.16.113.44) 56(84) bytes of data.
From 172.16.113.3 icmp_seq=20 Destination Host Unreachable
From 172.16.113.3 icmp_seq=21 Destination Host Unreachable

[Expert@fw2:0]# ip r g 172.16.113.44
172.16.113.44 dev Mgmt src 172.16.113.3
cache

[Expert@fw2:0]# traceroute 172.16.113.44
traceroute to 172.16.113.44 (172.16.113.44), 30 hops max, 40 byte packets
1 * * *
2 * * *
3 * * *
4 * * *
5 * * *
6 * * *
7 * * *
8 * * *
9 * * *
10 * * *
11 * * *
12 * * *
13 * * *
14 * * *
15 * * *
16 * * *

Thanks. 🙂
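The three outputs above differ in a telling way: fw1 and fw3 get replies, while on fw2 the "Destination Host Unreachable" message comes from fw2's own address (172.16.113.3), even though `ip r g` shows the expected route out of Mgmt. On Linux that pattern usually means the ARP request for the target got no answer, which points at layer 2 on the local segment rather than at routing. A small sketch to classify saved ping output the same way on each member; `classify_ping` is a hypothetical name:

```shell
# Hypothetical helper: classify a file containing saved ping output.
#   reply       : at least one echo reply came back
#   unreachable : the local host reported "Destination Host Unreachable"
#                 (on Linux this usually means ARP for the target failed)
#   no_reply    : anything else, e.g. silent packet loss
classify_ping() {
    if grep -q 'bytes from' "$1"; then
        echo reply
    elif grep -q 'Destination Host Unreachable' "$1"; then
        echo unreachable
    else
        echo no_reply
    fi
}
```

For example: `ping -c 3 172.16.113.44 > /tmp/ping.txt 2>&1; classify_ping /tmp/ping.txt`.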

the_rock
Legend

It sort of goes without saying: you should go by process of elimination, i.e. check whatever equipment is "in the picture".

Andy

emmap
Employee

For future reference, I would always recommend troubleshooting the connectivity before going straight to resetting SIC. If SIC was established and you then have a connectivity problem, resetting SIC only results in both a connectivity problem and also no SIC. 
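Following that advice, a pre-check can confirm basic reachability of the management server before anyone resets SIC. A minimal sketch; the function name is hypothetical and it only reports, never resets anything. A successful ping does not prove that the ports SIC needs are open, but a failed ping is a strong hint that a SIC reset will not help yet:

```shell
# Hypothetical pre-check: is the management server reachable at all?
# Prints a verdict and returns 0 (reachable) or 1 (not reachable).
precheck_before_sic_reset() {
    local mgmt="$1"   # management server IP (supplied by the admin)
    if ping -c 3 -W 1 "$mgmt" > /dev/null 2>&1; then
        echo "OK: $mgmt reachable, continue SIC troubleshooting"
        return 0
    fi
    echo "STOP: $mgmt not reachable, fix connectivity before resetting SIC"
    return 1
}
```

For example, `precheck_before_sic_reset <management server IP>` on the member in question.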

the_rock
Legend

For sure, 100%. Personally, that's what I always do when people have such an issue.

Andy
