Standby cluster member not logging to SMS

Mike_Jensen · ‎2023-06-02

I have an active/standby two-gateway ClusterXL cluster running R80.40 with JHA Take 196.

Whichever gateway is the standby member logs locally as it's unable to connect to the SMS.

A "tcpdump -nni any port 257" taken on the standby gateway itself shows the standby cluster member sending logs with the cluster VIP as the source IP. All of the packets are TCP SYNs, so no connection to the SMS is ever established.

The logs are being sent out the interface closest to the SMS rather than over the sync interface to the active member, which is how I thought it was supposed to work in R80.40.

If I do an admin failover, the previously standby member, which is now active, starts logging to the SMS again and the new standby member stops.

A "tcpdump -nni any port 257" taken on the new standby member shows it basically logging to itself:

15:27:18.168847 IP 127.0.0.1.59453 > 127.0.0.1.257: Flags [P.], seq 84:88, ack 65, win 43, options [nop,nop,TS val 3386217330 ecr 3386217330], length 4
15:27:18.168906 IP 127.0.0.1.257 > 127.0.0.1.59453: Flags [P.], seq 65:69, ack 88, win 43, options [nop,nop,TS val 3386217330 ecr 3386217330], length 4
15:27:18.168939 IP 127.0.0.1.59453 > 127.0.0.1.257: Flags [P.], seq 88:92, ack 69, win 43, options [nop,nop,TS val 3386217330 ecr 3386217330], length 4
15:27:18.208444 IP 127.0.0.1.257 > 127.0.0.1.59453: Flags [.], ack 92, win 43, options [nop,nop,TS val 3386217370 ecr 3386217330], length 0
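To make the two capture patterns described above easier to tell apart, here is a minimal sketch that summarises saved tcpdump text output for port 257. The function name summarize_257 is hypothetical, and it assumes the usual "tcpdump -nn" line format shown above. It counts loopback packets, bare SYNs (a run of these with nothing else suggests the session to the SMS never completes), and everything else.

```shell
#!/bin/sh
# Summarise port-257 traffic from `tcpdump -nn` text output read on stdin.
# summarize_257 is a hypothetical helper, not a Check Point tool.
summarize_257() {
    awk '
        # Both endpoints on 127.0.0.1: the member is talking to itself.
        / 127\.0\.0\.1\.[0-9]+ > 127\.0\.0\.1\.[0-9]+:/ { loopback++; next }
        # Flags [S] alone is a bare SYN (a SYN/ACK would print as [S.]).
        /Flags \[S\]/                                   { syn++; next }
        # Any other packet that carries TCP flags.
        /Flags \[/                                      { other++ }
        END { printf "loopback=%d syn_only=%d other=%d\n", loopback, syn, other }
    '
}
```

One possible way to use it is to pipe a bounded capture into the function, for example: tcpdump -nni any port 257 -c 200 2>/dev/null | summarize_257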

I looked through sk169154 "Asymmetric Connections in ClusterXL R80.20 and Higher" and have tried issuing " but can't seem to resolve this issue.

I have also tried the "fw ctl set int fwha_cluster_hide_active_only 0" command on both gateway members without any success.

the_rock · ‎2023-06-02

At least to me, reading your post, it sounds like something related to clustering is causing this, as it happens regardless of which one is the backup. Just to make sure, can you please send the output of the commands below (blur out any sensitive info).

cphaprob roles

cphaprob state

cphaprob -a if

cphaprob syncstat

cphaprob -i list

Andy

the_rock · ‎2023-06-02

Also, another thing I forgot... make sure the content of the file below is the same on both gateways.

Andy

cat $FWDIR/conf/masters

Mike_Jensen · ‎2023-06-05

Here is the output from the commands with the hostnames and IPs redacted:

Taken on the standby member

[Expert@XXXXXX-FWB:0]# cphaprob roles

ID Role

1 Master
2 (local) Non-Master

[Expert@XXXXXXFWB:0]# cphaprob stat

Cluster Mode: High Availability (Active Up) with IGMP Membership

ID Unique Address Assigned Load State Name

1 172.25.1.2 100% ACTIVE xxxx_sg1
2 (local) 172.25.1.3 0% STANDBY xxxx_sg2

Active PNOTEs: None

Last member state change event:
Event Code: CLUS-114802
State change: DOWN -> STANDBY
Reason for state change: There is already an ACTIVE member in the cluster (member 1)
Event time: Fri Jun 2 15:28:46 2023

Last cluster failover event:
Transition to new ACTIVE: Member 2 -> Member 1
Reason: ADMIN_DOWN PNOTE
Event time: Fri Jun 2 15:28:41 2023

Cluster failover count:
Failover counter: 69
Time of counter reset: Wed Feb 9 22:16:32 2022 (reboot)

[Expert@XXXXXXFWB:0]# cphaprob -a if

CCP mode: Manual (Unicast)
Required interfaces: 10
Required secured interfaces: 1

Interface Name: Status:

eth1 UP
Sync (S) UP
bond1.1026 (LS) UP
eth6.32 UP
bond2.1028 (LS) UP
eth7.17 UP
bond2.51 (LS) UP
eth6.1027 UP
bond1.50 (LS) UP
eth7.19 UP

S - sync, LM - link monitor, HA/LS - bond type

Virtual cluster interfaces: 19

eth1 x.x.x.x
eth7 x.x.x.x
eth6.39 x.x.x.x
bond1.1026 x.x.x.x
eth6.32 x.x.x.x
eth6.52 x.x.x.x
bond2.1028 x.x.x.x
bond2.1025 x.x.x.x
eth7.17 x.x.x.x
eth6.38 x.x.x.x
bond2.51 x.x.x.x
eth6.45 x.x.x.x
eth6.36 x.x.x.x
eth6.1027 x.x.x.x
bond1.50 x.x.x.x
eth6.47 x.x.x.x
eth6.46 x.x.x.x
eth7.19 x.x.x.x
eth6.48 x.x.x.x

[Expert@XXXXFWB:0]# cphaprob syncstat

Delta Sync Statistics

Sync status: OK

Drops:
Lost updates................................. 0
Lost bulk update events...................... 0
Oversized updates not sent................... 0

Sync at risk:
Sent reject notifications.................... 0
Received reject notifications................ 0

Sent messages:
Total generated sync messages................ 5826087
Sent retransmission requests................. 29
Sent retransmission updates.................. 247
Peak fragments per update.................... 1

Received messages:
Total received updates....................... 171176321
Received retransmission requests............. 50

Sync Interface:
Name......................................... Sync
Link speed................................... 1000Mb/s
Rate......................................... 74870 [Bps]
Peak rate.................................... 1283 [KBps]
Link usage................................... 0%
Total........................................ 248215[MB]

Queue sizes (num of updates):
Sending queue size........................... 512
Receiving queue size......................... 256
Fragments queue size......................... 50

Timers:
Delta Sync interval (ms)..................... 100

Reset on Mon Apr 24 10:31:31 2023 (triggered by fullsync).

[Expert@XXXXXFWB:0]# cphaprob -i list

There are no pnotes in problem state

------------------

From the standby member:

cat $FWDIR/conf/masters

[Policy]
xxxx-fw1
[Log]
xxxx-fw1
[Alert]
xxxx-fw1
[Backup]
xxxx-fw2

From the active member:

[Policy]
xxxx-fw1
[Log]
xxxx-fw1
[Alert]
xxxx-fw1
[Backup]
xxxx-fw2
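The two listings above can also be checked mechanically rather than by eye. This is a minimal sketch, assuming the bracketed-section layout shown above; masters_section is a hypothetical helper name. It prints the entries under one section of a masters-style file, so the [Log] target can be pulled out on each member and compared.

```shell
#!/bin/sh
# Print the entries listed under one section of a masters-style file.
# $1: section name without brackets (e.g. Log); $2: path to the file.
# masters_section is a hypothetical helper, not a Check Point tool.
masters_section() {
    awk -v want="[$1]" '
        # A bracketed line starts a new section; note whether it is the one we want.
        /^\[.*\]$/ { in_section = ($0 == want); next }
        # Inside the wanted section, print each non-empty entry.
        in_section && NF { print $1 }
    ' "$2"
}
```

For example, running masters_section Log $FWDIR/conf/masters on each member and comparing the results would show whether both point at the same log server.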
