Sync interface DOWN after reboot of Standby Member - any other TShoot options?
Hi all, we have an issue with our VSX HA cluster (two gateways, Active/Standby) where, after the Standby is rebooted for whatever reason, the Sync interface stays DOWN. In the past when this happened, physically powering down the Standby restored the link, but a normal reboot does not (nor does bouncing the link).
We're in the process of ruling out physical problems, in particular by replacing the cable and SFP for this link. In the meantime, are there any other troubleshooting steps I could take?
[ACTIVE] SYNC (eth3-04) <----> (eth3-04) SYNC [STANDBY]
Currently we have no HA resiliency: all VSs are DOWN on the standby, which isn't ideal.
Interface counters show no incrementing RX or TX on either side.
cphaprob syncstat does show incrementing SENT sync messages, but no received messages.
My theory is that the SFP/transceiver may be faulty: perhaps a normal reboot doesn't cut power to the SFP but a full physical power down does, which might be what brings the link back up. I'm not sure, though.
I appreciate any thoughts!
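The RX/TX counter check described above can also be done from the gateway's Linux layer by sampling the kernel's own per-interface counters. A minimal Python sketch, assuming Gaia exposes the standard Linux sysfs files under `/sys/class/net/<interface>/` and that a Python 3 interpreter is available (older Gaia releases may not ship one; `cat /sys/class/net/eth3-04/statistics/rx_packets` reads the same value by hand). The function names are hypothetical helpers, not Check Point tools.

```python
from __future__ import annotations

import time
from pathlib import Path

# Assumption: Gaia exposes the standard Linux sysfs layout; verify on your release.
SYSFS_NET = Path("/sys/class/net")


def read_counters(iface: str, root: Path = SYSFS_NET) -> dict[str, int]:
    """Read the kernel's cumulative RX/TX packet counters for one interface."""
    stats = root / iface / "statistics"
    return {
        name: int((stats / name).read_text().strip())
        for name in ("rx_packets", "tx_packets")
    }


def read_carrier(iface: str, root: Path = SYSFS_NET) -> bool | None:
    """Return True if the kernel reports link carrier, False if it does not.

    Returns None when the value can't be read, e.g. the path doesn't exist or the
    kernel refuses the read (typically EINVAL while the interface is admin down).
    """
    try:
        return (root / iface / "carrier").read_text().strip() == "1"
    except OSError:
        return None


def counter_deltas(iface: str, interval: float = 5.0, root: Path = SYSFS_NET) -> dict[str, int]:
    """Sample the counters twice, `interval` seconds apart, and return the increase."""
    before = read_counters(iface, root)
    time.sleep(interval)
    after = read_counters(iface, root)
    return {name: after[name] - before[name] for name in before}
```

On the standby, `read_carrier("eth3-04")` returning False (or None) together with zero deltas from `counter_deltas("eth3-04")` would be consistent with a fault below the cluster layer, i.e. nothing is arriving for ClusterXL to process.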
Accepted Solutions
Hi all, to confirm: it was a faulty SFP, so indeed a physical issue. Check Point approved an RMA for the SFP, and the replacement brought the link back online.
Thanks all for your assistance.
Could you run the commands below and share the results?
cphaprob stat
cphaprob -a if
tcpdump -nni <sync interface name> port 8116
Thanks for the reply, Tron. Please see below (I omitted some details such as hostnames/IPs).
Even running tcpdump without a port filter shows no packets at all on the interface, so the link seems completely dead, which makes me think it must be a physical issue.
Standby_Gateway:0> cphaprob stat
Cluster Mode: Virtual System Load Sharing (Primary Up)
ID Unique Address Assigned Load State Name
1 x.x.x.x 100% ACTIVE(!) Primary_Gateway
2 (local) x.x.x.x 0% DOWN Standby_Gateway
Active PNOTEs: IAC
Last member state change event:
Event Code: CLUS-110205
State change: ACTIVE(!) -> DOWN
Reason for state change: Interface eth3-04 is down (disconnected / link down)
Event time: Mon Aug 7 13:39:34 2023
Last cluster failover event:
Transition to new ACTIVE: Member 1 -> Member 2
Reason: Available on member 1
Event time: Mon Aug 7 13:39:01 2023
Cluster failover count:
Failover counter: 7
Time of counter reset: Tue Sep 6 17:00:37 2022 (reboot)
Cluster name: Cluster
Virtual Devices Status on each Cluster Member
=============================================
ID | Weight| Primary | Standby
| | |
| | |
| | | [local]
-------+-------+-----------+-----------
2 | 10 | ACTIVE(!) | DOWN
3 | 10 | ACTIVE(!) | DOWN
---------------+-----------+-----------
Active | 2 | 0
Weight | 20 | 0
Weight (%) | 100 | 0
Legend: Init - Initializing, Active! - Active Attention
Down! - ClusterXL Inactive or Virtual System is Down
Standby_Gateway:0> cphaprob -a if
vsid 0:
------
CCP mode: Manual (Unicast)
Required interfaces: 1
Required secured interfaces: 0
Interface Name: Status:
eth1-01 UP
eth3-04 (S) DOWN (72062 secs)
S - sync, HA/LS - bond type, LM - link monitor, P - probing
Virtual cluster interfaces: 1
eth1-01 x.x.x.x
[Expert@Standby_Gateway:0]# tcpdump -nni eth3-04 port 8116
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth3-04, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
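The `cphaprob -a if` table above can also be checked mechanically, for example when polling several members. A minimal Python sketch, assuming the line layout matches the output posted here (e.g. `eth3-04 (S) DOWN (72062 secs)`); other versions may format it differently, and the helper names are hypothetical.

```python
from __future__ import annotations

import re

# Assumption: layout taken from the `cphaprob -a if` output in this thread.
IF_LINE = re.compile(
    r"^(?P<name>\S+)\s+"
    r"(?:\((?P<flags>[^)]*)\)\s+)?"       # optional flags such as (S) for sync
    r"(?P<state>UP|DOWN)\b"
    r"(?:\s+\((?P<secs>\d+)\s+secs\))?"   # optional duration, presumably time in this state
)


def parse_cphaprob_if(text: str) -> list[dict]:
    """Extract name, sync flag, state and duration from each interface line."""
    interfaces = []
    for line in text.splitlines():
        m = IF_LINE.match(line.strip())
        if m is None:
            continue  # headers, legend and the virtual cluster interface list
        flags = re.findall(r"[A-Z]+", m.group("flags") or "")
        interfaces.append({
            "name": m.group("name"),
            "sync": "S" in flags,
            "state": m.group("state"),
            "secs": int(m.group("secs")) if m.group("secs") else None,
        })
    return interfaces


def down_sync_interfaces(interfaces: list[dict]) -> list[str]:
    """Names of sync interfaces the local member reports as DOWN."""
    return [i["name"] for i in interfaces if i["sync"] and i["state"] == "DOWN"]


def format_secs(secs: int) -> str:
    """Render a duration in seconds as H:MM:SS."""
    hours, rem = divmod(secs, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"
```

Against the output above, `down_sync_interfaces(parse_cphaprob_if(output))` gives `["eth3-04"]`, and `format_secs(72062)` gives `20:01:02`, i.e. roughly 20 hours, if that counter is the time since the interface went DOWN.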
Thanks for your response.
From the information you provided, we can see:
Interface eth3-04 is down (disconnected / link down)
This is what puts the member's HA state into DOWN. Let's check this interface: where is it physically connected, and does it go through any switches?
Were there any recent changes?
How are the links cabled? Are the gateways connected directly to each other (not recommended) or via a switch?
My preferred way is to have two sync interfaces in a non-LACP bond (e.g. round robin works) going to two separate switches.
True, there are definitely switches between them (they're not directly connected), so this could also be a factor.
Dear Bro,
Please check the status of the physical interface, or compare the VLAN access configuration for that interface on both sides.
Why is a direct cable between the firewalls not recommended? In my opinion, a switch is an added point of failure.
I see this as depending on the user's needs: you can run the cable directly between the two devices as long as both are in the same rack.
If the devices are in two different racks, connecting through a switch keeps the cabling tidier and makes it easier to replace cables when there is a physical-layer problem.
First off, Check Point publishes guidance on supported topologies for the sync network; note that every one of them specifies a switch.
I could build out a couple of failure scenarios, but @Bob_Zimmerman has already done a better job of it than I could in this CheckMates post.
If you are concerned about a switch being a single point of failure (SPOF), then it is likely a SPOF for other things in your environment as well. Solve this with two sync interfaces in a non-LACP bond (e.g. round robin works) going to two separate switches.
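The SPOF trade-off discussed here can be made concrete by listing which single component failures break every sync path. A toy Python model with made-up component names: it only counts link-level components, assumes the bond stops using a failed slave, and treats the two bond slaves as fully independent (separate NIC ports, SFPs, cables and switches). It does not capture ClusterXL behavior such as which member declares itself DOWN, which is the kind of failure scenario the linked post covers.

```python
from __future__ import annotations

# Toy model: each topology is a list of alternative sync paths, and each path is
# the set of components it depends on. Component names are hypothetical.
TOPOLOGIES: dict[str, list[set[str]]] = {
    "direct cable": [
        {"gw1_nic", "gw1_sfp", "cable", "gw2_sfp", "gw2_nic"},
    ],
    "single switch": [
        {"gw1_nic", "gw1_sfp", "cable1", "switch", "cable2", "gw2_sfp", "gw2_nic"},
    ],
    "bond over two switches": [  # e.g. a round-robin bond, one slave per switch
        {"gw1_nic_a", "gw1_sfp_a", "cable_a1", "switch_a", "cable_a2", "gw2_sfp_a", "gw2_nic_a"},
        {"gw1_nic_b", "gw1_sfp_b", "cable_b1", "switch_b", "cable_b2", "gw2_sfp_b", "gw2_nic_b"},
    ],
}


def sync_survives(paths: list[set[str]], failed: set[str]) -> bool:
    """Sync stays up if at least one path contains none of the failed components."""
    return any(not (path & failed) for path in paths)


def single_points_of_failure(paths: list[set[str]]) -> list[str]:
    """Components whose failure alone takes down every sync path."""
    components = set().union(*paths)
    return sorted(c for c in components if not sync_survives(paths, {c}))


# Prints 5, 7 and 0 single points of failure respectively.
for name, paths in TOPOLOGIES.items():
    print(f"{name}: {len(single_points_of_failure(paths))} single points of failure")
```

In this model a single switch does add a SPOF compared with a direct cable, which is the concern raised above; the bond over two switches is what removes every single-component failure.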
That's good news =))