merscoob
Explorer

Packet loss in 1800 HA Cluster internet probes caused by logging into webui

Current image name: R81_996001397_10_07
Current image version: 397

Bit of an odd issue with a brand new 1800 HA cluster, which I wonder if anyone else has seen?

All works well when left alone, but if I log into the WebUI on the secondary, it seems to trigger packet loss on the internet probes, which causes a wobble and a failover event.

2024 May 16 21:01:10 FIREWALLHOSTNAME auth.notice login: [WebUI] Local User 'admin' logged-in to WebUI from '172.16.10.1' as 'Super Admin'
2024 May 16 21:01:11 FIREWALLHOSTNAME daemon.info dnsmasq: reading /etc/resolv.conf
2024 May 16 21:01:11 FIREWALLHOSTNAME daemon.info dnsmasq: using nameserver 217.172.141.44#53
2024 May 16 21:01:11 FIREWALLHOSTNAME daemon.info dnsmasq: using nameserver 8.8.8.8#53
2024 May 16 21:01:11 FIREWALLHOSTNAME daemon.info dnsmasq: read /var/hosts - 17 addresses
2024 May 16 21:01:12 FIREWALLHOSTNAME user.info lua: [Security Settings] A policy change has been applied
2024 May 16 21:01:12 FIREWALLHOSTNAME user.info lua: [Security Settings] High Availability policy change has been applied
2024 May 16 21:01:14 FIREWALLHOSTNAME daemon.info dnsmasq: reading /etc/resolv.conf
2024 May 16 21:01:14 FIREWALLHOSTNAME daemon.info dnsmasq: using nameserver 217.172.141.44#53
2024 May 16 21:01:14 FIREWALLHOSTNAME daemon.info dnsmasq: using nameserver 8.8.8.8#53
2024 May 16 21:01:14 FIREWALLHOSTNAME daemon.info dnsmasq: read /var/hosts - 17 addresses
2024 May 16 21:01:16 FIREWALLHOSTNAME daemon.info dnsmasq: reading /etc/resolv.conf
2024 May 16 21:01:16 FIREWALLHOSTNAME daemon.info dnsmasq: using nameserver 217.172.141.44#53
2024 May 16 21:01:16 FIREWALLHOSTNAME daemon.info dnsmasq: using nameserver 8.8.8.8#53
2024 May 16 21:01:16 FIREWALLHOSTNAME daemon.info dnsmasq: read /var/hosts - 17 addresses
2024 May 16 21:01:28 FIREWALLHOSTNAME user.err cposd: [CPOSD] Error: Could not resolve name for probed server dns.cloudflare.com
2024 May 16 21:01:28 FIREWALLHOSTNAME user.err cposd: [CPOSD] Error: Could not resolve name for probed server dns.opendns.com
2024 May 16 21:01:28 FIREWALLHOSTNAME user.err cposd: [CPOSD] Error: Could not resolve name for probed server dns.cloudflare.com
2024 May 16 21:01:31 FIREWALLHOSTNAME user.err cposd: [CPOSD] Error: Could not resolve name for probed server dns.cloudflare.com
2024 May 16 21:01:31 FIREWALLHOSTNAME user.err cposd: [CPOSD] Error: Could not resolve name for probed server dns.opendns.com
2024 May 16 21:01:31 FIREWALLHOSTNAME user.info cposd: [CPOSD] WAN connection "Internet1": Internet connection probe status has changed to Disconnected. servers: 3, fails: 10, attempts: 30
2024 May 16 21:01:31 FIREWALLHOSTNAME daemon.info dnsmasq: reading /etc/resolv.conf
2024 May 16 21:01:31 FIREWALLHOSTNAME daemon.info dnsmasq: using nameserver 217.172.141.44#53
2024 May 16 21:01:31 FIREWALLHOSTNAME daemon.info dnsmasq: using nameserver 8.8.8.8#53
2024 May 16 21:01:31 FIREWALLHOSTNAME daemon.info dnsmasq: read /var/hosts - 17 addresses
2024 May 16 21:01:58 FIREWALLHOSTNAME user.info cposd: [CPOSD] WAN connection "Internet1": Internet connection probe status has changed to Connected. servers: 3, fails: 9, attempts: 30
2024 May 16 21:03:00 FIREWALLHOSTNAME daemon.info dnsmasq: reading /etc/resolv.conf
2024 May 16 21:03:00 FIREWALLHOSTNAME daemon.info dnsmasq: using nameserver 217.172.141.44#53
2024 May 16 21:03:00 FIREWALLHOSTNAME daemon.info dnsmasq: using nameserver 8.8.8.8#53
2024 May 16 21:03:00 FIREWALLHOSTNAME daemon.info dnsmasq: read /var/hosts - 17 addresses
2024 May 16 21:03:18 FIREWALLHOSTNAME user.err cposd: [CPOSD] Error: Could not resolve name for probed server dns.cloudflare.com
2024 May 16 21:03:18 FIREWALLHOSTNAME user.err cposd: [CPOSD] Error: Could not resolve name for probed server dns.opendns.com
2024 May 16 21:03:21 FIREWALLHOSTNAME user.err cposd: [CPOSD] Error: Could not resolve name for probed server dns.cloudflare.com
2024 May 16 21:03:21 FIREWALLHOSTNAME user.err cposd: [CPOSD] Error: Could not resolve name for probed server dns.opendns.com
2024 May 16 21:03:21 FIREWALLHOSTNAME user.info cposd: [CPOSD] WAN connection "Internet1": Internet connection probe status has changed to Disconnected. servers: 3, fails: 10, attempts: 30
2024 May 16 21:03:21 FIREWALLHOSTNAME daemon.info dnsmasq: reading /etc/resolv.conf
2024 May 16 21:03:21 FIREWALLHOSTNAME daemon.info dnsmasq: using nameserver 217.172.141.44#53
2024 May 16 21:03:21 FIREWALLHOSTNAME daemon.info dnsmasq: using nameserver 8.8.8.8#53
2024 May 16 21:03:21 FIREWALLHOSTNAME daemon.info dnsmasq: read /var/hosts - 17 addresses
2024 May 16 21:03:35 FIREWALLHOSTNAME auth.notice login: [WebUI] Local User 'admin' logged-in to WebUI from '172.16.10.1' as 'Super Admin'
2024 May 16 21:03:39 FIREWALLHOSTNAME daemon.info dnsmasq: reading /etc/resolv.conf
2024 May 16 21:03:39 FIREWALLHOSTNAME daemon.info dnsmasq: using nameserver 217.172.141.44#53
2024 May 16 21:03:39 FIREWALLHOSTNAME daemon.info dnsmasq: using nameserver 8.8.8.8#53
2024 May 16 21:03:39 FIREWALLHOSTNAME daemon.info dnsmasq: read /var/hosts - 17 addresses
2024 May 16 21:03:40 FIREWALLHOSTNAME daemon.info dnsmasq: read /var/hosts - 17 addresses
2024 May 16 21:03:41 FIREWALLHOSTNAME daemon.info dnsmasq: reading /etc/resolv.conf
2024 May 16 21:03:41 FIREWALLHOSTNAME daemon.info dnsmasq: using nameserver 217.172.141.44#53
2024 May 16 21:03:41 FIREWALLHOSTNAME daemon.info dnsmasq: using nameserver 8.8.8.8#53
2024 May 16 21:03:45 FIREWALLHOSTNAME user.info cposd: [CPOSD] WAN connection "Internet1": Internet connection probe status has changed to Connected. servers: 3, fails: 9, attempts: 30
2024 May 16 21:03:46 FIREWALLHOSTNAME user.notice discntd_ctrl: Started...
2024 May 16 21:03:46 FIREWALLHOSTNAME user.notice discntd_ctrl: Called as sender...
2024 May 16 21:03:46 FIREWALLHOSTNAME user.notice discntd_ctrl: File has changed...
2024 May 16 21:03:46 FIREWALLHOSTNAME user.notice discntd_apply: Started...
2024 May 16 21:03:57 FIREWALLHOSTNAME user.notice discntd_apply: Done...
2024 May 16 21:04:19 FIREWALLHOSTNAME user.notice discntd_ctrl: Started...
2024 May 16 21:04:19 FIREWALLHOSTNAME user.notice discntd_ctrl: Called as sender...
2024 May 16 21:04:19 FIREWALLHOSTNAME user.notice discntd_ctrl: File has changed...
2024 May 16 21:04:19 FIREWALLHOSTNAME user.notice discntd_apply: Started...
2024 May 16 21:04:30 FIREWALLHOSTNAME user.notice discntd_apply: Done...
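
To check the suspected correlation against a longer log export, the pattern above can be searched for mechanically. The following is a minimal sketch, not a Check Point tool: it assumes the syslog line layout shown in the excerpt, and the function names and the 60-second window are illustrative choices. It pairs each probe "Disconnected" event with the most recent WebUI login inside the window.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt above:
# "<YYYY Mon DD HH:MM:SS> <host> <facility.level> <process>: <message>"
LINE_RE = re.compile(
    r"^(\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) (\S+) ([^:]+): (.*)$"
)


def parse_line(line):
    """Return (timestamp, process, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    ts = datetime.strptime(m.group(1), "%Y %b %d %H:%M:%S")
    return ts, m.group(4), m.group(5)


def correlate(lines, window_s=60):
    """For each probe 'Disconnected' event, report the seconds elapsed since
    the latest WebUI login within window_s, or None if there was no such login."""
    logins = []
    results = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, proc, msg = parsed
        if proc == "login" and "[WebUI]" in msg and "logged-in" in msg:
            logins.append(ts)
        elif proc == "cposd" and "changed to Disconnected" in msg:
            recent = [t for t in logins if 0 <= (ts - t).total_seconds() <= window_s]
            delay = (ts - recent[-1]).total_seconds() if recent else None
            results.append((ts, delay))
    return results


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python correlate.py messages.log
    with open(sys.argv[1]) as fh:
        for when, delay in correlate(fh):
            print(when, "no recent WebUI login" if delay is None else f"{delay:.0f}s after login")
```

Run against the excerpt above, this pairs the 21:01:31 disconnect with the 21:01:10 login (21 seconds earlier), while the 21:03:21 disconnect has no logged login in the preceding 60 seconds; the second login entry appears at 21:03:35, after that event.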

The packet loss aggregations from yesterday correspond with the times I was testing my theory that it was caused by the WebUI.

FIREWALLHOSTNAME> show internet probe-stats
wan1:
server: 8.8.8.8:
time avg[ms] min[ms] max[ms] packet loss[%]
10:00 4.00 3 5 0.08
11:00 4.00 3 5 0.00
12:00 4.06 3 12 10.56
13:00 4.14 4 12 0.00
14:00 4.14 4 13 0.00
15:00 4.17 4 15 0.00
16:00 4.13 4 12 0.00
17:00 4.15 4 16 0.00
18:00 4.14 4 20 6.94
19:00 4.14 4 14 0.16
20:00 4.16 4 15 0.08
21:00 4.13 4 15 2.25
22:00 4.16 4 13 0.00
23:00 4.14 4 10 0.00
00:00 4.11 4 7 0.00
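
To pick the affected hours out of a longer `show internet probe-stats` capture, the rows can be filtered by their loss column. This is a rough sketch that assumes the five-column row layout shown above (time, avg, min, max, packet loss); the function name and the 1% threshold are illustrative assumptions, not anything the appliance defines.

```python
def lossy_hours(text, threshold=1.0):
    """Return (time, loss_percent) for each stats row whose loss exceeds threshold."""
    flagged = []
    for row in text.splitlines():
        parts = row.split()
        # Data rows have exactly five fields and start with an HH:MM time;
        # interface, server and header lines are skipped.
        if len(parts) != 5 or ":" not in parts[0]:
            continue
        try:
            loss = float(parts[4])
        except ValueError:
            continue
        if loss > threshold:
            flagged.append((parts[0], loss))
    return flagged
```

Applied to the table above with the default threshold, this flags the 12:00, 18:00 and 21:00 rows (10.56%, 6.94% and 2.25% loss).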

5 Replies
PhoneBoy
Admin

Please try upgrading to the most recent version.
Latest R81.10.10 for the 1800 is here: https://support.checkpoint.com/results/download/132304
(Note the link requires your UserCenter account to have an active Software Subscription)

merscoob
Explorer

Managed to get both onto the latest version and they are still behaving the same. Oddly, it's showing failures on the failover counter, but there's nothing in the history.

It definitely seems to correlate with me logging into the management interface on the secondary via the remote access VPN on the primary, which seems to cause internet instability on both nodes.

The WAN links are connected to a colo datacentre L3 switch, which looks to be running HSRP across their 2 core Cisco switches.

Last cluster failover event:
Transition to new ACTIVE: Member 1 -> Member 2
Reason: Available on member 1
Event time: Wed May 22 12:58:51 2024

Cluster failover count:
Failover counter: 4
Time of counter reset: Wed May 22 12:03:47 2024 (reboot)


Cluster failover history (last 20 failovers since reboot/reset on Wed May 22 12:03:47 2024):

No. Time: Transition: CPU: Reason:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

No failover was detected since last reboot/reset


PhoneBoy
Admin

What was the reason you were logging into the secondary?

G_W_Albrecht
Legend

Open an SR# with CP TAC - this seems rather strange! Remark: Usually, with SMB clusters, you log into the WebGUI via the VIP, as all config is done only on the active/primary node and then synced to the standby/secondary.

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist
merscoob
Explorer

Just to draw a line under this. Updating to the latest firmware made no difference.

Our support company escalated to Check Point, who suggested we disable the internet connection monitoring probes. That seems to have done the trick: logging into the standby device no longer destabilises the cluster.

It doesn't really explain what was causing the issue, but these 2 devices aren't going to be terminating the internet for much longer, so it doesn't make much odds to us.

