State: Connection with 'fw-vsx-n01' is lost
Hello,
One of the VSs (out of 4) started reporting today:
State: Connection with 'fw-vsx-n01' is lost
There are no issues reported from the SSH session -- this node is active and handles the load.
cphaprob state, cphaprob -a if, and cphaprob -ia list revealed nothing wrong (the checks are sketched below).
I tried rebooting the management, but it didn't help.
Management: R81.20 Take 65
CP VSX: R81.20 Take 90
Thank you.
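For reference, this is roughly how the checks above were run from expert mode on the affected member (a minimal sketch; the VS ID 5 used with vsenv is an assumption based on the output later in this thread):

# switch into the context of the affected VS (ID 5 is an assumption)
vsenv 5
# cluster state, interface and device checks for this VS
cphaprob state
cphaprob -a if
cphaprob -ia list
# back to VS0 for the box-wide view
vsenv 0
vsx stat -v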
What does 'vsx stat -v' show?
Looks healthy too.
The one complaining about one of the nodes being down is VS5 (fw-vs-cloud):
VSX Gateway Status
==================
Name: fw-vsx-ext-n01
Access Control Policy: fw-vsx-external-vsx
Installed at: 14Jan2025 15:27:58
Threat Prevention Policy: <No Policy>
SIC Status: Trust
Number of Virtual Systems allowed by license: 6
Virtual Systems [active / configured]: 3 / 3
Virtual Routers and Switches [active / configured]: 2 / 2
Total connections [current / limit]: 32393 / 96500
Virtual Devices Status
======================
ID | Type & Name | Access Control Policy | Installed at | Threat Prevention Policy | SIC Stat
-----+-------------------------+-----------------------+-----------------+--------------------------+---------
1 | S fw-vs-test | fw-vs-test-policy | 23Jan2025 16:18 | <No Policy> | Trust
2 | W vsw-ext | <Not Applicable> | | <Not Applicable> | Trust
3 | W vsw-transit | <Not Applicable> | | <Not Applicable> | Trust
4 | S fw-vs-ext | fw-vs-ext-policy | 27Jan2025 11:47 | <No Policy> | Trust
5 | S fw-vs-cloud | fw-vs-cloud-policy | 24Jan2025 11:00 | <No Policy> | Trust
Type: S - Virtual System, B - Virtual System in Bridge mode,
R - Virtual Router, W - Virtual Switch.
Does it happen periodically? Maybe the cpd process crashes.
Have a look at this: https://support.checkpoint.com/results/sk/sk101484
What does cpwd.elg say? (A quick way to check is sketched below.)
Akos
\m/_(>_<)_\m/
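A quick way to grep cpwd.elg for those keep-alive errors (a minimal sketch; it assumes $CPDIR resolves to the right per-VS context after vsenv, which is the usual VSX behaviour):

# cpwd.elg lives under $CPDIR/log
grep -i "did not send keep-alive" $CPDIR/log/cpwd.elg | tail -20
# list the daemons WatchDog monitors, their PIDs and restart counts
cpwd_admin list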
Hi @AkosBakos
No, this is the first time it happened.
I checked cpwd.elg on the affected node, and although I see 'did not send keep-alive message for 1 number of times' errors, none of them are related to CPD; they come from MSGD instead:
[cpWatchDog 19785 4133372096]@fw-vsx-n01[27 Jan 11:47:49] [ERROR] MSGD (pid=30558) did not send keep-alive message for 1 number of times
[cpWatchDog 19785 4133372096]@fw-vsx-n01[27 Jan 11:49:34] [ERROR] MSGD (pid=30661) did not send keep-alive message for 1 number of times
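To see which daemons are affected and whether WatchDog actually restarted them, something like this can help (a sketch; the awk field position assumes the exact log format shown above):

# tally keep-alive errors per daemon name ($7 matches the format above)
grep "did not send keep-alive" $CPDIR/log/cpwd.elg | awk '{print $7}' | sort | uniq -c
# compare against WatchDog's view of each daemon (PID, state, restart count)
cpwd_admin list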
sk101484 (https://support.checkpoint.com/results/sk/sk101484) says:
- In some scenarios cpwd.elg shows repeatedly:
[ERROR] CPD (pid_of_cpd) did not send keep-alive message for x number of times
\m/_(>_<)_\m/
Hi @AkosBakos
Yes, but the SK mentions that it's the CPD daemon reporting the error. In my case it's MSGD -- do you think it's the same?
There are no traces of core dumps in the logs and no high CPU observed. SIC is also fine. I haven't tried to push the policy on the affected VS, though.
Because of the lack of information, I can't say whether this is the same issue or not, but some of the symptoms match.
In this case, the best thing you can do is ask TAC about this issue.
Akos
\m/_(>_<)_\m/
Concur. Anything of note flagged in HCP?
Otherwise some suggestions:
- Attempt policy install
- Restart cpd per sk97638 (sketched after this list)
- Failover / Reboot gateways
- Patch with latest recommended JHF
- Open a TAC case (attach HCP & CPinfo)
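The cpd restart in the affected VS context would look roughly like this (a sketch; the cpwd_admin stop/start invocations follow sk97638, and VS ID 5 is an assumption):

# run in the context of the affected VS (ID 5 is an assumption)
vsenv 5
# stop cpd gracefully via WatchDog (per sk97638)
cpwd_admin stop -name CPD -path "$CPDIR/bin/cpd_admin" -command "cpd_admin stop"
# verify CPD is no longer listed as running before restarting it
cpwd_admin list | grep CPD
# start cpd again under WatchDog supervision (per sk97638)
cpwd_admin start -name CPD -path "$CPDIR/bin/cpd" -command "cpd"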
Thank you for the hint @AkosBakos !
With the help of our CP partner, the issue has been identified: CPD had crashed and had been failing to restart ever since. No CPEPS database corruption was observed, so killing the stale process and stopping/starting CPD manually in the context of the affected VS fixed the issue.
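For anyone hitting the same symptom, the manual recovery described above would look roughly like this (a minimal sketch, not the exact commands used; the placeholder PID must come from the ps output, and the cpwd_admin syntax follows sk97638):

# in the context of the affected VS (ID 5 is an assumption)
vsenv 5
# find the stale cpd process in this context
ps auxw | grep cpd | grep -v grep
# kill the stale instance (<stale_cpd_pid> is a placeholder)
kill -9 <stale_cpd_pid>
# then stop/start CPD cleanly via WatchDog (per sk97638)
cpwd_admin stop -name CPD -path "$CPDIR/bin/cpd_admin" -command "cpd_admin stop"
cpwd_admin start -name CPD -path "$CPDIR/bin/cpd" -command "cpd"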