Teddy_Brewski
Collaborator

State: Connection with 'fw-vsx-n01' is lost

Hello,

One of the VSs (out of 4) started reporting this today:

State: Connection with 'fw-vsx-n01' is lost 

There are no issues reported from the SSH session -- this node is active and handles the load.

Running cphaprob state, cphaprob -a if, and cphaprob -ia list revealed nothing wrong.

I rebooted the management server, but it didn't help.

Management: R81.20 Take 65

CP VSX: R81.20 Take 90

Thank you.

8 Replies
Chris_Atkinson
Employee

What does 'vsx stat -v' show?

 

CCSM R77/R80/ELITE
Teddy_Brewski
Collaborator

Looks healthy too. 

The one that is complaining about one of the nodes being down is VS5 (fw-vs-cloud):

VSX Gateway Status
==================
Name: fw-vsx-ext-n01
Access Control Policy: fw-vsx-external-vsx
Installed at: 14Jan2025 15:27:58
Threat Prevention Policy: <No Policy>
SIC Status: Trust

Number of Virtual Systems allowed by license: 6
Virtual Systems [active / configured]: 3 / 3
Virtual Routers and Switches [active / configured]: 2 / 2
Total connections [current / limit]: 32393 / 96500

Virtual Devices Status
======================

  ID | Type & Name             | Access Control Policy | Installed at    | Threat Prevention Policy | SIC Stat
-----+-------------------------+-----------------------+-----------------+--------------------------+---------
   1 | S fw-vs-test            | fw-vs-test-policy     | 23Jan2025 16:18 | <No Policy>              | Trust
   2 | W vsw-ext               | <Not Applicable>      |                 | <Not Applicable>         | Trust
   3 | W vsw-transit           | <Not Applicable>      |                 | <Not Applicable>         | Trust
   4 | S fw-vs-ext             | fw-vs-ext-policy      | 27Jan2025 11:47 | <No Policy>              | Trust
   5 | S fw-vs-cloud           | fw-vs-cloud-policy    | 24Jan2025 11:00 | <No Policy>              | Trust

Type: S - Virtual System, B - Virtual System in Bridge mode,
R - Virtual Router, W - Virtual Switch.
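
For anyone who would rather check this output in a script than by eye, here is a minimal Python sketch that parses the Virtual Devices Status rows and lists any device whose SIC status is not Trust. It assumes the pipe-separated layout shown above. The names parse_virtual_devices, devices_without_trust and SAMPLE are hypothetical helpers made up for illustration, not part of any Check Point tooling.

```python
# Sketch only: parses pasted 'vsx stat -v' text; it does not run the command.
SAMPLE = """\
ID | Type & Name | Access Control Policy | Installed at | Threat Prevention Policy | SIC Stat
-----+-------------------------+-----------------------+-----------------+--------------------------+---------
1 | S fw-vs-test | fw-vs-test-policy | 23Jan2025 16:18 | <No Policy> | Trust
2 | W vsw-ext | <Not Applicable> | | <Not Applicable> | Trust
5 | S fw-vs-cloud | fw-vs-cloud-policy | 24Jan2025 11:00 | <No Policy> | Trust
"""


def parse_virtual_devices(text):
    """Return one dict per device row; header and separator lines are skipped."""
    devices = []
    for line in text.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Device rows have six columns and a numeric ID in the first one.
        if len(cols) != 6 or not cols[0].isdigit():
            continue
        dev_type, _, name = cols[1].partition(" ")
        devices.append({
            "id": int(cols[0]),
            "type": dev_type,          # S, B, R or W, per the legend
            "name": name.strip(),
            "policy": cols[2],
            "installed": cols[3],      # empty for virtual switches
            "sic": cols[5],
        })
    return devices


def devices_without_trust(devices):
    """Names of devices whose SIC column is anything other than 'Trust'."""
    return [d["name"] for d in devices if d["sic"] != "Trust"]


print(devices_without_trust(parse_virtual_devices(SAMPLE)))  # prints []
```

An empty list matches what the table shows: every virtual device reports Trust, so SIC itself does not look like the problem.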

 

AkosBakos
Leader

Hi @Teddy_Brewski 

  • Does it happen periodically?

Maybe the cpd process crashes. 

Have a look at this: https://support.checkpoint.com/results/sk/sk101484

What does cpwd.elg say?

Akos

----------------
\m/_(>_<)_\m/
Teddy_Brewski
Collaborator

Hi @AkosBakos 

No, this is the first time it happened.

I checked cpwd.elg on the affected node. I do see 'did not send keep-alive message for 1 number of times' errors, but they refer to MSGD rather than CPD:

[cpWatchDog 19785 4133372096]@fw-vsx-n01[27 Jan 11:47:49] [ERROR] MSGD (pid=30558) did not send keep-alive message for 1 number of times
[cpWatchDog 19785 4133372096]@fw-vsx-n01[27 Jan 11:49:34] [ERROR] MSGD (pid=30661) did not send keep-alive message for 1 number of times
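
To see at a glance which daemons are missing keep-alives, a small Python sketch like the one below can tally these errors per process. The regular expression is an assumption based only on the two lines quoted above, so other cpwd.elg line formats may not match, and count_keepalive_errors is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the two quoted log lines only (an assumption).
KEEPALIVE_RE = re.compile(
    r"\[(?P<when>\d{1,2} \w{3} \d{2}:\d{2}:\d{2})\] \[ERROR\] "
    r"(?P<proc>\w+) \(pid=(?P<pid>\d+)\) did not send keep-alive message"
)


def count_keepalive_errors(lines):
    """Count keep-alive errors per process name (e.g. MSGD, CPD)."""
    counts = Counter()
    for line in lines:
        match = KEEPALIVE_RE.search(line)
        if match:
            counts[match.group("proc")] += 1
    return counts


sample = [
    "[cpWatchDog 19785 4133372096]@fw-vsx-n01[27 Jan 11:47:49] [ERROR] MSGD (pid=30558) did not send keep-alive message for 1 number of times",
    "[cpWatchDog 19785 4133372096]@fw-vsx-n01[27 Jan 11:49:34] [ERROR] MSGD (pid=30661) did not send keep-alive message for 1 number of times",
]
print(count_keepalive_errors(sample))  # Counter({'MSGD': 2})
```

To run it against a real log, pass an open file object instead of the sample list; the path to cpwd.elg depends on the installation and is not given in this thread.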

AkosBakos
Leader

Hi @Teddy_Brewski 

sk101484 (https://support.checkpoint.com/results/sk/sk101484) says:

  • In some scenarios cpwd.elg shows repeatedly:
    [ERROR] CPD (pid_of_cpd) did not send keep-alive message for x number of times
----------------
\m/_(>_<)_\m/
Teddy_Brewski
Collaborator

Hi @AkosBakos 

Yes, but the SK says it's the CPD daemon that appears in the error. In my case it's MSGD -- do you think it's the same issue?

There are no traces of core dumps in the logs, and no high CPU was observed. SIC is also fine. I haven't tried pushing policy to the affected VS, though.

AkosBakos
Leader

Hi @Teddy_Brewski 

Given the lack of information, I can't say whether this is the same issue, but some of the symptoms are the same.

In this case, the best thing you can do is ask TAC about this issue.

Akos

----------------
\m/_(>_<)_\m/
Chris_Atkinson
Employee

Concur. Is anything of note flagged in HCP?

Otherwise some suggestions: 

- Attempt policy install

- Restart cpd per sk97638

- Failover / Reboot gateways

- Patch with latest recommended JHF

- Open a TAC case (attach HCP & CPinfo)

CCSM R77/R80/ELITE
