Re: R82 ElasticXL & VSNext Issues

genisis__ · ‎2025-06-23

Built an R82 ElasticXL & VSNext lab in Proxmox; the installed JHFA is Take 25.

- I managed to delete ID0/VS0 which I should never be able to do.

- I was able to reassign magg1 from VS500, in effect knocking out management; again, I should not be able to do this.

If you have had similar issues, please post, and hopefully Check Point can review and respond.

I had one other issue where I created a virtual gateway through gclish; however, for some reason I could not connect to the management interface on the VG. So I attempted to delete it via the GUI, and then the system crashed. I've logged a TAC case to investigate the crash logs.

the_rock · ‎2025-06-23

Hey,

I assume this is related to the R82 jumbo 25 thread?

Best,
Andy

genisis__ · ‎2025-06-24

Yes Mate.

As advised by Chris just thought I would create a separate thread so it can be used by us all.

the_rock · ‎2025-06-24

Got it. Let us know if you end up doing remote with Matan and how it goes.

Best,
Andy

matanbe_chkpcp · ‎2025-06-23

Hey,
You're right, we haven't blocked deletion from the WebUI.
magg1 can be reassigned under the customer's responsibility.

I'd like to check the logs for your issue, as we didn’t observe it on our end.
Also, Take 25 is not recommended for VSNext, as we've encountered issues establishing SIC with the management server.

Thanks,

Matan

genisis__ · ‎2025-06-24

Hi Matan,

Happy to jump on a webex with you if you like. I have snapshots so at least I've got to the point I don't need to do a complete rebuild.

Yesterday I re-created a virtual gateway and lost connectivity again.
In my mind some of these issues must be to do with setting this up in Proxmox, but that's only a guess. It would be really good to get Check Point to confirm how we should lab this in VMware Workstation or Proxmox, as I think it will help a lot of people with learning and adoption.

matanbe_chkpcp · ‎2025-06-24

Hey,

Currently, we don't support Proxmox as a VM platform — we only support VMware.
I'll send you a private message to schedule a meeting, understand the requirements, and see if we can find a workable solution.

Thanks,
Matan

genisis__ · ‎2025-06-24

Thanks Matan

the_rock · ‎2025-06-24

Hey Matan,

I suppose the same applies to eve-ng as well?

Best,
Andy

PhoneBoy · ‎2025-06-24

We don't do any internal testing on eve-ng that I'm aware of.

Chris_Atkinson · ‎2025-06-24

It's also not supported as outlined in sk181128.

PMTR-107075 (ElasticXL): ElasticXL Cluster supports only physical Check Point appliances (Virtual Machines or Open Servers are not supported).

CCSM R77/R80/ELITE

the_rock · ‎2025-06-24

Thanks Chris. I somehow missed that in the sk.

Best,
Andy

genisis__ · ‎2025-06-25

Observation - which is not an issue but something I'm not sure is documented anywhere:
tp_dummy_5 Link encap:Ethernet HWaddr xx:xx:xx:BD:90:7E
inet addr:99.81.112.231 Bcast:0.0.0.0 Mask:255.255.255.255

The above interface appears when turning on TP (I've not turned off the blades one by one to see which one actually creates it). The same interface and IP are present on different VSs. Why is it there (especially with a public IP) and what is its function? How is it secured (it's not seen in the topology) and what communication requirements does it have to Check Point?

genisis__ · ‎2025-06-25

Thanks for jumping on with me, Matan. I've pinged over some more observations to you; if they are valid, I'm happy to post them here as well, if it helps.

genisis__ · ‎2025-06-26

Found another possible issue (not really sure if it is one): when clustering, a JHFA applied to the active node is replicated to the standby node; however, in the cluster management section the standby device does not show the JHFA as installed, even if it becomes the active device.

That may be more of a known issue, but thought I would mention it.

matanbe_chkpcp · ‎2025-06-29

Hey,

We are aware of this issue and it will be addressed in the upcoming Jumbos.

I'll review your private messages.

Thanks!

genisis__ · ‎2025-06-29

Great thanks Matan.

The issue I have now is that cphaprob reports ACTIVE(!P) status; however, I'm not entirely sure how to check the interface status on the standby member, as you can't access it once it's in a cluster?

matanbe_chkpcp · ‎2025-06-29

You can always move between members using "m <site_id>_<member_id>", for example - m 1_2.

What is the error shown on cphaprob stat?

genisis__ · ‎2025-06-29

Here's what I see at the moment:

# cphaprob stat

Cluster Mode: HA Over LS

ID Unique Address Assigned Load State Name

1 (local) 192.0.2.1 100% ACTIVE(!P) FW-s01-01
15 192.0.2.15 100% ACTIVE FW-s02-01

Active PNOTEs: IAC

Last member state change event:
Event Code: CLUS-110805
State change: ACTIVE -> ACTIVE(!)
Reason for state change: Incorrect configuration - Local cluster member has fewer cluster interfaces configured compared to other cluster member(s)
Event time: Sun Jun 29 15:10:06 2025

I have now switched between the two nodes and checked "cphaprob -a if" on the different VSs; I can't really see any issue.
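For anyone who wants to spot this state from a script rather than by eye, here is a minimal Python sketch that parses member rows and the "Active PNOTEs" line from text shaped like the output above. The function names are made up for illustration, the sample is trimmed from the post, and it is an assumption that any "!" in the State column signals a problem and that multiple PNOTEs would be comma-separated. How the text is collected from the gateway is deliberately left out.

```python
import re

# Sample trimmed from the cphaprob stat output posted above.
SAMPLE = """\
Cluster Mode: HA Over LS

ID Unique Address Assigned Load State Name

1 (local) 192.0.2.1 100% ACTIVE(!P) FW-s01-01
15 192.0.2.15 100% ACTIVE FW-s02-01

Active PNOTEs: IAC
"""

# Member row layout as seen in the post: ID, optional "(local)", IP, load, state, name.
MEMBER_ROW = re.compile(
    r"^(?P<id>\d+)\s+(?P<local>\(local\)\s+)?(?P<ip>\d+\.\d+\.\d+\.\d+)\s+"
    r"(?P<load>\d+%)\s+(?P<state>\S+)\s+(?P<name>\S+)\s*$"
)


def parse_members(text):
    """Return one dict per member row found in the text."""
    members = []
    for line in text.splitlines():
        match = MEMBER_ROW.match(line.strip())
        if match:
            members.append({
                "id": int(match.group("id")),
                "local": match.group("local") is not None,
                "ip": match.group("ip"),
                "state": match.group("state"),
                "name": match.group("name"),
            })
    return members


def members_with_alert(members):
    """Names of members whose state contains '!' (assumed to mean a problem)."""
    return [m["name"] for m in members if "!" in m["state"]]


def active_pnotes(text):
    """List the entries on the 'Active PNOTEs' line; 'None' yields an empty list."""
    match = re.search(r"^Active PNOTEs:\s*(.*)$", text, re.MULTILINE)
    if not match:
        return []
    value = match.group(1).strip()
    if not value or value.lower() == "none":
        return []
    # Comma separation for multiple PNOTEs is an assumption.
    return [p.strip() for p in value.split(",")]


if __name__ == "__main__":
    found = parse_members(SAMPLE)
    print(members_with_alert(found), active_pnotes(SAMPLE))
```

The regular expression only matches the row layout shown in this thread, so output from other versions or cluster modes may need a different pattern.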

When issuing m <1_1> or <2_1>, is this connection going over the Sync network? I believe it is, so I just wanted to confirm that this is the transport between the two nodes and not the management interfaces.

Question:
How would we monitor both nodes using an NMS? Normally I would just point the NMS to the Mgmt IP of each node, so I assume there must be a way to monitor both nodes using SNMP v3?

Bob_Zimmerman · ‎2025-06-29

@genisis__ wrote:

When issuing m <1_1> or <2_1>, is this connection going over the Sync network? I believe it is, so I just wanted to confirm that this is the transport between the two nodes and not the management interfaces.

Yes, it goes over sync. It's a key-based SSH connection:

[Expert@DallasticXL-s01-01:0]# m 2
Moving to member 1_2

This system is for authorized use only.
Last login: Mon Jun  9 21:35:55 2025 from 192.0.2.1
[Expert@DallasticXL-s01-02:0]# who
admin    pts/1        Jun 29 17:28 (192.0.2.1)

[Expert@DallasticXL-s01-02:0]# netstat -anp | grep sshd
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      20160/sshd          
tcp        0      0 192.0.2.2:22                192.0.2.1:33556             ESTABLISHED 4843/sshd: admin [p 
unix  2      [ ]         DGRAM                    4266141855 4843/sshd: admin [p 
unix  3      [ ]         STREAM     CONNECTED     4266141861 4843/sshd: admin [p 
unix  2      [ ]         STREAM     CONNECTED     4266141839 4843/sshd: admin [p 

[Expert@DallasticXL-s01-02:0]# egrep "^$(date +"%b %e") " /var/log/secure
Jun 29 17:28:56 2025 DallasticXL-s01-02 sshd[4843]: Accepted publickey for admin from 192.0.2.1 port 33556 ssh2: RSA SHA256:+gbwzSST0ECeJLtwXYXcONnH//hQ32wOgoK82WjFekg
Jun 29 17:28:56 2025 DallasticXL-s01-02 sshd[4843]: pam_unix(sshd:session): session opened for user admin by (uid=0)
Jun 29 17:28:56 2025 DallasticXL-s01-02 sudo:    admin : TTY=pts/1 ; PWD=/home/admin ; USER=root ; COMMAND=validate

Incidentally, the private key which authenticates this connection is in /home/admin/.ssh/id_rsa, and it doesn't have a passphrase. Each member of the ElasticXL cluster appears to generate a new RSA key pair when it joins, and the public keys of all members go in /home/admin/.ssh/authorized_keys, which is synchronized to all members, so they all trust each other.

All of this has some pretty significant security implications. For example, if someone has administrative access to an ElasticXL cluster member, they can exfiltrate one of these keys which will ensure they continue to have direct access to the shared user "admin". I'm not yet sure if the keys are used for anything else which could complicate rotating them when an admin leaves the company.
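If you want to keep an eye on which keys are actually being used for these member-to-member logins, a minimal Python sketch along these lines could group the "Accepted publickey" entries by key fingerprint. It assumes log lines shaped like the /var/log/secure excerpt above; the function name is made up for illustration, and reading the real file is left out.

```python
import re
from collections import defaultdict

# Sample copied from the /var/log/secure excerpt above.
SAMPLE_LOG = """\
Jun 29 17:28:56 2025 DallasticXL-s01-02 sshd[4843]: Accepted publickey for admin from 192.0.2.1 port 33556 ssh2: RSA SHA256:+gbwzSST0ECeJLtwXYXcONnH//hQ32wOgoK82WjFekg
Jun 29 17:28:56 2025 DallasticXL-s01-02 sshd[4843]: pam_unix(sshd:session): session opened for user admin by (uid=0)
Jun 29 17:28:56 2025 DallasticXL-s01-02 sudo:    admin : TTY=pts/1 ; PWD=/home/admin ; USER=root ; COMMAND=validate
"""

# Matches only the sshd "Accepted publickey" lines in the format shown above.
ACCEPTED = re.compile(
    r"sshd\[\d+\]: Accepted publickey for (?P<user>\S+) "
    r"from (?P<src>\S+) port \d+ ssh2: (?P<keytype>\S+) (?P<fp>\S+)"
)


def logins_by_key(log_text):
    """Map each key fingerprint to the (user, source IP) pairs that used it."""
    result = defaultdict(list)
    for line in log_text.splitlines():
        match = ACCEPTED.search(line)
        if match:
            result[match.group("fp")].append(
                (match.group("user"), match.group("src"))
            )
    return dict(result)


if __name__ == "__main__":
    for fingerprint, logins in logins_by_key(SAMPLE_LOG).items():
        print(fingerprint, logins)
```

A fingerprint showing up from a source address outside the sync network would be the thing to look for, though that is only one possible signal of a copied key.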

genisis__ · ‎2025-06-29

OK, so now that I have both nodes' SSH sessions side by side, I can see the issue: in VS0 and VS5 on the standby node, the interface below is listed:

tp_dummy_0 99.81.112.231

The above seems to be added when enabling TP blades; you can't do anything about it, and it's not listed as an interface in the topology.

Active:

vsid 5:
------
CCP mode: Automatic

Interface Name: Status:

Sync (S) UP
wrp321 UP
wrp320 UP
wrp322 UP
eth3.40 UP
tp_dummy_5 Non-Monitored

S - sync, HA/LS - bond type, LM - link monitor, P - probing

Virtual cluster interfaces: 7

Standby:

vsid 5:
------

CCP mode: Automatic

Interface Name: Status:

Sync (S) UP
wrp321 UP
wrp320 UP
wrp322 UP
eth3.40 UP
tp_dummy_5 UP

S - sync, HA/LS - bond type, LM - link monitor, P - probing

Virtual cluster interfaces: 6
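Comparing the two listings by hand is easy to get wrong, so here is a minimal Python sketch that diffs the per-interface status between two members. It assumes you pass in only the interface rows (name first, status last) as shown above, without the header and legend lines; the function names are made up for illustration.

```python
# Interface rows copied from the vsid 5 listings above.
ACTIVE_ROWS = """\
Sync (S) UP
wrp321 UP
wrp320 UP
wrp322 UP
eth3.40 UP
tp_dummy_5 Non-Monitored
"""

STANDBY_ROWS = """\
Sync (S) UP
wrp321 UP
wrp320 UP
wrp322 UP
eth3.40 UP
tp_dummy_5 UP
"""


def parse_interfaces(text):
    """Map interface name (first field) to status (last field)."""
    status = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or single-token lines
        status[parts[0]] = parts[-1]
    return status


def diff_interfaces(first, second):
    """Return {name: (status_in_first, status_in_second)} for every mismatch.

    An interface missing on one side shows up with None for that side.
    """
    diffs = {}
    for name in sorted(set(first) | set(second)):
        if first.get(name) != second.get(name):
            diffs[name] = (first.get(name), second.get(name))
    return diffs


if __name__ == "__main__":
    active = parse_interfaces(ACTIVE_ROWS)
    standby = parse_interfaces(STANDBY_ROWS)
    print(diff_interfaces(active, standby))
```

On the rows above, the only mismatch is tp_dummy_5, which is Non-Monitored on the active member and UP on the standby member.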

genisis__ · ‎2025-06-29

Not entirely sure why, but I did a policy push just to see what it would do to the status, and it actually fixed the status issue:

1 (local) 192.0.2.1 100% ACTIVE(P) VWGCOREFW-s01-01
15 192.0.2.15 100% ACTIVE VWGCOREFW-s02-01

Active PNOTEs: None

Last member state change event:
Event Code: CLUS-114904
State change: ACTIVE(!) -> ACTIVE
Reason for state change: Reason for ACTIVE! alert has been resolved
Event time: Sun Jun 29 20:46:06 2025

The interface counts on both nodes remain the same, though. I believe the status of "ACTIVE(P)" is normal, indicating this is the pivot node.

the_rock · ‎2025-06-29

Yup, that's exactly what P means in this context.

Best,
Andy

genisis__ · ‎2025-07-02

Wow that means I have everything working!

Now, any idea what the command is to see Active/Standby status for VSs? I used to just run cphaprob and they would all be listed.

Next I will take a look at VSLS when I get time.

Chris_Atkinson · ‎2025-07-02

Try: "asg stat vs all"

Note that in a single-site arrangement there won't typically be active/standby at the "VS" level; everything is active.

CCSM R77/R80/ELITE

genisis__ · ‎2025-07-03

The configuration I did was dual site, not single. So that should hopefully mean Active/Standby at the node level and then the VS's (with VSLS) I should be able to move around so that I can distribute across the two sites.

The command works, Chris.

I attempted to failover from site 1 to site 2 using the following command:

set cluster vsls system primary_site 2

but nothing happened?

The status looks like it's been updated:

show cluster vsls system primary_site
2

show cluster vsls member_ratio (default was 50)
100

genisis__ · ‎2025-07-09

Question about licensing:

My trial has expired in my lab, so I generated some trial licenses from the UC (All-in-One Eval) and applied the gateway license using central licensing.

This license was applied to the active node, the active node then replicated this to the standby node.

I then attempted to push policy and it failed due to licensing. Are there special licenses required for ElasticXL with VSNext? The standby shows as down in cphaprob, due to license issues as well. Both gateways only have the initial policy, even though a policy was previously pushed to the active node, so it should realistically retain this.

Will try to look at this again over the next few days, also not had any response to my failover/distribution of VS's question.

Bob_Zimmerman · ‎2025-07-09

Can’t use central licensing with ElasticXL. You have to use local licenses generated to an IP of the member. My lab boxes ended up generating their own licenses to their sync IPs.

emmap · ‎2025-07-09

Generate the eval as a local license to the magg1 IP address for the gateway portion and apply it to the SMO directly using 'g_cplic put'. That'll put it on all SGMs in the group and they'll work happily with it. No need for anything other than 'all-in-one' for VSX.

genisis__ · ‎2025-07-10

Will give that a go.
