Solved: R82 elasticXL lab

the_rock · ‎2024-07-01

Hey boys and girls, ladies and gents,

I built an R82 ElasticXL lab, and although I followed the link below by @HeikoAnkenbrand, I can't make it work. Not sure if it's because I'm using eve-ng or for some other reason. I created 2 separate ElasticXL instances, but the clustering part fails, so if anyone has an idea, happy to hear it 🙂

I couldn't care less if this lab breaks; it's super easy to rebuild anyway.

This is the link I was referring to. I also attached some screenshots and outputs.

Andy

https://community.checkpoint.com/t5/Security-Gateways/R82-Install-ElasticXL-Cluster/td-p/206235

[Expert@CP-EXL-1-s01-01:0]# cphaprob state

Cluster Mode: HA Over LS

ID Unique Address Assigned Load State Name

1 (local) 192.0.2.1 100% ACTIVE(P) CP-EXL-1-s01-01

Active PNOTEs: None

Last member state change event:
Event Code: CLUS-114904
State change: ACTIVE(!) -> ACTIVE
Reason for state change: Reason for ACTIVE! alert has been resolved
Event time: Mon Jul 1 19:40:49 2024
[Expert@CP-EXL-1-s01-01:0]#

[Expert@CP-EXL-02-s01-01:0]# asg monitor
Mon Jul 01 20:44:20 EDT 2024

^C
[Expert@CP-EXL-02-s01-01:0]#

[Expert@CP-EXL-02-s01-01:0]# cphaprob -a if

CCP mode: Automatic

Interface Name: Status:

eth2 UP
eth3 UP
Sync (S) UP
magg1 (LS) UP

S - sync, HA/LS - bond type, LM - link monitor, P - probing

[Expert@CP-EXL-1-s01-01:0]#
[Expert@CP-EXL-1-s01-01:0]# cphaprob -a if

CCP mode: Automatic

Interface Name: Status:

eth2 UP
eth3 UP
Sync (S) UP
magg1 (LS) UP

S - sync, HA/LS - bond type, LM - link monitor, P - probing

Virtual cluster interfaces: 5

lo 127.0.0.1
eth2 192.168.10.238
eth3 169.254.0.238
Sync 192.0.2.1
magg1 172.16.10.238

[Expert@CP-EXL-1-s01-01:0]#

Virtual cluster interfaces: 5

lo 127.0.0.1
eth2 192.168.10.237
eth3 169.254.0.237
Sync 192.0.2.1
magg1 172.16.10.237

[Expert@CP-EXL-02-s01-01:0]#

And since the ElasticXL cluster object does NOT have an option to add cluster members, there must be something obvious I'm missing, but I can't figure out what, so I will check it later.

Andy

ShaiF · ‎2024-11-10

Hi @Yasushi_Kono1

If we're talking about a VM, the best solution is to go to your VM settings and edit the network adapters.

There is also the option to edit this file on the gateway (per member):
/etc/sp_core/conf/vm_mapping.csv
So in your case, the content will be:
eth0 Mgmt
eth1 eth1
eth2 eth2
eth3 eth3
eth4 eth1-Sync

Regards,

Shai.
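Before copying a mapping like the one above to each member, a quick sanity check can catch typos. This is a hypothetical helper, not anything Check Point ships: it assumes the file holds two whitespace-separated columns per line exactly as shown in the post (whether the real file uses spaces or commas despite its .csv name is an assumption to verify against an existing copy), and only flags duplicate names or a missing Sync entry.

```python
def parse_vm_mapping(text: str) -> list[tuple[str, str]]:
    """Parse 'left right' pairs, one per line, skipping blank lines."""
    pairs = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines between entries
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(fields)}")
        pairs.append((fields[0], fields[1]))
    return pairs


def check_vm_mapping(pairs: list[tuple[str, str]]) -> list[str]:
    """Return human-readable problems; an empty list means the mapping looks sane."""
    problems = []
    lefts = [left for left, _ in pairs]
    rights = [right for _, right in pairs]
    for label, column in (("left", lefts), ("right", rights)):
        dupes = sorted({name for name in column if column.count(name) > 1})
        if dupes:
            problems.append(f"duplicate {label}-column names: {', '.join(dupes)}")
    if not any("Sync" in right for right in rights):
        problems.append("no right-column name contains 'Sync'")
    return problems


# The mapping from the post above.
MAPPING = """eth0 Mgmt
eth1 eth1
eth2 eth2
eth3 eth3
eth4 eth1-Sync
"""

pairs = parse_vm_mapping(MAPPING)
print(len(pairs), check_vm_mapping(pairs))  # 5 []
```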


emmap

I got my hands on a pair of 5100s that I have built into an EXL cluster. It's also not supported, as they don't have Sync interfaces, but with the info in this thread and some other trickery, I made it work. I figured I'd document it here for posterity, with the usual caveat: this is for educational purposes only and is not supported in a production environment.

Once R82 is fresh installed on the appliances, log in and get to expert mode. You need to edit the udev rules file, same as with a 3X00, but it's a bit more complicated because of the expansion card slot in the appliance. It'll look very different to a VM or a 3X00 - basically, delete what is in there and put this in:

ID=="0000:03:00.0", NAME="eth1"
ID=="0000:04:00.0", NAME="eth2"
ID=="0000:05:00.0", NAME="eth3"
ID=="0000:06:00.0", NAME="eth4"
ID=="0000:07:00.0", NAME="Sync"
ID=="0000:08:00.0", NAME="Mgmt"

Those ID lines are the PCI bus addresses of the interfaces on a 5100; you can check what they are on your system with the command 'lspci'. These lines rename the existing eth5 interface to Sync and keep the rest as-is.
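If you want to build those rule lines from your own appliance rather than copying the 5100 values, a small formatter helps. This is a sketch under assumptions: it reads `lspci -D` output (the `-D` flag makes lspci print the full `0000:` domain prefix used in the rules above), keeps only "Ethernet controller" lines, and pairs them with names in the order you supply. The sample lspci text is made up for illustration, and lspci order is not guaranteed to match physical port labels, so confirm which address is which port before trusting the result.

```python
def ethernet_addresses(lspci_output: str) -> list[str]:
    """Return the PCI addresses of Ethernet controllers, in the order lspci lists them.

    Expects `lspci -D` output, where each line starts with the full
    domain:bus:device.function address, e.g. '0000:03:00.0 Ethernet controller: ...'.
    """
    addresses = []
    for line in lspci_output.splitlines():
        address, _, description = line.partition(" ")
        if description.startswith("Ethernet controller"):
            addresses.append(address)
    return addresses


def udev_rules(addresses: list[str], names: list[str]) -> str:
    """Pair each address with a name using the ID==... NAME=... form quoted above."""
    if len(addresses) != len(names):
        raise ValueError(f"{len(addresses)} addresses but {len(names)} names")
    return "\n".join(f'ID=="{addr}", NAME="{name}"' for addr, name in zip(addresses, names))


# Made-up lspci -D output for illustration only (not captured from a real 5100).
SAMPLE = """0000:00:1f.0 ISA bridge: Example chipset
0000:03:00.0 Ethernet controller: Example NIC
0000:04:00.0 Ethernet controller: Example NIC
0000:05:00.0 Ethernet controller: Example NIC
0000:06:00.0 Ethernet controller: Example NIC
0000:07:00.0 Ethernet controller: Example NIC
0000:08:00.0 Ethernet controller: Example NIC"""

# Names in the same order as the list above; confirm port-to-address order on your box.
rules = udev_rules(ethernet_addresses(SAMPLE), ["eth1", "eth2", "eth3", "eth4", "Sync", "Mgmt"])
print(rules)
```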

The next step is to turn off the 'see if we have a line card installed' code. Edit the /etc/appliance_config.xml file and change the line <loop>yes</loop> to <loop>no</loop> - this file is not writable, so chmod it to +w before you edit it and then -w after.
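The chmod/edit/chmod sequence above can be sketched as one function, mainly so the write permission is always put back even if the edit fails. This is a minimal illustration, not a Check Point tool: it assumes the tag appears literally as `<loop>yes</loop>`, it adds write permission for the owner only (shell `chmod +w` may add more, depending on umask), and the demo XML is a made-up stand-in, so run it against a throwaway copy first.

```python
import os
import stat
import tempfile


def disable_line_card_loop(path: str) -> bool:
    """Flip <loop>yes</loop> to <loop>no</loop>; return True if the file was changed."""
    original_mode = stat.S_IMODE(os.stat(path).st_mode)
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    if "<loop>yes</loop>" not in text:
        return False  # already edited, or the tag looks different on this box
    # Equivalent of 'chmod +w' for the owner, just long enough to write the file.
    os.chmod(path, original_mode | stat.S_IWUSR)
    try:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text.replace("<loop>yes</loop>", "<loop>no</loop>"))
    finally:
        # Equivalent of the closing 'chmod -w': put the original read-only mode back.
        os.chmod(path, original_mode)
    return True


# Demo on a throwaway file, never on the live /etc/appliance_config.xml.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "appliance_config.xml")
with open(demo_path, "w", encoding="utf-8") as fh:
    fh.write("<config><loop>yes</loop></config>\n")
os.chmod(demo_path, 0o444)

changed = disable_line_card_loop(demo_path)
with open(demo_path, encoding="utf-8") as fh:
    result = fh.read()
print(changed, result.strip())  # True <config><loop>no</loop></config>
```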

Next you have to set the EXL detection like this:

dbset process:exl_detectiond t
dbset :save

Reboot the appliance. It will come up without the Sync interface enabled as it didn't exist in the config before. Set it to state 'on' in clish. I think at this stage you should be able to run the FTW on your gateway 1, but in my case I powered them down and racked them at this point, so there was effectively another reboot.

At this point I could build the cluster per the normal steps, except that one of the 5100s has an -HA license on it. Because this is an online install, the appliance constantly fetches this license and applies it. Having this license applied breaks policy install, because the appliance is not in a CXL cluster; even with an eval license on there, it refuses to take a policy install. Unfortunately for me, I lost the coin toss and this appliance was initially my gateway 1_1, which meant I could not actually build the cluster; I had to rebuild it again, swapping over to set the other appliance as my gateway 1_1. As it is now, every time gateway 1_2 boots up, it goes down due to not taking the policy. I can try to get it working again by:

- Making sure I disable accelerated policy install
- Pulling the policy file from the SMO (cpha_blade_config pull_config policy 192.0.2.1)
- Deleting the appliance license
- Doing a 'fw fetch [mgmt server ip]'
- Validating with asg_policy verify
- Seeing that the policy install times are different but the policy signature is the same
- Flailing about, repeating the above steps
- Giving up because it seems to be working OK for now and it's lunch time

So keep all that in mind if you're planning on using existing cluster gateways that were purchased with the old -HA licensing in your new EXL cluster. It's a pain in the neck.


emmap · ‎2024-07-01

They need to be able to see each other over their Sync links (and it needs to have LLDP working as far as I know) and the second one should not be SIC'd to the management server as its own separate cluster if you want them to both be part of the same EXL gateway.

the_rock · ‎2024-07-01

Thank you. I may wipe out exl-02 tomorrow, re-create it and see if I can sync them properly.

Andy

the_rock · ‎2024-07-01

I see that the Sync IPs are not pingable from either member, so that's 100% the issue. I will talk to one of my colleagues this week to see the best way to make this work in eve-ng. For a regular cluster it's pretty simple, but sadly the same method does not work for ElasticXL.

Andy

the_rock · ‎2024-07-02

FWIW, here is what I see on the FIRST one I installed:

[Expert@CP-EXL-1-s01-01:0]# cphaprob -a if

CCP mode: Automatic

Interface Name: Status:

eth2 UP
eth3 UP
Sync (S) UP
magg1 (LS) UP

S - sync, HA/LS - bond type, LM - link monitor, P - probing

Virtual cluster interfaces: 5

lo 127.0.0.1
eth2 192.168.10.238
eth3 10.254.10.238
Sync 192.0.2.1
magg1 172.16.10.238

[Expert@CP-EXL-1-s01-01:0]#

Then the 2nd one, which is not tied to the mgmt server. For some odd reason, eth2 and eth3 don't show up, though they are definitely enabled and on.

[Expert@CP-EXL-2-s01-01:0]# cphaprob -a if

CCP mode: Automatic

Interface Name: Status:

Sync (S) UP
magg1 (LS) UP

S - sync, HA/LS - bond type, LM - link monitor, P - probing

Virtual cluster interfaces: 5

lo 127.0.0.1
eth2 192.168.10.237
eth3 10.254.10.237
Sync 192.0.2.1
magg1 172.16.10.237

[Expert@CP-EXL-2-s01-01:0]#
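Comparing two members' `cphaprob -a if` output by eye is error-prone. A small parser makes the difference mechanical. This sketch only understands the layout shown in this thread (an "Interface Name:" status block followed by a "Virtual cluster interfaces:" block); other versions may print it differently, so treat the parsing rules as assumptions. The sample strings are trimmed copies of the two outputs above.

```python
def parse_cphaprob_if(text: str) -> tuple[dict[str, str], dict[str, str]]:
    """Split `cphaprob -a if` output into ({interface: status}, {virtual interface: IP})."""
    statuses: dict[str, str] = {}
    vips: dict[str, str] = {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Interface Name:"):
            section = "status"
        elif stripped.startswith("Virtual cluster interfaces:"):
            section = "vip"
        elif not stripped or stripped.startswith(("S - sync", "CCP mode", "[Expert@")):
            continue  # blank lines, legend, header and shell prompts
        else:
            fields = stripped.split()
            if section == "status" and len(fields) >= 2:
                statuses[fields[0]] = fields[-1]  # "Sync (S) UP" -> Sync: UP
            elif section == "vip" and len(fields) == 2:
                vips[fields[0]] = fields[1]
    return statuses, vips


def missing_on_candidate(reference: str, candidate: str) -> set[str]:
    """Interfaces listed with a status on the reference member but not on the candidate."""
    ref_statuses, _ = parse_cphaprob_if(reference)
    cand_statuses, _ = parse_cphaprob_if(candidate)
    return set(ref_statuses) - set(cand_statuses)


MEMBER_1 = """Interface Name: Status:

eth2 UP
eth3 UP
Sync (S) UP
magg1 (LS) UP

S - sync, HA/LS - bond type, LM - link monitor, P - probing

Virtual cluster interfaces: 5

lo 127.0.0.1
eth2 192.168.10.238
eth3 10.254.10.238
Sync 192.0.2.1
magg1 172.16.10.238"""

MEMBER_2 = """Interface Name: Status:

Sync (S) UP
magg1 (LS) UP

S - sync, HA/LS - bond type, LM - link monitor, P - probing

Virtual cluster interfaces: 5

lo 127.0.0.1
eth2 192.168.10.237
eth3 10.254.10.237
Sync 192.0.2.1
magg1 172.16.10.237"""

print(sorted(missing_on_candidate(MEMBER_1, MEMBER_2)))  # ['eth2', 'eth3']
```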

the_rock · ‎2024-07-02

The only way to show the same interfaces on the 2nd member is to connect it to the mgmt server and install the policy, but that still does not change the fact that the cluster member can't be added to the 1st gateway.

I will check with our SE if this is expected or if there is any way to make this work with eve-ng.

Andy

[Expert@CP-EXL-2-s01-01:0]# cphaprob -a if

CCP mode: Automatic

Interface Name: Status:

eth2 UP
eth3 UP
Sync (S) UP
magg1 (LS) UP

S - sync, HA/LS - bond type, LM - link monitor, P - probing

Virtual cluster interfaces: 5

lo 127.0.0.1
eth2 192.168.10.237
eth3 10.254.10.237
Sync 192.0.2.1
magg1 172.16.10.237

[Expert@CP-EXL-2-s01-01:0]#

Yair_Shahar · ‎2024-07-02

Hi,

Once you have a single member configured, configured as a single gateway in SmartConsole, with SIC established and policy installed, you can add other members to this ElasticXL cluster via the WebUI or gclish > add cluster member.... (other members on the same Sync network should be visible there)

the_rock · ‎2024-07-02

Hey @Yair_Shahar

Thanks a lot for helping with this, I greatly appreciate it.

Just for context, I followed the EXACT process Heiko had in the post I referenced, but I have a feeling something about the way eve-ng works might be the issue here. So, below are the things I tested:

1) I followed Heiko's link, but got the message below when trying to add:

[Global] CP-EXL-1-s01-01> add cluster member method request-id identifier 6e3077466f10d3d99db1f62254297612 site-id 1 format json
{
"response": 401,
"body": {
"message": "No info for request-id with value 6e3077466f10d3d99db1f62254297612",
"errors": "",
"code": "generic_error"
}
}
[Global] CP-EXL-1-s01-01>
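Because the command was run with `format json`, the reply can be checked from a script instead of by eye. A minimal sketch, assuming every reply has the `response` / `body.code` / `body.message` shape of the one error sample above (it is the only sample in this thread, so other replies may differ). Going by the later replies, this particular message points at the SMO not seeing the new member over Sync.

```python
import json


def summarize_gclish_reply(raw: str) -> str:
    """Condense a JSON reply shaped like the one above into 'status code: message'."""
    reply = json.loads(raw)
    body = reply.get("body", {})
    return f'{reply.get("response")} {body.get("code", "")}: {body.get("message", "")}'


REPLY = """{
"response": 401,
"body": {
"message": "No info for request-id with value 6e3077466f10d3d99db1f62254297612",
"errors": "",
"code": "generic_error"
}
}"""

summary = summarize_gclish_reply(REPLY)
print(summary)
# 401 generic_error: No info for request-id with value 6e3077466f10d3d99db1f62254297612
```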

2) I then reinstalled 2nd member, exact same issue

3) Once I connect the 2nd member to SmartConsole and push policy, I see it shows the same Sync, but I have NO CLUE where it comes from. Sorry, I'm totally ignorant, if you will, when it comes to Maestro; I know the very basics of it, so apologies if these comments sound stupid. But I see the same thing on both members, and as I mentioned to emmap, ONLY once both are connected to the mgmt server can I see the same via cphaprob state, see below.

Andy

member 1:

[Expert@CP-EXL-1-s01-01:0]# cphaprob -a if

CCP mode: Automatic

Interface Name: Status:

eth2 UP
eth3 UP
Sync (S) UP
magg1 (LS) UP

S - sync, HA/LS - bond type, LM - link monitor, P - probing

Virtual cluster interfaces: 5

lo 127.0.0.1
eth2 192.168.10.238
eth3 10.254.10.238
Sync 192.0.2.1
magg1 172.16.10.238

[Expert@CP-EXL-1-s01-01:0]#

member 2:

[Expert@CP-EXL-2-s01-01:0]# cphaprob -a if

CCP mode: Automatic

Interface Name: Status:

eth2 UP
eth3 UP
Sync (S) UP
magg1 (LS) UP

S - sync, HA/LS - bond type, LM - link monitor, P - probing

Virtual cluster interfaces: 5

lo 127.0.0.1
eth2 192.168.10.237
eth3 10.254.10.237
Sync 192.0.2.1
magg1 172.16.10.237

[Expert@CP-EXL-2-s01-01:0]#

The web UI shows the same for both:

Yair_Shahar · ‎2024-07-02

It seems like you ran the FTW on the second member as well.

In ElasticXL, the FTW should run only on the first member (AKA the SMO); the rest of the members should just be installed, without any additional direct step on them.

the_rock · ‎2024-07-02

Hey Yair,

Not sure what FTW means in this context, but keep in mind that when I did this yesterday, I literally powered on the R82 image, did NOT go through the first time wizard, then tried adding that member on the 1st cluster member, and it failed with the error I gave. Are you saying I should install it again, go through the wizard and NOT select the ElasticXL part, or do it a totally different way?

Andy

the_rock · ‎2024-07-02

I think I get it now, sorry, not great with some abbreviations lol. I think FTW in this context means first time wizard, which I did NOT run yesterday when I did this, but again, the error was exactly the same when trying to add a cluster member.

Andy

ShaiF · ‎2024-07-02

LLDP is not necessary. They communicate using UDP broadcast packets over the Sync network.
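Since discovery relies on UDP broadcast over Sync, and the Sync IPs were not pingable in the eve-ng lab, it can help to confirm whether the lab's virtual links pass broadcast datagrams at all. The sketch below is a generic reachability check, not ElasticXL's own protocol: the payload, and any port you pick for the real test, are hypothetical, and it is meant for Linux hosts attached to the same lab segment. The self-test only exercises loopback.

```python
import socket


def open_listener(port: int, timeout: float = 5.0) -> socket.socket:
    """Bind a UDP socket on all local addresses so broadcasts to `port` are received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    sock.settimeout(timeout)  # recvfrom() raises socket.timeout if nothing arrives
    return sock


def send_probe(dest: str, port: int, payload: bytes = b"exl-lab-probe") -> None:
    """Send one datagram; SO_BROADCAST lets `dest` be a subnet broadcast address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(payload, (dest, port))


# Self-test over loopback; port 0 lets the OS pick a free port.
listener = open_listener(0)
port = listener.getsockname()[1]
send_probe("127.0.0.1", port)
data, sender = listener.recvfrom(1024)
listener.close()
print(data, sender[0])  # b'exl-lab-probe' 127.0.0.1
```

In the lab, you would open a listener on a fixed port (say 50000, a hypothetical choice) on one node and call `send_probe` towards the Sync subnet's broadcast address from the other, e.g. 192.0.2.255 if the Sync network is a /24 (the thread does not state the mask). A timeout on the listener suggests the virtual link is dropping broadcasts.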

the_rock · ‎2024-07-02

@ShaiF @Yair_Shahar

Just to make sure I got this right: should I delete the 02 member from SmartConsole, wipe it out, then reinstall it, go through the wizard and NOT select the ElasticXL part, or select it and then try to sync it?

Because again, I followed the exact process Heiko gave in his initial link when adding a cluster member, it failed with the error I provided, and this was WITHOUT doing any initial config through the web UI.

Andy

[Global] CP-EXL-1-s01-01> add cluster member method request-id identifier 6e3077466f10d3d99db1f62254297612 site-id 1 format json
{
"response": 401,
"body": {
"message": "No info for request-id with value 6e3077466f10d3d99db1f62254297612",
"errors": "",
"code": "generic_error"
}
}
[Global] CP-EXL-1-s01-01>

Yair_Shahar · ‎2024-07-02

Not exactly.

Wipe out the second member and reinstall it, that's it. Do not run any FTW on it.

the_rock · ‎2024-07-02

Right, which I did twice already, and no matter what I do, I always get the message below :-(

Andy

[Global] CP-EXL-1-s01-01> add cluster member method request-id identifier 6e3077466f10d3d99db1f62254297612 site-id 1 format json
{
"response": 401,
"body": {
"message": "No info for request-id with value 6e3077466f10d3d99db1f62254297612",
"errors": "",
"code": "generic_error"
}
}
[Global] CP-EXL-1-s01-01>

ShaiF · ‎2024-07-02

Before you add the member, run this from gclish:

> show cluster info provision

and check that the output shows the other member in REQUEST_TO_JOIN state. If you do not see it, you have issues on your Sync network (is it a VM?).

Only once you see it can you add it to the cluster.
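The "only add once you see REQUEST_TO_JOIN" rule can be expressed as a small polling loop. This is a sketch with hypothetical names: `fetch` stands for however you capture the output of `show cluster info provision` (this sketch does not show how to call gclish from a script), and the canned outputs below are made up, since the thread does not show that command's real output format. It only searches for the REQUEST_TO_JOIN string.

```python
import time
from typing import Callable


def join_requests(provision_output: str) -> list[str]:
    """Lines of saved `show cluster info provision` output that mention REQUEST_TO_JOIN."""
    return [line.strip() for line in provision_output.splitlines() if "REQUEST_TO_JOIN" in line]


def wait_for_join_request(fetch: Callable[[], str], attempts: int = 10, delay: float = 30.0) -> list[str]:
    """Call fetch() until some line shows REQUEST_TO_JOIN; return those lines, or [] on timeout."""
    for attempt in range(attempts):
        hits = join_requests(fetch())
        if hits:
            return hits
        if attempt < attempts - 1:
            time.sleep(delay)  # give the new member time to announce itself
    return []


# Canned, made-up outputs standing in for two successive runs of the gclish command.
fake_runs = iter([
    "no pending members",
    "member-2   REQUEST_TO_JOIN",
])
found = wait_for_join_request(lambda: next(fake_runs), attempts=3, delay=0)
print(found)  # ['member-2   REQUEST_TO_JOIN']
```

An empty result after all attempts is the "issues on your Sync network" case described above.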

the_rock · ‎2024-07-02

Correct, it's the eve-ng platform. I just find it odd, as I never had sync issues with a regular cluster in it, but this is obviously different. Give me 15-20 mins and I will update the thread.

Andy

the_rock · ‎2024-07-02

Sadly, still the same, BUT since I'm a very persistent dude, I want to leave it in a broken state so it can be fixed.

Andy

ShaiF · ‎2024-07-02

OK. Now your second member is a clean install (we can see that from the prompt).

Since it's kind of a VM (but not VMware), I guess we will need to apply a workaround (WA).

Please share the output of the following (from the new member):

1. ifconfig -a

2. ps auxww | grep exl_detectiond

the_rock · ‎2024-07-02

Kind of a VM, right; it's eve-ng, so it's considered a VM, but not like, say, a regular ESXi. Btw, the ONLY interface configured is eth0 with IP 192.168.1.1, no static route, nothing yet.

Andy

ShaiF · ‎2024-07-02

OK. Since it's not an officially supported platform (eve-ng), apply this workaround (WA):

From Expert mode on the new member:
1. vi /opt/ElasticXL/exl_detection/src/exl_detectiond.py

Change this line:

if __machine_info.sync_ifn != 'Sync' and not __machine_info.is_vmware and not __machine_info.is_kvm:
to this:

if False: #__machine_info.sync_ifn != 'Sync' and not __machine_info.is_vmware and not __machine_info.is_kvm

2. Run this:
dbset process:exl_detectiond t

dbset :save

tellpm process:exl_detectiond t

ifconfig should now show eth1 as 192.0.2.254.

Try to ping 192.0.2.1 from the new member, and 192.0.2.254 from the SMO.
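Since a missed character in the manual vi edit is an easy mistake (see the follow-up below), step 1 can be scripted. This is a minimal sketch, not a supported tool: it replaces exactly the line text quoted above, leaves the line's indentation alone, keeps a backup with a `.orig` suffix (a name chosen here, not from the post), and refuses to touch a file where the expected line is missing. The dbset/tellpm commands in step 2 still have to be run by hand afterwards, and per later replies the edit may be needed on every member and the SMO.

```python
import os
import shutil
import tempfile

# The exact before/after lines from the post above.
ORIGINAL = ("if __machine_info.sync_ifn != 'Sync' and not __machine_info.is_vmware"
            " and not __machine_info.is_kvm:")
PATCHED = ("if False: #__machine_info.sync_ifn != 'Sync' and not __machine_info.is_vmware"
           " and not __machine_info.is_kvm")


def patch_detection(path: str) -> bool:
    """Apply the one-line workaround; return False if it is already applied."""
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    if PATCHED in text:
        return False
    if ORIGINAL not in text:
        raise ValueError("expected line not found; the file may differ on this build")
    shutil.copy2(path, path + ".orig")  # keep a backup next to the file
    with open(path, "w", encoding="utf-8") as fh:
        # Only the matched text changes, so the line's indentation is preserved.
        fh.write(text.replace(ORIGINAL, PATCHED, 1))
    return True


# Demo on a stand-in file, not the real /opt/ElasticXL path.
demo_path = os.path.join(tempfile.mkdtemp(), "exl_detectiond.py")
with open(demo_path, "w", encoding="utf-8") as fh:
    fh.write("def check():\n    " + ORIGINAL + "\n        return\n")

first = patch_detection(demo_path)
second = patch_detection(demo_path)
print(first, second)  # True False
```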

the_rock · ‎2024-07-02

Hey Shai,

No sweat, honestly; if it's not supported, let's leave it alone. I don't like to waste time on unsupported things, plus it's not fair to you guys either. I tried, but no luck.

Thanks again for everything.

Andy

the_rock · ‎2024-07-02

Btw, wanted to say, I left the first one I created there for testing, so that's totally fine.

Andy

the_rock · ‎2024-07-11

Hey @ShaiF

Thanks again so much for this process. I did make it work this way; I suppose I missed a character or something the first time I did it.

Really grateful for the help!

Andy

Arne_Boettger · ‎2024-07-14

Thanks for sharing this workaround. I would appreciate it if there were a sincere warning for unsupported deployments, but still an obvious way to install it anyway for labs and test environments.

In addition: I needed to apply the Workaround to BOTH gateways to be able to establish a cluster in my Proxmox lab.

the_rock · ‎2024-07-14

That would be nice warning, agree.

root · ‎2024-10-18

Thank you for the solution; it works for me in eve-ng.

A few tips for running it in eve-ng:

- An eve-ng template can't be used for the member FWs: it will keep showing the same request-id for all the new members, and it will fail during the cluster add. You have to install the new members from ISO separately.

- The Python trick (from ShaiF) needs to be applied on both members and on the SMO.

Finally, it's nice to see ElasticXL running in my eve-ng.

the_rock · ‎2024-10-19

Nice to know, thanks for that!

Andy

genisis__ · ‎2025-02-10

Just trying to build R82 in ESXi and getting the following message:

"Verification failed: (0x1A) Security Violation" - We used Redhat 8 as the OS flavor.

Has anyone seen this before?

Bob_Zimmerman · ‎2025-02-10

"Verification failed: (0x1A) Security Violation" is an error from UEFI which means Secure Boot is enabled, but the bootloader you are trying to boot does not have a signature or was not signed by a key in the UEFI trust keystore. I don't believe Check Point supports Secure Boot at this time, so you probably just need to disable it in the VM settings. Not sure where the setting is in ESXi, but it should be near where you select whether you want the VM to have a BIOS or UEFI boot ROM.
