cezar_varlan1
Collaborator

How to resize a Check Point CloudGuard NGFW High Availability Cluster in Azure

Procedure to resize Azure Check Point CloudGuard firewalls (tested on R80.40)

 

[1] Connect via SSH to each of the firewalls in the cluster

                [a] Verify that the cluster is working and take note of the current primary member

                # cphaprob state

                # cphaprob roles

ID         Role
1 (local)  Master
2          Non-Master

                [b] Check current core count and distribution

                # fw ctl get int fwlic_num_of_allowed_cores

                # fw ctl multik stat
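
On the original 4-core size these return output similar to the following (illustrative values; the real capture from the resized 16-core member appears further below):

[Expert@ckpupgrademe1:0]# fw ctl get int fwlic_num_of_allowed_cores
fwlic_num_of_allowed_cores = 4
[Expert@ckpupgrademe1:0]# fw ctl multik stat
ID | Active  | CPU    | Connections | Peak
----------------------------------------------
 0 | Yes     | 3      |          27 |       40
 1 | Yes     | 2      |          30 |       42
 2 | Yes     | 1      |          33 |       41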

                [c] Check current contents of boot.conf file

                # cat /var/opt/fw.boot/boot.conf

[Expert@ckpupgrademe1:0]# cat /var/opt/fw.boot/boot.conf
CTL_IPFORWARDING        1
DEFAULT_FILTER_PATH     /etc/fw.boot/default.bin
KERN_INSTANCE_NUM       3
COREXL_INSTALLED        1
KERN6_INSTANCE_NUM      2
IPV6_INSTALLED  0
CORE_OVERRIDE   4

[2] Connect to portal.azure.com and locate the firewall VMs. Proceed to stop (deallocate) the standby member identified at step [1][a] as Non-Master / Standby.

[3] In the firewall VM's properties, browse to Settings > Size.

Pick the new VM size of choice. In our case we need 16 cores.

Reference: https://support.checkpoint.com/results/sk/sk109360#Pricing%20in%20Azure%20Marketplace

[4] Press the Resize button

[5] Once the machine is resized you can start the VM and check the cluster status.

Browse to Overview and press the Start button.
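
If you prefer the Azure CLI to the portal, steps [2] to [5] map to the following sketch (resource group, VM name, and size are placeholders for your environment):

az vm deallocate --resource-group rg-ckp-cluster --name ckpupgrademe2
az vm resize --resource-group rg-ckp-cluster --name ckpupgrademe2 --size Standard_D16s_v3
az vm start --resource-group rg-ckp-cluster --name ckpupgrademe2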

Once the VM has started, check the cluster status and core distribution:

[Expert@ckpupgrademe2:0]# cphaprob role
ID         Role

1          Master
2 (local)  Non-Master

[Expert@ckpupgrademe2:0]# cphaprob state

Cluster Mode:   High Availability (Active Up) with IGMP Membership

ID         Unique Address  Assigned Load   State          Name

1          10.8.1.5        100%            ACTIVE         CKPUPGRADEME1
2 (local)  10.8.1.6        0%              STANDBY        CKPUPGRADEME2

Active PNOTEs: None

Last member state change event:
   Event Code:                 CLUS-114802
   State change:               DOWN -> STANDBY
   Reason for state change:    There is already an ACTIVE member in the cluster (member 1)
   Event time:                 Tue Nov  7 11:15:18 2023

Cluster failover count:
   Failover counter:           0
   Time of counter reset:      Tue Nov  7 03:07:17 2023 (reboot)

[Expert@ckpupgrademe2:0]#
[Expert@ckpupgrademe2:0]# fw ctl multik stat
ID | Active  | CPU    | Connections | Peak
----------------------------------------------
 0 | Yes     | 15     |          22 |       39
 1 | Yes     | 14     |          31 |       41
 2 | Yes     | 13     |          34 |       41
[Expert@ckpupgrademe2:0]#

The second member still has only 3 firewall kernel instances, but it now sees all 16 cores:

[Expert@ckpupgrademe2:0]# cat /proc/cpuinfo  | grep proc
processor       : 0
processor       : 1
processor       : 2
processor       : 3
processor       : 4
processor       : 5
processor       : 6
processor       : 7
processor       : 8
processor       : 9
processor       : 10
processor       : 11
processor       : 12
processor       : 13
processor       : 14
processor       : 15
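
To confirm the core count without scanning the full list, the processor entries can simply be counted:

[Expert@ckpupgrademe2:0]# grep -c ^processor /proc/cpuinfo
16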
[6] Check the contents of the boot.conf file:

[Expert@ckpupgrademe2:0]# cat /var/opt/fw.boot/boot.conf

CTL_IPFORWARDING        1
DEFAULT_FILTER_PATH     /etc/fw.boot/default.bin
KERN_INSTANCE_NUM       3
COREXL_INSTALLED        1
KERN6_INSTANCE_NUM      2
IPV6_INSTALLED  0
CORE_OVERRIDE   16

This means the member sees all 16 cores but still limits the number of kernel instances to 3 (KERN_INSTANCE_NUM). That is deliberate for now: if the two cluster members ran a different number of kernel instances, cluster sync would break. Because the instance counts still match, we keep full cluster functionality even though the VMs are temporarily different sizes, so we can switch this member to active and perform the same steps on the peer before finally editing the boot.conf file.
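
Before failing over you can also confirm that state synchronization is healthy. cphaprob syncstat prints the delta-sync statistics (output omitted here; the exact counters vary by version, but there should be no lost updates):

[Expert@ckpupgrademe2:0]# cphaprob syncstat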

 

[7] Switch the cluster over from the primary to the standby member.

[a] Connect via SSH to the current Primary [Active] member

[b] Run the “# clusterXL_admin down” command

[c] Confirm that the cluster has switched over, as in the sketch below
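
A typical switchover sequence looks like this (a sketch; the local member should report DOWN via the Problem Notification the script registers, and the peer should go ACTIVE). Note that without the -p flag the administrative down state does not persist across reboot, which suits this procedure since the member is rebooted during the resize:

[Expert@ckpupgrademe1:0]# clusterXL_admin down
[Expert@ckpupgrademe1:0]# cphaprob state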

[8] Perform steps [1] -> [6] on the new Standby member (the former primary).

 

[9] Edit boot.conf and make sure to correct KERN_INSTANCE_NUM. With 16 cores the usual split is 14 firewall kernel instances, leaving 2 cores for the SNDs.
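
One way to make the change, taking a backup first (a sketch; you can equally edit the file in vi):

[Expert@ckpupgrademe2:0]# cp /var/opt/fw.boot/boot.conf /var/opt/fw.boot/boot.conf.bak
[Expert@ckpupgrademe2:0]# sed -i 's/^KERN_INSTANCE_NUM.*/KERN_INSTANCE_NUM\t14/' /var/opt/fw.boot/boot.conf

After the edit the file should read as follows: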

[Expert@ckpupgrademe2:0]# cat /var/opt/fw.boot/boot.conf
CTL_IPFORWARDING        1
DEFAULT_FILTER_PATH     /etc/fw.boot/default.bin
KERN_INSTANCE_NUM       14
COREXL_INSTALLED        1
KERN6_INSTANCE_NUM      2
IPV6_INSTALLED  0
CORE_OVERRIDE   16

[10] Reboot the firewall to apply the changes. Once it is back up, check the core distribution:

[Expert@ckpupgrademe1:0]# fw ctl multik stat
Kernel fw_0: CPU 15
Kernel fw_1: CPU 14
Kernel fw_2: CPU 13
Kernel fw_3: CPU 12
Kernel fw_4: CPU 11
Kernel fw_5: CPU 10
Kernel fw_6: CPU 9
Kernel fw_7: CPU 8
Kernel fw_8: CPU 7
Kernel fw_9: CPU 6
Kernel fw_10: CPU 5
Kernel fw_11: CPU 4
Kernel fw_12: CPU 3
Kernel fw_13: CPU 2
Daemon cprid: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Daemon mpdaemon: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Daemon fwd: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Daemon in.asessiond: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Daemon lpd: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Daemon core_uploader: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Daemon cprid: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Daemon cpd: CPU 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Interface enP38308p0s2: has multi queue enabled
Interface enP47606p0s2: has multi queue enabled

[11] At this point the cluster would not work correctly, as one member has more FW instances than the other. Connect to the primary member, perform the same edit on boot.conf, and reboot. While the primary reboots, the remaining cluster member with the modified core count becomes primary, and once the second member comes back online the cluster will function normally again.

[12] Push policy and check logs to confirm normal operation.
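
If you prefer the management API over SmartConsole, the push can also be scripted on the management server (a sketch; the policy package and target names are placeholders):

mgmt_cli -r true install-policy policy-package "Standard" targets.1 "ckp-azure-cluster"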

[13] Further optimization

[a] Edit the affinity file $FWDIR/conf/fwaffinity.conf to allocate cores to specific interfaces. Also keep in mind that heavy logging warrants allocating at least one core to FWD.
Note: You should not go over the total VM core limit. If you allocate cores to SND and FWD, decrease the kernel instance number accordingly so that the cores are not oversubscribed.
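
A minimal sketch of such an affinity file (interface names and CPU IDs are placeholders; lines take the form <type> <name> <cpu>, and the change takes effect after a reboot):

# $FWDIR/conf/fwaffinity.conf - illustrative only
i eth0 0
i eth1 1
n fwd 2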

 

3 Replies
Shay_Levin
Admin

Nice

Thank you for sharing

Chris_Atkinson
Employee
cezar_varlan1
Collaborator

Yes, this one works too. I tried to remember why that was not enough: on my last production upgrade I hit an issue where cpconfig was unable to write to the file. Going back through my notes, here is what happened:

# We tried to edit the /etc/fw.boot/boot.conf file to change KERN_INSTANCE_NUM to 6, but we kept getting a permission error.

# When the CoreXL settings are changed, the changes are written to boot.conf; because of the permission issue the writes failed, which is why the core count in CoreXL was not increasing.

# The boot.conf file was locked with the immutable attribute; to release the lock we ran the command below:

chattr -i boot.conf

# After unlocking the file we made the changes through cpconfig and enabled the 6 fw workers; after rebooting, CoreXL was enabled with 6 cores on both gateways. We also enabled Multi-Queue.
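
For reference, the immutable attribute can be inspected with lsattr before and after clearing it (illustrative output; the hostname is a placeholder):

[Expert@gw:0]# lsattr /etc/fw.boot/boot.conf
----i----------- /etc/fw.boot/boot.conf
[Expert@gw:0]# chattr -i /etc/fw.boot/boot.conf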


