Management HA & CRL : concerns

Ob1lan · 2021-09-23

Hi,

As part of our project to retire our main datacenter and migrate everything to AWS, I had to find a way to migrate the Check Point Management Server (cma).

To keep the migration as smooth as possible, I decided to try Management HA. I launched an EC2 instance from the Check Point R81.10 management AMI (BYOL), sized it as needed, and configured Management HA.

So far everything works great: the 'old' management is in constant sync with the new one in AWS, which I made active, and I receive most gateways' logs (some need a little kick to start sending logs to the new cma). I'm also able to publish and install policies from the new management server.

From there, I wanted to simulate the decommissioning of the 'old' management server, so I simply issued the cpstop command on it. Then I rebooted some gateways after business hours to see how they would react. The IPsec S2S tunnels between those gateways and the central one got stuck in phase 1, and the logs showed 'Invalid certificate' errors.

So I suspect the CRLs were unreachable, and/or the gateways were still trying to fetch them from the 'old' management server, which was stopped. Once I issued the cpstart command on the 'old' management, everything started working again and the tunnels came up.

Checking the CRL distribution point (CRL DP) in the certificate, I noticed it uses a URL with an unresolvable name: http://mgmt.company.tld:18264 (redacted).
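For anyone who wants to repeat that check from another host, here is a minimal Python sketch that tests whether a CRL DP URL resolves and accepts TCP connections. The function names are hypothetical, and it only probes DNS and the TCP port; it does not download or validate the CRL, and it says nothing about how the gateway itself locates the CRL.

```python
import socket
from urllib.parse import urlsplit


def parse_crl_dp(url):
    """Split a CRL distribution point URL into (host, port)."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no host in CRL DP URL: {url!r}")
    # Fall back to the scheme's default port when the URL has none.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def check_crl_dp(url, resolve=socket.getaddrinfo,
                 connect=socket.create_connection, timeout=3.0):
    """Return 'unresolvable', 'unreachable' or 'reachable' for a CRL DP URL.

    resolve and connect are injectable so the logic can be exercised
    without touching the network.
    """
    host, port = parse_crl_dp(url)
    try:
        resolve(host, port)
    except socket.gaierror:
        return "unresolvable"
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        return "unreachable"
    conn.close()
    return "reachable"
```

Calling check_crl_dp("http://mgmt.company.tld:18264") from a machine on the same network as the gateway shows whether the name in the certificate resolves there at all.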

How does this work at all if I can't resolve that name internally?
How can I smoothly decommission the 'old' management without impacting our numerous (40+) IPsec S2S tunnels from our remote gateways?

Thanks in advance for your help, much appreciated!

PhoneBoy · 2021-09-24

I'm assuming this is because the gateways "know" the ICA is the management server.
When you migrated the management into AWS, what did you set as the main IP on your management server?
Is it an elastic IP or one of the private VPC IPs?

Tomer_Noy · 2022-04-28

I know that this is an old thread, but I recently came across a similar issue and I may be able to assist with the solution. If not for your case, then perhaps for future people that come across this post.

Gateways fetch CRLs periodically and if they cannot fetch for over 24 hours, they stop accepting the certificate.

The CRL fetching is done according to the "masters" file on the gateway, which tells it which management machines should control it. This list also determines who to fetch policy from after reboot or upgrade.

By default, the list contains the primary management server. Although it's possible to manually alter this file, it's not recommended or necessary in this case. You can add additional servers per gateway by modifying the list in the "Fetch Policy" page in the gateway editor.

In your case, when you took down the primary, the gateway failed to fetch the CRL and dropped VPN after 24 hours. If you had added the other server to the list and pushed policy to the gateway, the gateway would also have tried the other server for the CRLs.
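To make the behavior concrete, here is a toy Python model of the logic described above. It is only an illustration of the rule as stated in this post, not Check Point's actual implementation; the function names and the server names "old-mgmt" and "aws-mgmt" are hypothetical.

```python
from datetime import datetime, timedelta

# Grace period quoted in this thread; treated here as a fixed constant.
CRL_GRACE = timedelta(hours=24)


def fetch_crl(masters, reachable):
    """Try each server in the list in order; return the first that answers."""
    for server in masters:
        if server in reachable:
            return server
    return None


def certificate_accepted(last_fetch, now, masters, reachable):
    """Accept if a CRL can be fetched now, or the last fetch is recent enough."""
    if fetch_crl(masters, reachable) is not None:
        return True
    return now - last_fetch <= CRL_GRACE


last_ok = datetime(2021, 9, 23, 18, 0)
later = last_ok + timedelta(hours=25)

# Only the stopped primary in the list: no fetch for over 24 hours.
print(certificate_accepted(last_ok, later, ["old-mgmt"], set()))

# Secondary added to the list and still running: the fetch succeeds.
print(certificate_accepted(last_ok, later, ["old-mgmt", "aws-mgmt"], {"aws-mgmt"}))
```

The first call models the situation in the original post (certificate rejected), the second models the same outage with the secondary server added to the list (certificate still accepted).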

The following SK gives more background info and instructions on how to configure:
https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solut...

flachance · 2023-04-27

I'm glad I found this post; we've been struggling with similar issues for a while now. The SK you're referring to looks like a good solution.

Would that SK also help with Remote Access VPN? For example, we have Site A and Site B, linked by a VPN tunnel. Primary management is at Site A and secondary management is at Site B.

If Site A were to go down (so the primary management is not accessible), can users connect with Remote Access VPN to Site B, or will it fail because of CRL verification?

Chris_Atkinson · 2023-04-27

The following may be helpful if you need to investigate this as an option.

sk21156: Disabling CRL checking when authenticating with certificates

flachance · 2023-04-27

Hi Chris,

Yes, we've followed that SK to disable CRL checking temporarily to solve previous issues. Management is nervous about disabling it permanently for security reasons. Do you know if Check Point has recommendations about doing this?

Chris_Atkinson · 2023-04-27

Understood. The real consideration here is at what point you declare the original primary management lost, promote the secondary management, and push policy to update the CRL DPs.

PhoneBoy · 2023-04-27

We provide the capability to disable CRL checks for troubleshooting purposes or situations where the (Internal) Certificate Authority is not available for a longer period of time due to, say, disaster recovery.
CRLs are an important security feature of a Certificate Authority and disabling CRL checks permanently is definitely not recommended.

Tomer_Noy · 2023-04-27

Yes, this SK should help with Remote Access VPN, in a similar way as it does for Site-to-Site VPN.
