lucafabbri365
Collaborator

Anti-Bot & Anti-Virus, IPS update error on Standby Member


Anti-Bot & Anti-Virus and/or IPS on a Check Point (R80.20) standby node report the error "Error: Update failed. Contract entitlement check failed. Could not reach 'updates.checkpoint.com'..." while updating.

Details

1. From standby node - Gaia web console => "Check for Updates", I get the error: "Could not connect to the Check Point Cloud. Check your connection settings..."

2. From standby node, tests from SSH (sk83520) :

- curl_cli -v -k https://updates.checkpoint.com/ => most of the time it doesn't work (timeout); sometimes it works.
- curl_cli to any other URL => most of the time it doesn't work (timeout), sometimes it works.
- ping public FQDN => most of the time it doesn't work (timeout), sometimes it works.
- On active node => it works, always.

3. From the standby node, I can reach the Internet gateway and the other (active) node => no internal communication issues.

4. I have already verified and applied sk43807 (all points except point 4).
The fwha_forw_packet_to_not_active parameter is enabled on both nodes.

5. Licenses are OK (sk98665), except that the command cpstat antimalware -f update_status returns the error below (the same one I see in SmartConsole):

AB Update status: up-to-date
AB Update description: Gateway is up to date.
Database version: 1906061756.
Package date: Thu Jun 6 11:00:00 2019
AB Next update description: The next update will be run as scheduled.
AB DB version: 1906061756
AV Update status: failed
AV Update description: Update failed. Contract entitlement check failed. Could not reach "updates.checkpoint.com". Check proxy configuration on the gateway.
AV Next update description: The next try will be within one hour.
AV DB version: 1906070837
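
The intermittent behavior in step 2 can be put into numbers with a small timing wrapper (a sketch, not a Check Point tool; on the gateway you would pass it `curl_cli`, the bundled curl, as the probe command):

```shell
#!/bin/sh
# Sketch: time any probe command and report its exit code and elapsed seconds,
# to quantify "most of the time it doesn't work; sometimes it works".
probe() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    rc=$?
    end=$(date +%s)
    echo "cmd='$*' rc=$rc elapsed=$((end - start))s"
}

# On the gateway this would be, e.g.:
#   probe curl_cli -s -k -m 10 https://updates.checkpoint.com/
# Here a trivial command stands in so the sketch is self-contained:
probe true
```

Running it in a loop from both members would show whether the timeouts cluster on the standby only.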

I already read these CheckMates posts:

Update failed. Contract entitlement check failed

Problem accessing standby cluster member from non-local network

Any advice?
 
Thank you very much,
Luca

Kim_Moberg
Advisor

Which jumbo take do you run on this cluster?
Can you ping anything on the internet from standby member?

I suspect it could be related to a bug discussed in this post.

https://community.checkpoint.com/t5/General-Topics/R80-20-Issue-Monitoring-standby-cluster-members-v...

Possibly installing R80.20 Jumbo Take 80 could help you out here.

 

Best Regards
Kim
lucafabbri365
Collaborator

Hello Kim,
thank you for your reply.

Well, we have Take_47 installed (latest General Availability release). Take_80 is not in General Availability yet.

Ping from the standby member to the Internet works, but with strange behavior: most of the time I have to wait about 10 seconds after running the command before getting a reply; other times it answers immediately. The active member doesn't have this issue.

Example

[Expert@Firewall01:0]# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.

***WAIT 10 seconds***

64 bytes from 8.8.8.8: icmp_seq=6 ttl=57 time=16.9 ms
64 bytes from 8.8.8.8: icmp_seq=7 ttl=57 time=16.8 ms
64 bytes from 8.8.8.8: icmp_seq=24 ttl=57 time=16.9 ms
64 bytes from 8.8.8.8: icmp_seq=32 ttl=57 time=16.8 ms
64 bytes from 8.8.8.8: icmp_seq=34 ttl=57 time=16.8 ms
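
Note that the icmp_seq jumps in the output above (6 -> 7 -> 24 -> 32 -> 34) mean most echo replies never arrived at all, rather than arriving late. A small sketch that counts the gaps from such captured ping output (the sample lines are pasted inline so it is self-contained):

```shell
#!/bin/sh
# Count missing icmp_seq values between consecutive received replies.
gaps=$(printf '%s\n' \
  '64 bytes from 8.8.8.8: icmp_seq=6 ttl=57 time=16.9 ms' \
  '64 bytes from 8.8.8.8: icmp_seq=7 ttl=57 time=16.8 ms' \
  '64 bytes from 8.8.8.8: icmp_seq=24 ttl=57 time=16.9 ms' \
  '64 bytes from 8.8.8.8: icmp_seq=32 ttl=57 time=16.8 ms' \
  '64 bytes from 8.8.8.8: icmp_seq=34 ttl=57 time=16.8 ms' |
awk -F'icmp_seq=' '{
  split($2, a, " "); seq = a[1] + 0
  if (NR > 1 && seq > prev + 1) lost += seq - prev - 1
  prev = seq
} END { printf "received=%d lost_in_between=%d", NR, lost }')
echo "$gaps"
```

For the capture above this reports received=5 lost_in_between=24, i.e. roughly five out of six echoes were dropped.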

It's a very strange issue.

However, I'm going to open an SR for this.

Bye,
Luca

lucafabbri365
Collaborator

Hello,
here an update.

I opened an SR, and support suggested installing a Take 80 build they provided. The issue was resolved; however, a new problem appeared: a lot of these entries inside /var/log/messages:

Jun 10 17:33:52 2019 FIREWALL01 last message repeated 2 times
Jun 10 17:33:53 2019 FIREWALL01 kernel: [fw4_3];fwmutlik_do_sequence_accounting_on_entry: bad dir -1 (gconn_segment=8 flags=1 opcode=15)
Jun 10 17:33:53 2019 FIREWALL01 kernel: [fw4_3];fwmutlik_do_sequence_accounting_on_entry: bad dir -1 (gconn_segment=2 flags=1 opcode=15)
Jun 10 17:33:53 2019 FIREWALL01 kernel: [fw4_3];fwmutlik_do_sequence_accounting_on_entry: bad dir -1 (gconn_segment=8 flags=1 opcode=15)
Jun 10 17:33:53 2019 FIREWALL01 kernel: [fw4_3];fwmutlik_do_sequence_accounting_on_entry: bad dir -1 (gconn_segment=2 flags=1 opcode=15)
Jun 10 17:33:53 2019 FIREWALL01 kernel: [fw4_3];fwmutlik_do_sequence_accounting_on_entry: bad dir -1 (gconn_segment=8 flags=1 opcode=15)
Jun 10 17:33:53 2019 FIREWALL01 kernel: [fw4_3];fwmutlik_do_sequence_accounting_on_entry: bad dir -1 (gconn_segment=2 flags=1 opcode=15)
Jun 10 17:33:53 2019 FIREWALL01 kernel: [fw4_3];fwmutlik_do_sequence_accounting_on_entry: bad dir -1 (gconn_segment=8 flags=1 opcode=15)
Jun 10 17:33:53 2019 FIREWALL01 kernel: [fw4_3];fwmutlik_do_sequence_accounting_on_entry: bad dir -1 (gconn_segment=2 flags=1 opcode=15)
Jun 10 17:33:53 2019 FIREWALL01 kernel: [fw4_3];fwmutlik_do_sequence_accounting_on_entry: bad dir -1 (gconn_segment=8 flags=1 opcode=15)
Jun 10 17:33:53 2019 FIREWALL01 kernel: [fw4_3];fwmutlik_do_sequence_accounting_on_entry: bad dir -1 (gconn_segment=8 flags=1 opcode=15)
Jun 10 17:33:53 2019 FIREWALL01 kernel: [fw4_3];fwmutlik_do_sequence_accounting_on_entry: bad dir -1 (gconn_segment=2 flags=1 opcode=15)
Jun 10 17:33:53 2019 FIREWALL01 kernel: [fw4_3];fwmutlik_do_sequence_accounting_on_entry: bad dir -1 (gconn_segment=8 flags=1 opcode=15)

They are aware of this. It appears in some Check Point environments, and in some of them it has caused an impact.

Bye,
Luca

Pieter_van_Stok
Participant

Hi Luca,

 

Not sure if you're still experiencing these errors, but they are solved in JHF 103.

See sk158312 

The SK confirms the problem was fixed and lists the takes that include the fix.

Regards,

 

Pieter

KonstantinosT
Participant

Hello,

We experience the same problem in R81 CloudGuard IaaS with the standby node. Check Point support recommended applying sk93204, but it didn't resolve the issue.

rogo
Employee Alumnus

Hello KT,

sk93204 is not relevant to R81 since the required changes are already enabled in R80.40 and later versions by default.

What is the exact experience and are there any drop logs?

KonstantinosT
Participant

Hello Rogo,

We have another CloudGuard IaaS deployment on R80.40 Take 87 (latest build). On that one, since last June when we first experienced the same issue (I don't recall exactly which Take), with the help of support we modified the fwkern.conf file and added the arguments below in order to make it work. Since that change, the behavior of the standby node has been stable.

[Expert@ck_lanfw1:0]# cat fwkern.conf
fwha_cluster_hide_active_only=0
ccl_force_use_ccp=1

 

For both the R80.40 and R81 deployments, in the Gateway Cluster Properties - ClusterXL and VRRP, we do not enable "Use Virtual MAC".

For R81, there's an active service request. SR#6-0002435699

The behavior from the standby node in R81 is that we cannot even ping the default gateway (10.0.10.1). From the logs, I don't see any drops, and the security policy allows both nodes, lanfw1 (.10.3) and lanfw2 (.10.4), and the cluster (.10.2) to communicate to the Internet. From the active node, which works fine, there's a NAT that translates into the cluster VIP (10.0.10.2).

rogo
Employee Alumnus

 fwha_cluster_hide_active_only=0 means that the standby will try to reach the internet directly, not via Active (pre-R80.40 behavior). The problem with that is that requests are handled by standby while replies are processed by active (because of the VIP). Due to this asymmetry, the connection may be dropped by IPS protections on either active or standby, because both see only half of the connection. fwha_cluster_hide_active_only=1 (default) means that both the request and the reply are handled by the active member, solving the problem.

ccl_force_use_ccp=1 means the packets are forwarded between the members encapsulated in CCP. Normally it should not be required, and when used as a workaround it could mask the real underlying issue, like missing connectivity on some interfaces between the members. CCP encapsulation is only required for geographical clusters, where it's enabled automatically.

I suggest disabling both flags and testing. If it still doesn't work, we will debug it in order to find the root cause. You can ask in the ticket to involve me (Michael Rogovin). When we find the root cause, we can then share it with the forum.
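
For anyone following along, a sketch of what disabling both flags looks like on a Gaia member. The file path and the fw ctl syntax follow the standard kernel-parameter conventions; verify them against your own TAC guidance before changing a production cluster:

```
# $FWDIR/boot/modules/fwkern.conf -- comment out or remove the overrides,
# then reboot for the change to persist:
# fwha_cluster_hide_active_only=0
# ccl_force_use_ccp=1

# Non-persistent runtime equivalent, per member (assuming defaults of 1 and 0):
#   fw ctl set int fwha_cluster_hide_active_only 1
#   fw ctl set int ccl_force_use_ccp 0
#   fw ctl get int fwha_cluster_hide_active_only    # confirm the value
```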

KonstantinosT
Participant

Hello,

I promised to reply back on this case. Since R81 Take 23 went GA, I have installed it on a LANFW (CloudGuard IaaS) cluster. Unfortunately, the problem remains. Originally, in SR#6-0002435699, CP support responded that R&D was looking at it and had opened PMTR-53064 to be included in the hotfix, which at that time was still ongoing at Take 10.

Since the problem isn't resolved, I requested a new SR linked to the previous one, and I'm awaiting further communication from CP Support. To overcome the standby node connectivity issues, we again changed the kernel parameters as above.

Kim_Moberg
Advisor

Hi Michael

I have the exact issue as being described above. 

I tried fwha_forw_packet_to_not_active and the cluster-hide setting, but nothing helped.

I remember you helped me on a TAC case with accessing the standby member over VPN, which was solved in R80.40.

Now I have 3 IaaS clusters where the standby member cannot update any Check Point services and shows a red X in the gateways overview in the SMS because something is wrong.

It doesn't seem like TAC is trying to solve this issue. I created my TAC case 2-3 weeks ago and still no new updates on it.

Best Regards
Kim
Kim_Moberg
Advisor

Hi Michael,
After a long debug with TAC, we managed to solve it.

Both cluster members had to have the following kernel parameters set in fwkern.conf on the R81 VSEC gateways:

fwha_forw_packet_to_not_active=1
ccl_force_use_ccp=1

Now it is working perfectly.

 

Best Regards
Kim