Re: R80.40 policy install diff changes breaks the ...

Bogdan_Tatomir1 · ‎2020-08-14

Hello,

It looks like R80.40 introduces a new policy install "differential" or "incremental" only way of pushing policy to the gateways. While this was announced for R80.10 and never made it in, we believe it was silently introduced in this new version.

Unfortunately, this creates far more issues than it helps with policy installation speed.

We currently have multiple critical SRs open and support is clueless on how to fix this.

A few examples of what is happening:

- IPS exceptions under "global exceptions" are not pushed, or only sometimes pushed: the IPS exceptions file is missing from the gateways after policy install

- converting a single gateway to a cluster results in the cluster dropping its own traffic, because the implied rules (which are supposed to allow that traffic out) are not updated after the cluster policy install. For example: the gateway had IP 1.2.3.4; after the conversion, cluster member 1 has 1.2.3.5, member 2 has 1.2.3.6, and the interface VIP is 1.2.3.4. All traffic NAT-ed to source 1.2.3.4 (the former single IP) is allowed outbound, but all traffic originating on the local FWs (for example AV updates, HTTPS categorization, etc.) is dropped on the cleanup rule, as is clearly seen in fw ctl zdebug drop.
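To make the addressing in the cluster example concrete, here is a minimal Python sketch. It is a toy model, not Check Point's rule engine: the function name `first_match` and the rule layout are hypothetical, and it assumes the implied rule still lists only the former single-gateway IP.

```python
from ipaddress import ip_address

# Toy model of the scenario above; not Check Point's actual rule engine.
# Assumption: the implied rule was generated before the conversion and
# still lists only the former single-gateway address.
STALE_IMPLIED_SOURCES = {ip_address("1.2.3.4")}


def first_match(src: str) -> str:
    """Return the name of the rule a locally sourced packet would hit."""
    if ip_address(src) in STALE_IMPLIED_SOURCES:
        return "implied rule (accept)"
    return "cleanup rule (drop)"


for label, src in [
    ("NAT-ed traffic behind the VIP", "1.2.3.4"),
    ("member 1 local traffic", "1.2.3.5"),
    ("member 2 local traffic", "1.2.3.6"),
]:
    print(f"{label:30} {src} -> {first_match(src)}")
```

In this model only the old address is accepted, which matches the symptom described: NAT-ed traffic passes while traffic sourced from the member IPs falls through to the cleanup rule.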

We feel that for the past few years we have constantly been doing QA for CheckPoint, as we keep running into basic issues with every major upgrade.

We would surely appreciate a better tested GA version when this comes out.

Is there any attention we can get on this? Can we tag @PhoneBoy @Dorit_Dor and have this looked into with priority?

/rant out

Edit: rephrased some unnecessary negative attitude in the message.

Dorit_Dor · ‎2020-08-14

Thank you for reporting these issues, but your assumption is incorrect. Policy installation does not operate by diff. You are probably facing a different, specific problem that causes the issues you described. Since we are not aware of any degradation in R80.40 management with the recommended Jumbos, and certainly nothing that would cause what you describe, please provide more data so that we can trace the issue and share the root cause. Can you share more details? Is there a service request open that you can share in private?

More generally, we are committed to quality and we have many thousands of management servers running R80.40 (plus more multi domains than previous releases) without degradations and issues. We appreciate the open dialog with Checkmates and we leverage this dialog to improve quality and adjust our roadmap.

Thank you all for your ongoing feedback

Dorit

Bogdan_Tatomir1 · ‎2020-08-14

Hello,

Thank you for casting your attention to this issue.

The problems seem related to gateways that were in-place upgraded from R80.10 to R80.40, and are reproducible consistently.

We never had these issues when we were on R80.10 (we had plenty others).

Thank you for annulling our assumptions on policy install delta/diff/incremental, that's at least one step forward. Nevertheless, most of the time it seems that during policy install files being pushed to the gateways are missing or incomplete, which triggered our initial thought.

I do have 3 SR numbers I can share. Should this be directly to you or someone else?

Much appreciated,

Bogdan

John_Fleming · ‎2020-08-14

I'm sure you will, but please keep everyone posted on the issue. Would like to know what caused it and what fixed it.

PhoneBoy · ‎2020-08-14

Please PM me the SRs, will make sure the right people are engaged.

Bogdan_Tatomir1 · ‎2020-08-14

Hello,

I just messaged you with the details. Looking forward to some support.

Best regards,

Bogdan

Dorit_Dor · ‎2020-08-14

Interim update

(1) Case #1 - can't push policy. The case was closed with a root cause (which, based on our support records, was understood by them; as a side note, if you want a root cause and one was not communicated, please do ask our support). Anyway, the root cause in this case: putting domain objects in the HTTPS inspection rule base was not supported pre-R80.40, but this wasn't enforced before the upgrade.
An issue was opened with R&D to see if we can improve the error message, etc.

(2) Case #2 and case #3 are both still being handled by support and R&D (case #3 may even be a GW behavior that just so happens to be triggered by policy install; not sure yet).

We are committed to quality, and while we will have issues from time to time, we see excellent indications of quality in R80.40. Our recommended releases today are R80.40 and R80.30 (R80.30 is still the most used GW version, so it may suit conservative customers).

More updates will follow

Dorit

Bogdan_Tatomir1 · ‎2020-08-15

Thank you very much for the update.

As a side note for case #1 - as we have communicated to support as well - we do *NOT* have domain objects in the HTTPS inspection policy, but only "Custom application" with regex domain (e.g. ".*abc.com.*") . Policy install fails even if Domain Objects are used in regular FW Access Policy on gateways with no HTTPS inspection turned on.
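As a side illustration of what a pattern like the one quoted above matches, here is a small Python check. It uses Python's `re` module as a stand-in; how the gateway actually evaluates a "Custom application" regex (engine, anchoring) is an assumption here, and the hostnames are made up.

```python
import re

# The pattern quoted above. Note the unescaped "." matches any character.
pattern = re.compile(r".*abc.com.*")

# Hypothetical hostnames, for illustration only.
for host in ["www.abc.com", "abcxcom.example", "abc.com.evil.example", "abd.com"]:
    print(host, bool(pattern.fullmatch(host)))
```

Under these assumptions the pattern matches the first three names and rejects only the last one, so it is a regex over the whole string rather than a domain object.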

Best regards,

Bogdan

Ilya_Yusupov · ‎2020-08-17

Hi @Bogdan_Tatomir1 ,

First of all, I want to share that domain objects are indeed not allowed in the HTTPSi RB prior to R80.40.

Second, we found an issue in R80.10: when the HTTPSi RB uses a Network Group that contains a domain object, the policy push succeeds when it should fail.

After upgrading to R80.40, the policy push fails.

I would like your confirmation that this is indeed the case in your system. You can validate it by looking at the error you got: it should refer to the number of the rule that contains the domain object. Can you please check whether this rule contains a network object with a domain object inside?

If this is indeed the case, then in R80.40 such a configuration is not allowed on GWs prior to R80.40, and the bug is in R80.10, not in R80.40.

Thanks,

Ilya

Bogdan_Tatomir1 · ‎2020-08-17

Hello Ilya,

Thank you for looking into this. So we are talking about case #1. I can confirm we do *NOT* use any domain object in the network groups used for the SSLi RB. As I mentioned before, a policy push from the R80.40 manager to R80.10 gateways fails even for gateways that have HTTPS inspection turned off completely, if the FW RB uses domain objects.

Best regards,

Bogdan

Ilya_Yusupov · ‎2020-08-17

@Bogdan_Tatomir1 - I will continue with you offline for further discussion and will update the thread once we have conclusions.

Ilya_Yusupov · ‎2020-08-17

Hello @Bogdan_Tatomir1 ,

Following our conversation via email, you confirmed that the case is indeed the same as I explained above.

The issue is in R80.10 only, where the policy push succeeds when it should fail. We are working with R&D to find the best solution.

As for why the policy still fails while SSLi is disabled, I explained to you that this is by design, because your GW is in the "installed-on" column of the problematic rule.

In that case, regardless of whether SSLi is disabled, we try to push the policy because of the installed-on column and fail in the verification stage. To succeed, you need to disable SSLi and remove the GW from the installed-on column.
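The verification behavior described above can be sketched roughly as follows. This is a simplified model built only from the description in this post, not the actual verifier: the classes `Gateway` and `Rule`, the function `verify`, and the version tuples are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    name: str
    version: tuple          # e.g. (80, 10) for R80.10
    https_inspection: bool  # whether the SSLi blade is enabled


@dataclass
class Rule:
    number: int
    has_domain_object: bool
    installed_on: set = field(default_factory=set)  # gateway names


def verify(gw: Gateway, rules: list) -> list:
    """Return the rule numbers that would fail verification for gw."""
    failing = []
    for rule in rules:
        # gw.https_inspection is deliberately not consulted: per the
        # explanation above, only the installed-on column matters here.
        if (gw.name in rule.installed_on
                and rule.has_domain_object
                and gw.version < (80, 40)):
            failing.append(rule.number)
    return failing


gw = Gateway("gw-a", (80, 10), https_inspection=False)
rules = [Rule(7, has_domain_object=True, installed_on={"gw-a"})]
print(verify(gw, rules))            # rule 7 fails although SSLi is off
rules[0].installed_on.discard("gw-a")
print(verify(gw, rules))            # no failing rules once the GW is removed
```

The point of the sketch is only that the blade state never enters the check, which is why disabling SSLi alone does not make the push succeed.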

If any further assistance is required, please contact me directly.

Thanks,

Ilya

Dorit_Dor · ‎2020-09-05

As promised (when the thread started) I return to this thread to share the outcomes and action items

1. As explained, R80.40 does not have policy install diff. In R81 we dramatically improved policy install times by accelerating installation for daily/small policy changes (we moved away from the plan to do a diff and instead kept the full policy install, accelerating the whole experience and reaching around 10 seconds when the change is small, without the need to install a diff). You are welcome to join the EA.

2. The issues reported in this thread were mapped to problems in R80.10 (missing errors on things that didn't work back then), some usability/simplicity issues in error messages, and one case of a cluster upgrade issue where the cluster member name was not handled well (we are still trying to find the exact scenario where this happens, but if it does happen, the solution is simple). We are investing a lot in diagnostic tools to improve the experience and self-handling (in management it's centered around CPM Doctor, and on the GW we are adding more capabilities to CPView).

3. R80.40 is our recommended version. Some of our large-scale users already use it (and it resolves past issues for them), and a large portion of our multi-domain users run it (leveraging improvements as well as large-scale features like multi-domain migration that were added). While R80.40 is ramping up fast, R80.30 is still our most used version and is a very good version too. R80.10 lacks many of the goodies and quality improvements added in later versions, and we do recommend upgrading to a later version. With the improvements in recent versions, we also see a significant decrease in the number of service requests and bugs per "Check Point" (management or GW) in R80.40 and R80.30.

Last but not least, as always we appreciate the open direct dialog with CheckMates. We do try to provide transparency and visibility to action items and changes done as result of this dialog

Thank you CheckMates, Dorit

Bogdan_Tatomir1 · ‎2020-09-17

Hello everyone,

Again, thank you for the assistance and attention brought to these cases. I am nevertheless a little sad that we had to make so much noise until someone finally was able to take a look at our issues.

Still, one month after starting this thread, our perspective seems a little different from the one you described, Dorit.

1. R80.40 manager simply cannot manage / push policy on R80.10 gateways. That is a hard fact. No matter how much we progressed on solving errors and unchecking feature blades, policy simply does not install. We ended up giving up and closing out that case, because engineers kept insisting it was due to dynamic objects used in SSL inspection and TP policies, where policy wouldn't install even with FW blade-only turned on. We just took the hard way of upgrading our 60+ gateways to R80.40. I have to say, upgrading Cloudguard IAAS on Azure and AWS from R80.10 to R80.40 is a nightmare, there is little to no documentation available from CheckPoint and we had to guess almost every step. Once we have everything running R80.40 the policy install issues are gone, can surely confirm that.

2. The gateway that was dropping its own traffic after the upgrade to R80.40 was fixed once we found that the file "myown.C" contained a wrong value, which resulted in incorrect implied rules being generated. Nevertheless, engineers are still looking into what caused that failure.

3. The case where IPS exceptions do not work on R80.40 is still not solved. We opened the case on July 23rd 2020; it took until around mid-August for engineers to acknowledge there is an issue, and until around 2 weeks ago for them to replicate it. We are closing in on the 2-month mark with this critical ticket open, during which we have had to take our IPS offline for legitimate systems to work, basically renouncing an important protection layer. This is beyond words and nowhere near aligned with CheckPoint's top security and zero risk approach.

4. Recently we ran into another issue on R80.40, where the firewall responds with SYN-ACK packets from its own MAC for (thousands of) other IPs, completely messing up network traffic, three-way handshakes and asset detection systems. We have just opened a case with support, but we don't have high hopes of this being fixed any time soon.

5. We have started migrating our edge security (internet facing) to another firewall vendor, and we are extremely delighted, where everything actually works, and we are getting the premium support we are paying for. We of course ran into some issues with those as well, as we have all feature blades enabled, like we do on the CheckPoints, but the average resolution time was below 7 days. Hopefully by the end of the year we will be able to completely move away from CheckPoint after 5 hellish years.

I'm sorry for being so negative, but the quality of the products is only going down, and support's approach and quality are not helping. I do understand CheckPoint trying to tackle all security aspects, but a resolution such as "you have too many blades enabled" or "blade X doesn't work when blade Y is enabled" is not a proper answer. We bought a product to secure our data, and if we paid for the whole lot, of course we want to enable and use it all, especially when sizing was done with all blades turned on.

Best regards,

Bogdan

cezar_varlan1 · ‎2020-11-12

+Discovered an additional issue where CPD crashes - according to sk170256 it is fixed. I added the HF and it still doesn't work.

I could also see R81 is on the maps - sk166715

I would like to take this opportunity to wish everyone a lucky upgrade! If R80.40 doesn't work let's by all means go R81. Maybe that works out better. Engineering and IT isn't what it used to be, things now are more "metaphysical" and based on luck. "If we know it doesn't work - we fix it". "If we don't know if it works - we test it".

How come we are now at "If we don't know if it works or not, let's just upgrade it anyway"?

Dorit_Dor · ‎2020-11-13

Thank you for the feedback.

We fix issues in Jumbos and do not ask you to upgrade for fixes.
Some problems are fundamentally handled in a new release (we use new releases to introduce bigger changes that are not solvable in incremental fixes). If you face an issue like that, you should be able to get a clear explanation of the root cause and why it is handled differently by the new release. This should be limited to rare cases. If you are asked to upgrade without sufficient reason, you are welcome to approach me in private, both for answers and for process improvement.

We release Jumbos on a regular basis so that problems are solved without an upgrade. The specific problem you pointed out is supposed to be solved from Jumbo 87 of R80.40. If you tried it and it didn't work, it may mean that your issue was misidentified and wrongly associated with this SK. Note that a crash can happen for different reasons, so there may be a cause we missed that isn't included in the fix. If you have a TAC case, please send it to me in private; if not, please open one and send it in private.

R81 is targeted to improve functionality and experience - bugs are fixed in Jumbos.

Again, we appreciate the open dialog and take feedback seriously, so help us get the details and we will publish the outcome for transparency and continuous dialog.

Dorit

cezar_varlan1 · ‎2020-11-13

Appreciate the reply. We have found the root cause, and this time CPD was indeed broken, but by a custom script that ran in a loop. I attempted to remove this post earlier, but it looks like my connection to community.checkpoint.com had timed out, or there was already a reply that prevented me from removing it.

What happened is that without the Jumbo + custom HF we would not have seen the real error crashing CPD. This was resolved after a support call in which we debugged CPD again after applying the JHF + custom patch; this time it showed additional error messages pointing to exactly what it was running and where it crashed. A custom blocking script added via cpd_sched_config had a frequency of 49 days but was running in a loop, loading CPD to the point where the watchdog killed it for not responding. Either CPD tries to run scheduled scripts in a loop if "$?" is 1, or some other mechanism makes it behave like this. So my custom script had an issue, and CPD kept obsessively trying to run it and kept getting a "$?" of 1, which means ERROR.
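The suspected failure mode (a scheduler that immediately re-runs a task for as long as it exits non-zero) can be illustrated with a small Python sketch. This models only the hypothesis stated above; it is not how CPD is known to work, and `run_scheduled`, `broken_script` and the `max_attempts` cap are hypothetical names added so the demonstration terminates.

```python
def run_scheduled(task, max_attempts: int) -> int:
    """Re-run task immediately while it returns a non-zero status.

    Returns the number of attempts made. max_attempts exists only so this
    demonstration terminates; without a cap or a back-off the loop would
    spin for as long as the task keeps failing.
    """
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        if task() == 0:
            break
    return attempts


def broken_script() -> int:
    return 1  # always fails, like a script whose "$?" is always 1


def healthy_script() -> int:
    return 0


print(run_scheduled(healthy_script, max_attempts=1000))
print(run_scheduled(broken_script, max_attempts=1000))
```

A healthy script runs once per schedule, while an always-failing one consumes every allowed attempt back to back, which would be consistent with the load and watchdog kills we observed.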

Therefore the last post I made was an error, and I am sorry for that.

Bogdan_Tatomir1 · ‎2021-01-05

Hello,

This is my final update on this thread. I have screenshotted this response, as it will probably get deleted/not approved by the censorship machine that CheckPoint runs. It is pretty clear at this point that this Security Vendor is nothing like what it used to be back in the 1990s, with great security products and great support. Today CheckPoint is just a marketing, propaganda and censorship giant, and no other comparisons are needed.

Following up on the issues raised in this thread, we still have production-breaking issues and severe and critical issues, with service requests open with CheckPoint for as long as 110 days, with no progress for 90+ days and no response for 7+ days. Moreover, on tickets where we requested assistance from engineers, they had the audacity to mark the tickets as "Pending Customer" and simply ignore our requests, refusing to assign engineers to critical issues.

We have been approached by high levels of CheckPoint's Support organization to be "silenced", and asked to stop commenting publicly in forums such as this in return for better support levels. This was a mere illusion, as we are still struggling with production issues, with production flows being blocked, IPS getting bypassed by attackers, and support recommending disabling blades to improve functionality. This is precisely what you do *NOT* want from a security vendor.

At this point we have had enough of the bad treatment and complete indifference from Checkpoint as an organisation. We respect their R&D, products and effort towards making the world a better and safer place, but they lack the vision, understanding, and competency to understand how real life differs from a laboratory.

We have taken the decision to move our final CheckPoint gateway clusters to the vendor we have already been slowly migrating to and strongly recommend the same for anyone that is looking for a proper security posture, and not just checking checkboxes in a compliance list with features that simply do not work.

We thank CheckPoint for all their nonexistent effort to keep us as a customer, and wish them good luck in their endeavours. Out of the 25+ security vendors in our portfolio in 2020, this was by far our worst customer experience.

I strongly believe that facts and evidence always prevail, at least in the Information Security sector, and can only hope that such a wake-up call will motivate Checkpoint to close the gap to their competition in the market right now, and that 2021 will be a push towards research rather than marketing campaigns.

The strong words in this statement reflect my personal opinion of an 8+ years CheckPoint expert and CCSE certified engineer and are not directly related to my employer which simply targets its own wellbeing and a strong Information Security stance.

Best regards,
Bogdan

Bogdan_Tatomir1 · ‎2021-01-05

@Dorit_Dor @PhoneBoy @Sharon Elmashaly for visibility and awareness. I can provide e-mails, screenshots and meeting recordings to support any and all of my statements above.

PhoneBoy · ‎2021-01-05

I'll reach out to you privately to get the relevant details on this.

_Val_ · ‎2021-01-05

@Bogdan_Tatomir1 we will reach out to you offline today.

_Val_ · ‎2021-01-06

@Bogdan_Tatomir1

Another point. To make it simple, I am quoting my colleague admin @PhoneBoy from internal email thread:

"One of our core values is transparency. That means, as a rule, we don’t pull down negative posts toward Check Point on the community. We do acknowledge these posts, make sure to address the issues raised with relevant facts, and work to resolve the underlying issues. We did contact your account team, who provided the relevant TAC SRs. We’re looking into these now to understand what happened, what the next steps are, and where we can improve."

That said, we will get back to you, @Bogdan_Tatomir1, and to the community on this matter.

SharonElmashaly · ‎2021-01-06

Hello Bogdan,

I am truly sorry you feel this way, and I would like to take this opportunity to address some of the comments made here:

We never censor people and opinions, and we would never delete what you wrote even though it’s harsh and we may see things differently.

Our customers are the most valuable asset for us. We put tremendous efforts to ensure our customers get the best value and security, and in many cases, we have been doing so regardless of the associated cost and labor that we need to invest.

As you can imagine, each case presents a different challenge, and while different on many things, the bottom line is always the same: our highest level of commitment to your business continuity and security.

Reviewing the cases presented here, for example, I can share that there were several bugs in R80.10, which were surfaced upon upgrade. We invested a lot in quality to improve significantly since R80.10. Some of the other product insights included product behaviors that can be better documented, including alternatives in cases the behavior is undesired. An example would be the fact that Active Streaming responds to SYN packets before the handshake is met. This is an example of how this mutual dialogue we have with customers makes our services better.

As you mentioned, we’ve been doing this for almost 3 decades. Our Customer Support capabilities have only increased over the years, with a much higher level of customer engagement and technical coverage, and so have the quality and bandwidth of our products. Thankfully, we have tens of thousands of customers who help us do this and get even better through an open and candid dialogue, which keeps pace with the rapid changes in the cyber landscape.

However, I am not disregarding your experience and I accept it as a given challenge that we need to embrace in order to improve further.

We have been working with your company for years, and I have been personally sponsoring the special attention since you shared your concerns with me. It wasn’t offered in exchange for eliminating criticism; it is our professional commitment.

I want to keep this dialogue going, here and elsewhere, and I guarantee that any respectful feedback will be met by our utmost dedication. These are not empty words, we do strive to do better.

If you have further feedback that can help iron this out, I’ll be happy to meet and discuss it further.

I will respect your decision, whatever it may be.

Regards,

Sharon Elmashaly

VP, Customer Support
