SSL Inspection Engine Broken

Dan_Roddy · ‎2018-07-05

All outbound SSL traffic is broken today. This error is seen in Tracker: HTTPS Inspection error occurred. Action Reason: Blocking request as configured in engine settings of Application Control. This started after 4:00 AM today, i.e. right after our IPS update completed.

This error is seen in SmartEvents for any and all domains: OCSP response time obsolete. Response considered unreliable.
Certificate DN: 'CN=*.agkn.com' Requested Server Name: d.agkn.com

I had to disable SSL inspection on the gateway to get the enterprise web browsing restored. Is the database corrupted?

HELP!

Thank you in advance, can I give you all my points for help?

Best,

Dan Roddy

PhoneBoy · ‎2018-07-05

As part of HTTPS Inspection, the gateway validates the certificate on the remote site.

This is partially done with OCSP.

You can disable this by unchecking the "Revoked server certificate" option here:

I suppose it's possible an IPS or App Control update somehow is blocking the OCSP queries.

Did you open a TAC case by chance?

Dan_Roddy · ‎2018-07-06

When I arrived at my desk at 6:30 this morning, Kyle Danielson called me to discuss the case I had not opened. I have you to thank for the connection, Dameon - it turns out this is a known problem in at least one other environment. I will be re-enabling SSL inspection this afternoon to do further troubleshooting. Here is a related error I found in Tracker that suggests an engine setting. Can you tell me what and where the relevant engine settings are? I am looking in Blades/Application Control & URL Filtering/Advanced Settings. Is this maybe an SSLv3 issue? How will the gateway react if the connection is SSLv3? I agree that no browsers should have SSLv3 enabled. Too many questions!

Record Details

Application Control - HTTPS Inspection error occurred (2)
Product Family	Network
Action Reason	Blocking request as configured in engine settings of Application Control

PhoneBoy · ‎2018-07-06

I can't claim full credit for this one, though yes, I helped connect the dots.

If the issue were SSLv3 the error would be more clear.

Other than what I suggested above, I'm at a loss personally.

Timothy_Hall · ‎2018-07-06

Pretty sure OCSP verification happens in the various wstlsd daemons on the gateway, any interesting messages dumped into $FWDIR/log/wstlsd.elg around the time of the issue?

Any chance the IPS update started interfering with the DNS lookups performed by the firewall, which are necessary for OCSP checks? Were any DNS signatures updated? Did you try backing out the IPS update, which is now possible under R80+ management?

Any chance the clock on your systems jumped forward or backward an excessive amount during this period due to issues with NTP? The error message about the time being "obsolete" is kind of strange...

Dan_Roddy · ‎2018-07-12

Tim, Kyle Danielson from TAC was able to capture debug on this...firewall clocks are right on, no apparent time issue.

From our support case:

Debugged WSTLSD to confirm if the issue is the same as the other one that was reported.

We were able to replicate the issue with WSTLSD debug.

The problem here is that the GW is getting a 'next update' that's in the past.
-This could be some problem with parsing on the gateway or with the response sent by the CA.

I wanted to debug again and get captures to confirm what the server is sending back, however the issue wouldn't replicate again. We'd have to figure out some way to clear the CPTLS cache for HTTPS inspection to force it.
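As a rough illustration of the 'next update' in the past condition quoted from the support case, here is a minimal Python sketch of an OCSP freshness check. It follows the general guidance in RFC 6960 that a response whose nextUpdate is earlier than local time, or whose thisUpdate is later than local time, should be considered unreliable. It is not the gateway's actual logic: the function name, the `max_skew` tolerance, and the timestamps are assumptions made up for the example.

```python
from datetime import datetime, timedelta, timezone


def ocsp_response_is_fresh(this_update, next_update, now=None,
                           max_skew=timedelta(minutes=5)):
    """Return (ok, reason) for an OCSP response validity window.

    Hypothetical helper: max_skew is an assumed clock tolerance,
    not a documented gateway setting.
    """
    now = now or datetime.now(timezone.utc)
    # A response issued "in the future" points at a clock problem.
    if this_update - max_skew > now:
        return False, "thisUpdate is in the future"
    # A response whose nextUpdate has passed is stale.
    if next_update is not None and next_update + max_skew < now:
        return False, "nextUpdate is in the past"
    return True, "fresh"


# Illustrative timestamps only; they are not taken from the debug output.
now = datetime(2018, 7, 5, 4, 16, tzinfo=timezone.utc)
fresh = ocsp_response_is_fresh(now - timedelta(days=1), now + timedelta(days=3), now=now)
stale = ocsp_response_is_fresh(now - timedelta(days=8), now - timedelta(days=1), now=now)
print(fresh)
print(stale)
```

A check like this fails the same way whether the responder sent an old response, the gateway parsed the timestamp incorrectly, or the local clock is wrong, which is why the packet capture TAC wanted would be needed to tell those cases apart.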

There is a lot of clutter around this case that may be obscuring the root cause. One thing I found is that content delivery networks do something called OCSP stapling, where they fetch the OCSP answer and cache it. If that were the issue, it would be outside our organization: maybe a CDN was failing to update its OCSP answers and its cached answer was stale. (I give this theory a lot of credit.)
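To make the stale-cache theory concrete, here is a toy Python model of a server-side cache of OCSP responses (the behavior usually called OCSP stapling). It only illustrates how a cache that cannot refresh could keep handing out a response whose nextUpdate has already passed. The class, the responder stub, the four-day validity period, and the dates are all hypothetical and do not describe any particular CDN.

```python
from datetime import datetime, timedelta, timezone


class StaplingCache:
    """Toy cache of one OCSP response as (this_update, next_update)."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable(now) -> (this_update, next_update)
        self._cached = None

    def get(self, now):
        # Refresh when nothing is cached or the cached response has expired.
        if self._cached is None or self._cached[1] <= now:
            try:
                self._cached = self._fetch(now)
            except OSError:
                pass  # refresh failed: keep serving the old response
        return self._cached


def make_responder(validity=timedelta(days=4)):
    """Stub OCSP responder that can be switched off to simulate an outage."""
    state = {"up": True}

    def fetch(now):
        if not state["up"]:
            raise OSError("OCSP responder unreachable")
        return (now, now + validity)

    return state, fetch


t0 = datetime(2018, 7, 1, tzinfo=timezone.utc)   # illustrative date
state, fetch = make_responder()
cache = StaplingCache(fetch)
cache.get(t0)                # first request populates the cache
state["up"] = False          # responder becomes unreachable
later = t0 + timedelta(days=5)
this_update, next_update = cache.get(later)
print(next_update < later)   # True: the client sees nextUpdate in the past
```

A well-behaved server would stop stapling an expired response instead of serving it, so this sketch models the failure being hypothesized, not normal operation.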

NOW another thing jumps out. As I continued to query the database, I found these System Alerts, which started at 4:16 AM and continued until 5:00 PM on July 5th (the day all 443 traffic failed):

Internal error occurred, could not connect to "cws.checkpoint.com:80". Check proxy configuration on the gateway.

This is the AntiBot Blade with a High Threat System Alert.

What do you think about this as contributing to the fray?

Daniel_Moore · ‎2018-07-12

In the App & URL Filtering blade, there is an option to select the OCSP protocol as an explicit choice. OCSP is done over HTTP, so it is not necessarily something you want to allow for servers or in limited-access zones. It works well combined with strict FW rules.

Looking forward to the layers in R80.xx and applying this in more places.

Dan_Roddy · ‎2018-07-12

Daniel,

I'm not sure I understand. In our case the OCSP request was going out but the answer had a time that was in the past so the TLS sessions were considered NOT trusted and were blocked. And I mean every TLS browser session in our Enterprise was blocked and this included multiple network segments but no DMZ.

Daniel_Moore · ‎2018-07-12

The settings we have configured (on 77.30) are more extensive than those in Dameon's screenshot above, but we are not doing full SSL inspection either. If you can exclude the OCSP traffic from inspection on any blade, globally, you can artificially reduce the "latency" of the return response.

Dan_Roddy · ‎2018-07-12

I don't think there would be any latency because OCSP is always done on port 80, thus no inspection necessary.

Daniel_Moore · ‎2018-07-12

Depending on the blades you have enabled, HTTP traffic does get inspected (and always is at the kernel level), whether or not you have IPS-specific inspections enabled. By latency I mean the RTT for a server to respond. If there were a lot of errors during a specific period of time, it's very possible upstream providers were doing maintenance or had an outage. Do you have external polling/monitors set up for network health outside of your Check Point environment? Did Amazon have any service disruptions during that window of time? The interwobble ain't perfect.

I don't know the exact acceptable threshold for Check Point to say yea or nay on a certificate revocation check. I do know explicitly allowing a port and protocol is different than any* any*. I'm mentioning the OCSP Protocol option for those who may not know it exists, and for future reference.

Dan_Roddy · ‎2018-07-13

OCSP acting up again but only on a limited number of domains, not all domains - response time obsolete (again)

OCSP response time obsolete. Response considered unreliable.
Certificate DN: 'CN=*.algovid.com,OU=EssentialSSL Wildcard,OU=Domain Control Validated' Requested Server Name: a.algovid.com
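To see which certificates and server names are affected across many such events, the message text can be parsed and tallied. This is a minimal Python sketch that assumes the events have been exported as plain text in the same shape as the two messages quoted in this thread. The export step, the function name, and the naive comma split of the DN (which ignores escaped commas) are assumptions for the example.

```python
import re
from collections import Counter

# Matches the "Certificate DN: '...' Requested Server Name: ..." part
# of the messages as they are quoted in this thread.
PATTERN = re.compile(
    r"Certificate DN: '(?P<dn>[^']*)'\s+Requested Server Name: (?P<sni>\S+)"
)


def parse_ocsp_error(text):
    """Return {'cn': ..., 'server_name': ...} or None if no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # Naive DN split: good enough for simple DNs without escaped commas.
    parts = [p.strip() for p in match.group("dn").split(",")]
    cn = next((p[3:] for p in parts if p.upper().startswith("CN=")), None)
    return {"cn": cn, "server_name": match.group("sni")}


events = [
    "Certificate DN: 'CN=*.agkn.com' Requested Server Name: d.agkn.com",
    "Certificate DN: 'CN=*.algovid.com,OU=EssentialSSL Wildcard,"
    "OU=Domain Control Validated' Requested Server Name: a.algovid.com",
]

parsed = [parse_ocsp_error(e) for e in events]
counts = Counter(p["cn"] for p in parsed if p is not None)
for cn, n in counts.most_common():
    print(cn, n)
```

Grouping by certificate CN rather than by requested server name may make it easier to spot whether the failures cluster around a few certificates, which would fit a problem on the responder or CDN side better than one on the gateway.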

Does anyone have insight into CDNs and this term that is new to me: OCSP stapling (I had it down as OCSP stitching or splicing)? I am suspicious of this caching of OCSP responses, grrr.
