WiliRGasparetto
MVP Diamond

HTTPS Inspection Troubleshooting: Evidence-Driven Runbook (Gateway CA, QUIC, pinning, proxy, and wstlsd debug)

Thesis (how TAC closes RCA fast)

When HTTPS Inspection “breaks,” it is rarely a single root cause. In the field, failures almost always map to one of these buckets:

  1. Endpoint trust chain (Gateway CA / internal CA not trusted)

  2. Application incompatibility (certificate pinning, mTLS, strict TLS requirements)

  3. Transport outside the expected path (QUIC/HTTP3 over UDP/443)

  4. Network interference (explicit/auth proxy, PAC, upstream SSL inspection)

  5. Capacity/crypto overhead under load (CPU, handshake pressure, aggressive policy scope)

  6. Unexpected bypass (rules/updatable objects/limitations) → “looks like it’s not inspecting”

TAC rule: don’t change configuration until you have minimum evidence and you’ve isolated variables (one change at a time).

 

0) TAC warnings (operational impact & governance)

Before any change (especially bypass rules and debug):

  • Take a configuration backup and document the baseline.

  • Run in a controlled maintenance window, ideally with console access.

  • Document every change to support rollback.

  • If this is a cluster, plan to collect evidence from all members, because the handshake may occur on any node.

 

1) 10-minute fast triage (no debug)

1.1 Confirm enforcement on the client (fastest proof)

Open an HTTPS site and inspect the certificate presented by the browser:

  • Issuer = Gateway CA / internal CA → outbound inspection is active

  • Issuer = public CA → bypass/no inspection/wrong scope

  • Certificate error → trust chain issue on the endpoint

TAC best practice: test with two browsers (Chrome/Edge and Firefox). Firefox ships its own certificate store and, unless it is configured to trust the OS store, may reject a Gateway CA that Chrome/Edge accept.
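
The same check can be scripted from a test client, which gives text you can paste into a case instead of a screenshot. This is a minimal sketch, assuming `openssl` is installed on the client. The function names are illustrative, and `GW_CA_HINT` is a placeholder for a string that appears in your own Gateway CA's issuer name. It only covers the first two outcomes; a trust error on the endpoint still has to be confirmed in the browser.

```shell
#!/bin/sh
# Hypothetical helper names. GW_CA_HINT is a placeholder: set it to a string
# that appears in YOUR Gateway CA's issuer name.
GW_CA_HINT="${GW_CA_HINT:-Gateway CA}"

# Print the issuer line of the certificate the client actually receives.
# Needs network access and openssl on the test client.
get_issuer() {
    host="$1"
    openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null |
        openssl x509 -noout -issuer
}

# Classify an issuer string: Gateway CA vs. any other issuer vs. nothing returned.
classify_issuer() {
    issuer="$1"
    if [ -z "$issuer" ]; then
        echo "no-certificate"      # handshake failed or no certificate was returned
    elif printf '%s\n' "$issuer" | grep -qiF -- "$GW_CA_HINT"; then
        echo "inspected"           # certificate was re-signed by the Gateway CA
    else
        echo "not-inspected"       # original issuer seen: bypass, no inspection, or wrong scope
    fi
}
```

Example use from a client behind the gateway: `classify_issuer "$(get_issuer www.example.com)"`.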

1.2 Confirm Gateway CA trust on endpoints (root cause #1)

The Gateway CA must be trusted on endpoints:

  • Windows: Trusted Root Certification Authorities

  • macOS: Keychain (System trust)

Typical signals: NET::ERR_CERT_AUTHORITY_INVALID, chain warnings, “connection not private.”

Recommended evidence: a screenshot of the certificate (Issuer/Subject/Validity) and the browser error.

 

2) Outbound vs Inbound — stop here if these are mixed up

2.1 Outbound inspection

  • Requires the Gateway CA to be trusted on endpoints.

  • The gateway dynamically re-signs certificates.

2.2 Inbound inspection (published internal services)

  • Requires the server certificate (with its private key, where your deployment model calls for it) to be imported and assigned in the HTTPS Inspection certificate settings in SmartConsole (HTTPS Inspection → Certificates, as applicable).

  • Common symptom: an internal published service fails only when inbound inspection is enabled.

 

3) QUIC ≠ Pinning (separate the causes)

3.1 QUIC/HTTP3 (transport)

  • QUIC/HTTP3 uses UDP/443.

  • It does not follow the TCP/443 path, so it can behave differently under HTTPS Inspection and produce intermittent or Chromium-only symptoms that complicate troubleshooting.

TAC test (variable isolation):

  • Temporarily block UDP/443 to force TCP/443, then compare behavior:

    • If the issue disappears, you’ve isolated QUIC as a major variable.

    • If it persists, move on.
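
Before blocking UDP/443, it can help to confirm that the destination offers HTTP/3 at all. Servers normally advertise it with an `Alt-Svc` response header on the TCP connection. The sketch below assumes `curl` is available on the test client; both function names are illustrative.

```shell
#!/bin/sh
# Read HTTP response headers on stdin; succeed if an Alt-Svc header offers h3.
advertises_h3() {
    grep -i '^alt-svc:' | grep -qi 'h3'
}

# Fetch response headers over TCP/443 and report the result.
# Needs network access and curl on the test client.
check_quic_candidate() {
    url="$1"
    if curl -sI --max-time 10 "$url" | advertises_h3; then
        echo "h3-advertised"       # browsers may switch this site to QUIC (UDP/443)
    else
        echo "no-h3"               # QUIC is an unlikely variable for this site
    fi
}
```

Example: `check_quic_candidate https://www.example.com/`. A `no-h3` result suggests spending the test window on the other buckets first.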

3.2 Certificate pinning (application security mechanism)

  • The application expects a specific certificate/CA and rejects the Gateway CA.

  • Symptom: consistent failure for specific domains/apps, not for the whole Internet.

TAC treatment:

  • Use a domain-scoped bypass (minimum scope + governance: owner/justification/review date).

  • Avoid global bypass.

 

4) Explicit/auth proxy, PAC, and upstream SSL inspection (common blind spot)

If you have explicit proxy, authenticated proxy, PAC, or upstream SSL inspection, you may see:

  • certificate rewriting (double inspection → symptoms similar to trust failure)

  • authentication loops

  • timeouts/resets under peak load

  • inconsistent behavior by subnet/group (PAC-driven routing)

TAC tip: compare the same test from:

  • corporate network (with proxy/PAC) vs hotspot/4G (no proxy)

 

5) Logs: where to look and how to extract evidence

5.1 Primary log locations

  • $FWDIR/log/wstlsd.elg* (TLS handshake / inspection path)

  • /var/log/messages (daemon/system errors)

  • SmartLog on Management (where applicable)

5.2 Follow in real time (during reproduction)

tail -f $FWDIR/log/wstlsd.elg*

5.3 What to look for in wstlsd.elg* (practical patterns)

Always correlate with the exact test timestamp.

Look for patterns such as:

  • TLS handshake failures / negotiation mismatch (version/cipher/protocol)

  • Certificate validation failures (untrusted CA, incomplete chain, time/OCSP/CRL impacts)

  • Timeouts / resets tied to specific destinations (often pinning/incompatibility signatures)

  • Unexpected bypass indicators (traffic not intercepted as expected)

TAC method: timestamp → domain → handshake stage → error → confirm via controlled retest.
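
The method above can be started with a simple filter: keep only the lines that mention the tested destination and also contain a failure keyword. This is a sketch only. The keyword list in `FAIL_PATTERN` is an assumption, not the real wstlsd message catalogue, so adjust it to what your wstlsd.elg* files actually contain. The function name is illustrative.

```shell
#!/bin/sh
# Assumed generic failure keywords; tune them to your own log content.
FAIL_PATTERN="${FAIL_PATTERN:-fail|error|timeout|reset|alert}"

# Usage: find_tls_failures DOMAIN FILE...
# Prints lines that mention DOMAIN and match one of the failure keywords.
find_tls_failures() {
    domain="$1"; shift
    # -h drops file-name prefixes; -F treats the domain as a literal string
    grep -h -F -- "$domain" "$@" 2>/dev/null | grep -i -E -- "$FAIL_PATTERN"
}
```

Example: `find_tls_failures example.com $FWDIR/log/wstlsd.elg*`, then narrow the output to the recorded test timestamp by eye or with another `grep`.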

 

6) Advanced debug (wstlsd) — with correct START/STOP

TAC warning: debug can generate high log volume and affect performance. Use a maintenance window.

6.1 START (enable debug for all wstlsd PIDs)

for PROC in $(pidof wstlsd); do fw debug $PROC on TDERROR_ALL_ALL=5; done

Reproduce the issue (record URL + timestamp).

6.2 STOP (complete, corrected)

for PROC in $(pidof wstlsd); do fw debug $PROC off TDERROR_ALL_ALL=0; done
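
A debug session that is started but never stopped is a common way to turn troubleshooting into an incident. The sketch below wraps the two loops above in one function so that STOP can be tied to a shell `trap`. The function name and the `RUN` dry-run switch are illustrative additions, not Check Point tooling; the `fw debug` arguments are exactly the ones shown above.

```shell
#!/bin/sh
# RUN is an illustrative dry-run switch: RUN=echo prints the commands
# instead of executing them. Leave it empty for a real run.
RUN="${RUN:-}"

# Usage: wstlsd_debug on|off PID...
wstlsd_debug() {
    mode="$1"; shift
    case "$mode" in
        on)  level=5 ;;
        off) level=0 ;;
        *)   echo "usage: wstlsd_debug on|off PID..." >&2; return 2 ;;
    esac
    for pid in "$@"; do
        $RUN fw debug "$pid" "$mode" "TDERROR_ALL_ALL=$level"
    done
}

# Typical session (commented out so that sourcing this file changes nothing):
#   PIDS=$(pidof wstlsd)
#   trap 'wstlsd_debug off $PIDS' EXIT INT TERM   # STOP runs even on Ctrl-C
#   wstlsd_debug on $PIDS
#   ... reproduce the issue, record URL + timestamp ...
```

The PIDs are captured once at the start. If wstlsd restarts during the session, rerun `pidof wstlsd` before relying on the STOP step.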

6.3 Minimal post-debug collection

tail -n 2000 $FWDIR/log/wstlsd.elg* > /var/log/wstlsd_last2k.txt
tail -n 2000 /var/log/messages > /var/log/messages_last2k.txt
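
For a cluster, or when several log files are involved, it may be easier to pack everything into one timestamped archive per member. The following is a sketch under stated assumptions: the function name and the `hi_evidence_` prefix are made up for illustration, and it keeps one tail per file rather than the concatenated output of the commands above.

```shell
#!/bin/sh
# Usage: collect_evidence OUTDIR FILE...
# Copies the last 2000 lines of each readable file into a timestamped
# directory under OUTDIR, packs it, and prints the archive path.
collect_evidence() {
    outdir="$1"; shift
    stamp=$(date +%Y%m%d_%H%M%S)
    dest="$outdir/hi_evidence_$stamp"
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        [ -r "$f" ] || continue    # skip missing or unreadable files
        tail -n 2000 "$f" > "$dest/$(basename "$f").last2k.txt"
    done
    tar -czf "$dest.tgz" -C "$outdir" "hi_evidence_$stamp" && echo "$dest.tgz"
}
```

Example: `collect_evidence /var/log $FWDIR/log/wstlsd.elg* /var/log/messages`, run on each cluster member right after the STOP step.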

 

7) Common symptoms → hypothesis → action (TAC-style)

7.1 “Browser certificate warning”

Hypothesis: Gateway CA/internal CA not trusted on endpoints.
Action: validate trust store (Windows Trusted Root / macOS Keychain) and CA deployment (GPO/MDM).

7.2 “Only specific site/app breaks”

Hypothesis: pinning/mTLS/strict TLS requirements.
Action: domain-scoped bypass with governance.

7.3 “Intermittent or Chromium-only failures”

Hypothesis: QUIC/HTTP3 variable.
Action: test UDP/443 block; document the decision and baseline.

7.4 “Slow after enabling inspection”

Hypothesis: CPU/crypto/handshake overhead under load.
Action: validate via cpview/system metrics; rollout by rings; tune scope and exceptions.

 

8) Operational best practices (what prevents incidents)

  • Gradual rollout (pilot → waves) with KPIs (tickets, failures, performance).

  • Exception governance: owner + justification + review date + record in change control (ticketing/spreadsheet).

  • Make one change at a time, and document each change for rollback.

  • Periodic bypass audit to find undocumented exceptions.

  • In proxy environments: document the full chain and avoid double inspection where possible.

 

9) Evidence template (for CheckMates thread / TAC case)

  • Gateway version + Jumbo take

  • Browser(s) + version (Chrome/Edge/Firefox)

  • URL(s) + exact timestamp

  • Symptom (cert error / timeout / app break / slow / not inspecting)

  • Gateway CA installed? (yes/no)

  • CA distribution method: GPO / MDM / manual

  • Proxy/PAC/auth proxy present? (yes/no + details)

  • QUIC tested? UDP/443 blocked? (yes/no + result)

  • Logs: wstlsd.elg* snippet for the test window + /var/log/messages

 

13 Replies
the_rock
MVP Diamond

Very nice!

Best,
Andy
"Have a great day and if its not, change it"
WiliRGasparetto
MVP Diamond

Thanks, Andy.
Best

the_rock
MVP Diamond

Really love all these write-ups, amazing.

Best,
Andy
"Have a great day and if its not, change it"
WiliRGasparetto
MVP Diamond

Thank you very much Andy, I've always had an excellent experience within our MVP community.

PedroRFernandes
Participant

Excellent, congratulations on the article!

WiliRGasparetto
MVP Diamond

Thanks, Pedro.

Pedro139128
Participant

Great Effort!

WiliRGasparetto
MVP Diamond

Thank you bro 

PhoneBoy
Admin

In releases prior to R82, I suggest blocking QUIC.
In R82 where QUIC is supported for both HTTPS Inspection and HTTPS Categorization, you can safely allow it.

WiliRGasparetto
MVP Diamond

Excellent placement, @PhoneBoy 

PhoneBoy
Admin

Apparently, Chrome (and Chromium-based browsers) do not allow adding a third-party trusted CA for QUIC.
That effectively kneecaps our ability to perform full inspection on this traffic (above and beyond categorization).
This means blocking QUIC entirely is still probably the best bet.

WiliRGasparetto
MVP Diamond

For me, blocking QUIC is the best practice to ensure full HTTPS inspection. This prevents the use of UDP port 443, forcing browsers to use HTTPS over TCP, where inspection works normally.

vikaspg53
Explorer

Really well articulated and very informative.

