WiliRGasparetto
MVP Silver

Practical Advanced VPN Troubleshooting (Check Point)

Using vpn debug + kernel evidence (only when needed) — CheckMates-style

⚠️ Operational impact warning: VPN debug—and especially kernel debug—can generate high log volume and impact performance. Use a controlled window, ideally with console access, and always apply filters before collecting kernel evidence.

 

1) The right mindset: what you must prove (and at which layer)

Before turning on debug, define the hypothesis and the evidence you expect:

Layer 0 — IKE / NAT-T Connectivity

  • Does the peer respond on UDP/500 (IKE) and, if NAT exists, UDP/4500 (NAT-T)?

  • Any upstream block? Asymmetric routing? ISP ALG interference?

Layer 1 — IKE Negotiation (Phase 1 / IKE SA)

  • Compatible proposals (encryption / integrity / DH group)?

  • Authentication matches (PSK / certificate / ID)?

  • Certificate chain / CRL / time sync (clock skew) OK?

Layer 2 — Child SA / Phase 2 (Traffic Selectors)

  • Do Encryption Domain / Traffic Selectors match on both sides?

  • Correct NAT exemption?

  • Does traffic match policy and enter VPN?

Layer 3 — Data Plane

  • Is ESP (IP proto 50) flowing both directions? Any kernel drops?

  • Is SecureXL affecting visibility/behavior? Do you need to reproduce with a non-accelerated path?

2) Preparation (before debug)

2.1 Minimum checklist (prevents unnecessary debug)

  • Verify time/NTP (common root cause for certificate/IKE failures).

  • Confirm policy is installed and rules allow UDP/500, UDP/4500, ESP (proto 50) between peers.

  • Confirm route to the peer and return path (especially multi-ISP).

  • Confirm Site-to-Site NAT exemption and ensure there is no “surprise NAT” in the path.

2.2 Define the debug “target”

Use real IPs from the case. Example safe test IPs (documentation ranges):

  • Remote peer (public): 203.0.113.10

  • Local (public): 198.51.100.5

  • Local network: 10.10.10.0/24

  • Remote network: 10.20.20.0/24
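One way to keep the target consistent across all sessions is to define it once as shell variables and derive the capture filter from it. This is a minimal sketch assuming bash in Expert mode; the variable names and the capture_filter helper are illustrative and not Check Point conventions.

```shell
# Illustrative target definition (documentation-range IPs from the list above).
PEER_IP="203.0.113.10"      # remote peer (public)
LOCAL_IP="198.51.100.5"     # local gateway (public)
LOCAL_NET="10.10.10.0/24"   # local encryption domain
REMOTE_NET="10.20.20.0/24"  # remote encryption domain

# Hypothetical helper: build a tcpdump filter limited to a single peer,
# covering IKE (UDP/500), NAT-T (UDP/4500) and ESP (IP proto 50).
capture_filter() {
    printf 'host %s and (udp port 500 or udp port 4500 or proto 50)' "$1"
}

capture_filter "$PEER_IP"; echo
```

The result can be passed to the capture as tcpdump -ni <wan_iface> "$(capture_filter "$PEER_IP")", which keeps unrelated tunnels out of the evidence.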

3) Standard evidence pack (CheckMates-ready): 3 sessions + consistent logs

Session A — VPN debug (user space)

  1. Enter Expert.

  2. Normalize/truncate before starting:

vpn debug trunc ALL=5
  3. Enable VPN debug:

vpn debug on
  4. (Optional, very useful) Enable timestamps:

vpn debug timeon
  5. Follow logs in real time (separate tab is ideal):

tail -f $FWDIR/log/vpnd.elg
tail -f $FWDIR/log/ike.elg

vpn debug writes evidence to vpnd.elg and ike.elg under $FWDIR/log/.

When to use ikefail
If the issue is intermittent and you only need to log failures and error events in the IKE negotiation, use:

vpn debug ikefail

 

Session B — Network capture (tcpdump) for IKE/NAT-T/ESP

Goal: prove “Did the peer respond?”, “Did it migrate to NAT-T?”, “Is ESP flowing?”

tcpdump -ni <wan_iface> -vvv "(udp port 500 or udp port 4500 or proto 50)"

Practical interpretation

  • Only UDP/500 outbound and nothing returns → upstream block / routing / ACL.

  • UDP/500 then UDP/4500 → NAT-T in action (expected when NAT exists).

  • ESP outbound but not inbound → return path / upstream ACL / ISP / asymmetric routing / remote drop.

  • ESP both directions but application fails → likely Phase 2 / selectors / NAT exemption / policy.
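The interpretation rules above can be sketched as a small decision helper. This is an illustration only: interpret_capture is a hypothetical function, and the packet counts are numbers you tally yourself from the tcpdump output, not something the tool reports in this form.

```shell
# Hypothetical helper: map packet counts from the capture to the
# interpretations listed above.
#   $1 IKE (UDP/500) outbound   $2 IKE (UDP/500) inbound
#   $3 NAT-T (UDP/4500) packets $4 ESP outbound   $5 ESP inbound
interpret_capture() {
    ike_out=$1; ike_in=$2; natt=$3; esp_out=$4; esp_in=$5
    if [ "$ike_out" -gt 0 ] && [ "$ike_in" -eq 0 ] && [ "$natt" -eq 0 ]; then
        echo "no reply on UDP/500: check upstream block, routing, ACL"
    elif [ "$esp_out" -gt 0 ] && [ "$esp_in" -eq 0 ]; then
        echo "ESP one-way: check return path, upstream ACL, ISP, asymmetric routing, remote drop"
    elif [ "$esp_out" -gt 0 ] && [ "$esp_in" -gt 0 ]; then
        echo "ESP both ways: if the application still fails, check Phase 2 selectors, NAT exemption, policy"
    elif [ "$natt" -gt 0 ]; then
        echo "NAT-T in use: expected when NAT exists in the path"
    else
        echo "inconclusive: extend the capture window"
    fi
}

interpret_capture 5 0 0 0 0
```

Keep in mind that once NAT-T is negotiated, ESP is encapsulated in UDP/4500, so native proto 50 packets will not appear in the capture for that tunnel.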

 

Session C — (Optional) IKE monitor (snoop) when you must “see the conversation”

If you need deeper visibility:

vpn debug mon

This generates a snoop capture (for example ikemonitor.snoop) for analysis.
⚠️ Sensitive data warning: the capture can record sensitive information (e.g., XAUTH/password) depending on the scenario. Use it only when necessary and handle the file per policy.

To stop:

vpn debug moff

4) When (and how) to use Kernel Debug without hurting the environment

Golden rule: never run kernel debug “blind.” Filter by VPN peer or 5-tuple, then capture.

4.1 Filter by VPN peer (fastest for VPN incidents)

fw ctl set int simple_debug_filter_off 1
fw ctl set str simple_debug_filter_vpn_1 "203.0.113.10"

The first command resets any filters left over from a previous session. simple_debug_filter_vpn_<N> then limits kernel debug output to traffic for the given VPN peer.

4.2 Filter by 5-tuple (flow-specific issues)

Example (test HTTPS between two hosts):

fw ctl set int simple_debug_filter_off 1
fw ctl set str simple_debug_filter_saddr_1 "10.10.10.50"
fw ctl set str simple_debug_filter_daddr_1 "10.20.20.80"
fw ctl set int simple_debug_filter_dport_1 443
fw ctl set int simple_debug_filter_proto_1 6

The filter logic supports AND (same index) and OR (different indices), including covering both directions if needed.
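To illustrate the AND/OR behavior, the sketch below prints (but does not execute) the commands for a flow in both directions: index 1 matches client to server, index 2 matches the return traffic. The print_flow_filters helper is hypothetical, and the index-2 parameter names are assumed to follow the same pattern as the index-1 names shown above. The return direction deliberately omits a port condition so that only parameters already used in this post appear.

```shell
# Hypothetical helper: print filter commands for a TCP flow in both directions.
# Conditions sharing an index are ANDed; the two indices are ORed.
print_flow_filters() {
    src=$1; dst=$2; dport=$3
    # Reset any existing filters first.
    echo "fw ctl set int simple_debug_filter_off 1"
    # Index 1: client -> server on the given destination port, TCP (proto 6).
    echo "fw ctl set str simple_debug_filter_saddr_1 \"$src\""
    echo "fw ctl set str simple_debug_filter_daddr_1 \"$dst\""
    echo "fw ctl set int simple_debug_filter_dport_1 $dport"
    echo "fw ctl set int simple_debug_filter_proto_1 6"
    # Index 2: server -> client, TCP (assumed to mirror the index-1 names).
    echo "fw ctl set str simple_debug_filter_saddr_2 \"$dst\""
    echo "fw ctl set str simple_debug_filter_daddr_2 \"$src\""
    echo "fw ctl set int simple_debug_filter_proto_2 6"
}

print_flow_filters 10.10.10.50 10.20.20.80 443
```

Review the printed commands before pasting them on the gateway; printing first avoids setting a half-built filter during an incident.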

4.3 Disable filters at the end (mandatory hygiene)

fw ctl set int simple_debug_filter_off 1

 

5) End-to-end “copy/paste” sequence

Start (Session A)

vpn debug trunc ALL=5
vpn debug timeon
vpn debug on

Network capture (Session B)

tcpdump -ni <wan_iface> -vvv "(udp port 500 or udp port 4500 or proto 50)"

(Optional) IKE monitor (Session C)

vpn debug mon

Reproduce the issue

  • Bring up the tunnel / generate interesting traffic

  • Record the exact test timestamp for correlation
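A simple way to record the test timestamp is to append labelled marks to a notes file while you reproduce the issue. This is a minimal sketch: mark_event and the NOTES_FILE path are illustrative names, not part of any Check Point tooling.

```shell
# Illustrative path; choose any writable location on the gateway.
NOTES_FILE="${NOTES_FILE:-/var/tmp/vpn_case_notes.txt}"

# Hypothetical helper: append "timestamp  label" to the notes file so the
# moment of each test can be matched against ike.elg / vpnd.elg later.
mark_event() {
    printf '%s  %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$NOTES_FILE"
}
```

For example, run mark_event "ping 10.20.20.80 start" just before generating traffic and mark_event "ping end" afterwards. The timestamps use the gateway clock, so they line up with the debug logs only if time/NTP is correct.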

Stop (Session A)

vpn debug off
vpn debug ikeoff
vpn debug timeoff

Stop monitor (if used)

vpn debug moff

If you enabled kernel debug filters

fw ctl set int simple_debug_filter_off 1
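To avoid forgetting a step under pressure, the stop commands can be grouped into one cleanup function. The sketch below is a dry run by default: stop_vpn_debug and the RUN variable are hypothetical names, and the ikeoff line assumes IKE debug was enabled by the trunc step. Drop the lines for features you did not turn on.

```shell
# RUN defaults to "echo" (dry run). Set RUN="" on the gateway to execute.
RUN="${RUN-echo}"

# Hypothetical helper: stop every debug started in the sequence above.
stop_vpn_debug() {
    $RUN vpn debug off
    $RUN vpn debug ikeoff      # assumption: trunc also started IKE debug
    $RUN vpn debug timeoff
    $RUN vpn debug moff        # only needed if the IKE monitor was started
    $RUN fw ctl set int simple_debug_filter_off 1
}

stop_vpn_debug
```

In an interactive session you could also register the function with trap stop_vpn_debug EXIT so it runs when the shell closes, which helps if the console session drops mid-collection.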

 

6) What to look for in the logs (evidence-driven diagnosis)

6.1 IKE (Phase 1 / IKE SA) failures

Common patterns:

  • “no proposal chosen” → proposal mismatch (encryption/integrity/DH).

  • “invalid ID / ID mismatch” → peer identity mismatch (DN/FQDN/IP).

  • “peer not responding” → network/ACL/routing/NAT-T/ISP (prove with tcpdump).
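A quick first pass over the log can be sketched as a case-insensitive count of the patterns above. This is an illustration under assumptions: scan_ike_log is a hypothetical helper, and the search strings are the phrases quoted in this post, so the exact wording in your ike.elg may differ by version.

```shell
# Hypothetical helper: count case-insensitive hits for each failure pattern.
# Usage: scan_ike_log <path-to-log>
scan_ike_log() {
    log=$1
    for pat in "no proposal chosen" "invalid id" "id mismatch" "peer not responding"; do
        # grep -c prints the number of matching lines (0 when none match).
        printf '%s: %s\n' "$pat" "$(grep -ci -- "$pat" "$log")"
    done
}
```

On the gateway this would be called as scan_ike_log $FWDIR/log/ike.elg. A non-zero count only tells you where to start reading; the surrounding lines and timestamps carry the actual evidence.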

How to close the case with evidence

  • If tcpdump shows no reply on UDP/500/4500 → not “VPN config,” it’s connectivity/path.

  • If the peer replies and negotiation stops at a consistent point → correlate timestamps in ike.elg/vpnd.elg to identify the exact failure.

6.2 Phase 2 / Traffic Selector failures

Typical signs:

  • IKE comes up, but traffic doesn’t pass.

  • ESP is missing or flows in only one direction.

  • Sessions behave unexpectedly because NAT or routing diverts them away from the tunnel.

Proof points

  • tcpdump seeing proto 50 confirms ESP.

  • vpnd.elg typically exposes TS/encryption domain mismatch.

7) VSX (when applicable)

Before collecting anything, switch to the correct VS:

vsenv <VSID>

Then run the steps above (each VS has its own contexts/logs).

 

 

3 Replies
Pedro139128
Explorer

Keep it up!

 

the_rock
MVP Diamond

Truly amazing, brother. So lucky to have you on community 🙌

Best,
Andy
"Have a great day and if its not, change it"
PedroRFernandes
Explorer

Well done, brother! @WiliRGasparetto 
