imamuzic
Contributor

Packets lost after "fw post VM outbound"

Hello,

 

I'm fighting a strange issue. Today the customer reported that their network monitoring system's pings suddenly stopped reaching the other side of a VPN tunnel. Indeed, since yesterday the log shows only "encrypted" events on this side and no "decrypted" events from the other side. Both tunnel ends are locally managed R81.10 gateways (one is a physical device and the other is CloudGuard for Azure). What is really strange is that everything else works through that VPN tunnel; only the pings from the monitoring system stopped going through. Everything was OK before.

fw ctl zdebug + drop doesn't show anything on either gateway.

fw monitor on the source gateway shows "fw post VM outbound" as the last chain module for these ICMP echo packets. For other traffic flowing through the tunnel without issues, the last module is "SecureXL VPN before encryption". fw monitor on the destination side does not show any packets related to these pings.

I've been examining the connection and SecureXL tables, but cannot find this problematic ICMP echo connection in them (although the kernel connection table shows the connection briefly, which I believe is expected for an ICMP connection).

Any ideas where to look further? How come that fw ctl zdebug + drop doesn't show anything?

 

Regards,

Igor

 


Accepted Solutions
imamuzic
Contributor

OK, problem solved 🙂 I'm an idiot who put that problematic host under a NAT by accident. That is why the traffic didn't make it through, why I couldn't trace it to the end of the chain with 'fw monitor', and why fw ctl zdebug + drop didn't show anything: I was filtering by the original IP address instead of the NAT IP address.

I'm glad, though, that this was not some Check Point bug, as I sold Check Point to this customer over Palo Alto, and one of my strongest arguments for Check Point vs. Palo Alto was fw ctl zdebug + drop and fw monitor 😄 I have worked with Palo Alto firewalls for about 15 years and prefer Check Point because the devil is in the details 😉

Regards,

Igor

 


13 Replies
Timothy_Hall
Legend

If the pings are being allowed by the "Accept ICMP Requests" checkbox option on the FireWall screen of Global Properties, disable that and create an explicit rule permitting the ICMP traffic in your Access Control rulebase.  I've seen some odd behavior occur in VPN tunnels for traffic allowed by implicit rules instead of explicit ones.

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
imamuzic
Contributor

I see now that it is not just about pings, but also https doesn't work anymore from this monitoring server. But again, I'm slightly shocked that I don't see anything on 'fw ctl zdebug + drop'. Is this possible or maybe I don't grep it correctly?

The IP in question is 172.18.24.211, so as usual I'm debugging exactly this: fw ctl zdebug + drop | grep 172.18.24.211.

I also tried zdebug + drop on the other side of the tunnel, grepping for the peer's IKE ID, which is its main IP address, but still nothing.
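
In light of the accepted solution, a grep that keeps lines mentioning either the original or the translated address would not have gone silent here. The following is only a sketch: match_any_ip is a hypothetical helper, and 203.0.113.10 is a placeholder for the NAT address, which the thread never states.

```shell
#!/bin/bash
# Hypothetical helper: keep only lines that mention any of the given addresses.
# -F treats each pattern as a fixed string, so the dots in an IP are not regex wildcards.
# -w requires whole-word matches, so 172.18.24.21 does not match 172.18.24.211.
match_any_ip() {
    local args=()
    local ip
    for ip in "$@"; do
        args+=(-e "$ip")
    done
    grep -wF "${args[@]}"
}

# Intended use on the gateway (203.0.113.10 is a placeholder for the NAT address):
#   fw ctl zdebug + drop | match_any_ip 172.18.24.211 203.0.113.10
```

A plain grep 172.18.24.211 treats each dot as "any character", which is harmless most of the time but can let unrelated lines through; fixed-string matching avoids that.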

 

 

Timothy_Hall
Legend

If you see the packet leaving post-outbound (O) but not being transmitted, and it is not showing up in fw ctl zdebug drop, all that means is that the Check Point code (including SecureXL) allowed it.  The packet was handed back to Gaia for transmission but it failed.  An example of how this could happen is a typo in the next hop gateway for the route the traffic is matching, and/or the next hop not responding to ARP requests.  You need to be looking for the problem at the Gaia network level.

imamuzic
Contributor

I see... Are IPSec L2L tunnels between 2 locally managed gateways handled by Gaia by any means? Or anti spoofing?

Bob_Zimmerman
Authority

Note that this could also mean the fw monitor filter is incomplete. "SecureXL VPN before encryption" is the last kernel module which should see the clear packet. After that, it will be encrypted and will have different source, destination, and so on. It won't match the filter which caught the clear version of the traffic.

imamuzic
Contributor

Per the logs, the packet from that problematic host leaves the on-premises gateway encrypted in the correct VPN community, but the last module in fw monitor is "fw post VM outbound" ?! The remote gateway never logs a "decrypted" event for it. For other traffic (that has no issues) flowing through that same VPN tunnel, fw monitor shows "SecureXL VPN before encryption", which is, as you wrote, the last module where we should see the packet if fw monitor filters for clear-text packets only. So, my primary question is: what could cause the last module to be fw post VM outbound on the source VPN gateway, without any traces in zdebug + drop, for only one particular host, while packets sourced from other hosts can reach the same destination IP with no problems? 🙂

Bob_Zimmerman
Authority

As you discovered, NAT changes the packet so it no longer matches your capture filter. The biggest issue with capture filters in fw monitor is they are evaluated individually at each step. If the traffic no longer matches the filter because something about it has been changed, the capture just won’t record anything further.
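
One way to keep a capture from going quiet after translation is to give fw monitor one filter per address the packet may carry. This is a rough sketch, assuming fw monitor accepts several -F tuples in one command (the zdebug output in this thread lists five tuple slots). src_filters is a hypothetical helper, and 203.0.113.10 stands in for the translated address:

```shell
#!/bin/bash
# Hypothetical helper: build one -F tuple per source address.
# Tuple layout as used in this thread: "src,sport,dst,dport,proto", with 0 meaning "any".
src_filters() {
    local out=()
    local ip
    for ip in "$@"; do
        out+=(-F "${ip},0,0,0,0")
    done
    # Join the pieces with single spaces and print them on one line.
    printf '%s\n' "${out[*]}"
}

# Intended use on the gateway (203.0.113.10 is a placeholder for the NAT address):
#   fw monitor $(src_filters 172.18.24.211 203.0.113.10)
```

With both tuples in place, the capture should keep recording at the chain positions after NAT, where the packet no longer carries the original source.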

A kernel debug on xlate should make this situation more obvious in the future. You can run it with zdebug, since that’s just a macro which sets up the buffer and so forth. It's much more verbose than just debugging on drop, but when troubleshooting a connection which isn't working, it shouldn't be too bad. Here's an example from one of my lab standalone boxes showing the debug command, plus the output generated when I have the standalone send a single ping from itself to a destination which matches a rule to hide the source:

[Expert@DallasSA]# fw ctl zdebug -T -F "10.0.1.253,0,192.168.144.120,0,0" -m fw xlate drop
Defaulting all kernel debugging options
Debug state was reset to default.
PPAK 0: Get before set operation succeeded of simple_debug_filter_off
Initialized kernel debugging buffer to size 1023K
fw ctl set string simple_debug_filter_saddr_1 10.0.1.253 -a
PPAK 0: Get before set operation succeeded of simple_debug_filter_saddr_1
fw ctl set int simple_debug_filter_sport_1 0 -a
PPAK 0: Get before set operation succeeded of simple_debug_filter_sport_1
fw ctl set string simple_debug_filter_daddr_1 192.168.144.120 -a
PPAK 0: Get before set operation succeeded of simple_debug_filter_daddr_1
fw ctl set int simple_debug_filter_dport_1 0 -a
PPAK 0: Get before set operation succeeded of simple_debug_filter_dport_1
fw ctl set int simple_debug_filter_proto_1 0 -a
PPAK 0: Get before set operation succeeded of simple_debug_filter_proto_1
Updated debug variable for module fw
Kernel debugging buffer size: 1023KB
...
HOST:
Module: fw 
Enabled Kernel debugging options: xlate ipv6 drop 
Messaging threshold set to type=Info freq=Common

-----------------------------------------------------
...
Debug filter not set.
-----------------------------------------------------
VPN Simple Debug Filter Not Activated
-----------------------------------------------------
Simple Debug Filter Is Activated
Tuple   Protocol       Source:Port        Destination:Port
(1)      *         10.0.1.253:*       192.168.144.120:*
(2)      NOT DEFINED
(3)      NOT DEFINED
(4)      NOT DEFINED
(5)      NOT DEFINED

Number      IP Address
(1)          NOT DEFINED
(2)          NOT DEFINED
(3)          NOT DEFINED
-----------------------------------------------------
@;1;kiss_debug_report: start
@;1;kiss_debug_report: start
@;73882690;26Oct2024 14:47:28.051001;[cpu_0];[fw4_1];fwx_key_get_client_conn_key: possible security server gw->srv conn before rulebase.;
@;73882690;26Oct2024 14:47:28.051150;[cpu_0];[fw4_1];fwx_key_get_client_conn_key: possible security server gw->srv conn before rulebase.;
@;73882690;26Oct2024 14:47:28.051253;[cpu_0];[fw4_1];fwx_key_get_client_conn_key: possible security server gw->srv conn before rulebase.;
@;73882690;26Oct2024 14:47:28.051350;[cpu_0];[fw4_1];fwx_key_get_client_conn_key: possible security server gw->srv conn before rulebase.;
@;73882690;26Oct2024 14:47:28.051475;[cpu_0];[fw4_1];fw_xlate_new_conn: connection <dir 1, 10.0.1.253:6157 -> 192.168.144.120:0 IPP 1> ifnum=1, dir=1;
@;73882690;26Oct2024 14:47:28.051485;[cpu_0];[fw4_1];fwx_get_xlation: conn = dir 1, 10.0.1.253:6157 -> 192.168.144.120:0 IPP 1, mthd=ffffffff;
@;73882690;26Oct2024 14:47:28.051509;[cpu_0];[fw4_1];fwx_cache_lookup: flags = 0x0;
@;73882690;26Oct2024 14:47:28.051511;[cpu_0];[fw4_1];fwx_cache_lookup: NAT_X_SRC_IS_RANGE_L4: 0;
@;73882690;26Oct2024 14:47:28.051517;[cpu_0];[fw4_1];fw_xlate_match_epilog: There is already NAT on src/sport;
@;73882690;26Oct2024 14:47:28.051521;[cpu_0];[fw4_1];fw_xlate_match: connection matches rule:;
@;73882690;26Oct2024 14:47:28.051524;[cpu_0];[fw4_1];fw_xlate_match: < 7;
@;73882690;26Oct2024 14:47:28.051526;[cpu_0];[fw4_1];xlation type: ff000001,;
@;73882690;26Oct2024 14:47:28.051532;[cpu_0];[fw4_1];10.0.0.0, 10.255.255.255, 10.74.255.1,;
@;73882690;26Oct2024 14:47:28.051534;[cpu_0];[fw4_1];xlation type: ff010202,;
@;73882690;26Oct2024 14:47:28.051540;[cpu_0];[fw4_1];192.168.0.0, 192.168.255.255, 192.168.0.0,;
@;73882690;26Oct2024 14:47:28.051541;[cpu_0];[fw4_1];xlation type: 0,;
@;73882690;26Oct2024 14:47:28.051547;[cpu_0];[fw4_1];0.0.0.0, 0.0.0.0, 0.0.0.0,;
@;73882690;26Oct2024 14:47:28.051548;[cpu_0];[fw4_1];xlation type: 0,;
@;73882690;26Oct2024 14:47:28.051553;[cpu_0];[fw4_1];0.0.0.0, 0.0.0.0, 0.0.0.0>;
@;73882690;26Oct2024 14:47:28.051566;[cpu_0];[fw4_1];fwx_create_xlation: xlconn=dir 1, 10.74.255.1:6157 -> 192.168.144.120:0 IPP 1 mthd=ff000001, flags=100008;
@;73882690;26Oct2024 14:47:28.051569;[cpu_0];[fw4_1];fwx_get_xlation: fwxl->ex_flags = 0x0;
@;73882690;26Oct2024 14:47:28.051585;[cpu_0];[fw4_1];allocate_port_impl_static: using range: 35000 - 59999;
@;73882690;26Oct2024 14:47:28.051595;[cpu_0];[fw4_1];allocate_port_impl_static: hide_src=10.74.255.1, new_dst=192.168.144.120, new dport=0, first=88b8, last=ea5f, start=88c0, old_port=180d, not synchronized;
@;73882690;26Oct2024 14:47:28.051603;[cpu_0];[fw4_1];allocate_port: found a free port <1,10.74.255.1,88c1,192.168.144.120>;
@;73882690;26Oct2024 14:47:28.051610;[cpu_0];[fw4_1];fwx_alloc_get_port_type_and_member: ACTIVE + HIGH;
@;73882690;26Oct2024 14:47:28.051613;[cpu_0];[fw4_1];fwx_alloc_stats_update: called with update amount 1;
@;73882690;26Oct2024 14:47:28.051619;[cpu_0];[fw4_1];fwx_alloc_stats_update: stats_key: <1,10.74.255.1,14,192.168.144.120>;
@;73882690;26Oct2024 14:47:28.051622;[cpu_0];[fw4_1];fwx_alloc_stats_update: entry not found - setting a new entry.;
@;73882690;26Oct2024 14:47:28.051624;[cpu_0];[fw4_1];fwx_alloc_stats_update: updated amount to 1;
@;73882690;26Oct2024 14:47:28.051637;[cpu_0];[fw4_1];fwx_alloc_fill_conn_data: conn: dir 1, 10.0.1.253:6157 -> 192.168.144.120:0 IPP 1;
@;73882690;26Oct2024 14:47:28.051640;[cpu_0];[fw4_1];fwx_alloc_get_port_type_and_member: ACTIVE + HIGH;
@;73882690;26Oct2024 14:47:28.051647;[cpu_0];[fw4_1];fwx_apply_hide: xlation enabled;
@;73882690;26Oct2024 14:47:28.051693;[cpu_0];[fw4_1];fw_xlate_packet: connection <dir 1, 10.0.1.253:6157 -> 192.168.144.120:0 IPP 1>, OUTBOUND(1);

The -T adds timestamps to the output. The -F specifies a simple filter using the same syntax as fw monitor.

My real source is 10.0.1.253, translated source is 10.74.255.1, and the destination (which doesn't change in my rule) is 192.168.144.120. You can see that even though my filter only catches the original source, the debug records that the traffic is being translated. When there's no translation, the xlate debug produces no additional output.

imamuzic
Contributor

Thanks... I didn't know about the xlate debug procedure until now. That is great 😉

CheckPointerXL
Advisor

why did you say "my filter only catches the original source, the debug records that the traffic is being translated." ?

i see:

@;73882690;26Oct2024 14:47:28.051566;[cpu_0];[fw4_1];fwx_create_xlation: xlconn=dir 1, 10.74.255.1:6157 -> 192.168.144.120:0 IPP 1 mthd=ff000001, flags=100008;

  

anyway great filter, i guess it's lighter than fw ctl zdebug + drop ?

 
Bob_Zimmerman
Authority

That's the decision. The filter as I specified only catches packets which match these criteria:

Simple Debug Filter Is Activated
Tuple   Protocol       Source:Port        Destination:Port
(1)      *         10.0.1.253:*       192.168.144.120:*
(2)      NOT DEFINED
(3)      NOT DEFINED
(4)      NOT DEFINED
(5)      NOT DEFINED

Note that reply packets won't match this filter, so you won't see NAT decisions made for them. These filters match packets, not flows.

If I had something listening at 192.168.144.120, to catch the replies, I would need to add -F "192.168.144.120,0,10.74.255.1,0,0" to the debug command in order to see those replies. Note that the translated address appears there, as that is how the packet will look when it arrives at the firewall. That would then cause the kernel to log the decision to translate it back to 10.0.1.253.
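
The forward and reply tuples for a hide-NAT flow can be written out mechanically. This is a small sketch: nat_debug_filters is a hypothetical helper and not part of any Check Point tooling, while the addresses are the ones from the lab example above.

```shell
#!/bin/bash
# Hypothetical helper: print the two -F tuples needed to see both directions
# of a flow whose source is hidden behind NAT.
#   $1 = original source, $2 = translated source, $3 = destination
# Tuple layout: "src,sport,dst,dport,proto", with 0 meaning "any".
nat_debug_filters() {
    local orig_src=$1 xlated_src=$2 dst=$3
    # Forward packet as it arrives at the firewall: original source -> destination.
    local forward="${orig_src},0,${dst},0,0"
    # Reply packet as it arrives at the firewall: destination -> translated source.
    local reply="${dst},0,${xlated_src},0,0"
    printf '%s\n' "-F ${forward} -F ${reply}"
}

# Intended use on the gateway, following the command shown earlier in the thread:
#   fw ctl zdebug -T $(nat_debug_filters 10.0.1.253 10.74.255.1 192.168.144.120) -m fw xlate drop
```

The reply tuple uses the translated address on purpose: these filters match packets, not flows, and the reply reaches the firewall addressed to 10.74.255.1.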

imamuzic
Contributor

But I have a problem only with traffic from this particular host, and it stopped working suddenly. All other traffic from other hosts is flowing through that same VPN tunnel without issues, so ARP and routing tables are hardly the cause. Per the logs, the packet from that problematic host leaves the on-premises gateway encrypted in the correct VPN community, but the last module in fw monitor is "fw post VM outbound" ?! and the remote gateway never logs a "decrypted" event for it. Before the issue occurred, both encrypted and decrypted events were generated every minute, since this is a monitoring host that generates ICMP echo probes at 1-minute intervals. The last chain module that fw monitor shows for packets from hosts that have no issue communicating with the other side is "SecureXL VPN before encryption", as it should be. So, at first sight, it looks like my on-premises gateway somehow eats packets after the "fw post VM outbound" module without being able to show it in fw ctl zdebug + drop, although I copy and paste the source IP address from fw monitor into the grep on fw ctl zdebug to avoid false results from a typo.


the_rock
Legend

Mistakes happen man, don't be hard on yourself. I mean, Palo Alto FWs are fine too, but I always found that troubleshooting with CP is much more intuitive, and plus, the management platform is the best in the market and, truth be told, NAT with PAN is soooooo confusing, especially the destination one lol

Andy
