Troubleshooting dropped packets in Check Point using zdebug

Zaid_Khan · 2023-01-05

Ever wished you had more insight into the traffic getting dropped by your Check Point firewall?

Read on to learn about a very powerful tool that comes to your rescue: zdebug.

The fw ctl zdebug + drop command lists dropped packets in real time and shows the reason for each drop.

Run fw ctl zdebug in Expert mode: it sets the kernel debug flags you specify and prints the debug output directly on the command line.

The syntax for the command is:

[Expert@hostname]# fw ctl zdebug + <flags>

where <flags> could be any fw module flag.

For example, the most common usage is with the drop flag:

[Expert@hostname]# fw ctl zdebug + drop

To see drops for a single IP address only, filter the output with grep:

[Expert@hostname]# fw ctl zdebug + drop | grep X.X.X.X

Replace X.X.X.X with the IP you want to filter for.
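A caveat on the grep filter above: grep treats each dot in X.X.X.X as "any character" and matches substrings, so filtering for 10.1.1.5 also returns lines for 10.1.1.50 or 110.1.1.5. In the shell, grep -Fw X.X.X.X is stricter. The Python sketch below does an exact whole-address match on saved or piped output. The function name lines_for_ip is hypothetical, the sample lines used to check it are illustrative rather than real zdebug output, and it assumes a Python 3 interpreter is available wherever you run it.

```python
import re
import sys
from typing import Iterable, Iterator


def lines_for_ip(lines: Iterable[str], ip: str) -> Iterator[str]:
    """Yield only the lines that contain `ip` as a whole address.

    The dots are escaped so they match literally, and the address must not be
    directly preceded or followed by another digit or dot, so 10.1.1.5 does
    not match 10.1.1.50 or 110.1.1.5.
    """
    pattern = re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?![\d.])")
    for line in lines:
        if pattern.search(line):
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumption): python3 filter_ip.py 10.1.1.5 < saved_zdebug_output.txt
    for matched in lines_for_ip(sys.stdin, sys.argv[1]):
        print(matched, end="")
```

The lookaround approach keeps the filter independent of the exact layout of each debug line; it only cares about what sits immediately around the address.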

If you still cannot see the traffic, then most likely the traffic is not even reaching the firewall. To verify, you can use the tcpdump utility to capture packets:

Open a new session and run:

[Expert@hostname]# tcpdump -nni any host x.x.x.x -s0 -w /var/log/tcpdump1.pcap

Note: zdebug keeps the debug running until you stop it with Ctrl+C.

Note: When I pressed Ctrl+C to stop the capture, I got the following message:

^C
Next time perform for exit: "fw ctl debug 0"
Cannot unset debug filter

So you may also need to run the following to stop all debugs completely:

[Expert@hostname]# fw ctl debug 0

Defaulting all kernel debugging options
Debug state was reset to default.
PPAK 0: Get before set operation succeeded of simple_debug_filter_off

shais · 2023-01-08

Hi,
Let me share some info regarding the above and the difference between the debug commands.

The zdebug command is a wrapper for the full debugs: it runs the debug commands for you, but it only lets you run a debug on one debug module.
By default, it uses a small debug buffer, but if you wish, you can provide the "-buf" option to use your own size.

There are multiple tools inside our debug infrastructure that can help when debugging the GW, for example:

Debug filter - allows you to run the debug on a specific connection (reducing the CPU load caused by the debug)
Pre-processing grep - allows you to filter for specific words in the debug, similar to running grep on the output, but because the filtering is done during pre-processing, it reduces the CPU used by the debug.
Multi-module debug - allows you to run debug flags on multiple modules at once.
And more.
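As a rough illustration of the debug filter shais mentions, the sketch below assembles a five-tuple connection filter in the "source IP, source port, destination IP, destination port, protocol" order used by fw ctl debug -F, with 0 standing for "any". That syntax comes from our reading of the kernel debug documentation rather than from this thread, so treat it as an assumption and check the guide for your version. debug_filter_command is a hypothetical helper that only builds the command string; it does not run anything on the gateway.

```python
import ipaddress
from typing import Optional

# Assumption: "0" means "any" for a field of the fw ctl debug -F filter.
ANY = "0"


def _ip_field(value: Optional[str]) -> str:
    """Normalise an IP address, or return the wildcard when it is omitted."""
    if value is None:
        return ANY
    return str(ipaddress.ip_address(value))  # raises ValueError on a bad address


def _number_field(value: Optional[int], upper: int) -> str:
    """Range-check a port or protocol number, or return the wildcard."""
    if value is None:
        return ANY
    if not 0 <= value <= upper:
        raise ValueError(f"{value} is outside the range 0..{upper}")
    return str(value)


def debug_filter_command(src: Optional[str] = None, sport: Optional[int] = None,
                         dst: Optional[str] = None, dport: Optional[int] = None,
                         proto: Optional[int] = None) -> str:
    """Hypothetical helper: build (but do not run) a kernel debug connection filter."""
    fields = [
        _ip_field(src),
        _number_field(sport, 65535),
        _ip_field(dst),
        _number_field(dport, 65535),
        _number_field(proto, 255),
    ]
    return 'fw ctl debug -F "' + ",".join(fields) + '"'


if __name__ == "__main__":
    # Example: TCP (protocol 6) from 10.1.1.5, any source port, to 192.168.1.10:443
    print(debug_filter_command(src="10.1.1.5", dst="192.168.1.10", dport=443, proto=6))
```

Validating the addresses and ranges up front is a deliberate choice: a typo in a filter silently narrows the debug to nothing, which is easy to mistake for "no drops".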

So zdebug uses the same debug infrastructure, but prevents you from using these advanced debugging features.

If you are interested in debugging drops on your GW, I would suggest starting with the drop statistics first.

CPView contains information about drops that happen on the GW.

In addition, we have a new way to run a debug: the "easy_debug" command, which allows you to run debug commands without knowing the full CLI syntax.

the_rock · 2023-01-05

Definitely a good post for any troubleshooting.

Chris_Atkinson · 2023-01-05

sk100808 also describes this, and I know the topic is a trigger for @_Val_

Please also consider the alternatives to tcpdump: fw monitor and cppcap.

CCSM R77/R80/ELITE

_Val_ · 2023-01-05

Since you tagged me here, @Chris_Atkinson...

There is no need to run "fw ctl debug 0", because the zdebug macro runs this command for you when you press Ctrl-C to exit.

The rest... It is sad and disappointing that more and more people are using zdebug instead of a proper kernel debug, but I lost my battle after it leaked to SecureKnowledge and was widely adopted by TAC engineers. I would expect someone in R&D to actually open the code and at least add a decent debug buffer to it, instead of the 1024K it has now. But no, nobody listens. And now we have long posts about that crap.

All I had to say is said here: http://checkpoint-master-architect.blogspot.com/2017/11/kernel-debug-best-practices-or-why-fw.html

I rest my hopeless case...

the_rock · 2023-01-05

Good SK for sure. Well, speaking of zdebug and kernel debug, 2 quick stories come to mind. Once, I was working with a large bank that had a case in escalations, and there was a guy from R&D on the call who had a kernel debug prepared. He assured the customer all was fine and, as luck would have it, as soon as he ran the debug, the FW got stuck, no one could access it, and the poor client had to drive an hour away to console into it and get it working again. Let's just say they were NOT happy (to put it nicely).

The 2nd time, a different customer and I were trying to fix a weird issue they had after upgrading to R80.30. Since Tier 2 and Tier 3 could not help, they transferred us to escalations, and the engineer asked us to run a kernel debug. Sure enough, the box got stuck again and we had to physically power-cycle it. Thank God the device was 20 feet away and not 100 miles (phew).

So, in all honesty, I can't blame TAC for sticking with zdebug; I do the same all the time. I never recommend that customers run a kernel debug, even if TAC suggests it, unless they have a super solid reason for it.

Just my opinion based on bad experiences with it in the past.

_Val_ · 2023-01-06

@the_rock Lol, did you seriously just write all this? What do you think is the difference between the "fw ctl debug" process and "fw ctl zdebug"?

the_rock · 2023-01-06

@_Val_ lol, I did seriously write all that. Maybe if there were a proper kernel debug procedure out there, more people would be running it instead of zdebug.

G_W_Albrecht · 2023-01-06

Please explain the difference between the "fw ctl debug" process and "fw ctl zdebug"! AFAIK, fw ctl zdebug runs a "fw ctl debug" kernel debug. It is only a shorthand way of resetting all kernel debug parameters to default, setting the buffer to 1MB, and then adding fw module flags, all done with "fw ctl debug".

The issue here is that the 1MB buffer is so small that it is very restrictive in use. A manual "fw ctl debug" gives you many more parameters to fine-tune the debug and make it less hard on the GW.

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist

_Val_ · 2023-01-06

Yes sir, @G_W_Albrecht. And if @the_rock were a bit more attentive, he could have discovered that fact himself, instead of making ridiculous suggestions, just by reading the link I referenced above, where I explain it.

Andy, please stop making these claims; it is no longer amusing 🙂

the_rock · 2023-01-06

@_Val_ I'm not being amusing, I'm being 100% serious. Speak to TAC engineers yourself and get their feedback. If firewalls locking up and becoming inaccessible is ridiculous, then I truly have nothing else to say. I rest my case.

_Val_ · 2023-01-06

🤦

the_rock · 2023-01-06

@_Val_ ...let's have a civil discussion. Please tell me what I'm missing here.

_Val_ · 2023-01-06

You are missing a very simple fact: zdebug is nothing but a set of "fw ctl debug" commands wrapped into a macro.

You keep claiming there is a fundamental difference between using "fw ctl debug" and "fw ctl zdebug", while the main difference between these techniques is that zdebug reserves a very small output buffer, which makes it very impractical in a production environment.

This fact is thoroughly described in my blog post linked above, and was then re-told to you by two other fellow community members. Now I am repeating it for the fourth time.

You claim zdebug is somehow safer than the regular kernel debug, which is not true. You refer to occurrences where a kernel debug crashed a FW. You somehow convinced yourself that it would have worked better with zdebug, and once again, you are mistaken. I have seen multiple cases in my 25 years where zdebug had exactly the same effect, but referring to personal experience is pointless. A kernel crash is just bad luck plus kernel code bugs, and actually, with a smaller debug buffer (zdebug), the chances of a crash are somewhat higher than with a bigger buffer.

I kindly ask you to carefully re-read the thread and my blog post, and if you have any questions about the matter, I will be happy to answer them.

However, if you still do not see why your statements were incorrect, I am afraid I will have to give up and close this discussion.

the_rock · 2023-01-06

THANK YOU @_Val_. Now that I have read your response and your blog more carefully, it's definitely way clearer to me. Having said that, I will add that it's not just my own experience with kernel debug; many customers have told me the same. Anyway, everyone has had a different "journey" when it comes to this...

I do have a suggestion, though. Do you think it would be possible to chase R&D and put together an official public SK that outlines the proper debugs/commands to use? I believe it would be way better to have that on the support site than to have people "venting" here about it...just my opinion 🙂

_Val_ · 2023-01-06

There are two SKs on how to run kernel debug: sk98799 & sk171943. The latter mentions zdebug, and also some limitations, but misses the buffer argument. I had to submit an edit to fix that; let's see if it makes it through.

Also, there is another SK, sk160955, about memory usage with kernel debug.

Finally, there is a whole section on kernel debug in the documentation: https://sc1.checkpoint.com/documents/R81.20/WebAdminGuides/EN/CP_R81.20_SecurityGateway_Guide/Conten...

So I do not think we need to chase R&D, but we do want to stress: RTFM 🙂

the_rock · 2023-01-06

No need to swear in abbreviations; I get the message loud and clear :-)

Have an amazing weekend!

_Val_ · 2023-01-06

To close the book on that, my edit to sk171943 was just approved. It says:

Important: "fw ctl zdebug" command allocates only 1024K for the buffer, which may not be sufficient when debugging in a production environment with a lot of traffic. It is best practice to use full set of "fw ctl debug" commands in such situations.

I am surprised and humbled to say, after so many years, I have won the battle!
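To put the 1024K figure from that SK note into rough numbers, here is a simplified model (an assumption, not a description of the kernel internals): the kernel writes debug messages into the buffer, fw ctl kdebug reads them out, and once output is produced faster than it is read, the buffer only buys time until it fills. The rates in the example are hypothetical, and 32000 KB is just an example of a larger buffer, not a recommendation; sk160955, mentioned above, covers memory usage with kernel debug. burst_headroom_seconds is a hypothetical helper name.

```python
def burst_headroom_seconds(buffer_kb: float, produced_kb_s: float, drained_kb_s: float) -> float:
    """Seconds a debug buffer can absorb output produced faster than it is read.

    Simplified model (assumption): messages enter at `produced_kb_s`, the reader
    drains them at `drained_kb_s`, and anything that does not fit once the
    buffer is full cannot be stored.
    """
    if buffer_kb <= 0:
        raise ValueError("buffer size must be positive")
    excess = produced_kb_s - drained_kb_s
    if excess <= 0:
        return float("inf")  # the reader keeps up, so the buffer never fills
    return buffer_kb / excess


if __name__ == "__main__":
    # Hypothetical burst: 3000 KB/s of debug output, 2000 KB/s read out.
    for size_kb in (1024, 32000):
        seconds = burst_headroom_seconds(size_kb, 3000, 2000)
        print(f"{size_kb} KB buffer absorbs the burst for about {seconds:.3f} s")
```

Under these made-up rates, the 1024 KB buffer is overwhelmed in about a second, while the larger one holds out for about half a minute; the point is the ratio, not the absolute numbers.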

the_rock · 2023-01-06

I was just pulling your leg, man; us Eastern Europeans don't get offended on a human level, at least I don't lol. Anyway, I'm actually so glad we had this discussion, because after reading your last response and the blog again, it's definitely much clearer to me.

But hey, some battles take years, or decades, to win ;-)

Tobias_Moritz · 2023-01-06

@the_rock: I have also seen gateways lock up due to kernel debugs suggested by TAC, but this has nothing to do with whether the zdebug macro is used or not. You have been here so long; did you ever read Val's blog post from 2017 (it is even linked here)? He clearly says that the zdebug macro is nothing more than this:

fw ctl debug -buf 1024
fw ctl debug (your options)
fw ctl kdebug -f
-------(waiting for Ctrl-C)
fw ctl debug 0

Because of this, and the way-too-small buffer, it should be avoided; it is bad practice. I argue with TAC every time they suggest using it, along with other deprecated things like tcpdump (instead of cppcap), fw monitor -f (instead of -F) and so on.

I can only second what Val (and Günther) wrote. This way of troubleshooting is outdated and should not be used.

If you did not see crashes with "fw ctl zdebug drop" but did with some "fw ctl debug" procedure, this is not because of the zdebug macro, but because of the debug options (drop versus the other options TAC gave you for the specific support case) or the specific environment.
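Following the outline Tobias quotes from Val's blog, here is a small sketch that prints the equivalent manual sequence with a larger buffer. Assumptions to check against the kernel debug SKs for your version: the -m <module> + <flags> form for enabling flags, -T on fw ctl kdebug for timestamps, the output file path, and the 32000 KB buffer are illustrative choices. The leading fw ctl debug 0 mirrors the "defaulting all kernel parameters" step G_W_Albrecht describes. manual_debug_commands is a hypothetical helper that only prints commands; it does not execute them.

```python
from typing import List, Sequence


def manual_debug_commands(flags: Sequence[str], module: str = "fw",
                          buf_kb: int = 32000,
                          outfile: str = "/var/log/kernel_debug.txt") -> List[str]:
    """Return the 'fw ctl debug' steps that zdebug wraps, with a configurable buffer."""
    if not flags:
        raise ValueError("at least one debug flag is required")
    return [
        "fw ctl debug 0",                                  # reset all kernel debug options
        f"fw ctl debug -buf {buf_kb}",                     # allocate the debug buffer (KB)
        f"fw ctl debug -m {module} + {' '.join(flags)}",   # enable the module flags
        f"fw ctl kdebug -T -f > {outfile}",                # collect output until Ctrl-C
        "fw ctl debug 0",                                  # after Ctrl-C: restore defaults
    ]


if __name__ == "__main__":
    for step in manual_debug_commands(["drop"]):
        print(step)
```

Run the printed commands yourself in Expert mode, pressing Ctrl+C to stop fw ctl kdebug before the final reset. Writing to a file rather than the terminal keeps the collection running while you reproduce the issue.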

_Val_ · 2023-01-06

@Tobias_Moritz I mentioned the link above 🙂

G_W_Albrecht · 2023-01-06

Debugging in a production environment without a maintenance window? None of the customers I know would let anyone do that! Also, I do not see the point of telling such tales about zdebug; it looks like you assume the two GW outages would not have happened if zdebug had been used instead of fw ctl debug. But this is only your unproven assumption, not the truth...

the_rock · 2023-01-06

I'm so glad we are having this discussion, @G_W_Albrecht. To answer your post: no, it was always done in a maintenance window, NOT during working hours, and the 2 stories I told are just the 2 that stick in my mind; there are at least dozens more. And yes, I am confident that had zdebug been used, this would never have happened, because many times, with TAC on the phone, we ran zdebug on boxes at 99% CPU and never had an issue. If that's not good enough proof, then I'm not sure what is.

@Tobias_Moritz Yes, I read @_Val_'s link. I think it does bring up some good points, but again, if you guys believe those flags are outdated and should not be used, maybe everyone at CP should get on the same page and publish an official SK about it, so there is no confusion.

All I can say is this, and it's 100% honest feedback: I have seen TAC engineers from other major vendors (PAN, Fortinet, Cisco) run debugs on their firewalls during production hours all the time, and not a single crash or FW lock-up.

I can't even count how many times, when TAC mentioned kernel debugs, I told them one of those 2 stories from my last post, and the answer would be something like "Well, I hear you, but escalations/R&D may need those debugs down the road". That does not inspire confidence in the customer if it might cause their firewall to crash...so even TAC engineers know it's bad.

Again, I'm not trying to diminish the steps; all I'm saying is I wish there were better steps to collect those debugs, that's all.

_Val_ · 2023-01-06

@the_rock Oh boy, I do not know what else I can tell you, other than beg you to re-read what is already said here.

G_W_Albrecht · 2023-01-08

So what is your issue with a forced crash during maintenance anyway? I do see the point about debugging remote units that are far away, though.

the_rock · 2023-01-08

I know what you mean; that's why I now always make sure someone is on site when doing any kernel debugs...better safe than sorry.

G_W_Albrecht · 2023-01-06

Nothing new here; why did you post this very old information? Kernel debugs with a small buffer like this remind me of R65 debugs...

Garrett_DirSec · 2023-05-16

Hello @shais and @PhoneBoy, a side topic that has likely been fully covered elsewhere: why were the standard clustering and firewall maintenance commands deprecated and then re-introduced?

Yesterday, during troubleshooting, I encountered a firewall that told me "fw ctl zdebug + drop" was deprecated, and I was so taken aback that I was momentarily speechless (especially after completing the R81 CCTE troubleshooting class in the past months).

How to restore commands when you encounter a "deprecated" message:

https://support.checkpoint.com/results/sk/sk177685

Dynamic CLISH commands and "new" options:

https://support.checkpoint.com/results/sk/sk144112

Updating with an additional resource (to keep everything in one place for the benefit of others searching):

The Easy_Debug tool (introduced with R81.10):

https://support.checkpoint.com/results/sk/sk173024

ikafka · 2023-05-05

Hi,

We cannot use these commands on 1600 series devices. Could it be because this device is in bridge mode?

PhoneBoy · 2023-05-05

fw ctl zdebug should be usable on SMB appliances in Expert mode.
What precise command(s) are you executing and what precise output did you receive?
