Parabol
Contributor

This debug command just about bricked our firewall

Hi all,

We were trying to gather information about an IPS false positive and have a ticket open with TAC. The engineer advised us to run the following debug while we tested:

fw ctl debug 0
fw ctl debug -buf 32768
fw ctl set int cmi_dump_buffer 1
fw ctl debug -m cmi_loader all
fw ctl debug -m WS info session global spii policy module ssl_insp connection pkt_dump address
fw ctl debug + cmi
fw ctl debug -m kiss pm kw
fw ctl set int https_inspection_show_decrypted_data_in_debug 1
fw ctl kdebug -f > kernel_debug.txt

So I turned the debug on and asked the customer to start the data transfer. I had the debug running for 3-4 minutes at most, stopped it with Ctrl+C after the transfer finished, and thought nothing of it.

Well, issues started appearing: alarms went off and we saw connectivity problems. I checked the .txt output file and it had filled the firewall's entire storage. So I removed the .txt file, thinking this would resolve the issue.

The issues continued...

I checked the firewall's CPU utilization: it was running at 1500%+, so the debug was obviously still active. I ran fw ctl debug 0, which brought the CPU back down and restored normality.

In hindsight, I should have watched CPU and storage utilization the moment I enabled the debug. To be honest, though, I'm a little frustrated that we were advised to run this with no warning from the engineer. It's an important production firewall; had we known how resource-intensive this debug was, we would not have run it even for a few seconds.

Anyway, all is good in the end. Just another cautionary tale to add to my repertoire! Do not trust unknown debugs; this will become my daily mantra.

16 Replies
G_W_Albrecht
Legend

I have seen this happen a couple of times too, so no kernel debugs without a maintenance window! Usually, debug procedures include fw ctl debug 0 as the last step.

CCSE / CCTE / CCME / CCSM Elite / SMB Specialist
_Val_
Admin

Yes, fw ctl debug 0 is a must after data collection is complete. 
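A minimal bash sketch of that rule, assuming a Gaia shell: the wrapper runs the capture inside a subshell with a trap, so fw ctl debug 0 runs however the capture ends, whether it finishes normally, is interrupted with Ctrl+C, or is killed. The output path /var/log/kernel_debug.txt and the OUT_FILE, CAPTURE_CMD and RESET_CMD variables are hypothetical; they only make the commands easy to swap out. Setting the debug modules (the fw ctl debug -m lines) is still a separate, earlier step.

```bash
#!/usr/bin/env bash
# Sketch only: run a kernel debug capture and guarantee that the debug flags
# are reset however the capture ends (normal exit, Ctrl+C, or kill).
# OUT_FILE, CAPTURE_CMD and RESET_CMD are hypothetical knobs for illustration.

OUT_FILE="${OUT_FILE:-/var/log/kernel_debug.txt}"
CAPTURE_CMD="${CAPTURE_CMD:-fw ctl kdebug -f}"
RESET_CMD="${RESET_CMD:-fw ctl debug 0}"

capture_with_reset() {
  (
    reset_debug() {
      echo "Resetting kernel debug flags: $RESET_CMD" >&2
      # Unquoted on purpose so the command string is split into words.
      $RESET_CMD
    }
    trap reset_debug EXIT   # fires whenever this subshell exits
    trap 'exit 130' INT     # Ctrl+C: exit, which in turn fires the EXIT trap
    trap 'exit 143' TERM    # kill: same idea
    # Unquoted on purpose, as above; capture output goes to OUT_FILE.
    $CAPTURE_CMD > "$OUT_FILE"
  )
}
```

Usage would be: configure the debug modules as instructed, call capture_with_reset, and press Ctrl+C when the test is done. The trap only guarantees the reset; it does nothing about the output file growing, so disk usage still needs watching.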

Alex-
Leader

Kernel debugs should always be done during a maintenance window and properly closed as mentioned.

This is explicitly described in the admin guide as well.

https://sc1.checkpoint.com/documents/R81.10/WebAdminGuides/EN/CP_R81.10_Quantum_SecurityGateway_Guid...

In the same vein, I've seen tcpdump bring firewalls to a standstill; cppcap is recommended instead.

Parabol
Contributor

Thanks Alex, interesting point about tcpdump as well. I imagine it can do this if the capture has a particularly broad scope.

Lesley
Advisor

If you cancel a debug with CTRL+C, it gives a warning that you should use fw ctl debug 0 instead.

And indeed, as the previous posts say: always monitor CPU and file size during a debug, and if you do not trust it, use a maintenance window.
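Watching by hand is easy to forget mid-test, so here is a rough bash watchdog sketch for the file-size half of that advice. It polls the debug output file and the filesystem it lives on using the standard df and wc tools, and runs fw ctl debug 0 once either limit is hit. The function names, the 500 MB / 80% defaults and the RESET_CMD variable are hypothetical choices, not Check Point tooling. It does not watch CPU, so keep an eye on top as well.

```bash
#!/usr/bin/env bash
# Sketch only: stop a kernel debug before its output file fills the disk.
# Function names, default limits and RESET_CMD are hypothetical.

RESET_CMD="${RESET_CMD:-fw ctl debug 0}"

# Percentage in use of the filesystem holding PATH (integer, no % sign).
# df -P guarantees one line per filesystem; field 5 is "Capacity".
fs_used_pct() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Size of FILE in whole MB, or 0 if it does not exist yet.
file_size_mb() {
  if [ -f "$1" ]; then
    echo $(( $(wc -c < "$1") / 1048576 ))
  else
    echo 0
  fi
}

# Succeeds (returns 0) if the file or its filesystem is over a limit.
limits_exceeded() {
  local file=$1 max_mb=$2 max_pct=$3
  [ "$(file_size_mb "$file")" -ge "$max_mb" ] && return 0
  [ "$(fs_used_pct "$(dirname "$file")")" -ge "$max_pct" ] && return 0
  return 1
}

# Poll every INTERVAL seconds; reset the debug flags once a limit is reached.
watch_debug() {
  local file=$1 max_mb=${2:-500} max_pct=${3:-80} interval=${4:-5}
  while ! limits_exceeded "$file" "$max_mb" "$max_pct"; do
    sleep "$interval"
  done
  echo "Limit reached for $file, resetting debug: $RESET_CMD" >&2
  # Unquoted on purpose so the command string is split into words.
  $RESET_CMD
}
```

For example, from a second SSH session while the capture runs: watch_debug /var/log/kernel_debug.txt 500 80 5. Resetting the flags is what brought the CPU down in the original post, but the capture itself should still be stopped and the file cleaned up afterwards.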

Parabol
Contributor

Yep, lesson learned! I will not be taking debugs lightly again, that's for sure.

the_rock
Legend

I'm actually shocked the TAC engineer did not tell you to run this after hours, because any kernel debug can cause issues like this. I always tell customers that if this has to be done, they should have someone on site, just in case. I've been through it before and let me tell you, it was NOT a pleasant experience.

Andy

Parabol
Contributor

Yes, I know I can't put all the blame on the TAC engineer, but I suppose having the debug come from them lowered my guard somewhat. Safe to say I'll be treating any debug as needing a downtime window from now on!

the_rock
Legend

Put it this way... I learned a long time ago that when I hear the words "kernel debug", I stop asking any further questions. It means that 100% of the time it has to be done outside normal hours, preferably with console access available. I'm just glad your firewall did not hose completely... phew.

Andy

Parabol
Contributor

Me too! On reflection the situation could have been much worse, so I'm grateful for that at least haha! There's nothing like an almost fatal production outage to remind you of the fundamentals!

the_rock
Legend

I remember one time a poor guy from R&D caused a firewall to crash with a debug he gave a customer to run in production, saying "It's 100% safe"... yeahhh, not so much.

Alex-
Leader

Generally, TAC will always try to be helpful and give you the commands you need for the raised issue, but they don't and can't know all the specifics of your environment, so it's also up to the customer to assess whether it's OK to run a debug at 10:00 AM during full production hours. 😄

Bricking your FW can be a rite of passage for a CP engineer, like removing an access-list on a Cisco router without first removing the access-group from the interface: you need to do it at least once to never do it again. 🙂

the_rock
Legend

It was probably an honest mistake, or the person simply did not know that a debug like that is resource-intensive and could cause issues. In my experience, 9 times out of 10 TAC people will tell customers to always run kernel debugs off hours.

Andy

Parabol
Contributor

I definitely feel like a fully qualified CP engineer now, then 😅 I've done the dreaded "switchport trunk allowed vlan x" on a Cisco switch before, missing the "add" keyword and removing all the previously allowed VLANs!

Alex-
Leader

I actually hesitated to post that example; both the missing "add" on switchport trunk allowed vlan and "no access-list x" seem to be obligatory Cisco experiences. 😛

the_rock
Legend

I like that, @Parabol... FQCPE = Fully Qualified Check Point Engineer 🤣🤣

Andy
