This debug command just about bricked our firewall
Hi all,
We were trying to obtain information about an IPS false positive, for which we have a ticket open with TAC. The engineer advised us to run the following debug when we test:
fw ctl debug 0
fw ctl debug -buf 32768
fw ctl set int cmi_dump_buffer 1
fw ctl debug -m cmi_loader all
fw ctl debug -m WS info session global spii policy module ssl_insp connection pkt_dump address
fw ctl debug + cmi
fw ctl debug -m kiss pm kw
fw ctl set int https_inspection_show_decrypted_data_in_debug 1
fw ctl kdebug -f > kernel_debug.txt
So I turned the debug on and asked the customer to start the data transfer. I had the debug on for 3-4 minutes max, stopped it with Ctrl+C after the transfer finished, and thought nothing of it.
Well, issues started appearing: alarms going off, connectivity problems. I checked the .txt output file and it had taken up the entire storage of the firewall. So I removed the .txt file, thinking this would resolve the issue.
The issues continued.
I checked the CPU utilization of the firewall and it was running at 1500%+, so the debug was obviously still running. I ran fw ctl debug 0, which brought the CPU back down, and normality was restored.
In hindsight, I should have watched CPU and storage utilization from the moment I enabled the debug. But to be honest, I'm a little frustrated that we were advised to run this with no warning from the engineer. It's an important production firewall, so had we known, we wouldn't have run such a resource-intensive debug even for a few seconds.
Anyways, all is good in the end. Just another tale of caution to add to my repertoire! Do not trust unknown debugs.. this will become my daily mantra.
I have seen this happen a couple of times too, so no kernel debugs without a maintenance window! Usually, debug procedures include fw ctl debug 0 as the last step.
Yes, fw ctl debug 0 is a must after data collection is complete.
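To make that last step hard to forget, the capture can be wrapped so the reset runs however the capture ends. What follows is only a rough sketch, not a Check Point procedure: run_guarded_kdebug, its two arguments and the one-second polling loop are hypothetical, and the only fw commands it uses are the two already quoted in this thread (fw ctl kdebug -f and fw ctl debug 0). The debug flags requested by TAC would still have to be set where the comment indicates.

```shell
#!/bin/sh
# Sketch only: run_guarded_kdebug is a hypothetical helper, not a Check Point tool.
# $1 = output file for the kernel debug, $2 = hard time limit in seconds.
run_guarded_kdebug() {
    (
        out="$1"
        max_seconds="$2"
        reader_pid=""

        # Stop the reader (if any) and always reset the kernel debug flags.
        cleanup() {
            if [ -n "$reader_pid" ]; then
                kill "$reader_pid" 2>/dev/null || true
                wait "$reader_pid" 2>/dev/null || true
            fi
            fw ctl debug 0
        }
        trap cleanup EXIT            # runs on normal exit and after Ctrl+C
        trap 'exit 130' INT TERM     # turn Ctrl+C into an exit so cleanup fires

        # (Set the debug flags requested by TAC here.)

        # Read the kernel debug buffer in the background.
        fw ctl kdebug -f > "$out" &
        reader_pid=$!

        # Wait in one-second steps until the reader ends or the limit is reached.
        elapsed=0
        while kill -0 "$reader_pid" 2>/dev/null && [ "$elapsed" -lt "$max_seconds" ]; do
            sleep 1
            elapsed=$((elapsed + 1))
        done
    )
}

# Example (path and limit are illustrative):
#   run_guarded_kdebug /var/log/kernel_debug.txt 120
```

Everything runs in a subshell so the traps do not leak into the calling shell, and the time limit is a backstop in case nobody presses Ctrl+C.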
Kernel debugs should always be done during a maintenance window and properly closed as mentioned.
This is explicitly described in the admin guide as well.
In the same vein, I've seen tcpdump bring firewalls to a standstill; cppcap is recommended instead.
Thanks Alex, interesting with the tcpdumps as well. I imagine if it's a particularly large scope in the tcpdump it can do this.
If you cancel a debug with CTRL+C, it will give a warning that you should use fw ctl debug 0 instead.
And indeed, as the previous posts say, always monitor CPU and file size during a debug, and if you do not trust it -> maintenance window.
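The file-size part of that monitoring can be scripted. The sketch below is an assumption-laden illustration rather than anything from Check Point: the helper names (file_kb, free_kb, should_stop, watch_capture), the thresholds and the two-second polling interval are all made up for the example. It covers output size and free disk space only; CPU still needs watching separately, for example with top in a second session.

```shell
#!/bin/sh
# Hypothetical helpers; names and thresholds are illustrative only.

# Size of a file in KB, rounded up (0 if the file does not exist yet).
file_kb() {
    if [ -f "$1" ]; then
        echo $(( ($(wc -c < "$1") + 1023) / 1024 ))
    else
        echo 0
    fi
}

# Free space in KB on the filesystem that holds the given path.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeeds when the capture should stop.
# $1 = capture file, $2 = max capture size in KB, $3 = min free KB to keep.
should_stop() {
    [ "$(file_kb "$1")" -ge "$2" ] || [ "$(free_kb "$(dirname "$1")")" -lt "$3" ]
}

# Poll while the capture process is alive; on a breached limit, kill it,
# reset the debug flags and return 1.
# $1 = PID of the capture, $2 = capture file, $3 = max KB, $4 = min free KB.
watch_capture() {
    while kill -0 "$1" 2>/dev/null; do
        if should_stop "$2" "$3" "$4"; then
            kill "$1" 2>/dev/null || true
            fw ctl debug 0
            return 1
        fi
        sleep 2
    done
}

# Example (path and limits are illustrative):
#   fw ctl kdebug -f > /var/log/kernel_debug.txt &
#   watch_capture $! /var/log/kernel_debug.txt 512000 2097152
```

Killing the reader alone does not turn the debug off, which is why the watcher also runs fw ctl debug 0 when it trips.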
Yep, lesson learned! I will not be taking debugs lightly again, that is for sure.
I'm actually shocked that the TAC engineer did not tell you to run this after hours, because any kernel debug can cause issues like this. If this has to be done, I always tell customers to have someone on site, just in case. I've been through it before and let me tell you, it was NOT a pleasant experience.
Andy
Yes I know I can't put all the blame on the TAC Engineer, but I suppose it brought my guard down somewhat with the debug being provided by them. Safe to say I'll be treating any debug as needing a downtime window moving forward!
Put it this way... I learned a long time ago that when I hear the words kernel debug, I stop asking any further questions. It means 100% of the time it will have to be done outside normal hours, preferably with console access available. I'm just glad your firewall did not hose completely... phew.
Andy
Me too! On reflection the situation could have been much worse, so I'm grateful for that at least haha! There's nothing like an almost fatal production outage to remind you of the fundamentals!
I remember one time a poor guy from R&D caused a firewall to crash because of a debug he gave a customer to run in production, saying "It's 100% safe"... yeahhh, not so much.
Generally TAC will always try to be helpful and give you the commands you need for the raised issue but they don't and can't know all the specifics of the environment, so it's also up to the customer to assess if it's OK to run a debug at 10:00AM when it's full production time. 😄
Bricking your FW can be a rite of passage for a CP engineer, like removing an access-list on a Cisco router before removing the access-group on the interface first - you need to do it at least once to never do it again. 🙂
It was probably an honest mistake, or the person simply did not know that a debug like that is resource intensive and could cause issues. In my experience, 9 times out of 10, TAC people will tell customers to always run kernel debugs off hours.
Andy
I definitely feel like a fully qualified CP engineer now then 😅 I've done the dreaded "switchport trunk allowed vlan x" before on a Cisco switch, missing the "add" keyword and removing all the previously allowed VLANs!
I actually hesitated to post that example; both the switchport allowed vlan x add and no access-list x seem to be obligatory Cisco experiences. 😛
I like that @Parabol ...FQCPE = Fully qualified Check Point engineer 🤣🤣
Andy
