Timothy_Hall
Champion

Content Awareness Overflowing /var/log/jail and /var/log/dlp Directories

I've now encountered two different clients that are running out of disk space in /var/log on their gateways, yet no monster files turn up for cleanup with something like find /var/log -type f -size +1000000 -ls.  It turns out both are using Content Awareness, which is leaving tens if not hundreds of thousands of orphaned files under /var/log/dlp and /var/log/jail.  There are so many files that I can't even list them or get a du -s total of the disk space used in those directories; the sheer number of files hangs any command I run against those gigantic directories.  One client is on R81 and the other is on R81.10, and both are about 9 months behind on their JHFAs.  This looks similar to these apparently "rare" issues that are fixed in various Jumbo HFAs:

PRJ-30445, PRHF-17552 (Threat Prevention): In a rare scenario, the DLP process leaves open unused file descriptors in the $FWDIR/tmp/dlp directory which may take up a large amount of disk space.

PRJ-34645, PRHF-21416 (DLP): In a rare scenario, the DLP process may not delete temporary files used for scanning.

 

My R81 client installed Jumbo HFA Take 79 in the hope that the fix below would automatically clean up the mess:

 

PRJ-30606, PRHF-18893 (DLP): UPDATE: Added temporary files cleaner for file converting operation.

 

However, it has been 24 hours and the hundreds of thousands of files are still there.  Can someone please tell me how to manually invoke the file cleaner mentioned above, or provide another cleanup procedure?
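
For anyone trying to size directories like these, an unsorted count can sometimes finish where a plain ls or du -s appears to hang, because it avoids sorting the whole listing. The following is only a sketch: the function names count_files and count_entries are hypothetical helpers, and nothing here is confirmed to complete on the gateways described above.

```shell
#!/bin/bash
# Hypothetical helpers for sizing very large directories. Read-only.

# count_files DIR: number of regular files under DIR, without crossing
# onto other filesystems. find streams results, so nothing is sorted.
count_files() {
    find "$1" -xdev -type f | wc -l
}

# count_entries DIR: raw entry count of DIR itself. With GNU ls, -f
# disables sorting and also lists . and .. in the total.
count_entries() {
    ls -f "$1" | wc -l
}

# Example invocations (paths taken from the post):
#   df -i /var/log                # inode usage of the filesystem
#   count_entries /var/log/dlp
#   count_files /var/log/jail
```

Checking df -i alongside df -h is worthwhile here, since hundreds of thousands of small files can exhaust inodes before they exhaust blocks.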

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
6 Replies
the_rock
Legend

Hey Tim,

I worked with a customer last summer who had the same problem (though with not nearly as many files as your clients). The way we solved it with TAC was to uncheck the Content Awareness blade (leaving all rules in place), reboot the firewall, re-check the blade, and reboot again. Ever since then it has been very stable, with no issues.

I wish we had gotten the actual reason why this was happening in the first place, but that was never provided.

Not saying it would work for your clients, but that's what fixed it for us.

HeikoAnkenbrand
Champion

The scrubd (Threat Extraction daemon) also creates orphaned files under /var/log/jail.
I have also had problems with this in some customers' environments.
In newer R8x versions, the scrub_cleanup script runs every day at a specific time and deletes all files in a zombie state.
Maybe this script can help you clean up the directory.

➜ CCSM Elite, CCME, CCTE
Timothy_Hall
Champion

Just to follow up on my original post: both clients had this issue.  In both cases, disabling Content Awareness, reinstalling policy, rebooting, and re-enabling it did seem to clean up the files under /var/log/jail (which is strange, as I thought only Threat Extraction used that directory).  However, there were still thousands/millions of files in /var/log/dlp/ftp, which we were able to clean up manually with the following procedure.  The files do not seem to be returning after the cleanup, as both customers are running a relatively recent JHFA.

THE FOLLOWING COMMAND IS DANGEROUS AND IF YOU MAKE A TYPO IT CAN DESTROY THE ENTIRE FIREWALL!  TAKE A SNAPSHOT, THEN TAKE A BACKUP OF THE FIREWALL AND EXPORT THE BACKUP VIA BROWSER, AND THEN ENSURE THE BACKUP FILE CAN BE SUCCESSFULLY BROWSED WITH 7-ZIP OR SIMILAR!

1) On the standby member of the cluster, run this command: 

find  /var/log/dlp/ftp/*  -mtime  +5  -exec  rm  -rf  {}  \;

VERIFY THIS COMMAND HAS NO TYPOS BEFORE RUNNING IT!  USE AT YOUR OWN RISK!

2) At one client site this took over three hours to complete; at the other it took only about 20 minutes.  You may see a few error messages about trying to remove directories that do not exist; these can be ignored (find is trying to descend into directories that rm -rf has just removed).  Once it completes, reboot the standby.

3) When the standby comes back up, make sure that it rejoins the cluster as STANDBY (not DOWN/PROBLEM).  I also poked around in $FWDIR/log/dlpu.elg looking for errors and ran cpwd_admin list to ensure the dlp* daemons were stable and not experiencing issues.

4) Fail over to the cleaned member, and verify basic functionality via manual tests or observing your Network Monitoring System.

5) Once validation has been achieved, repeat the above steps on the other firewall and fail back onto it and retest.  You may need to set ClusterXL to "maintain active member" ahead of time to ensure that a failover does not happen until you are ready.
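
Given how dangerous a typo in step 1 is, a read-only preview of what would be matched may be worth running first. The sketch below is not the procedure that was run at the client sites: preview_old and purge_old are hypothetical helper names, -mindepth 1 stands in for the shell glob so the shell never has to expand a huge argument list, and purge_old removes only regular files (GNU find's -delete), so unlike rm -rf it leaves directories and any newer files inside them in place.

```shell
#!/bin/bash
# Hypothetical helpers; assumes GNU find (for -mindepth and -delete).

# preview_old DIR DAYS: count entries under DIR older than DAYS days.
# Read-only; nothing is deleted.
preview_old() {
    find "$1" -mindepth 1 -mtime +"$2" | wc -l
}

# purge_old DIR DAYS: delete regular files older than DAYS days under
# DIR, staying on one filesystem. Refuses to run if DIR is not an
# existing directory, which guards against an empty or mistyped path.
purge_old() {
    [ -d "$1" ] || { echo "not a directory: $1" >&2; return 1; }
    find "$1" -mindepth 1 -xdev -type f -mtime +"$2" -delete
}

# Example invocations (path and age taken from step 1):
#   preview_old /var/log/dlp/ftp 5
#   purge_old   /var/log/dlp/ftp 5
```

Comparing the preview_old count before and after gives a quick check that the purge did what was expected. The same snapshot and backup precautions apply to this variant.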

abihsot__
Advisor

I also ran into this situation, with 200k files there (the ftp folder specifically), although on R80.40. Thank you for the post.

Timothy_Hall
Champion

Just to follow up, the customer needed a special hotfix to resolve this.  The hotfix ensures that the in.msd (commtouch) daemon is started even if the Anti-Spam & Email Security blade is not enabled (and very few sites use that blade, in my experience).  Apparently the automatic cleanup routine for these directories is implemented by this daemon, but by default the daemon is not started unless the Anti-Spam & Email Security blade is enabled.
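
A quick way to see whether a gateway might be in the same state is to check whether a process named in.msd exists at all. This is only a sketch using standard ps and grep; daemon_running is a hypothetical helper name, and the assumption that the daemon's process name is exactly in.msd comes from the daemon name given above, not from verified output on a gateway.

```shell
#!/bin/bash
# daemon_running NAME: succeed if some process has exactly this command
# name. -F treats NAME as a literal string, -x requires a whole-line match.
daemon_running() {
    ps -e -o comm= | grep -qxF -- "$1"
}

# Example invocation (daemon name taken from the post):
#   if daemon_running in.msd; then
#       echo "in.msd is running"
#   else
#       echo "in.msd is NOT running"
#   fi
```

If the daemon is absent and the directories keep growing, that matches the situation described here; the actual fix was a hotfix obtained from the vendor, not a manual start of the daemon.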

abihsot__
Advisor

Thanks for the heads-up. We do not use the Anti-Spam & Email Security blade you mentioned. But first things first - journey to R81.10 🙂

