Most Frequent Errors in Checkpoint Firewall Admini...

Yuri_Slobodyany · ‎2017-09-12

In 10 years of daily work with Checkpoint firewalls I have been through many troubleshooting sessions. Different versions, topologies and technologies brought different issues with them, but what remains constant is the mistakes people make. Here is my ‘top’ list of the frequent errors that beginners (and not only beginners) make. The list relates to firewall versions R55-R77.30, with remediation notes for R80 added thanks to Tomer Sole's remarks on the CheckMates CheckPoint Community forum, as I haven’t gathered enough experience with that version myself yet. Let’s begin.
Note: This is the same article I posted on my LinkedIn profile, so if you read it there you won't find anything new here. I am posting it here for the benefit of the community.

Deletion of an object which is being used (cannot happen in R80 by design).

This happens especially with sysadmins coming from the Windows world, where warnings are frequently ignored and you just click through them. In my experience this one can be the most dangerous (read below). Checkpoint allows admins to delete an object that is used in Security Rules, while warning about the consequences. Unfortunately, not everyone reads and follows the warning. My recommendation when you get such a warning is NOT to delete the object, but to check all the Rules where it is used, remove it from them with due diligence, and only then delete it. The warning message has a button named “Where used?” which, if pressed, shows all the places this object is part of. The consequence of deleting such an object is that, wherever it was used, the system automatically replaces it with the object “Any”, which in most cases is not what the admin wants.

A real-life case illustrates the importance. I got a call from a client complaining about an excruciatingly slow Internet connection for the whole company: slow page load times, mail stuck in the queue, high pings. After a few short checks it was clear that the company's line to the Internet was 100% loaded. The situation didn’t get better even when the client disconnected his LAN completely. A few more checks excluded the possibility of a DoS attack, and by all signs the huge traffic in the outbound (upload) direction was initiated by the firewall itself.

The SmartView Tracker logs showed lots of outbound SSH connections from the firewall to different IP addresses on the Internet. It became clear what had happened when I logged in to the firewall via SSH and listed the admin folder: there were Bash scripts like bruter.sh and uploader.sh, and various large files named after movies popular at the time. The firewall was being used as a Linux host for SSH brute forcing and as storage for pirated movies. To understand how it happened I looked back in the SmartView Tracker Audit tab (in newer versions it is called Management) and saw the following chain of events:

1) there was a Security Rule for admin access to this standalone firewall (the names and IPs are fictitious and have nothing to do with the real case):

Where Vova_PC is the object representing the PC of the previous admin who left the company, and the Corporate-gw/management objects represent the firewall itself.

2) Then a new admin came along and decided to do a ‘clean-up’ of the firewall objects, deleting this Vova_PC object as well. Sure, he got a warning like this one, but decided to ignore it:

3) After installing the Policy (seemingly) nothing bad happened except that the aforementioned rule turned into:

Thus allowing SSH/HTTPS/CPMI access from any IP address. It didn’t help that the default OS user admin had an easy-to-guess password of the qwe123/q1w2e3 kind. According to the logs, the firewall was broken into about an hour after this Security Policy was installed. They were ‘lucky’ that the intruders just used the firewall as a Linux server and didn’t expand their presence into the LAN.

Using Dynamic Object in Security Rules to block access to web sites.

URL filtering has been available in Checkpoint for quite a long time, but it requires the appropriate license. Some resourceful admins decide to get the same functionality for free. The SmartConsole has a network object tree called Dynamic Objects which seems to be exactly what is needed. Let’s say the admin wants to block access to Facebook. When he stumbles on that tree in the Dashboard, right-clicks and picks Create New Dynamic object, the Name field looks just right. So he goes on and creates such an object:

Which he then uses in a Security Rule like this:

So far so good. Next he installs this policy and … usually the network goes down and he loses administrative access to the firewall (at least on a standalone installation). What happens is that a Dynamic Object is meant to be a ‘logical placeholder’ in the Rule Base, to be used only after it has been defined on the firewall command line (in Expert mode) with the command dynamic_objects. That definition has to include the object's IP address as well. In our case, creating the object only in SmartDashboard as illustrated leaves it undefined, which in my real-life experience causes 100% CPU load on the firewall.
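For reference, here is a minimal sketch of what "defining" a Dynamic Object on the gateway looks like. The function name, the object name and the addresses are hypothetical, and the flags (-n to create the object, -r for an address range, -a to add it, -l to list) are written as I recall them from the Check Point CLI reference, so verify them against the documentation for your version before use.

```shell
# Define a Dynamic Object on the gateway and give it one address range.
# Assumes Expert mode on the gateway, where dynamic_objects is available.
# Usage: define_dynamic_object <name> <first_ip> <last_ip>
define_dynamic_object() {
    if [ "$#" -ne 3 ]; then
        echo "usage: define_dynamic_object name first_ip last_ip" >&2
        return 2
    fi
    # Create the object and attach the range in one call.
    dynamic_objects -n "$1" -r "$2" "$3" -a || return 1
    # List all defined objects so the result can be checked by eye.
    dynamic_objects -l
}
```

A call such as define_dynamic_object Blocked_Hosts 192.0.2.10 192.0.2.20 would then resolve the placeholder to those addresses. Note that this only resolves the object to fixed addresses; it does not turn it into URL filtering.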

The recommendation in this case is not to use a feature unless you fully understand what it does. A variation of this problem is using a Domain Object, which performs name resolution and may overload the firewall CPU as well. According to Checkpoint folks, the Domain Object was improved in R80 and now behaves better.

Not realizing that Initial Policy on firewall install leaves HTTPS/SSH open from Any.

This one I attribute to the Checkpoint documentation, which is vague about what happens after the First Time Configuration Wizard (FTW) finishes and before the first security policy is created and installed. The documentation says that the preconfigured Initial Policy is automatically applied and “These rules forbid most communication yet allows the communication needed for the installation of the security policy”. Nowhere else does it say that this policy actually opens HTTPS/SSH access from ANY IP address! So right after finishing the FTW and before installing a new security policy, the firewall is open to brute force attacks via SSH/HTTPS, and the only thing standing in the attackers' way is the password you set for the default admin Gaia user during the install. This is the second reason to have strong passwords for every administrative user on the firewall. The best practice is to change the default admin user to something else during the install (although it depends on the Checkpoint version; R77.30, for example, does not offer such an option). The other moral is to be very careful when doing the install remotely while the newly installed firewall is connected to the Internet.

Installing Security Policy on the wrong firewall.

In a Distributed setup one management server can hold many Policy Packages for different firewalls. When you log out of the SmartConsole with some specific Policy open, the next user logging in will see that Policy, while possibly intending to install a completely different one on a different firewall. Human errors happen, and an inattentive admin can install the already open Policy on the wrong firewall. This usually causes downtime for that firewall. You can easily fix it by installing the correct Policy, if you still have the ability to push to it. The ounce of prevention is in the Policy Installation Targets menu:

There you pick the firewall this policy will be installed on:

With the installation target set, the policy can no longer be pushed to the wrong firewall by accident. In R80 you can also open the SmartConsole.exe.config file and change the value in <add key="OverridePolicyWarningEnable" value="false"/> from false to true (the R80 tips are again from Tomer, thx).

Not checking available disk space of the firewall hard disk.

This is the first check I do before starting any debug of any issue – check available disk space. The catch here is that insufficient disk space may manifest in seemingly unrelated symptoms:

Impossible to install Security Policy getting general error “Could not install policy, error 0x123456 ..”
Impossible to open logs in SmartView Tracker, or logs are empty
Impossible to update IPS/URL Filtering/App Control – and again, the error message doesn’t mention the disk space at all.
Impossible to connect to the Management server with SmartDashboard
and the list goes on …

It is important not to forget that a Checkpoint firewall is, after all, a Linux-based server which needs free disk space for common tasks: downloading and extracting updates/archives from the Internet, processing and consolidating logs, decrypting/encrypting files, updating local files with new information. In addition, especially in earlier versions, every Hotfix/upgrade used to leave behind a lot of residual files that would slowly eat up the root partition. Newer hotfixes and upgrades are more careful and remove their leftovers. My subjective opinion: keep at least 1 GB of free space on the root partition “/” where the operating system is installed. The easiest way to assess the situation is to run df -h.
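If you prefer a scripted check over reading the df output by eye, something like the sketch below could be run from Expert mode or from cron. The function names are my own, and the 1 GB default threshold is just the rule of thumb mentioned above, not a vendor requirement.

```shell
# Print the free space, in KB, of the filesystem that holds the given path.
# -P forces POSIX output so each filesystem stays on a single line.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Warn when free space is below a threshold (default 1 GB = 1048576 KB).
# Returns 0 if there is enough space, 1 if not, 2 if df gave no answer.
check_free_space() {
    local path="$1" min_kb="${2:-1048576}" avail
    avail=$(free_kb "$path")
    [ -n "$avail" ] || { echo "ERROR: cannot read free space for $path"; return 2; }
    if [ "$avail" -lt "$min_kb" ]; then
        echo "WARNING: $path has only ${avail} KB free"
        return 1
    fi
    echo "OK: $path has ${avail} KB free"
}
```

Running check_free_space / checks the root partition; the same call with /var/log as the argument would cover the partition where logs usually live.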

Using simple passwords for administrative users.

This comes as a surprise, since we are talking about a firewall, a security device, but the reality is that I’ve seen plenty of firewalls where the OS-level user admin had a password like qwe123/q1w2e3/etc. First, about default users in general: Checkpoint has been a bit inconsistent here. Depending on the firewall version, changing the default admin user during installation was possible, possible and encouraged, or impossible. There is nothing magical about this admin user; we can assign the same permissions to a custom user and delete this one. The problem, though, is the human factor. It may happen for many reasons. For example, an integrator installing a new firewall for a client has to type this password a few times in a few places during the install, and to make it easier on himself he uses a simple password thinking “Never mind, I will change it after the install”, then of course forgets to do so. I’ve seen firewalls that were first installed with such crazily simple passwords back in version R55 and then upgraded for 10 years without the password ever being changed, out of fear that something bad might happen.

As I mentioned before, such an easy-to-brute-force password may have disastrous consequences. I remember once dealing with a firewall that had an easy-to-guess password for the admin user. The firewall was managed jointly by us and the client, and he refused, despite all my reasoning, to delete the admin username or even change the password. To still secure the access somehow, I had to come up with a creative solution to limit admin access by SSH at the OS level; you can read it here https://yurisk.info/2011/04/05/two-tips-to-secure-ssh-access-from-specific-ips-to-specific-users-in-... but it really is a workaround, so better to stick with the best practice and replace this admin user altogether, or at least set an insanely complex password.

Forgetting to disable acceleration (SecureXL) before starting to debug.

This has happened to me and even to Checkpoint Support (today they usually run debugs with a ready-made Bash script that disables SecureXL by default). SecureXL accelerates packet processing by bypassing the usual full-blown chain of firewall modules after the initial connection setup. It is more complex than that (read the Checkpoint documentation for the exact description), but the end result is that not every packet is seen by every module. The consequence is that if we run fw monitor on, say, accelerated TCP traffic, we will see just the session establishment and not the data traffic itself. A capture like that is worthless for debugging, so always make sure to disable SecureXL before running the debug and to re-enable it afterwards (fwaccel stat / fwaccel off / fwaccel on). This of course means that the load moves from the accelerator to the CPU, so be careful not to overload the firewall.
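Since the easy part to forget is turning acceleration back on, the stat/off/on sequence can be wrapped around the debug command. The following is only a sketch for Expert mode on the gateway: the function name is mine, and it re-enables SecureXL unconditionally, so it assumes acceleration was on before you started.

```shell
# Run a debug command with SecureXL off, then switch SecureXL back on.
# The body runs in a subshell so the EXIT trap fires even after Ctrl-C.
with_securexl_off() (
    fwaccel stat                 # show the state before touching it
    fwaccel off || exit 1        # stop acceleration; give up if that fails
    trap 'fwaccel on' EXIT       # re-enable acceleration however we leave
    trap 'exit 130' INT          # Ctrl-C ends the capture, then the EXIT trap runs
    "$@"                         # the debug command itself
)
```

A typical call would be with_securexl_off fw monitor -e 'accept host(192.0.2.10);' -o /var/log/debug.cap, where the host address and output file are placeholders. The function returns the exit status of the debug command.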

Not synchronizing firewall time via NTP.

It is really exasperating to work on a case, try to correlate information with the firewall logs, and then learn that the firewall clock is out of sync, which makes the logs useless. Time synchronization is best practice for any equipment, but for a firewall I see it as a must. The thing with time drift is that it is non-linear, so looking at the drift at any given moment does not allow us to extrapolate what the difference was, say, a week or a month ago. Checkpoint, being a Linux-based system, has had NTP client capabilities since forever, so make sure to enable it as soon as possible. In Gaia clish we can enable it easily:

firewall> set ntp server primary 13.13.13.1 version 2

firewall> set ntp server secondary 23.23.23.1 version 2

firewall> save config
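After enabling NTP it is worth confirming that the firewall really syncs. The sketch below assumes the standard ntpq tool is available in Expert mode (an assumption, check your version) and that its peer table has the usual column layout; the function name is hypothetical.

```shell
# Print the offset, in milliseconds, of the peer ntpd is currently synced to.
# Reads the output of `ntpq -pn` on stdin; the selected peer's line starts
# with '*' and the offset is the 9th column. Exits non-zero if no peer is selected.
selected_peer_offset() {
    awk '/^\*/ { print $9; found = 1 } END { exit !found }'
}
```

Used as ntpq -pn | selected_peer_offset, it prints a single number when a server has been selected and fails when the clock is not synchronized to any server yet.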

Not verifying saved system backups.

Checkpoint has many ways to back up its configuration: on the CLI or via the Gaia portal, saved locally or uploaded automatically to a server in the LAN. In addition, you can always back up with any tool of your own choosing. Naturally, the most important thing to back up is the Management server. If the Management server is a virtual machine it is easier: just take snapshots and you will be fine. My caution is about backups made with Checkpoint's built-in tools or with your own scripts/software: verify your backups regularly by going through the complete recovery procedure. This advice comes from a real case.

I had a client whose SmartCenter hard disk failed. The server became unbootable, complaining about hard disk errors. The client wasn’t too stressed, as he had an automatic weekly backup running on this SmartCenter server that uploaded the backup file to an FTP server. The client and his integrator installed a fresh SmartCenter server and were trying to import the backed-up configuration with the upgrade_import utility. The problem was that they kept getting an error that the file could not be opened. The worst part was that they tried all the backup files and none of them worked. I tried to open those tar archives myself and it didn’t work either: the files were damaged, and they were also suspiciously small. In the end the client brought in specialists who restored the SmartCenter directly from the damaged hard disk.

The point of this story is that Checkpoint is a Linux server, and like any server, to make a backup it needs to gather files, add them to a tar archive, compress the archive, transfer it, and probably do many more steps. At any of these stages something can go wrong: not enough space in the /tmp folder, a faulty FTP server where the backup is stored, a faulty disk driver. The only way to make sure the backup was successful is to try to restore the system from it.
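Between full restore tests, a cheap sanity check would at least have caught the damaged, suspiciously small files from the story. The sketch below assumes the backup is a gzip-compressed tar archive and uses only standard gzip and tar; the function name is mine. It does not replace a real restore test.

```shell
# Cheap integrity check for a gzip-compressed tar backup file.
# Returns 0 only if the file is non-empty, the gzip stream is intact,
# and tar can list every member of the archive.
verify_backup() {
    local f="$1"
    [ -s "$f" ] || { echo "FAIL: $f is missing or empty"; return 1; }
    gzip -t "$f" 2>/dev/null || { echo "FAIL: $f is not a valid gzip stream"; return 1; }
    tar -tzf "$f" >/dev/null 2>&1 || { echo "FAIL: $f is not a readable tar archive"; return 1; }
    echo "OK: $f"
}
```

Run it on the copy that landed on the backup server, not on the original file on the firewall, because the transfer is one of the steps that can break the archive.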

Using Reject instead of Drop in the Rule base.

“Reject” probably sounds better to the ear, so some folks at the very beginning of their administrator path might choose this option over the “Drop” action. The difference is that with “Reject” the firewall not only stops the connection from being established but also sends a notification about it to the initiator. This is unnecessary from both the security and the firewall resource saving standpoints. Just use the “Drop” action when needed.

Restarting the whole firewall when only the Management restart is needed.

Admins sometimes forget that the firewall and the management are different and independent software components. This is true even when both are installed on the same physical/virtual machine as a Standalone installation. So when the need arises to restart only the Management server (SmartCenter in older terminology), do not reboot the machine; restart the management with the following commands instead:

Shut down the management service:

cpwd_admin stop -name FWM -path "$FWDIR/bin/fw" -command "fw kill fwm"

Start it again:

cpwd_admin start -name FWM -path "$FWDIR/bin/fwm" -command "fwm"

Not using the “insurance” against configuration errors – Database Revision Control (it is now done automatically in R80).

This feature has been available from the very beginning, and still I see a lot of administrators who do not use it, at their own peril. It allows us to save all the firewall objects and the rule base, to be restored later to the saved state. This way, if some misconfiguration affects the firewall, it is just a matter of a few clicks and a policy install to return to the known good state. The database backup can be made either manually any time we want or automatically on each policy install. The only possible concern is the hard disk space each backup takes, but these backups do not take much space, and even if disk space is limited we can configure how many backups to keep. We can configure this option by going to Launch Menu -> File -> DataBase Revision Control:

Then in the window that opens set Create a new Database version upon Install Policy operation:

To restore the database click on Action -> Restore Version…

That’s it for today. I hope this helps you, in a practical way, to learn from others’ mistakes and not from your own. Keep your firewalls and yourself safe, arrivederci.

https://www.linkedin.com/in/yurislobodyanyuk/

Norbert_Bohusch · ‎2017-09-12

I have a comment on the first common error: there is another thing to always check before deleting unused objects, and that is Automatic NAT.

And this can happen even in R80.10!

You might have configured some hosts or small networks to access the Internet in your policy, while the hide-NAT is defined on a bigger network object that is not used in the policy. If you now delete the bigger network object, there will be no warning, but your NAT is no longer in place!
