Re: log behavior under attack

Wolfgang · ‎2021-07-06

Hello CheckMates,

How does logging behave when a gateway is under attack?

- How many logs can be written to the SMS?

- Is there any log entry when logs cannot be written (for example "lost xxxx messages" in /var/log/messages, or something else)?

I'm interested only in normal rulebase logs, not in anything configured for DoS protections such as "fwaccel dos pbox". Those types of blocked packets are usually either not logged or configured to create only one entry per second.

I understand that the behavior depends on the hardware of the gateway and the log server. But as an example, if an attacker sends 100,000 packets/s, normally 100,000 log entries per second should be seen on the SMS. I'm quite sure no log server can handle this 😉

Are there any limits on the gateway or the SMS on how many logs can be created?

Any experience with values from production?

Timothy_Hall · ‎2021-07-06

You have posed an interesting question, and the answer on the gateway side depends on a number of factors. Most logs are generated in the INSPECT/Worker instances and then shuttled to the fwd/fw_full process on the firewall, which handles the transport of logs to the SMS/Log Server via SIC on TCP/257.

Let's start on a firewall running in the traditional "kernel mode", where the INSPECT engines (Firewall Workers) are located in the kernel of the Gaia OS. In this case the Dispatcher and INSPECT instances are all running in the kernel and have the ultimate power of preemption over the CPU, so if they are very busy or under attack the fwd/fw_full process can be starved for CPU and not be able to transport logs in a timely fashion. This is normally indicated by a "FW-1: Log buffer is full FW-1: lost N log/trap messages" log entry in /var/log/messages, as the fwd/fw_full process could not empty the fw_log_bufsize buffer fast enough before it was overflowed by logs pouring into it from the kernel. This buffer size on the gateway can be increased which may help with short bursts of logs, but if there is a prolonged period of an extremely high number of logs, even the increased buffer can overflow. You can read more about increasing the buffer here:

sk52100 - /var/log/messages shows 'log buffer is full'
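To answer the "are there any log entries if logs could not be written" part in a practical way, here is a minimal Python sketch that adds up the lost-message counts from syslog lines. The regular expression is an assumption based only on the message text quoted above ("FW-1: lost N log/trap messages"), and the sample lines are made up for illustration; the exact wording and prefix on a real gateway may differ.

```python
import re

# Assumed message shape, taken from the text quoted in this thread.
# The real wording in /var/log/messages may differ between versions.
LOST_RE = re.compile(r"FW-1: lost (\d+) log/trap messages")


def count_lost_logs(lines):
    """Return (matching_lines, total_lost) for an iterable of syslog lines."""
    hits = 0
    total = 0
    for line in lines:
        match = LOST_RE.search(line)
        if match:
            hits += 1
            total += int(match.group(1))
    return hits, total


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "Jul  6 10:00:01 gw kernel: FW-1: Log buffer is full",
        "Jul  6 10:00:01 gw kernel: FW-1: lost 250 log/trap messages",
        "Jul  6 10:00:02 gw sshd[123]: Accepted password for admin",
        "Jul  6 10:00:05 gw kernel: FW-1: lost 1200 log/trap messages",
    ]
    print(count_lost_logs(sample))  # (2, 1450)
```

On a real system you would pass it an open file handle for /var/log/messages instead of the sample list.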

The newer firewalls (like Quantum) run in User Space Firewall (USFW) mode, where the INSPECT instances are implemented in process space as fwk processes (SecureXL/sim is still located in kernel space). In this case the fwd/fw_full process is on more of an equal footing with the INSPECT instances as far as CPU access since they are both processes, so log buffer overflows are a bit less likely to happen.

However on systems with more than 20 total cores and using USFW, fwd is automatically given its own dedicated CPU core to ensure that it has enough CPU to accomplish its mission. I first noticed this on a 28-core Quantum firewall that had a default split of 4/23 which made no sense. Upon investigation it was revealed that one of the cores that would otherwise be assigned as an INSPECT instance was affined to fwd/fw_full by default.
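The arithmetic behind that observation is trivial, but for completeness here it is as a tiny Python sketch. It assumes "4/23" means 4 SND/Dispatcher cores and 23 INSPECT instances, and that whatever is left over is what went to fwd; the function name is made up for illustration.

```python
def leftover_cores(total_cores, snd_cores, worker_cores):
    """Cores not accounted for by the SND/worker split."""
    return total_cores - snd_cores - worker_cores


# 28-core box with a default 4/23 split: one core is unaccounted for,
# which matches a single core being affined to fwd/fw_full.
print(leftover_cores(28, 4, 23))  # 1
```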

So after that long-winded explanation, on the gateway it really comes down to the fwd/fw_full process and its ability to access CPU resources in a timely fashion. This process is rather old and I believe is single-threaded; the log transport mechanism for Check Point gateways has really not changed much over the years.
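The burst-versus-sustained behavior of the log buffer can be illustrated with a toy model. This is only a sketch of a generic fixed-size buffer with one consumer, not Check Point's actual implementation; the tick-based timing, the numbers, and the function name are all assumptions for illustration.

```python
def simulate_log_buffer(arrivals, drain_per_tick, capacity):
    """Toy model of a fixed-size log buffer drained by a single consumer.

    arrivals:       logs generated in each tick (list of ints)
    drain_per_tick: most logs the consumer can remove per tick
    capacity:       buffer size in logs
    Returns (lost, still_queued).
    """
    queued = 0
    lost = 0
    for incoming in arrivals:
        accepted = min(incoming, capacity - queued)
        lost += incoming - accepted      # overflow is dropped
        queued += accepted
        queued -= min(queued, drain_per_tick)
    return lost, queued


# A short burst fits in the buffer and nothing is lost.
print(simulate_log_buffer([500, 3000, 500, 500], 1000, 4000))  # (0, 1000)

# A sustained overload loses logs once the buffer fills.
print(simulate_log_buffer([3000] * 10, 1000, 4000))            # (17000, 3000)

# Doubling the buffer only delays the overflow under sustained load.
print(simulate_log_buffer([3000] * 10, 1000, 8000))            # (13000, 7000)
```

The last two calls show why a bigger buffer helps with bursts but not with a prolonged flood: once arrivals stay above the drain rate, losses per tick settle at the difference between the two.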

The SMS side also uses the fwd process to receive the gateway logs and write them to disk; the issue here does not tend to be CPU but the speed of the disk I/O path. If the disk is overwhelmed (especially in VMware or cloud environments), this can lead to sizable delays in getting the logs written to disk, and even SOLR indexing issues that keep new logs from appearing in SmartConsole in a timely fashion.

Attend my 60-minute "Be your Own TAC: Part Deux" Presentation
Exclusively at CPX 2025 Las Vegas Tuesday Feb 25th @ 1:00pm

PhoneBoy · ‎2021-07-06

One of the items in the R81.10 release notes is:

Enhancements to logging services stability.

Which makes me wonder what changes happened under the hood related to this.

Timothy_Hall · ‎2021-07-07

Well I've seen that exact statement in release notes many times over the years, but it still seems like I run into scenarios where logging stops either on a Log Server or on a specific gateway, and restarting fwd is the only thing that fixes it. This is even mentioned explicitly in the CCTA courseware. Kind of wish Check Point would reimplement fwd as a multi-threaded process, or at least separate the log handling functionality of fwd into a separate daemon as fwd has a lot of other responsibilities on a gateway, such as being the parent process of security server daemons. Something like the ongoing replacement of the fwm process on SMS/MDS's with cpm among others.

Miri_Ofir · ‎2021-07-10

In R81.10 we added several code changes to improve the stability of logging processes.

1. The fwd process is more resilient and no longer terminates when it receives bad input. R&D is currently working on reimplementing fwd as multiple processes and separating the logging logic from the main fwd process, which is exactly what @Timothy_Hall is wishing for. I hope this will be available in a future release.

2. We added logic to prioritize indexing of online logs while limiting the indexing of offline logs. This logic reduces the impact of sending a huge number of logs to the Log Server. @Wolfgang - I hope this answers your question.

On our largest machines (Smart-1 6000-XL) we support a sustained indexing rate of 40K logs/sec. A higher log rate creates a backlog, which is processed at a rate of 1K logs/sec (configurable).

3. We added periodic validation that newly created Log Servers and Gateways are synced across Log Servers, to ensure visibility of logs on all machines (this removes the need to run a full dbsync in some cases).
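To put the numbers from item 2 into perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a simple model that is not confirmed in this thread: everything above the sustained indexing rate goes to the backlog, and the backlog is then worked off at the stated backlog rate. The function names and the 60-second burst are made up for illustration.

```python
def backlog_after_burst(log_rate, burst_seconds, index_rate=40_000):
    """Logs left un-indexed after a burst (assumed simple model)."""
    excess_per_second = max(0, log_rate - index_rate)
    return excess_per_second * burst_seconds


def drain_seconds(backlog, backlog_rate=1_000):
    """Time to work off the backlog at the backlog indexing rate."""
    return backlog / backlog_rate


# Example: 100,000 logs/sec for 60 seconds against a 40K logs/sec indexer.
backlog = backlog_after_burst(100_000, 60)
print(backlog)                 # 3600000
print(drain_seconds(backlog))  # 3600.0 seconds, i.e. about one hour
```

Under these assumptions a one-minute flood at Wolfgang's example rate would take roughly an hour to show up completely in the index, unless the backlog rate is raised.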

Vladimir · ‎2021-07-07

You can limit the logging rate for DoS traffic using fwaccel dos config set --notif-rate XX (See sk112454)
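For anyone wondering what a notification rate limit means in practice, here is a generic fixed-window limiter sketched in Python. It is purely illustrative and makes no claim about how fwaccel implements its limit; the class and attribute names are hypothetical.

```python
class NotificationLimiter:
    """Allow at most max_per_second notifications in each one-second window."""

    def __init__(self, max_per_second):
        self.max_per_second = max_per_second
        self.window = None      # current one-second window (integer seconds)
        self.sent = 0           # notifications emitted in the current window
        self.suppressed = 0     # notifications dropped overall

    def allow(self, timestamp):
        """Return True if an event at `timestamp` (seconds) may be logged."""
        window = int(timestamp)
        if window != self.window:
            self.window = window
            self.sent = 0
        if self.sent < self.max_per_second:
            self.sent += 1
            return True
        self.suppressed += 1
        return False


limiter = NotificationLimiter(max_per_second=1)
results = [limiter.allow(t) for t in (0.0, 0.2, 0.9, 1.1, 1.5, 2.0)]
print(results)             # [True, False, False, True, False, True]
print(limiter.suppressed)  # 3
```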

I have to mention that I really dislike the options of using samp or fwaccel rules because of their lack of visibility.

If there were a flag in the policy view indicating that they are present, they would be much more useful.

And since the term "single pane" is being used in the marketing, I'd like to see those in that pane as well.

P.S. I believe you can use a SmartEvent policy to handle DoS. It will create SAM rules, and you can choose whether or not to log them. I do not remember whether the logging rate can be defined there.
