Dynamic logs distribution

Simon_Macpherso · ‎2022-03-01

Can someone please outline how Dynamic logs distribution is configured in R81.10? The documentation on this feature is limited and a bit unclear.

My understanding is that by enabling this feature I could, as an example, configure 2 primary servers and 2 backup servers. The gateway would distribute the logs it sends between the primary servers, i.e. not send a copy of each log to each server, but send logs to one server, which are then distributed equally between all of the configured primary servers. In the unlikely event that both primary log servers become unavailable, it would follow the same methodology on the backup servers.

If this summary is correct, I'm assuming that if one of the primary log servers fails, all load is then assumed by the remaining primary server, i.e. the logs aren't distributed to one of the configured backup servers.

Also if logs are sent to one primary server then distributed to the other primary servers, which server is selected as the server logs are sent to? Can a specific server be selected i.e. prioritized, for this role?

-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

Below is the description from the R81.10 logging admin guide.

"With Dynamic Log Distribution, you can configure the gateway to distribute logs between the active Log Servers. Previously, each Log Server received a copy of every log. If one Log Server was disconnected, the gateway connected to the backup server and sent it a copy of every log. Now you can configure that each log is sent to only one Log Server and distribute the logs between the primary Log Servers. If all the primary servers are disconnected, logs are distributed between backup Log Servers. If no Log Servers are connected, the gateway writes the logs locally."

-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

I currently have one dedicated log server configured as primary, and one backup log server, which is the management server.

I will soon be introducing a second dedicated log server.

I want to configure both servers as primary to efficiently utilize resources.

My initial thought was to distribute logs between the log servers by pointing each gateway to a specific log server, i.e. split the gateways 50/50 between the servers, and configure the non-primary server as the backup.

Though if dynamic logs distribution works the way I've outlined above, it would allow me to efficiently utilize the resources allocated to the dedicated servers. In this scenario I would configure both dedicated log servers as primary, and both management servers (active and standby) as backup.

Log indexing will continue to be enabled on all servers.

Simon_Macpherso · ‎2022-03-01

Further to this, I have log exporter configured on the current primary log server.

If dynamic log distribution is configured, would a second log exporter also need to be configured on the other new primary server?

Or, as gateways send logs to one server and then distribute, does the log exporter process logs before distributing the logs to the other primary log servers? If so, we would enable log exporter on the new log server, which should be able to handle the exporter and filtering process as it will have much higher resource capacity (Proliant DL360 Gen 10, Intel Xeon Gold 6258R CPU 2700.000 Mhz, 56 cores, 192GB memory).

The log exporter would be disabled on the existing log server.

Tomer_Noy · ‎2022-03-02

Hi Simon,

These are very good questions 🙂
I'll try to answer as best I can.

Dynamic logs distribution is a relatively new feature in R81.10. Both the Management and the gateways need to be at least on version R81.10 for this to be supported.

I can confirm your understanding that logs will be distributed among all primary log servers. If all primary log servers are unavailable, then the gateway will switch to send to the backup log servers. If a single primary log server is unavailable, then all logs will be distributed between the remaining primary log servers.
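
The failover order described above can be sketched roughly as follows. This is only an illustration of the selection logic, not the gateway's actual implementation; the `LogServer` class, the `active_targets` function, and the server names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LogServer:
    name: str
    role: str        # "primary" or "backup"
    connected: bool


def active_targets(servers):
    """Return the names of the servers that should currently receive logs.

    Logs go to all connected primaries; only if no primary is connected
    do they go to the connected backups; if nothing is connected, the
    gateway writes logs locally (represented here by the string "local").
    """
    for role in ("primary", "backup"):
        up = [s.name for s in servers if s.role == role and s.connected]
        if up:
            return up
    return ["local"]


servers = [
    LogServer("log1", "primary", True),
    LogServer("log2", "primary", False),   # one primary is down
    LogServer("mgmt1", "backup", True),
    LogServer("mgmt2", "backup", True),
]
print(active_targets(servers))  # only log1; the backups stay idle
```

Note that in this sketch losing one primary does not bring a backup into play; the backups are used only when every primary is unreachable.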

Actually, the logs are not sent to a single log server, then distributed among the others. The gateway has connections to each primary log server and will distribute the logs itself, according to how much each log server can accept. This lowers the risk of failure if a "main" log server goes down. Because of this, each log server is independent. If you are using Log Exporter, you need to define it on each log server. You can also use the new option to configure it in the SmartConsole UI, if it simplifies it for you.

It is good practice to have enough capacity in your primary log servers to handle the fault resilience that you need to support. For example, if each of your log servers can handle 10K logs per second, you are sending 15K logs per second and you want to cope with losing 1 log server, I would recommend distributing between 3 primary log servers. That will allow you to handle the loss of 1 server and still accept all logs with the remaining primary log servers. When all are up, you still enjoy better performance for queries (which are also distributed) and you can better handle temporary log peaks without indexing backlog.
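
The arithmetic behind that example can be written down as a small helper. This is a back-of-the-envelope sketch, assuming every primary has the same sustained capacity; the function name `primaries_needed` is made up for illustration.

```python
import math


def primaries_needed(total_rate, per_server_rate, tolerated_failures=1):
    """Number of primary log servers needed to carry total_rate (logs/sec)
    even after tolerated_failures servers are lost."""
    needed_for_load = math.ceil(total_rate / per_server_rate)
    return needed_for_load + tolerated_failures


# 15K logs/sec, 10K logs/sec per server, survive the loss of 1 server
print(primaries_needed(15_000, 10_000, tolerated_failures=1))  # 3
```

Two servers are needed just to carry 15K logs per second, and the third is the spare that keeps the pool above that rate when one server is down.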

In many cases, it's better to have more primary log servers than some as backup log servers. The dynamic nature of the new solution copes with both load sharing and resilience.

Conceptually, you can view the backup log servers as your "backup site". If your entire main site (with the primary log servers) goes down, then you fail over to the backup site and the log servers there will serve as the backup. Of course, they don't actually have to be in different geographies, but this is a good use-case for it.

I hope the above helps to clarify.

Since this is a relatively new feature, I would be glad to get your input on your experience (either in this post, or private message).

Simon_Macpherso · ‎2022-03-03

Hi @Tomer_Noy

Thanks for your response 🙂

I have some questions.

What defines the "main" log server in the pool of configured primary servers? By main are you referring to any primary server?
As each primary server does not receive a copy of all logs and is effectively independent of the other primary servers, does that mean that logs and related indexes on a failed primary server will not be available to query from SmartConsole?

Fault resilience is something we need to take into consideration if we eventually enable dynamic logs distribution.

Our current primary log server is under-resourced (ProLiant DL380 Gen9, Intel Xeon CPU E5-2643 v3, 3396.274Mhz, 12 cores, 64 GB) for our logging rate (avg. 18K), so if we did add it as a primary server and ever needed to fail over to it, it would struggle. We'll either need to upgrade it or consider adding a 3rd primary server to the pool.

By which method do the gateways determine how much each log server can accept? Is this based on a query of server resource metrics of each log server?

Is there a guideline specific to log server sizing I can reference to accurately size a dedicated log server based on current logging rate (similar to cpsizeme for gateways)?

Regards,

Simon

Tomer_Noy · ‎2022-03-06

In our design there is no "main" server between the primary log servers. All are equal and will continue to function even if any of them goes down. My reference to "main" was to illustrate the benefit of this design, over an alternative that does have a "main" and single point of failure.

Indeed, if a log server fails, you will not be able to query the logs that have been stored on it. That is the difference between using log distribution versus the older default of sending multiple copies of logs to all log servers.

The gateway dynamically decides how many logs to send to each server. Instead of querying the server for load, we manage a queue for each log server and monitor how quickly logs are accepted and cleared from the queues. Future logs are balanced so that we send more to the queues that are less full.
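
To make the queue idea concrete, here is a toy simulation. It is only a sketch of "send the next log to the least-full queue" and makes no claim about how the gateway actually implements it; the `QueueBalancer` class, the `simulate` function, the server names, and the drain rates are all assumptions for illustration.

```python
from collections import deque


class QueueBalancer:
    """Keeps one outgoing queue per log server and always enqueues the
    next log on the queue that currently holds the fewest logs."""

    def __init__(self, server_names):
        self.queues = {name: deque() for name in server_names}

    def send(self, log):
        # Least-full queue wins; ties go to the first server listed.
        name = min(self.queues, key=lambda n: len(self.queues[n]))
        self.queues[name].append(log)
        return name

    def acknowledge(self, name, count=1):
        # The server accepted up to `count` logs, so clear them from its queue.
        queue = self.queues[name]
        for _ in range(min(count, len(queue))):
            queue.popleft()


def simulate(ticks=100, logs_per_tick=4):
    """One fast and one slow server: count how many logs each receives."""
    drain_per_tick = {"fast": 3, "slow": 1}   # assumed acceptance rates
    balancer = QueueBalancer(list(drain_per_tick))
    sent = {name: 0 for name in drain_per_tick}
    for _ in range(ticks):
        for i in range(logs_per_tick):
            sent[balancer.send(i)] += 1
        for name, rate in drain_per_tick.items():
            balancer.acknowledge(name, rate)
    return sent


print(simulate())
```

In this sketch the server that clears its queue faster ends up receiving the larger share of the logs, without the sender ever asking either server about its CPU or memory load.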

Regarding sizing, the easiest way to size is to compare with our official Smart-1 Appliances. You can see how many cores and memory they have, and the sustained indexed log rate that they support.
Make sure to take into account the software version that you use. The 5050/5150's were tested with R80.x while the newer 6000-L and 6000-XL were tested with R81.x which increases the capacity.

Simon_Macpherso · ‎2022-03-07

Thanks @Tomer_Noy

I like the idea conceptually, with the exception of not being able to query some logs if a primary server fails.

If the failed server is still accessible, would it still be possible to copy the log files and pointers (e.g. $FWDIR/log/*.log*) from the failed server and import them into one of the remaining primary servers? If so, that is better, though not ideal.

Tomer_Noy · ‎2022-03-07

Yes, there is a procedure for importing log files. You can see some info here:
https://sc1.checkpoint.com/documents/R81/WebAdminGuides/EN/CP_R81_LoggingAndMonitoring_AdminGuide/To...

In case of log server failure, you can install a new log server to replace it and copy all files into it. If you want the logs to be available in searches and reports, you may need to set the "days-to-index" parameter to index them all. If not, you can still open them by selecting specific log files.

Alternatively, you can also import them into an existing log server, but note that indexing can take a lot of CPU compute and will add load to that server during the re-indexing.

If in your case having a "hot" backup for the log files is necessary, it might be that duplicating logs to every log server is more appropriate than log distribution. It's a bit like the difference in storage between RAID0 and RAID1: you need to choose between resiliency and performance.
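
The trade-off can be illustrated with a toy model of where each log ends up. This is a sketch only: round-robin placement stands in for the dynamic balancing, and the function names and server names are hypothetical.

```python
def place_logs(log_ids, servers, mode):
    """Return {server: set of log ids stored on it}.

    mode="duplicate":  every server gets a copy of every log (RAID1-like).
    mode="distribute": each log is stored on exactly one server (RAID0-like);
                       round-robin is a stand-in for the dynamic balancing.
    """
    stored = {server: set() for server in servers}
    for i, log_id in enumerate(log_ids):
        if mode == "duplicate":
            for server in servers:
                stored[server].add(log_id)
        elif mode == "distribute":
            stored[servers[i % len(servers)]].add(log_id)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return stored


def queryable_after_failure(stored, failed_server):
    """Log ids still reachable on the surviving servers."""
    remaining = set()
    for server, logs in stored.items():
        if server != failed_server:
            remaining |= logs
    return remaining


logs = range(10)
for mode in ("duplicate", "distribute"):
    stored = place_logs(logs, ["A", "B"], mode)
    per_server = {server: len(ids) for server, ids in stored.items()}
    survivors = queryable_after_failure(stored, failed_server="A")
    print(mode, per_server, len(survivors))
```

With duplication each server stores and indexes every log, and all of them stay queryable after a failure. With distribution each server handles only part of the load, but the logs held on the failed server are unavailable until they are recovered or imported elsewhere.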

Simon_Macpherso · ‎2022-03-09

Thanks for all your input here @Tomer_Noy

Martin_Raska · ‎2022-06-20

Hi Tomer,

One question from me. Here is an example setup: one SMS and two primary log servers, A and B. From the user perspective, where do I connect SmartConsole to view all the logs together?

Do I have to connect to log server A to search the logs and, if nothing is found, connect to log server B and search again?

Or do I connect to the SMS and query, and the log view is pulled from both log servers?

Thx

Tomer_Noy · ‎2022-06-20

Just connect to your SMS as usual. The log & report queries will be distributed by default across your log servers.
