Solved: Re: Log distribution performance - Impact on GWs versus log servers

Scottc98 · ‎2024-04-25

Curious if anyone has insights on when to use Log Distribution when you have two or more log servers.

Under each GW object => Logs, there is a Log Distribution option when you have multiple log servers defined:

Option 1: "Send a copy of every log to each of the primary log servers", with logs going to a defined backup server if the primary fails
Option 2: "Distribute logs between log servers for improved performance (applies to primary and backup log servers)"

What I want to understand for the 2nd option is how performance is affected on both the log server AND the GW side.

I can see how performance improves on the log server side: the load is balanced across the log servers defined under the 'primary' section, so no single server takes all of it.

But does that come at a performance cost on the GW side? Does the GW use more memory and/or CPU to maintain multiple log server connections and track what is sent to each, versus option 1, where the GW only has to maintain one log connection and just forward?

My main concern is memory usage on the GWs, as I have a lot of 3000 series appliances where memory is both low (8GB) and can't be expanded.

If there is a known hit to this, I am more willing to go with option 1 here and just 'split' the primary/secondary log servers across the fleet so there is some 50/50 hard split (understanding that it would not be truly balanced from the log server side with log rates able to change dynamically from each GW).
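For illustration, the hard split described above could be planned with something like the following. This is only a rough sketch: the gateway and log server names are hypothetical, and the script just prints an assignment plan; it does not talk to the management server or change any object. It balances the number of gateways per log server, not the actual log volume.

```python
from collections import Counter


def hard_split(gateways, log_servers):
    """Give each gateway one primary and one backup log server, round-robin.

    This balances the count of gateways per primary, not their log rates.
    """
    if len(log_servers) < 2:
        raise ValueError("need at least two log servers for a primary/backup split")
    plan = {}
    for i, gw in enumerate(sorted(gateways)):
        primary = log_servers[i % len(log_servers)]
        # The next server in the list acts as backup, so it always differs from the primary.
        backup = log_servers[(i + 1) % len(log_servers)]
        plan[gw] = {"primary": primary, "backup": backup}
    return plan


# Hypothetical names, for illustration only.
gateways = ["gw-site-a", "gw-site-b", "gw-site-c", "gw-site-d"]
log_servers = ["logsrv-1", "logsrv-2"]

plan = hard_split(gateways, log_servers)
for gw, roles in plan.items():
    print(f"{gw}: primary={roles['primary']} backup={roles['backup']}")

# How many gateways use each log server as primary.
primary_counts = Counter(roles["primary"] for roles in plan.values())
print(dict(primary_counts))
```

With an even number of gateways and two log servers this gives the 50/50 split by gateway count; a busy site would still need to be moved by hand if its log rate dominates.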

I'm in the middle of a complete management server/log server migration project and would like to make sure I have made the right choice 🙂 We had option 1 with the hard split and moved to option 2, and noticed that one of my standby cluster sites had a large increase in memory usage right after policy install. I want to be sure it's an anomaly and not the start of gradually increasing memory loads across the infrastructure with this deployment.

Appreciate any immediate input or past experience on this subject 🙂

Timothy_Hall · ‎2024-04-25

Agree with @the_rock here, the log balancing function for whatever reason seems to result in incomplete/missing logs that can make effective troubleshooting very difficult. The gateway logging transfer mechanism is implemented in R81.20 and earlier using the very old (think 1990s here), single-threaded fwd daemon which is well-known for causing gateway logging to stop randomly, and only a restart of the fwd daemon will fix it in most cases. Seems like this log balancing function is a bridge too far for fwd which already has struggles in today's world of heavy logging. Having CoreXL automatically allocate a dedicated core for fwd on gateways with 20 or more cores did extend its life somewhat though.

Thankfully the logging mechanism is planned to be taken away from fwd in R82 and completely reimplemented, similar to how features/functions have been slowly taken away from the legacy fwm daemon on the SMS and transferred to other/new processes such as cpm.

Gaia 4.18 (R82) Immersion Tips, Tricks, & Best Practices Video Course
Now Available at https://shadowpeak.com/gaia4-18-immersion-course

the_rock · ‎2024-04-25

I only worked with one customer who did this, and I can tell you choosing the 2nd option was not fun; I won't use any other words to describe it lol. When it was enabled, we noticed that logs would randomly not be sent and CPU and memory were higher on the gateways. Policy install was okay and Internet access too, but as soon as we switched to the 1st option, all was a go, no issues anywhere.

We even had a TAC case open for it and tried so many things and debugs, but no joy, so we simply decided not to bother with it, just left option 1 in place and all good.

I honestly wish we knew the reason why 2nd choice was failing, but never found out.

Hope that helps.

I sort of compare this to load sharing ISP redundancy, it sadly simply does not work properly.

Andy

the_rock · ‎2024-04-25

There you go @Scottc98 . If CP master @Timothy_Hall agrees, then 100% that is the RIGHT answer 🙂

Andy

Scottc98 · ‎2024-04-26

@the_rock @Timothy_Hall

Thank you both so much for your input here and your contributions to the CheckMates community. 🙂

Original plan of Option 1 is a go now ....

the_rock · ‎2024-04-26

You are welcome. @Timothy_Hall is way smarter than I am, so glad you got the answer 🙂

Andy

Timothy_Hall

Just to follow up on what @Tomer_Noy mentioned in another thread about "fwd scale-out" in R82+: there is now a very nice SK describing this in more detail, including how to make tuning adjustments to increase log performance if needed: sk182215: "You have reached the maximum capacity this worker's configuration can handle" message in ...
