Wolfgang
Authority

dynamic balancing, VSX and core affinity

I need some advice and real world experience from the field.

In VSX environments you have to find a good distribution of your cores across virtual systems. Experience over the years shows that allocating dedicated cores to heavily used VSs is a good configuration. You have to look at all your cores and experiment a little with the distribution between cores, VSs and processes. But it works.

Now we have Dynamic Balancing for CoreXL with support for VSX. Does anything change with this?

Can I set all my cores to shared for all VSs and let dynamic balancing do the work and distribute everything evenly?

Or is it maybe better to disable dynamic balancing and set affinity with dedicated cores for each VS?

There was a great thread, Security Gateway Performance Optimization - VSX, started in the past by @Kaspars_Zibarts, but an update regarding dynamic balancing would be interesting.

 

0 Kudos
37 Replies
Chris_Atkinson
Employee

Setting static affinity is less relevant in current versions in my opinion but there are still some exceptions such as scenarios that may benefit from HyperFlow in future.

CCSM R77/R80/ELITE
0 Kudos
Kaspars_Zibarts
Employee

Yeah. I don't know if I would dare to allow dynamic split on our core internal VSX. Too much at stake there. And the VSes are really different in size. I'm afraid it may lead to a situation where a smaller VS triggers unwanted split changes, leading to outages on bigger VSes.

But that's just me being old fashioned 👴

I would probably risk it in a "provider"-like environment with more equally sized VSes.

In all honesty, we have not been in a situation where dynamic split would have saved us.

To give another scenario, where one of the external VSes has its resources exhausted because of a DDoS attack: I'm curious how dynamic split would work there. Would it just keep allocating FWKs until the SNDs are overloaded too? 🙂

Too many questions, to be honest. I'll let someone else experiment with it in production 🙂

0 Kudos
Wolfgang
Authority

I'm with you @Kaspars_Zibarts. Most of the time the old-fashioned way is the safe way and lets you sleep at night. But sometimes new features are really useful and save a lot of time.

Me too, I don't want to be the first to try. The systems where we could try this have to run as stably as possible. Perhaps someone here has tried it and will share their experience 😀

0 Kudos
genisis__
Leader

I agree - too many unknowns, and control freaks like us want predictable, measurable statistics per VS so capacity can be managed correctly.

In a traditional gateway scenario, I would probably enable it on a fresh installation.

0 Kudos
Itall
Contributor

Yeeeah (our case) 

You're right. In our environment, after switching to R81.20, dynamic balancing is absolutely unusable. We were experiencing meaningless reallocation of CPU cores from FWK instances to SND, even though it was not needed (the SND cores were not at or below 5% idle over long intervals, and they did not get stuck at 100% utilization). Because of this, outages and latency during policy installation also hit VSs on which no policy was being installed at that moment.

We set SND/MQ to manual, allocated cores manually for each VS including processes (affinity), and turned off dynamic balancing. Since then, as a benefit:

- we have accelerated operation in the range of 95 - 98% even though we use URLF/APPCL/IPS (in automatic mode it was around 60%)
- on average, 70% - 95% of the accelerated traffic is on the fast path (Cut-Through)

It seems to me that dynamic balancing at runtime took more CPU cores away from FWK than is healthy. I'm also an "old man" in this: I prefer a fixed configuration based on how the environment is normally loaded. Dynamic balancing would probably work correctly if all VSs on a given box were loaded evenly, within about 10% of each other. It certainly does not do well if I have, for example, 3 VSs on the box, where VS 1 handles 15k - 30k connections, VS 2 averages 5k connections, and VS 3 bursts between 8k and 20k.

0 Kudos
AmitShmuel
Employee

Hi, 

Dynamic Balancing logs every action to its dedicated log file, dsd.elg, along with the average utilization of the FW and SND cores. You should see at least a 10% difference between FWs and SNDs.
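As a rough illustration of that check, here is a minimal Python sketch. It assumes you have already collected per-core utilization percentages yourself. The function names and the use of plain averages are assumptions for illustration only, and the sketch does not parse dsd.elg, whose format is not described in this thread.

```python
from statistics import mean

# Threshold taken from the "at least 10% difference" remark above,
# interpreted here as percentage points (an assumption).
IMBALANCE_THRESHOLD = 10.0


def imbalance(snd_util, fw_util):
    """Return (snd_avg, fw_avg, abs_diff) for two lists of per-core utilization in percent."""
    snd_avg = mean(snd_util)
    fw_avg = mean(fw_util)
    return snd_avg, fw_avg, abs(snd_avg - fw_avg)


def exceeds_threshold(snd_util, fw_util, threshold=IMBALANCE_THRESHOLD):
    """True if the average SND and FW utilization differ by at least `threshold` points."""
    _, _, diff = imbalance(snd_util, fw_util)
    return diff >= threshold


if __name__ == "__main__":
    # Hypothetical sample values, not taken from a real gateway.
    print(imbalance([80, 90], [40, 50, 45, 45]))
    print(exceeds_threshold([80, 90], [40, 50, 45, 45]))
```

If the difference in your own numbers stays well below the threshold while cores are still being moved, that would be worth raising with support together with the log.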

I cannot see how Dynamic Balancing would affect the traffic distribution (60% --> 95%+ accelerated), and if possible, I'd be happy to examine the env. further - you can contact me at amitshm@checkpoint.com.

It should not matter whether the VSs are balanced or not, since all VSs' processes share the same cores. Moreover, the FWK threads are not statically affined (as opposed to SGW), meaning the Linux scheduler should run them in a balanced manner, so the FWK cores reflect the sum total of the VSs' work.

The admin of the system can change the number of FWK threads (workers) in SmartConsole to prioritize VSs according to the traffic they are expected to receive, so logically - the more threads, the more CPU time they'll get.
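To make the "more threads, more CPU time" point concrete, here is a small Python sketch of an idealized model. It assumes every worker thread is busy and that the scheduler hands each thread an equal slice of the shared FWK cores, which is a simplification of real scheduler behavior. The VS names and worker counts are hypothetical.

```python
def expected_cpu_share(workers_per_vs):
    """Idealized share of the shared FWK cores per VS, proportional to its worker count.

    Assumes all workers are runnable all the time and are treated equally
    by the scheduler (a simplification, not a description of the product).
    """
    total = sum(workers_per_vs.values())
    if total == 0:
        raise ValueError("at least one worker is required")
    return {vs: count / total for vs, count in workers_per_vs.items()}


if __name__ == "__main__":
    # Hypothetical layout: VS 1 is expected to carry most of the traffic.
    shares = expected_cpu_share({"vs1": 6, "vs2": 2, "vs3": 2})
    for vs, share in shares.items():
        print(f"{vs}: {share:.0%}")
```

In this model a VS with 6 of 10 workers can claim roughly 60% of the FWK cores under full contention; when the other VSs are idle it can use more.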

Thanks,
Amit

Itall
Contributor

Hi,

Thanks for the clarification, but I have my theory gained from practice, no automatic control/balancing has ever worked on Checkpoint products (at least never in the operation where I managed the firewall). Automation is a beautiful thing, but if my traffic changes too dynamically/impulsively, then the automation will never work as prospectively as a balanced static configuration tuned based on the obtained traffic sample (automated reconfiguration delays). Even on previous versions, I always used the CPU core configurations (FW/FWK/SND) statically, as well as possibly multiqueue, on old versions even sim affinity was recommended statically for effective balancing, and I believe that this idea still persists even though it is difficult he admits. With the arrival of R81, it was announced in the marketing that a high optimization was done in terms of the distribution of ACl´s in the rulebase, saying that the original theory about the placement of heavy rules no longer applies, again this is not true.

 

In general, I don't believe that the automatic balancing of FWK cores will be effective with the userspace firewall, precisely because it's a userspace process and it's not so much about Linux as about how Checkpoint fw is written. I might still believe in SND, it runs in kernel mode.

L.

0 Kudos
_Val_
Admin

@Chen_Muchtar  can you please advise?

0 Kudos
AmitShmuel
Employee

Similarly to SGW, Dynamic Balancing aims to balance the load between the FWK cores and the SND cores.

A prerequisite for starting Dynamic Balancing is having all FWKs set to the default FWK CPUs (for example, on an 8-core machine, 2-7).

Upon detecting an imbalance (SNDs working harder), Dynamic Balancing will set all VSs' FWKs to a smaller set of CPUs, and have SNDs take over the CPU.

The average load calculation remains the same; Dynamic Balancing discards any outlier CPUs that may be working harder due to some specific VS.
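The idea of discarding outlier CPUs before averaging can be sketched in Python as follows. The actual outlier criterion is not described in this thread, so the rule used here (drop any core more than `margin` points away from the median) and the default margin are purely assumptions for illustration.

```python
from statistics import mean, median


def average_without_outliers(core_util, margin=30.0):
    """Average per-core utilization (percent) after dropping cores far from the median.

    `margin` is a hypothetical cutoff in percentage points. If every core
    would be dropped, the plain average over all cores is returned instead.
    """
    mid = median(core_util)
    kept = [u for u in core_util if abs(u - mid) <= margin]
    return mean(kept) if kept else mean(core_util)


if __name__ == "__main__":
    # Hypothetical FWK cores: one core is busy because of a single heavy VS.
    fwk_util = [20, 22, 25, 23, 95]
    print(mean(fwk_util))                      # plain average, pulled up by the busy core
    print(average_without_outliers(fwk_util))  # average with the busy core discarded
```

With these sample values the plain average is 37 while the filtered average is 22.5, which shows how one busy core could otherwise distort the picture of overall FWK load.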

I'd be happy to review your advanced configuration and share my feedback, feel free to contact me at amitshm@checkpoint.com.

Thanks,
Amit

0 Kudos
Cristian_F_CCSM
Contributor

Hello, the situation with VS 1 connected to the internet and VS 2 as an ISFW, with CoreXL Dynamic Balancing enabled, is very interesting. In case of a DDoS from the internet, I wouldn't want there to be an impact on VS 2 (the ISFW). What about this situation?

Second question: if I enable CoreXL Dynamic Balancing, the "fw workers" number is still editable in the VS configuration in SmartConsole. I would have expected this setting to be disabled once Dynamic Balancing is activated. Can you explain (or post a link to) how these two features/configurations interact, please?

Thanks

0 Kudos
AmitShmuel
Employee

The number of FW workers is still editable.

On VSX, Dynamic Balancing only changes the amount of cores running FW workers, so you can configure any number of them.
Upon SND addition, it will set the FWKs of all VSs to the new set of cores.

Here is an example:

Default state:
Core 0: SND
Core 1-3: FWKs (fwk0_0, fwk0_1, ..., fwk1_0, fwk1_1, ..., fwkN_N)

Dynamic Balancing adds SND:
Core 0: SND
Core 1: SND
Core 2-3: FWKs (fwk0_0, fwk0_1, ..., fwk1_0, fwk1_1, ..., fwkN_N)
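The same example can be written as a small Python model. This is only a sketch of the bookkeeping: the function names, the choice to move the lowest-numbered FWK core, and the `min_fwk_cores` floor are assumptions that happen to match the example above, not a description of the actual implementation.

```python
def add_snd_core(snd_cores, fwk_cores, min_fwk_cores=1):
    """Move one core from the FWK set to the SND set and return both new sets.

    The lowest-numbered FWK core is moved (an assumption). Nothing changes
    if the FWK set is already at the hypothetical minimum size.
    """
    if len(fwk_cores) <= min_fwk_cores:
        return sorted(snd_cores), sorted(fwk_cores)
    ordered = sorted(fwk_cores)
    return sorted(snd_cores) + [ordered[0]], ordered[1:]


def fwk_affinity(thread_names, fwk_cores):
    """Every FWK thread of every VS may run on every core in the FWK set."""
    return {name: list(fwk_cores) for name in thread_names}


if __name__ == "__main__":
    snd, fwk = add_snd_core([0], [1, 2, 3])
    print(snd, fwk)  # [0, 1] [2, 3]
    # The number of threads is unchanged; only the set of cores they share shrinks.
    print(fwk_affinity(["fwk0_0", "fwk0_1", "fwk1_0", "fwk1_1"], fwk))
```

Note that the thread list is the same before and after the move, which is why the number of FW workers stays editable independently.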

0 Kudos
CheckPointerXL
Advisor

Hello Amit,

This is a great explanation.

What about the reverse situation? A VS needs more fw worker instances: could it get other cores for this purpose? I'm confused about this because after enabling Dynamic Balancing I can still edit the number of CoreXL instances in SmartConsole.

I guess that if a VS needs more fw workers, they will be assigned to the VS anyway? So I would have 10 fw workers on a VS with 8 CoreXL instances set in SmartConsole. Is this possible?

I'm confused.

0 Kudos
AmitShmuel
Employee

By reverse situation, do you mean more FW instances, or more FW cores?

Dynamic Balancing can reduce the number of SND cores to utilize more FW cores. But since changing the number of FW workers results in a VS restart, it cannot be done dynamically. Hence it can only allocate more cores to the same number of FW workers, leaving the minimum number for SNDs of course.

0 Kudos
CheckPointerXL
Advisor

OK, so let's assume I have a VSX GW with 32 cores.

Then I create a VS and configure 10 CoreXL instances on the CoreXL tab of the VS object in SmartConsole.

The dynamic balancing happens INSIDE those 10 cores, by balancing SND/workers across those 10 cores. Is this right?

From reading the documentation I understood that with dynamic balancing a single VS could potentially use all 32 cores.

0 Kudos
AmitShmuel
Employee

There is a complete separation between FW cores and SND cores (see my illustration above).

The dynamic balancing happens on the system as a whole: if more FW cores are needed, it will allocate more FW cores, and vice versa.

The configured 10 CoreXL instances will run freely on the FW cores (whether there are 2 cores, 20 cores, or any other number).
The number of FW cores is usually determined by the number of SND cores, e.g. if you have 4 SNDs on a 32-core machine, then you'll have 28 cores used for FW.
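The arithmetic behind that can be sketched in a few lines of Python. The function names are hypothetical, and the model simply assumes that every core is either an SND core or an FW core.

```python
def fw_core_count(total_cores, snd_cores):
    """Cores left for FW workers once the SND cores are taken out."""
    if not 0 < snd_cores < total_cores:
        raise ValueError("need at least one SND core and at least one FW core")
    return total_cores - snd_cores


def instances_per_fw_core(instances, fw_cores):
    """Average number of configured CoreXL instances per FW core."""
    return instances / fw_cores


if __name__ == "__main__":
    fw_cores = fw_core_count(32, 4)
    print(fw_cores)  # 28
    # The 10 configured instances are not bound to 10 cores; they float
    # over whatever FW cores currently exist.
    for cores in (2, 20, fw_cores):
        print(cores, round(instances_per_fw_core(10, cores), 2))
```

The instance count of a VS stays at 10 in every case; only the pool of FW cores those instances are scheduled on grows or shrinks.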

CheckPointerXL
Advisor

Amit, thank you very much for your explanation.

Now I understand that configuring 10 CoreXL FW workers in SmartConsole means that these workers can run on 20 logical cores or just 5; the illustration helped me.

0 Kudos
Cristian_F_CCSM
Contributor

Hello, OK, clear about the second question.

About the first scenario I described (two VSs: external and internal FW), do you have any experience?

Thanks

0 Kudos
AmitShmuel
Employee

Can you please elaborate on that scenario? What is the concern here?

0 Kudos
Cristian_F_CCSM
Contributor

Hello, yes, sure. My concern is this situation:

- VS 1 (internet) receives a DDoS attack

- The demand on the CPUs (for IPS, logging, etc.) is high

- Dynamic balancing is enabled

In this case VS 1 and VS 2 use the same CPUs for CoreXL, so won't there be an issue for VS 2 (internal) as well, and not only for VS 1 (internet)?

If we assign some CPUs to VS 1 and other CPUs to VS 2, we can reduce this type of risk.

In this scenario, with dynamic balancing enabled, VSLS can minimize the risk (in my humble opinion).

Regards

0 Kudos
AmitShmuel
Employee

Current Dynamic Balancing implementation uses all FWK CPUs for all VSs, similar to the default out of the box configuration.

Are you suggesting to separately assign VSs CPUs in advance, or on the fly?

0 Kudos
Cristian_F_CCSM
Contributor

Hello, to prevent the described problem I prefer to configure static CPU affinity during the initial VSX GW configuration.

0 Kudos
Piet_vd_Maas
Contributor

My experience with Dynamic Balancing is quite positive.

The load on all CPUs is better balanced, as it should be. And when the load on an interface is heavier, the impact on throughput is lower.

CCSM - CCTE - CCVS - CCMS
genisis__
Leader

I'm running dynamic balancing on an R81.10 VSX system with JHFA66; however, an additional private hotfix was required, which also contained fixes for dynamic balancing.

Symptoms I experienced:

- Dynamic balancing initially starts, but once all the VSs have been fully loaded it turns off.

- Cores get stuck in OTHER state as they are transitioning between fwk and SND.

Please note that I've discussed these with TAC and if not done so already the fixes related to dynamic balancing will be integrated (I think they were introduced in JHFA75).

Kaspars_Zibarts
Employee

Thanks for keeping us in the loop! Really appreciated. 

0 Kudos
genisis__
Leader

Additional info which I think is useful, which was sent by the TAC engineer:

PRHF-25607  - is going to be part of next Jumbo PRJ-41482 
PRHF-25610  - ready for commit state, trying to push this one as well as soon as possible PRJ-41634 
PRHF-25603  - already part of the Jumbo take 75 - PRJ-39820 
PRHF-25597  - already part of Jumbo 75 PRJ-39324 
PRHF-25594  - ready for commit state, trying to push this one as well as soon as possible PRJ-41124 
PRHF-25611  – going to be part of next jumbo 

0 Kudos
_Val_
Admin

@genisis__ could you please send me offline the contact of that TAC engineer? Thanks in advance, Val

0 Kudos
genisis__
Leader

Will do.

0 Kudos
Naama_Specktor
Employee

@genisis__ 

Hi 🙂

My name is Naama Specktor and I am a Check Point employee.

I would appreciate it if you would share the TAC SR # with me, here or in PM.

thanks!

 

Naama 

0 Kudos
genisis__
Leader

Ping Val - already sent info to him.

0 Kudos
