HristoGrigorov

CoreXL on SMB

This is likely a question for Timothy Hall, but if anyone else can elaborate on this, please do so.

As you know, on Gaia Embedded you can only assign fw instances to cores. When I check the connection distribution, instance 0 always gets the most connections, sometimes double the number of the other instances. This does not look like true load distribution to me. Are connections served in a chain from instance 0 to N based on the current instance load? And what criteria does the system use to decide that an instance cannot currently accept more connections and that a hand-off to the next instance is needed?

I have also tried giving all three instances affinity to all three cores, and I noticed a slight improvement in system load compared with one instance per core. Thinking about it, on a system with only three cores serving only fw workers, that might be the better strategy: it gives the dispatcher more options to quickly offload traffic to other cores, achieving better overall saturation of the CPU resources. Is my assumption correct, or am I missing something here?

G_W_Albrecht
Legend

As already mentioned in my article "SecureXL & CoreXL on SMB devices", according to Check Point:

- The 7x0/14x0 appliances have two cores and can use the 'sim affinity' command to assign interfaces to cores.

It usually makes no sense to manually configure CoreXL on two-core systems.

- On 14x0 units only, CoreXL is supported (check with fw ctl multik stat), and so two SNDs and two fw_worker processes exist. Settings are changed using fw ctl affinity.

You are speaking of three cores. How so?

CCSE CCTE CCSM SMB Specialist
HristoGrigorov

# fw ctl affinity -l -a
fw_0: CPU 0
fw_1: CPU 1
fw_2: CPU 2

# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 0 | 1129 | 2445
1 | Yes | 1 | 706 | 2147
2 | Yes | 2 | 529 | 1396
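A small helper can quantify the imbalance in output like the above. This is a minimal sketch that assumes the pipe-separated layout shown in this thread (ID | Active | CPU | Connections | Peak); other versions may format the table differently, and the function names are hypothetical. The ratio of the busiest to the least busy instance is 1.0 for a perfectly even spread.

```python
def parse_multik_stat(text):
    """Return {instance_id: current_connections} from `fw ctl multik stat` output.

    Assumes the pipe-separated layout seen in this thread:
    ID | Active | CPU | Connections | Peak
    """
    conns = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Data rows start with a numeric instance ID; the header and ruler are skipped.
        if len(fields) == 5 and fields[0].isdigit():
            conns[int(fields[0])] = int(fields[3])
    return conns


def imbalance_ratio(conns):
    """Busiest instance divided by least busy instance (1.0 = perfectly even)."""
    return max(conns.values()) / min(conns.values())


sample = """\
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 0 | 1129 | 2445
1 | Yes | 1 | 706 | 2147
2 | Yes | 2 | 529 | 1396
"""

conns = parse_multik_stat(sample)
print(conns)                             # {0: 1129, 1: 706, 2: 529}
print(round(imbalance_ratio(conns), 2))  # 2.13
```

A ratio of about 2.1 matches the "sometimes double" observation in the question.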

G_W_Albrecht
Legend

Again I have learned something 😉 What you are currently disregarding is the 'sim affinity' command, which assigns interfaces to cores. Maybe this is tied to the connection distribution?

HristoGrigorov

That would mean assigning all LAN interfaces to one core and the WAN and DMZ to other core(s). That doesn't make much sense, because the number of connections on the LAN interfaces can be much higher than on the WAN and DMZ.

Oh, wait... I forgot that all LAN interfaces share the same physical interface. 🙂

Timothy_Hall
Champion

The Dynamic Dispatcher feature, which keeps the Firewall Workers/Instances balanced based on CPU load, was introduced in R77.30, but the Embedded Gaia appliances are running R77.20.XX. In this release, a dispatcher allocates new connections to a Firewall Worker using a simple hash function that takes only the source and destination IP addresses as inputs. As a result, heavy traffic between the same two systems always lands on the same Firewall Worker core and can overload it. The kernel variable fwmultik_hash_use_dport can be set to include the destination port in the hash calculation, which distributes the Firewall Worker load somewhat more evenly, but I'm not sure whether setting it is supported on Embedded Gaia. This topic is covered on pages 243-244 of my "Max Power" book.
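To make the hash-based allocation concrete, here is a minimal Python sketch. It is an illustration under assumptions, not Check Point's implementation: the real dispatcher hash is not documented in this thread, so SHA-256 modulo the number of workers stands in for it, the IP addresses are made up, and pick_worker and distribute are hypothetical names. It compares a port scan (one source, one destination, many destination ports) with the key built from the IPs only versus the IPs plus destination port, which is what fwmultik_hash_use_dport is described as doing.

```python
import hashlib
from collections import Counter

NUM_WORKERS = 3  # three fw instances, as on the appliance discussed here


def pick_worker(src_ip, dst_ip, dport, use_dport):
    """Hypothetical dispatcher: hash the connection key, take it modulo the
    number of workers. SHA-256 is a stand-in; the real hash is not shown here."""
    key = f"{src_ip}|{dst_ip}"
    if use_dport:
        # Analogous to fwmultik_hash_use_dport=1: the destination port joins the key.
        key += f"|{dport}"
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_WORKERS


def distribute(conns, use_dport):
    """Count how many connections each worker receives."""
    return Counter(pick_worker(src, dst, port, use_dport) for src, dst, port in conns)


# A port scan: one source, one destination, 1024 different destination ports.
scan = [("10.0.0.5", "192.168.1.10", port) for port in range(1, 1025)]

print(distribute(scan, use_dport=False))  # all 1024 connections on one worker
print(distribute(scan, use_dport=True))   # spread across the three workers
```

With the IP-only key every connection of the scan maps to the same worker, because the key never changes; adding the port gives each connection its own key, so the modulo spreads them roughly evenly.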

Also, all VPN and VoIP traffic is always sent to instance 0 no matter what; this limitation was lifted in R80.10+ gateways.

--
"IPS Immersion Training" Self-paced Video Class
Now Available at http://www.maxpowerfirewalls.com

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
HristoGrigorov

Thanks for the detailed explanation, Tim. It fully confirms my observations. I checked, and this kernel variable does exist on Gaia Embedded:

# fw ctl get int fwmultik_hash_use_dport
fwmultik_hash_use_dport = 0

I will try setting it to 1 and see whether it actually makes a difference or just sits there doing nothing. Good point about the VPN traffic too; I had not accounted for it.

G_W_Albrecht
Legend

I just issued the fw ctl get int fwmultik_hash_use_dport command on my 730 and received the same output. I have used the feedback form of sk96068 to ask whether it is relevant for SMBs. Interesting notes from sk96068:

  • This parameter can be enabled only permanently ('fw ctl set int' command is not supported).
  • Enabling this kernel parameter is useful when the traffic passes between the same Source IP address and Destination IP address, from the same Source Port to different Destination ports (e.g., scanner).
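On full Gaia, kernel parameters that must be set permanently are conventionally written as name=value lines in $FWDIR/boot/modules/fwkern.conf and take effect after a reboot. Whether Gaia Embedded reads the same file at the same path is not confirmed in this thread, so treat that as an assumption. The hypothetical helper below only edits the file contents as text, setting the parameter idempotently without duplicating an existing entry; it does not touch the appliance.

```python
def set_kernel_param(conf_text, name, value):
    """Return conf_text with a single `name=value` line for the parameter.

    Existing definitions of the same parameter are replaced (duplicates dropped);
    if none exists, the line is appended. Other lines are kept unchanged.
    The target file (e.g. fwkern.conf) is an assumption, not handled here.
    """
    out = []
    found = False
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == name:
            if not found:
                out.append(f"{name}={value}")
                found = True
            continue  # skip the old or duplicate definition
        out.append(line)
    if not found:
        out.append(f"{name}={value}")
    return "\n".join(out) + "\n"


# "example_param" is a hypothetical placeholder for whatever the file already holds.
current = "example_param=5\nfwmultik_hash_use_dport=0\n"
updated = set_kernel_param(current, "fwmultik_hash_use_dport", 1)
print(updated, end="")
```

Running it twice yields the same text, which matters for a setting that is only read at boot.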
HristoGrigorov

I will ask R&D whether this parameter makes sense on Gaia Embedded. It is interesting indeed.

G_W_Albrecht
Legend

As I wrote, I already did that:

Thank you for providing your feedback to SecureKnowledge on sk96068, titled "Performance degradation on Security Gateway when port scan test is performed through Security Gateway".

Your feedback was:

------------------

Can this kernel parameter be also set on SMB appliances (e.g. 1490) ?

HristoGrigorov

Sorry, I somehow missed that. 

The reply from R&D is that, although there is no empirical data on the performance impact, it should work the same way as on Gaia.

Timothy_Hall
Champion

Sounds right. The Dynamic Dispatcher in R77.30+ does put some additional load on the SND/IRQ cores compared with just using the hash function, but that extra overhead is worth it 99% of the time to keep individual workers from getting overwhelmed. I seriously doubt that adding the destination port to the worker hash calculation would cause any significant additional overhead on the dispatcher(s).

HristoGrigorov

So, I implemented this SK and the result looks promising:

ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | all | 699 | 1643
1 | Yes | all | 676 | 1601
2 | Yes | all | 660 | 1969

Note, however, that I assigned all three instances to run on all cores.

HristoGrigorov

Implementing sk96068 definitely changes the way traffic is distributed between the available cores. What surprises me is that the peak number of connections is now consistently higher on CPU3, rather than on CPU1 as it was before.

Also, I have observed for a long time now that mostly CPU2 and CPU3 service software interrupts, and only very occasionally are there any on CPU0. I do not know whether that comes from the Linux kernel itself or from the fw dispatcher.

Timothy_Hall
Champion

The interrupts you see on CPU2 and CPU3 are probably hardware interrupts for frames coming up from the NICs into ring buffers; try running sim affinity -l or cat /proc/interrupts to verify.

HristoGrigorov

# sim affinity -l
Multi queue interfaces: DMZ LAN WAN

# cat /proc/interrupts

146: 0 0 0 GIC al-eth-msix-mgmt@pci:0000:00:02.0
147: 37 3935375 0 GIC al-eth-rx-comp-0@pci:0000:00:02.0
148: 4 0 3941023 GIC al-eth-rx-comp-1@pci:0000:00:02.0
149: 9 22768295 0 GIC al-eth-rx-comp-2@pci:0000:00:02.0
150: 256 0 5090930 GIC al-eth-rx-comp-3@pci:0000:00:02.0
151: 20 18235802 0 GIC al-eth-tx-comp-0@pci:0000:00:02.0
152: 6 0 4893096 GIC al-eth-tx-comp-1@pci:0000:00:02.0
153: 0 5817309 0 GIC al-eth-tx-comp-2@pci:0000:00:02.0
154: 0 0 0 GIC al-eth-tx-comp-3@pci:0000:00:02.0
155: 0 0 0 GIC al-eth-msix-mgmt@pci:0000:00:00.0
156: 2250 18708550 0 GIC al-eth-rx-comp-0@pci:0000:00:00.0
157: 282 0 5828076 GIC al-eth-rx-comp-1@pci:0000:00:00.0
158: 283 6689854 0 GIC al-eth-rx-comp-2@pci:0000:00:00.0
159: 425 0 6237466 GIC al-eth-rx-comp-3@pci:0000:00:00.0
160: 14 21603625 0 GIC al-eth-tx-comp-0@pci:0000:00:00.0
161: 6 0 8865820 GIC al-eth-tx-comp-1@pci:0000:00:00.0
162: 3 9522446 0 GIC al-eth-tx-comp-2@pci:0000:00:00.0
163: 0 0 0 GIC al-eth-tx-comp-3@pci:0000:00:00.0
164: 0 0 0 GIC al-eth-msix-mgmt@pci:0000:00:01.0
165: 348 282392 0 GIC al-eth-rx-comp-0@pci:0000:00:01.0
166: 0 0 66074 GIC al-eth-rx-comp-1@pci:0000:00:01.0
167: 2 122825 0 GIC al-eth-rx-comp-2@pci:0000:00:01.0
168: 1 0 71558 GIC al-eth-rx-comp-3@pci:0000:00:01.0
169: 4 183485 0 GIC al-eth-tx-comp-0@pci:0000:00:01.0
170: 0 0 272584 GIC al-eth-tx-comp-1@pci:0000:00:01.0
171: 0 31456 0 GIC al-eth-tx-comp-2@pci:0000:00:01.0
172: 0 0 0 GIC al-eth-tx-comp-3@pci:0000:00:01.0

Here is something that looks interesting:

IPI2: 34126463 3968474 5174095 Rescheduling interrupts
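To check which CPUs actually absorb the NIC interrupts, the per-CPU columns of /proc/interrupts can be summed. This is a minimal sketch that assumes three CPU columns in the order CPU0, CPU1, CPU2 (the header row is not included in the paste above) and counts only the al-eth rx/tx completion lines; the function name is hypothetical. The sample reuses a few lines from the listing above.

```python
def nic_irqs_per_cpu(text, ncpus=3):
    """Sum per-CPU counts of the NIC rx/tx completion interrupts.

    Assumes each IRQ line looks like '<irq>: <cpu0> ... <cpuN-1> GIC <name>'
    and that the CPU columns are in order CPU0..CPU(ncpus-1).
    """
    totals = [0] * ncpus
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < ncpus + 2:
            continue
        name = parts[-1]
        # Only count the al-eth receive/transmit completion vectors;
        # mgmt vectors and IPI lines such as "Rescheduling interrupts" are skipped.
        if "-rx-comp-" in name or "-tx-comp-" in name:
            for cpu in range(ncpus):
                totals[cpu] += int(parts[1 + cpu])
    return totals


sample = """\
146: 0 0 0 GIC al-eth-msix-mgmt@pci:0000:00:02.0
147: 37 3935375 0 GIC al-eth-rx-comp-0@pci:0000:00:02.0
148: 4 0 3941023 GIC al-eth-rx-comp-1@pci:0000:00:02.0
156: 2250 18708550 0 GIC al-eth-rx-comp-0@pci:0000:00:00.0
157: 282 0 5828076 GIC al-eth-rx-comp-1@pci:0000:00:00.0
IPI2: 34126463 3968474 5174095 Rescheduling interrupts
"""

print(nic_irqs_per_cpu(sample))  # [2573, 22643925, 9769099]
```

On this sample almost nothing lands in the first CPU column, which fits the reading that the NIC queues are serviced by the other two cores.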

G_W_Albrecht
Legend

I am currently working on your solution request SK-SR# 3-0699806904, titled: "Performance degradation on Security Gateway when port scan test is performed through Security Gateway" under ID: sk96068.

Your feedback was:
------------------
Can this kernel parameter be also set on SMB appliances (e.g. 1490) ?

------------------

I have confirmed that the kernel parameter does apply to SMB appliances. The sk has been updated based on your feedback.

HristoGrigorov

Wonderful! Thank you for taking the time to do it.

G_W_Albrecht
Legend

In fact, it only took the 8 seconds needed to write that line into the feedback form 😉

HristoGrigorov

I am wondering what took you so long 🙂

G_W_Albrecht
Legend

Explanation: first type it into an Outlook note, copy it, paste it into the feedback field, prove that I am not a robot, and voilà!

I would have saved 3-4 seconds by typing directly into the feedback field, but unsaved content is always prone to vanish very suddenly 🙂

HristoGrigorov

You are lucky they do not require you to solve a polynomial equation to prove you are not a robot 🙂

Anyway, I am more and more satisfied with how CoreXL works with this SK applied. Nearly perfect connection distribution:

ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 0 | 736 | 1970
1 | Yes | 1 | 752 | 1974
2 | Yes | 2 | 793 | 1981

I am going to leave it permanently enabled. 

G_W_Albrecht
Legend

So what are you waiting for? Please condense your experiences into a document, e.g. "SMB CoreXL instances are overloaded (sk96068)"!

HristoGrigorov

Hehe, well... It is all written in Tim's book, actually. It is awesome reading; it's just that you are never sure what applies to Gaia Embedded and what does not.
