CoreXL on SMB

HristoGrigorov · ‎2019-02-26

This is likely a question for Timothy Hall, but if anyone else can elaborate on this, please do so.

As you know, on Gaia Embedded you can only assign fw instances to different cores. When I check the connection distribution, instance 0 always gets the most connections, sometimes double the count of the other two instances. This does not look like true load distribution to me. Are connections served in a chain from instance 0 -> N based on the current instance load? And what criteria does the system use to decide that an instance cannot currently accept more connections and that a hand-off to the next instance is needed?

Then I tried giving all 3 instances affinity to all 3 cores, and I noticed a slightly better system load compared to one instance per core. When I think about it, for a system with only 3 cores serving only fw workers that might be the better strategy, because it gives the dispatcher more opportunities to quickly offload traffic to other cores, thus achieving better overall saturation of the CPU resources. Is my assumption correct, or am I missing something here?

G_W_Albrecht · ‎2019-02-27

As already mentioned in my article SecureXL & CoreXL on SMB devices, according to CP:

- The 7x0/14x0 appliances have two cores and can use the 'sim affinity' command to assign interfaces to cores.

- It usually makes no sense to manually configure CoreXL on two-core-systems.

- On 14x0 units only, CoreXL is supported (check with fw ctl multik stat), and so two SNDs and two fw_worker processes exist. Settings are changed using fw ctl affinity.

You are speaking of three cores; how is that?

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist

HristoGrigorov · ‎2019-02-27

# fw ctl affinity -l -a
fw_0: CPU 0
fw_1: CPU 1
fw_2: CPU 2

# fw ctl multik stat
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 0 | 1129 | 2445
1 | Yes | 1 | 706 | 2147
2 | Yes | 2 | 529 | 1396
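
To put a number on the skew in this output, here is a minimal Python sketch that parses the table and computes each instance's share of current connections. The function names (parse_multik_stat, connection_shares) are hypothetical helpers written for this illustration, and the parser assumes the pipe-separated layout shown above.

```python
def parse_multik_stat(text):
    """Parse 'fw ctl multik stat'-style rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Data rows have five fields and start with a numeric instance ID.
        if len(fields) == 5 and fields[0].isdigit():
            rows.append({
                "id": int(fields[0]),
                "active": fields[1] == "Yes",
                "cpu": fields[2],
                "connections": int(fields[3]),
                "peak": int(fields[4]),
            })
    return rows


def connection_shares(rows):
    """Return each instance's share of current connections, in percent."""
    total = sum(r["connections"] for r in rows)
    if total == 0:
        return {}
    return {r["id"]: round(100.0 * r["connections"] / total, 1) for r in rows}


# Output pasted from the post above.
SAMPLE = """\
ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 0 | 1129 | 2445
1 | Yes | 1 | 706 | 2147
2 | Yes | 2 | 529 | 1396
"""

rows = parse_multik_stat(SAMPLE)
shares = connection_shares(rows)
print(shares)
```

With these figures instance 0 holds a little under half of all current connections, which matches the "double the other instances" observation.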

G_W_Albrecht · ‎2019-02-27

Again I have learned something 😉 What you are currently disregarding is the 'sim affinity' command, which assigns interfaces to cores. Maybe this is tied to the connection distribution?

HristoGrigorov · ‎2019-02-27

That would mean all LAN interfaces are assigned to one core and the WAN and DMZ to other core(s). That doesn't make much sense, because the number of connections on the LAN interfaces can be much higher than on the WAN and DMZ.

Oh, wait... I forgot that all LAN interfaces share the same physical one.

Timothy_Hall · ‎2019-02-27

The Dynamic Dispatcher feature, which keeps the Firewall Workers/Instances balanced based on CPU load, was introduced in R77.30, but the Embedded Gaia appliances run R77.20.XX. In this release, the dispatcher allocates new connections to a Firewall Worker with a simple hash function that uses only the source and destination IP addresses as inputs. As a result, heavy traffic between the same two systems will always draw the same Firewall Worker core and can overload it. There is a kernel variable, fwmultik_hash_use_dport, that can be set to include the destination port in the hash calculation and distribute the Firewall Worker load somewhat more evenly, but I'm not sure whether setting it is supported on Embedded Gaia. This topic is covered on pages 243-244 of my "Max Power" book.

Also, any VPN and VoIP traffic will always be sent to instance 0 no matter what; this limitation was lifted on R80.10+ gateways.
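
The effect of the hash inputs can be sketched with a small Python simulation. This is only an illustration under stated assumptions: CRC32 stands in for the real hash, which is not documented in this thread, and pick_worker, the IP addresses, and the port range are all made up for the example.

```python
import zlib
from collections import Counter

NUM_WORKERS = 3  # three fw instances, as on the appliance in this thread


def pick_worker(src_ip, dst_ip, dport=None, workers=NUM_WORKERS):
    """Map a connection to a worker index with a stand-in hash (CRC32).

    This is NOT Check Point's hash; it only shows which fields feed it.
    """
    key = f"{src_ip}|{dst_ip}"
    if dport is not None:  # analogous to fwmultik_hash_use_dport = 1
        key += f"|{dport}"
    return zlib.crc32(key.encode()) % workers


# One client talking to one server on 300 different destination ports.
ports = range(1024, 1324)
ip_only = Counter(pick_worker("10.0.0.5", "192.0.2.10") for _ in ports)
with_dport = Counter(pick_worker("10.0.0.5", "192.0.2.10", p) for p in ports)

print("IP-only hash:   ", dict(ip_only))
print("IP + dport hash:", dict(with_dport))
```

With IP-only hashing every one of the 300 connections lands on the same worker, because the hash input never changes. Once the destination port is part of the key, the same traffic spreads over several workers.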

--
"IPS Immersion Training" Self-paced Video Class
Now Available at http://www.maxpowerfirewalls.com

Gaia 4.18 (R82) Immersion Tips, Tricks, & Best Practices Video Course
Now Available at https://shadowpeak.com/gaia4-18-immersion-course

HristoGrigorov · ‎2019-02-27

Thanks for the detailed explanation, Tim. It totally confirms my observations. I checked, and this kernel variable does indeed exist on Gaia Embedded:

# fw ctl get int fwmultik_hash_use_dport
fwmultik_hash_use_dport = 0

I will try setting it to 1 and see whether it actually makes any difference or just sits there and does nothing. Good point also about the VPN traffic; I had not accounted for it.

G_W_Albrecht · ‎2019-02-27

I have just issued the fw ctl get int fwmultik_hash_use_dport command on my 730 and received the same output. I have used the feedback form of sk96068 to ask whether it is relevant for SMBs. Interesting notes from sk96068:

This parameter can be enabled only permanently ('fw ctl set int' command is not supported).
Enabling this kernel parameter is useful when the traffic passes between the same Source IP address and Destination IP address, from the same Source Port to different Destination ports (e.g., scanner).

HristoGrigorov · ‎2019-02-27

I will ask RnD if this param makes sense on Gaia Embedded. It is interesting indeed.

G_W_Albrecht · ‎2019-02-27

As I wrote, I did that already:

Thank you for providing your feedback to SecureKnowledge on sk96068, titled "Performance degradation on Security Gateway when port scan test is performed through Security Gateway".

Your feedback was:

------------------

Can this kernel parameter be also set on SMB appliances (e.g. 1490) ?

HristoGrigorov · ‎2019-02-27

Sorry, I somehow missed that.

The reply from RnD is that, although there is no empirical data on the performance impact, it should work the same way as on Gaia.

Timothy_Hall · ‎2019-02-27

Sounds right. The Dynamic Dispatcher in R77.30+ does cause some additional load on the SND/IRQ cores compared with just using the hash function, but this additional dispatcher overhead is worth it 99% of the time to keep individual workers from getting overwhelmed. I seriously doubt that adding the dport number to the worker hash calculation would cause any significant additional overhead on the dispatcher(s).

HristoGrigorov · ‎2019-02-28

So, I implemented this SK and the result looks promising:

ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | all | 699 | 1643
1 | Yes | all | 676 | 1601
2 | Yes | all | 660 | 1969

Note however that I assigned all 3 instances to work on all cores.

HristoGrigorov · ‎2019-02-28

Implementing sk96068 definitely changes the way traffic is distributed between the available cores. A surprise for me is that the peak number of connections is now consistently higher on CPU3 than on CPU1, whereas before it was the other way around.

Also, I have observed for a long time now that mostly CPU2 and CPU3 service software interrupts, and only very occasionally are there any on CPU0. I do not know whether that comes from the Linux kernel itself or from the fw dispatcher.

Timothy_Hall · ‎2019-03-01

The interrupts you see on CPU2 and CPU3 are probably hardware interrupts for frames coming up from the NICs into ring buffers. Try running sim affinity -l or cat /proc/interrupts to verify.

HristoGrigorov · ‎2019-03-01

# sim affinity -l
Multi queue interfaces: DMZ LAN WAN

# cat /proc/interrupts

146: 0 0 0 GIC al-eth-msix-mgmt@pci:0000:00:02.0
147: 37 3935375 0 GIC al-eth-rx-comp-0@pci:0000:00:02.0
148: 4 0 3941023 GIC al-eth-rx-comp-1@pci:0000:00:02.0
149: 9 22768295 0 GIC al-eth-rx-comp-2@pci:0000:00:02.0
150: 256 0 5090930 GIC al-eth-rx-comp-3@pci:0000:00:02.0
151: 20 18235802 0 GIC al-eth-tx-comp-0@pci:0000:00:02.0
152: 6 0 4893096 GIC al-eth-tx-comp-1@pci:0000:00:02.0
153: 0 5817309 0 GIC al-eth-tx-comp-2@pci:0000:00:02.0
154: 0 0 0 GIC al-eth-tx-comp-3@pci:0000:00:02.0
155: 0 0 0 GIC al-eth-msix-mgmt@pci:0000:00:00.0
156: 2250 18708550 0 GIC al-eth-rx-comp-0@pci:0000:00:00.0
157: 282 0 5828076 GIC al-eth-rx-comp-1@pci:0000:00:00.0
158: 283 6689854 0 GIC al-eth-rx-comp-2@pci:0000:00:00.0
159: 425 0 6237466 GIC al-eth-rx-comp-3@pci:0000:00:00.0
160: 14 21603625 0 GIC al-eth-tx-comp-0@pci:0000:00:00.0
161: 6 0 8865820 GIC al-eth-tx-comp-1@pci:0000:00:00.0
162: 3 9522446 0 GIC al-eth-tx-comp-2@pci:0000:00:00.0
163: 0 0 0 GIC al-eth-tx-comp-3@pci:0000:00:00.0
164: 0 0 0 GIC al-eth-msix-mgmt@pci:0000:00:01.0
165: 348 282392 0 GIC al-eth-rx-comp-0@pci:0000:00:01.0
166: 0 0 66074 GIC al-eth-rx-comp-1@pci:0000:00:01.0
167: 2 122825 0 GIC al-eth-rx-comp-2@pci:0000:00:01.0
168: 1 0 71558 GIC al-eth-rx-comp-3@pci:0000:00:01.0
169: 4 183485 0 GIC al-eth-tx-comp-0@pci:0000:00:01.0
170: 0 0 272584 GIC al-eth-tx-comp-1@pci:0000:00:01.0
171: 0 31456 0 GIC al-eth-tx-comp-2@pci:0000:00:01.0
172: 0 0 0 GIC al-eth-tx-comp-3@pci:0000:00:01.0

Here is something that may be interesting:

IPI2: 34126463 3968474 5174095 Rescheduling interrupts
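
To see at a glance which cores take the interrupts, the per-CPU columns can be summed. Below is a minimal Python sketch, assuming three CPU columns as in the output above. The function name per_cpu_totals is hypothetical, and the sample holds only the first three rx rows pasted above.

```python
def per_cpu_totals(text, num_cpus=3):
    """Sum interrupt counts per CPU from /proc/interrupts-style rows."""
    totals = [0] * num_cpus
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an "IRQ: counts..." row (e.g. the CPU header)
        counts = rest.split()[:num_cpus]
        # Skip rows that do not start with one numeric count per CPU.
        if len(counts) < num_cpus or not all(c.isdigit() for c in counts):
            continue
        for cpu, count in enumerate(counts):
            totals[cpu] += int(count)
    return totals


# Three rows copied from the output above.
SAMPLE = """\
147: 37 3935375 0 GIC al-eth-rx-comp-0@pci:0000:00:02.0
148: 4 0 3941023 GIC al-eth-rx-comp-1@pci:0000:00:02.0
149: 9 22768295 0 GIC al-eth-rx-comp-2@pci:0000:00:02.0
"""

totals = per_cpu_totals(SAMPLE)
print(totals)
```

On a live system the same function could be fed the contents of /proc/interrupts, provided a Python interpreter is available there, which is an assumption for an embedded appliance. In the sample rows the first core handles almost none of the NIC queue interrupts.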

G_W_Albrecht · ‎2019-03-05

I am currently working on your solution request SK-SR# 3-0699806904, titled:"Performance degradation on Security Gateway when port scan test is performed through Security Gateway" under ID:sk96068.

Your feedback was:
------------------
Can this kernel parameter be also set on SMB appliances (e.g. 1490) ?

------------------

I have confirmed that the kernel parameter does apply to SMB appliances. The sk has been updated based on your feedback.

HristoGrigorov · ‎2019-03-05

Wonderful! Thank you for taking the time to do it.

G_W_Albrecht · ‎2019-03-05

In fact, it only took the 8 seconds needed to write the line into the feedback field 😉

HristoGrigorov · ‎2019-03-06

I am wondering what took you so long.

G_W_Albrecht · ‎2019-03-07

Explanation: first type it into an Outlook note, copy it, paste it into the feedback field, prove that I am not a robot, and voilà!

I would have saved 3-4 seconds by typing directly into the feedback field, but unsaved content is always prone to vanishing very suddenly.

HristoGrigorov · ‎2019-03-07

You are lucky that they do not require you to solve a polynomial equation to prove you are not a robot.

Anyway, I am more and more satisfied with how CoreXL works with this SK applied. Nearly perfect connection distribution:

ID | Active | CPU | Connections | Peak
----------------------------------------------
0 | Yes | 0 | 736 | 1970
1 | Yes | 1 | 752 | 1974
2 | Yes | 2 | 793 | 1981

I am going to leave it permanently enabled.
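
As a rough check on "nearly perfect", the ratio between the busiest and the quietest instance can be compared for the first output in this thread and the one above. This is a minimal Python sketch with a hypothetical imbalance helper. Both outputs are single snapshots taken on different days, so the comparison is only indicative.

```python
def imbalance(connections):
    """Ratio of the busiest instance's connection count to the quietest's."""
    return max(connections) / min(connections)


before = [1129, 706, 529]  # 'Connections' column, first output in the thread
after = [736, 752, 793]    # same column with fwmultik_hash_use_dport enabled

print(f"before: {imbalance(before):.2f}")
print(f"after:  {imbalance(after):.2f}")
```

A ratio of 1.0 would mean perfectly even distribution. The earlier snapshot is above 2, while the later one is close to 1.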

G_W_Albrecht · ‎2019-03-07

So what are you waiting for? Please condense your experience into a document, e.g. "SMB CoreXL instances are overloaded (sk96068)"!

HristoGrigorov · ‎2019-03-07

Hehe, well... It is all written in Tim's book actually. It is an awesome read. The only problem is that you are never sure what applies to Gaia Embedded and what does not.