Jarvis_Lin1
Contributor

Firewall priority queues setting

Hi Experts,

I have a customer PoC with a high-latency issue.

Many users access files over CIFS at the same time (usually at the beginning of the work day). When firewall throughput/connections climb to around 1.5 Gbps / 50K connections, latency increases from 2 ms to over 40 ms.

I would like to know whether this setting could be tuned for our situation:

sk105762

I tried to enable this feature on R80.10; could someone please help verify that the format is correct?

Thanks for your help!

My questions are summarized below; please share your experience with high-latency environments.

Many thanks!

 

1) As sk105762 mentions, PrioQ is activated only when a CPU is overloaded.

Does that mean the PrioQ mechanism activates as soon as even one CPU core reaches 100% utilization?

What is the exact condition that triggers this feature?

2) Can this feature reduce network latency in scenarios such as CIFS file sharing (LAN access)?

3) Could someone verify whether my format is correct or whether it needs to be modified?

Jarvis.

16 Replies
Jerry
Mentor

sim affinity as a solution?
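For reference, a quick way to see the current affinity layout before changing anything (assuming SecureXL is enabled; sim affinity only governs the SND/IRQ cores, and output varies by version) is roughly:

# Show which SND/IRQ core each interface is currently assigned to (SecureXL)
sim affinity -l

# Show the full core layout: interfaces, kernel instances and daemons
fw ctl affinity -l -r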

Jerry
Jarvis_Lin1
Contributor

Hi Jerry,

I am using the default settings for the SNDs (4 cores with SMT).

CPU load is around 30-40% during working hours.

G_W_Albrecht
Legend

If the CIFS access is to internal resources only, you could exclude this traffic from TP (Threat Prevention)...

CCSE CCTE CCSM SMB Specialist
Jarvis_Lin1
Contributor

Hi Günther,

I only use the FW blade.

JozkoMrkvicka
Mentor

You can try moving the rule that allows this traffic to the top of the rulebase so that it can be matched and accelerated by SecureXL.
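If it helps, a rough way to check whether the CIFS connections actually end up accelerated after moving the rule is shown below; the server IP is just a placeholder for your file server.

# Overall SecureXL status, including whether accept templates are disabled past a given rule
fwaccel stat

# List the accelerated connections involving the file server (placeholder IP)
fwaccel conns | grep 10.1.1.10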

Kind regards,
Jozko Mrkvicka
Jarvis_Lin1
Contributor

Hi Jozko,

Thanks for your advice, I will try it.

Timothy_Hall
Champion

Modifying the Firewall Priority Queue settings is not the proper way to accomplish what you are trying to do; this feature is only intended to ensure that critical firewall control traffic is not inordinately delayed by heavy user traffic flows when a Firewall Worker core reaches 100% utilization.  While Priority Queues are enabled by default in R80.10, they do not start actively prioritizing traffic until a Firewall Worker reaches 100% utilization, and only for that overloaded Firewall Worker.   All other non-overloaded Firewall Workers are still doing FIFO.
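For completeness, the Priority Queues mode and the per-Worker load that would trigger it can be checked from expert mode along these lines (the prioq command is the one referenced in sk105762; exact output varies by version):

# View or change the Firewall Priority Queues mode (per sk105762)
fw ctl multik prioq

# Per-Firewall-Worker connection and load distribution; PrioQ only engages on an instance at 100%
fw ctl multik stat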

You probably just need to tune the firewall to reduce the latency. Please provide the output of the following commands and I can advise further:

enabled_blades

fwaccel stat
fwaccel stats -s
grep -c ^processor /proc/cpuinfo

/sbin/cpuinfo
fw ctl affinity -l -r

netstat -ni

fw ctl multik stat

The most likely cause of high latency during busy times is the APCL/URLF policy being forced to inspect high-speed LAN-to-LAN traffic due to inappropriate use of "Any" in the Destination column of the APCL/URLF policy layer, or the object Internet not being calculated correctly because of incomplete or inaccurate firewall topology definitions.

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Jarvis_Lin1
Contributor

Hi Timothy,

I ran these commands and saved the output to a txt file; please refer to the attached file, thanks.

Timothy_Hall
Champion

[Expert@CP15600:0]# /sbin/cpuinfo
HyperThreading=enabled

[Expert@CP15600:0]# enabled_blades
fw SSL_INSPECT

[Expert@CP15600:0]# fwaccel stats -s
Accelerated conns/Total conns : 16140/17125 (94%)
Delayed conns/(Accelerated conns + PXL conns) : 2871/16837 (17%)
Accelerated pkts/Total pkts   : 28029939/35118855 (79%)
F2Fed pkts/Total pkts   : 411002/35118855 (1%)
PXL pkts/Total pkts   : 6677914/35118855 (19%)
QXL pkts/Total pkts   : 0/35118855 (0%)

You have 16 physical cores (32 via SMT) and only 4 SND/IRQ cores, which is the default.  As shown above, roughly 80% of the traffic crossing the firewall is being fully accelerated by SecureXL thanks to the limited number of blades enabled, which is great, but that fully accelerated 80% can only be processed by the 4 SND/IRQ cores, which is almost certainly causing the bottleneck you are seeing.

I'd suggest disabling SMT/Hyperthreading via cpconfig which will drop you back to 16 physical cores, then assigning 10 CoreXL kernel instances via cpconfig which will allocate 6 discrete (non-hyperthreaded) SND/IRQ cores.  Hyperthreading is actually hurting your performance in this particular situation; Multi-Queue is probably not necessary either and imposes additional overhead, but leave it enabled for now.

Other than that everything else looks good.
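Once that change has been made, something like the following should confirm the new split (run from expert mode; exact output differs between versions):

# Should now report 16 cores with SMT/Hyperthreading disabled
grep -c ^processor /proc/cpuinfo

# Should list 10 Firewall Worker (CoreXL kernel) instances
fw ctl multik stat

# Interfaces should be spread across the 6 remaining SND/IRQ cores
fw ctl affinity -l -r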

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Jarvis_Lin1
Contributor

Hi Timothy,

Thanks for your analysis.

Two days ago I set up 6 cores for the 2 physical NICs (012, 345), enabled MQ, and disabled SMT on another appliance.

I will increase the SNDs from 6 cores to 10 cores as you suggest.

Is increasing the rx-ringsize recommended, or should I keep the default value?

I will replace the appliance next week and hope everything goes well.

Timothy_Hall
Champion

No need to adjust the ring buffer size, as that is a last resort and you have zero RX-DRPs anyway.  Indiscriminately cranking up the ring buffer size can cause a nasty performance-draining effect known as BufferBloat between interfaces with widely disparate bandwidth available, at least with the typical FIFO processing of interface ring buffers.
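For anyone reading along, the counters and sizes mentioned here can be checked before touching anything; a minimal sketch, with eth1 as an example interface name:

# RX-DRP per interface; a steadily increasing count is the signal to investigate further
netstat -ni

# Current versus maximum ring buffer sizes for one interface
ethtool -g eth1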

Some Advanced Queue Management strategies such as Controlled Delay (CoDel) can be very effective in mitigating the effects of Bufferbloat on Linux systems with the 3.x kernel. 

However for the first time publicly I present to all of you this very exciting (at least to me) screenshot from R80.20EA in my lab:

CoDel appears to be present in the updated R80.20EA security gateway kernel, and it is enabled by default as shown by the tc -s qdisc show command!  Looks like my days of issuing dire warnings about increasing firewall ring buffer sizes are numbered...
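For reference, the check mentioned above is the standard Linux traffic-control query; on a gateway where fq_codel is active it looks roughly like this (eth0 is an example interface):

# Show the active queueing discipline and its statistics
tc -s qdisc show dev eth0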

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Jarvis_Lin1
Contributor

Hi Timothy,

Excuse me, I have another question...

In this case, if PXL pkts/Total pkts were 80%, how could I tune that?

By the way, thanks for the information; I have learned a lot from it.

PhoneBoy
Admin

I assume this is a different gateway you're asking about?

PXL is not bad per se, but it does mean that traffic is being subjected to blades like App Control, URL Filtering, IPS, etc.

If you were trying to get a little more performance out of a gateway, you might exclude some traffic from these blades through configuration.

JozkoMrkvicka
Mentor

Hi Tim,

So you are saying that increasing the RX/TX ring sizes to the maximum (4096) can cause serious issues? Even if there is a valid reason for increasing them (dropped packets)?

We recently increased the RX/TX ring sizes based on a CP recommendation.

Kind regards,
Jozko Mrkvicka
Timothy_Hall
Champion

There is a common mantra that packet loss should be avoided at all cost, yet TCP's congestion control algorithm relies on timely dropping of packets when the network is overloaded so that all the different TCP-based connections trying to utilize the congested network link can settle at a stable transfer speed that is as fast as the network will allow.  Increasing ring buffer sizes can increase jitter to the point it incurs a kind of "network choppiness" that causes all the TCP streams traversing the firewall to constantly hunt for a stable transfer speed, and they all end up backing off far more than they should in these types of situations:

1) Step down from higher-speed to lower-speed interface (i.e. 10Gbps to 1Gbps) and the lower-speed link is fully utilized

2) Traffic from multiple interfaces all trying to pile into a single fully-utilized interface

3) Traffic from two equal-speed interfaces (i.e. 1Gbps) running with high utilization, yet upstream of one of the interfaces there is substantially less bandwidth, like a 100Mbps Internet connection

Note that two equal-speed interfaces cannot have these issues occur assuming there is truly the amount of bandwidth available upstream of the two interfaces that matches their link speed.  In my book I tell the sordid tale of "Screamer" and "Slowpoke", two TCP-based streams competing for a fully-utilized firewall interface that has had its ring buffers increased to the maximum size, thus causing jitter to increase by 16X.  Increasing ring buffer sizes is a last resort, generally more SND/IRQ cores should be allocated first and then perhaps use Multi-Queue.  I'd suggest reading the Wikipedia article about Bufferbloat.
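As a rough sketch of that order of operations on R80.10 (the cpmq utility is what I recall being documented for Multi-Queue; a reboot is required after changing Multi-Queue or CoreXL settings):

# Check which interfaces currently have Multi-Queue enabled
cpmq get

# Interactive tool to enable Multi-Queue on additional busy interfaces, only after more SND/IRQ cores have been allocated via cpconfig
cpmq set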

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
JozkoMrkvicka
Mentor

Thank you for the excellent explanation!

Time to buy your book.

Kind regards,
Jozko Mrkvicka