Timothy_Hall
Hi Hugo,

You are reporting performance problems in the later releases and surmising that the RX ring buffer size is the cause. Are you seeing an RX-DRP rate in excess of 0.1% on any interfaces?  Unfortunately, in Gaia 3.10 RX-DRP can now also be incremented upon receipt of an undesirable Ethertype (like IPv6) or an improperly pruned VLAN tag (what I call "trash traffic" that we can't handle anyway), not just by a full ring buffer, which causes the loss of frames we actually want to process.  In Gaia 2.6.18 this trash traffic was silently discarded and did not increment RX-DRP.
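
A quick way to eyeball those counters from expert mode (eth0 below is just an example interface name):

# Per-interface counters; the RX-DRP column is the one of interest
netstat -ni

# The same drop counter can also be read directly from sysfs
cat /sys/class/net/eth0/statistics/rx_dropped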

To compute the true RX-DRP rate for ring buffer misses you'll have to do some further investigation with ethtool -S.  See sk166424: Number of RX packet drops on interfaces increases on a Security Gateway R80.30 and higher ....  As long as the true RX-DRP rate is below 0.1% you shouldn't worry about it.  I know this can be difficult to accept, but not all packet loss is bad; TCP's congestion control algorithm is actually counting on it.
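
A rough sketch of what that investigation looks like (counter names vary from driver to driver, so adjust the grep pattern for your NIC, and the numbers in the comment are made up purely for illustration):

# Driver-level statistics; look for ring-buffer related counters such as
# rx_missed_errors, rx_no_buffer_count or rx_fifo_errors (names differ per driver)
ethtool -S eth0 | grep -iE 'drop|miss|no_buffer|fifo'

# Example with hypothetical numbers: 1,500 genuine ring-buffer misses out of
# 2,000,000 received frames is 0.075%, which is under the 0.1% threshold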

On every system I've seen, the default ring buffer sizes are: 1Gbit (256), 10Gbit (512), 40-100Gbit (1024).  Are you sure you are not looking at the TX ring buffer setting?  Those are sometimes smaller.  Generally it is not a good idea to increase ring buffer sizes beyond the default unless true RX-DRPs are >0.1% (and you have allocated all the SND cores you can), as doing so will increase jitter, possibly confusing TCP's congestion control algorithm and causing Bufferbloat, which actually makes performance worse under heavy load.
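
For reference, you can check the current versus maximum supported ring sizes like this (eth0 is again just an example, and the Gaia clish command is shown from memory, so verify the syntax against your release before using it):

# View current vs. maximum supported RX/TX ring buffer sizes
ethtool -g eth0

# Only if the true RX-DRP rate is >0.1% and all available SND cores are already
# allocated would you enlarge the ring, e.g. persistently via Gaia clish:
# clish -c "set interface eth0 rx-ringsize 1024"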

As to the ring buffer size of 9368 for the hv_netvsc driver in Azure reported by @the_rock, it is my impression that Azure (or perhaps the current kernel version used by Gaia) does not support vRSS which means multi-queue is not supported either.  Therefore no matter how many SND cores you have only one of them can empty a particular interface's ring buffer.  I'd surmise that the strangely large 9368 RX ring buffer size is an attempt to compensate for this limitation, and it may even increase itself automatically as load demands.  This has already been discussed to some degree here: distribution SNDs in hyper-v environment
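
If you want to check the single-queue situation yourself, something like the following is a reasonable starting point (eth0 is an example name; hv_netvsc may not report channels the way a physical driver does, and on a Hyper-V/Azure VM the interface may not even appear in /proc/interrupts since interrupts arrive over VMBus):

# Number of RX/TX/combined queues (channels) the driver exposes
ethtool -l eth0

# How many IRQs, and therefore queues, the interface is actually using
grep eth0 /proc/interrupts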

But it may not matter much, since traditionally the ring buffer acted as a bit of a "shock absorber" to soak up a flood of frames making the transition across the bus from a hardware buffer (on the NIC card itself) to a software-driven RX ring buffer in RAM.  In Azure everything is software and there is no physical NIC, at least none that is exposed to the VM.  That is why there can probably never be interface errors such as RX-ERR, or possibly even RX-OVR, so the size of the ring buffer may not really matter either.

Once again, if the "true" RX-DRP rate is <0.1%, I'd leave the RX ring buffer size at the default for physical interfaces.  Read the Bufferbloat article for a good explanation of why.

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com