ChoiYunSoo
Participant

I have a question about the LACP (8023AD) Bond interface.

A customer recently experienced a service delay.

I think it was caused by a structural problem with the bonding interface.

 

The customer firewall structure is as follows.

* Version: R80.20 Take161

* HA: VRRP

* Bonding mode: 8023AD

* xmit-hash-policy: Layer2

* Interface: 1Gbps Fiber

Finally, the firewall has two bonds, one internal and one external: the upper two interfaces form one bond and the lower two interfaces form the other.

 

The service delay occurred when about 1.5 Gbps of traffic came in.

At the same time, ping loss also occurred when I ran a ping check against the firewall's external interface.

 

While looking for anything unusual, I found that most of the TX traffic was being handled by only one interface.

However, when I checked cpview history for the time of the service delay, it showed that traffic was handled by the other interface once the throughput of the first interface was exceeded.

 

I didn't think this was the root cause, but I changed xmit-hash-policy to Layer3+4 to improve traffic distribution.

(refer to sk111823)

When I monitored again, traffic was well distributed, and even with close to 2 Gbps of traffic there was no service delay or ping loss.

 

From the above symptoms alone, the service delay seems to have been caused by improper traffic distribution when more than 1 Gbps of traffic entered the firewall.

 

With the Layer2 policy, when the bandwidth of one interface is full, does the other interface handle the additional traffic?

Or is the choice purely MAC-based, so that traffic keeps going to one interface even when its bandwidth is full?

 

And what other factors could have caused the service delay?

Note that no network configuration changes have been made.

 

Thanks for reading this long post.

6 Replies
Timothy_Hall
Champion

For 802.3ad bonds on a firewall, the Layer3+4 distribution setting should ALWAYS be used. This is because practically all traffic utilizing a firewall's bonded interface is transiting the firewall, and not destined for the firewall's interface IP itself. If the Layer 2 setting is used, all traffic will congregate on only one physical link, as the destination MAC address for all traffic is always the same (that of the firewall), and in most architectures the source MAC address will always be the same as well (that of a core router or other Layer 3 routing device adjacent to the firewall). As a result, the same physical interface will always be chosen for all traffic by the transmit hash function, no matter how high the traffic load goes. When the load exceeds the physical link speed of the interface, drops will start to occur; the bond will not try to put more traffic on a less-congested interface.
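
To make the difference between the two policies concrete, here is a minimal Python sketch. It is loosely modeled on the simplified formulas in older Linux bonding driver documentation (XOR of the MAC addresses for Layer2; XOR of the ports and the low 16 bits of the IP addresses for Layer3+4). Treat it as an assumption about the general mechanism, not the exact code Gaia runs. The MAC addresses, IPs, and ports are made up for illustration.

```python
import ipaddress

def layer2_link(src_mac: str, dst_mac: str, n_links: int) -> int:
    """Simplified Layer2 policy: XOR the two MAC addresses, modulo link count."""
    src = int(src_mac.replace(":", ""), 16)
    dst = int(dst_mac.replace(":", ""), 16)
    return (src ^ dst) % n_links

def layer34_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                 n_links: int) -> int:
    """Simplified Layer3+4 policy: XOR ports and low 16 bits of the IPs."""
    s = int(ipaddress.ip_address(src_ip))
    d = int(ipaddress.ip_address(dst_ip))
    h = (src_port ^ dst_port) ^ ((s ^ d) & 0xFFFF)
    return h % n_links

# Hypothetical addresses: the firewall and its adjacent router.
FW_MAC = "00:1c:7f:00:00:01"
ROUTER_MAC = "00:1b:21:00:00:02"

# Hypothetical transit flows: (src_ip, dst_ip, src_port, dst_port).
FLOWS = [
    ("10.0.0.1", "192.0.2.10", 40001, 443),
    ("10.0.0.2", "192.0.2.10", 40002, 443),
    ("10.0.0.3", "192.0.2.10", 40004, 443),
    ("10.0.0.4", "192.0.2.10", 40007, 80),
]

# Every transit frame carries the same MAC pair, so Layer2 always picks one link.
l2_choices = [layer2_link(FW_MAC, ROUTER_MAC, 2) for _ in FLOWS]
# Layer3+4 sees different IPs and ports per flow, so the choices vary.
l34_choices = [layer34_link(*flow, 2) for flow in FLOWS]

print("Layer2:  ", l2_choices)
print("Layer3+4:", l34_choices)
```

Note that neither function takes link utilization as an input, which is the point made above: the choice depends only on header fields.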

Also keep in mind that the hash on the firewall is for the TRANSMIT side of traffic only.  If there is a serious imbalance on the reception side of the physical interfaces for a firewall bond, you need to check the transmit hash setting on the other device and make sure it is set to Layer3+4 as well.  While I do not believe it is strictly required to set the identical transmit hash policy setting on both sides, it is strongly recommended to help keep the physical interface utilization as balanced as possible.  While the balancing will never be perfect, my own rule of thumb is that if there is more than a 25% imbalance between the bonded physical interfaces shown by the netstat -ni "OK" counters, some investigation is warranted.
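
One way to apply the 25% rule of thumb is sketched below. The reply does not define how the imbalance is measured, so this sketch assumes it means the gap between the busiest and quietest bond member as a fraction of the busiest. The interface names and counter values are hypothetical; in practice they would be read from the "OK" columns of netstat -ni.

```python
from typing import Mapping

def imbalance(ok_counters: Mapping[str, int]) -> float:
    """Gap between busiest and quietest interface, as a fraction of the busiest.

    This definition is an assumption; other definitions are possible.
    """
    busiest = max(ok_counters.values())
    quietest = min(ok_counters.values())
    if busiest == 0:
        return 0.0
    return (busiest - quietest) / busiest

def needs_investigation(ok_counters: Mapping[str, int],
                        threshold: float = 0.25) -> bool:
    """True when the imbalance exceeds the rule-of-thumb threshold."""
    return imbalance(ok_counters) > threshold

# Hypothetical TX-OK counters for the two members of one bond.
tx_ok = {"eth1": 9_000_000, "eth2": 1_000_000}
print(f"imbalance = {imbalance(tx_ok):.0%}, investigate = {needs_investigation(tx_ok)}")
```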

This was all covered on pages 70-72 of my book.

New 2021 IPS/AV/ABOT Self-Guided Video Series
now available at http://www.maxpowerfirewalls.com
ChoiYunSoo
Participant

Dear Timothy Hall

I've checked your answer.

Let me summarize what you were trying to convey to me.

1. Even if the traffic exceeds the bandwidth, the traffic flows to one interface because of the transmit hash.

 

2. The distribution logic does not take bandwidth into account.

 

But there is something I don't understand.

When I checked CPVIEW History, the other interface was handling traffic once the bandwidth of the first interface was exceeded.

Also, the netstat -i command shows no errors or drops.

 

I don't know how to interpret this part.

 

 

Timothy_Hall
Champion

1. Even if the traffic exceeds the bandwidth, the traffic flows to one interface because of the transmit hash.

Correct, it is not adaptive based on utilization. It basically computes the hash according to the setting and sends the frame out the selected physical interface.

2. The distribution logic does not take bandwidth into account.

Yes, see above.

When I checked CPVIEW History, the other interface was handling traffic once the bandwidth of the first interface was exceeded.

Yes, even if one physical interface is saturated some traffic will still happen to be selected for the other physical interface(s) by the hash function.  

Also, the netstat -i command shows no errors or drops.

Did you check the network error/drop counters on the peer device at the other end of the bond? Most of the time drops/misses occur on the receive side, not the transmit side. So the firewall may have been dutifully transmitting traffic out the interface at nearly 1Gbps, but the receiver may well not have been able to keep up. The resulting frame loss probably caused the TCP streams utilizing the saturated link to back off, thus decreasing performance. Most 1Gbps interfaces start to run out of gas on the receive side at about 950Mbps and start dropping/missing frames, especially if there are a lot of small or minimum-size frames.
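
As a rough illustration of the arithmetic, the sketch below treats the hash as giving each link a fixed share of the traffic regardless of load, and uses the roughly 950Mbps figure above as a practical per-link ceiling. The 90/10 split is a hypothetical value, not a measurement from this thread, and the model ignores TCP back-off.

```python
LINK_CEILING_MBPS = 950.0  # approximate practical ceiling mentioned above

def offered_per_link(total_mbps, shares):
    """Load the hash places on each link; the shares do not change with load."""
    return [total_mbps * share for share in shares]

def excess_per_link(total_mbps, shares, ceiling_mbps=LINK_CEILING_MBPS):
    """Offered load beyond the ceiling on each link (0 if the link has headroom)."""
    return [max(0.0, load - ceiling_mbps)
            for load in offered_per_link(total_mbps, shares)]

total = 1500.0  # the load at which the original poster saw delays

# Hypothetical skewed split, as might happen with a Layer2 hash.
skewed = excess_per_link(total, [0.9, 0.1])
# Even split, the best case for a well-distributed Layer3+4 hash.
even = excess_per_link(total, [0.5, 0.5])

print("skewed split excess (Mbps):", skewed)
print("even split excess (Mbps):  ", even)
```

With the skewed split, one link is offered far more than it can carry while the other sits mostly idle; with the even split, the same 1.5Gbps fits on both links.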

ChoiYunSoo
Participant

Thank you for the reply.

It helped a lot.

Lastly, could you point me to the sk or documentation you were referring to?

 

RamGuy239
Advisor

Hi, @Timothy_Hall 

I will just chime in here with a few questions, as I'm not that knowledgeable when it comes to LACP algorithms in general. If what you are saying about "For 802.3ad bonds on a firewall, the Layer3+4 distribution setting should ALWAYS be used." is correct, why is Layer2 the default setting when enabling LACP?

I've also noticed this:
PMTR-60804 -  Bond interface in XOR mode or 802.3AD (LACP) mode may experience suboptimal performance, if on the Bond interface the Transmit Hash Policy is configured to "Layer 3+4" and Multi-Queue is enabled.
To resolve: Configure the Transmit Hash Policy to the default "Layer 2".


And we have articles like sk169977 where issues recently got fixed.


As a result of all of this, I've begun to simply opt for Layer2 to avoid issues. But from what you are saying Layer3+4 should be the default and preferred option? 

Timothy_Hall
Champion

Not sure why Layer 2 distribution is the default (lower overhead perhaps?), but you can get away with it as long as none of your physical interfaces approaches the saturation point. However, as observed earlier in this thread, once that happens traffic performance will be adversely affected, and LACP will not change its behavior when this occurs and will happily keep trying to dump traffic out of the saturated interface.

Yes I recall there was some kind of problematic interaction between Multi-Queue and LACP (PMTR-60804), but that has been fixed in all recent Jumbo HFAs.  

As I said earlier, Layer 2 mode is fine until one of your physical interfaces gets saturated due to an imbalance. But if that is happening frequently and a physical interface dies or gets unplugged, the remaining interface would be heavily oversubscribed and you'd definitely notice that level of degradation. So maybe you are actually better off using Layer 2, so you'll start to see performance issues before they become critical due to a physical interface failure, and you'll know you need to add more physical interfaces to the bond.
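
The oversubscription point can be checked with simple arithmetic. This sketch assumes all surviving links share the load perfectly, which is the best case; the 1000Mbps link speed matches the 1Gbps interfaces in this thread, and the traffic figures are taken from the original post.

```python
def utilization_after_failure(total_mbps: float, n_links: int,
                              link_speed_mbps: float = 1000.0) -> float:
    """Best-case utilization of the surviving links after one link fails.

    Returns a fraction of capacity; values above 1.0 mean oversubscription.
    """
    survivors = n_links - 1
    if survivors < 1:
        raise ValueError("a bond needs at least two links to survive a failure")
    return total_mbps / (survivors * link_speed_mbps)

# 1.5Gbps on a two-link bond: the single survivor is offered 150% of its capacity.
print(utilization_after_failure(1500.0, n_links=2))
# The same load on a three-link bond leaves the two survivors at 75%.
print(utilization_after_failure(1500.0, n_links=3))
```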
