Wyatt_Zacharias
Explorer

IPv6 BGP peering crashing routed daemon

I'm currently setting up BGP peering over IPv6 with our ISP, who is sending us the full IPv6 route table (approx. 61k routes). The BGP session establishes and I receive route updates from the ISP, but within 60 seconds of the session establishing, the routed process seems to crash and the BGP session resets.

Looking in /var/log/routed_messages it's full of errors saying: “Dec 19 14:03:39 ERROR:   KRT SEND ADD    2620:1c0:61::   mask ffff:ffff:ffff:: router 2606:xxxx:xxxx::5: Cannot allocate memory” 

As far as I know there's no tuning to be done for memory allocation, so is this a bug in routed? This is running on a 5400 gateway with 8GB RAM, of which usually only 40-50% is consumed.

3 Replies
PhoneBoy
Admin

You should definitely open a TAC case for this.

Wyatt_Zacharias
Explorer

Alright, for anyone else who comes across this issue, I believe I've found the correct solution. Annoyingly, TAC has been running in circles and placing very little weight on the error messages that their own daemon is producing. The key piece of information was the prefix of the routed error message, "KRT SEND ADD", which I realised refers to the kernel route table. Working on the assumption that this was a kernel issue rather than a routed issue, I found similar reports from other IPv6 users who were adding a large number of routes to the kernel.

I confirmed this by running the following command while routed was running and attempting to add routes from the BGP session:

[Expert@gateway:0]# ip -6 route add 2001:db8::1 via 2606:1234::1

RTNETLINK answers: Cannot allocate memory

Manually adding a route to the kernel produces the same error that is logged in routed_messages.

Apparently the Linux kernel ships with a default of net.ipv6.route.max_size = 4096, which is of course far too small to fit the global IPv6 table (over 61,000 prefixes as of 2019). I raised the limit to match the IPv4 limit of 4194304, and routed and my BGP session have now been up for the last 48 hours with no issues.
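For anyone wanting to check their own box, here is a rough sketch of how the limit could be inspected and compared against the expected table size. The helper name route_limit_too_small and the 61000 figure are assumptions for illustration only (the figure is simply the approximate prefix count mentioned above), and the comparison is a rule of thumb rather than an exact statement of how the kernel accounts for IPv6 routes. Whether a sysctl change survives a reboot depends on the platform, so treat the suggested command as a runtime change only.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Check Point tooling):
# succeeds when the expected route count meets or exceeds the kernel limit.
route_limit_too_small() {
    limit="$1"     # current value of net.ipv6.route.max_size
    routes="$2"    # number of IPv6 routes you expect to install
    [ "$routes" -ge "$limit" ]
}

# Standard procfs location of the sysctl on Linux.
limit_file=/proc/sys/net/ipv6/route/max_size

if [ -r "$limit_file" ]; then
    current=$(cat "$limit_file")
    echo "net.ipv6.route.max_size = $current"

    # 61000 is an assumed figure taken from the post above.
    if route_limit_too_small "$current" 61000; then
        echo "Limit is too small for a full IPv6 table."
        echo "To raise it at runtime (as root):"
        echo "  sysctl -w net.ipv6.route.max_size=4194304"
    fi
fi
```

The script only prints the suggested sysctl command rather than running it, so it is safe to run on a production gateway before deciding on a change.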

Honestly, I'm a bit surprised that something as simple as a kernel parameter limit went completely unchecked by development. Clearly they haven't done any real-world testing of their IPv6 compatibility in this scenario, as it would have been caught right away.

PhoneBoy
Admin

At least the fix is relatively simple :)

We should probably adjust the default setting for this kernel value, you're right.

Thank you for sharing.
