Is it normal for routed daemon to use 3GB of RAM for 12k OSPF routes?
Hello everyone,
One of our customers is constantly having issues with failovers due to a routed PNOTE. We noticed that they have around 12k OSPF routes. After the failover, we see that the routed daemon on the newly active node already uses 3GB of memory. Here's the output from the currently active node after failover:
[Global] ch02-01> show routed memory
1_01:
Total Memory Usage: 32 MB
Core: 3 MB
BGP: 360 B
MFC: 236 B
OSPF: 28 MB
Policy: 2 KB
2_01:
Total Memory Usage: 3 GB
Core: 3 MB
BGP: 360 B
MFC: 236 B
OSPF: 3 GB
Policy: 2 KB
[Global] MAB-CL1-ch02-01> show routed resources
1_01:
Total Uptime : 3 hrs 9 mins 44 secs
Total User Time : 3 mins 2 secs
Total System Time : 13 secs
Page Faults : 85
Page Reclaims : 16070
Total Swaps: : 0
Voluntary Context Switches : 624724
Involuntary Context Switches : 70850
2_01:
Total Uptime : 21 days 1 hr 49 mins 52 secs
Total User Time : 12 hrs 3 mins 26 secs
Total System Time : 27 mins 46 secs
Page Faults : 807
Page Reclaims : 1023538
Total Swaps: : 0
Voluntary Context Switches : 63305978
Involuntary Context Switches : 12788009
At first, we thought there was a memory leak, but there are no coredumps or anything else that points to one. TAC agreed with us on this as well. What I'm wondering is whether it's normal for 12k OSPF routes to result in 3GB of memory use, or whether we should look for an underlying issue that produces this symptom.
If this is normal, how much would route aggregation help with the situation (before we go down the path of adding memory)?
As always, thank you very much.
Cheers!
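To compare the two members more precisely, the `show routed memory` output above can be converted to bytes. The following is a minimal sketch, not a Check Point tool: the function name `parse_routed_memory`, the assumption that the CLI reports binary units (1 KB = 1024 B), and the rounded figure of 12,000 routes are all assumptions made for illustration. The CLI values are themselves rounded, so the result is only a rough estimate.

```python
import re

# Assumption: the CLI uses binary units (1 KB = 1024 B).
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_routed_memory(text):
    """Return {member: {component: bytes}} from pasted 'show routed memory' text."""
    result = {}
    member = None
    for raw in text.splitlines():
        line = raw.strip()
        # A member header looks like "1_01:" in the output above.
        header = re.fullmatch(r"(\d+_\d+):", line)
        if header:
            member = header.group(1)
            result[member] = {}
            continue
        # A component line looks like "OSPF: 28 MB".
        entry = re.fullmatch(r"(.+?):\s*(\d+)\s*(B|KB|MB|GB)", line)
        if entry and member is not None:
            name, value, unit = entry.groups()
            result[member][name] = int(value) * UNITS[unit]
    return result


SAMPLE = """\
1_01:
Total Memory Usage: 32 MB
Core: 3 MB
BGP: 360 B
MFC: 236 B
OSPF: 28 MB
Policy: 2 KB
2_01:
Total Memory Usage: 3 GB
Core: 3 MB
BGP: 360 B
MFC: 236 B
OSPF: 3 GB
Policy: 2 KB
"""

ROUTES = 12_000  # approximate route count quoted in the question

usage = parse_routed_memory(SAMPLE)
for member, parts in usage.items():
    per_route = parts["OSPF"] / ROUTES
    print(f"{member}: OSPF uses {parts['OSPF']} bytes, about {per_route:.0f} bytes per route")
```

If both members hold the same OSPF routes (an assumption), the OSPF figures differ by roughly a factor of one hundred, which is hard to explain by route count alone.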
What version and Jumbo Hotfix take are you running? (cpinfo -y all)
Maybe worth checking this:
Best Practice - Limit OSPF areas to about 50 routers based on the limitations of OSPF (traffic overhead, table size, convergence, and so on).
If you like this post please give a thumbs up(kudo)! 🙂
@Lesley Thank you for the reply. The version is R81.10 JHF Take 156. The behavior has been seen since Take 95. I'm not sure we will have any control over the OSPF routers, though.
Are we sure there are no core dumps? If you run an HCP health check, is there anything interesting in there?
PRJ-56903 | Routing | The ROUTED daemon may exit with a core dump during a BGP or OSPF restart.
You state that it has now dropped from 3GB to 30MB. Does it slowly increase again and then crash at around 3GB?
Do you see anything in your monitoring tool (if you use one)? If memory increases slowly, there may be a memory leak.
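One way to check for slow growth without a monitoring tool is to sample the resident memory of the routed process periodically and fit a trend line. The sketch below is an illustration only: it assumes the samples are collected some other way (for example by reading `VmRSS` from `/proc/<pid>/status` on Linux at intervals), and the helper names `parse_vmrss_kb` and `growth_per_hour` are hypothetical.

```python
def parse_vmrss_kb(status_text):
    """Extract VmRSS (in kB) from the text of /proc/<pid>/status on Linux."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def growth_per_hour(samples):
    """Least-squares slope of (seconds, bytes) samples, in bytes per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one timestamp")
    return num / den * 3600


# Made-up samples for illustration: (seconds since start, resident bytes).
example = [(0, 40_000_000), (3600, 41_000_000), (7200, 42_000_000)]
print(f"trend: {growth_per_hour(example):.0f} bytes per hour")
```

A steadily positive slope over days, with no matching growth in the route count, would support the leak theory; a flat trend with sudden jumps would point more toward topology events.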
Yes, triple checked, hoping that a simple update would resolve the issue...
How stable is the overall OSPF topology?
How many neighbors / what does it look like?
Is graceful restart used?
- Graceful restart is on.
- The output of 'show ospf neighbor' looks like this:
show ospf neighbor
Instance: default
Neighbor state flag: G - graceful restart
Neighbor ID  Pri  State  Dead  Address    Interface  Errors
x.a.a.a      0    FULL   38    x.x.x.162  x.x.x.168  35
x.b.b.b      0    FULL   40    x.x.x.163  x.x.x.168  33
x.c.c.c      0    FULL   35    x.x.x.164  x.x.x.168  21
x.d.d.d      0    FULL   35    x.x.x.165  x.x.x.168  28
x.e.e.e      0    FULL   36    x.x.x.166  x.x.x.168  27
x.f.f.f      0    FULL   37    x.x.x.167  x.x.x.168  37
I don't know how stable the overall OSPF topology is, but for what it's worth, here's the route summary:
FW1> show route summary
RouteSource Networks
connected 5
kernel 2
static 9
aggregate 0
bgp 0
igrp 0
ospf(default) 12901
rip 0
nat-pool 0
Total 12917
I think it could probably be considered 'relatively' stable since we're also working on aggregating the OSPF routes. But I don't have any information about potential reasons that may cause it to become unstable, like LSA flooding, timer differences etc.
By the way, the routed daemon somehow released the 3GB of memory and is now at 39MB. Over the last two months, there have been multiple failovers due to routed crashing 'without' a coredump. We're currently trying to find the underlying reason with TAC, but they haven't been extremely helpful so far.
I have always found that OSPF/BGP in general works better in R81.20 than in earlier versions. I'm not saying upgrading would fix the issue, but I'm sure it would get better.
Andy
Well, a new version usually means better code, of course. But we couldn't find a resolved issue that addresses our situation and blindly updating doesn't seem like a good idea currently.
Send me the TAC SR in a PM.
Do you see routed restarting (ps aux | grep routed)?
What do the remote OSPF peers say? Do they see anything strange happening, such as restarts or something that looks incorrect?
Jozko Mrkvicka
routed keeps using more and more memory until it crashes and restarts, which results in a failover. I will ask for information from the remote peers, but I'm not sure what we'll be looking for. What should we be looking for on the remote peers?
Check whether they see any strange OSPF-related logs, or compare the configs to confirm that everything is set up correctly.
Jozko Mrkvicka
I will definitely have them look into it.
According to R&D, 13k routes is a lot of routes for OSPF.
From the output, it looks like 6 peers and one interface is used, correct?
OSPF exchanges router and link information, which is stored in a database.
The router then has to do a route calculation that takes into account all entries in the database to build routes as a result. Considering a deployment like this has to store a massive database and also perform very large route calculations, it is not surprising a lot of memory is being used.
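As a rough illustration of the route calculation described here, OSPF runs a shortest-path-first (Dijkstra) computation over the link-state database. The sketch below is a generic textbook version, not routed's implementation; the function name `spf`, the dictionary layout of `lsdb`, and the three-router topology are made up for the example.

```python
import heapq


def spf(lsdb, root):
    """Dijkstra over a toy link-state database: {router: {neighbor: cost}}."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        # Skip stale heap entries that were superseded by a cheaper path.
        if cost > dist.get(node, float("inf")):
            continue
        for neighbor, link_cost in lsdb.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


# Hypothetical topology: the gateway reaches r1 more cheaply through r2.
toy_lsdb = {
    "gw": {"r1": 10, "r2": 1},
    "r2": {"r1": 2},
    "r1": {},
}
print(spf(toy_lsdb, "gw"))
```

Every router and link in the database takes part in this calculation, so both memory and CPU cost scale with the size of the database rather than only with the number of routes installed.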
Route aggregation on our gateway won't help much.
This does not decrease the number of OSPF neighbors, interfaces, and the size of the database exchanged within the OSPF network.
If the aggregation is done on other routers in the environment, it might help, but the impact may not be significant.
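To get a feel for how much aggregation could shrink a prefix list, contiguous prefixes can be collapsed with Python's standard `ipaddress` module. This is only a sketch with made-up prefixes, and the helper name `aggregate` is hypothetical. It shows the reduction in advertised prefixes, not the effect on the OSPF database, which (as noted above) aggregation on the gateway does not shrink.

```python
import ipaddress


def aggregate(prefixes):
    """Collapse adjacent or overlapping prefixes into the fewest covering networks."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


# Made-up example: four contiguous /26 networks plus one unrelated /24.
routes = [
    "10.1.0.0/26",
    "10.1.0.64/26",
    "10.1.0.128/26",
    "10.1.0.192/26",
    "10.2.0.0/24",
]
summary = aggregate(routes)
print(f"{len(routes)} prefixes collapse to {len(summary)}: {summary}")
```

Running the real route list through something like this would show whether the address plan is contiguous enough for summarization on the upstream routers to make a meaningful difference.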
@PhoneBoy Thank you for the effort!
We also heard from TAC, who said they had also consulted with R&D; apparently they think that a memory leak is likely, and a Valgrind debug is on the way.
Judging by cpview data, routed consumes up to 16GB of memory (on a 64GB machine) and then exits. I don't know what stops it at that point, or how to remedy this.
I will update this post as soon as we have new information.
Maybe 16 GB is some sort of "limit" for it? Let us know what they say about it.
Andy
There are so many limitations that it is impossible to discover and document them all properly. Some are not even known and were never tested.
Jozko Mrkvicka
I'm inclined to agree there's a memory leak somewhere based on what you've reported.
