kamilazat
Collaborator

Is it normal for routed daemon to use 3GB of RAM for 12k OSPF routes?

Hello everyone,

One of our customers is constantly having issues with failovers due to the routed PNOTE. We noticed that they have around 12k OSPF routes, and after a failover the routed daemon on the newly active node already uses 3GB of memory. Here's the output from the currently active node after the failover:

 

[Global] ch02-01> show routed memory

1_01:
Total Memory Usage:     32 MB
    Core:                3 MB
    BGP:               360  B
    MFC:               236  B
    OSPF:               28 MB
    Policy:              2 KB

2_01:
Total Memory Usage:      3 GB
    Core:                3 MB
    BGP:               360  B
    MFC:               236  B
    OSPF:                3 GB
    Policy:              2 KB

[Global] MAB-CL1-ch02-01> show routed resources

1_01:
Total Uptime                  : 3 hrs 9 mins 44 secs
Total User Time               : 3 mins 2 secs
Total System Time             : 13 secs
Page Faults                   : 85
Page Reclaims                 : 16070
Total Swaps:                  : 0
Voluntary Context Switches    : 624724
Involuntary Context Switches  : 70850

2_01:
Total Uptime                  : 21 days 1 hr 49 mins 52 secs
Total User Time               : 12 hrs 3 mins 26 secs
Total System Time             : 27 mins 46 secs
Page Faults                   : 807
Page Reclaims                 : 1023538
Total Swaps:                  : 0
Voluntary Context Switches    : 63305978
Involuntary Context Switches  : 12788009

 

At first we thought there was a memory leak, but there are no coredumps or anything else that points to one, and TAC agreed with us on this. What I'm wondering is whether it's normal for 12k OSPF routes to result in 3GB of memory use, or whether we should be looking for an underlying issue that ends up with this symptom.

If this is normal, how much would route aggregation help with the situation (before we go down the path of increasing memory)?
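
For reference, here's the back-of-envelope arithmetic I'm basing the question on (just a rough sanity check using the numbers from the outputs above, assuming the ~12.9k route count applies to both members):

# Rough per-route memory cost of routed's OSPF component on each member
# (figures taken from the "show routed memory" output above and the ~12.9k route count)
awk 'BEGIN {
    routes     = 12901
    standby_kb = 28 * 1024         # 28 MB of OSPF memory on 1_01
    active_kb  = 3 * 1024 * 1024   # 3 GB of OSPF memory on 2_01
    printf "1_01: ~%.1f KB per OSPF route\n", standby_kb / routes
    printf "2_01: ~%.1f KB per OSPF route\n", active_kb / routes
}'
# Prints roughly 2.2 KB/route vs. ~244 KB/route, i.e. the 3 GB member is about
# two orders of magnitude above what the same route count costs on 1_01.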

 

As always, thank you very much.

Cheers!

Lesley
Leader

What version are you running, and which Jumbo Take? (cpinfo -y all)

Maybe worth checking this:

Best Practice - Limit OSPF areas to about 50 routers based on the limitations of OSPF (traffic overhead, table size, convergence, and so on).
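
If it helps, the basics can be pulled straight from the CLI, something like this from expert mode (rough sketch from memory, double-check the syntax on your box):

fw ver                                      # major version, e.g. R81.10
cpinfo -y all 2>/dev/null | grep -i take    # installed Jumbo Hotfix Take
clish -c "show ospf neighbor"               # how many OSPF routers you're adjacent to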

-------
If you like this post, please give a thumbs up (kudo)! 🙂
kamilazat
Collaborator

@Lesley Thank you for the reply. The version is R81.10, JHF Take 156; the behavior has been seen since Take 95. I'm not sure we'll have any control over the number of OSPF routers, though.

Chris_Atkinson
Employee

How stable is the overall OSPF topology?

How many neighbors / what does it look like?

Is graceful restart used?

 

CCSM R77/R80/ELITE
kamilazat
Collaborator

- Graceful restart is on.

- The output of 'show ospf neighbor' looks like this:

show ospf neighbor

Instance: default
Neighbor state flag: G - graceful restart

Neighbor ID Pri State Dead Address Interface Errors
x.a.a.a     0  FULL   38 x.x.x.162 x.x.x.168 35
x.b.b.b     0  FULL   40 x.x.x.163 x.x.x.168 33
x.c.c.c     0  FULL   35 x.x.x.164 x.x.x.168 21
x.d.d.d     0  FULL   35 x.x.x.165 x.x.x.168 28
x.e.e.e     0  FULL   36 x.x.x.166 x.x.x.168 27
x.f.f.f     0  FULL   37 x.x.x.167 x.x.x.168 37

 

I don't know how stable the overall OSPF topology is, but for what it's worth, here's the route summary:

FW1> show route summary 

RouteSource      Networks  

connected        5         
kernel           2         
static           9         
aggregate        0         
bgp              0         
igrp             0         
ospf(default)    12901     
rip              0         
nat-pool         0         
Total            12917  

I think it could probably be considered 'relatively' stable, since we're also working on aggregating the OSPF routes. But I don't have any information about potential causes of instability, like LSA flooding, timer mismatches, etc.

By the way, the routed daemon somehow released the 3GB of memory and is now at 39MB. Over the last two months there have been multiple failovers due to routed crashing, 'without' a coredump. We're currently trying to find the underlying reason with TAC, but they haven't been very helpful so far.
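
In the meantime we're planning to log routed's memory periodically so we can see how fast it grows before the next crash. Something along these lines from expert mode (a rough, untested sketch; the interval and log path are arbitrary choices on our side):

# Append routed's process and per-component memory usage every 5 minutes
while true; do
    echo "==== $(date) ====" >> /var/log/routed_mem_watch.log
    ps aux | grep '[r]outed' >> /var/log/routed_mem_watch.log      # process RSS
    clish -c "show routed memory" >> /var/log/routed_mem_watch.log # per-component view
    sleep 300
done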

the_rock
Legend

I've always found that OSPF/BGP in general behaves better in R81.20 than in earlier versions. I'm not saying upgrading would fix the issue, but I'm sure things would improve.

Andy

kamilazat
Collaborator

Well, a new version usually means better code, of course. But we couldn't find a resolved issue that addresses our situation, and blindly upgrading doesn't seem like a good idea right now.

PhoneBoy
Admin

Send me the TAC SR in a PM.

JozkoMrkvicka
Authority

Do you see routed restarting (ps aux | grep routed)?

What do the remote OSPF peers say? Do they see anything strange happening, like restarts or something that isn't correct?
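
For example, something like this on the member that failed over (just a sketch, adjust to your environment):

ps aux | grep '[r]outed'             # is routed running now, and how big is it (RSS)?
cphaprob -l list                     # state of the registered pnotes (critical devices), including routed
grep -i routed /var/log/messages*    # any routed-related messages around the failover times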

Kind regards,
Jozko Mrkvicka
kamilazat
Collaborator

routed keeps using more and more memory until it crashes and restarts, which results in a failover. I will ask the remote peers for information, but I'm not sure what we'll be looking for. What should we look for on the remote peers?

JozkoMrkvicka
Authority

Check whether they see any strange logs related to OSPF, and maybe compare configs to verify everything is set up correctly.
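
For the config comparison, something simple like this could already show timer or area mismatches (sketch only; the file names are just examples, and this assumes clish is available on the boxes you compare):

# On each cluster member / peer, dump the Gaia config and keep the OSPF-related lines
clish -c "show configuration" | grep -i ospf > /tmp/ospf_cfg_$(hostname).txt
# Then compare the resulting files side by side
diff /tmp/ospf_cfg_member1.txt /tmp/ospf_cfg_member2.txt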

Kind regards,
Jozko Mrkvicka