CP5600 Memory Exhaustion

Chris_Sanduliak · ‎2019-06-21

We have a couple of CP5600s operating in different locations with very similar configurations and about the same load. Each is running R80.10 - T189.

Location B is stable and running without issues, but at Location A we have to reboot about once every 45 days due to memory issues. Whatever is happening affects the data plane, i.e. the firewall stops forwarding packets.

This is the memory output for location A:

System Capacity Summary:
Memory used: 77% (4455 MB out of 5731 MB) - below watermark
Concurrent Connections: 10410 (Unlimited)
Aggressive Aging is enabled, not active

Hash kernel memory (hmem) statistics:
Total memory allocated: 3737321472 bytes in 912432 (4096 bytes) blocks using 14 pools
Initial memory allocated: 599785472 bytes (Hash memory extended by 3137536000 bytes) - 3.1GB?
Memory allocation limit: 4806672384 bytes using 512 pools
Total memory bytes used: 0 unused: 3737321472 (100.00%) peak: 3426386556
Total memory blocks used: 0 unused: 912432 (100%) peak: 861288
Allocations: 4163792559 alloc, 0 failed alloc, 4140371486 free

System kernel memory (smem) statistics:
Total memory bytes used: 4598247500 peak: 4608745920
Total memory bytes wasted: 3721660
Blocking memory bytes used: 4784944 peak: 9567848
Non-Blocking memory bytes used: 4593462556 peak: 4599178072
Allocations: 13741524 alloc, 0 failed alloc, 13738637 free, 0 failed free
vmalloc bytes used: 4588389496 expensive: no

Kernel memory (kmem) statistics:
Total memory bytes used: 4143730832 peak: 4231943656
Allocations: 4177522403 alloc, 0 failed alloc
4154099588 free, 0 failed free
External Allocations: 16896 for packets, 88628453 for SXL

Cookies:
3778625491 total, 0 alloc, 0 free,
150073 dup, 300575262 get, 2794359219 put,
2072999334 len, 2707089222 cached len, 0 chain alloc,
0 chain free

Connections:
388319874 total, 136725382 TCP, 231455561 UDP, 19560665 ICMP,
578266 other, 30721 anticipated, 195046 recovered, 10410 concurrent,
159214 peak concurrent

Fragments:
1118953332 fragments, 2706956154 packets, 3456 expired, 0 short,
0 large, 0 duplicates, 848 failures

NAT:
67013/0 forw, 52962/0 bckw, 982 tcpudp,
0 icmp, 5906-17579 alloc

Sync: off

[Expert@LocationA:0]# free -m
             total       used       free     shared    buffers     cached
Mem:          7744       7580        164          0        333       1837
-/+ buffers/cache:       5409       2334
Swap:        18394          0      18394

This is Location B:

System Capacity Summary:
Memory used: 9% (539 MB out of 5731 MB) - below watermark
Concurrent Connections: 8560 (Unlimited)
Aggressive Aging is enabled, not active

Hash kernel memory (hmem) statistics:
Total memory allocated: 599785472 bytes in 146432 (4096 bytes) blocks using 1 pool
Total memory bytes used: 0 unused: 599785472 (100.00%) peak: 274277488
Total memory blocks used: 0 unused: 146432 (100%) peak: 69627
Allocations: 1607331344 alloc, 0 failed alloc, 1607117916 free

System kernel memory (smem) statistics:
Total memory bytes used: 967638752 peak: 986044552
Total memory bytes wasted: 4180014
Blocking memory bytes used: 5820820 peak: 14955252
Non-Blocking memory bytes used: 961817932 peak: 971089300
Allocations: 151132250 alloc, 0 failed alloc, 151129180 free, 0 failed free
vmalloc bytes used: 956763424 expensive: no

Kernel memory (kmem) statistics:
Total memory bytes used: 401812380 peak: 640749756
Allocations: 1758439658 alloc, 0 failed alloc
1758224295 free, 0 failed free
External Allocations: 76032 for packets, 89765022 for SXL

Cookies:
1450833429 total, 836424 alloc, 836424 free,
251 dup, 433718314 get, 2695578081 put,
2263227759 len, 2298121504 cached len, 0 chain alloc,
0 chain free

Connections:
1628040697 total, 660800638 TCP, 927853823 UDP, 39386225 ICMP,
11 other, 288832 anticipated, 441738 recovered, 8560 concurrent,
161987 peak concurrent

Fragments:
302418965 fragments, 2297426537 packets, 2476610 expired, 0 short,
0 large, 0 duplicates, 1969 failures

NAT:
0/0 forw, 0/0 bckw, 0 tcpudp,
0 icmp, 0-27257 alloc

Sync: off

[Expert@locationB:0]# free -m
             total       used       free     shared    buffers     cached
Mem:          7744       7555        189          0        419       4896
-/+ buffers/cache:       2239       5504
Swap:        18394          0      18394

The only difference I can find between the two is that Location A has extended its hash kernel memory (the "Hash memory extended by" line), but I don't know what would cause this behavior.
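As a sanity check on the hmem figures above, here is a small Python sketch that can be run on any workstation. It only uses numbers copied from the two outputs; the variable names are illustrative. It checks that each total equals the block count times the 4096-byte block size, that Location A's total is its initial allocation plus the extension, and what the "3.1GB?" note works out to in decimal and binary units.

```python
# Figures copied from the "Hash kernel memory (hmem) statistics" sections above.
BLOCK_SIZE = 4096

site_a = {
    "total": 3737321472,
    "blocks": 912432,
    "initial": 599785472,
    "extended_by": 3137536000,
    "limit": 4806672384,
}
site_b = {"total": 599785472, "blocks": 146432}

# Total allocation should equal block count times block size.
a_blocks_ok = site_a["blocks"] * BLOCK_SIZE == site_a["total"]
b_blocks_ok = site_b["blocks"] * BLOCK_SIZE == site_b["total"]

# Location A's total should equal its initial allocation plus the extension.
a_sum_ok = site_a["initial"] + site_a["extended_by"] == site_a["total"]

# Location B shows one pool and no extension line, so its total should
# match Location A's initial allocation if both started from the same size.
same_initial = site_b["total"] == site_a["initial"]

extension_gb = site_a["extended_by"] / 10**9    # decimal gigabytes
extension_gib = site_a["extended_by"] / 2**30   # binary gibibytes
share_of_limit = site_a["total"] / site_a["limit"]

print(a_blocks_ok, b_blocks_ok, a_sum_ok, same_initial)
print(f"extension: {extension_gb:.2f} GB ({extension_gib:.2f} GiB)")
print(f"Location A hmem is at {share_of_limit:.0%} of its allocation limit")
```

All four checks hold, so the extension accounts for the entire difference in hash memory between the two sites. What is filling that extra memory is still the open question.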

Timothy_Hall · ‎2019-06-21

One thing that caught my eye is that you have an absolute crapload of fragmented traffic at Site A compared to Site B. Virtual reassembly uses a table called frag_table, which is part of your hash memory allocation and could be what is causing that allocation to balloon. Run fw tab -t frag_table -s. What is the size of that table? My guess is that it is very large. Fragmented traffic is also a problem because it can't be accelerated at all by SecureXL in R80.10 and earlier. Run this command to show all fragments coming into the firewall in real time:

tcpdump -eni any '((ip[6:2] > 0) and (not ip[6] = 64))'
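
For reference, a rough Python sketch of what that filter tests. Bytes 6 and 7 of the IPv4 header hold the three flag bits (reserved, DF, MF) followed by the 13-bit fragment offset in units of 8 bytes. The function names and sample values are illustrative only; the logic mirrors the expression above and is not tcpdump's actual implementation.

```python
def matches_frag_filter(flags_and_offset: int) -> bool:
    """Mimic '((ip[6:2] > 0) and (not ip[6] = 64))' for one 16-bit field value."""
    byte6 = (flags_and_offset >> 8) & 0xFF   # ip[6]: flags plus top 5 offset bits
    nonzero = flags_and_offset > 0           # ip[6:2] > 0
    df_only_byte = byte6 == 0x40             # ip[6] = 64: DF set, MF clear
    return nonzero and not df_only_byte


def describe(flags_and_offset: int) -> dict:
    """Split the field into DF, MF and the fragment offset in bytes."""
    return {
        "df": bool(flags_and_offset & 0x4000),
        "mf": bool(flags_and_offset & 0x2000),
        "offset_bytes": (flags_and_offset & 0x1FFF) * 8,
    }


samples = {
    "unfragmented, DF clear": 0x0000,
    "unfragmented, DF set": 0x4000,
    "first fragment (MF set)": 0x2000,
    "last fragment at offset 1480": 0x00B9,
}
for label, value in samples.items():
    print(f"{label}: match={matches_frag_filter(value)} {describe(value)}")
```

In other words, the filter keeps packets with MF set or a non-zero fragment offset and drops ordinary packets, including the common case of unfragmented packets that only carry the DF bit.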

Where are the frags coming from? If they come from networks you control, you have an MTU consistency problem in your network. For more info see here:

sk113992: How to configure timeout for fragmented packets on Security Gateway with disabled IPS blad...
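
To put a rough number on the difference in fragmented traffic, the Fragments counters from the two outputs above can be compared. This sketch assumes that the "packets" counter in that section is a total packet count, which the output itself does not state, so treat the percentages as indicative only. The dictionary layout and function name are illustrative.

```python
# Counters copied from the "Fragments:" sections of the two outputs above.
fragments = {
    "Location A": {"fragments": 1118953332, "packets": 2706956154,
                   "expired": 3456, "failures": 848},
    "Location B": {"fragments": 302418965, "packets": 2297426537,
                   "expired": 2476610, "failures": 1969},
}


def fragment_share(counters: dict) -> float:
    """Fragments per packet, ASSUMING 'packets' is the total packet count."""
    return counters["fragments"] / counters["packets"]


shares = {site: fragment_share(c) for site, c in fragments.items()}
for site, share in shares.items():
    print(f"{site}: {share:.1%} fragments per packet")

ratio = fragments["Location A"]["fragments"] / fragments["Location B"]["fragments"]
print(f"Location A has seen {ratio:.1f}x as many fragments as Location B")
```

Under that assumption, Location A comes out at roughly 41% against roughly 13% for Location B. The counters may cover different uptimes, so the per-packet share is a fairer comparison than the raw totals.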


Chris_Sanduliak · ‎2019-06-21

Thank you, Timothy.

I ran "fw tab -t frag_table -s" on Site A:

[Expert@LocationA:0]# fw tab -t frag_table -s
HOST NAME ID #VALS #PEAK #SLINKS
localhost frag_table 8184 0 11 0

This is from Site B:

[Expert@LocationB:0]# fw tab -t frag_table -s
HOST NAME ID #VALS #PEAK #SLINKS
localhost frag_table 8184 0 196 0

I don't know if this was the output expected but I really appreciate the help.
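
For anyone comparing these summaries across several gateways, here is a minimal Python sketch that parses the data rows shown above. It assumes the six whitespace-separated columns follow the header order (HOST, NAME, ID, #VALS, #PEAK, #SLINKS); the class and function names are illustrative.

```python
from typing import NamedTuple


class TableSummary(NamedTuple):
    host: str
    name: str
    table_id: int
    vals: int
    peak: int
    slinks: int


def parse_summary(line: str) -> TableSummary:
    """Parse one data row of the 'fw tab -t <table> -s' output shown above."""
    host, name, table_id, vals, peak, slinks = line.split()
    return TableSummary(host, name, int(table_id), int(vals), int(peak), int(slinks))


rows = {
    "Location A": parse_summary("localhost frag_table 8184 0 11 0"),
    "Location B": parse_summary("localhost frag_table 8184 0 196 0"),
}
for site, row in rows.items():
    print(f"{site}: {row.name} current={row.vals} peak={row.peak}")
```

Read this way, neither site shows a large frag_table: both have zero current entries, and Location A's peak (11) is lower than Location B's (196).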

I'll definitely be filing that tcpdump command away for later. There is one host in particular that I can see generating a lot of fragmented traffic and it is something we might be able to control. Is it possible for a few hosts to balloon the memory hash tables like this?

Cheers
