heavysoul
Participant

5600 Gateway 80% memory utilisation

Hi Guys

 

I have a 5600 gateway (R80.20 Take 149) running at 80% memory utilisation.

The gateway sits in the customer's DR site.

The fwd debug output shows:

fwarp_initialize_myself: unable to find mac address for interface eth2.506
fwarp_initialize_myself: unable to find mac address for interface eth2.507
fwarp_initialize_myself: unable to find mac address for interface eth2.505
fwarp_initialize_myself: unable to find mac address for interface eth2.508
fwarp_initialize_myself: unable to find mac address for interface eth4.998
fwarp_initialize_myself: unable to find mac address for interface eth2.501
fwarp_initialize_myself: unable to find mac address for interface eth2.586
fwarp_initialize_myself: unable to find mac address for interface eth2.511
fwarp_initialize_myself: unable to find mac address for interface eth2.504

 

Can anyone advise whether the above output is a likely cause of the high memory usage?

 

sk30154 advises that the interface name must not match the name given by the OS and recommends replacing the '.' with '_', e.g. eth2.506 should be renamed to eth2_506.

Does this mean I cannot use meaningful interface names?

 

 

Thanks in advance,

Gary

 

1 Solution

Accepted Solutions
HeikoAnkenbrand
Champion

Hi @heavysoul,

With this little one-liner you can find the processes that consume the most memory:

ps -ax -o %mem,command | sort -b -r -k1

Then you can use sk97638 "Check Point Processes and Daemons" to identify the processes or daemons. Post the output of the one-liner and we can analyze further.

 

➜ CCSM Elite, CCME, CCTE


17 Replies
MartinTzvetanov
Advisor
What happens if you switch to the DR device? Any problems? Memory utilization is not a problem until the device starts eating into swap.
0 Kudos
heavysoul
Participant
The gateway is the active member of a VRRP cluster at the DR site.
0 Kudos
Timothy_Hall
Champion

Please post the output of the free -m command on your active 5600 for analysis.  It is possible that free memory is not actually low and the memory is just being used for buffering/caching.  It is doubtful those error messages are causing high memory utilization.

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
0 Kudos
heavysoul
Participant

currently down to 34% today!

 

[Expert@VIR-PDN-EXT-FW-01:0]# free -m
             total       used       free     shared    buffers     cached
Mem:          7744       3871       3873          0        284        982
-/+ buffers/cache:       2603       5141
Swap:        18394          0      18394

0 Kudos
Timothy_Hall
Champion

You have 8GB total RAM and only about 2.6GB (2603) is being used for code execution.  So you really have 5.1GB free (5141) which is plenty.
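
The arithmetic behind those two figures can be sketched as a small shell function. This is only an illustration: the name app_used_mb is made up for this example, and it assumes the older free -m layout shown above, where the Mem: row carries buffers and cached in columns 6 and 7.

```shell
#!/bin/sh
# Hypothetical helper (not a Check Point tool): reads old-style `free -m`
# output on stdin and prints the memory really used by applications
# (used - buffers - cached) and what is effectively still available
# (free + buffers + cached).
app_used_mb() {
    awk '$1 == "Mem:" {
        total = $2; used = $3; free = $4; buffers = $6; cached = $7
        app = used - buffers - cached
        avail = free + buffers + cached
        printf "app_used=%d MB avail=%d MB (%.0f%% of %d MB in real use)\n", app, avail, 100 * app / total, total
    }'
}

# Sample copied from the free -m output posted earlier in the thread.
sample='             total       used       free     shared    buffers     cached
Mem:          7744       3871       3873          0        284        982
-/+ buffers/cache:       2603       5141
Swap:        18394          0      18394'

printf '%s\n' "$sample" | app_used_mb
```

On a live system the same function could be fed with free -m | app_used_mb. The result lands within a couple of MB of the -/+ buffers/cache row, presumably because free rounds each column to whole megabytes.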

0 Kudos
RS_Daniel
Advisor

Hello @Timothy_Hall,

I hope you can provide some guidance on a similar issue. We have kernel memory constantly increasing and not being freed.

top sorted by memory usage (top, then M) shows the fwd process at 4.4% usage and nothing else with high memory usage. The number of connections is very low considering the capacity of the appliance (a 5400 with about 500 concurrent connections). So we know the cause is neither a process nor the number of connections. Reviewing the cpview history, I see that memory has been increasing very slowly over the last month. It has happened before and we solved it with a reboot, but we are looking for a better way to get the memory freed. Any help is appreciated. Thanks. Some info:

[Image attachment: free -m.jpg]

[Expert@]# fwaccel stats -s
Accelerated conns/Total conns : 256/1028 (24%)
Accelerated pkts/Total pkts : 17824824/39793598 (44%)
F2Fed pkts/Total pkts : 7869816/39793598 (19%)
PXL pkts/Total pkts : 14098958/39793598 (35%)
QXL pkts/Total pkts : 0/39793598 (0%)

 

[Expert@]# fw ctl pstat

System Capacity Summary:
Memory used: 90% (5196 MB out of 5731 MB) - above watermark
Concurrent Connections: 1206 (Unlimited)
Aggressive Aging is enabled, active

Hash kernel memory (hmem) statistics:
Total memory allocated: 4235825152 bytes in 1034137 (4096 bytes) blocks using 20 pools
Initial memory allocated: 599785472 bytes (Hash memory extended by 3636039680 bytes)
Memory allocation limit: 4806672384 bytes using 512 pools
Total memory bytes used: 0 unused: 4235825152 (100.00%) peak: 4001190264
Total memory blocks used: 0 unused: 1034137 (100%) peak: 1004782
Allocations: 3060675661 alloc, 0 failed alloc, 3034033501 free

System kernel memory (smem) statistics:
Total memory bytes used: 5379542632 peak: 5671291880
Total memory bytes wasted: 4201275
Blocking memory bytes used: 5087720 peak: 8177700
Non-Blocking memory bytes used: 5374454912 peak: 5663114180
Allocations: 1048282425 alloc, 0 failed alloc, 1048280073 free, 0 failed free
vmalloc bytes used: 5370281956 expensive: no

Kernel memory (kmem) statistics:
Total memory bytes used: 4859302508 peak: 5318277140
Allocations: 4108915199 alloc, 0 failed alloc
4082271736 free, 0 failed free
External Allocations: 33024 for packets, 80635321 for SXL

Cookies:
2184255807 total, 695830 alloc, 695830 free,
15628032 dup, 3575396584 get, 878056976 put,
2623977660 len, 195 cached len, 0 chain alloc,
0 chain free

Connections:
240650183 total, 83090101 TCP, 40935622 UDP, 116546246 ICMP,
78214 other, 27238 anticipated, 69382 recovered, 1206 concurrent,
65030 peak concurrent

Fragments:
776 fragments, 100 packets, 0 expired, 0 short,
0 large, 0 duplicates, 0 failures

NAT:
415636831/0 forw, 348741769/0 bckw, 533236043 tcpudp,
224988272 icmp, 157642066-244693149 alloc

Sync:
Version: new
Status: Able to Send/Receive sync packets
Sync packets sent:
total : 464898224, retransmitted : 43, retrans reqs : 155, acks : 373017
Sync packets received:
total : 76958557, were queued : 9539, dropped by net : 93
retrans reqs : 40, received 202344 acks
retrans reqs for illegal seq : 0
dropped updates as a result of sync overload: 0
Callback statistics: handled 199016 cb, average delay : 1, max delay : 9

0 Kudos
Timothy_Hall
Champion

The kernel memory usage looks high, but is it actually causing any problems?  All memory used by the kernel must reside permanently in RAM; it cannot be swapped or paged to disk.  Based on your fw ctl pstat output, the kernel looks just fine and is even getting enough memory for full hash operation to optimize table lookups.  To some degree, kernel memory usage may increase over time as various tables and such are expanded to meet demand, but if the kernel were to actually start running out, with failed allocations and such, I'm pretty sure some of those ballooned allocations would be trimmed back.
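
One way to check the "failed allocations" point quickly is to total the failed alloc counters in the fw ctl pstat output. The sketch below is only an illustration: failed_allocs is a made-up name, and it assumes the counters appear as "N failed alloc" exactly as in the output pasted above.

```shell
#!/bin/sh
# Hypothetical helper: reads `fw ctl pstat` output on stdin and sums every
# "N failed alloc" counter ("N failed free" is deliberately ignored).
# A non-zero total would suggest the kernel really ran short of memory.
failed_allocs() {
    awk '{
        for (i = 1; i < NF - 1; i++)
            if ($(i + 1) == "failed" && $(i + 2) ~ /^alloc/)
                total += $i
    }
    END { print total + 0 }'
}

# The three Allocations lines from the hmem, smem and kmem sections above.
sample='Allocations: 3060675661 alloc, 0 failed alloc, 3034033501 free
Allocations: 1048282425 alloc, 0 failed alloc, 1048280073 free, 0 failed free
Allocations: 4108915199 alloc, 0 failed alloc'

printf '%s\n' "$sample" | failed_allocs
```

On the gateway itself this would be run as fw ctl pstat | failed_allocs. For the sample lines the total is 0, which matches the reading that the kernel is not being starved.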

You are tipped over into swap space to the tune of 782MB, but that number shows peak swap utilization since the system was booted and never goes back down in my experience.  That number may be getting spiked by a policy load, which takes a great deal of memory to complete.  Run sar -B and use the -f option to look at up to 30 days of history (especially page outs and major faults).  Are the swap statistics growing slowly over time (possibly indicating a memory leak), or spiking at times correlated with policy installs?
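
A minimal sketch of stepping through the saved history with sar -B -f follows. It assumes the daily data files live in /var/log/sa and are named saDD, which is the usual sysstat default but may differ on your system; paging_history is a made-up name.

```shell
#!/bin/sh
# Sketch: print paging statistics (sar -B) for every saved sysstat file.
# Assumption: binary history files are /var/log/sa/saDD; pass another
# directory as the first argument if yours are elsewhere.
paging_history() {
    dir=${1:-/var/log/sa}
    found=0
    for f in "$dir"/sa[0-9]*; do
        [ -f "$f" ] || continue      # skip the unexpanded glob if nothing matches
        found=1
        echo "== $f =="
        sar -B -f "$f"
    done
    [ "$found" -eq 1 ] || echo "no sar data files found in $dir"
}

paging_history "$@"
```

Comparing the pgpgout/s and majflt/s columns across days should show whether paging creeps up steadily or only spikes around policy installs.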

 

0 Kudos
HristoGrigorov

Tim, aggressive aging is active and that is never good ?

0 Kudos
Kaspars_Zibarts
Employee
Indeed! AA active at 1200 conns?! Very strange
0 Kudos
Timothy_Hall
Champion

It is due to memory usage being above 70%; the connections table capacity must be set to "Automatically":

sk122154: How is Aggressive Aging enforced when Concurrent Connections Capacity Limit is calculated ...
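
As a rough illustration of that threshold, the Memory used line of fw ctl pstat can be checked against it with a few lines of shell. The helper name mem_over_threshold is made up, and the default of 70 is simply the figure quoted in this post, not a value read from the gateway.

```shell
#!/bin/sh
# Hypothetical helper: reads `fw ctl pstat` output on stdin, recomputes the
# "Memory used" percentage from the MB figures and compares it with a
# threshold (default 70, the figure quoted in the post above).
mem_over_threshold() {
    awk -v limit="${1:-70}" '
    /Memory used:/ {
        gsub(/[(]/, "")              # "(5196" becomes "5196"
        pct = 100 * $4 / $8          # used MB / total MB
        state = (pct > limit) ? "above" : "at or below"
        printf "%.1f%% used, %s %d%% threshold\n", pct, state, limit
    }'
}

# Line copied from the fw ctl pstat output posted earlier in the thread.
printf 'Memory used: 90%% (5196 MB out of 5731 MB) - above watermark\n' | mem_over_threshold
```

On the gateway this would be run as fw ctl pstat | mem_over_threshold. A different threshold can be passed as the first argument.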

 

RS_Daniel
Advisor

Hi @Timothy_Hall ,

No, it is not causing any problem at the moment. It is only that cpview shows kernel usage increasing constantly, and I wanted to prevent any possible future problem when it reaches 100%. Do you think that can happen?

I looked into the sar files but only found the last seven days. pgpgout/s and fault/s have been relatively constant since Jun 29, with only some spikes at the times of policy installs, as you said.

I also saw that the appliance has been up for 115 days now (with many failovers in between). It has happened many times before, at intervals of 3-5 months, that usage reaches a percentage above 90, so I understand this should be normal behavior? The info below shows that swap and memory usage are higher, but pgpgout/s also had a bigger spike than the last time I ran the commands, so I think that is what is causing it. Am I right?

[Expert@GW-LPZ-BORDE-2:0]# free -m
             total       used       free     shared    buffers     cached
Mem:          7744       7515        229          0        131        671
-/+ buffers/cache:       6711       1033
Swap:        18394       1011      17382

 

[Expert@GW-LPZ-BORDE-2:0]# fw ctl pstat

System Capacity Summary:
Memory used: 93% (5383 MB out of 5731 MB) - above watermark
Concurrent Connections: 612 (Unlimited)
Aggressive Aging is enabled, active

Hash kernel memory (hmem) statistics:
Total memory allocated: 4424957952 bytes in 1080312 (4096 bytes) blocks using 25 pools
Initial memory allocated: 599785472 bytes (Hash memory extended by 3825172480 bytes)
Memory allocation limit: 4806672384 bytes using 512 pools
Total memory bytes used: 0 unused: 4424957952 (100.00%) peak: 4193062196
Total memory blocks used: 0 unused: 1080312 (100%) peak: 1052989
Allocations: 2980496289 alloc, 0 failed alloc, 2952639910 free

System kernel memory (smem) statistics:
Total memory bytes used: 5568670872 peak: 5838527024
Total memory bytes wasted: 4079667
Blocking memory bytes used: 4987616 peak: 8177700
Non-Blocking memory bytes used: 5563683256 peak: 5830349324
Allocations: 1092094529 alloc, 0 failed alloc, 1092092196 free, 0 failed free
vmalloc bytes used: 5559510164 expensive: no

Kernel memory (kmem) statistics:
Total memory bytes used: 5032959588 peak: 5500419608
Allocations: 4072547246 alloc, 0 failed alloc
4044689588 free, 0 failed free
External Allocations: 26112 for packets, 79478726 for SXL

Cookies:
2263079737 total, 695830 alloc, 695830 free,
15882465 dup, 49138946 get, 904540595 put,
2711675483 len, 195 cached len, 0 chain alloc,
0 chain free

Connections:
250248669 total, 86796591 TCP, 41654708 UDP, 121716227 ICMP,
81143 other, 29241 anticipated, 70486 recovered, 612 concurrent,
65030 peak concurrent

Fragments:
776 fragments, 100 packets, 0 expired, 0 short,
0 large, 0 duplicates, 0 failures

NAT:
430178365/0 forw, 360781479/0 bckw, 549675981 tcpudp,
234841653 icmp, 164065797-253546174 alloc

Sync:
Version: new
Status: Able to Send/Receive sync packets
Sync packets sent:
total : 484478131, retransmitted : 45, retrans reqs : 155, acks : 387419
Sync packets received:
total : 78309972, were queued : 9539, dropped by net : 93
retrans reqs : 42, received 203390 acks
retrans reqs for illegal seq : 0
dropped updates as a result of sync overload: 0
Callback statistics: handled 200037 cb, average delay : 1, max delay : 9

Thanks for your help here.

0 Kudos
HristoGrigorov

I'd say that more than 90% allocated memory is something to worry about. What if the OS needs memory for some "heavy" operation (e.g. a policy install or file scan)? Swap won't save you much in such a case; the OOM killer will be invoked and bad things will happen.

If I were you, I would consider rebooting the firewall as soon as possible and also think about a memory upgrade.

0 Kudos
heavysoul
Participant

thanks Heiko

 

8.4 fwd
5.0 vpnd 0
42.2 wsdnsd
2.9 /bin/monitord
1.3 cpd
1.2 /opt/CPda/bin/DAService
0.7 rtmd
0.7 /usr/sbin/snmpd -f -c /etc/snmp/userDefinedSettings.conf
0.6 in.geod 0
0.5 in.acapd 0
0.4 cpviewd
0.3 dtpsd 0
0.2 /opt/AutoUpdater/latest/bin/AutoUpdater
0.2 /bin/confd
0.1 sxl_statd
0.1 mpdaemon /opt/CPshrd-R80.20/log/mpdaemon.elg /opt/CPshrd-R80.20/conf/mpdaemon.conf
0.1 dtlsd 0
0.1 cpview_historyd
0.1 cphamcset -d
0.1 /bin/routed -i default -f /etc/routed0.conf -h 1
0.1 /bin/routed -N
0.1 /bin/rconfd /etc/actions_mapping.xml
0.1 /bin/pm
0.1 /bin/cloningd

Timothy_Hall
Champion

Uhh, the wsdnsd process is consuming 42% of physical memory?  That can't be right.  Is your gateway configured as an HTTP/HTTPS proxy?  Are you using updatable objects?  The wsdnsd daemon handles DNS resolution duties for these features.  I'm not sure whether this situation could be driving your high kernel memory usage, but it is a serious red flag for sure.  You might want to try restarting this daemon to see what happens to memory usage, but be warned that doing so may cause an outage for a few moments:

cpwd_admin stop -name WSDNSD -path "$FWDIR/bin/wsdnsd" -command "kill -SIGTERM $(pidof $FWDIR/bin/wsdnsd)"

cpwd_admin start -name WSDNSD -path "$FWDIR/bin/wsdnsd" -command "wsdnsd"

Also see sk165616: WSDNSD memory leak when Updatable Objects are configured in the policy

Also please post the output of enabled_blades to provide some context as to how much memory should actually be utilized on your firewall (more features=more memory usage).
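
To spot an outlier like wsdnsd without reading the whole list, the ps output can be filtered by a memory threshold. This is a sketch only: mem_hogs is a made-up name and the default limit of 20% is an arbitrary value chosen for the example.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -ax -o %mem,command` output on stdin and
# prints only the processes whose %MEM exceeds a limit (default 20, an
# arbitrary value for this example). The header line is skipped because
# its first field is not numeric.
mem_hogs() {
    awk -v limit="${1:-20}" '$1 ~ /^[0-9.]+$/ && $1 + 0 > limit + 0 { print }'
}

# First lines of the ps output posted above.
sample='8.4 fwd
5.0 vpnd 0
42.2 wsdnsd
2.9 /bin/monitord'

printf '%s\n' "$sample" | mem_hogs
```

On the gateway this would be run as ps -ax -o %mem,command | mem_hogs 20, once before and once after restarting the daemon, to see whether its share drops.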

Timothy_Hall
Champion

@HeikoAnkenbrand for your one-liner you may want to add -n to your sort command (ps -ax -o %mem,command | sort -b -r -k1 -n) so that the sorting happens numerically and not in a lexical fashion, to ensure the biggest double-digit numerical percentages are always shown first.  So for example on my lab system:

ps -ax -o %mem,command | sort -b -r -k1

4.4 /opt/CPshrd-R80.40/jre_32/bin/java -Xmx256m -Xms128m -Xshareclasses:none -D
3.0 /opt/CPshrd-R80.40/jre_32/bin/java -D_solr=TRUE -Xdump:directory=/var/log/d
2.2 /opt/CPshrd-R80.40/jre_32/bin/java -D_smartview=TRUE -Xdump:directory=/var/
12.4 /opt/CPshrd-R80.40/jre_32/bin/java -D_CPM=TRUE -Xaot:forceaot -Xmx1024m -Xm
1.8 fgd50
1.8 /opt/CPshrd-R80.40/jre_32/bin/java -D_CPM_SOLR=TRUE -Xmx512m -Xms64m -Xgcpo
1.6 cpd
1.5 fwm
1.4 cpsemd
1.1 /opt/CPshrd-R80.40/jre_32/bin/java -D_RFL=TRUE -Xdump:directory=/var/log/du

...

ps -ax -o %mem,command | sort -b -r -k1 -n

12.4 /opt/CPshrd-R80.40/jre_32/bin/java -D_CPM=TRUE -Xaot:forceaot -Xmx1024m -Xm
4.4 /opt/CPshrd-R80.40/jre_32/bin/java -Xmx256m -Xms128m -Xshareclasses:none -D
3.0 /opt/CPshrd-R80.40/jre_32/bin/java -D_solr=TRUE -Xdump:directory=/var/log/d
2.2 /opt/CPshrd-R80.40/jre_32/bin/java -D_smartview=TRUE -Xdump:directory=/var/
1.8 fgd50
1.8 /opt/CPshrd-R80.40/jre_32/bin/java -D_CPM_SOLR=TRUE -Xmx512m -Xms64m -Xgcpo
1.6 cpd
1.5 fwm
1.4 cpsemd
1.1 /opt/CPshrd-R80.40/jre_32/bin/java -D_RFL=TRUE -Xdump:directory=/var/log/du

...
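
The difference can be reproduced on any Linux box with a few sample lines. The values below are taken from the lab output above, but the command names are shortened placeholders for this sketch.

```shell
#!/bin/sh
# Four sample lines in the same "%MEM command" shape as the ps output.
# The command names are shortened placeholders, not real process names.
sample='4.4 java
12.4 java_cpm
1.8 fgd50
3.0 java_solr'

# Without -n the key is compared as text, so the position of 12.4
# depends on character order (and on the locale), not on its value.
echo "lexical (-k1 only):"
printf '%s\n' "$sample" | sort -b -r -k1

# With -n the leading number is compared numerically, largest first.
echo "numeric (-k1 -n):"
printf '%s\n' "$sample" | sort -b -r -k1 -n
```

With -n the 12.4 line is listed first, followed by 4.4, 3.0 and 1.8.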

HeikoAnkenbrand
Champion

Hi @Timothy_Hall 

Thanks!

I typed that without testing the command. I forgot '-n', unfortunately!

➜ CCSM Elite, CCME, CCTE
