Diego_Barba
Contributor

Frequent memory leaks on four different 3600 appliances running R80.30 with the latest JHF

Hello, we have four new 3600 appliances (two active/standby clusters, each managed by a different Management Server) and all four are leaking memory. After about one month of working without any issue, free memory decreases steadily until it is very low (less than 10%). Eventually the devices stop forwarding traffic, and memory errors are returned when we run commands such as "fw monitor", "fw ctl zdebug drop", "vpn tu", etc. Rebooting fixes the issue, but it reappears about one month later.

The strange thing is that it happens on four different GWs, configured as two clusters with two different policies and two different Management Servers, so there is no relationship between them. They have a pretty simple configuration, with no additional blades apart from VPN (no NGTP, URLF, APPCTL, IPS...), and the GWs process a very low amount of traffic.

We have an open SR, but the analysis is taking a long time because we need to wait until the issue appears again and then run the leak_detection script (sk35496).

Has anyone experienced such issues with 3600 appliances?

Best regards

 

4 Replies
Timothy_Hall
Champion

The 3600s are fairly new and use User Space Firewall (USFW), so unfortunately a real memory leak in process space will eventually cause the behavior you describe. A kernel-based firewall would mostly continue to pass traffic even if all memory in process space is exhausted, because the kernel pre-allocates all the memory it needs. When the boxes start to get low on memory, run top, then press capital M (Shift+M) to sort processes by memory usage. That should give you an indication of where to look.
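If an interactive top session is not practical (for example, when you want a snapshot saved to a file before the next reboot), the same "sort by memory" view can be approximated by reading /proc directly. The following is only a minimal sketch: it assumes a Linux /proc filesystem and an available Python interpreter on the box, and the function names parse_vmrss_kb and top_memory_processes are made up for this illustration, not part of any Check Point tooling.

```python
import os

def parse_vmrss_kb(status_text):
    """Extract (name, rss_kb) from the text of /proc/<pid>/status.

    Returns None when there is no VmRSS line (e.g. kernel threads).
    """
    name, rss = None, None
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss = int(line.split()[1])  # the kernel reports this value in kB
    if name is None or rss is None:
        return None
    return name, rss

def top_memory_processes(limit=10, proc_root="/proc"):
    """Return up to `limit` (rss_kb, pid, name) tuples, largest RSS first."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc; nothing to report
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                parsed = parse_vmrss_kb(fh.read())
        except (IOError, OSError):
            continue  # the process exited while we were scanning
        if parsed:
            rows.append((parsed[1], int(entry), parsed[0]))
    rows.sort(reverse=True)
    return rows[:limit]

# Print a snapshot similar to top sorted by memory.
for rss_kb, pid, name in top_memory_processes():
    print("%8d kB  pid %-6d %s" % (rss_kb, pid, name))
```

Saving this output at intervals makes it possible to compare which process has grown between two snapshots, which a single look at top cannot show.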

 

New 2021 IPS/AV/ABOT Self-Guided Video Series
now available at http://www.maxpowerfirewalls.com
shais
Employee

The issue is unrelated to USFW vs. KMFW: in both modes, if the GW suffers from a memory leak, it will reach the situation described above.

In both USFW and KMFW we allocate pools for our memory; the difference is that in USFW we are also able to free empty pools back to the OS.

The issue also does not sound related to the appliance itself.

Can you please provide the SR number? (can also be in private) 

Diego_Barba
Contributor

Thanks! I have sent you the SR number. In the SR we are waiting for the issue to appear again so that we can run the leak_detection script, but the issue usually takes one month to reappear, and in the meantime we risk another total loss of service. So, in the meantime, we have been investigating whether sk103154 could be the cause of the issue.

Diego_Barba
Contributor

Thanks for your help! I suspect that sk103154 could be the cause of the issue. We configured it to block the malicious IPs from the dynamic lists on opendbl.net, as we have done on other devices (that didn't have this problem). But yesterday we noticed that RAM behaves very differently on the 3600s when sk103154 is enabled compared with those other devices. In particular, the free RAM of the standby devices decreases very quickly, at the same rate as on the active units. If we disable sk103154, the free RAM stops decreasing so fast. Maybe the mechanism used by sk103154 (rate limiting rules) doesn't behave well on USFW devices?
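To put numbers on "decreasing very quickly" with and without sk103154 enabled, one option is to sample free memory periodically and fit a straight line through the samples. The sketch below is an illustration only: it assumes the samples come from the MemFree field of /proc/meminfo (collected by cron or similar), and the helper names parse_meminfo_kb and slope_kb_per_hour are hypothetical, not part of sk103154 or the leak_detection script.

```python
def parse_meminfo_kb(text, field="MemFree"):
    """Return the value in kB of one field from /proc/meminfo-style text."""
    for line in text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    raise KeyError(field)

def slope_kb_per_hour(samples):
    """Least-squares slope of free memory over time.

    samples: list of (unix_seconds, free_kb) pairs.
    A negative result means free memory is shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / float(n)
    mean_m = sum(m for _, m in samples) / float(n)
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("timestamps must differ")
    return num / den * 3600.0  # convert kB/second to kB/hour

# Example with made-up readings taken one hour apart.
readings = [(0, 1000000), (3600, 990000), (7200, 980000)]
print("trend: %.0f kB/hour" % slope_kb_per_hour(readings))
```

Comparing the slope from a period with the feature enabled against a period with it disabled, on both the active and the standby member, would give the SR owner a concrete rate rather than an impression.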
