belteto
Participant

Physical memory vs FW memory. Explanation needed!

Hi All!

 

I am trying to understand two parameters shown for our VSX VSLS gateways.

What are the differences and similarities between Physical memory and FW memory in cpview?

It is still a bit foggy to me, and it matters because we are investigating an incident.

Could someone explain it to me? It seems to me that FW memory is the more important of the two to monitor.

I attached a picture from cpview, taken when FW memory was fully utilised while physical memory was still at 50%.

In this state the gateway stops processing traffic and logs a lot of 'internal rule base error' drops, but the gateway itself remains reachable.

All input on this is highly welcome.

Thanks in advance.

 

0 Kudos
24 Replies
PhoneBoy
Admin

Physical memory refers to the entire appliance.
Firewall memory refers to the memory allocated to the various processes and such related to firewall functions.
More information is definitely required to assist in troubleshooting this (for example version/JHF level, precise error messages and such).

belteto
Participant

Thanks for the explanation and offer to help.

As I understand it, and correct me if I'm wrong:

Physical memory usage is always higher than FW memory usage, because:

Physical memory usage = FW memory usage + OS base memory usage

And physical memory usage rises whenever FW memory usage rises.

This is what we see on our other VSX systems (each has 10 VSs on it): physical memory usage is ~3 GB higher than FW memory usage.

 

In this particular case, after the reboot and the latest hotfix (R81.10 T87), FW memory usage is still higher than physical memory usage, and it keeps rising very slowly.

Picture attached.

 

This VSX cluster is a 3-node CP 26000 with 96 GB RAM, running R81.10 T87 (VSLS, with 39 VSs and 5 switches on it).

No error message is visible now. Only when FW memory was 100% full did we get 'internal rule base error' drop messages in the logs, nothing more.

 

TAC is already on it and R&D will possibly be involved.

I'd like to pick your and the community's brains; maybe you have seen something similar.

 

0 Kudos
PhoneBoy
Admin

It's possible there is a memory leak somewhere.
I recommend getting the TAC involved. 

0 Kudos
belteto
Participant

Yes, we (and TAC as well) suspect a memory leak, which is why they recommended applying T87, which includes memory fixes (as they told us).

Maybe not all memory issues were fixed, so they are still investigating.

0 Kudos
the_rock
Legend

Yes, you got that right, but as @PhoneBoy said, it's possible you have a memory leak here. To help you properly, can you send us the output of the commands below:

top

free -m

ps -auxw

cpview (look at initial screen for memory usage)

cpwd_admin list

enabled_blades

cat /proc/cpuinfo

cat /proc/meminfo

cpstat fw -f all
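
The commands above could be collected in one pass with a small wrapper. This is only a sketch under stated assumptions: the function name collect_outputs and the report path are hypothetical, top is run as `top -b -n 1` so that it exits after one batch-mode snapshot, and cpview is left out because it is interactive.

```bash
#!/bin/bash
# Hypothetical helper (not a Check Point tool): run each command passed as an
# argument and append its output (stdout and stderr) to a report file, under
# a header line naming the command.
collect_outputs() {
    local report="$1"; shift
    local cmd
    : > "$report"                      # create or truncate the report
    for cmd in "$@"; do
        printf '===== %s =====\n' "$cmd" >> "$report"
        # bash -c lets each entry carry its own arguments; a failing command
        # is noted in the report instead of aborting the collection.
        bash -c "$cmd" >> "$report" 2>&1 || printf '(exit status %d)\n' "$?" >> "$report"
    done
}

# Example call on a gateway (not executed here; the path is an assumption):
# collect_outputs /var/log/mem_diag.txt \
#     "top -b -n 1" "free -m" "ps auxw" "cpwd_admin list" "enabled_blades" \
#     "cat /proc/cpuinfo" "cat /proc/meminfo" "cpstat fw -f all"
```

The resulting file can then be attached to the thread or to a TAC case as a single artifact.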

Cheers,

Andy

0 Kudos
belteto
Participant

Hi!

Attached the outputs.

Of the 39 VSs, 3 have blades enabled.

All the others have only the firewall, and on all of them around 90-99% of connections are accelerated.

 

Thx

Balint

0 Kudos
Teddy_Brewski
Collaborator

Hello @belteto .  Did you find anything with the TAC?

We've recently experienced the same issue with R81.20 Take 90 on open servers. The FW memory got consumed and we ended up with 'internal rule base error' drops.

The case with the TAC went nowhere. They provided a long list of kernel settings to enable at the moment the memory is saturated, which hasn't happened again so far.

Which brings me to another question: does anyone know how to monitor FW memory via SNMP? I can get values for RAM - Real Active and RAM - Real Free, but those are of no use here.

Thank you.

 

 

0 Kudos
the_rock
Legend

Hey Teddy,

Can you send what you see below when running cpview? You can also check history by running cpview -t and then pressing t to jump to the desired date. By the way, do you see anything consuming a lot of memory in top or ps -auxw? What does free -m show?

Andy

 

Screenshot_1.png

 

my lab:

 

 [Expert@CP-GW:0]# free -m
              total        used        free      shared  buff/cache   available
Mem:          23309        6555        8494          32        8259       15303
Swap:          8191           0        8191
[Expert@CP-GW:0]#
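
To compare this with the percentage cpview shows for physical memory, the `free -m` output can be reduced to a single figure. A minimal sketch: the function name phys_used_pct is hypothetical, and it assumes the column order shown above (label, total, used). Note that "used" here excludes buff/cache, so it may not match how other tools count usage.

```bash
#!/bin/bash
# Hypothetical helper: print the integer percentage of physical memory in use,
# reading `free -m` style output on stdin. Only the "Mem:" row is considered;
# column 2 is total and column 3 is used.
phys_used_pct() {
    awk '$1 == "Mem:" && $2 > 0 { printf "%d\n", ($3 * 100) / $2 }'
}

# Example call on a live system (not executed here):
# free -m | phys_used_pct
```

With the lab figures above (6555 MB used of 23309 MB) this prints 28.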

0 Kudos
Teddy_Brewski
Collaborator

Hello @the_rock 

We didn't have enough patience or time to identify what was consuming memory in top. It was "Mad Max" emergency troubleshooting in the middle of the night. Even the initial troubleshooting went in the wrong direction: FW memory values were overlooked and everybody focused on the state of the cluster, which was perfectly fine and healthy. The failover fixed the issue, and only in the morning did we notice the 'internal rule base error' drops and start replaying cpview, which revealed FW memory exhaustion:

Untitled1x.png

And this is how it looks now:

Untitled1.png

For some weeks we didn't experience any memory increase, so it's still under observation.

 

0 Kudos
the_rock
Legend

Ok, fair enough. So, at this point, do commands top and ps -auxw show any process consuming high memory?

Andy

0 Kudos
Teddy_Brewski
Collaborator

Nothing high:

Untitled1xxx.png

As per 'ps -auxw', the heaviest memory consumer (2.2%) is:

admin 19372 1.7 2.2 2646548 1438580 ? S<Ll Nov16 638:14 fwk

 

0 Kudos
the_rock
Legend

That looks normal.

0 Kudos
Teddy_Brewski
Collaborator

@the_rock, @belteto I think I found what causes the memory increase. Adding these two DoS rules increased memory usage by ~1 GB instantly, and it has kept growing since. The counter never goes down, only up.

fwaccel dos rate add destination range:192.168.100.100 pkt-rate 1000

fwaccel dos rate add destination range:192.168.100.100 concurrent-conns 10000

Here 192.168.100.100 is one of our publicly exposed web servers, and quite a busy one.

I have ~10 similar rules for other servers, but only these two seem to cause a continuous and noticeable rise in memory. Deleting them stabilizes memory usage.

We had these rules active for about a year on R80.40, so I think the behaviour is linked to R81.20.
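
A before/after comparison like this is easier to reproduce with timestamped samples. The sketch below is an illustration, not an official procedure: the function name fw_mem_used_max and the log path are hypothetical, and the field positions are an assumption borrowed from the mem_pass.sh script posted later in this thread, so they should be verified against the `fw ctl pstat` output of the version in use.

```bash
#!/bin/bash
# Hypothetical helper: print "USED MAX" (in MB) from `fw ctl pstat` output
# read on stdin.
# ASSUMPTION: the summary line has the shape
#   Physical memory used: N% (USED MB out of MAX MB) ...
# so field 5 is "(USED" and field 9 is MAX. This layout is implied by the awk
# fields used in mem_pass.sh; it is not taken from vendor documentation.
fw_mem_used_max() {
    awk '/Physical/ { print substr($5, 2), $9; exit }'
}

# Example on a gateway (not executed here): log one sample per minute, then
# add or delete a rule and watch whether USED keeps climbing.
# while true; do
#     echo "$(date +%s) $(fw ctl pstat | fw_mem_used_max)"
#     sleep 60
# done >> /var/log/fw_mem_samples.log
```

Each log line then holds an epoch timestamp, the used value and the maximum, which is enough to show whether a rule change coincides with a step in memory usage.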

 

 

the_rock
Legend

Excellent work @Teddy_Brewski . I'm actually glad I saw your response, because I have a call with a customer later today and they asked me this exact question about the rules you added, so I will probably tell them not to add them if they caused all these issues.

Andy

0 Kudos
PhoneBoy
Admin

Have you opened a TAC case on this?
While I can see memory increasing somewhat, that amount doesn't seem reasonable.

0 Kudos
Teddy_Brewski
Collaborator

Going to! Now I can reproduce it live.

I think it's somehow related to the load or the protocols (http/s) used with that particular server. I don't see memory increasing with the other 10 rules.

0 Kudos
the_rock
Legend

Hey Teddy,

Just curious, what's different about those other rules, if you don't mind expanding on it further?

Andy

0 Kudos
Teddy_Brewski
Collaborator

Hi @the_rock , the syntax is exactly the same, it's just that the values are smaller:

fwaccel dos rate add destination range:xxx.xxx.xxx.x pkt-rate 500

fwaccel dos rate add destination range:xxx.xxx.xxx.x concurrent-conns 1000

I have a feeling it's linked to the nature of the traffic. The web server has http/s ports open and is extremely busy, and is perhaps under continuous DDoS as we speak.

The rules above are applied to all publicly exposed authoritative DNS servers to mitigate fast-flood DNS attacks. Could it be that they are not under attack at the moment, and that this explains why memory isn't rising?

0 Kudos
the_rock
Legend

Got it, makes sense, thanks a lot. Yes, I would agree with your guess: most likely it's because those are not under attack. By "those" I mean the IP addresses in the rules.

Andy

0 Kudos
Teddy_Brewski
Collaborator

I think I will be able to confirm this very soon, since we're 'fast flooded' every 4-5 days.

0 Kudos
the_rock
Legend

Please keep us posted.

Andy

0 Kudos
Teddy_Brewski
Collaborator

I think my assumption is correct, and it seems to affect all rules. Actually it's even worse, since the memory doesn't seem to be released either. Since my last post, according to 'fwaccel dos stats get', the number of SecureXL packets dropped due to the rate limit has increased, so the current rules were indeed used:

Rate Limit: 1725575

According to cpview, the FW memory has increased too:

Capture.PNG

The counter never goes down. Since there are no ongoing attacks it flaps between 5,115 and 5,119, but never significantly lower.
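
Whether a counter "never goes down" can be checked against recorded samples rather than by eye. A minimal sketch, assuming one numeric sample per line on stdin; the function name mem_trend is hypothetical and how the samples are recorded is left open.

```bash
#!/bin/bash
# Hypothetical helper: read one memory sample per line on stdin and print
# "MIN MAX LARGEST_DROP", where LARGEST_DROP is the biggest decrease between
# two consecutive samples (0 if the series never decreases). Blank lines are
# skipped.
mem_trend() {
    awk 'NF == 0 { next }
         { v = $1 + 0 }
         n++ == 0 { min = max = prev = v }
         {
             if (v < min) min = v
             if (v > max) max = v
             if (prev - v > drop) drop = prev - v
             prev = v
         }
         END { if (n) print min, max, drop + 0 }'
}

# Example (not executed here): samples.txt is an assumed file of readings.
# mem_trend < samples.txt
```

For the samples 5115, 5119, 5116, 5119 it prints `5115 5119 3`: the value flaps by a few units but never falls back toward an earlier baseline.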

Anyone using 'fwaccel dos' rules under R81.20?

 

0 Kudos
belteto
Participant

Hi Teddy!

In our case a memory leak was identified, caused by the Dynamic Balancing process (dsd).
The next Jumbo Hotfix solved the issue.

There is no way to monitor FW memory directly via SNMP: no specific OID is assigned to that parameter.

To get the data via SNMP, we created custom OIDs that run a script which reads the counters from fw ctl pstat.

Added to /etc/snmp/userDefinedSettings.conf:

pass .1.3.555.1 /usr/local/bin/mem_pass.sh max

pass .1.3.555.2 /usr/local/bin/mem_pass.sh used

 

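/usr/local/bin/mem_pass.sh:

#!/bin/bash
# SNMP "pass" helper: prints the OID, the type and the value of one counter.
# $1 is "max" or "used", as set in userDefinedSettings.conf.

# Read the "Physical" line of fw ctl pstat once and reuse it.
line=$(fw ctl pstat | grep Physical)
max=$(echo "$line" | awk '{print $9}')              # field 9 of that line
used=$(echo "$line" | awk '{print substr($5,2)}')   # field 5 without its first character

if [[ $1 == max ]]
then
  echo .1.3.555.1.0
  echo integer
  echo "$max"
fi

if [[ $1 == used ]]
then
  echo .1.3.555.2.0
  echo integer
  echo "$used"
fi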
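
On the monitoring side, the two custom OIDs can be polled and turned into a utilisation percentage. This is a sketch only: the function name fw_mem_pct is hypothetical, the host name and community string are placeholders, and it assumes the Net-SNMP `snmpget` client, where `-Oqv` prints just the value.

```bash
#!/bin/bash
# Hypothetical helper: print the integer percentage of FW memory in use,
# given the "used" and "max" values returned by the two custom OIDs.
fw_mem_pct() {
    local used="$1" max="$2"
    # Reject a zero, empty or non-numeric maximum instead of dividing by it.
    if [ "$max" -gt 0 ] 2>/dev/null; then
        echo $(( used * 100 / max ))
    else
        echo "invalid max: $max" >&2
        return 1
    fi
}

# Example poll from a monitoring host (not executed here; gw.example.com and
# the community string "public" are placeholders):
# max=$(snmpget -v2c -c public -Oqv gw.example.com .1.3.555.1.0)
# used=$(snmpget -v2c -c public -Oqv gw.example.com .1.3.555.2.0)
# fw_mem_pct "$used" "$max"
```

Alerting on this percentage, rather than on physical RAM, matches the failure described in this thread, where FW memory reached 100% while physical memory was still around 50%.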

 

 

Hope it helps!

Balint

Teddy_Brewski
Collaborator

Thanks a lot!

0 Kudos
