Netadmin2020
Collaborator

High CPU utilization on RAD process

Hello!

We just updated to R80.40 with the latest hotfix (appliance 15600).

Using the top command, I see high CPU usage by the rad process.

I checked and found that it is related to the URL cache.

We first hit the limit of 20000 (I checked it with the command below):

fw tab -t urlf_cache_tbl -s
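To track the table fill level over time rather than reading it by eye, the summary line can be parsed. The following is a minimal Python sketch, assuming the summary output has a header row with `#VALS` and `#PEAK` columns; the sample text is made up for illustration, and `parse_fw_tab_summary` and `utilization` are hypothetical helpers, not Check Point tools.

```python
def parse_fw_tab_summary(text):
    """Parse the text printed by `fw tab -t <table> -s` into a list of dicts.

    Assumes a header row with column names such as NAME, #VALS and #PEAK,
    followed by one whitespace-separated row per table.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            continue  # skip anything that is not a data row
        rows.append(dict(zip(header, fields)))
    return rows


def utilization(row, limit):
    """Return (current, peak) fill level as fractions of the configured limit."""
    return int(row["#VALS"]) / limit, int(row["#PEAK"]) / limit


# Illustrative sample only; not captured from a real gateway.
SAMPLE = """\
HOST                  NAME                               ID #VALS #PEAK #SLINKS
localhost             urlf_cache_tbl                     29 18000 20000       0
"""

rows = parse_fw_tab_summary(SAMPLE)
current, peak = utilization(rows[0], limit=20000)
print(f"{rows[0]['NAME']}: {current:.0%} full now, peak {peak:.0%}")
```

A peak equal to the limit suggests the table has filled up at least once since it was last reset.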

After upgrading to R80.40 we added more users behind the Check Point gateway (all with URL filtering enabled).

Yesterday I changed the limits of the rad service with GuiDBedit.

I first raised the cache max hash size from 20000 (the default) to 40000.

Today I saw that we had reached the 40000 limit, so I set it to 70000.

We are hitting the values below right now:

[Screenshot: url.PNG]

What do you suggest in this situation?

Timothy_Hall
Champion

How many internal users do you have?  The table default of 20,000 entries assumes about 1,000 users with a "normal" level of web browsing activity. 

Also check your policy and make sure APCL/URLF is not getting matched against inbound web requests heading to a DMZ web server farm or something.

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Netadmin2020
Collaborator

I will generally have about 5000 users. Does this mean that I should change the limit to 100000?
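The 100000 figure comes from scaling the 20,000-entry default linearly with the user count, which works out to about 20 entries per user. A minimal sketch of that arithmetic follows; the linear scaling is an assumption made for illustration, not Check Point sizing guidance, and `estimate_cache_entries` is a hypothetical helper.

```python
import math

DEFAULT_ENTRIES = 20_000   # default table size mentioned in the thread
DEFAULT_USERS = 1_000      # user count that default is said to assume


def estimate_cache_entries(users):
    """Scale the default linearly with user count (an assumption, not vendor guidance)."""
    return math.ceil(users * DEFAULT_ENTRIES / DEFAULT_USERS)


print(estimate_cache_entries(5_000))   # 100000
```

Measured table usage on the gateway is a better guide than this estimate, since browsing behavior varies between user populations.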

Timothy_Hall
Champion

Occasionally having the cache fill up and get cleared is fine; it is only a problem when it happens constantly.  If you have it set to 70000 and are now hitting 55000 without it constantly hitting the limit and getting cleared, I'd leave it at 70000 for now.
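One way to tell "occasionally" from "constantly" is to sample the table's entry count at a fixed interval and count how often it was near the limit and then dropped sharply. The sketch below assumes such samples are already collected; the `near_full` and `drop_ratio` thresholds and the readings are hypothetical, chosen only to illustrate the idea.

```python
def count_cache_clears(samples, limit, near_full=0.95, drop_ratio=0.5):
    """Count how often the table was near its limit and then dropped sharply.

    samples: entry-count readings taken at a fixed interval (oldest first).
    near_full and drop_ratio are illustrative thresholds, not vendor values.
    """
    clears = 0
    for prev, cur in zip(samples, samples[1:]):
        if prev >= near_full * limit and cur <= drop_ratio * prev:
            clears += 1
    return clears


# Hypothetical readings against a limit of 70000.
occasional = [40000, 55000, 69000, 12000, 30000, 48000, 55000]
constant = [69500, 9000, 68000, 8000, 69900, 11000, 70000, 5000]

print(count_cache_clears(occasional, 70000))  # 1
print(count_cache_clears(constant, 70000))    # 4
```

A count that keeps climbing within a short window would point to a table that is too small for the traffic.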

Netadmin2020
Collaborator

Thank you. Is it normal for the rad process to go to 250% but not stay there permanently?

Timothy_Hall
Champion

A CPU spike in rad every now and then is fine.

Netadmin2020
Collaborator

This morning ...

 

Chris_Atkinson
Employee

As this is multi-threaded, you will see more RAD processes and CPU can exceed 100%. This is a normal behavior.

Source: sk163793
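To put a figure such as 250% in perspective: in top's default (Irix) mode, 100% corresponds to one fully busy core, so a multi-threaded process can legitimately exceed it. A minimal sketch of the conversion follows; the core count of 16 is a hypothetical value for illustration, not the actual core count of any particular appliance.

```python
def cores_in_use(top_cpu_percent):
    """In top's default (Irix) mode, 100% equals one fully busy core."""
    return top_cpu_percent / 100.0


def share_of_machine(top_cpu_percent, core_count):
    """Fraction of total CPU capacity used; core_count is supplied by the caller."""
    return top_cpu_percent / (100.0 * core_count)


print(cores_in_use(250))            # 2.5
print(share_of_machine(250, 16))    # 0.15625
```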

CCSM R77/R80/ELITE
Martin_Seeger
Collaborator

Yes, but RAD CPU usage topping the CPU usage of all virtual firewalls on a VSX?

the_rock
Legend

Does sound like normal behavior to me, sorry. Personally, I never saw anything like this before R80.

Martin_Seeger
Collaborator

This seems absurdly high to me. I'm still looking into the source of the RAD load.

My impression:

a) The malware blade generates a lot of queries to the RAD.

b) The cache is too small, which causes the RAD to query externally too often.

But that is only my current impression... My gut feeling has been wrong before.

Chris_Atkinson
Employee

The way the process operates was changed in JHFs for R80.20 and above. That is not to say there isn't an issue warranting investigation, however.

Martin_Seeger
Collaborator

Yes, we guess that the problem started after applying JHF 219. But this is uncertain, as we were slow to notice the problem. I need to rewrite our monitoring. Nobody was watching the RAD CPU usage ;-).
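A monitoring check for RAD CPU usage could start from a single batch-mode snapshot of top (`top -b -n 1`). The sketch below is a minimal Python example under stated assumptions: the sample output is made up, the 200% alert threshold is hypothetical, and `rad_cpu_from_top` is an illustrative helper, not an existing tool.

```python
def rad_cpu_from_top(top_output, process_name="rad"):
    """Sum the %CPU column for all rows whose COMMAND equals process_name.

    Expects the text of `top -b -n 1`; columns are located by header name.
    """
    cpu_idx = cmd_idx = None
    total = 0.0
    for line in top_output.splitlines():
        fields = line.split()
        if "%CPU" in fields and "COMMAND" in fields:
            cpu_idx, cmd_idx = fields.index("%CPU"), fields.index("COMMAND")
            continue
        if cpu_idx is None or len(fields) <= cmd_idx:
            continue  # before the header, or not a process row
        if fields[cmd_idx] == process_name:
            total += float(fields[cpu_idx])
    return total


# Illustrative sample only; not captured from a real gateway.
SAMPLE_TOP = """\
top - 08:15:01 up 12 days,  3:02,  1 user,  load average: 3.10, 2.95, 2.80
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 7012 admin     20   0 1203400 310200  21000 S 248.3  1.9 512:10.42 rad
 6120 admin     20   0  803400 110200  11000 S  35.0  0.7 112:03.11 fwd
"""

ALERT_THRESHOLD = 200.0  # hypothetical threshold, in top's per-core percent

usage = rad_cpu_from_top(SAMPLE_TOP)
if usage > ALERT_THRESHOLD:
    print(f"rad CPU at {usage:.1f}% exceeds {ALERT_THRESHOLD:.0f}%")
```

Since occasional spikes are expected, alerting on several consecutive samples above the threshold is likely to be less noisy than alerting on one.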

Martin_Seeger
Collaborator

Do you know how R80.30 differs from R80.40 with regard to the RAD cache?

With R80.30, the gateway (VSX) does not return any result for the "fw tab -t urlf_cache_tbl -s" command: Failed to get table status for urlf_cache_tbl.

Since the update to the latest JHF (Take 219), the RAD CPU usage has been through the roof. It often tops the CPU usage of all the virtual firewalls together. The CPU usage alone would not worry us, but we see in tcpdumps that the firewall is sometimes "keeping" HTTP connections "on hold" for 5 seconds.
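Holds like these can be located in a capture by looking for large gaps between consecutive packets of one connection. The following is a minimal sketch, assuming the packet timestamps for a single connection have already been extracted as epoch seconds (for example from `tcpdump -tt` output); the sample values and the `find_stalls` helper are hypothetical.

```python
def find_stalls(timestamps, min_gap=5.0):
    """Return (start, gap) pairs where consecutive packets are at least min_gap seconds apart."""
    ordered = sorted(timestamps)
    return [(a, b - a) for a, b in zip(ordered, ordered[1:]) if b - a >= min_gap]


# Hypothetical timestamps (seconds) for the packets of one HTTP connection.
packets = [100.00, 100.02, 100.05, 105.07, 105.08, 105.30]

for start, gap in find_stalls(packets):
    print(f"stall of {gap:.2f}s after t={start:.2f}")
```

Counting stalls per hour before and after a policy change gives a simple way to check whether the delays are becoming less frequent.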

With some optimisations in the policy (exceptions in the Threat Prevention policy), the RAD load has been reduced a bit and the 5s delays now occur less often.

My impression is that the cache is too small. I asked support how to increase the cache, but I have not had a reply for 5 days now.

Do DNS queries somehow end up in the RAD? RAD CPU usage seems to be the highest when we see a lot of DNS traffic.

Thanks, Martin

Martin_Seeger
Collaborator

RAD statistics in cpview seem broken (see screenshot below).

If I run "rad_admin stats print malware", I get the same statistics in the CSV file.

Chris_Atkinson
Employee

The procedure for increasing the URLF cache is documented in sk90422.

the_rock
Legend

With all due respect, it would not be the first time (and I'm sure it won't be the last either) that a CP SK is wrong.

Martin_Seeger
Collaborator

Thanks, I have been looking for such an SK for quite some time now. This is really helpful. 

From what I see, I need to increase the cache for malware rather than for urlf, but that is easily adapted.

the_rock
Legend

What value exactly did you change in guidbedit?

Andy

Netadmin2020
Collaborator

In my case I adjusted the URL cache to 120000. I still have rad spikes, but no issues.

the_rock
Legend

Okay, that's fair... BUT, personally (and I can't really speak for anyone else), to me that is more like masking an issue, unless it truly addressed the problem you had. I have always found that TAC may ask customers to increase certain values in GuiDBedit without a true understanding of how to fix the issue more permanently.

Andy
