ak4020
Contributor

Halting RAD requests handling / RAD reached maximum allowed concurrent requests

Hello everybody,
We have had a support case open for a long time and wanted to ask whether anyone else is still seeing this problem. The bug has been chasing us since R80, and we are currently on R80.40. br alois

13 Replies
Sven_Glock
Advisor

I am not sure if it is the same, but sk103422 was a solution for an issue I had with RAD.

ak4020
Contributor

Good day, many thanks for your answer. Unfortunately, that only works with R70: according to support and our own experience, the cache_size for url_filter and max_conn under rad_services in GuiDBedit only take effect in R70. br alois

Timothy_Hall
Champion

Please provide the exact error message you are seeing, as there have been a variety of issues with rad.  Most of the capacity issues seemed to get solved by this: sk163793: How to scale up requests/responses RAD handling rates

Make sure the DNS servers you have defined in Gaia on the gateway are speedy and working correctly by testing them with the nslookup command.  Having to wait for slow DNS service can cause issues with rad.
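One way to compare resolvers is to time a single lookup against each of them. The following is only a rough sketch: the function name `dns_lookup_ms` and the `LOOKUP_CMD` override are hypothetical (the override exists so the function can be dry-run without a resolver), and it assumes GNU `date` with nanosecond support.

```shell
# Hypothetical helper: time one lookup of NAME against SERVER and print
# the server, the lookup command's exit code, and the elapsed milliseconds.
# LOOKUP_CMD is an assumed override for dry runs; it defaults to nslookup.
dns_lookup_ms() {
    local name="$1" server="$2" start end rc
    start=$(date +%s%N)                      # nanoseconds (GNU date)
    "${LOOKUP_CMD:-nslookup}" "$name" "$server" >/dev/null 2>&1
    rc=$?
    end=$(date +%s%N)
    echo "$server rc=$rc ms=$(( (end - start) / 1000000 ))"
}
```

For example, `dns_lookup_ms www.checkpoint.com 192.0.2.53` would time one query against a placeholder resolver address; substitute the servers actually defined in Gaia. A single lookup can be answered from a cache, so repeating it with a few different names gives a fairer picture.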

Also try running rad_admin stats on urlf then visiting the RAD screen of cpview which can be quite informative about how rad is working and give you hints about what issues it is having.  Don't forget to run rad_admin stats off urlf when done!  If rad is having issues, from a user perspective the most common impact to performance tends to be URLF due to its need to hold user connections and query the ThreatCloud, so I'd start there.
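To make it harder to forget the "off" step, the two rad_admin commands can be wrapped around whatever you run in between. This is a minimal sketch, not an official procedure: the wrapper name `with_rad_stats` and the `RAD_ADMIN` override are hypothetical, and only the `rad_admin stats on urlf` and `rad_admin stats off urlf` commands come from the advice above.

```shell
# Hypothetical wrapper: enable RAD statistics for URLF, run the given
# command, then disable the statistics once that command returns.
# RAD_ADMIN is an assumed override for dry runs; it defaults to rad_admin.
with_rad_stats() {
    local rad_admin_cmd="${RAD_ADMIN:-rad_admin}" rc
    "$rad_admin_cmd" stats on urlf || return 1
    "$@"                                     # e.g. cpview
    rc=$?
    "$rad_admin_cmd" stats off urlf
    return "$rc"
}
```

Usage would be `with_rad_stats cpview`, then opening the RAD screen inside cpview and quitting normally. If the inner command is interrupted with Ctrl-C, the wrapper may be aborted as well, so it is still worth checking that statistics were turned off.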

Finally, check for this corner case with URL filtering: sk90422: How to modify URL Filtering cache size?

 

"Max Capture: Know Your Packets" Video Series
now available at http://www.maxpowerfirewalls.com
ak4020
Contributor

Hi, URLF is not activated at all; the problem is related to Anti-Bot and DNS Trap. Another interesting detail: the URLF SK only works with R70.x, and changing the RAD connection setting in regedit brings no improvement. I think the problem is that we have a lot of DNS requests and the RAD handler is hitting its limit. Check Point support has also been unable to find the relevant setting for a week. br alois

Trevor_Bruss
Contributor

Any suggestions for a speedy DNS server? We are seeing similar problems with the RAD error message (description: Halting RAD requests handling; reason: RAD reached maximum allowed concurrent requests). The odd thing is that I have had over 22,000 messages in my inbox since 12:45 in the morning, with an additional 2,000 arriving within just one hour of deleting that batch.

 

In either case, we have been using 1.1.1.1 as a DNS server, and in the past we used Google's 8.8.8.8. I'm not certain what their caching ability is. Our other option is to use our internal DNS servers, which (since these lookups are for external hosts) will ultimately go through 1.1.1.1 and 8.8.8.8. I know our internal servers will cache those lookups for the TTL, but if Google and 1.1.1.1 also cache information, it just seems like extra traffic I'm sending in to the local DNS, only for it to go out through the same firewall to do the external lookup.

 

Is there anything specific that RAD needs to look up in DNS? I was under the impression that RAD used a central Check Point service, but perhaps I'm wrong. I'll need to do a deeper dive into RAD.

 

We're running the Maestro scalable platform, and I think I'll be opening a TAC case to get the hotfix mentioned in the SK you referenced.

Timothy_Hall
Champion

Your firewall should be configured to query your internal DNS servers, which should ideally query the root Internet DNS servers directly rather than going through a DNS forwarder at your ISP or something like 8.8.8.8.  The bottom line is to make sure the firewall itself is pointed at valid DNS servers that are not overloaded.

 

Trevor_Bruss
Contributor

Thanks. That seems somewhat counter-intuitive, given that I send the DNS request in through the internal leg of the firewall to the DNS server, which then does a lookup (even to the root hints) that passes back out through the same firewall to the Internet.

 

In either case, I think the problem is related to something else at the moment, and I'm getting TAC to look at it. My nslookups from our clustered Maestro units are failing. When I ping the internal network on our bond, I get no response. When I ping the external bond, I get no response either. I only get a response on a third bond. Oddly enough, traffic is still routing through the firewall: I'm connected remotely via a VPN and going through the firewall to access servers, etc. Very strange. However, I've had nothing but problems since we switched to Maestro, so it is no surprise that we're having issues again.

Rui_Gomes_PT
Contributor

Hi,

Any news regarding this issue? 

redcrow
Contributor

Hi guys,

Same issue here: a lot of "RAD reached maximum allowed concurrent requests" messages. We can confirm that the "Anti-Virus/Anti-Bot" protection is responsible; when those blades are disabled, no error message appears. We also tried to follow @Timothy_Hall's suggestions by adding our internal DNS servers to the "Internal DNS Servers" section of the Malware DNS Trap item, as well as in Gaia OS, but without success.

Please see attached screenshots.

Trevor_Bruss
Contributor

This is the response I got back when I opened a case about seeing that message in our environment, and it has worked for me. To be honest, I did not see a huge change in memory as mentioned, but every case may be different. In my environment I initially moved up to 1200, which helped, but then needed to increase that to 1500 about two months later. I haven't changed it since, and I haven't seen that message in the more than three months since the last change.

Depending on how many users you have, we will need to raise the maximum number of flows RAD can open. This will need to be an iterative process, as increasing the number of flows RAD is allowed to handle will cause RAD to use correspondingly more RAM.

To make the increase, modify the value of the following line in $FWDIR/conf/rad_conf.C:

    :max_flows (1000)

Begin by increasing this value by 100 or 200, install policy and restart RAD as done previously with "rad_admin stop ; rad_admin start". Pay close attention to the gateway's memory usage, as RAD's maximum memory usage will increase to scale with how many flows it can handle at once.
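The edit itself could be scripted roughly as follows. This is only a sketch under assumptions: the function name `set_max_flows` is hypothetical, it relies on GNU sed (`-i` and `-r`), and it expects the `:max_flows (N)` line format quoted above. It keeps a `.bak` copy of the file before changing anything.

```shell
# Hypothetical helper: set :max_flows in a rad_conf.C-style file to a new
# value, keeping a backup copy as FILE.bak. Assumes GNU sed.
set_max_flows() {
    local conf="$1" new="$2"
    # Accept only a plain non-negative integer as the new value.
    case "$new" in
        ''|*[!0-9]*) echo "max_flows must be a number" >&2; return 1 ;;
    esac
    # Refuse to touch the file if the expected line is not there.
    grep -Eq '^[[:space:]]*:max_flows \([0-9]+\)' "$conf" || {
        echo ":max_flows not found in $conf" >&2
        return 1
    }
    cp -p "$conf" "$conf.bak" || return 1
    sed -i -r "s/^([[:space:]]*:max_flows )\([0-9]+\)/\1(${new})/" "$conf"
}
```

For instance, `set_max_flows "$FWDIR/conf/rad_conf.C" 1200` would raise the limit by 200 from the default shown above; installing policy and restarting RAD with "rad_admin stop ; rad_admin start" still has to follow, as described in the support reply.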

redcrow
Contributor

Hello @Trevor_Bruss , thanks for sharing your solution.

I've checked my $FWDIR/conf/rad_conf.C and this is its contents:

(
:urlfs_service_check_seconds (7200)
:amws_service_check_seconds (1800)
:cpu_cores_as_number_of_threads (false)
:number_of_threads (0)
:threads_to_cores_ratio (0.334)
:number_of_threads_fast_response (0)
:number_of_threads_slow_response (0)
:queue_max_capacity (2000)
:debug_traffic (false)
:use_dns_cache (true)
:dns_cache_timeout_sec (2)
:use_ssl_cache (true)
:cert_file_name ("ca-bundle.crt")
:cert_type ("CRT")
:ssl_version ("TLSv1_0")
:ciphers ("TLSv1")
:autodebug (true)
:log_timeouts (false)
:log_errors (true)
:number_of_reports (512)
:max_repository_multiplier (20)
:flow_timeout (6)
:excessive_flow_timeout (300)
:transfer_timeout_sec (15)
:max_flows (1000)
:max_pc_in_reply (0)
:retry_mechanism_on (false)
:max_retries (25)
:retry_peroid_mins (15)

)

So I'll try your solution and increase :max_flows (1000). By the way, is it normal for this file to exist only in VS 0? We have a VSX configuration, and I discovered that the file is not present in VS 4 (i.e., the virtual system having issues with RAD).

[Expert@lntfw-pgtw2:0]# ls /opt/CPsuite-R80.40/fw1/conf/rad_conf.C
/opt/CPsuite-R80.40/fw1/conf/rad_conf.C
[Expert@lntfw-pgtw2:0]# vsenv 4
Context is set to Virtual Device lntfw-pVSX1_Frontiera (ID 4).
[Expert@lntfw-pgtw2:4]# ls /opt/CPsuite-R80.40/fw1/CTX/CTX00004/conf/rad_*
/opt/CPsuite-R80.40/fw1/CTX/CTX00004/conf/rad_cloud_settings.C
[Expert@lntfw-pgtw2:4]#

Djelo_Arnautali
Participant

In a VSX environment, RAD runs under VS0 only; all requests are sent from VS0.

redcrow
Contributor

Unfortunately, that did not solve my issue, so I asked official support. This was the solution in my case:

Open GuiDBedit > Other > rad_services > malware_rad_service_0 and change the table cache size (to 100k, 200k, or 300k as needed). They suggest starting with 150k; every environment is different, so you will have to pick a value that is neither too high nor too low.
