Hey guys,
Happy holidays! I wanted to see if someone could provide some thoughts/suggestions on this. Our customer has 2 RADIUS servers, one on-prem and one in Azure. Both work fine, BUT after 2 years and multiple TAC cases, we still can't solve the failover problem.
Btw, management is S1C and gateways are 6400s, R81.20 jumbo 41 (the latest).
What I mean by that is: say on-prem is priority 1 and Azure is priority 2, and you shut down the on-prem server. One would think Azure would take over, but no, auth requests still go to the on-prem server, as we can clearly see by running tcpdump on port 1812. By the way, the same issue happens if Azure is the main auth server. One quick way to solve the issue when it happens is to simply swap the priorities of the RADIUS servers; all works fine after installing policy.
Also tested with both servers as priority 1, no luck.
We even set global auth to RADIUS, made sure the generic object in the legacy dashboard was set to RADIUS, and tried both "any" and a RADIUS group that contains both servers. No luck in any of those scenarios.
TAC confirmed more than once that the config is right, so it truly raises the question... WHY does the failover scenario not work? I'm not sure if anyone out there is using 2 RADIUS servers, but if you are, PLEASE let us know how you made this work (if you did lol)
Thanks again for all the suggestions!
Best,
Andy
Hey guys,
Thanks to everyone who helped and responded. We were finally able to get this working with the help of TAC on a remote session. Below are the Global Properties settings for RADIUS that worked 100%: when the on-prem RADIUS server (the primary) was shut down, auth failed over flawlessly to the Azure one, and it kept working when on-prem was powered back on.
Customer was very happy it's finally fixed after 2 years.
Thanks again!
Best,
Andy
Grateful to @SenpaiNoticed_U @Chris_Atkinson @PhoneBoy @mccabe for all the advice and guidance ✌️
User authentication to RADIUS server times out
Hi Andy,
I'm not sure from your post, but have you tweaked the settings for "radius_retrant_num" and "radius_retrant_timeout" as yet?
There's a long-standing SK here:
https://support.checkpoint.com/results/sk/sk42449
@mccabe Thanks for the reply. As a matter of fact, that was one of the very first things we did and it did not change anything. TAC was even on the phone when it was done.
Best,
Andy
Just for reference, we even tried setting both values to 1. Below are the values TAC asked us to configure: exact same issue, no change.
Best,
Andy
Try with radius_connect_timeout at 20 seconds and
keep the rest of the settings the same.
Let me know what the results are.
We did that a long time ago and it made absolutely no difference. We tried 5, 10, 15, 20, 30 and so on, exact same issue.
Best,
Andy
What about in the IKED log files?
Do we see it stopping and claiming all servers are down due to timeout?
Here is how I read that configuration:
Attempt each server 2 times,
with 5 seconds between attempts,
and 10 seconds for the whole authentication attempt before claiming all servers are down.
So there is never enough time to attempt the 2nd server, because the 1st server takes up (5+5) = 10 seconds.
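To make that reading concrete, here is a rough sketch of the timing budget in Python. It is only a simplified model of the interpretation above, not confirmed gateway behavior: the function name `servers_reached` and its parameters are made up for illustration, and it assumes attempts run back to back and a new attempt starts only while the elapsed time is still below the overall limit.

```python
def servers_reached(num_servers, attempts_per_server, retry_timeout, connect_timeout):
    """Count how many servers get at least one attempt within the overall budget.

    Simplified model (an assumption, not documented gateway behavior):
    each attempt lasts retry_timeout seconds, attempts run back to back,
    and a new attempt starts only if elapsed time is below connect_timeout.
    """
    elapsed = 0
    reached = 0
    for _server in range(num_servers):
        for attempt in range(attempts_per_server):
            if elapsed >= connect_timeout:
                return reached  # overall budget used up
            if attempt == 0:
                reached += 1  # first attempt against this server
            elapsed += retry_timeout
    return reached


# 2 servers, 2 attempts each, 5 seconds per attempt, 10 seconds overall
print(servers_reached(2, 2, 5, 10))  # 1: only the first server is tried
# Same settings with a 25 second overall limit
print(servers_reached(2, 2, 5, 25))  # 2: the second server is reached
```

Under this model the 10 second limit is consumed entirely by the first server, which matches the behavior seen in the capture.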
Absolutely nothing...TAC asked us for literally every log file you can imagine before and there was no solution. I think what @PhoneBoy said would explain why this does not work, but in all honesty, I find it shocking, because in my mind, it totally defeats the purpose of even having 2 radius servers at all.
Best,
Andy
Seems to work with my Test Lab,
Radius servers in a group
Radius priority 1 = 10.250.250.1
Radius priority 2 = 10.150.150.2
Here are my settings in Global Properties
Here is the auth page example
I hear ya, lots of things work in my lab too that don't work in production lol
Andy
Then I suggest working on your open TAC case: showcase the issue and the traffic, and provide debugs/captures.
That's the plan, yea. But if you are willing to attach a Word doc with screenshots of your RADIUS lab servers and Global Properties settings, I am happy to suggest those to the customer next time they approve a maintenance window for this.
Best,
Andy
Just to clarify are you authenticating VPN users or SmartConsole admins etc?
At this stage going deeper with TAC seems the likely path...
VPN users Chris, correct.
Andy
What priorities did you give those 2 radius servers?
Andy
I posted my server priority in my previous post.
Radius servers in a group
Radius priority 1 = 10.250.250.1
Radius priority 2 = 10.150.150.2
It is unsupported to have 2 RADIUS servers in a group with the same priority.
Different priorities are recommended.
Yea, sorry about that, I noticed it right after I responded, my bad. Well, not sure what else to say, because that's exactly how we had it too, no difference. Btw, the Azure RADIUS server works fine, there are no issues with it, it is pingable and 100% reachable from both cluster members via BGP over ExpressRoute.
Best,
Andy
Are any of the RADIUS servers reached over a VPN tunnel,
or are they both reachable without VPN?
If a VPN tunnel is involved, verify that RADIUS traffic is not being matched by implied rules before VPN is considered. In other words, disable the implied rule for RADIUS traffic and add an explicit policy rule that accepts it.
Glad you asked, that's a 100% valid question and totally relevant in this case. The answer is no, neither RADIUS server communicates over VPN. As mentioned before, the Azure one is reached via ExpressRoute and the on-prem one is reachable from their office.
Best,
Andy
See, another super challenging part here is that we obviously can't expect TAC to replicate this, or even ask them to try, because it involves an Azure RADIUS server. So as much as it's greatly appreciated that you tested this in the lab, and I'm happy it worked for you, it sadly does not represent the true config the customer uses.
Anyway, I reached out offline to Ilya Yusupov, as he helped us big time last year for the same customer with the ISP redundancy script. I don't think we would have solved that issue any time soon without his help. Let's see what he finds out...
Best,
Andy
It's worth mentioning that we also changed the generic object in the legacy dashboard under auth to use the RADIUS group rather than the single server that had been there since the beginning, but it was the exact same behavior. I was really hopeful that would make a difference, but sadly no.
Best,
Andy
I ran a test using your RADIUS settings from Global Properties, and yes, it only attempted the priority 1 server and not the 2nd,
because the settings do not allow it enough time for further attempts.
Based on your screenshot, your settings allow RADIUS attempts for a total of 10 seconds, with 2 attempts of 5 seconds each per server:
The 1st server gets 2 attempts, 5 seconds each = 10 seconds.
10 seconds is the max, so your auth attempt ends.
If you want to reach the other servers, you need to adjust your timers to allow enough time for all servers and all attempts.
For 2 servers at 2 attempts each with 5 seconds,
I would recommend 25 seconds for radius_connect_timeout.
As mentioned yesterday, we tried multiple settings there and it made no difference at all.
TAC was even on the phone when we did it before.
Best,
Andy
I am looking at your active case ending in 6-000xxxx753,
You have not provided debugs or an active session for TAC to review the behavior.
What is your plan for next steps?
I have labbed out both my settings and your settings, and both behave as expected.
I would advise arranging a meeting or collecting debugs per the TAC case request.
If needed, your case owner can arrange a session with you and me so that I can showcase my lab setup.
That's right, as the customer has to approve a maintenance window for this so proper troubleshooting can be done.
You are welcome to attach your lab setup as a Word doc, just take the relevant screenshots, that's what I always do.
Best,
Andy
I emailed the T3 guy in DTAC, let's see if he can do a quick Zoom remote today so I can show him the config. Here is the way I look at all this... just me personally, but in my mind it does not make any difference what those timeout values are, because at the end of the day they would simply prolong OR shorten the time it takes for things to fail or time out. The main issue here is that failover to the working RADIUS server never happens, when logic would indicate that it simply should happen without any issues.
UNLESS, as @PhoneBoy indicated, this only ever takes into consideration whichever server has the higher priority, but then again, that would totally defeat the purpose of even having 2 RADIUS servers for auth to begin with.
Anyway, let's see what TAC comes back with.
Best,
Andy
For what it's worth, we even had the DTAC escalation guy tell us to set radius_retrant_timeout to 5 seconds and the issue remained the same.
Best,
Andy
I spoke with T3 guy Andrew and his escalation buddy Zack from DTAC on the case, and they asked us to change the values below as per the screenshot, which I did, so let's see if it helps in the next maintenance window. Considering we have changed these values who knows how many times, I want to be positive it will make a difference, so let's see : - )
Best,
Andy
I would not set radius_retrant_timeout to 15 seconds if you have radius_connect_timeout at 40, given the number of servers and attempts per server you have set.
I would do this instead,
which gives each server 3 attempts at communication, each 5 seconds apart.
That means server 1 gets 15 seconds of attempt time before the gateway moves on to the 2nd server.
The 2nd server gets its 3 attempts over another 15 seconds,
totaling 30 seconds out of the 40 seconds that are permitted (radius_connect_timeout).
So you would see a tcpdump like this if both servers are failing
(times in seconds):
00s source >>> destination_server1
05s source >>> destination_server1
10s source >>> destination_server1
15s source >>> destination_server2
20s source >>> destination_server2
25s source >>> destination_server2
Note that 5 seconds per attempt may not suit every environment, so adjust it as needed.
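The expected capture timeline can be reproduced with a few lines of Python. This is only an illustrative sketch: `attempt_timeline` is a made-up helper, and it assumes that attempts per server equal 1 + radius_retrant_num (so radius_retrant_num = 2 gives the 3 attempts described), that attempts run back to back, and that nothing new starts once radius_connect_timeout has elapsed.

```python
def attempt_timeline(servers, retrant_num, retrant_timeout, connect_timeout):
    """Return (start_second, server) for each attempt when every server is down.

    Assumptions (not confirmed gateway behavior): attempts per server is
    1 + retrant_num, attempts run back to back, and no attempt starts once
    connect_timeout seconds have elapsed.
    """
    attempts_per_server = 1 + retrant_num
    timeline = []
    elapsed = 0
    for server in servers:
        for _ in range(attempts_per_server):
            if elapsed >= connect_timeout:
                return timeline
            timeline.append((elapsed, server))
            elapsed += retrant_timeout
    return timeline


servers = ["destination_server1", "destination_server2"]
for start, server in attempt_timeline(servers, retrant_num=2,
                                      retrant_timeout=5, connect_timeout=40):
    print(f"{start:02d}s source >>> {server}")
```

Comparing this list against a real tcpdump on port 1812 is a quick way to see whether the gateway ever moves on to the second server.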
Follow this train of thought:
radius_connect_timeout = (number of RADIUS servers) * (1 + radius_retrant_num) * radius_retrant_timeout + 10 extra seconds
Example:
2 servers * (1 + 2) attempts * 5 seconds = 30 seconds
30 + 10 extra seconds = 40
radius_connect_timeout = 40
*Note on radius_retrant_num:
you can set this to zero and the gateway will still attempt once.
radius_retrant_num is the number of re-attempts, so the total attempts per server is 1 + the number of retries.
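As a quick sanity check, the rule of thumb can be written as a small Python helper. This is a sketch, not an official formula: it reads the example's numbers as a product (2 servers, 3 attempts, 5 seconds = 30) plus the 10 extra seconds, and the function name `suggested_connect_timeout` and the `margin` parameter are made up for illustration.

```python
def suggested_connect_timeout(num_servers, retrant_num, retrant_timeout, margin=10):
    """Rule-of-thumb radius_connect_timeout, per the reasoning in this thread.

    Attempts per server is 1 + retrant_num (the first try plus retries).
    The 10 second margin is the "extra seconds" from the example, taken
    here as an assumption rather than a documented value.
    """
    attempts_per_server = 1 + retrant_num
    return num_servers * attempts_per_server * retrant_timeout + margin


# 2 servers, radius_retrant_num = 2, radius_retrant_timeout = 5
print(suggested_connect_timeout(2, 2, 5))  # 40
# With radius_retrant_num = 0 each server still gets one attempt
print(suggested_connect_timeout(2, 0, 5))  # 20
```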