Hey guys,
Happy holidays! I wanted to see if someone could provide some thoughts/suggestions on this. Our customer has 2 RADIUS servers, on-prem and Azure. All this works fine, BUT, after 2 years and multiple TAC cases, we still can't solve the failover problem.
Btw, management is S1C and the gateways are 6400s on R81.20 jumbo 41 (the latest).
What I mean is this: say on-prem is priority 1 and Azure is priority 2, and you shut down the on-prem server. One would think Azure would take over, but no, auth requests still go to the on-prem server, as we can clearly see by running tcpdump on port 1812. The same issue happens if Azure is the main auth server. One way to quickly work around the issue when it happens is to swap the priorities of the RADIUS servers; after installing policy, everything works fine.
Also tested with both servers as priority 1, no luck.
We even set global auth to RADIUS, made sure the generic object in the legacy dashboard was set to RADIUS, and tried both "any" and the RADIUS group that contains both servers. No luck in any of these scenarios.
TAC confirmed more than once that the config is right, so it truly raises the question... WHY does the failover scenario not work? I'm not sure if anyone out there is using 2 RADIUS servers, but if you are, PLEASE let us know how you made this work (if you did lol).
Thanks again for all the suggestions!
Best,
Andy
This is what TAC confirmed is fine (see below). Personally, I want to believe this will make a difference, but after trying who knows how many different values over the last 2 years, I doubt it. Let's see... hopefully it can be tested again next week.
Best,
Andy
@PhoneBoy Happy holidays mate! I wanted to pick your brain on this and see if you had any suggestions. Honestly, it's nothing urgent, as the issue has been there for more than 2 years now, so the customer does not expect it to be fixed magically lol
Just wanted to see if you have anything in mind that may help, that's all.
Happy New Year.
Best,
Andy
I believe it merely tries the RADIUS servers in priority order, rather than "failing over" and marking one as active or not.
At least that's how I remember this feature working back in the day.
Really? Hm, interesting... so I guess that sort of defeats the purpose of having 2 RADIUS servers for authentication. Is there any way to make it work with 2 of them in a group if, say, one is priority 1 and the other is priority 2? We even tested the other night with both at the same priority and hit the exact same issue.
Best,
Andy
It's supposed to try priority 1 first, then priority 2.
If it's not doing that, the TAC may need to investigate.
Have you tried recreating the RADIUS server group in SmartConsole? Does the current group name include special characters?
Yes we did, a while back actually, and the name is Radius_group. There was a time it was called simply Radius, and that made no difference either.
Best,
Andy
And to confirm: when you say the RADIUS server is shut down, it is not providing any response at all, correct?
(Generally, auth fail and timeout are not the same from a liveness perspective.)
That's right, as if you shut down a Windows PC, not just rebooted it. Check out below: 192.168.x.x was the one that was shut down and 10.x.x.x is the Azure one that was up and running.
Best,
Andy
[Expert@FW-1:0]# tcpdump -enni any host 192.168.32.210 and port 1812
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
20:46:20.795393 Out 00:1c:7f:a1:42:47 ethertype IPv4 (0x0800), length 116: 10.240.0.3.55059 > 192.168.32.210.1812: RADIUS, Access-Request (1), id: 0x29 length: 72
20:46:20.795396 Out 00:1c:7f:a1:42:47 ethertype 802.1Q (0x8100), length 120: vlan 20, p 0, ethertype IPv4, 10.240.0.3.55059 > 192.168.32.210.1812: RADIUS, Access-Request (1), id: 0x29 length: 72
20:46:25.795160 Out 00:1c:7f:a1:42:47 ethertype IPv4 (0x0800), length 116: 10.240.0.3.55059 > 192.168.32.210.1812: RADIUS, Access-Request (1), id: 0x29 length: 72
20:46:25.795163 Out 00:1c:7f:a1:42:47 ethertype 802.1Q (0x8100), length 120: vlan 20, p 0, ethertype IPv4, 10.240.0.3.55059 > 192.168.32.210.1812: RADIUS, Access-Request (1), id: 0x29 length: 72
^C
4 packets captured
16 packets received by filter
0 packets dropped by kernel
[Expert@FW-1:0]#
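As a side note on reading the capture above: both Access-Request packets carry the same RADIUS id (0x29) and go to 192.168.32.210, so this looks like one request being retransmitted to the server that was shut down. A minimal Python sketch of the timing arithmetic, using only the timestamps copied from the capture (the helper name `gaps_in_seconds` is made up for illustration):

```python
from datetime import datetime

# Timestamps copied from the capture above. Each request shows up twice
# (once untagged, once with the VLAN 20 tag), so only one of each pair is listed.
SEND_TIMES = ["20:46:20.795393", "20:46:25.795160"]


def gaps_in_seconds(timestamps):
    """Return the gap in seconds between consecutive tcpdump timestamps."""
    parsed = [datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(parsed, parsed[1:])]


gaps = gaps_in_seconds(SEND_TIMES)
print([round(g, 3) for g in gaps])
```

The gap comes out at roughly 5 seconds, which suggests the gateway was retrying the same server on a 5-second interval during this capture.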
Azure RADIUS was not responding, so I changed the priority to 1 for both servers and tested:
[Expert@FW-1:0]# tcpdump -enni any host 10.200.11.14 and port 1812
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
^C
0 packets captured
66 packets received by filter
2 packets dropped by kernel
[Expert@FW-1:0]#
Just verified now, group is called RadiusGroup, so no special characters anywhere.
Best,
Andy
Hey guys,
Thanks to everyone who helped and responded. We were finally able to get this working with the help of TAC on a remote session. Below are the RADIUS settings in Global Properties that worked 100%: when the on-prem RADIUS server (the primary) was shut down, auth worked flawlessly against the Azure one, and it kept working when on-prem was powered back on.
The customer was very happy it's finally fixed after 2 years.
Thanks again!
Best,
Andy
Grateful to @SenpaiNoticed_U @Chris_Atkinson @PhoneBoy @mccabe for all the advice and guidance ✌️
🙂
All I will say is this... maybe there are not too many customers out there using 2 RADIUS servers for authentication (just my educated guess), but for those who are, it would be nice to update the sk you gave initially with the calculation that was mentioned in this post and that the DTAC escalation guy gave us as well. That way, there is no guessing what those values should be. Now we know that's why it took 2 years to fix this permanently.
Anyway, not a huge deal, considering we had a workaround when it would happen, but still... just annoying :-)
Best,
Andy
This is what I was referring to @mccabe
Straight from the TAC case, from the T3 guy in Dallas. After I read it a few times, it makes total sense to me. I really hope the sk is updated with this info.
Andy
*********************
120 seconds (for auth, radius_user_timeout)
2 re-attempts per server (radius_retrant_num)
40 seconds total, for the whole auth attempt (radius_connect_timeout)
5 seconds per server (radius_retrant_timeout)
This gives the gateway 15 seconds to try the first RADIUS server (1 initial attempt and 2 re-attempts at 5 seconds each), and then it moves to the second RADIUS server for another 15 seconds (again 1 initial attempt and 2 re-attempts at 5 seconds each). The window for all RADIUS server attempts is 40 seconds, which gives the gateway enough time for the 30 seconds it needs to reach out to the two RADIUS servers.
********************
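The arithmetic in the TAC note above can be sketched in a few lines of Python. This is only a model of the calculation as described in the note, not gateway code; the function name `radius_failover_budget` and its argument names are made up for illustration, and the assumption is that every server gets 1 initial attempt plus the configured number of re-attempts.

```python
def radius_failover_budget(num_servers, retries_per_server,
                           per_try_timeout, connect_timeout):
    """Model the timing arithmetic from the TAC note (not actual gateway code).

    Assumption: each server gets 1 initial attempt plus `retries_per_server`
    re-attempts, and each attempt waits `per_try_timeout` seconds.
    Returns (seconds per server, seconds needed for all servers,
    whether that fits inside `connect_timeout`).
    """
    per_server = (1 + retries_per_server) * per_try_timeout
    needed = num_servers * per_server
    return per_server, needed, needed <= connect_timeout


# Values quoted above: radius_retrant_num=2, radius_retrant_timeout=5,
# radius_connect_timeout=40, with two RADIUS servers in the group.
per_server, needed, fits = radius_failover_budget(2, 2, 5, 40)
print(per_server, needed, fits)  # 15 30 True

# Hypothetical what-if: a 30-second per-attempt timeout with the same
# other values no longer fits inside the 40-second window in this model.
print(radius_failover_budget(2, 2, 30, 40))  # (90, 180, False)
```

Under this model, the second server is only reached if the total window (radius_connect_timeout) is at least as long as the time spent on all attempts to the servers ahead of it plus its own attempts.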
I'll chase the owner internally, Andy, and ask for something to be added as an 'example', using what you had above. Many thanks for your persistence on this.
No worries mate, no rush. It's always a team effort, so thank you and the other guys who helped, along with the great help from TAC, of course.
Best,
Andy
User authentication to RADIUS server times out
Ironically enough, we followed that sk 2 years ago, and when I mentioned that to the escalation engineer, he told me specifically NOT to change the value from 5 to 30 seconds, like the sk states. Anyway, the issue is fixed now, that's all I really care about :-)
Best,
Andy