Radius auth failover issue
Hey guys,
Happy holidays! I wanted to see if someone could provide some thoughts/suggestions on this. Our customer has 2 RADIUS servers, on-prem and Azure. All of this works fine, BUT after 2 years and multiple TAC cases, we still can't solve the failover problem.
Btw, management is S1C and gateways are 6400s, R81.20 jumbo 41 (the latest)
What I mean is this: say on-prem is priority 1 and Azure is priority 2, and you shut down the on-prem server. One would think Azure would take over, but no, auth requests still seem to go to the on-prem server, as we can clearly see by running tcpdump on port 1812. By the way, the same issue happens if Azure is the main auth server. One quick way to solve the issue when it happens is to simply swap the priorities of the RADIUS servers; everything works fine after installing policy.
Also tested with both servers as priority 1, no luck.
We even set global auth to RADIUS, made sure the generic object in the legacy dashboard was set to RADIUS, and tried both "any" and the RADIUS group that contains both servers. No luck in any of the scenarios.
TAC confirmed more than once that the config is right, so it truly raises the question... WHY does the failover scenario not work? I'm not sure if anyone out there is using 2 RADIUS servers, but if you are, PLEASE let us know how you made this work (if you did lol)
Thanks again for all the suggestions!
Best,
Andy
This is what TAC confirmed is fine (see below). Personally, I want to be positive this will make a difference, but having tried who knows how many different values in the last 2 years, I doubt it. Let's see... hopefully it can be tested again next week.
Best,
Andy
@PhoneBoy Happy holidays mate! I wanted to pick your brain on this and see if you had any suggestions. Honestly, it's nothing urgent, as the issue has been there for more than 2 years now, so the customer does not expect it to be fixed magically lol
Just wanted to see if you have anything in mind that may help, that's all.
Happy New Year.
Best,
Andy
I believe it merely tries the RADIUS servers in priority order, rather than "failing over" to make one active or not.
At least that's how I remember this feature working back in the day.
Really? Hm, interesting... so I guess that sort of defeats the purpose of having 2 RADIUS servers for authentication. Is there any way to make it work with 2 of them in a group if, say, one is priority 1 and the other is 2? We even tested the other night with both at the same priority and hit the exact same issue.
Best,
Andy
It's supposed to try priority 1 first, then priority 2.
If it's not doing that, the TAC may need to investigate.
Have you tried recreating the RADIUS server group in SmartConsole? Does the current group name include special characters?
Yes we did, a while back actually. The name is Radius_group, but there was a time it was called simply Radius, and that made no difference either.
Best,
Andy
And to confirm: when you say the RADIUS server is shut down, it is not providing any response at all, correct?
(Generally, an auth failure and a timeout are not the same from a liveness perspective.)
That's right, as if you shut down a Windows PC, not just rebooted it. Check out the captures below: 192.168.x.x was the one that was shut down and 10.x.x.x is the Azure one that was up and running.
Best,
Andy
[Expert@FW-1:0]# tcpdump -enni any host 192.168.32.210 and port 1812
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
20:46:20.795393 Out 00:1c:7f:a1:42:47 ethertype IPv4 (0x0800), length 116: 10.240.0.3.55059 > 192.168.32.210.1812: RADIUS, Access-Request (1), id: 0x29 length: 72
20:46:20.795396 Out 00:1c:7f:a1:42:47 ethertype 802.1Q (0x8100), length 120: vlan 20, p 0, ethertype IPv4, 10.240.0.3.55059 > 192.168.32.210.1812: RADIUS, Access-Request (1), id: 0x29 length: 72
20:46:25.795160 Out 00:1c:7f:a1:42:47 ethertype IPv4 (0x0800), length 116: 10.240.0.3.55059 > 192.168.32.210.1812: RADIUS, Access-Request (1), id: 0x29 length: 72
20:46:25.795163 Out 00:1c:7f:a1:42:47 ethertype 802.1Q (0x8100), length 120: vlan 20, p 0, ethertype IPv4, 10.240.0.3.55059 > 192.168.32.210.1812: RADIUS, Access-Request (1), id: 0x29 length: 72
^C
4 packets captured
16 packets received by filter
0 packets dropped by kernel
[Expert@FW-1:0]#
The Azure RADIUS server was not responding, so I changed the priority to 1 for both and tested:
[Expert@FW-1:0]# tcpdump -enni any host 10.200.11.14 and port 1812
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
^C
0 packets captured
66 packets received by filter
2 packets dropped by kernel
[Expert@FW-1:0]#
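For anyone reading captures like the ones above, here is a minimal Python sketch that counts how many times the gateway sent the same Access-Request and how far apart the sends were. The `summarize` helper, the regex, and the 1 ms duplicate window (used to collapse the untagged and 802.1Q copies of the same packet) are my own assumptions for illustration, not anything from Check Point or TAC.

```python
import re
from collections import defaultdict

# The four Access-Request lines from the first capture above, copied verbatim.
CAPTURE = """\
20:46:20.795393 Out 00:1c:7f:a1:42:47 ethertype IPv4 (0x0800), length 116: 10.240.0.3.55059 > 192.168.32.210.1812: RADIUS, Access-Request (1), id: 0x29 length: 72
20:46:20.795396 Out 00:1c:7f:a1:42:47 ethertype 802.1Q (0x8100), length 120: vlan 20, p 0, ethertype IPv4, 10.240.0.3.55059 > 192.168.32.210.1812: RADIUS, Access-Request (1), id: 0x29 length: 72
20:46:25.795160 Out 00:1c:7f:a1:42:47 ethertype IPv4 (0x0800), length 116: 10.240.0.3.55059 > 192.168.32.210.1812: RADIUS, Access-Request (1), id: 0x29 length: 72
20:46:25.795163 Out 00:1c:7f:a1:42:47 ethertype 802.1Q (0x8100), length 120: vlan 20, p 0, ethertype IPv4, 10.240.0.3.55059 > 192.168.32.210.1812: RADIUS, Access-Request (1), id: 0x29 length: 72
"""

# Captures: hours, minutes, seconds, destination IP (port 1812), RADIUS packet id.
LINE_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}\.\d+) .* > (\d+\.\d+\.\d+\.\d+)\.1812: "
    r"RADIUS, Access-Request \(1\), id: (0x[0-9a-f]+)"
)


def summarize(capture, dup_window=0.001):
    """Return {(server, radius_id): [send times in seconds since midnight]}.

    Packets closer together than dup_window seconds are treated as the same
    send seen twice (assumption: the untagged and VLAN-tagged copies).
    """
    sends = defaultdict(list)
    for line in capture.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip tcpdump banner lines, prompts, etc.
        hh, mm, ss, server, radius_id = m.groups()
        t = int(hh) * 3600 + int(mm) * 60 + float(ss)
        times = sends[(server, radius_id)]
        if not times or t - times[-1] > dup_window:
            times.append(t)
    return dict(sends)


for (server, radius_id), times in summarize(CAPTURE).items():
    gaps = [round(b - a, 1) for a, b in zip(times, times[1:])]
    print(f"{server} id={radius_id}: {len(times)} sends, gaps: {gaps} s")
```

On the lines above, this reports two sends of the same request id to 192.168.32.210, roughly 5 seconds apart, and nothing for the Azure server, which matches the empty second capture.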
Just verified now, group is called RadiusGroup, so no special characters anywhere.
Best,
Andy
Hey guys,
Thanks to everyone who helped and responded. We were finally able to get this working with the help of TAC on a remote session. Below are the settings in global properties for RADIUS that worked 100%: when the on-prem RADIUS server (the primary) was shut down, auth worked flawlessly against the Azure one, and it also worked when on-prem was powered back on.
The customer was very happy it's finally fixed after 2 years.
Thanks again!
Best,
Andy
Grateful to @SenpaiNoticed_U @Chris_Atkinson @PhoneBoy @mccabe for all the advice and guidance ✌️
🙂
All I will say is this... maybe there are not too many customers out there using 2 RADIUS servers for authentication (just my educated guess), but for those who are, it would be nice to update the sk you gave initially with the calculation that was mentioned in this post and that the DTAC escalation guy gave us as well. That way, there is no guessing what those values should be... now we know that's why it took 2 years to fix this permanently.
Anyway, not a huge deal, considering we had a workaround whenever it happened, but still... just annoying : - )
Best,
Andy
This is what I was referring to @mccabe
Straight from the TAC case, by a T3 guy in Dallas, and after I read it a few times, it makes total sense to me. I really hope the sk is updated with this info.
Andy
*********************
120 seconds (for auth, radius_user_timeout)
2 re-attempts per server (radius_retrant_num)
40 Seconds total, for the whole auth attempt (radius_connect_timeout)
5 seconds per server (radius_retrant_timeout)
This gives the gateway 15 seconds to try the first RADIUS server (1 initial attempt and 2 re-attempts at 5 seconds each), and then it moves to the second RADIUS server for another 15 seconds (again 1 initial attempt and 2 re-attempts at 5 seconds each). The window for all RADIUS server attempts is 40 seconds, which gives the gateway enough time for the 30 seconds it needs to reach out to the two RADIUS servers.
********************
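The arithmetic in that TAC note can be checked with a small Python sketch. The `failover_budget` function and its argument names are hypothetical, made up for illustration; the timing model (each server gets 1 initial try plus the re-attempts, servers are tried one after another, and everything has to fit inside the overall window) is taken from the explanation quoted above and is not an official formula.

```python
def failover_budget(num_servers, retries, per_try_timeout, connect_timeout):
    """Return (seconds per server, seconds for all servers, fits in window?).

    retries         ~ radius_retrant_num      (re-attempts per server)
    per_try_timeout ~ radius_retrant_timeout  (seconds per attempt)
    connect_timeout ~ radius_connect_timeout  (window for the whole auth attempt)
    """
    attempts_per_server = 1 + retries              # 1 initial try + re-attempts
    per_server = attempts_per_server * per_try_timeout
    needed = per_server * num_servers              # servers tried in sequence
    return per_server, needed, needed <= connect_timeout


# Values from the TAC note: 2 servers, 2 re-attempts, 5 s per try, 40 s window.
print(failover_budget(2, 2, 5, 40))   # (15, 30, True)

# Hypothetical comparison: same settings but 30 s per try.
print(failover_budget(2, 2, 30, 40))  # (90, 180, False)
```

With the values from the note, both servers fit in the 40-second window with 10 seconds to spare. With a longer per-try timeout and the same window, the first server alone would use up the whole window under this model, so the second server would never be reached.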
I'll chase the owner internally, Andy, and ask for something to be added as an 'example', using what you had above. Many thanks for your persistence on this.
No worries mate, no rush. It's always a team effort, so thank you and the other guys who helped, along with great help from TAC, of course.
Best,
Andy
User authentication to RADIUS server times out
Jozko Mrkvicka
Ironically enough, we followed that sk 2 years ago, and when I mentioned that to the escalation engineer, he told me specifically NOT to change the value from 5 to 30 seconds, like the sk states. Anyway, the issue is fixed now, and that's all I really care about : - )
Best,
Andy
