Solved: R80.30 to R81.10 - Endpoint client fails to negotiate with the R81.10 box

Howard_Gyton · ‎2022-08-12

We are partway through a firewall migration from R80.30 to R81.10.

Both new boxes are built. The old passive firewall has been turned off, and its configuration has been imported to the new R81.10 passive firewall.

Failover of the cluster worked flawlessly, and in the tests we have conducted everything appears to work, with one exception.

Endpoint clients refuse to establish a VPN connection. I have tried E86.00, E86.30, and E86.60 clients, and none of them work; the eventual error is that the client could not negotiate a connection. SNX and Capsule VPN, on both Windows 11 and iPhone, work just fine. Even the really old CLI-enabled SNX copy we have works fine.

After restarting services on the primary and failing back to the R80.30 box, all VPN services are fully restored.

Has anyone else experienced this?

Howard

Howard_Gyton · ‎2022-08-22

The issue was eventually resolved with a no-NAT rule for the IP address the VPN clients connect to. This information came from our support partner.

It seems that our R80.30 firewalls were also affected by this to a degree: two responses were sent to the client, one from the hide-NAT rule and one from the IP address the responses should come from. The upshot was that all Endpoint clients were using Visitor mode rather than NAT-T.

On the R81.10 firewalls this didn't seem to work correctly, and instead of working in Visitor mode they disconnected after a few seconds.

A no-NAT rule sitting just above the existing hide NAT rule allowed the Endpoint clients to connect, and NAT-T connections could be seen with "vpn tu tlist".

Both firewalls have been replaced with new R81.10 boxes, and everything is working fine. I understand a fix for this behaviour should be included in a forthcoming JHF.
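
To illustrate why the position of the no-NAT rule matters, here is a minimal shell sketch of top-down, first-match rule evaluation. It is a toy model, not Check Point's actual NAT engine: the `nat_lookup` function, the two-column rule format, and the address 203.0.113.10 (standing in for the IP the VPN clients connect to) are all assumptions made for illustration.

```shell
#!/bin/sh
# Toy model of a NAT rulebase evaluated top-down, first match wins.
# Rules are read from stdin, one per line: "<source-ip-or-any> <action>".
# The function name, rule format and addresses are hypothetical.
nat_lookup() {
  src="$1"
  while read -r rule_src action; do
    if [ "$rule_src" = "$src" ] || [ "$rule_src" = "any" ]; then
      # The first matching rule decides the outcome; later rules are ignored.
      printf '%s\n' "$action"
      return 0
    fi
  done
  printf 'no-match\n'
  return 1
}

# With the no-NAT rule placed above the hide-NAT rule, traffic sourced
# from the VPN address keeps its original IP.
nat_lookup 203.0.113.10 <<'EOF'
203.0.113.10 original
any hide
EOF
# prints "original"
```

If the first rule line is removed, the same lookup falls through to the `any hide` rule, which corresponds to the situation described above where replies left the gateway from the hide-NAT address.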

Chris_Atkinson · ‎2022-08-12

Which Jumbo was applied to the R81.10 gateways?

CCSM R77/R80/ELITE

Howard_Gyton · ‎2022-08-12

66

Chris_Atkinson · ‎2022-08-12

Any corresponding drop logs in SmartConsole or symptoms that align to sk175704?

Howard_Gyton · ‎2022-08-12

Sadly, nothing like that. You could see our accounts were authenticated via RADIUS, and the connection was successful. Then it would hang for a few minutes before the response box came up again. It did that perhaps three times before finally giving up, stating that it could not negotiate a connection.

I just had another look over the SK. So while we didn't see any dropped traffic in the logs we do have a mixture of VPN rules.

We have a new VPN layer rule, using Access Roles, and have been gradually migrating services from the Legacy rules above.

The one thing I would add is that this was not an in-place upgrade from R80.30, which that SK seems to imply. This was a new R81.10 server, onto which we imported the exported configuration using the process below, so it has never had R80.30 on it.

set clienv on-failure continue
load configuration fw1_new-hardware
set clienv on-failure stop
save config

Similarly, the R81.10 management server was freshly built, using an exported database from the old R80.30 management server.

the_rock · ‎2022-08-12

I have seen customers hit this issue before, and in my experience it was ALWAYS caused by some custom VPN-related config file on either the mgmt server or the gateways. In most cases that was either trac.config or trac_client_1.ttm. Not sure if you guys have that configured, but throwing it out there. As @Chris_Atkinson mentioned, are there any other relevant logs you can find? Do you know if anyone generated any logs from the VPN client?

Best,
Andy
"Have a great day and if its not, change it"

Howard_Gyton · ‎2022-08-12

That was one of the things our support partner suggested checking prior to migration, and they found no alterations to the "trac_client_1.ttm" file.

the_rock · ‎2022-08-12

What you could do is cd $FWDIR/conf on both mgmt and gateways and do ls -lh trac*

This will show you whether there is more than the original trac_client_1.ttm file, so it would tell us 100% what is used.

Best,
Andy
"Have a great day and if its not, change it"

Howard_Gyton · ‎2022-08-12

Good idea. For good measure I ran this on the management server, and both the R80.30, and R81.10 firewalls.

ls -lh trac*
-rwxr-xr-x 1 admin bin 7.2K Jun 13 10:47 trac_client_1.ttm

ls -lh trac*
-rw-r----- 1 admin bin 7.2K Sep 24 2019 trac_client_1.ttm

ls -lh trac*
-rw-r----- 1 admin bin 7.2K Jun 30 2021 trac_client_1.ttm

the_rock · ‎2022-08-12

Check the content of it on mgmt server.

Best,
Andy
"Have a great day and if its not, change it"

Howard_Gyton · ‎2022-08-12

I can't see anything that stands out, that might be something we added. I was going to post its contents here, but it's a long file.

I copied the file from both the management server and the R81.10 server, and then used Notepad++ to compare them; they were identical.

I ran a further compare with the file from the R80.30 server, and that was identical too.
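
For anyone who wants to repeat this comparison from a shell rather than Notepad++, a minimal sketch using the standard `cmp` utility is shown below. The `same_content` helper and the file names in the usage comment are hypothetical; it assumes the three copies of trac_client_1.ttm have already been gathered into one directory.

```shell
#!/bin/sh
# same_content REF FILE...: succeed only if every FILE is byte-identical to REF.
# Helper name and file names are illustrative assumptions, not Check Point tooling.
same_content() {
  ref="$1"
  shift
  for f in "$@"; do
    # cmp -s is silent and returns non-zero if the files differ or cannot be read.
    if ! cmp -s "$ref" "$f"; then
      printf 'DIFFERENT (or unreadable): %s vs %s\n' "$ref" "$f"
      return 1
    fi
  done
  printf 'IDENTICAL\n'
}

# Example usage, with hypothetical names for the three collected copies:
#   same_content mgmt_trac_client_1.ttm r8030_trac_client_1.ttm r8110_trac_client_1.ttm
```

A byte-for-byte check like this only confirms the files match each other; it does not say whether the shared content differs from the vendor default.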

Ilya_Yusupov · ‎2022-08-16

Hi @Howard_Gyton ,

You mention access roles, but did you enable the Remote Access checkbox under the Identity Awareness configuration?

If you didn't check this box, can you please enable it and retest?

Howard_Gyton · ‎2022-08-16

Yes, we have that set. If we didn't it wouldn't work on our R80.30 firewall, where normal functionality is seen for all features.

As I may have mentioned, we believe the failure is in Phase 2 of the negotiation. Last night we ran some further tests and played around with the encryption and data integrity settings. Briefly we turned everything on, including DES/3DES, and even then the Endpoint clients wouldn't connect to the R81.10 firewall, although they worked quite happily with the R80.30 firewall. We of course changed those settings back.

But again, SNX, and Capsule VPN remain unaffected, and could connect to the R81.10 firewall, even if I set my phone to use IPSec rather than SSL.

Ilya_Yusupov · ‎2022-08-16

@Howard_Gyton ,

OK, I thought you had configured the access role only in R81.10; I misunderstood the story 🙂

I will contact you offline via email and we will continue it from there.

Thanks,

Ilya

Juan_ · ‎2022-08-15

What are your link selection settings?

Howard_Gyton · ‎2022-08-15

"Selected address from topology table" VIP selected

"Operating system routing table"

Tracking=None
