the_rock
Legend

CP to Azure S2S VPN issue

Hey guys,

I hope someone can shed some light on this situation, as I find it very peculiar. The customer has a domain-based VPN between Check Point and Azure, and the tunnel works fine, BUT here is the issue: the Azure subnet is 10.18.0.0/16, and there is one host in that subnet that never works no matter what we do. Most logs show its traffic going through the tunnel, though the occasional one shows it being dropped or going out in the clear (randomly), and the page to access it never comes up like it should.

All the other hosts/services work fine.

Now, the customer did have an Azure case open; they did a bunch of checks and determined it's not a problem on their end. Together with the customer, I did a bunch of captures and checked the logs; we even added that host IP into the encryption domain, reset the tunnel, and set tunnel management per gateway as a test. No dice.

Sadly, I don't have the actual log at the moment (I can get it from the client), but when we run captures they show the traffic arriving at the internal interface and that's it, nothing else, which is super odd, because hosts like 10.18.0.80 or .85 are fine, but .81 never works. Logically that would indicate an issue with the host, but MS support verified 100% that is not the case.

I had the client do basic VPN debugs on the CP side and will review them myself, but I'm wondering if anyone has any insight/suggestions we could try. I can't possibly think of anything else we have not tested.

Thanks as always.

Andy

G_W_Albrecht
Legend

Did you contact CP TAC yet?

CCSP - CCSE / CCTE / CTPS / CCME / CCSM Elite / SMB Specialist
the_rock
Legend

Not yet, as I want to review vpn debugs myself first.

Andy

the_rock
Legend

Just to update on this... the customer will try changing the IP of the problematic host to see if that helps, but if not, they will send me the VPN debugs and I will review them. Honestly, I'm not sure this even qualifies for a TAC case, though when the issue occurs the logs clearly show that traffic does NOT go through the VPN tunnel, and Azure support is adamant it's not a problem on their end.

Anywho, let's see what gives 🙂

Andy

Timothy_Hall
Legend

Do you have "disable NAT in VPN community" set?  Almost sounds like you have a NAT of some kind just for that .81 address which would allow the traffic to enter the tunnel but then get dropped on the other end.  If the destination IP is getting NATted that could be why the traffic seems to disappear in your capture after the inbound.

I assume there is no Windows Firewall on .81?  If the traffic can be verified to be entering the tunnel properly on your side, you may need a packet capture on the .81 host to confirm the traffic is actually getting there.  Had many a troubleshooting session where the traffic is going into the tunnel properly and the other end insists it is decrypting and reaching the endpoint on their side...but it isn't due to a VPN config/policy or routing issue.  Until you do that packet capture they will just blame you 🙂
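A minimal sketch of that host-side capture (assuming Wireshark/tshark can be installed on the .81 VM; the interface name and the on-prem client IP below are placeholders):

# run on the .81 VM while reproducing the issue; capture only traffic to/from the on-prem client
tshark -i Ethernet0 -f "host <onprem_client_ip>" -w vm81.pcapng

If the requests never show up in that capture, the problem is upstream of the host; if they arrive but no replies leave, it points back at the host or an NSG after all.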

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
the_rock
Legend

Yep, we do have NAT disabled. I even had them create a manual no-NAT rule for that IP, no luck. Funny enough, when we do captures it randomly shows the traffic going through the tunnel, but even then the page never comes up.

Let me see if their dedicated Azure expert can change the IP of that host and see what happens.

Andy

the_rock
Legend

Btw, just confirmed, no Windows firewall. Let me review the logs and see what we can find.

Andy

the_rock
Legend

Just spoke with the customer. They decided to install Jumbo 89 on their mgmt and cluster, so they will let me know this weekend if that changes anything. I secretly hope it fixes the issue, but let's see.

If not, they will open TAC case next week and will provide an update.

Thanks!

Andy

AmirArama
Employee

Hello,

 

This kind of issue needs cooperation from both sides.

From your (CP) side, you can only run vpn tu conn and fw monitor on the connection 5-tuple, plus tcpdump on the ESP/NAT-T packets. Someone also needs to run a traffic capture on the Azure side to see whether the traffic reaches the other end and, if so, what happens to it: does it reach the host? (If you manage that host, you can install Wireshark there and see for yourself.) Does the host respond or not?
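A minimal sketch of those gateway-side captures (the interface name and the Azure peer IP are placeholders; 10.18.0.81 is the problem host from this thread, and the -F filter form assumes R80.20 or later):

# clear-text traffic to/from the problem host as it crosses the inspection points
fw monitor -F "0,0,10.18.0.81,0,0" -F "10.18.0.81,0,0,0,0"

# IKE and ESP/NAT-T packets between the two VPN peers on the external interface
tcpdump -nni eth0 "host <azure_peer_ip> and (udp port 500 or udp port 4500 or ip proto 50)"

On the Azure VM itself, Wireshark filtered on the on-prem client IP shows whether the packets ever arrive and whether the host answers.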

the_rock
Legend

I totally agree. Let me follow up with the customer to see if Jumbo 89 made any difference.

Andy

the_rock
Legend

Hey guys,

Just to give a quick update on this: I spoke with the customer, they tried changing the IP on the Azure host side, no luck. They will install Jumbo 89 this Saturday on the cluster and test. I would personally be shocked if that fixes anything, but let's hope for the best 🙂

Anyway, if no change, they will open TAC case.

Andy

Lesley
Leader

"but captures when we run them show traffic comes to internal interface and thats it, nothing else, which is super odd, because say host 10.18.0.80 or .85 are fine, but .81 never works."

If this capture is tcpdump, that makes sense, because you see the data coming in on the LAN interface unencrypted. It would then be sent out on the WAN interface, and you would see ESP traffic between the two public IPs.

For a better understanding we need a vpn debug while traffic is being sent towards the problem host. I assume that on the Check Point side you see drops and unencrypted packets? Or do you always see encrypted data in the logs towards the host? If you do not see encrypted log entries, it could be an indication that the issue is on your side.

Are you using the global encryption domain for the tunnel, or did you set it up on the community itself (I would recommend the latter)?
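A minimal sketch of that debug (following the usual vpn debug workflow; the resulting file is reviewed in IKEView afterwards):

# truncate old IKE logs and turn on IKE debugging on the gateway
vpn debug trunc
vpn debug ikeon

# reproduce the traffic towards the problem host, then turn debugging off
vpn debug ikeoff
vpn debug off

# $FWDIR/log/ike.elg (and vpnd.elg) can then be pulled and opened in IKEView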

 

-------
If you like this post please give a thumbs up(kudo)! 🙂
the_rock
Legend

It's actually set per community, not global. As for the packets, we see encrypted ones most of the time, and then unencrypted ones maybe 5-10% of the time (randomly), shown as dropped because of SSL inspection, which I find very peculiar.

Anyway, let me see if the jumbo install makes any difference; if not, we may need to do some basic VPN debugs next time they can dedicate some time to this.

Thanks Lesley.

Andy

the_rock
Legend

Just to give an update... the client installed Jumbo 90, but same issue, which I'm really not surprised about. Anywho, they will open a TAC case next week to check this further.

I will update when we have more info.

Best,

Andy

Lesley
Leader

Any debugs on the way? IKEView would help here a lot.

-------
If you like this post please give a thumbs up(kudo)! 🙂
the_rock
Legend

Not yet, sadly. These guys are super busy, so it may take a while :-(

Duane_Toler
Advisor

I worked with a customer today on a new Azure VPN setup. We had some issues, too, and the customer had a Windows firewall policy on the Azure VM not allowing some traffic in and out. They did have some NSGs in place, too, which had to be adjusted.

Another item was a missing subnet on the Azure-side VPN "local network gateway". This may not be your issue, though.

I also had their vpn community tunnel management set to “one tunnel per subnet pair” rather than universal tunnels. I wasn’t sure how they had their Azure side configured.

I hope some of this helps. Good luck with it!

the_rock
Legend

Hey Duane, thanks for responding man, always nice to hear from you! Yeah, we tested all the things you mentioned and tried different tunnel mgmt options, no dice. We know the setup is right, as it's ONLY this one host with the issue, but based on everything Azure support did, they told the customer to look elsewhere.

I know they will probably open a TAC case, but you know how it goes: lots of IT issues and only a few guys to deal with them, so they need to work on more pressing problems, especially before the holidays.

I know their IT boss will text me, as he always does, to ask for my help on this, though I feel lately he does it to get good travel destination advice from me 🤣🤣

Anywho, I will certainly update once I have more info.

Thanks as always again!

Andy

the_rock
Legend

Hey guys,

Just to give a quick update on this: the customer told me they opened a TAC case, but since it's Thanksgiving in the USA today, they will probably revisit this next week and let me know the outcome.

Andy

the_rock
Legend

Just to provide a quick update on this. The customer has the TAC case open and I believe it went to a senior engineer, so once I have more details and it gets fixed, I will let you guys know how.

Andy

the_rock
Legend

Hey guys,

Just to update on this: the customer and I have been working with an EXCELLENT Tier 3 engineer in the Dallas TAC (I have worked with him many times before, he is great) and he gave us some things to check, which we verified, but still no luck. Strangely enough, when he had the customer run the vpn tu tlist -p peer_ip command, it showed the message "narrowed" in one of the table entries.
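As a minimal sketch of checking for that narrowing (the peer IP is a placeholder; this is the same vpn tu tlist command mentioned above):

# list tunnel/SA details negotiated with the Azure peer and look for narrowed selectors
vpn tu tlist -p <azure_peer_ip> | grep -i narrow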

We confirmed the encryption domains and encryption settings, did debugs, did tunnel resets, but same issue.

We will do another remote session with TAC and hopefully have more clarity. I feel we are close to solving this.

Thanks for all the help.

Andy

the_rock
Legend

Another quick update. TAC did the steps below, but the issue is still not solved, though now we don't see the "narrowed" message from vpn tu tlist.

- set the encryption domains as user defined, rather than per gateway
- removed the local gateway from both VPN communities
- reset the PSK, since the customer did not have it
- waited a few minutes, put the gateways back in with the new PSK
- tested again, but no joy, though now the right key is presented when doing captures

I feel we are super close to solving this; we will have another remote session tomorrow and see if we can figure it out.

Andy

Duane_Toler
Advisor

Has a kdebug been run, by chance? My go-to starting point is "fw ctl debug -m fw + drop conn packet", along with "-m VPN + all". I'd do that simultaneously with a vpn debug. In the debug output, see if you get the message "TAGGING" on the connection for that errant host.

There's also a "stupid sanity check":  Check to verify that errant IP (10.18.0.81, from your original message, or whatever it is) isn't somehow attached to another object in some ancient way, like a static NAT.  I'd suggest an API command on the management server:

mgmt_cli -r true -f json show-objects filter 10.18.0.81 ip-only true 

  (add "-d DOMAIN_NAME" if this is MDS).

Just some more ideas.  Hopefully you get some resolution soon!

the_rock
Legend

Hey mate,

Thanks for checking in, always appreciated! So, TAC mentioned that based on the debugs it seems it could be a fragmentation issue, so possibly MSS clamping or MTU, but we will verify tomorrow.
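A quick sanity check for the MTU/fragmentation part of that theory (just a sketch; it assumes a Linux machine behind the gateway, and the target host and payload size are examples):

# 1472 bytes of ICMP payload + 28 bytes of headers = 1500; lower -s until it goes through
ping -M do -s 1472 10.18.0.81

If smaller DF-bit pings succeed where 1472 fails, something in the path has a sub-1500 MTU and MSS clamping on the gateway would be worth a look.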

They also said it could possibly be NAT related, but I don't believe that would be the case here.

Let's see how tomorrow goes 🙂

Andy

Timothy_Hall
Legend

IKEv2 tunnel narrowing most definitely caused VPN interoperability problems at one point and is a good area to investigate, but to my knowledge all those bugs were fixed long ago.  That could explain why some traffic makes it and some doesn't if the problematic traffic was outside where the tunnel was narrowed by the responder, and the initiator didn't realize it or didn't process the narrowing properly.  I assume you have tried initiating the tunnel separately from each peer as the initiator and the problematic behavior remains no matter the original initiator of the tunnel?  If the problem only happens when a certain side is the initiator, that could well be a narrowing or subnets/Proxy-IDs issue.

A bit unlikely that intervening NAT is causing the issue, as NAT-T should start automatically.  I assume there is no NAT occurring somewhere between the VPN peers?  Usually NAT either exists between the peers or it doesn't, this won't normally change over time unless there are different ISPs in use, or the network path upstream of the peers is changing without your knowledge.

Seems unlikely but as far as fragmentation issues, are you allowing Type 3 Code 4 ICMP datagrams inbound to your gateway from any source on the Internet?  If you are not, Path MTU discovery will not work correctly in the event of a sub-1500 MTU occurring in the network path.  Once again, the low MTU may be intermittently occurring if the upstream path is changing without your knowledge, or you are unexpectedly failing over between multiple ISPs, or better yet experiencing traffic asymmetry between them.  If there is more than one ISP in use for this customer, completely disable one and fully test each ISP path one at a time.  Once again really seems unlikely that frags/MTU would affect VPN connectivity to only one Azure host IP within the tunnel.

Your earlier message about seeing some of these packets being dropped for SSL inspection caught my eye. If HTTPS Inspection is in use on the gateway, make sure you are not specifying "Any" in any of the Destination and Service columns of your HTTPS Inspection policy rules. Traffic entering a VPN is not supposed to be inspected by HTTPS Inspection anyway, even though it is leaving on the interface that matches the dynamic object "Internet". It might be interesting to add an IP-based rule at the very top of your HTTPS Inspection policy matching all destination IPs accessible through the VPN tunnel, with an action of Bypass, source Any, category Any, just to make sure the traffic finds a bypass during the first pass on the HTTPS Inspection policy and is not being improperly pulled into active streaming for the second pass.
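A rough sketch of auditing that from the management server (this assumes a management API level that exposes the HTTPS Inspection rulebase, roughly R80.40+, and "Default Layer" as the layer name; adjust to your environment):

# dump the HTTPS Inspection rulebase and review the Destination/Service columns for "Any"
mgmt_cli -r true -f json show https-rulebase name "Default Layer" details-level full

The bypass rule itself (destinations = the networks reachable through the VPN, source Any, category Any, action Bypass) can then be added at the top of that layer in SmartConsole.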

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
the_rock
Legend

Thank you Tim, excellent feedback. I will definitely double check all that.

Andy

the_rock
Legend

@Timothy_Hall My greatest appreciation for such an amazing response. We verified everything again and, though we did not see any issues with the things I mentioned, we revisited SSL inspection. Even though there were no logs indicating an issue, we decided, why not, let's disable it as a test... and BAM, it worked!

We then re-enabled it and made a bypass rule, and all is good now. The customer was super happy, along with me and the great TAC engineer from Dallas.

I think Friday the 13th was lucky for us, haha.

Also grateful to everyone else that helped:

@G_W_Albrecht @AmirArama @Lesley @Duane_Toler 

Have a fantastic weekend everyone.

Best,

Andy

Timothy_Hall
Legend

Excellent, thanks for the update! Traffic entering a VPN tunnel is not supposed to match against the object Internet, which should always be used in the Destination field of every HTTPS Inspection rule; Any should NEVER be used there. So either there is an Any lurking in the destination of an HTTPS Inspection rule somewhere, or this is some kind of bug.

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
the_rock
Legend

Hey Tim,

So Any was not used in any of the SSL inspection rules, BUT not all of them had Internet as the destination either.

Anyway, we are not too concerned whether it's a bug or not; I'm just happy, as is the customer, that we got it fixed.

THANKS AGAIN!

Andy

Duane_Toler
Advisor

Oh cool!  I didn't catch the line about SSL traffic; glad Timothy did!  What he says about the "Internet" object applies to any App Control/URL Filtering rules, too.  That's a big requirement that doesn't get enough attention.  Likewise, the interface zones are also a requirement.

Speaking of which... along with what Timothy said about the TLS Inspection policy, what does your interface configuration for "Security Zones" look like?

@Timothy_Hall is there anything here that could combine with the TLS Inspection blade to cause this? Maybe everything doesn't have the right "alignment of the stars" and that's what is triggering the issue? Not necessarily a "bug", but rather an "incomplete configuration"?

 
