Zee
Contributor

VPN issue stuck in Phase 1 and sometimes disconnects

Hi Everyone,

Apologies for my beginner's knowledge of Check Point, but I am having an issue with the VPN tunnel from our HQ in Germany to one of our offices in the US. The VPN gets stuck at phase 1 most of the time, and occasionally it also disconnects, though that is rare. The person primarily responsible is unavailable for some days and this is a production environment, so I need to find a fix. To make it work temporarily, I have to reset the tunnel, after which it works for a random length of time. I could not find the cause in the SmartConsole logs. Has anyone experienced something like this? Also, has anyone used a script to reset the tunnel whenever it goes down, or every hour or so, as a workaround, or found some other solution?

33 Replies
P_Williams
Contributor

Is the VPN failing or is it running OK? I have seen the GUI say 'UP - Phase 1' without there being any issues reported.

Go onto the CLI in expert mode and do

#vpn tu

Option 3, put in the remote gateway address

Option 4, put in the remote gateway address

It might be that there is a mismatch between the encryption domains.

Zee
Contributor

Hi, the VPN fails and services are disrupted until the tunnel is reset. No changes were made to any VPN-related configuration, yet this issue arose; it gets stuck on phase 1 randomly, and only for two peers. I am using the same VPN community for the headquarters and another peer, and it works fine there. I checked vpn tu and it shows the same IKE SA; I could not find any mismatch so far.

the_rock
Legend

Hey @Zee 

No worries, we are here to help. Just wondering, how is the tunnel management tab configured inside the community object? Did this work before and only start failing recently, or was the issue always there?

Any relevant logs you can send? Did you try a simple debug as below:

vpn debug trunc

vpn debug ikeon

-generate some traffic (30 seconds or 1 minute)

vpn debug ikeoff

Get ike* and vpnd* files from $FWDIR/log dir
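
For anyone who wants to script the steps above, here is a minimal sketch. It only wraps the three vpn debug commands exactly as listed; the names capture_ike_debug, VPN_CMD and CAPTURE_SECS are made up for illustration and are not Check Point names. Traffic still has to be generated across the tunnel from another session while the function waits.

```shell
# Sketch of the IKE debug capture steps listed in the post.
# VPN_CMD, CAPTURE_SECS and capture_ike_debug are illustrative names only.
VPN_CMD="${VPN_CMD:-vpn}"            # override for a dry run, e.g. VPN_CMD=echo
CAPTURE_SECS="${CAPTURE_SECS:-60}"   # the post suggests 30 seconds to 1 minute

capture_ike_debug() {
    "$VPN_CMD" debug trunc  || return 1   # first command from the post
    "$VPN_CMD" debug ikeon  || return 1   # turn IKE debug on
    # Generate traffic across the tunnel from another session during this wait.
    sleep "$CAPTURE_SECS"
    "$VPN_CMD" debug ikeoff || return 1   # turn IKE debug off again
    echo 'Collect ike* and vpnd* files from $FWDIR/log'
}
```

A dry run with VPN_CMD=echo CAPTURE_SECS=0 capture_ike_debug prints the commands instead of running them, which is a safe way to check the sequence before touching a production gateway.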

Andy

Zee
Contributor

Hi,
It was working before 15 July, and no VPN-related changes were made that, in my opinion, could have caused such an issue. I have sent the debug files to TAC support but have not yet heard anything relevant. I tried to go through the logs myself but could not find anything specific to the phase 1 stuck issue. Moreover, I could not open iked0.elg with the IKEView tool; I read somewhere that in R81.20 it should be ike.elg, or that the trace file should have the relevant data.
The issue is very random; sometimes it does not arise for hours. For now I made a script that resets the tunnel every 30 minutes, but even with that it sometimes gets stuck and I have to reset twice (just a workaround). The VPN community is the same for the HQ FW and all other FWs, but the issue is only between one office and the HQ FW; the other VPN tunnels are fine.

the_rock
Legend

K, so a couple of points about that setting. The way you have it is fine, BUT in that case I would make sure you use VTIs and set the enc. domains to an empty group. I find that works 100% of the time.

Andy

Zee
Contributor

We are not using VTIs, and the enc. domains have the same IP pools as before. The randomness of the issue has confused me 🙂

the_rock
Legend

Not to sound funny when I say this, but any time people tell me "Oh, this used to work yesterday", my answer is always "Well, I was a year younger last year, now I'm not"... It would be nice if there was a real time machine haha :-)

Anyway, let's see what we can do to help. Are you allowed to send the debug file? I would be happy to check it.

Andy

Zee
Contributor

I understand; something is wrong that is causing this issue somehow. I don't know if this helps, but these are some of the logs related to both FWs from the iked.elg files. Unfortunately, I cannot send the debug file as it is not allowed. Please ignore the hidden IPs.

the_rock
Legend

Here is where I would start, or try to get this info: does it show which packet of phase 1 it is failing on? You can totally forget about enc. domains, since that is always related to phase 2, and it is not even getting there. So, for example, if it was failing on packet 4 of phase 1, that's a PSK issue, but anything before that, most likely it is not agreeing on algorithms.
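
To make the "which packet" idea concrete, here is a rough lookup, assuming the tunnel negotiates IKEv1 Main Mode as laid out in RFC 2409 (IKEv2 uses different exchanges, so this would not apply there). The helper name main_mode_stage is made up for illustration. Note that in this layout a pre-shared key mismatch usually surfaces at packets 5-6, a little later than the rule of thumb above.

```shell
# Map an IKEv1 Main Mode packet number (1-6) to the stage it belongs to.
# main_mode_stage is an illustrative helper, not a Check Point command.
# Assumption: IKEv1 Main Mode per RFC 2409; IKEv2 is not covered.
main_mode_stage() {
    case "$1" in
        1|2) echo "SA proposal: compare encryption, hash and DH group settings" ;;
        3|4) echo "key exchange: Diffie-Hellman public values and nonces" ;;
        5|6) echo "authentication: check pre-shared key or certificates" ;;
        *)   echo "unknown: Main Mode has only packets 1 to 6" >&2
             return 1 ;;
    esac
}
```

For example, main_mode_stage 2 points at the proposal settings, which matches the "not agreeing on algorithms" case.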

Andy

Zee
Contributor

Thank you for the help, Andy. I am following the same process to get the logs, but somehow I am not getting relevant logs for the FW peer that has this issue. The logs mentioned above were the only ones I could gather at the time of the issue. Maybe I am doing something wrong. Secondly, since the script now runs every 30 minutes and resets the tunnel twice, 5 seconds apart, the issue only occurs 2-3 times a day; earlier it was more than 10 times. If the issue were with the algorithms or the PSK, do you think resetting the tunnel would resolve it temporarily?
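
For reference, the double-reset logic is roughly the following. This is only a sketch: the actual reset command is not shown anywhere in this thread, so it is passed in as arguments, and the names reset_tunnel_twice and RESET_GAP_SECS are made up for illustration.

```shell
# Run a caller-supplied tunnel reset command twice with a short gap.
# The real reset command is not given in the thread, so it is an argument.
# reset_tunnel_twice and RESET_GAP_SECS are illustrative names only.
RESET_GAP_SECS="${RESET_GAP_SECS:-5}"   # gap between the two resets, in seconds

reset_tunnel_twice() {
    if [ "$#" -eq 0 ]; then
        echo "usage: reset_tunnel_twice <reset command> [args...]" >&2
        return 2
    fi
    "$@"                       # first reset attempt
    sleep "$RESET_GAP_SECS"    # wait before retrying
    "$@"                       # second attempt, since one reset is sometimes not enough
}
```

Whatever scheduler runs the 30-minute job would call this with the site's own reset command.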

the_rock
Legend

You can try that, it does not hurt, but logically, resetting the PSK might not do anything unless there is a log showing that's the issue. Plus, if the PSK was the problem, the tunnel would NEVER work to begin with 🙂

Andy

Zee
Contributor

True, I will dig deeper and see what TAC has to say, but I don't know why resetting the tunnel resolves the issue.

the_rock
Legend

Is this CP to CP?

Zee
Contributor

Yes, from HQ to one branch in another country. Almost all of the other CP FWs that have a tunnel with HQ do not have this issue; only this one does.

 

the_rock
Legend

Is it a star or mesh community? If star, I fixed this sort of issue once by "flipping" the centre and satellite gateways around; not sure if you can try that.

Andy

Zee
Contributor

It is actually a mesh community.

the_rock
Legend

K, got it... I mean, you can try resetting the PSK and test; it does not hurt.

Andy

Zee
Contributor

I stopped the script, and in the 24 hours since I have not seen the issue with the same configuration. I am also thinking of an ISP issue, but that is a far-fetched thought for now.

the_rock
Legend

Yeah... I would also agree it's far-fetched that it was an ISP issue, but definitely worth asking them.

Andy

Zee
Contributor

I saw "decryption failure: Could not get SAs from packet" in one of the logs. Although there is no overlapping network, I changed the tunnel management option to per subnet. I have not seen any disconnection so far, but I am not sure if this was the reason. Let's see.

Zee
Contributor

It's been 2 weeks and TAC support is still unable to give me anything. After changing the tunnel management option, the VPN issue between the 2 FWs is somehow solved, but the same issue is observed on others that peer with the main HQ FW. Has anyone faced such an issue recently?

the_rock
Legend

Hey @Zee 

So what is the tunnel management option set to currently?

Andy

Zee
Contributor

I changed it to 'per subnet' instead of 'per gateway' for the previously problematic one. For all others it's 'per gateway', but just 2 VPN tunnels are having the same issue; the others are working fine.

the_rock
Legend

Can you search for the words "key install" in the logs and see if the affected external IP shows up? Might give some clue...

Andy

Zee
Contributor

Key Install only shows logs of the form "Ike: Auth exchange: Completed successfully" for the affected IP, even when it was stuck at phase 1. I could not see any logs that could help me further, but anyway, thank you for your suggestions. I will look into it further and let you know if I find anything.

the_rock
Legend

I feel bad I can't help further, sorry :-(

Zee
Contributor

So, what I think the issue is, after observing the tunnel management option, is that our HQ FW does not decrypt the traffic coming from the other FWs. I am not sure whether it's because of a VPN community issue or the DE FW itself, but whenever an HQ subnet is unreachable from the other FWs, it is because there are no VPN decryption packets on the DE side, and resetting the tunnel resolves it for some time. I have tried a lot of things but am stuck on this until I hear back from TAC. I was wondering if someone has had this issue in the past.

the_rock
Legend

It's a really difficult issue to troubleshoot, since the best way to go about it would be to either check the relevant log or have some sort of cron job running that would catch the issue when it happens.
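
As a rough idea of what such a job could do, here is a minimal sketch. It assumes that a ping to a host behind the remote peer is a fair indicator of tunnel health and that FWDIR is set as it is on the gateway; the names snapshot_on_failure, PROBE_CMD and SNAP_DIR are made up for illustration. On a failed probe it copies the ike* and vpnd* files from $FWDIR/log into a timestamped folder so the state at the moment of failure is preserved.

```shell
# Snapshot IKE/VPN debug files when a probe through the tunnel fails.
# snapshot_on_failure, PROBE_CMD and SNAP_DIR are illustrative names only.
# Assumptions: FWDIR is set, and ping to a host behind the peer reflects
# tunnel health. Variables are global because POSIX sh has no "local".
snapshot_on_failure() {
    probe_host="$1"
    snap_root="${SNAP_DIR:-/var/tmp/vpn_snapshots}"
    # Default probe: a single ICMP echo with a 2 second timeout (Linux ping).
    if ${PROBE_CMD:-ping -c 1 -W 2} "$probe_host" >/dev/null 2>&1; then
        return 0                          # probe answered, nothing to record
    fi
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$snap_root/$stamp"
    mkdir -p "$dest" || return 1
    # Copy the files mentioned earlier in the thread: ike* and vpnd*.
    for f in "$FWDIR"/log/ike* "$FWDIR"/log/vpnd*; do
        if [ -e "$f" ]; then
            cp -p "$f" "$dest"/
        fi
    done
    echo "$stamp probe to $probe_host failed, files saved in $dest"
}
```

Run from a scheduler every minute or so with the address of a host behind the affected peer; the files are only useful if IKE debug was enabled before the failure occurred.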

Andy

Zee
Contributor

I could see the relevant SA in a disconnected state when the issue occurred, but I could not find anything online to resolve it permanently. You are right, it is a very difficult issue to resolve.

