Re: Temporary Inconsistent VPN Failure

Ryan_Coots · ‎2021-07-30

Does anyone know what would cause a temporary (5-10 minute) outage affecting both site-to-site VPN tunnels (CP to CP) and users of the CP Mobile Access VPN client? About once a month, with no consistent pattern, some of my Remote Access VPN users and some of my site-to-site VPN tunnels suddenly lose connectivity for about 5-10 minutes and then come back up on their own.

All I see in the logs is "TUNNEL STATUS CHANGE: Peer gateway XXXXXXX has changed state to down" and then up. This happens about three times over 10 minutes, and then everything comes back and works great for anywhere between 3 weeks and 2 months.

Has anyone seen anything like this before?

Thanks all!

We are on R80.40, with a new 7000 series HA pair at the main site, and 3600s plus some Cisco routers for the far-site VPNs. I lose both the Ciscos and the CPs when this occurs, but not all of my sites appear to go down.

HeikoAnkenbrand · ‎2021-07-31

Can you provide more information and an IKE debug?

More here:

- ONELINER - Easy VPN Debug

the_rock · ‎2021-07-31

Heiko is 100% right... this definitely needs a VPN debug run, for sure. That error message is sadly very generic, so it's hard to say what exactly could cause it. Just out of curiosity, since it's CP to CP, do you have it set as a permanent tunnel?

Best,
Andy

Timothy_Hall · ‎2021-07-31

This sounds like an intermittent failure/impairment of the vpnd daemon, which handles certain VPN operations outside of INSPECT/SecureXL. If this daemon is dead or becomes impaired, most existing tunnels handled by INSPECT/SecureXL will continue to work, but anything that needs to be handled by vpnd will not (Visitor Mode traffic, NAT-T traffic, IKE negotiations/rekeys). Try looking around in $FWDIR/log/vpnd.elg to see if there are any errors logged around the time of the issue. Also check the start time and date of the vpnd daemon with the ps command and see if it was restarted by fwd around the time of the failure (or when things suddenly started working again). vpnd is a child process of fwd and is not restarted by cpwd like most Check Point processes.
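As a rough illustration of the start-time check described above, here is a minimal Python sketch. It is not a Check Point tool. It assumes a procps-style ps that supports `-C` and the `lstart` output column, an English locale for the date text, and Python 3.7 or later; the function names and the 5-minute slack are made up for the example. If Python is not usable on the gateway, the same comparison can be done by pasting the ps output into `parse_lstart` on another machine.

```python
import subprocess
from datetime import datetime, timedelta

# Layout of the ps "lstart" column, e.g. "Fri Jul 30 14:02:11 2021".
# Assumes an English locale for day and month names.
LSTART_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_lstart(text):
    """Convert one line of `ps -o lstart=` output into a datetime."""
    return datetime.strptime(text.strip(), LSTART_FORMAT)


def process_start_times(name="vpnd"):
    """Return the start time of every running process with this name.

    Assumes procps ps; the list is empty if nothing matches.
    """
    result = subprocess.run(
        ["ps", "-C", name, "-o", "lstart="],
        capture_output=True,
        text=True,
        check=False,  # ps exits non-zero when no process matches
    )
    return [parse_lstart(line) for line in result.stdout.splitlines() if line.strip()]


def restarted_near(start, incident_begin, incident_end, slack=timedelta(minutes=5)):
    """True if the process start falls inside the incident window (plus slack)."""
    return incident_begin - slack <= start <= incident_end + slack
```

For example, calling `restarted_near(t, outage_begin, outage_end)` for each `t` in `process_start_times()` shows whether the daemon's current start time lines up with the outage. Only the most recent start is visible this way, so an earlier restart would still have to be found in the logs.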

The good news is that the old vpnd daemon has gotten some serious love from Check Point R&D recently to address many, many longstanding stability/performance issues with it, including migrating some of vpnd's functions into INSPECT/SecureXL (i.e. large scale Visitor Mode support). So the solution here may be as simple as loading the latest GA Jumbo HFA for your release.

PhoneBoy · ‎2021-08-01

It has also gotten some additional love in R81.10 where the user space parts of S2S VPN and C2S VPN are now handled in separate processes.

Ryan_Coots · ‎2021-08-02

Thanks Timothy. There is nothing too strange in the vpnd.elg file. The daemon did restart on the day of the issue, but not at a time that lines up with the outage. I will enable debug logging and see if I can capture something the next time the issue occurs. Thanks!
