R80.40 VSX - Temporary loss of routing
I have a TAC SR open, but I'm posting this here in case someone has experienced the same issue.
The system has been running VSX R80.40 Take 83 on a new appliance for some time, and for the past few days there have been actual losses of connectivity on one VS. Network-side investigation showed nothing specific, but under /var/log the outages can be correlated with the creation of "messages_routed.vs1", and VS1 is the one with the issue.
The file always contains entries like these:
Dec 29 17:34:09.749148 [routed] ERROR: recv(header) returns 0
Dec 29 17:34:09.749148 [routed] DEBUG: cpcl_recv: deleting peer task 0x984d13c
Dec 29 17:34:09.749148 [routed] DEBUG: peer_remove(130): Entering !!!!
Dec 29 17:34:59.681893 [routed] NOTICE: Exit routed[132332] version routed-09.25.2020-01:19:13
Dec 29 17:35:00 routed_syslog_on: Routed syslog to "/var/log/routed_messages_vs1" started
These are followed by a bunch of "KRT REMNANT <routing prefix>: ignored" entries.
The issue lasts several minutes, after which connectivity is restored without any specific intervention.
The start of the issue can't be linked to any specific architectural changes on the FW or the network to which it's connected.
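For anyone wanting to line these events up against outage times, here is a minimal Python sketch that pulls routed exit/restart pairs out of a log like the excerpt above. The regular expressions are inferred only from the lines quoted in this post, so other builds may log differently; the function names and the `year` argument are my own (the log lines carry no year).

```python
import re
from datetime import datetime

# Patterns inferred from the quoted excerpt only (an assumption, not a documented format).
EXIT_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d)(?:\.\d+)? \[routed\] NOTICE: Exit routed\[(\d+)\]"
)
START_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d)(?:\.\d+)? routed_syslog_on: .* started"
)


def parse_ts(stamp, year):
    """Turn 'Dec 29 17:34:59' into a datetime; the year must be supplied by the caller."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def routed_restarts(lines, year):
    """Return (exit_time, pid, restart_time) for each routed exit followed by a restart."""
    events = []
    pending = None  # last seen exit that has not been matched to a restart yet
    for line in lines:
        m = EXIT_RE.match(line)
        if m:
            pending = (parse_ts(m.group(1), year), int(m.group(2)))
            continue
        m = START_RE.match(line)
        if m and pending:
            events.append((pending[0], pending[1], parse_ts(m.group(1), year)))
            pending = None
    return events
```

Feeding it the lines of the routed log for the affected VS gives a list of timestamps that can be compared with monitoring data. Note that the gap between exit and restart only shows when the process came back, not how long it took for routes to be reinstalled.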
The routed messages and the behavior do suggest that routes disappear for a period of time.
Why, I can't say, and it's good you have a TAC case open 🙂
Indeed, I see core dumps whose timestamps exactly match the times of the outages. I will follow up with TAC.
I'm not a VSX expert by any means, but I recall that in the past (mind you, this was R77 and before) the easiest way to fix issues like that was to either restart the routed process or simply soft reboot the box over SSH. Not sure if you tried that, but technically, cprestart on the master FW, if it's a cluster, should suffice too.
The issue is self-fixing: routed gets its act together after a while and starts working again. The main question is why it happens and how to fix it permanently, because now there's obviously a big target on the FW each time something happens in the network.
OK, understood. In that case, maybe have the TAC case worked, hopefully by a routing expert. I recall that in the past a couple of tickets like that went to R&D and it took months to get any logical solution, so just be prepared in case you are expecting a quick resolution.
Happy New Year!
