Alex_Gilis
Advisor

R80.40 VSX - Temporary loss of routing

I have a TAC SR open, but I'm posting this here in case someone has experienced the same issue.

The system has been running VSX R80.40 Take 83 on a new appliance for some time, and for the past few days there have been actual losses of connectivity on one VS. Network-side investigation showed nothing specific, but in the /var/log messages the outages can be correlated with the creation of "messages_routed.vs1", and VS1 is the one with the issue.

The file always contains entries like these:

Dec 29 17:34:09.749148 [routed] ERROR: recv(header) returns 0
Dec 29 17:34:09.749148 [routed] DEBUG: cpcl_recv: deleting peer task 0x984d13c

Dec 29 17:34:09.749148 [routed] DEBUG: peer_remove(130): Entering !!!!
Dec 29 17:34:59.681893 [routed] NOTICE: Exit routed[132332] version routed-09.25.2020-01:19:13
Dec 29 17:35:00 routed_syslog_on: Routed syslog to "/var/log/routed_messages_vs1" started

 

Followed by a bunch of "KRT REMNANT <routing prefix>: ignored"
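To line up routed restarts with outage times, a small script can pull the exit and restart timestamps out of the file. This is only a rough sketch: the two patterns are taken from the excerpt above and may not match other routed builds, the year is an assumption because the log lines do not carry one, and the function names are made up for illustration.

```python
import re
from datetime import datetime

# Patterns modeled on the log excerpt above; other builds may word these lines differently.
EXIT_RE = re.compile(r"^(\w{3} +\d+ [\d:]+(?:\.\d+)?) .*Exit routed\[(\d+)\]")
START_RE = re.compile(r"^(\w{3} +\d+ [\d:]+(?:\.\d+)?) routed_syslog_on: .* started")


def parse_ts(text, year=2020):
    """Parse 'Dec 29 17:34:59.681893' or 'Dec 29 17:35:00'. The year is assumed."""
    fmt = "%Y %b %d %H:%M:%S.%f" if "." in text else "%Y %b %d %H:%M:%S"
    return datetime.strptime(f"{year} {text}", fmt)


def find_restarts(lines, year=2020):
    """Return (exit_time, pid, start_time) for each routed exit followed by a start."""
    restarts = []
    pending = None  # last exit seen that has not yet been paired with a start
    for line in lines:
        m = EXIT_RE.match(line)
        if m:
            pending = (parse_ts(m.group(1), year), int(m.group(2)))
            continue
        m = START_RE.match(line)
        if m and pending:
            restarts.append((pending[0], pending[1], parse_ts(m.group(1), year)))
            pending = None
    return restarts


sample = [
    "Dec 29 17:34:09.749148 [routed] ERROR: recv(header) returns 0",
    "Dec 29 17:34:59.681893 [routed] NOTICE: Exit routed[132332] version routed-09.25.2020-01:19:13",
    'Dec 29 17:35:00 routed_syslog_on: Routed syslog to "/var/log/routed_messages_vs1" started',
]

for exit_time, pid, start_time in find_restarts(sample):
    print(f"routed[{pid}] exited at {exit_time}, syslog restarted at {start_time}")
```

Feeding it the lines of the real file instead of `sample` gives a list of restart times to compare against the outage windows and the core dump timestamps.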

The issue lasts several minutes, after which connectivity is restored without any specific intervention.

The start of the issue can't be linked to any specific architectural change on the FW or on the network it's connected to.

5 Replies
PhoneBoy
Admin

Messages about routed and the behavior do suggest that routes appear to disappear for a period of time.
Why, I can't say, and it's good you have a TAC case open 🙂

Alex_Gilis
Advisor

Indeed, I see core dumps that match the times of the outages exactly. I will follow up with TAC.

ottawacanada150
Advisor

I'm not a VSX expert by any means, but I recall that in the past (mind you, this was R77 and earlier) the easiest way to fix issues like that was to either restart the routed process or simply soft reboot the box from SSH. Not sure if you tried that, but technically, a cprestart on the master FW, if it's a cluster, should suffice too.

Alex_Gilis
Advisor

The issue itself is self-fixing: routed gets its act together after a while and starts working again. The main question is why it happens and how to fix it permanently, because now there's obviously a big target on the FW each time something happens in the network.

ottawacanada150
Advisor

K, understood. In that case, maybe have the TAC case worked by a routing expert, hopefully. I recall a couple of tickets like that in the past went to R&D, and it took months to get any logical solution, so just be prepared in case you are expecting a quick resolution.

 

Happy New Year!
