We have an R80.20 Gaia cluster with the Firewall, VPN and IA blades enabled. CoreXL and SecureXL are also enabled.
A couple of days ago I added a new VLAN on an unused interface for a point-to-point link to a remote-site firewall, which is connected over a layer 2 line. So far so good. The two firewalls have a site-to-site VPN with each other. There are no static routes on that line, only encrypted traffic.
This gateway also connects HQ with all branches through the main ISP line on another interface.
Today connectivity between HQ and all branches dropped at least 7 times. Each outage lasted about 10-20 seconds and then recovered by itself. After checking with fw monitor, I discovered that instead of routing packets destined for the branches through the main ISP line, the firewall routed them out the new VLAN interface mentioned above, which is why the packets never reached their destination.
At first I thought I might have some duplicate routes, so I checked: there is not a single route on this VLAN interface except, of course, the directly connected point-to-point network, which is in a completely different subnet.
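To make the route check above concrete, here is a minimal sketch of a longest-prefix-match lookup. It only models how a routing table should pick the egress interface. The prefixes and interface names (`eth1`, `eth2.100`) are made-up placeholders, not the real configuration of this gateway.

```python
import ipaddress

# Hypothetical routing table: (prefix, egress interface).
# All values are placeholders for illustration only.
ROUTES = [
    ("0.0.0.0/0", "eth1"),              # default route via the main ISP line (assumed)
    ("10.20.0.0/16", "eth1"),           # branch networks via the main ISP line (assumed)
    ("192.168.255.0/30", "eth2.100"),   # directly connected point-to-point VLAN (assumed)
]

def lookup(dst, routes=ROUTES):
    """Return (prefix, interface) of the longest-prefix match for dst, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        # Keep the matching route with the longest prefix length.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return None if best is None else (str(best[0]), best[1])
```

With a table like this, a branch address such as `10.20.5.9` can only resolve to the ISP-facing interface, so traffic leaving through the VLAN interface would not be explained by the routing table alone.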
Things that happened today before it started:
Staff went to this remote site to install PCs, printers, etc., which I don't believe is relevant, and I ran fwaccel off and then back on on this gateway.
In /var/log/messages I got a lot of:
kernel: [fw4_1];fwconn_recover_old_conn: connection is accelerated - cannot set handler.
kernel: [fw4_1];fwconn_recover_old_conn: handler (322) VERIFICATION_HANDLER. dropping packet
and also a lot of these: kernel: dst_release: dst:ffff8808147852c0 refcnt:-2
I have no idea what these messages mean.
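To see how often each of the messages quoted above appears, a small tally script along these lines could help. This is only a sketch: the regular expressions are derived from the three sample lines in this post, and the category names are my own labels, not Check Point terminology.

```python
import re
from collections import Counter

# Patterns built from the three kernel messages quoted in this post.
# The dictionary keys are hypothetical labels chosen for readability.
PATTERNS = {
    "recover_old_conn_accelerated": re.compile(
        r"fwconn_recover_old_conn: connection is accelerated"),
    "recover_old_conn_drop": re.compile(
        r"fwconn_recover_old_conn: handler \(\d+\) \w+\. dropping packet"),
    "dst_release_negative": re.compile(
        r"dst_release: dst:[0-9a-f]+ refcnt:-\d+"),
}

def tally(lines):
    """Count how many lines match each known kernel message pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                break  # a line is counted under at most one category
    return counts
```

It can be fed any iterable of lines, for example an open file handle on a copy of the messages log, and the resulting counts can then be compared against the times of the outages.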
It happened randomly for around 2 hours and stopped at about the time they left the remote site, which again I don't believe is related.
To me it looks very much like a bug, but I'm not sure why it is happening just now and why with this new VLAN specifically.
fwaccel off didn't solve the issue right away, but I just read that in R80.20 it does not take effect on all connections as it did before.