Everyone:
We have been experiencing brief S2S VPN outages after a cluster failover. This is mostly experienced on the application side - I don't have a lot of firewall logs to give any hints of what is going on. Background information:
Site to site VPN between two centrally managed HA clusters. Both clusters are running R81.20 with JHFA Take 89. The VPN community is meshed, permanent tunnels are set, tunnel sharing is set to "One VPN tunnel per subnet pair." "IKEv2 only" is the encryption method.
In the firewall logs, I see a couple of IKE failures with message "Child SA exchange: Ended with error." I get this log a full minute after the failover. A number of seconds later I receive another IKE failover message: "Child SA exchange: Exchange failed: timeout reached."
I've gone down a number of rabbit holes on this one. One thing I found was on one cluster in the community "Maximum concurrent IKE negotiations:" is set to 200. On the other cluster in the community, it is set to 1000. Other items: on each cluster member, the vpn_queues table is empty:

And if it helps, here is the output showing some SA information on one cluster member:

I'm pretty much at a loss how to attack this. This is a highly critical environment and nobody likes it when connections are dropped. Any direction from the community would be appreciated.
Dave