Background: I have a hub-and-spoke VPN community with the center gateways meshed. The community is configured to allow traffic to the center and to other satellites through the center. MEP is enabled and the entry point selection mechanism is set to "Select the closest gateway to the source (first to respond)". Route injection is enabled. All tests show that satellites start routing traffic through another center gateway should one of the center gateways become unavailable. All good.
Unfortunately, it's not clear how traffic fails back if the original path becomes available again. For example, the center gateways are C1, C2, and C3, and a satellite is S1. Ping times from S1 to C1, C2, and C3 are all different, as noted below.
| Path | Site | Ping time |
| A | S1 to C1 | 30ms |
| B | S1 to C2 | 100ms |
| C | S1 to C3 | 250ms |
In this scenario, let's say Path A fails. Traffic fails over to Path B. When Path A comes back online, my expectation is that traffic will start using it again because it offers better performance. In my working setup this does not seem to be the case. Is this a setting that has been missed, or is this how MEP has been designed?
To make matters worse, let's say Path A and Path B both fail and Path C is chosen. Path A and Path B then come back, but traffic is still using Path C. Users are stuck with roughly 8 times worse latency (250ms versus 30ms) for their applications. Not ideal.
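To make the expectation concrete, here is a rough Python sketch of the fail-back behavior I would expect. It is only an illustration of my assumption, not how MEP is actually implemented, and the names `PING_MS` and `expected_entry_point` are made up for this post.

```python
# Hypothetical illustration only: this is not Check Point code or configuration.
# Ping times from S1 to each center gateway, as listed earlier in this post.
PING_MS = {"C1": 30, "C2": 100, "C3": 250}


def expected_entry_point(available):
    """Return the reachable center gateway with the lowest ping time, or None."""
    candidates = [gw for gw in PING_MS if gw in available]
    if not candidates:
        return None
    return min(candidates, key=PING_MS.get)


# Path A and Path B are down: only C3 is reachable, so Path C is used.
during_outage = expected_entry_point({"C3"})

# Path A and Path B are back: I would expect traffic to return to C1.
after_recovery = expected_entry_point({"C1", "C2", "C3"})

# Latency penalty of staying on Path C instead of Path A.
penalty = PING_MS["C3"] / PING_MS["C1"]

print(during_outage, after_recovery, round(penalty, 1))
```

What I actually observe is that the choice made during the outage sticks after recovery, which is the gap I am asking about.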
If the paths do not automatically fail back to the faster link, is there a way to manually fail the traffic back to the lowest-latency link without bringing the VPN tunnels down?