Dear CheckMates,
we have a strange connectivity problem between two proxies.
The connection path looks like this:
clients => proxyA => routerA => VSX-gateway => routerB => proxyB => Internet
The clients connect to their proxyA, proxyA forwards all requests to proxyB, and proxyB then sends everything out to the Internet.
The VSX-gateway sits in the middle.
We now have massive connectivity problems between proxyA and proxyB. Websites load very slowly or not at all.
Sometimes proxyA reports a loss of connectivity to proxyB. The problems occur only during production hours. With only 100 users
online everything is fine. With 1200 users the connectivity problem occurs.
- no drop logs on the VSX-gateway for this connection, neither in the logs nor with fw ctl zdebug drop
- only the connection proxyA <=> proxyB is affected, everything else works fine
- CPU utilization around 40% on all 16 cores, no 100% spikes
- all interfaces are fine, no rx-drops, LACP is ok
- disabling URLF, APPCL and all TP-Features (IPS, ABot etc.) doesn't help
- sim fast_accel rule for this connection doesn't help
What was changed?
We replaced proxyA with a product from a different vendor: it is now Squid, formerly Microsoft TMG.
In addition, there are many more connections now because of people working from home.
It seems the new proxy is the problem, but we tested and analyzed it and can't find anything wrong.
We had the chance to test the connection proxyA <=> proxyB over a second route that bypasses the VSX-gateway.
This works great, but it is not what we want.
The problem seems to be related to the high number of connections between the two proxies, but I'm not aware of any limitation for such a case.
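One generic limit that might be worth ruling out (an assumption, not something we have confirmed): all traffic between the two proxies shares one source IP, one destination IP and one destination port, so only the source port distinguishes connections. The rough sketch below estimates that budget. The port range is the common Linux default, and the 60-second hold time and the user count are illustrative values, not measurements from our environment.

```python
# Rough budget for TCP connections between ONE source IP and ONE destination IP:port.
# All inputs are illustrative assumptions, not measurements from our environment.

def max_concurrent(port_low: int, port_high: int) -> int:
    """Number of distinct source ports = upper bound on concurrent connections."""
    return port_high - port_low + 1

def sustainable_rate(ports: int, hold_seconds: float) -> float:
    """New connections per second that can be sustained if every source port
    stays occupied for hold_seconds (connection lifetime plus TIME_WAIT)."""
    return ports / hold_seconds

def per_user(ports: int, users: int) -> float:
    """Average number of concurrent connections available per user."""
    return ports / users

ports = max_concurrent(32768, 60999)            # common Linux default ephemeral range
print(ports)                                    # 28232
print(round(sustainable_rate(ports, 60.0), 1))  # 470.5 new connections per second
print(round(per_user(ports, 1200), 1))          # 23.5 concurrent connections per user
```

If the real numbers on proxyA come anywhere near such a budget, that would at least fit the observation that 100 users are fine and 1200 are not.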
If we reset all connections for these proxies on the VSX-gateway, it works for 2-4 minutes and then the problem occurs again.
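To pin down exactly when the degradation returns after such a reset, a small probe run on proxyA could log the TCP connect time to proxyB once per second. This is only a minimal sketch: the host name and port in the usage note are placeholders, and the function names are made up for illustration.

```python
import socket
import time
from typing import List, Optional

def connect_latency(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the connect fails or times out."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return None

def probe(host: str, port: int, count: int, interval: float = 1.0) -> List[Optional[float]]:
    """Run `count` connect attempts, `interval` seconds apart, and collect the results."""
    results = []
    for _ in range(count):
        results.append(connect_latency(host, port))
        time.sleep(interval)
    return results
```

For example, `probe("proxyB.example", 3128, count=300)` started right after the reset would cover five minutes. A run of `None` values or a sudden jump in connect time would show the moment the problem comes back, which can then be compared with the connection counts on the VSX-gateway.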
We found the article "Latency and/or packet loss for traffic which passes through a Virtual Switch in a VSX Gateway", which describes exactly our environment, but we are already on the latest R80.10 Jumbo.
ProxyB is working well; other proxies forward their traffic to proxyB without being routed via the VSX-gateway.
TAC is involved, but maybe someone has an idea what's going wrong?
Thanks
Wolfgang