Hi,
We have an issue on the standby member of our FW cluster that causes AV/TE updates to fail.
The Internet connection is OK, but I suspect a DNS problem. If I try to resolve, e.g., checkpoint.com from the standby member, the query times out and the name cannot be resolved. The error messages vary:
[Expert@chp-2:0]# nslookup checkpoint.com
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached
[Expert@chp-2:0]# nslookup facebook.com
;; connection timed out; trying next origin
Server: 10.1.1.14
Address: 10.1.1.14#53
** server can't find facebook.com: NXDOMAIN
If I do the same on the active member, it works:
[Expert@chp-1:0]# nslookup checkpoint.com
Server: 10.1.1.14
Address: 10.1.1.14#53
Non-authoritative answer:
Name: checkpoint.com
Address: 209.87.209.100
After that, in most cases (about 80%), resolving the same host also works from the standby member.
But not always, which is very strange behavior.
If I fail over, the former standby (now active) FW works properly.
The DNS servers are up and running, with no issues at all (except for this one with the HA standby member).
As far as I can see on another FW (an ASA) located between the Check Point cluster and the DNS servers, all requests arrive with the cluster IP as source.
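To put a number on the intermittent failures, the same query can be sent repeatedly and the timeouts counted. This is a minimal sketch in Python, assuming a Python 3 interpreter is available on the host where it runs; `build_query` and `probe` are hypothetical helpers (not Check Point tools) that build a plain RFC 1035 A-record query by hand and send it over UDP, using a new socket, and therefore normally a new ephemeral source port, for every attempt.

```python
import random
import socket
import struct
from typing import Dict, Tuple


def build_query(name: str, qtype: int = 1) -> Tuple[int, bytes]:
    """Build a minimal DNS query (RFC 1035) for `name`; qtype 1 = A record."""
    txid = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    return txid, header + question


def probe(server: str, name: str, attempts: int = 20,
          timeout: float = 2.0, port: int = 53) -> Dict[str, int]:
    """Query `server` `attempts` times and count answered, timed-out and
    mismatched (wrong transaction ID) replies."""
    stats = {"answered": 0, "timeout": 0, "mismatched": 0}
    for _ in range(attempts):
        txid, query = build_query(name)
        # A fresh socket per attempt usually gets a fresh ephemeral source port
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(query, (server, port))
            try:
                data, _ = sock.recvfrom(4096)
            except socket.timeout:
                stats["timeout"] += 1
                continue
        # Only count a reply as an answer if it echoes our transaction ID
        if len(data) >= 2 and struct.unpack("!H", data[:2])[0] == txid:
            stats["answered"] += 1
        else:
            stats["mismatched"] += 1
    return stats
```

For example, `probe("10.1.1.14", "checkpoint.com")` run once from each member (or from a host whose traffic takes the same path) would give comparable timeout counts instead of a rough percentage.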
I ran fw monitor and, at the same time, a capture on the ASA. Every packet shows up in the ASA capture, and there I can also see that the servers answer every request. In the fw monitor output, however, most of the answer packets are missing. Here's an example:
On the left is the fw monitor output: the packets leave with the cluster VIP 10.2.1.1 as source, each on a different port.
On the right is the ASA capture, where all these packets appear together with the answers from the DNS servers 10.1.1.14/15 (marked blue/azure).
On the left, however, these answers do not appear for the first six requests. Only the last (yellow) request gets an answer that shows up properly.
The only other thing that strikes me as odd is that the failing answers seem to be much shorter than the last one (length 26 compared to 96).
Any ideas what could be causing this problem?
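One way to check what the short answers actually are is to decode their DNS header: the RCODE, the TC (truncated) flag and the record counts show whether they are normal answers, error responses or something else. This is a minimal sketch, assuming the raw UDP payload of an answer can be copied out of the ASA or fw monitor capture as bytes; `decode_header` is a hypothetical helper, and whether the lengths 26 and 96 refer to the UDP payload or to the whole packet is not stated above, so that is an assumption to verify first.

```python
import struct
from typing import Dict, Union

# Response codes from RFC 1035 section 4.1.1 (only the common ones are named)
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN",
          4: "NOTIMP", 5: "REFUSED"}


def decode_header(payload: bytes) -> Dict[str, Union[int, bool, str]]:
    """Decode the fixed 12-byte header at the start of a DNS message."""
    if len(payload) < 12:
        raise ValueError(f"{len(payload)} bytes is too short for a DNS header")
    txid, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!6H", payload[:12])
    rcode = flags & 0x000F
    return {
        "id": txid,
        "is_response": bool(flags & 0x8000),  # QR bit
        "truncated": bool(flags & 0x0200),    # TC bit
        "rcode": RCODES.get(rcode, str(rcode)),
        "qdcount": qdcount,
        "ancount": ancount,
        "nscount": nscount,
        "arcount": arcount,
    }
```

A hex dump from the capture can be passed in with `decode_header(bytes.fromhex(...))`; comparing one short answer with the 96-byte one would show whether they differ in RCODE or record counts.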