Are you finding that the rule with the Domain Object intermittently fails to match when clients connect to the destination by its DNS name?
Those logs look about 1 hour apart, which suggests a 1-hour cache timeout. However, non-FQDN Domain Objects aren't cached for an hour: they are resolved with reverse DNS lookups (see below) and then, by the looks of it, held in a different cache.
Maybe a cache-full problem.
Did you test name resolution from the gateway's command line with nslookup or ping, and check the DNS configuration on the gateways?
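One way to run that check is to query each configured resolver in turn, since a single bad or slow DNS server could explain an intermittent miss. A rough sketch, assuming the gateway's DNS servers are listed in /etc/resolv.conf; the function names are made up for illustration, and the hostname in the usage line is a placeholder for the name in your Domain Object:

```shell
# List the resolver IPs from a resolv.conf-style file (default: /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Ask every listed resolver for the same name so the answers can be compared.
# $1: hostname to resolve, $2: optional resolv.conf-style file to read.
query_all_nameservers() {
    for ns in $(list_nameservers "$2"); do
        echo "== $1 via $ns =="
        nslookup "$1" "$ns"
    done
}
```

Usage would be something like: query_all_nameservers example.com
If one server answers differently from the others, or times out, that is worth chasing before anything else.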
Try this on the gateway in expert mode:
domains_tool -ip 54.166.251.207
These show table summaries (-s); #VALS is the current number of entries:
fw ctl multik print_bl dns_reverse_cache_tbl -s
fw ctl multik print_bl dns_reverse_unmatched_cache -s
fw ctl multik print_bl dns_reverse_domains_tbl -s
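Because the problem comes and goes, a single reading may not show much. Here is a small sketch that appends timestamped summaries of the same three tables to a file, so #VALS can be compared between a good period and a bad one. The function name and the default log path are made up, and only the commands listed above are used:

```shell
# Append a timestamped summary of each DNS-related kernel table to a log file.
# $1: output file (the default path is a hypothetical choice).
snapshot_domain_tables() {
    out=${1:-/var/log/domain_tbl_snapshots.log}
    for tbl in dns_reverse_cache_tbl dns_reverse_unmatched_cache dns_reverse_domains_tbl; do
        {
            echo "=== $(date '+%Y-%m-%d %H:%M:%S') $tbl ==="
            fw ctl multik print_bl "$tbl" -s
        } >> "$out"
    done
}
```

Run it by hand a few times, or from a loop such as: while true; do snapshot_domain_tables; sleep 300; done
Stop it once you have captured a failure.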
Also try hcp -r all
You may have to open a ticket with TAC if you can't see anything obvious and AWS isn't broken in some way (again).
From: https://support.checkpoint.com/results/sk/sk90401
Solution
- The default TTL of the domain cache is 1 hour, as confirmed in sk120633:
The resolved IP addresses are cached, and traffic to those IP addresses are matched on the rule using that FQDN object. The timeout of the FQDN cache respects the TTL of the DNS. The default TTL of the domain cache is 1 hour. When an FQDN is sent for resolving, we add the response to cache for 60 minutes, we do not respect the actual TTL returned by the server.
- Also the default TTL is confirmed in sk181215:
Check Point Security Gateway will resolve the FQDN against the DNS servers configured every 60 seconds, we will cache the new IP address resolved and update the TTL if the IP is already cached before. The default TTL of the domain cache is 1 hour.
Architecture
When a connection that traverses the Security Gateway is being evaluated against the rulebase, if the Unified Policy mechanism encounters a possible match that includes a Domain Object, the object must be resolved before a verdict can be reached.
- If all Domain Objects in the Access Policy are in FQDN mode, the Security Gateway performs direct DNS resolution of each of the Domain Objects and caches the results.
- If any Domain Object is not in FQDN mode and the IP address isn't in the cache, the Security Gateway must perform a reverse DNS request to determine whether the IP belongs to the Domain Object.
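The two cases above can be written out as a toy decision function. This is only a model of the description, not gateway code. The function name is made up, and the "cache" branch (a non-FQDN policy where the IP is already cached) is inferred from the wording rather than stated by the SK:

```shell
# Toy model: which lookup path does a connection take?
# $1: "yes" if every Domain Object in the policy is in FQDN mode, else "no"
# $2: "yes" if the connection's IP is already in the cache, else "no"
resolution_path() {
    all_fqdn=$1
    ip_cached=$2
    if [ "$all_fqdn" = "yes" ]; then
        echo "forward"   # direct DNS resolution of each Domain Object, results cached
    elif [ "$ip_cached" = "yes" ]; then
        echo "cache"     # inferred: no new lookup needed for this IP
    else
        echo "reverse"   # reverse DNS request for the connection's IP
    fi
}
```

So with even one non-FQDN object in the policy, an uncached IP ends up in the "reverse" branch, and the match then depends on what reverse DNS returns for that address.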
The Time-to-Live (TTL) for FQDN cache is 60 minutes. When using FQDN mode, all Domain Objects are refreshed once per minute. To refresh the Domain Object resolution, the Security Gateway queries all defined DNS servers for both "domain.com" and "www.domain.com" from the Domain Object.
For FQDN queries that return multiple results, there is no individual limit on the number of cached IP addresses per Domain Object. The Security Gateway's full cache size for Domain Objects maxes out at 25000 entries.
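To judge how close a gateway is to that ceiling, the entry count can be turned into a percentage. This is a minimal sketch, assuming you read the count off by hand (for example from a #VALS column). It also assumes the 25000 figure applies to the table you are looking at, which the SK does not spell out. cache_fill_pct is a made-up helper name:

```shell
# Print how full the cache is as a whole-number percentage (rounded down).
# $1: current number of entries, $2: limit (defaults to the 25000 quoted above).
cache_fill_pct() {
    entries=$1
    limit=${2:-25000}
    echo $(( entries * 100 / limit ))
}
```

For example, cache_fill_pct 24000 prints 96. A value close to 100 during a failure would support a cache-full explanation.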
If changes are made to the Security Gateway's defined DNS servers, the WSDNSD process must be restarted to apply the changes to the resolution of Domain Objects.
Troubleshooting
To observe Domain Object resolution, use the domains_tool command:
- R81.10 and higher:
[Expert@SecurityGateway]# domains_tool {-ip <IP address> | -d <domain name> [-m] | -uo <updatable object name> | -hc | -report}
- R81 and lower:
[Expert@SecurityGateway]# domains_tool {-ip <IP address> | -d <domain name> [-m] | -uo <updatable object name>}
For more information on how to use domains_tool, refer to sk161632.
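When several addresses from the logs need checking, the -ip form can be wrapped in a loop. This sketch uses only the -ip option from the synopsis above. The function name is made up, and the second address in the usage line is a documentation placeholder, not one from this thread:

```shell
# Run domains_tool -ip for every address given as an argument.
check_ips_against_domains() {
    for ip in "$@"; do
        echo "== $ip =="
        domains_tool -ip "$ip"
    done
}
```

Usage would be something like: check_ips_against_domains 54.166.251.207 192.0.2.10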