I've been puzzling over this for several days. I'm a Check Point partner trying to bench-test a prospective customer's configuration. My home ISP setup happens to roughly mirror this customer's, so I'm "doing it live," in a manner of speaking. The environment:
- ISP
  - ISP-1 is 1 Gbps up/down, dynamic IP
  - ISP-2 is 200 Mbps up/down, static IP
- Firewall: Check Point 5800 running R81.10 JHF81 (also testing with R81.20)
The customer requirement seems simple: route all traffic through ISP-1 by default, except for certain internal subnets that must go out through ISP-2. ISP Redundancy is optional. The customer has run this configuration on a Fortinet firewall for the past six years.
To accomplish this, I've done the following:
- Gateway configuration
  - ISP-1 on eth1 set to obtain an IP address automatically
  - ISP-2 on eth2 set to a static IP
  - Static routes to internal subnets
  - Default route left empty
- PBR
  - Table ISP2Table: default destination, next hop ISP-2 gateway
  - Rule priority 1: source 10.60.128.0/23, interface eth3, table ISP2Table
  - Kernel routes enabled
- SMS
  - All traffic allowed out to the internet for testing
  - No ISP Redundancy configured on the gateway
  - Attempted three different NAT configurations:
    - Subnet 10.60.128.0/23: automatic Hide NAT behind the gateway
    - Subnet 10.60.128.0/23: automatic Hide NAT behind the ISP-2 IP address
    - Subnet 10.60.128.0/23: no automatic NAT, manual Hide NAT rule behind the ISP-2 IP address
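To make the intended routing decision explicit, here is a minimal Python sketch of what the PBR rule above should do. It assumes PBR rules are evaluated in priority order and that unmatched traffic falls through to the main routing table, whose default route is assumed to be the DHCP-learned route out eth1. Each table is simplified to the egress interface of its default route (real tables hold next-hop gateways), and the names PbrRule, select_table, and egress_interface are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class PbrRule:
    priority: int
    source: ipaddress.IPv4Network
    interface: str  # ingress interface the rule matches on
    table: str


# Simplification: each table is reduced to the egress interface of its default route.
# "main" via eth1 is an assumption (default route learned by DHCP on ISP-1).
TABLE_EGRESS = {
    "ISP2Table": "eth2",
    "main": "eth1",
}

RULES = [
    PbrRule(1, ipaddress.ip_network("10.60.128.0/23"), "eth3", "ISP2Table"),
]


def select_table(src: str, ingress: str) -> str:
    """Return the table named by the first matching PBR rule, else 'main'."""
    addr = ipaddress.ip_address(src)
    for rule in sorted(RULES, key=lambda r: r.priority):
        if addr in rule.source and ingress == rule.interface:
            return rule.table
    return "main"


def egress_interface(src: str, ingress: str) -> str:
    """Map a (source, ingress interface) pair to the expected egress interface."""
    return TABLE_EGRESS[select_table(src, ingress)]


if __name__ == "__main__":
    print(egress_interface("10.60.129.20", "eth3"))  # eth2 (inside the /23)
    print(egress_interface("10.60.130.20", "eth3"))  # eth1 (outside the /23)
```

In this model, traffic from 10.60.128.0/23 arriving on eth3 never selects eth1, which is why any packets from that subnet observed on eth1 are unexpected.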
The problem is with the subnet(s) configured to go out through ISP-2: frequent disconnects (Zoom calls drop, for example) and slow website loading. Running fw ctl zdebug + drop shows no drops. What's odd is what fw monitor and tcpdump show: traffic leaving eth1 (ISP-1) with the source address of eth2 (ISP-2). Initial connections appear to be attempted through eth1, NATed to the eth2 address, before eventually switching over to eth2. This is easiest to see with my VoIP desk phone: immediately after dialing, dozens of packets leave through the wrong ISP link, and the call doesn't connect until the traffic switches to eth2.
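To quantify the symptom from a capture, a small Python filter like the following can flag packets that carry the ISP-2 address but left through an interface other than eth2. It assumes the capture has already been reduced to (egress interface, source IP) pairs, for example from exported fw monitor or tcpdump output; that parsing step is not shown. The ISP-2 address 203.0.113.10 and the other sample addresses are hypothetical placeholders from the documentation ranges, and find_misrouted is a hypothetical helper name.

```python
import ipaddress
from typing import Iterable, List, Tuple

# Hypothetical placeholder for the ISP-2 static address (documentation range).
ISP2_ADDRESS = ipaddress.ip_address("203.0.113.10")
ISP2_INTERFACE = "eth2"

Packet = Tuple[str, str]  # (egress interface, source IP)


def find_misrouted(packets: Iterable[Packet]) -> List[Packet]:
    """Return packets sourced from the ISP-2 address that left via any interface other than eth2."""
    return [
        (iface, src)
        for iface, src in packets
        if ipaddress.ip_address(src) == ISP2_ADDRESS and iface != ISP2_INTERFACE
    ]


if __name__ == "__main__":
    sample = [
        ("eth1", "203.0.113.10"),  # wrong link: ISP-2 address leaving via ISP-1
        ("eth2", "203.0.113.10"),  # expected path
        ("eth1", "198.51.100.7"),  # ordinary ISP-1 traffic (hypothetical address)
    ]
    print(find_misrouted(sample))  # [('eth1', '203.0.113.10')]
```

Counting these hits before and after each policy install gives a simple way to track when the erroneous eth1 traffic appears and disappears.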
The behavior is inconsistent. After I set automatic Hide NAT on the network object and installed policy, everything worked well for a while, and all of the erroneous traffic on eth1 disappeared. It returned about 24 hours later, after I installed policy again.
I've done this before for other customers, and it worked properly. The difference here is that ISP-1 uses a dynamic IP, and I suspect that is the cause. Has anyone implemented a similar configuration?