traffic doesn't go to the right interface

Exonix · 2022-09-22

Hello all,

We have a gateway running R81.10 with some PBR (policy-based routing) configured. We've just found that, from time to time, some traffic doesn't go out the right interface (the GRE tunnel for Zscaler). The strange thing is that at other times the traffic goes where it should. This is good traffic:

This is bad traffic:

As you can see, the only difference is the port definition: the port is always 443, but it is matched by different objects. tcp_443_noage has the following settings (unfortunately, I don't know the purpose of this object, but it is used by some rules for VMware and Veeam):

What could be wrong, and how can I fix it?

Thank you!

Chris_Atkinson · 2022-09-22

R81.30 is not a version that exists yet. Do you mean R80.30?

What do your PBR rules look like?

CCSM R77/R80/ELITE

Exonix · 2022-09-22

It is R81.10; I've corrected this information.

abihsot__ · 2022-09-22

Oh, I was not aware you could attach a PBR table based on the FW rule number. How would that work if you add another rule above it and the whole rulebase shifts?

As for the port definition, it is probably a workaround for the backup team, for long-running backups etc. For regular user traffic, I would suggest sticking with the standard https object.

I believe there was a major PBR redesign some time ago; at least that's what I understood from the release notes. We still see some traffic routed incorrectly by PBR too, but on R80.40.

Exonix · 2022-09-22

After adding a new rule, the number in the PBR changes as well. And yes, this is not our only problem with PBR... We have another case open with CP Support where there is a workaround (adding a disabled rule before the impacted rule), but we can't use it in this case.

The non-standard port object (tcp_443_noage) is not defined in the rule for Zscaler; it appears only in the logs. How does the FW decide which object to use?

the_rock · 2022-09-22

From expert mode, run ip r g (ip route get) followed by the destination IP address and verify that the route is indeed correct.

Best,
Andy

Exonix · 2022-09-22

This looks good, since eth1 is the external interface through which the GRE tunnel naturally goes:

ip r g 18.185.14.90
18.185.14.90 via 194.xxx.xxx.xxx dev eth1 src 194.yyy.yyy.yyy
cache

PhoneBoy · 2022-09-23

A “different” service could be necessary if, for instance, you want certain HTTPS traffic to have a different timeout than the default.
I suspect there may be some issue with PBR, in which case you will probably need TAC assistance.

Exonix · 2022-09-26

The problem has existed for a long time already. In some cases we managed to solve it, but not in this one. We have already opened a ticket with Support, but they haven't found a reason or a solution yet...

RS_Daniel · 2022-09-26

Hello,

The PBR matching depends on your FW rule, so it would be useful to see a capture of how the rule looks. Only one object with port TCP/443 should have the option "Match for Any" enabled. The https object already has it, so disable the option on one of the objects. Does the rule use specific service objects or Any?

I have found that using only the FW rule as the condition can be quite problematic: in some cases, the reply packets are also routed through the destination interface instead of being sent back to the internal interface. I would try adding another condition, for example the internal interface or the source IP network.
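To make the point about reply packets concrete, here is a toy Python model (not Check Point's implementation) of a PBR condition that matches on the FW rule number alone versus the rule number plus a source network. The Packet and PbrCondition classes, the addresses, and the assumption that reply packets carry the same rule number as the original connection are all illustrative assumptions.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network, ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: IPv4Address
    dst: IPv4Address
    fw_rule: int  # hypothetical: number of the access rule the connection matched


@dataclass(frozen=True)
class PbrCondition:
    fw_rule: int
    src_net: Optional[IPv4Network] = None  # None = source is not checked

    def matches(self, pkt: Packet) -> bool:
        if pkt.fw_rule != self.fw_rule:
            return False
        # With a source-network condition, only packets coming from the
        # internal network qualify; replies from the remote server do not.
        return self.src_net is None or pkt.src in self.src_net


if __name__ == "__main__":
    internal = ip_network("10.1.0.0/16")      # hypothetical internal LAN
    client = ip_address("10.1.2.3")           # hypothetical internal host
    server = ip_address("198.51.100.10")      # documentation-range address
    request = Packet(client, server, fw_rule=500)
    reply = Packet(server, client, fw_rule=500)  # same connection, same rule

    rule_only = PbrCondition(fw_rule=500)
    rule_and_src = PbrCondition(fw_rule=500, src_net=internal)
    print(rule_only.matches(reply))       # reply would be policy-routed too
    print(rule_and_src.matches(reply))    # reply follows the normal route
    print(rule_and_src.matches(request))  # outbound request still matches
```

In this model the rule-only condition also catches the reply, while adding the source network limits policy routing to traffic that originates on the internal side.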

Regards

Exonix · 2022-09-27

We have two rules:

100 source_A destination_B tcp_443_noage

500 source_C destination_D any

For rule 500 we configured PBR.

Thanks for the advice about the additional condition!

RS_Daniel · 2022-09-27

Hello,

On rule 500 you are using service Any. In that case, any service or service range with the option "Match for Any" enabled can match there. For port TCP/443 you have two objects with this option enabled: https and tcp_443_noage. According to sk150553, "it is highly recommended not having any conflicting or overlapping services with Match for 'Any' on."

I think the easiest fix is to disable this option in tcp_443_noage. But if you need to keep the option enabled on that object, use specific service objects in rule 500 instead: https plus any other ports you want. With that, the matching traffic should always use the https service object.
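The overlap described above, and both suggested fixes, can be illustrated with a simplified Python model. This is only a sketch of the idea from this thread and sk150553, not Check Point's actual matching logic: the Service class and candidates() function are hypothetical, and the model does not try to predict which object the gateway picks when more than one qualifies.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Service:
    name: str
    proto: str
    port: int
    match_for_any: bool


def candidates(services: Sequence[Service], proto: str, port: int,
               rule_services: Optional[Sequence[str]] = None) -> List[str]:
    """Names of the service objects that could match a packet on proto/port.

    rule_services=None models a rule whose service column is Any: only
    objects with "Match for Any" enabled qualify. Otherwise only the
    objects explicitly listed in the rule qualify.
    """
    result = []
    for svc in services:
        if svc.proto != proto or svc.port != port:
            continue
        if rule_services is None:
            if svc.match_for_any:
                result.append(svc.name)
        elif svc.name in rule_services:
            result.append(svc.name)
    return result


if __name__ == "__main__":
    overlapping = [Service("https", "tcp", 443, True),
                   Service("tcp_443_noage", "tcp", 443, True)]
    # Rule 500 with service Any: two objects qualify for the same packet.
    print(candidates(overlapping, "tcp", 443))
    # Fix 1: disable "Match for Any" on tcp_443_noage.
    fixed = [overlapping[0], Service("tcp_443_noage", "tcp", 443, False)]
    print(candidates(fixed, "tcp", 443))
    # Fix 2: keep the option, but list https explicitly in rule 500.
    print(candidates(overlapping, "tcp", 443, rule_services=["https"]))
```

Either fix leaves exactly one candidate for TCP/443, which is the non-overlapping situation sk150553 recommends.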

Regards

Exonix · 2022-10-05

One more question. I checked the firewall health, and this is what I found. What does "FW Tables Limit" mean? This test failed on both nodes. Is there any limit on the number of FW/NAT rules?

PhoneBoy · 2022-10-05

What generated the table you provided?

In general, there isn't a limit on the number of rules you can have.
That said, there can be issues on the management when you're managing a policy with several thousand rules or more.
Due to the mechanisms we provide such as groups, multiple sources/destination/service per access rule, Access Roles, and others, you shouldn't actually need that many rules.
Most policies I've seen that are thousands of rules can often be reduced substantially through an optimization exercise.

What does have limits are some of the kernel tables that we use to keep track of the various traffic going through the gateway.
The "peak" refers to the "high water mark" for the number of entries in the specified table.
Whether this points to an actual problem or not remains to be seen.
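The difference between a table's current size and its peak can be illustrated with a toy Python counter. The TableCounter class and the limit of 100 are hypothetical, and this is not how the gateway or hcp actually stores or evaluates kernel tables; it only shows why a check based on the peak could report a problem even when the table is nearly empty right now.

```python
from dataclasses import dataclass


@dataclass
class TableCounter:
    """Toy model of one kernel table's entry count (illustration only)."""
    limit: int
    current: int = 0
    peak: int = 0  # high-water mark: the largest 'current' ever observed

    def add(self, n: int = 1) -> None:
        if self.current + n > self.limit:
            raise OverflowError("table is full")
        self.current += n
        self.peak = max(self.peak, self.current)

    def remove(self, n: int = 1) -> None:
        self.current = max(0, self.current - n)

    def peak_usage(self) -> float:
        """Peak entries as a fraction of the limit."""
        return self.peak / self.limit


if __name__ == "__main__":
    table = TableCounter(limit=100)   # hypothetical limit
    table.add(95)                     # a burst of connections
    table.remove(90)                  # most of them expire again
    print(table.current, table.peak, table.peak_usage())
```

Here the table holds only 5 entries afterwards, but its high-water mark stays at 95, i.e. 95% of the limit.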

Exonix · 2022-10-13

That was HealthCheck Point: hcp -r all --include-wts yes

There is an update: one branch just hit the same problem, but not related to the GRE interface. I created a rule before the existing rule, only for the affected traffic and port 443, and it helped! How is that possible?
