Exonix
Collaborator

traffic doesn't go to the right interface

Hello all,

We have a gateway on R81.10 with some PBR rules. We've just found that, from time to time, some traffic doesn't go out the right interface, which is the GRE tunnel for Zscaler. The strange thing is that sometimes the traffic does go where it should. This is good traffic:

gre1.png

This is bad traffic:

gre2.png

As you can see, the only difference is the service object. The port is always 443, but the objects differ. tcp_443_noage has the following settings (unfortunately, I do not know the purpose of this object, but it is used by some rules for VMware and Veeam):

443.png

 

What could be wrong, and how can we fix it?

Thank you!

14 Replies

R81.30 is not a version that exists yet. Do you mean R80.30?

What do your PBR rules look like?

Exonix
Collaborator

It is R81.10; I've corrected this information.

pbr1.png

abihsot__
Advisor

Oh, I was not aware you can attach a PBR table based on the FW rule number. How would that work if you add another rule above and the whole thing shifts?

As for the port definition, it is probably a workaround for the backup team, for long-running backups and the like. For regular user traffic, I would suggest sticking with the standard https object.

I believe there was a major PBR redesign some time ago; at least that's what I understood from the release notes. We still see some traffic routed incorrectly by PBR too, but on R80.40.

Exonix
Collaborator

After adding a new rule, the rule number in the PBR is also updated. And yes, this is not the only problem with PBR... We have another case with CP Support where we have a workaround (adding a disabled rule before the impacted rule), but we can't use it in my case.

The non-standard port object is not defined in the rule for Zscaler and appears only in the logs. How does the FW decide which object to use?

logs1.png

the_rock
Champion

From expert mode, run ip r g followed by the destination IP address and verify that the route is indeed correct.

Exonix
Collaborator

This looks good, since eth1 is the external interface through which the GRE tunnel naturally goes:

ip r g 18.185.14.90
18.185.14.90 via 194.xxx.xxx.xxx dev eth1 src 194.yyy.yyy.yyy
cache
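
For anyone who wants to run the same check against several destinations, here is a minimal Python sketch that pulls the outgoing interface out of one line of that output. The function name parse_route_get is made up for illustration, and the parser assumes the simple "destination, then key/value pairs" layout shown above; other output shapes (for example a line starting with "local") are not handled.

```python
def parse_route_get(line: str) -> dict:
    """Parse one 'ip r g' result line of the form: DST via GW dev IF src IP."""
    tokens = line.split()
    result = {"dst": tokens[0]}
    # After the destination, tokens are assumed to come in key/value pairs.
    for key, value in zip(tokens[1::2], tokens[2::2]):
        result[key] = value
    return result


# Sample line copied from the output above (addresses are masked).
sample = "18.185.14.90 via 194.xxx.xxx.xxx dev eth1 src 194.yyy.yyy.yyy"
route = parse_route_get(sample)
print(route["dev"])
```

This only reports the lookup result for the destination as seen from the gateway itself, so it confirms where the tunnel endpoint is reached, not which service object a connection was matched on.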

PhoneBoy
Admin

A “different” service could be necessary if, for instance, you want certain HTTPS traffic to have a different timeout than the default.
I suspect there may be some issue with PBR, in which case you will probably need TAC assistance.

Exonix
Collaborator

The problem has existed for a long time already. In some cases we managed to solve it, but not in this case. We have already opened a ticket with Support, but they haven't found a reason or a solution yet...

RS_Daniel
Advisor

Hello,

The PBR matching depends on your FW rule, so a screenshot showing what the rule looks like would be useful. Only one object with port TCP/443 should have the Match for Any option enabled. The https object has it, so disable the option on one of the objects. Are you using specific service objects on the rule, or Any? I have found that using only the FW rule as a condition can be quite problematic: in some cases, the reply packets are also routed through the destination interface instead of being sent back to the internal interface. I would try adding another condition, for example the internal interface or the source IP network.

Regards

Exonix
Collaborator

We have two rules:

100 source_A destination_B tcp_443_noage

500 source_C destination_D any

For rule 500 we configured PBR.

Thanks for the advice about the additional condition!

RS_Daniel
Advisor

Hello,

On rule 500, you are using service Any. In that case, any service or service range with the "Match for Any" option enabled can match here. For port TCP/443 you have two objects with this option enabled, https and tcp_443_noage. According to sk150553, "it is highly recommended not having any conflicting or overlapping services with Match for 'Any' on."

I think the easiest fix is to disable this option in tcp_443_noage. If you need to keep the option enabled on that object, use specific service objects on rule 500 instead: https plus any other ports you want. With that, the matching traffic should always use the https service object.
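
To make the overlap concrete, here is a small Python sketch of the situation. It is only a toy model and not how the gateway actually selects a service object; the Service class and the candidates_for_any function are names I made up for illustration. It simply lists which objects are eligible when a rule uses Any, before and after the option is turned off on tcp_443_noage.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Service:
    name: str
    protocol: str
    port: int
    match_for_any: bool


def candidates_for_any(services, protocol, port):
    """Names of all objects eligible for this port when the rule's service is Any."""
    return [
        s.name
        for s in services
        if s.match_for_any and s.protocol == protocol and s.port == port
    ]


# The two TCP/443 objects from this thread, both with Match for Any enabled.
services = [
    Service("https", "tcp", 443, True),
    Service("tcp_443_noage", "tcp", 443, True),
]

# Two eligible objects: the outcome for TCP/443 is ambiguous.
ambiguous = candidates_for_any(services, "tcp", 443)
print(ambiguous)

# Turn the option off on tcp_443_noage and only https remains eligible.
fixed = [
    replace(s, match_for_any=False) if s.name == "tcp_443_noage" else s
    for s in services
]
resolved = candidates_for_any(fixed, "tcp", 443)
print(resolved)
```

With both objects eligible, either one may end up on the connection, which fits the logs showing https on some connections and tcp_443_noage on others.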

Regards

Exonix
Collaborator

One more question. I checked the firewall health and this is what I found. What does "FW Tables Limit" mean? This test failed on both nodes. Is there a limit on the number of FW/NAT rules?

FW Kernel Tables.png

 

PhoneBoy
Admin

What generated this table you provided?

In general, there isn't a limit on the number of rules you can have.
That said, there can be issues on the management when you're managing a policy with several thousand rules or more.
Due to the mechanisms we provide such as groups, multiple sources/destination/service per access rule, Access Roles, and others, you shouldn't actually need that many rules.
Most policies I've seen that are thousands of rules can often be reduced substantially through an optimization exercise.

What do have limits are some of the kernel tables that we use to keep track of the various traffic going through the gateway.
The "peak" refers to the "high water mark" for the number of entries in the specified table.
Whether this points to an actual problem or not remains to be seen.

Exonix
Collaborator

That was HealthCheckPoint: hcp -r all --include-wts yes

There is an update: one branch just got the same problem, but this time not related to the GRE interface. I created a rule before the existing rule, only for the affected traffic and port 443, and it did help! How is that possible?

443.2.png
