StackCap43382
Contributor

R80.40 GNAT issue after Upgrade

Hi All,

Bit of a strange one after a staged upgrade of an R80.20 cluster to R80.40 Y77.

We have upgraded one of the nodes and enabled MVC but have hit what appears to be an fwx_alloc issue:

cloningd: Error in delayed connection() 111 - Connection refused
kernel: [fw4_0];fwxlate_allocate_port_from_sync: synced port already exists. Port 10702 (protocol 6) of hide_src ##########, dst ##########.
kernel: [fw4_0];fwxlate_sync_port_allocation: fwxlate_allocate_port_from_sync failed
kernel: [fw4_1];fwxlate_allocate_port_from_sync: synced port already exists. Port 10602 (protocol 6) of hide_src ##########, dst ##########.
kernel: [fw4_1];fwxlate_sync_port_allocation: fwxlate_allocate_port_from_sync failed
kernel: [fw4_4];de_allocate_port: fwx_alloc_global_del failed (second try). <ipp: 6 hide_src: ##########, port: 10637, dest: ##########, dport: 443 (443)>

kernel: [fw4_0];fwxlate_allocate_port_from_sync: synced port already exists. Port 10708 (protocol 6) of hide_src ##########, dst ##########.
kernel: [fw4_0];fwxlate_sync_port_allocation: fwxlate_allocate_port_from_sync failed
kernel: [fw4_0];de_allocate_port: fwx_alloc_global_del failed (second try). <ipp: 6 hide_src: ##########, port: 10404, dest: ##########, dport: 443 (443)>
kernel: [fw4_3];de_allocate_port: fwx_alloc_global_del failed (second try). <ipp: 6 hide_src: ##########, port: 10555, dest: ##########, dport: 443 (443)>

GNAT is set to 1 as it's a 6 FW worker appliance.

Have a ticket open with TAC, but I suspect the next step is to create/modify fwkern.conf and set fwx_gnat_enabled to 0, as per sk165153.
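For reference, the current value can be read from expert mode, and it can be flipped on the fly for testing (a sketch only; an on-the-fly change does not survive a reboot):

fw ctl get int fwx_gnat_enabled
fw ctl set int fwx_gnat_enabled 0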

Anyone come across this?

 

UPDATE:

As per sk165153 and sk26202 we set fwx_gnat_enabled to 0 and rebooted the appliances. Fixed the issue.
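For anyone following the same path, what we applied is roughly the sk26202-style permanent setting, i.e. a $FWDIR/boot/modules/fwkern.conf entry on each member (one parameter per line, no spaces around the equals sign - double-check the exact path and format against the SK for your version):

fwx_gnat_enabled=0

followed by a reboot of each member in turn.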

CCSME, CCTE, CCME, CCVS
6 Replies
PhoneBoy
Admin

The way I read sk165153 is that it should only be enabled if there are more than 5 worker instances, which won't ever happen on a 6 core box since at least one would be SND.
Setting it to zero would be a surefire way to make sure it's disabled.
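For what it's worth, a quick way to confirm how many CoreXL workers a box actually runs is from expert mode:

fw ctl multik stat

which lists each firewall instance and the CPU it is bound to.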

benko2
Participant

We upgraded our cluster from R80.30 to R80.40 and also used the MVC upgrade.

Now both nodes of our cluster are R80.40 and MVC is turned off (according to "show cluster members mvc"). We have 6 CoreXL instances and the "fw ctl get int fwx_gnat_enabled" command gives the output "fwx_gnat_enabled = 1".

But our /var/log/messages is full of messages like this:

Sep 21 20:01:03 2020 fwnode1 kernel: [fw4_2];de_allocate_port: fwx_alloc_global_del failed (second try). <ipp: 6 hide_src: xxx.xxx.xxx.xxx, port: 12858, dest: yyy.yyy.yyy.yyy, dport: 11680 (11680)>
Sep 21 20:01:03 2020 fwnode1 kernel: [fw4_2];de_allocate_port: fwx_alloc_global_del failed (second try). <ipp: 6 hide_src: xxx.xxx.xxx.xxx, port: 12857, dest: yyy.yyy.yyy.yyy, dport: 11680 (11680)>
Sep 21 20:01:03 2020 fwnode1 kernel: [fw4_2];de_allocate_port: fwx_alloc_global_del failed (second try). <ipp: 6 hide_src: xxx.xxx.xxx.xxx, port: 12847, dest: yyy.yyy.yyy.yyy, dport: 11680 (11680)>
Sep 21 20:01:04 2020 fwnode1 kernel: [fw4_2];de_allocate_port: fwx_alloc_global_del failed (second try). <ipp: 6 hide_src: xxx.xxx.xxx.xxx, port: 12859, dest: yyy.yyy.yyy.yyy, dport: 11680 (11680)>

 

StackCap43382
Contributor

As per sk165153 and sk26202 we set fwx_gnat_enabled to 0 and rebooted the appliances. Fixed the issue.

CCSME, CCTE, CCME, CCVS
AaronCP
Advisor

Hey @StackCap43382,

 

Sorry for jumping on this old post, but it's the only result that Google churns out when I search for the error I'm seeing in our /var/log/messages files!

 

Here is an example of the error:

 

Jul 16 13:10:39 2021 CORP-FW1 kernel: [fw4_19];fwxlate_sync_port_allocation: fwxlate_allocate_port_from_sync failed

Jul 16 17:09:53 2021 CORP-FW1 kernel: [fw4_8];fwxlate_sync_port_allocation: fwxlate_allocate_port_from_sync failed

Jul 19 10:55:25 2021 CORP-FW1 kernel: [fw4_22];fwxlate_sync_port_allocation: fwxlate_allocate_port_from_sync failed

 

I was wondering if TAC gave you any explanation as to what these messages mean and what their impact could be? Also, did you notice any issues when disabling GNAT? I am contemplating disabling GNAT to see if it "resolves" another issue (here is the link to the post).

 

Any advice you could give would be much appreciated!

 

Thanks,

 

Aaron.

RamGuy239
Advisor

Agree, we need to know what these messages imply and what effect they might have. Simply disabling GNAT is not really a "solution"; it's a workaround. If one wants to utilise CoreXL Split / Dynamic Balancing on an appliance running R80.40-R81.10, you need to have GNAT enabled.

I can't see any reason why GNAT shouldn't work with fewer than five CoreXL workers/instances. Sure, it might not be on by default if you have fewer than five, but that doesn't mean it should cause issues if you enable it, and you might want to enable it for various reasons, so it's good to know what these messages actually mean and how to deal with them.
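Before deciding, it may be worth checking both states together. The GNAT value comes from the same "fw ctl get int fwx_gnat_enabled" shown earlier in the thread; the Dynamic Balancing status command below is from memory of sk164155, so treat the exact flag as an assumption and verify it against the SK:

dynamic_balancing -p
fw ctl get int fwx_gnat_enabled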

Certifications: CCSA, CCSE, CCSM, CCSM ELITE, CCTA, CCTE, CCVS, CCME
StackCap43382
Contributor

Hi,

Closed the TAC case once we got it working again. The customer didn't have any appetite for another drawn-out TAC engagement.

Been working fine for nearly a year.

Non-GNAT has been the NAT method since forever; it will work just like it did before.

CCSME, CCTE, CCME, CCVS