Gennady
Participant

Maestro - Flush and Ack

Hello, everyone!

Before you read this article, I highly recommend reading “Introduction to Sticky Connections” and “Non-Sticky Connections” in the R81.20 ClusterXL Administration Guide.

Flush and Ack (FnA) is described in the administration guides as a mechanism that ensures non-sticky connections are processed correctly even if a server-to-client packet arrives on a gateway that is not the Connection Owner. In fact, a connection (regardless of stickiness) must be synchronized by the FnA mechanism before the Cluster Correction Layer is able to forward an s2c packet to the connection owner. Because Maestro relies on the Correction Layer so heavily, FnA operates at a much larger scale than it does in ClusterXL. This is why the FnA mechanism deserves a closer look.

The article uses a connection between two test machines, 172.16.61.2 and 172.16.61.66, with default distribution settings. The test environment is R81.20 JHF Take 98 Maestro Dual-Site.

# asg dxl calc 172.16.61.2 172.16.61.66 1
<172.16.61.2,172.16.61.66,dst_based,318>
Chassis 1: Blade(s):1_02,1_04
Chassis 2: Blade(s):2_02,2_04

# asg dxl calc 172.16.61.66 172.16.61.2 1
<172.16.61.66,172.16.61.2,dst_based,418>
Chassis 1: Blade(s):1_03,1_04
Chassis 2: Blade(s):2_03,2_04

 

c_type, c_flags and “frozen connection”

The FnA story starts after a connection is accepted by the firewall, that is, after VM outbound is completed. I will include the timestamps of the debug messages so you can see how fast it is supposed to work.

7:39:54.242302;[vs_0];[tid_54];[fw4_54];After  VM: <dir 0, 172.16.61.2:32229 -> 172.16.61.66:179 IPP 6> (len=72) TCP flags=0x2 (SYN), seq=3190727552, ack=0, data end=3190727553 ;
7:39:54.242302;[vs_0];[tid_54];[fw4_54];VM Final action=ACCEPT;
7:39:54.242407;[vs_0];[tid_54];[fw4_54];fwha_asym_handle_new_conn_outbound: handling dir 0, 172.16.61.2:32229 -> 172.16.61.66:179 IPP 6, conn_should_sync = 1, sync_without_backups = 0, data_conn = 0;
7:39:54.242410;[vs_0];[tid_54];[fw4_54];asym_get_s2c_info_outbound: s2c connkey <dir 0, 172.16.61.66:179 -> 172.16.61.2:32229 IPP 6>, is tunnel 0, ifn bond20.2205;

At this point the SGM knows that the connection is asymmetric and records the connection into the Connections table.

7:39:54.242429;[vs_0];[tid_54];[fw4_54];h_insert: hp=0x7fbaed3c1c08 tuple=<0,ac103d02,7de5,ac103d42,b3,6;10001,40044080,f,b,0,68dd130a,0,7b7b121e,fe63cb35,4c,ffffffff,ffffffff,ffffffff,2000800,f9200,90000000,340,0,250eee48,7fb9,0,0,0,0,0,0,0,0,0,0,0,0,0,0>;
7:39:54.242430;[vs_0];[tid_54];[fw4_54];h_insert: entry_flags=0, sync_mask = [00000000];

We can check two values from the connection entry to find out the current connection state.

  • c_type = 00010001
0 | No IP compression
0 | No IP compression
0 | None
1 | Client-to-Server connection
0 | None
0 | None
0 | None
1 | Connection is not inside a VPN community. Connection is not encrypted.
  • c_flags = 40044080
4 | SecureXL in a Cluster with asymmetric routing detected only TCP [SYN] and TCP [ACK] packets of a TCP handshake.
0 | None
0 | None
4 | Connection started with TCP [SYN] (SYN_SENT). TCP [SYN-ACK] or TCP [SYN-RST] are only allowed as a response.
4 | Replies are allowed for the connection (UDP/IP) - not unidirectional.
0 | None
8 | Security Gateway logged the connection.
0 | None
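The per-nibble breakdowns above can be reproduced with a tiny helper. This is only a sketch that splits a 32-bit value into its hex nibbles, most significant first, to match the per-line layout above; the meaning of each nibble still comes from sk65133.

```python
def nibbles(value: int) -> list[int]:
    """Split a 32-bit value into its 8 hex nibbles, most significant first."""
    return [(value >> shift) & 0xF for shift in range(28, -4, -4)]

c_type = 0x00010001
c_flags = 0x40044080

print(nibbles(c_type))   # [0, 0, 0, 1, 0, 0, 0, 1]
print(nibbles(c_flags))  # [4, 0, 0, 4, 4, 0, 8, 0]
```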

In addition, we see sync_mask = [00000000]. This means that the operation above is local to the SGM and will not be immediately followed by any synchronization actions. You can read more about sync_mask here: https://community.checkpoint.com/t5/Maestro/Maestro-Connection-synchronization-during-a-switchover/m...

Right after the connection is written to the Connections table, the sync_mask is updated. It explicitly includes only the SGM 1_02.

7:39:54.242431;[vs_0];[tid_54];[fw4_54];ld2_set_wto_ttl_aggr: (after insert) hi_sync_mask is [0x00000002], modify is 1;

To summarize, based on the connection entry we can see that only SYN (no SYN-ACK) was seen. This is the start of the connection processing.

7:39:54.242446;[vs_0];[tid_54];[fw4_54];asym_osp_get_members_from_connkey: connkey <dir 0, 172.16.61.66:179 -> 172.16.61.2:32229 IPP 6> - adding member 1_3 to active sync set;
7:39:54.242447;[vs_0];[tid_54];[fw4_54];fwha_asym_handle_new_conn_outbound: setting FnA, s2c member is 2;

The SGM knows that the connection should be subjected to FnA just 0.000145 seconds after the connection is accepted.

FnA starts to search for synchronization targets:
7:39:54.242451;[vs_0];[tid_54];[fw4_54];unisync_handle_new_conn_outbound: current sync mask is 0x00000002;
7:39:54.242452;[vs_0];[tid_54];[fw4_54];unisync_handle_new_conn_outbound: setting selective FnA to mask 0x00000004;
7:39:54.242453;[vs_0];[tid_54];[fw4_54];FW-1: fwconn_chain_sync_connection_with_links: called with sync_mask [0x00000006];

0x00000002 + 0x00000004 = 0x00000006 -> means that SGMs 1_02 and 1_03 are currently included in the synchronization targets.
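The mask arithmetic can be sketched in a few lines. The assumption here (consistent with the debugs in this article) is that bit N of the sync mask corresponds to the SGM with ipaddr N:

```python
def mask_to_ipaddrs(mask: int) -> list[int]:
    """Return the ipaddr values whose bits are set in a sync mask."""
    return [bit for bit in range(28) if mask & (1 << bit)]

current = 0x00000002        # SGM 1_02 (ipaddr 1) already in the mask
selective_fna = 0x00000004  # SGM 1_03 (ipaddr 2) added for selective FnA
combined = current | selective_fna

print(hex(combined))              # 0x6
print(mask_to_ipaddrs(combined))  # [1, 2] -> SGMs 1_02 and 1_03
```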

The connection is ready to be synchronized. fwlddist_set_ex and fwlddist_slink are called to prepare the buffer for the connection entry and its sync links (slinks).

7:39:54.242480;[vs_0];[tid_54];[fw4_54];fwlddist_set_ex: d=8158 tuple=<dir 0, 172.16.61.2:32229 -> 172.16.61.66:179 IPP 6,10001,40044080,f,b,0,UUID: 68dd130a-0000-0000-1e-12-7b-7b-35-cb-63-fe, 4c,ffffffff,ffffffff,ffffffff,2000800,f9200,90000000,OPQS:[0,0x7fb9dafa2cc8,0,0x7fbc2eaffc68,0,0,0x7fb9d8cbd748,0,0,0,0x7fba62ad8438,0,0,0,0,0,0x7fbc2eaffc28,0x7fbbb9ed8038,0x7fbbad698650,0,0,0,0],0,0,0,0,0,0,0,0,0,0,0,0,0,0@0/18>;
7:39:54.242481;[vs_0];[tid_54];[fw4_54];fwlddist_set_ex: lp = 0x7fbaeb8a0148 (0x7fdf579db800[8158]), entry_flags=8, sync_mask=[0x00000006] (maxbchosts 28);
7:39:54.242482;[vs_0];[tid_54];[fw4_54];fwlddist_get_sync_ids: called with sync_mask [0x00000006];
7:39:54.242484;[vs_0];[tid_54];[fw4_54];get_sync_buf: SET in connections(8158), global 1, local 0, size 75;
7:39:54.242486;[vs_0];[tid_54];[fw4_54];fwconn_sync_connection: syncing connection <dir 0, 172.16.61.2:32229 -> 172.16.61.66:179 IPP 6> links;
7:39:54.242488;[vs_0];[tid_54];[fw4_54];fwconn_sync_links: syncing connection <dir 0, 172.16.61.2:32229 -> 172.16.61.66:179 IPP 6> links;
7:39:54.242490;[vs_0];[tid_54];[fw4_54];ld_sync_links: d=8158 lp=connections tuple=<0,ac103d02,7de5,ac103d42,b3,6>;
7:39:54.242497;[vs_0];[tid_54];[fw4_54];fwlddist_slink: lp = 0x7fbaeb8a0148 (0x7fdf579db800[8158]), sync_mask [0x00000006];
7:39:54.242498;[vs_0];[tid_54];[fw4_54];fwlddist_get_sync_ids: called with sync_mask [0x00000006];
7:39:54.242498;[vs_0];[tid_54];[fw4_54];get_sync_buf: SLINK in connections(8158), global 1, local 0, size 16;
7:39:54.242505;[vs_0];[tid_54];[fw4_54];fwlddist_slink: lp = 0x7fbaeb8a0148 (0x7fdf579db800[8158]), sync_mask [0x00000006];
7:39:54.242506;[vs_0];[tid_54];[fw4_54];fwlddist_get_sync_ids: called with sync_mask [0x00000006];
7:39:54.242507;[vs_0];[tid_54];[fw4_54];get_sync_buf: SLINK in connections(8158), global 1, local 0, size 16;
7:39:54.242514;[vs_0];[tid_54];[fw4_54];fwlddist_slink: lp = 0x7fbaeb8a0148 (0x7fdf579db800[8158]), sync_mask [0x00000006];
7:39:54.242515;[vs_0];[tid_54];[fw4_54];fwlddist_get_sync_ids: called with sync_mask [0x00000006];
7:39:54.242516;[vs_0];[tid_54];[fw4_54];get_sync_buf: SLINK in connections(8158), global 1, local 0, size 16;
7:39:54.242517;[vs_0];[tid_54];[fw4_54];unisync_handle_new_conn_outbound: new sync mask is 0x00000006;
7:39:54.242519;[vs_0];[tid_54];[fw4_54];fwldbcast_flush_chain_with_callback: FnA on chain (inbound) 0x7fbbb72d6188: dir 1, 172.16.61.2:32229 -> 172.16.61.66:179 IPP 6 ;
7:39:54.242520;[vs_0];[tid_54];[fw4_54];fwldbcast_drop_resume_connection_traffic: FnA Retransmissions for chain 0x0x7fbbb72d6188 will be dropped;
7:39:54.242521;[vs_0];[tid_54];[fw4_54];fwconn_drop_connection_traffic: connection traffic will be dropped;

The “connection traffic will be dropped” message reflects the true state of things while the FnA process is ongoing: any packet for the connection is dropped until FnA is completed.

For example, if FnA takes long enough, a SYN retransmission will arrive and be dropped with the reason “fw_conn_inspect Reason: Frozen connection (FWCONN_KTYPE_DROP) - connection packets in this direction are dropped;” - see sk156752 - Cluster Member drops traffic with "dropped by fw_conn_inspect Reason: Frozen connection".

Let’s continue to keep track of time and note that 0.000219 seconds have passed since the connection was accepted.

This is the moment when FnA starts doing its job of ensuring that the connection entry is synchronized to the s2c SGM. The SYN packet will not be released to the egress interface until the FnA process is completed.

The next step in the process is to call fwlddist_modifyfields locally on SGM 1_02 and insert a small update into the Connections table entry. After the insert, the sync_mask returns to 0x00000006.

7:39:54.242554;[vs_0];[tid_54];[fw4_54];fwlddist_modifyfields: d=8158 tuple=<dir 0, 172.16.61.2:32229 -> 172.16.61.66:179 IPP 6,/481,@0/-2147483648>;
7:39:54.242555;[vs_0];[tid_54];[fw4_54];fwlddist_modifyfields: lp = 0x7fbaeb8a0148 (0x7fdf579db800[8158]);
7:39:54.242556;[vs_0];[tid_54];[fw4_54];get_sync_buf: MODIFYFIELDS in connections(8158), global 1, local 0, size 20;
7:39:54.242567;[vs_0];[tid_54];[fw4_54];ld2_set_wto_ttl_aggr: d=8158 lp=connections tuple=<0,ac103d02,7de5,ac103d42,b3,6;210001,c0044080,f,b,0,68dd130a,0,7b7b121e,fe63cb35,4c,ffffffff,ffffffff,ffffffff,2000800,f9200,90000000,340,0,250eee48,7fb9,0,0,0,0,0,0,0,0,0,0,0,0,0,0@(mod)> flags 0 sync_mask [00000000] opcode 4;
7:39:54.242576;[vs_0];[tid_54];[fw4_54];h_insert: hp=0x7fbaed3c1c08 tuple=<0,ac103d02,7de5,ac103d42,b3,6;210001,c0044080,f,b,0,68dd130a,0,7b7b121e,fe63cb35,4c,ffffffff,ffffffff,ffffffff,2000800,f9200,90000000,340,0,250eee48,7fb9,0,0,0,0,0,0,0,0,0,0,0,0,0,0>;
7:39:54.242577;[vs_0];[tid_54];[fw4_54];h_insert: entry_flags=0, sync_mask = [00000000];
7:39:54.242578;[vs_0];[tid_54];[fw4_54];ld2_set_wto_ttl_aggr: (after insert) hi_sync_mask is [0x00000006], modify is 1;

The modified values are c_type and c_flags:

  • c_type = 00210001
0 | No IP compression
0 | No IP compression
2 | FnA is in process. (This information was kindly added to the sk65133 by my request.)
1 | Client-to-Server connection
0 | None
0 | None
0 | None
1 | Connection is not inside a VPN community. Connection is not encrypted.
  • c_flags = c0044080

C (8+4) | SecureXL in a Cluster with asymmetric routing detected only TCP [SYN] and TCP [ACK] packets of a TCP handshake. + In a Cluster environment, connection should be dropped while waiting for Flush-and-Ack to finish.

0 | None
0 | None
4 | Connection started with TCP [SYN] (SYN_SENT). TCP [SYN-ACK] or TCP [SYN-RST] are only allowed as a response.
4 | Replies are allowed for the connection (UDP/IP) - not unidirectional.
0 | None
8 | Security Gateway logged the connection.
0 | None

As you can see, the connection entry includes the updated c_type and c_flags to indicate that the traffic is temporarily dropped. This is how you can track such connections if there are many “frozen” drops. Skyline also counts this type of drop as “General FnA” inside the metrics “system.traffic.dropped” and “system.traffic.dropped.rate”.
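Comparing the entry before and after the FnA marking shows exactly which bits were set. A minimal sketch, assuming (per the sk65133 breakdowns cited above) that c_type bit 0x00200000 marks “FnA is in process” and c_flags bit 0x80000000 marks “drop while waiting for Flush-and-Ack to finish”:

```python
FNA_IN_PROCESS   = 0x00200000  # c_type bit (assumed per sk65133 breakdown)
FNA_DROP_WAITING = 0x80000000  # c_flags bit (assumed per sk65133 breakdown)

c_type_before,  c_type_after  = 0x00010001, 0x00210001
c_flags_before, c_flags_after = 0x40044080, 0xC0044080

# The observed modification is exactly these two bits being set:
assert c_type_after  == c_type_before  | FNA_IN_PROCESS
assert c_flags_after == c_flags_before | FNA_DROP_WAITING

def is_frozen(c_type: int, c_flags: int) -> bool:
    """True while the connection is 'frozen' waiting for FnA to finish."""
    return bool(c_type & FNA_IN_PROCESS) and bool(c_flags & FNA_DROP_WAITING)

print(is_frozen(c_type_after, c_flags_after))    # True
print(is_frozen(c_type_before, c_flags_before))  # False
```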

Masks and indexes

There are many tables to be kept in sync between SGMs. Some of the tables are synchronized to all members of a Maestro setup. The Connections table should be synchronized only to the SGMs included in the sync_mask. This is why FnA has two separate processes for broadcast sync and unicast sync.

Broadcast sync is handled via “sync_id 0”. Unicast sync is handled via “sync_id <SGM_ID>”. According to R&D, the two sync types have “separate pipes” and are processed independently. The rest of the story is rather confusing.

Let’s clarify some new names before we continue with the debugs:

  •  “ipaddr” is an identifier of an SGM. It starts at 0 and goes up to 27. For example, ipaddr 14 stands for SGM 2_01, ipaddr 0 stands for SGM 1_01, and so on.
  •  “Index” is the position of the given SGM in bc_hostmask.
  •  bc_hostmask is a list of all the SGMs that are required to acknowledge a sync_id to mark it fully acknowledged.
  •  bc_nhosts is the number of SGMs to receive a sync_id.

Here is an example from the same debug:

7:39:54.242580;[vs_0];[tid_54];[fw4_54];ipaddr_toind: member 2 is at index 5 (total 7);
7:39:54.242582;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: adding index 5 (member 2) to global_sync_id_selective_fna_mask (sync_id 3);
7:39:54.242583;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: global_sync_id_selective_fna_mask is [0x00000020];

A table makes things clearer. I will limit the table to 4 SGMs per site, as in my debug. Each SGM also has a unicast sync_id; sync_id 0 is reserved for the broadcast sync.

SGM  | sync_id    | ipaddr     | member    | index
ALL  | sync_id 0  | ipaddr 255 | N/A       | N/A
1_01 | sync_id 1  | ipaddr 0   | member 0  | index 4
1_02 | sync_id 2  | ipaddr 1   | member 1  | no index (no ack is needed from the originator of the sync buffer)
1_03 | sync_id 3  | ipaddr 2   | member 2  | index 5
1_04 | sync_id 4  | ipaddr 3   | member 3  | index 6
2_01 | sync_id 15 | ipaddr 14  | member 14 | index 0
2_02 | sync_id 16 | ipaddr 15  | member 15 | index 1
2_03 | sync_id 17 | ipaddr 16  | member 16 | index 2
2_04 | sync_id 18 | ipaddr 17  | member 17 | index 3
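The table can be reconstructed as data. Two points worth encoding: the unicast sync_id is simply ipaddr + 1 (sync_id 0 being reserved for broadcast), and the index is the SGM’s position in the bchosts array, which in this debug lists site 2 first, then site 1, skipping the originator (SGM 1_02). The dictionary below is a hedged reconstruction of that table, not an official data structure:

```python
# name: (sync_id, ipaddr, index) -- index is None for the originator,
# which never needs to ack its own sync buffer.
SGMS = {
    "1_01": (1,  0,  4),
    "1_02": (2,  1,  None),  # originator of the sync buffer
    "1_03": (3,  2,  5),
    "1_04": (4,  3,  6),
    "2_01": (15, 14, 0),
    "2_02": (16, 15, 1),
    "2_03": (17, 16, 2),
    "2_04": (18, 17, 3),
}

# The sync_id = ipaddr + 1 relation holds for every unicast sync_id:
for name, (sync_id, ipaddr, _index) in SGMS.items():
    assert sync_id == ipaddr + 1, name
print("sync_id == ipaddr + 1 for all SGMs")
```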

global_sync_id_selective_fna_mask is a new term you may notice in the debug lines above. It is a bitmask of the indexes of the SGMs for which a unicast sync will be initiated.

global_sync_id_selective_fna_mask is [0x00000020]; -> 00000000 00000000 00000000 00100000 -> index 5 -> sync_id 3 – means that sync_id 3 is required for unicast sync. And indeed it is, because SGM 1_03 is part of sync_mask [0x00000006], so it absolutely must receive the sync message and acknowledge it.

The next portion of debug includes handling of all sync_ids (0-28).

Sync_id 0 is processed by SGM 1_02 – the originator of the sync buffer. We see the full list of ipaddr references below. It is worth noting that sync_id 0 has its own ack_seq numbers, separate from the unicast sync_ids.

7:39:54.242584;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 0: bc_nhosts 7, bc_hostsmask [0x0000007f], bc_dst_ipaddr=255;
7:39:54.242584;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242585;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ipaddr 14, lst_time 13791106, last_ack_seq 0x177b7 isvalid 1;
7:39:54.242586;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ipaddr 15, lst_time 13791106, last_ack_seq 0x177b7 isvalid 1;
7:39:54.242586;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ipaddr 16, lst_time 13791107, last_ack_seq 0x177b7 isvalid 1;
7:39:54.242587;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ipaddr 17, lst_time 13791106, last_ack_seq 0x177b7 isvalid 1;
7:39:54.242587;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ipaddr 0, lst_time 13791107, last_ack_seq 0x177b7 isvalid 1;
7:39:54.242588;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ipaddr 2, lst_time 13791107, last_ack_seq 0x177b9 isvalid 1;
7:39:54.242589;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ipaddr 3, lst_time 13791107, last_ack_seq 0x177b7 isvalid 1;
7:39:54.242589;[vs_0];[tid_54];[fw4_54];;
7:39:54.242590;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: handle sync_id 0 (bc_nhosts 7, bc_hostsmask [0x0000007f], bc_dst_ipaddr=255);
7:39:54.242591;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge_ex called with aseq is: 0x177b9 (do_ack 1, my_seq 0x177ba: last_time_sent 13770815, my_time 13791107);
7:39:54.242592;[vs_0];[tid_54];[fw4_54];fwlddist_flush_msg_block: Flushing block messages to all instances;
7:39:54.242629;[vs_0];[tid_54];[fw4_54];fwldbcast_flush_ex: sync_id 0, buflen 84( bc-ur: 0 ur: 0 end: 84);
7:39:54.242630;[vs_0];[tid_54];[fw4_54];fwldbcast_flush_ex:  ( bc-ur: 0 ur: 0 end: 84);
7:39:54.242631;[vs_0];[tid_54];[fw4_54];fwldbcast_flush_ex: sending buffer 0x7fbbd2068ad0 with 84 words;
7:39:54.242652;[vs_0];[tid_54];[fw4_54];fwldbcast_xmit_data: my_seq= 96186, last time sent d2203f, sync_id  0;
7:39:54.242654;[vs_0];[tid_54];[fw4_54];fwldbcast_xmit: sending type 128 seq 177ba len 85 dst ff frag_flag 0 sync_id 0;
7:39:54.242654;[vs_0];[tid_54];[fw4_54];fwldbcast_xmit: type for sync_id: 0, flags: 2, dst_ipaddr: 255;
7:39:54.242655;[vs_0];[tid_54];[fw4_54];fwldbcast_xmit: target ipaddr: 255;
7:39:54.242655;[vs_0];[tid_54];[fw4_54];fwldbcast_xmit: sync_id to at will be peer: 0;
7:39:54.242662;[vs_0];[tid_54];[fw4_54];fwha_protected_var_wrapper: param code 0, returning 26;
7:39:54.242664;[vs_0];[tid_54];[fw4_54];fwlddist_get_reset: resetting buffer (bc-ur: 84 ur: 84 end: 84 17500), sync_id 0;
7:39:54.242664;[vs_0];[tid_54];[fw4_54];fwlddist_get_reset: after reset (bc-ur: 0 ur: 0 end: 0 17500);
7:39:54.242665;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge_ex: buffer flushed (ack requested for sync_id 0);

sync_id 0: bc_nhosts 7, bc_hostsmask [0x0000007f] -> 00000000 00000000 00000000 01111111 -> means that index 0 to index 6 (7 members) are included in sync_id 0.

You may notice that the message “fwldbcast_xmit_data: my_seq= 96186, last time sent d2203f, sync_id  0;” has the “my_seq” value switched to decimal and the “last time sent” value switched to hexadecimal. It makes reading the debug a bit harder for no obvious reason.
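Despite the mixed notation, the counters line up with the neighboring messages. A quick base-conversion check confirms it: my_seq 96186 (decimal) is the same number as seq 177ba (hex) in fwldbcast_xmit, and “last time sent d2203f” is the hex form of last_time_sent 13770815 from fwldbcast_acknowledge_ex:

```python
# my_seq= 96186 (decimal)  <->  seq 177ba (hex)
assert 96186 == 0x177BA

# last time sent d2203f (hex)  <->  last_time_sent 13770815 (decimal)
assert 13770815 == 0xD2203F

print("the decimal and hexadecimal values refer to the same counters")
```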

Sync_id 1

@;1057811054.826559414; 1Oct2025  7:39:54.242666;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 1: bc_nhosts 1, bc_hostsmask [0x00000001], bc_dst_ipaddr=0;
@;1057811054.826559415; 1Oct2025  7:39:54.242666;[vs_0];[tid_54];[fw4_54];bchosts array:;
@;1057811054.826559416; 1Oct2025  7:39:54.242667;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ipaddr 0, lst_time 12158629, last_ack_seq 0x5d9255 isvalid 1;
@;1057811054.826559417; 1Oct2025  7:39:54.242668;[vs_0];[tid_54];[fw4_54];;

Sync_id 2

@;1057811054.826559418; 1Oct2025  7:39:54.242669;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 2: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=0;
@;1057811054.826559419; 1Oct2025  7:39:54.242669;[vs_0];[tid_54];[fw4_54];bchosts array:;
@;1057811054.826559420; 1Oct2025  7:39:54.242670;[vs_0];[tid_54];[fw4_54];;

Sync_id 3 is the one carrying the SET, SLINK and MODIFYFIELDS records in the sync buffer. The important parts of the debug are highlighted in bold.

last_ack_seq 0x612d5e – this is the sequence acked the previous time a unicast sync was sent from SGM 1_02 to SGM 1_03. If last_ack_seq differs from aseq by more than half of the sequence-number range, all the acknowledgements are ignored, which essentially breaks delta sync between the two SGMs.

my_seq 0x612d63 – this is the sequence number used by the originator of the sync buffer.

do_ack – indicates that an ACK is required for the sync_id.

“sync_id to at will be peer: 2;” – means that on the recipient side, sync_id 2 should be used to send the acknowledgement.

7:39:54.242670;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 3: bc_nhosts 1, bc_hostsmask [0x00000001], bc_dst_ipaddr=2;
7:39:54.242671;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242672;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ipaddr 2, lst_time 13790512, last_ack_seq 0x612d5e isvalid 1;
7:39:54.242672;[vs_0];[tid_54];[fw4_54];;
7:39:54.242673;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: handle sync_id 3 (bc_nhosts 1, bc_hostsmask [0x00000001], bc_dst_ipaddr=2);
7:39:54.242674;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge_ex called with aseq is: 0x612d62 (do_ack 1, my_seq 0x612d63: last_time_sent 13790912, my_time 13791107);
7:39:54.242675;[vs_0];[tid_54];[fw4_54];fwldbcast_flush_ex: sync_id 3, buflen 148( bc-ur: 0 ur: 0 end: 148);
7:39:54.242675;[vs_0];[tid_54];[fw4_54];fwldbcast_flush_ex:  ( bc-ur: 0 ur: 0 end: 148);
7:39:54.242676;[vs_0];[tid_54];[fw4_54];fwldbcast_flush_ex: sending buffer 0x7fbbd085e910 with 148 words;
7:39:54.242687;[vs_0];[tid_54];[fw4_54];SET: connections (8158) keys(6)=<0,ac103d02,7de5,ac103d42,b3,6> vals(34)=<10001,40044080,f,b,0,68dd130a,0,7b7b121e,fe63cb35,4c,ffffffff,ffffffff,ffffffff,2000800,f9200,90000000,340,0,250eee48,7fb9,0,0,0,0,0,0,0,0,0,0,0,0,0,0> sync_mask=[0x00000006] refresh time=18 aggressive time=0 ttl=0 sync time=1 kmask=0 op_code=0 no n6addrs;
7:39:54.242688;[vs_0];[tid_54];[fw4_54]; no kbufs;
7:39:54.242689;[vs_0];[tid_54];[fw4_54]; got app_opaque id=1,size=80;
7:39:54.242693;[vs_0];[tid_54];[fw4_54];SLINK: connections (8158) keys=<0,ac103d42,b3,ac103d02,7de5,6> target keys=<0,ac103d02,7de5,ac103d42,b3,6> val=806;
7:39:54.242697;[vs_0];[tid_54];[fw4_54];SLINK: connections (8158) keys=<1,ac103d02,7de5,ac103d42,b3,6> target keys=<0,ac103d02,7de5,ac103d42,b3,6> val=802;
7:39:54.242700;[vs_0];[tid_54];[fw4_54];SLINK: connections (8158) keys=<1,ac103d42,b3,ac103d02,7de5,6> target keys=<0,ac103d02,7de5,ac103d42,b3,6> val=805;
7:39:54.242704;[vs_0];[tid_54];[fw4_54];MODIFYFIELDS connections (8158) keys=<0,ac103d02,7de5,ac103d42,b3,6> vals=<210001,68dd130a,0,7b7b121e,fe63cb35> modify aggressive time=0 ttl=0 fmask=1e1 op_code=0;
7:39:54.242705;[vs_0];[tid_54];[fw4_54];             no app_opaques given;
7:39:54.242707;[vs_0];[tid_54];[fw4_54];fwldbcast_xmit_data: my_seq= 6368611, last time sent d26ec0, sync_id  3;
7:39:54.242708;[vs_0];[tid_54];[fw4_54];fwldbcast_xmit: sending type 128 seq 612d63 len 149 dst 2 frag_flag 0 sync_id 3;
7:39:54.242708;[vs_0];[tid_54];[fw4_54];fwldbcast_xmit: type for sync_id: 3, flags: 1, dst_ipaddr: 2;
7:39:54.242709;[vs_0];[tid_54];[fw4_54];fwldbcast_xmit: target ipaddr: 2;
7:39:54.242709;[vs_0];[tid_54];[fw4_54];fwldbcast_xmit: sync_id to at will be peer: 2;
7:39:54.242713;[vs_0];[tid_54];[fw4_54];fwha_protected_var_wrapper: param code 0, returning 26;
7:39:54.242714;[vs_0];[tid_54];[fw4_54];fwlddist_get_reset: resetting buffer (bc-ur: 148 ur: 148 end: 148 17500), sync_id 3;
7:39:54.242715;[vs_0];[tid_54];[fw4_54];fwlddist_get_reset: after reset (bc-ur: 0 ur: 0 end: 0 17500);
7:39:54.242715;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge_ex: buffer flushed (ack requested for sync_id 3);

Sync_id 4 to sync_id 28 look mostly the same because no information is sent: the SGMs behind those sync_ids are not part of the current sync_mask [0x00000006]. It is worth mentioning that bc_hostsmask [0x00000001] for each of the sync_ids below indicates that an Ack is required for the Flush; no Ack is needed when bc_hostsmask is [00000000].

7:39:54.242716;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 4: bc_nhosts 1, bc_hostsmask [0x00000001], bc_dst_ipaddr=3;
7:39:54.242716;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242717;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ipaddr 3, lst_time 12122649, last_ack_seq 0x463078 isvalid 1;
7:39:54.242717;[vs_0];[tid_54];[fw4_54];;
7:39:54.242718;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 5: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=4;
7:39:54.242718;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242719;[vs_0];[tid_54];[fw4_54];;
7:39:54.242719;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 6: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=5;
7:39:54.242720;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242720;[vs_0];[tid_54];[fw4_54];;
7:39:54.242721;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 7: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=6;
7:39:54.242721;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242721;[vs_0];[tid_54];[fw4_54];;
7:39:54.242722;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 8: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=7;
7:39:54.242722;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242722;[vs_0];[tid_54];[fw4_54];;
7:39:54.242723;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 9: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=8;
7:39:54.242723;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242724;[vs_0];[tid_54];[fw4_54];;
7:39:54.242725;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 10: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=9;
7:39:54.242725;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242725;[vs_0];[tid_54];[fw4_54];;
7:39:54.242726;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 11: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=10;
7:39:54.242726;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242727;[vs_0];[tid_54];[fw4_54];;
7:39:54.242727;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 12: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=11;
7:39:54.242728;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242728;[vs_0];[tid_54];[fw4_54];;
7:39:54.242729;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 13: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=12;
7:39:54.242729;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242729;[vs_0];[tid_54];[fw4_54];;
7:39:54.242730;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 14: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=13;
7:39:54.242730;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242730;[vs_0];[tid_54];[fw4_54];;
7:39:54.242731;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 15: bc_nhosts 1, bc_hostsmask [0x00000001], bc_dst_ipaddr=14;
7:39:54.242731;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242732;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ipaddr 14, lst_time 12122702, last_ack_seq 0x98671 isvalid 1;
7:39:54.242733;[vs_0];[tid_54];[fw4_54];;
7:39:54.242734;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 16: bc_nhosts 1, bc_hostsmask [0x00000001], bc_dst_ipaddr=15;
7:39:54.242734;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242735;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ipaddr 15, lst_time 12120539, last_ack_seq 0x1db22b isvalid 1;
7:39:54.242735;[vs_0];[tid_54];[fw4_54];;
7:39:54.242736;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 17: bc_nhosts 1, bc_hostsmask [0x00000001], bc_dst_ipaddr=16;
7:39:54.242736;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242737;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ipaddr 16, lst_time 12093736, last_ack_seq 0x9f328 isvalid 1;
7:39:54.242737;[vs_0];[tid_54];[fw4_54];;
7:39:54.242738;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 18: bc_nhosts 1, bc_hostsmask [0x00000001], bc_dst_ipaddr=17;
7:39:54.242738;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242739;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ipaddr 17, lst_time 12086683, last_ack_seq 0x88219 isvalid 1;
7:39:54.242739;[vs_0];[tid_54];[fw4_54];;
7:39:54.242740;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 19: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=18;
7:39:54.242740;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242741;[vs_0];[tid_54];[fw4_54];;
7:39:54.242741;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 20: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=19;
7:39:54.242742;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242742;[vs_0];[tid_54];[fw4_54];;
7:39:54.242743;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 21: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=20;
7:39:54.242743;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242743;[vs_0];[tid_54];[fw4_54];;
7:39:54.242744;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 22: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=21;
7:39:54.242744;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242745;[vs_0];[tid_54];[fw4_54];;
7:39:54.242745;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 23: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=22;
7:39:54.242746;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242746;[vs_0];[tid_54];[fw4_54];;
7:39:54.242747;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 24: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=23;
7:39:54.242747;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242747;[vs_0];[tid_54];[fw4_54];;
7:39:54.242748;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 25: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=24;
7:39:54.242748;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242748;[vs_0];[tid_54];[fw4_54];;
7:39:54.242749;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 26: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=25;
7:39:54.242749;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242750;[vs_0];[tid_54];[fw4_54];;
7:39:54.242750;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 27: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=26;
7:39:54.242751;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242751;[vs_0];[tid_54];[fw4_54];;
7:39:54.242752;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: sync_id 28: bc_nhosts 0, bc_hostsmask [00000000], bc_dst_ipaddr=27;
7:39:54.242752;[vs_0];[tid_54];[fw4_54];bchosts array:;
7:39:54.242752;[vs_0];[tid_54];[fw4_54];;

0.000450 seconds after the connection was accepted, all the Flush work is done.

The following lines include an important summary of the FnA progress: we are waiting for acknowledgements from SGMs. A sync_id counts as fully acknowledged when its ack_mask equals its bchosts_mask. In the example below you can see that for sync_id 0 ack_mask = [0x0000005f] and bchosts_mask = [0x0000007f].

[0x0000005f] -> 00000000 00000000 00000000 01011111

[0x0000007f] -> 00000000 00000000 00000000 01111111

It is “obvious” that index 5 is the difference between the two. Index 5 translates to sync_id 3, which is SGM 1_03. We need only an ack from SGM 1_03 to count sync_id 0 as fully acknowledged. This is why we see “(sync_id 0): wait for FnA on seq 177ba for the following members:  ipaddr 2,;”.

For the unicast sync_ids we have ack_mask = [00000000] and bchosts_mask = [0x00000001]. We can instantly tell that only index 0 is missing. Sync_id 3 has only one destination at index 0, and it is SGM 1_03. This is why we see “(sync_id 3): wait for FnA on seq 612d63 for the following members:  ipaddr 2,;”.

Note that for the same destination (ipaddr 2) we have different seq number because broadcast and unicast sync pipes are separate.

Overall success for FnA means matching the callback_mask [0x00000009] -> 00000000 00000000 00000000 00001001 -> sync_id 0 and sync_id 3 must both be confirmed.
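The completion conditions described above can be checked mechanically: for each sync_id, the still-missing indexes are bchosts_mask with the already-acked bits cleared, and overall FnA success requires every sync_id flagged in callback_mask to be confirmed. A minimal sketch:

```python
def missing_indexes(ack_mask: int, bchosts_mask: int) -> list[int]:
    """Indexes that are still expected to ack for one sync_id."""
    pending = bchosts_mask & ~ack_mask
    return [i for i in range(32) if pending & (1 << i)]

# sync_id 0: waiting only for index 5 (SGM 1_03, ipaddr 2)
assert missing_indexes(0x0000005F, 0x0000007F) == [5]

# sync_id 3: waiting only for index 0 (its single target, SGM 1_03)
assert missing_indexes(0x00000000, 0x00000001) == [0]

# callback_mask 0x9 -> bits 0 and 3 -> sync_id 0 and sync_id 3
callback_mask = 0x00000009
pending_sync_ids = [i for i in range(32) if callback_mask & (1 << i)]
print(pending_sync_ids)  # [0, 3]
```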

7:39:54.242754;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: alloc_size 124 cb_shared 0x7fb7ebebf848 cb 0x7fb7ebebf854, callback_mask [0x00000009];
7:39:54.242754;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: created callback with 0x7fdf5590d7c0 data 0x7fbbb72d6188 seq 177ba sync_id 0 cb_index 0;
7:39:54.242755;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ack_mask = [0x0000005f];
7:39:54.242756;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ack_mask[sync_id] = [00000000];
7:39:54.242756;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: bchosts_mask = [0x0000007f];
7:39:54.242757;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: global_sync_id_selective_fna_mask = [0x00000020];
7:39:54.242758;[vs_0];[tid_54];[fw4_54];(sync_id 0): wait for FnA on seq 177ba for the following members:  ipaddr 2,;
7:39:54.242759;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: created callback with 0x7fdf5590d7c0 data 0x7fbbb72d6188 seq 612d63 sync_id 3 cb_index 1;
7:39:54.242760;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ack_mask = [00000000];
7:39:54.242760;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: ack_mask[sync_id] = [00000000];
7:39:54.242761;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: bchosts_mask = [0x00000001];
7:39:54.242761;[vs_0];[tid_54];[fw4_54];fwldbcast_acknowledge: global_sync_id_selective_fna_mask = [0x00000020];
7:39:54.242762;[vs_0];[tid_54];[fw4_54];(sync_id 3): wait for FnA on seq 612d63 for the following members:  ipaddr 2,;
7:39:54.243046;[vs_0];[tid_54];[fw4_54];fwha_protected_var_wrapper: param code 0, returning 26;

So far, an ack has been requested from all the SGMs according to bchosts_mask. Hence, all the SGMs will send an acknowledgement.

Below, you can see that SGM 1_02 receives an ack from all the other SGMs in the Security Group, and the process doesn’t stop until the ack is received from SGM 1_03. This is the only SGM that matters in this case, because ack_mask 0x0000005f (each 0 bit in the mask stands for an SGM index still required to ack) + 0x00000020 (the index of SGM 1_03) = 0x0000007f (bchosts_mask), and this is the stop condition for sync_id 0 -> sync_id 0 was acked by all members.
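This accumulation can be simulated in a few lines. The ipaddr-to-index mapping follows the debug in this article, and the arrival order of the acks (with SGM 1_03, ipaddr 2, last) is an assumption for illustration:

```python
BCHOSTS_MASK = 0x0000007F
INDEX_OF = {14: 0, 15: 1, 16: 2, 17: 3, 0: 4, 2: 5, 3: 6}  # ipaddr -> index

ack_mask = 0x0000005F  # state when the sync_id 0 callback was created
completed_by = None

for ipaddr in (16, 15, 0, 3, 2):  # acks arriving; ipaddr 2 (1_03) last
    ack_mask |= 1 << INDEX_OF[ipaddr]  # mark ack from this member
    if ack_mask == BCHOSTS_MASK:       # stop condition: fully acked
        completed_by = ipaddr
        break

print(f"sync_id 0 fully acked after ack from ipaddr {completed_by}")
```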

7:39:54.243053;[vs_0];[tid_54];[fw4_54];ipaddr_toind: member 16 is at index 2 (total 7);
7:39:54.243054;[vs_0];[tid_54];[fw4_54];fwldbc_handle_callbacks: (sync_id 0) handle ack from member 16 (ind 2 seq 177ba bchosts_mask [0x0000007f]);
7:39:54.243055;[vs_0];[tid_54];[fw4_54];fwldbc_handle_callbacks: cb->seq 177ba cb->mask [0x0000005f];
7:39:54.243057;[vs_0];[tid_54];[fw4_54];FnA: fwldbc_handle_callbacks: (sync_id 0) mark ack from member 16;
7:39:54.243059;[vs_0];[tid_54];[fw4_54];fwha_protected_var_wrapper: param code 0, returning 26;
7:39:54.243062;[vs_0];[tid_54];[fw4_54];ipaddr_toind: member 15 is at index 1 (total 7);
7:39:54.243063;[vs_0];[tid_54];[fw4_54];fwldbc_handle_callbacks: (sync_id 0) handle ack from member 15 (ind 1 seq 177ba bchosts_mask [0x0000007f]);
7:39:54.243064;[vs_0];[tid_54];[fw4_54];fwldbc_handle_callbacks: cb->seq 177ba cb->mask [0x0000005f];
7:39:54.243064;[vs_0];[tid_54];[fw4_54];FnA: fwldbc_handle_callbacks: (sync_id 0) mark ack from member 15;
7:39:54.243066;[vs_0];[tid_54];[fw4_54];fwha_protected_var_wrapper: param code 0, returning 26;
7:39:54.243069;[vs_0];[tid_54];[fw4_54];ipaddr_toind: member 0 is at index 4 (total 7);
7:39:54.243070;[vs_0];[tid_54];[fw4_54];fwldbc_handle_callbacks: (sync_id 0) handle ack from member 0 (ind 4 seq 177ba bchosts_mask [0x0000007f]);
7:39:54.243071;[vs_0];[tid_54];[fw4_54];fwldbc_handle_callbacks: cb->seq 177ba cb->mask [0x0000005f];
7:39:54.243071;[vs_0];[tid_54];[fw4_54];FnA: fwldbc_handle_callbacks: (sync_id 0) mark ack from member 0;
7:39:54.243073;[vs_0];[tid_54];[fw4_54];fwha_protected_var_wrapper: param code 0, returning 26;
7:39:54.243076;[vs_0];[tid_54];[fw4_54];ipaddr_toind: member 3 is at index 6 (total 7);
7:39:54.243077;[vs_0];[tid_54];[fw4_54];fwldbc_handle_callbacks: (sync_id 0) handle ack from member 3 (ind 6 seq 177ba bchosts_mask [0x0000007f]);
7:39:54.243077;[vs_0];[tid_54];[fw4_54];fwldbc_handle_callbacks: cb->seq 177ba cb->mask [0x0000005f];
7:39:54.243078;[vs_0];[tid_54];[fw4_54];FnA: fwldbc_handle_callbacks: (sync_id 0) mark ack from member 3;
7:39:54.243089;[vs_0];[tid_54];[fw4_54];fwha_protected_var_wrapper: param code 0, returning 26;
7:39:54.243092;[vs_0];[tid_54];[fw4_54];ipaddr_toind: member 17 is at index 3 (total 7);
7:39:54.243093;[vs_0];[tid_54];[fw4_54];fwldbc_handle_callbacks: (sync_id 0) handle ack from member 17 (ind 3 seq 177ba bchosts_mask [0x0000007f]);
7:39:54.243094;[vs_0];[tid_54];[fw4_54];fwldbc_handle_callbacks: cb->seq 177ba cb->mask [0x0000005f];
7:39:54.243094;[vs_0];[tid_54];[fw4_54];FnA: fwldbc_handle_callbacks: (sync_id 0) mark ack from member 17;
7:39:54.243108;[vs_0];[tid_54];[fw4_54];fwha_protected_var_wrapper: param code 0, returning 26;
7:39:54.243111;[vs_0];[tid_54];[fw4_54];ipaddr_toind: member 14 is at index 0 (total 7);
7:39:54.243112;[vs_0];[tid_54];[fw4_54];fwldbc_handle_callbacks: (sync_id 0) handle ack from member 14 (ind 0 seq 177ba bchosts_mask [0x0000007f]);
7:39:54.243112;[vs_0];[tid_54];[fw4_54];fwldbc_handle_callbacks: cb->seq 177ba cb->mask [0x0000005f];
7:39:54.243113;[vs_0];[tid_54];[fw4_54];FnA: fwldbc_handle_callbacks: (sync_id 0) mark ack from member 14;
7:39:54.243489;[vs_0];[tid_54];[fw4_54];fwha_protected_var_wrapper: param code 0, returning 26;
7:39:54.243492;[vs_0];[tid_54];[fw4_54];ipaddr_toind: member 2 is at index 5 (total 7);
7:39:54.243492;[vs_0];[tid_54];[fw4_54];fwldbc_handle_callbacks: (sync_id 0) handle ack from member 2 (ind 5 seq 177ba bchosts_mask [0x0000007f]);
7:39:54.243493;[vs_0];[tid_54];[fw4_54];fwldbc_handle_callbacks: cb->seq 177ba cb->mask [0x0000005f];
7:39:54.243493;[vs_0];[tid_54];[fw4_54];FnA: fwldbc_handle_callbacks: (sync_id 0) mark ack from member 2;
7:39:54.243494;[vs_0];[tid_54];[fw4_54];FnA: fwldbc_handle_callbacks: sync_id 0 was acked by all members. ;
7:39:54.243494;[vs_0];[tid_54];[fw4_54];FnA: sync_ids_acked_mask before: [00000000];
7:39:54.243495;[vs_0];[tid_54];[fw4_54];FnA: sync_ids_acked_mask after: [0x00000001];
7:39:54.243496;[vs_0];[tid_54];[fw4_54];FnA: sync_ids_mask: [0x00000009];
7:39:54.243496;[vs_0];[tid_54];[fw4_54];FnA: fwldbc_handle_callbacks: more sync_ids are required. waiting with callback.;
7:39:54.243502;[vs_0];[tid_54];[fw4_54];fwha_protected_var_wrapper: param code 0, returning 26;

[0x00000001] is not equal to [0x00000009], and because of that "more sync_ids are required. waiting with callback." The missing sync_id is about to be completed.
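The same check happens one level up: the connection is released only when the acked mask covers every sync_id the connection needs. Per the debug above, sync_id 0 and sync_id 3 map to bits 0x1 and 0x8, giving sync_ids_mask 0x09. A hedged sketch (names are mine):

```python
# Illustrative sketch of the per-connection sync_id completion check.
SYNC_IDS_MASK = 0x09          # sync_id 0 (bit 0) + sync_id 3 (bit 3)

def all_sync_ids_acked(acked_mask: int) -> bool:
    """True once every required sync_id has been acknowledged."""
    return acked_mask & SYNC_IDS_MASK == SYNC_IDS_MASK

acked = 0x01                           # only sync_id 0 acked so far
assert not all_sync_ids_acked(acked)   # "more sync_ids are required"

acked |= 1 << 3                        # ack for sync_id 3 arrives
assert all_sync_ids_acked(acked)       # traffic can be resumed
```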

7:39:54.243503;[vs_0];[tid_54];[fw4_54];fwldbcast_handle_packet: packet from 2 seq 612d64 type 1 ack_fl 0 fragm_flag 0 len 3, sync_id 3;
7:39:54.243504;[vs_0];[tid_54];[fw4_54];ipaddr_toind: member 2 is at index 0 (total 1);
7:39:54.243504;[vs_0];[tid_54];[fw4_54];fwldbcast_handle_retrans_request called with sync_id = 3 (from ipaddr 2. ind is 0);
7:39:54.243505;[vs_0];[tid_54];[fw4_54];fwldbcast_handle_retrans_request: got retreq from 0 seq 612d64 mask1=0 mask2=0 lseq 612d63 last_seq 0 last_mask 0 (updated host last_ack_seq 612d63);
7:39:54.243506;[vs_0];[tid_54];[fw4_54];ipaddr_toind: member 2 is at index 0 (total 1);
7:39:54.243507;[vs_0];[tid_54];[fw4_54];fwldbc_handle_callbacks: (sync_id 3) handle ack from member 2 (ind 0 seq 612d63 bchosts_mask [0x00000001]);
7:39:54.243507;[vs_0];[tid_54];[fw4_54];fwldbc_handle_callbacks: cb->seq 612d63 cb->mask [00000000];
7:39:54.243508;[vs_0];[tid_54];[fw4_54];FnA: fwldbc_handle_callbacks: (sync_id 3) mark ack from member 2;
7:39:54.243508;[vs_0];[tid_54];[fw4_54];FnA: fwldbc_handle_callbacks: sync_id 3 was acked by all members. ;
7:39:54.243509;[vs_0];[tid_54];[fw4_54];FnA: sync_ids_acked_mask before: [0x00000001];
7:39:54.243509;[vs_0];[tid_54];[fw4_54];FnA: sync_ids_acked_mask after: [0x00000009];
7:39:54.243510;[vs_0];[tid_54];[fw4_54];FnA: sync_ids_mask: [0x00000009];

At this moment all the sync_ids are acknowledged. The synchronization process is completed between the SGMs of the active sync set according to sync_mask [0x00000006]. The following lines indicate that no further packets of this connection will be dropped because of the FnA process.

7:39:54.243511;[vs_0];[tid_54];[fw4_54];FnA: fwldbc_handle_callbacks: registered callback 0x0x7fdf5590d7c0 with data 0x0x7fbbb72d6188;
7:39:54.243513;[vs_0];[tid_54];[fw4_54];fwldbcast_queue_flush_callback: called with callback 0x0x7fdf5590d7c0 and data 0x0x7fbbb72d6188;
7:39:54.243542;[vs_0];[tid_54];[fw4_54];210001,c0044080,f,b,0,UUID: 68dd130a-0000-0000-1e-12-7b-7b-35-cb-63-fe, 4c,ffffffff,ffffffff,ffffffff,2000800,f9200,90000000,OPQS:[0,0x7fb9dafa2cc8,0,0x7fbc2eaffc68,0,0,0x7fb9d8cbd748,0,0,0,0x7fba62ad8438,0,0,0,0,0,0x7fbc2eaffc28,0x7fbbb9ed8038,0x7fbbad698650,0,0,0,0],0,0,0,0,0,0,0,0,0,0,0,0,0,0>
7:39:54.243543;[vs_0];[tid_54];[fw4_54];fwldbcast_drop_resume_connection_traffic: FnA Resuming traffic for chain 0x0x7fbbb72d6188 (not dropping traffic);
7:39:54.243544;[vs_0];[tid_54];[fw4_54];fwconn_resume_connection_traffic: connection traffic will no longer be dropped;

0.001242 seconds after the connection was accepted, the FnA process is complete and the initial packet can be released to the egress interface.
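The 0.001242-second figure comes straight from the two debug timestamps: the "After VM" line at 7:39:54.242302 and the "resume_connection_traffic" line at 7:39:54.243544. A trivial check:

```python
# Reproducing the FnA delay from the debug timestamps above.
from datetime import datetime

fmt = "%H:%M:%S.%f"
accepted = datetime.strptime("7:39:54.242302", fmt)  # After VM / ACCEPT
released = datetime.strptime("7:39:54.243544", fmt)  # traffic resumed

delay = (released - accepted).total_seconds()
print(delay)  # 0.001242
```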

You may ask: what about the Backup sync set? A Backup sync set is synchronized after fw_cluster_delay_sync_sec (3 seconds by default), and that sync happens without FnA. Standard delta sync is used to deliver the messages for a backup sync set.

Possible problems caused by the Flush and Ack process

 

Egress packet delay

The most obvious problem is the delay before the initial packet can be sent out of the egress interface. Under the perfect conditions of a lab with no traffic other than the test connection, the delay came to 0.001242 seconds. This does not look too bad, but it grows significantly under high load. It is a very good idea to keep an eye on the "frozen connection" drop counter in a production environment. This drop type is exclusively related to FnA and indicates that the delay was long enough to cause SYN retransmissions to be dropped.

The overall delay of the FnA process cannot exceed the value of fwldbcast_pending_timeout (default is 50 HTU = 5 seconds). After 5 seconds the message "fwconn_resume_connection_traffic: connection traffic will no longer be dropped;" appears even if no ack was received for a sync_id.
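From the stated default (50 HTU = 5 seconds) one HTU corresponds to 0.1 seconds. A trivial conversion helper, purely for illustration:

```python
# Derived from the document's statement that 50 HTU == 5 seconds.
HTU_SECONDS = 0.1

def htu_to_seconds(htu: int) -> float:
    """Convert an HTU-based timeout (e.g. fwldbcast_pending_timeout) to seconds."""
    return htu * HTU_SECONDS

print(htu_to_seconds(50))  # 5.0
```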

Changing this value is a bad way to prevent frozen connections: it only replaces "General FnA" drops on the c2s SGM with out_of_state packet drops on the s2c SGM. I would recommend decreasing fwldbcast_pending_timeout only when a malfunction in the FnA process itself is causing you trouble, and only if you are absolutely sure that the sync messages are indeed delivered even though the originator receives no acknowledgement.

CCP version mismatch in a multi-version environment affects the FnA process

This scenario is specific to Maestro. When an administrator adds a new SGM to an R81.20 Security Group, the SGM boots with a clean R81.20 image (CCP version 4201), which predates JHF take 14. At the same time, all the pre-existing SGMs already have something newer than JHF take 14 installed (CCP version 4251). The scenario is partially described in https://support.checkpoint.com/results/sk/sk182829

At this moment the pre-existing SGMs have "cphaconf mvc on" as the default configuration, while the new SGM has "cphaconf mvc off". The new SGM is not considered Production because it is not on the same JHF take as the rest of the Security Group. So far, nothing to complain about.

As the next step, the SGM is rebooted, either because the auto-clone feature is enabled or because the system administrator upgrades it to the proper JHF take. After the reboot the new SGM is Active with the proper JHF take, yet JHF take 14 itself was never installed on it. An unforeseen situation arises: until another reboot, the new SGM still uses the CCP 4201 packet structure for Delta Sync.

Based on the collected debugs, the main difference between 4251 and 4201 is the size of the seq_num field: 4251 uses 8 hex digits for sequence numbers, while 4201 uses 6. As a result, the initial sequence number for unicast sync (before any sync_id has been processed) is ffffff in version 4201 and ffffffff in version 4251. As you may notice, ffffffff is always more than twice as big as ffffff. This condition leads to the CCP packet being ignored and delta sync being broken.

The problem fixes itself once transit traffic passes through the Security Group: new transit connections quickly advance the sequence numbers, and the initial sequence ffffff is replaced by 1000 (the same value for CCP 4201 and 4251). The same does not happen for local connections, which makes pre-production checks fail: unstable connectivity to the Log and Smart Management servers, RADIUS authentication failures, and so on.
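The magnitude of the mismatch between the two initial sequence values is easy to quantify (the field widths are from the debug analysis in the text; the comparison itself is plain arithmetic):

```python
# The two CCP versions serialize the delta-sync sequence number with
# different field widths: 6 hex digits (CCP 4201) vs 8 hex digits (CCP 4251).
initial_4201 = 0xFFFFFF     # 16_777_215
initial_4251 = 0xFFFFFFFF   # 4_294_967_295

# The 8-digit value is 256 times the 6-digit one -- far more than twice
# as big, which is the condition that makes the peer ignore the CCP packet.
assert initial_4251 > 2 * initial_4201
print(initial_4251 // initial_4201)  # 256
```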

The problem was addressed by PRHF-41550 as a result of the debug analysis of a broken FnA process after a new SGM was added to a Security Group.

Before you can find the reason a process is broken, it helps to understand how it works.

The many consequences of a wrong distribution

  •  Selecting a wrong (unsuitable) distribution method increases the asymmetric packet flow between Security Group Members.
  •  Asymmetric traffic flow requires CCL to perform additional packet forwarding from the server-to-client SGM to the client-to-server SGM.
  •  CCL requires the entries in the connections and global connections tables to hold all the information needed for proper packet forwarding.
  •  These additional entries have to be synchronized between SGMs.
  •  For the synchronization to work properly and efficiently, a selective sync_mask became necessary, with a defined Active sync set and Backup sync set. The sync_mask has to be calculated at the appropriate points of the connection lifecycle and of the Maestro/Chassis Active/Down/Standby lifecycle.
  •  The Connection Owner acquires a new role: maintaining and distributing the sync_mask information.
  •  To ensure reliable delivery of delta sync messages based on the sync_mask, an extended Flush and Ack mechanism became necessary on Scalable Platforms.
  •  The Flush and Ack mechanism in turn introduces sequence number tracking, holding of the initial packet, and monitoring of broadcast sync and unicast sync completion.

Please select the distribution method carefully. The overhead caused by bad connection distribution is not limited to a single additional forwarding action by CCL.

1 Reply
Hauke
Explorer

We had the same issues with the distribution.
