Maestro - Asymmetric connection synchronization

Hello, everyone!

This post is an attempt to better understand the asymmetric connection synchronization process in a Maestro environment. The topic itself is not well documented: you can find a mention of HyperSync technology in SK147853, but no information about how the technology actually works.

The Check Point clustering protocol uses synchronization messages to exchange information about connection state across SGMs in a Security Group or members of a cluster. The list of sync message types can be found in SK92909; an even better list can be found in the output of the "cphaprob ldstat" command, where the most common message types are the ones with the highest call counts.

# cphaprob ldstat
Operand            Calls     Bytes      Average   Ratio %
----------------------------------------------------------
ERROR              0         0          0         0
SET                267725188 4203817492 15        1
RENAME             0         0          0         0
REFRESH            518653817 1200181524 2         1
DELETE             203227437 3051096848 15        0
SLINK              649189418 2893417088 4         0
UNLINK             0         0          0         0
MODIFYFIELDS       763847709 979308832  1         0
RECORD DATA CONN   52        19344      372       0
COMPLETE DATA CONN 52        14976      288       0
GHTAB SYNC         0         0          0         0

Because a Security Group may have up to 28 SGMs, an optimization is required to deliver the synchronization messages efficiently. Each connection should be synchronized only to a certain list of SGMs. The list depends on the result of the distribution calculation (dx calc) and on the value of the kernel parameter "fwha_sync_excp_mask".
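
The parameter can be inspected and changed with the standard kernel-parameter commands; a minimal sketch (g_all is the Maestro helper that runs a command on every SGM in the Security Group):

fw ctl get int fwha_sync_excp_mask          # value on the local SGM
g_all fw ctl get int fwha_sync_excp_mask    # value on every SGM
fw ctl set int fwha_sync_excp_mask 3        # on-the-fly change, does not survive reboot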

Let's take an example of a dxl calculation for a Dual-Site setup. The connection is asymmetric: different SGMs receive the client-to-server (c2s) and server-to-client (s2c) packets.

# asg dxl calc 172.16.61.2 172.16.61.66 1
<172.16.61.2,172.16.61.66,dst_based,318>
Chassis 1: Blade(s):1_02,1_04
Chassis 2: Blade(s):2_02,2_04

# asg dxl calc 172.16.61.66 172.16.61.2 1
Interface bond4.913 mode is policy-internal
<172.16.61.66,172.16.61.2,dst_based,418>
Chassis 1: Blade(s):1_03,1_04
Chassis 2: Blade(s):2_03,2_04

The calculation above shows the following SGM roles:

  • Active SGMs on the Active Site: 1_02 (c2s) and 1_03 (s2c)
  • Backup SGMs on the Active Site: 1_04 (c2s) and 1_04 (s2c)
  • Backup SGMs on the Standby Site: 2_02 (c2s) and 2_03 (s2c)

fwha_sync_excp_mask=0 -> Disables backup synchronization on both the Active Site and the Standby Site. Only SGMs 1_02 and 1_03 will receive the synchronization messages. A failure of either Active SGM will result in a connection drop.

fwha_sync_excp_mask=1 -> Synchronizes only the backup member on the Active Site. SGMs 1_02, 1_03 and 1_04 will receive the synchronization messages. A failure of the Active Site (switchover) will result in a connection drop.

fwha_sync_excp_mask=2 -> Synchronizes only the backup members on the Standby Site. SGMs 1_02, 1_03, 2_02 and 2_03 will receive the synchronization messages. A failure of any Active SGM must trigger a switchover from the Active Site to the Standby Site to avoid a connection drop.

fwha_sync_excp_mask=3 -> Synchronizes the backup members on both the Active Site and the Standby Site. SGMs 1_02, 1_03, 1_04, 2_02 and 2_03 will receive the synchronization messages. Only a simultaneous failure of an Active SGM and its backups will result in a connection drop.
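
To recap, here is a small illustrative shell function (not a Check Point tool) that returns the sync set for the example connection above for each mask value:

# Illustrative only: which SGMs receive sync messages for the example
# connection, per fwha_sync_excp_mask value. SGM names come from the
# dxl calc output above.
sync_set() {
    local active="1_02 1_03"            # active sync set (c2s + s2c)
    local backup_active="1_04"          # backup on the Active Site
    local backup_standby="2_02 2_03"    # backup SGMs on the Standby Site
    case "$1" in
        0) echo "$active" ;;
        1) echo "$active $backup_active" ;;
        2) echo "$active $backup_standby" ;;
        3) echo "$active $backup_active $backup_standby" ;;
        *) echo "invalid mask: $1" >&2; return 1 ;;
    esac
}
sync_set 3    # -> 1_02 1_03 1_04 2_02 2_03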

The text below includes samples from a debug taken with the following flags:

fw ctl debug -m fw + sync ld
fw ctl debug -m cluster + unisync correction
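
For completeness, a full capture session around these flags might look like the sketch below; the buffer size and output file name are arbitrary choices of mine:

fw ctl debug 0                                  # reset any previous debug configuration
fw ctl debug -buf 32000                         # allocate a debug buffer
fw ctl debug -m fw + sync ld
fw ctl debug -m cluster + unisync correction
fw ctl kdebug -T -f > /var/log/sync_debug.txt   # -T adds timestamps; stop with Ctrl-C
# reproduce the connection in another shell, then:
fw ctl debug 0                                  # reset the debug configuration when done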

These samples are not a full debug but rather the most relevant messages of my choice. The test environment is a Maestro Dual-Site setup with Site 2 Active at the moment of the debug, fwha_sync_excp_mask=3, and Delayed Sync in its default configuration (enabled, with a delay of 3 seconds).
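
Before taking such a debug it is worth confirming which site is Active and that all SGMs are up; a minimal check:

asg stat -v    # per-site and per-SGM state of the Security Group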

Step 1. Before VM: the SYN packet is received on the c2s SGM 2_02. At this point the SGM has no idea whether the connection is asymmetric, so it adds only itself to the active sync set.

osp_calc_list_id: IP 172.16.61.2:40129 > 172.16.61.66:22: 6, Hash 2 ID 317, 14 0;
asym_osp_get_members_from_connkey: connkey <dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6> - adding member 2_2 to active sync set;
unisync_handle_new_conn_inbound: handling dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6;
fwconn_insert_conn_h: <dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6> added to dispatcher with ppack_id=-1 (qid=-1), member_id=15, flags=1;
fwlddist_set_ex: d=8158 tuple=<dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6,10001,40044080,f,21b,0,UUID: 68b8c1b-0000-0000-d8-8c-4e-10-91-89-e9-f0, 4d,ffffffff,ffffffff,ffffffff,2007800,f9200,90000000,OPQS:[0x7f9a38702808,0x7f9936af9748,0,0x7f99a7a58088,0,0,0x7f99847b1e48,0,0x7f98287aa608,0,0x7f99a0e95d28,0,0,0,0,0,0x7f99a0e93fc8,0x7f9a69e49278,0x7f9a55cfcad0,0,0,0,0],0,0,0,0,0,0,0,0,0,0,0,0,0,0@0/25>;

For now, the fwlddist_set_ex message is relevant only for 2_2.

Step 2. After VM: the SGM calculates the s2c direction and understands that the connection is asymmetric.

osp_calc_list_id: IP 172.16.61.66:22 > 172.16.61.2:40129: 6, Hash 3 ID 417, 14 0;
asym_osp_get_members_from_connkey: connkey <dir 0, 172.16.61.66:22 -> 172.16.61.2:40129 IPP 6> - adding member 2_3 to active sync set;

Step 3. The connection is synchronized: 2_2 -> 2_3. 2_2 sends the SET using the Flush and Ack mechanism (worth a separate post). This time the fwlddist_set_ex is relevant for 2_3 because it was added to the active sync set above.

fwlddist_set_ex: d=8158 tuple=<dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6,10001,40044080,f,21b,0,UUID: 68b8c1b-0000-0000-d8-8c-4e-10-91-89-e9-f0, 4d,ffffffff,ffffffff,ffffffff,2007800,f9200,90000000,OPQS:[0x7f9a38702808,0x7f9936af9748,0,0x7f99a7a58088,0,0,0x7f99847b1e48,0,0x7f98287aa608,0,0x7f99a0e95d28,0,0,0,0,0,0x7f99a0e93fc8,0x7f9a69e49278,0x7f9a55cfcad0,0,0,0,0],0,0,0,0,0,0,0,0,0,0,0,0,0,0@0/18>;
fwconn_sync_connection: syncing connection <dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6> links;
fwconn_sync_links: syncing connection <dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6> links;
fwldbcast_flush_chain_with_callback: FnA on chain (inbound) 0x7ef21378a208: dir 1, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6 ;

Step 4. 2_3 receives the sync.

fwconn_post_sync: conn <dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6> receiving sync with cflags = 40044080, ctype=10001;
fwconn_post_sync: connection [<172.16.61.2,40129,172.16.61.66,22,6>] [FW-0] synced in by 15
fwlddist_put_specific: d=8158 tuple=tuple=<dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6,10001,40044080,f,21b,0,UUID: 68b8c1b-0000-0000-d8-8c-4e-10-91-89-e9-f0, ffffffff,ffffffff,ffffffff,ffffffff,2007800,f9200,90000000,OPQS:[0x7f9a38702808,0x7f9936af9748,0,0x7f99a7a58088,0,0,0x7f99847b1e48,0,0x7f98287aa608,0,0x7f99a0e95d28,0,0,0,0,0,0x7f99a0e93fc8,0x7f9a69e49278,0x7f9a55cfcad0,0,0,0,0],0,0,0,0,0,0,0,0,0,0,0,0,0,0@0/23>;
fwconn_insert_conn_h: <dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6> added to dispatcher with ppack_id=-1 (qid=-1), member_id=15, flags=1;

The 4 steps above must happen faster than the SYN-ACK arrives on 2_3. Otherwise, the SYN-ACK will be dropped as an out_of_state packet. If everything works well, the SYN-ACK is corrected to the right SGM (Dispatcher Correction, SK169154):

fwha_ccl_do_inbound_correct: Corrected <172.16.61.66(22) -> 172.16.61.2(40129) IPP 6> to member 15 ;
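
If the race is lost, the resulting drop can be confirmed live with the usual drop debug; a minimal sketch, filtering on the client IP from the example:

# Run on the suspect SGM; expect a TCP out-of-state drop reason for the SYN-ACK
fw ctl zdebug + drop | grep 172.16.61.2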

Step 5. Delayed Sync is cancelled 3 seconds after the SYN packet was received by 2_2. Note that the "Connection expired" message below is nothing to worry about: PPK (Performance Pack, i.e. SecureXL) uses this notification to announce a timer expiration.

[<172.16.61.2,40129,172.16.61.66,22,6>][PPK0] Connection expired;
osp_df: called with conn:<dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6> (zone:internal);
osp_calc_list_id: IP 172.16.61.2.40129 > 172.16.61.66.22 6, Hash 3 ID 317, l4 0;
asym_osp_get_members_from_connkey: connkey <dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6> - adding member 1_2 to backup sync set;
asym_osp_get_members_from_connkey: connkey <dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6> - adding member 2_2 to active sync set;
asym_osp_get_members_from_connkey: connkey <dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6> - adding member 2_4 to backup sync set;
osp_df: called with conn:<dir 1, 172.16.61.66:22 -> 172.16.61.2:40129 IPP 6> (zone:internal);
osp_calc_list_id: IP 172.16.61.66.22 > 172.16.61.2.40129 6, Hash 3 ID 417, l4 0;
asym_osp_get_members_from_connkey: connkey <dir 1, 172.16.61.66:22 -> 172.16.61.2:40129 IPP 6> - adding member 1_3 to backup sync set;
asym_osp_get_members_from_connkey: connkey <dir 1, 172.16.61.66:22 -> 172.16.61.2:40129 IPP 6> - adding member 2_3 to active sync set;
asym_osp_get_members_from_connkey: connkey <dir 1, 172.16.61.66:22 -> 172.16.61.2:40129 IPP 6> - adding member 2_4 to backup sync set;
fwconn_cals_cancel_delayed_sync: removed delayed_sync from connection <dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6>;
[<172.16.61.2,40129,172.16.61.66,22,6>][FW-0] Cancel delayed sync;
[<172.16.61.2,40129,172.16.61.66,22,6>][FW-0] New timeout:615, New aggr timeout:65, New pkt_delta:0,flags:0 delete eligible:0;
fwlddist_set_ex: d=8158 tuple=<dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6,10001,40044080,f,21b,0,UUID: 68b8c1b-0000-0000-d8-8c-4e-10-91-89-e9-f0, 4d,ffffffff,ffffffff,ffffffff,2007800,f9200,90000000,OPQS:[0x7f9a38702808,0x7f9936af9748,0,0x7f99a7a58088,0,0,0x7f99847b1e48,0,0x7f98287aa608,0,0x7f99a0e95d28,0,0,0,0,0,0x7f99a0e93fc8,0x7f9a69e49278,0x7f9a55cfcad0,0,0,0,0],0,0,0,0,0,0,0,0,0,0,0,0,0,0@0/615>;

The fwlddist_set_ex above is relevant for all the SGMs defined by the distribution calculation and fwha_sync_excp_mask.

Step 6. The SGMs 1_2, 1_3 and 2_4 receive the initial sync for the connection. The SGM 2_3 receives the timer refresh.

2_3
fwconn_post_sync: conn <dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6> receiving sync with cflags = 40044080, ctype=1c001;
fwconn_timeout_from_sync_do_update: received refresh for <dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6>. timeout=620, pkt_delta=0, aggr_timeout=70;
1_2, 1_3, 2_4
fwconn_post_sync: conn <dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6> receiving sync with cflags = 40044080, ctype=1c001;
fwconn_timeout_from_sync_do_update: received refresh for <dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6>. timeout=620, pkt_delta=0, aggr_timeout=70;
fwconn_post_sync: connection [<172.16.61.2,40129,172.16.61.66,22,6>] [FW-0] synced in by 15
fwlddist_put_specific: d=8158 tuple=tuple=<dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6,1c001,40044080,f,21b,0,UUID: 68b8c1b-0000-0000-d8-8c-4e-10-91-89-e9-f0, ffffffff,ffffffff,ffffffff,ffffffff,2007800,f9200,90000000,OPQS:[0x7f9a38702808,0x7f9936af9748,0,0x7f99a7a58088,0,0,0x7f99847b1e48,0,0x7f98287aa608,0,0x7f99a0e95d28,0,0,0,0,0,0x7f99a0e93fc8,0x7f9a69e49278,0x7f9a55cfcad0,0,0,0,0],0,0,0,0,0,0,0,0,0,0,0,0,0,0@0/620>;
fwconn_insert_conn_h: <dir 0, 172.16.61.2:40129 -> 172.16.61.66:22 IPP 6> added to dispatcher with ppack_id=-1 (qid=-1), member_id=15, flags=1;

It may be useful to note the difference between the fwlddist_set_ex and fwlddist_put_specific messages: fwlddist_set_ex can only be seen on the originator of the sync SET message, while fwlddist_put_specific appears on the receiving end.
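
This distinction gives a quick way to separate the two sides of a sync transaction when reading a saved debug; a sketch, assuming the output file from the capture example above and filtering on the client port from the example:

# Originator lines (fwlddist_set_ex) vs receiver lines (fwlddist_put_specific)
grep -E 'fwlddist_(set_ex|put_specific)' /var/log/sync_debug.txt | grep ':40129'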

The fwconn_insert_conn_h message is definitive proof that the connection was indeed added to the connections table. If fwconn_insert_conn_h does not appear in the debug, the entry has not been added yet, which is a good pointer to a possible root cause of out_of_state packet drops. It is also worth mentioning that there is no 3-second Delayed Sync for synchronization between the SGMs of the active sync set; otherwise, it would be a disaster of out_of_state drops. Delayed Sync is only relevant for synchronization between the Active sync set and the Backup sync set devices.
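
A direct way to verify whether a connection actually made it into the connections table of a given SGM is to dump the table in formatted mode; a sketch, filtering on the client IP from the example:

# -f prints formatted (human-readable) entries, -u removes the entry limit
fw tab -t connections -f -u | grep 172.16.61.2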

Possible problems caused by the synchronization process:

In some race conditions, when the number of connections in the connections table is counted in millions and the number of connections in cphwd_db (fwaccel conns) is counted in hundreds of thousands, there may be a delay in the delivery or processing of the synchronization messages. In such a scenario the connection entries may not be updated properly, which leads to unexpected packet drops (see the sketch after the list below for a quick way to spot such an imbalance).

  • Missed/delayed SET – a connection is added too late to the connections table on an SGM. If it is an SGM from the Active sync set, the SYN-ACK will most probably be dropped. This scenario is separate from the “frozen connection” one (sk156752), which is handled by Flush and Ack; in our case the delay is not about FnA but about fwconn_insert_conn_h.
  • Missed REFRESH – a connection’s timers are not updated properly. This leads to early connection expiration, usually on an SGM from the Backup sync set. When a connection expires on a Backup SGM, it is also deleted from the Active SGM, which leads to out_of_state packet drops.
  • Missed DELETE – usually relevant at the moment of a switchover, when a connection is “forgotten” on the Standby Site and not deleted. An unlucky distribution will allow the connection to expire and clean up the entry on the Active Site.
  • Missed MODIFYFIELDS – may lead to a mismatch of connection state between SGMs. I have not yet seen a problem caused by this in a production or lab environment.
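
As mentioned above the list, a quick sketch for spotting the imbalance between the two tables across all SGMs:

g_all fw tab -t connections -s    # kernel connections table entry count per SGM
g_all fwaccel conns -s            # SecureXL (cphwd_db) connection count per SGM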

I hope that whoever reads this post will find it useful for future troubleshooting of synchronization issues. Please let me know if anything above is wrong; it will help me with the investigation of ongoing cases.
