Maestro - Connection synchronization during a swit...

Gennady · ‎2025-09-30

Hello, everyone!

This article summarizes the connection synchronization process during a switchover in a Maestro Dual-Site environment.

Disclaimer: the information below is based on debug data. Please post a comment below if you have any corrections based on a known SK article or documentation.

The article uses the connection between two test machines 172.16.61.2 and 172.16.61.66 with default distribution settings: destination based with no L4. The test environment is R81.20 JHF take 98 Maestro Dual-Site.

Distribution calculation results are the following:

# asg dxl calc 172.16.61.2 172.16.61.66 1
<172.16.61.2,172.16.61.66,dst_based,318>
Chassis 1: Blade(s):1_02,1_04
Chassis 2: Blade(s):2_02,2_04

# asg dxl calc 172.16.61.66 172.16.61.2 1
<172.16.61.66,172.16.61.2,dst_based,418>
Chassis 1: Blade(s):1_03,1_04
Chassis 2: Blade(s):2_03,2_04

1. Connection Owner

A Connection Owner is the SGM that handles the first packet received for a connection. If the connection is sticky, the Cluster Correction Layer forwards all subsequent packets to the Connection Owner. With regard to a switchover, a Connection Owner has an additional role: it tracks the list of destinations for synchronization messages for a given connection. A Connection Owner not only knows which SGMs should receive a sync for the connection, it also updates every involved SGM when the list changes. Non-Owner SGMs don’t know where to send synchronization updates until they get a message from the Owner.

It is counterintuitive, but a Connection Owner doesn’t follow a switchover process. In addition, the Cluster Correction Layer doesn’t correct any packet to a Connection Owner that resides on a Standby or Down Site. This is implied by known limitation MBS-3944.

Here is an example from a debug taken with “fw ctl debug -m cluster + unisync correction” flags:

[kern];[tid_40];[fw4_0];fwha_ccl_is_stateful_corr_needed: <172.16.61.2(33183) -> 172.16.61.66(31435) IPP6>: chassis 0 is STANDBY in VS0, will not correct;
[kern];[tid_40];[fw4_0];fwha_ccl_inbound: no need to correct <172.16.61.2(33183) -> 172.16.61.66(31435) IPP6>

All subsequent packets are processed by whichever SGM receives them. The packets are not corrected anywhere because the Connection Owner is now locked on a non-Active site. This behavior has some performance implications…

A record about a Connection Owner is stored in at least two tables:

Connections table.

You can use “fw tab -t connections -u” in expert mode to see the full content of the connections table. sk65133 gives a full description of the connections table format and is highly recommended reading.

Global Connections table (fw_multik_ld_gconn_table)

You can use “fw ctl multik gconn” to see the full content of the global connections table. There is no SK with a proper description of the Global Connections table. Luckily, the command “fw ctl multik gconn” is documented in the CLI Reference Guide.

There are several ways to find out which SGM is the Connection Owner for a given connection.

Check output of “asg search”.

The well-known “asg search” script will find the Connection Owner and mark it with [O] in the output. The problem is that when a Site is in DOWN state, asg search will not show any Connection Owner if the connection was established on the currently DOWN site.

Check the output of “fw ctl multik gconn” and find the cluster member ID field (the “clstr mem ID” column).

The example below shows that some connections have a Connection Owner on SGM 1_1, while others have it on SGM 1_2 or SGM 2_2. This example was taken from the “R81.20 CLI Reference Guide”, and we can see that the Connection Owner didn’t follow a switchover there either.

[Expert@MyGW:0]# fw ctl multik gconn

Default:
==========================================================================================================================
| Segm | Src IP | S.port | Dst IP | D.port | Proto | Flags | PP |Ref Cnt(I/O)|Inst|PPAK ID|clstr mem ID|Rec. ref|Rec. Type|
==========================================================================================================================
|  0  | 192.168.3.52    | 18192 | 192.168.3.240   | 46082 | 6  |FP .. ..| No | 0/0 |  1  | 32 |   0    |   0   | UNDEF |
|  0  | 192.168.3.52    | 54216 | 192.168.3.240   | 257   | 6  |FP .. ..| No | 0/0 |  1  | 32 |   0    |   0   | UNDEF |
|  0  | 192.168.3.240   | 53925 | 192.168.3.53    | 18192 | 6  |FP .. ..| No | 0/0 |  0  | 32 |   1    |   0   | UNDEF |
|  0  | 192.168.3.240   | 257   | 192.168.3.52    | 54216 | 6  |FP .. ..| No | 0/0 |  1  | 32 |   0    |   0   | UNDEF |
|  0  | 192.168.3.53    | 18192 | 192.168.3.240   | 64216 | 6  |FP .. ..| No | 0/0 |  1  | 32 |   15   |   0   | UNDEF |
|  0  | 0.0.0.0         | 8116  | 192.168.3.53    | 8116  | 17 |FP .. ..| No | 0/0 |  1  | 32 |   1    |   0   | UNDEF |
|  0  | 0.0.0.0         | 8116  | 192.168.3.52    | 8116  | 17 |FP .. ..| No | 0/0 |  1  | 32 |   0    |   0   | UNDEF |
|  0  | 192.168.3.240   | 64216 | 192.168.3.53    | 18192 | 6  |FP .. ..| No | 0/0 |  1  | 32 |   15   |   0   | UNDEF |

Check the unformatted output of “fw tab -t connections -u” and find the BITS1 field. In the example below it is the value 02008000, the 14th value after the key.

<00000000, ac103d42, 000047df, ac103d02, 000000b3, 00000006; 0001c001, 40044080, 0000000f, 0000000b, 00000000, 68da50ae, 00000000, 412afd3c, e6fc6e36, 00000054, ffffffff, ffffffff, ffffffff, 02008000, 000f9280, 80000000, 00000340, 00000000, 06f31308, 00007fac, 00000000, 00000000, 00000000, 00000000, 00000000, 00000000, 00000000, 00000000, 00000000, 00000000, 00000000, 00000000, 00000000, 00000000; 00000001 ; 3394/3618>

Check the formatted output of “fw tab -t connections -u -f” and find the Bits field. The formatted output shows BITS1 and BITS2 merged into one value.

04:18:08 5 N/A 5 10.95.202.22 > N/A LogId: <max_null>; ContextNum: <max_null>; OriginSicName: <max_null>; : -----------------------------------(+); Direction: 0; Source: 172.16.61.66; SPort: 18399; Dest: 172.16.61.2; DPort: 179; Protocol: tcp; CPTFMT_sep_1: ;; Type: 114689; Rule: 15; Timeout: 11; Handler: 0; Ifncin: 84; Ifncout: -1; Ifnsin: -1; Ifnsout: -1; Bits: 02008000000f9280; Expires: 3394/3618; LastUpdateTime: 30Sep2025 04:18:08; ProductName: VPN-1 & FireWall-1; ProductFamily: Network;

sk65133 states: “ClusterXL Sticky Decision Function (SDF) uses the last 17 bits of the first field as a hash that maps to the cluster member that handles the connection.” In fact, this is not accurate for R81.20.

$FWDIR/lib/fwconn.h defines CCL_ID as starting from the 15th bit from the left (after the 11th bit from the right) and being 6 bits long.

#define FWHA_CCL_ID                             11
#define FWHA_CCL_ID_START                       15
#define FWHA_CCL_ID_NUM_BITS                    6

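The definition allows us to build a table of CCL_IDs for a Maestro Dual-Site environment. Each row shows the SGM, the BITS1 value in binary with the 6-bit CCL_ID separated by spaces, and the same value in hex.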

1_01 |  000000100000000 000000 00000000000  | 2000000
1_02 |  000000100000000 000001 00000000000  | 2000800
1_03 |  000000100000000 000010 00000000000  | 2001000
1_04 |  000000100000000 000011 00000000000  | 2001800
1_05 |  000000100000000 000100 00000000000  | 2002000
1_06 |  000000100000000 000101 00000000000  | 2002800
1_07 |  000000100000000 000110 00000000000  | 2003000
1_08 |  000000100000000 000111 00000000000  | 2003800
1_09 |  000000100000000 001000 00000000000  | 2004000
1_10 |  000000100000000 001001 00000000000  | 2004800
1_11 |  000000100000000 001010 00000000000  | 2005000
1_12 |  000000100000000 001011 00000000000  | 2005800
1_13 |  000000100000000 001100 00000000000  | 2006000
1_14 |  000000100000000 001101 00000000000  | 2006800
2_01 |  000000100000000 001110 00000000000  | 2007000
2_02 |  000000100000000 001111 00000000000  | 2007800
2_03 |  000000100000000 010000 00000000000  | 2008000
2_04 |  000000100000000 010001 00000000000  | 2008800
2_05 |  000000100000000 010010 00000000000  | 2009000
2_06 |  000000100000000 010011 00000000000  | 2009800
2_07 |  000000100000000 010100 00000000000  | 200A000
2_08 |  000000100000000 010101 00000000000  | 200A800
2_09 |  000000100000000 010110 00000000000  | 200B000
2_10 |  000000100000000 010111 00000000000  | 200B800
2_11 |  000000100000000 011000 00000000000  | 200C000
2_12 |  000000100000000 011001 00000000000  | 200C800
2_13 |  000000100000000 011010 00000000000  | 200D000
2_14 |  000000100000000 011011 00000000000  | 200D800

With a single look at a connection entry, you can tell which SGM is the Connection Owner. This is also very useful when taking a debug and filtering all the lines related to a certain Connection Owner.
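The lookup in the table above can also be done with a few lines of Python. This is only a sketch based on the fwconn.h defines quoted earlier. The function names are hypothetical, and the layout of 14 SGM slots per Site is an assumption taken from the table, not from documentation.

```python
# Bit layout taken from the fwconn.h defines quoted in the article.
FWHA_CCL_ID_SHIFT = 11      # FWHA_CCL_ID: number of bits to skip from the right
FWHA_CCL_ID_NUM_BITS = 6    # FWHA_CCL_ID_NUM_BITS: width of the CCL_ID
SGMS_PER_SITE = 14          # assumption: matches the 1_01..2_14 table


def ccl_id(bits1: int) -> int:
    """Extract the 6-bit CCL_ID from a BITS1 value."""
    return (bits1 >> FWHA_CCL_ID_SHIFT) & ((1 << FWHA_CCL_ID_NUM_BITS) - 1)


def owner_sgm(bits1_hex: str) -> str:
    """Map a BITS1 hex string (for example '02008000') to an SGM name."""
    site, member = divmod(ccl_id(int(bits1_hex, 16)), SGMS_PER_SITE)
    return f"{site + 1}_{member + 1:02d}"


print(owner_sgm("2000800"))   # row 1_02 of the table
print(owner_sgm("02008000"))  # BITS1 of the connections table entry shown earlier
```

For the formatted output, pass only the first 8 hex digits of the Bits field, because BITS1 and BITS2 are merged there.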

2. Synchronization mask

The synchronization mask (sync_mask) is a 32-bit mask that defines the destination list for synchronization messages related to a given connection. Each 1-bit in a sync_mask marks the position of an SGM in a Maestro Site. Bits run from Site 1 to Site 2, right to left: in the examples below the lowest 14 bits belong to Site 1 and the next 14 bits to Site 2.

Here is an example of the sync_mask for the test connection used in this article. The sync_mask follows the switchover process from Site 1 to Site 2 and back. The Connection Owner of the test connection is SGM 1_02.

Site 1 is ACTIVE and Site 2 is STANDBY | sync_mask=[0x0001800e] | 0000 00000000000110 00000000001110 | Active Sync Set: 1_02; 1_03. Backup Sync Set: 1_04; 2_02; 2_03.
Site 1 is DOWN and Site 2 is ACTIVE    | sync_mask=[0x00038002] | 0000 00000000001110 00000000000010 | Active Sync Set: 2_02; 2_03. Backup Sync Set: 2_04.
Site 1 is STANDBY and Site 2 is ACTIVE | sync_mask=[0x00038006] | 0000 00000000001110 00000000000110 | Active Sync Set: 2_02; 2_03. Backup Sync Set: 2_04; 1_02; 1_03.
Site 1 is ACTIVE and Site 2 is DOWN    | sync_mask=[0x0000000e] | 0000 00000000000000 00000000001110 | Active Sync Set: 1_02; 1_03. Backup Sync Set: 1_04.
Site 1 is ACTIVE and Site 2 is STANDBY | sync_mask=[0x0001800e] | 0000 00000000000110 00000000001110 | Active Sync Set: 1_02; 1_03. Backup Sync Set: 1_04; 2_02; 2_03.

Note that the Connection Owner is always present in the sync_mask, even while its Site is in DOWN state. This happens because a Connection Owner keeps track of the sync_mask and updates all SGMs with the correct sync_mask. A sync_mask is distributed by a Connection Owner via SET synchronization messages.
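To read a sync_mask without counting bits by hand, a small decoder helps. This is a minimal sketch, assuming 14 bit positions per Site as in the breakdown above. The function name is hypothetical.

```python
SGMS_PER_SITE = 14  # assumption: 14 bit positions per Site, as in the breakdown


def decode_sync_mask(mask: int) -> list:
    """Return the SGM names whose bits are set in a sync_mask."""
    sgms = []
    for bit in range(2 * SGMS_PER_SITE):
        if mask & (1 << bit):
            site, member = divmod(bit, SGMS_PER_SITE)
            sgms.append(f"{site + 1}_{member + 1:02d}")
    return sgms


# The four sync_mask values observed during the switchover test.
for mask in (0x0001800E, 0x00038002, 0x00038006, 0x0000000E):
    print(f"0x{mask:08x}: {', '.join(decode_sync_mask(mask))}")
```

The decoder only lists members of the mask. It cannot tell the Active Sync Set from the Backup Sync Set, since that split is not encoded in the mask itself.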

Here is an example.

Site 1 is ACTIVE and Site 2 is STANDBY. 1_02 is the Connection Owner and updates all the SGMs with the sync_mask.

[vs_0];[tid_7];[fw4_7];SET: connections (8158) keys(6)=<0,ac103d02,77d3,ac103d42,b3,6> vals(34)=<1c001,40044080,f,b,0,68c7d7a2,0,7c718362,c36bddf7,ffffffff,ffffffff,4e,ffffffff,2000800,f9280,80000000,340,0,cec4bb88,7fba,0,0,0,0,0,0,0,0,0,0,0,0,0,0> sync_mask=[0x0001800e] refresh time=3608 aggressive time=509 ttl=0 sync time=1 kmask=0 op_code=0 no n6addrs;

Site 1 is DOWN and Site 2 is ACTIVE.

[vs_0];[tid_7];[fw4_7];SET: connections (8158) keys(6)=<0,ac103d02,77d3,ac103d42,b3,6> vals(34)=<1c001,40044080,f,b,0,68c7d7a2,0,7c718362,c36bddf7,ffffffff,ffffffff,4e,ffffffff,2000800,f9280,80000000,340,0,cec4bb88,7fba,0,0,0,0,0,0,0,0,0,0,0,0,0,0> sync_mask=[0x00038002] refresh time=3607 aggressive time=517 ttl=0 sync time=1 kmask=0 op_code=0 no n6addrs;

Site 1 is STANDBY and Site 2 is ACTIVE

[vs_0];[tid_7];[fw4_7];SET: connections (8158) keys(6)=<0,ac103d02,77d3,ac103d42,b3,6> vals(34)=<1c001,40044080,f,b,0,68c7d7a2,0,7c718362,c36bddf7,ffffffff,ffffffff,4e,ffffffff,2000800,f9280,80000000,340,0,cec4bb88,7fba,0,0,0,0,0,0,0,0,0,0,0,0,0,0> sync_mask=[0x00038006] refresh time=3617 aggressive time=527 ttl=0 sync time=1 kmask=0 op_code=0 no n6addrs;

Site 1 is ACTIVE and Site 2 is DOWN

[vs_0];[tid_7];[fw4_7];SET: connections (8158) keys(6)=<0,ac103d02,77d3,ac103d42,b3,6> vals(34)=<1c001,40044080,f,b,0,68c7d7a2,0,7c718362,c36bddf7,ffffffff,ffffffff,4e,ffffffff,2000800,f9280,80000000,340,0,cec4bb88,7fba,0,0,0,0,0,0,0,0,0,0,0,0,0,0> sync_mask=[0x0000000e] refresh time=3613 aggressive time=522 ttl=0 sync time=1 kmask=0 op_code=0 no n6addrs;

Site 1 is ACTIVE and Site 2 is STANDBY

[vs_0];[tid_7];[fw4_7];SET: connections (8158) keys(6)=<0,ac103d02,77d3,ac103d42,b3,6> vals(34)=<1c001,40044080,f,b,0,68c7d7a2,0,7c718362,c36bddf7,ffffffff,ffffffff,4e,ffffffff,2000800,f9280,80000000,340,0,cec4bb88,7fba,0,0,0,0,0,0,0,0,0,0,0,0,0,0> sync_mask=[0x0001800e] refresh time=3613 aggressive time=522 ttl=0 sync time=1 kmask=0 op_code=0 no n6addrs;

If an update is sent by a non-Owner SGM, the SET has sync_mask=[00000000].

SET: connections (8158) keys(6)=<0,ac103d42,b3,ac103d02,77d3,6> vals(34)=<1c001,40044080,f,b,0,68c7d7a2,0,7c718362,c36bddf7,ffffffff,ffffffff,4e,4c,2000800,f9280,80000000,340,0,b700ba48,7fbb,0,0,0,0,0,0,0,0,0,0,0,0,0,0> sync_mask=[00000000] refresh time=3613 aggressive time=522 ttl=0 sync time=2 kmask=0 op_code=40 no n6addrs;
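When a debug contains thousands of SET lines, it helps to pull out the connection tuple, the Owner and the sync_mask automatically. Here is a sketch of such a parser. The regular expression is written against the sample lines above only, and the position of BITS1 (the 14th value in vals) is an assumption derived from these samples, so treat it as a starting point rather than a reference parser.

```python
import ipaddress
import re

# Assumption: BITS1 is the 14th value (index 13) in vals, as in the samples.
BITS1_INDEX = 13

SET_RE = re.compile(
    r"SET: connections \(\d+\) keys\(6\)=<([^>]*)> vals\(\d+\)=<([^>]*)> "
    r"sync_mask=\[(?:0x)?([0-9a-fA-F]+)\]"
)


def parse_set(line: str):
    """Return (src, sport, dst, dport, proto, owner_ccl_id, sync_mask) or None."""
    m = SET_RE.search(line)
    if m is None:
        return None
    keys = [int(k, 16) for k in m.group(1).split(",")]
    vals = [int(v, 16) for v in m.group(2).split(",")]
    _direction, src, sport, dst, dport, proto = keys
    owner = (vals[BITS1_INDEX] >> 11) & 0x3F  # CCL_ID: 6 bits above bit 11
    return (
        str(ipaddress.IPv4Address(src)), sport,
        str(ipaddress.IPv4Address(dst)), dport,
        proto, owner, int(m.group(3), 16),
    )


sample = (
    "[vs_0];[tid_7];[fw4_7];SET: connections (8158) "
    "keys(6)=<0,ac103d02,77d3,ac103d42,b3,6> "
    "vals(34)=<1c001,40044080,f,b,0,68c7d7a2,0,7c718362,c36bddf7,ffffffff,"
    "ffffffff,4e,ffffffff,2000800,f9280,80000000,340,0,cec4bb88,7fba,"
    "0,0,0,0,0,0,0,0,0,0,0,0,0,0> sync_mask=[0x0001800e] refresh time=3608"
)
print(parse_set(sample))
```

A CCL_ID of 1 corresponds to SGM 1_02 in the CCL_ID table, and a sync_mask of zero marks an update sent by a non-Owner SGM.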

3. Synchronization process

This section only covers connection synchronization during a switchover event, and only for connections accelerated by SecureXL.

Step 1 – Connection REFRESH messages are sent every cphwd_conn_refresh_interval seconds. The REFRESH messages are sent by all SGMs which have an entry in fwaccel conns for a connection.

Please, note the following from the picture below:

Connection Owner
Entries in the Connections table and SecureXL table
Sync Set
Sync Buffer
Sync Mask

Step 2 – An administrator issues the “asg chassis_admin -c 1 down” command. Site 1 becomes DOWN and Site 2 becomes ACTIVE.

You can see that the connection entry is deleted based on sync_mask=[0x0000000c], which excludes the Connection Owner and includes all the other SGMs of the sync set at the DOWN site. At the same time the connection entry is distributed based on the new sync_mask=[0x00038002]. SET, SLINKS and REFRESH messages are sent from the sync buffer via Flush and Ack. DELETE messages don’t use Flush and Ack.
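The DELETE mask can be reproduced from the previous sync_mask with simple bit arithmetic. The sketch below only shows that the observed values are consistent with "Site 1 members of the old mask, minus the Owner". It is not confirmed that the kernel computes the mask this way, and the 14 bits per Site layout is an assumption.

```python
SGMS_PER_SITE = 14  # assumption: 14 bit positions per Site


def site_bits(site: int) -> int:
    """Bit-mask covering all SGM positions of a Site (1 or 2)."""
    return ((1 << SGMS_PER_SITE) - 1) << ((site - 1) * SGMS_PER_SITE)


old_mask = 0x0001800E   # Site 1 ACTIVE, Site 2 STANDBY
owner_bit = 1 << 1      # SGM 1_02 is the Connection Owner

# SGMs of the DOWN Site 1 that held the entry, minus the Owner.
delete_mask = old_mask & site_bits(1) & ~owner_bit
print(f"0x{delete_mask:08x}")
```

The result matches the 0x0000000c seen in the debug: SGMs 1_03 and 1_04 drop the entry while 1_02 keeps it.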

Step 3 – An administrator issues the “asg chassis_admin -c 1 up” command. Site 1 becomes STANDBY and Site 2 stays ACTIVE.

Two pictures are used to describe the following events.

Site 2 sees that Site 1 has changed its role from DOWN to STANDBY. Connection entries should be added to the Backup Sync Set on Site 1. However, Site 2 doesn’t have the Connection Owner, so it cannot decide the sync_mask by itself. Because of this, the Backup Sync Set cannot be updated yet. Site 2 SGMs send synchronization messages according to the previous sync_mask=[0x00038002]. Also, there is no dispatcher correction after the switchover. Both 2_02 and 2_03 have an entry in the SecureXL table.

1_02 updates the sync_mask=[0x00038006]. This update is distributed and the connection is SET again on all the SGMs included in the sync_mask.

Step 4 – The switchover is completed. The SGMs 2_02 and 2_03 send REFRESH messages every 512 seconds (default value for cphwd_conn_refresh_interval).

4. Connection expiration

The fact that a Connection Owner doesn’t follow a switchover isn’t immediately worrisome. No switchover happens without any connection drop. Some connections are dropped and re-established right at the moment of a switchover. Most businesses expect this: a maintenance window is usually scheduled with some service degradation expected during the switchover.

However, there is a set of problems caused by connection drops one or two hours after a switchover. This is a real problem because it usually happens outside of the maintenance window and the customer doesn’t expect it. The root cause of such connection drops is connection expiration by session timeout on some backup SGM. When a connection expires, it is deleted from all the other SGMs.

When a Connection Owner stays on a newly STANDBY site, it has to send the SET and DELETE messages described in the diagrams above via inter-site Sync. This non-optimal topology introduces additional delay.

Loss of SET messages is critical because they contain the sync_mask. An outdated sync_mask does not allow the connections to be synced properly. The connections will expire by session timeout on Backup SGMs.
Loss of DELETE messages is critical because connections that were not deleted will not be included in the new sync_mask. The connections will expire by session timeout.

Both of the situations above cause the “unexpected” connection drop long after a switchover.

4.1 Mostly overlooked problem

If most connections are accelerated by SecureXL (which is the highly desired situation), SecureXL is responsible for updating the session timeout of the connections. By default, a REFRESH is sent every 512 seconds, counted from the moment the first packet is received by an SGM and an entry is created in the SecureXL table. In the business-as-usual case, entries are created in the SecureXL table at random times, hence the REFRESH messages are spread out over time.

In case of a switchover, all the packets are “first received” at the new ACTIVE site all at once or nearly so. This leads to a massive spike of synchronization messages at the moment of a switchover and after it. If there is a large number of connections to REFRESH, the sync buffer is not big enough every 512 seconds. As a result, some messages are not sent or are lost. The “Lost Updates” counter increases along with “Sent reject notifications” and “Received reject notifications” in the output of cphaprob syncstat.
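The effect can be illustrated with a toy model. The sketch below is not a measurement: the connection count, the 2-second arrival window after the switchover and the evenly spread steady-state arrivals are all made-up assumptions. It only shows how aligning the first-packet times concentrates REFRESH messages into the same seconds.

```python
from collections import Counter

REFRESH_INTERVAL = 512  # seconds, default cphwd_conn_refresh_interval


def peak_refreshes_per_second(first_packet_times, horizon=2048):
    """Peak number of REFRESH messages that fall into any single second."""
    per_second = Counter()
    for t0 in first_packet_times:
        t = t0 + REFRESH_INTERVAL
        while t < horizon:
            per_second[t] += 1
            t += REFRESH_INTERVAL
    return max(per_second.values())


n = 5120  # hypothetical number of accelerated connections on one SGM

# Business as usual: first packets spread evenly over one interval (idealised).
steady = [i % REFRESH_INTERVAL for i in range(n)]
# After a switchover: all first packets arrive within 2 seconds (assumption).
burst = [i % 2 for i in range(n)]

print(peak_refreshes_per_second(steady))
print(peak_refreshes_per_second(burst))
```

In this model the same number of connections produces a peak that is 256 times higher after the switchover, and the spike repeats every 512 seconds because nothing spreads the timers out again.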

the_rock · ‎2025-10-05

Wow...amazing!!

Best,
Andy
"Have a great day and if its not, change it"

Dario_Perez · ‎2025-10-08

Thanks for full explanation

Henrik_Noerr1 · ‎2025-10-09

Wow - what a great article.

I have many questions, but I can only wonder what led you down this very thorough and surely time consuming path of troubleshooting. The time spent and environment needed to come to these conclusions are well... Impressive.

This at the same time worries me, wondering if our Operation and Design organisation for that matter can handle the switch to Maestro without massive up-skilling.

Thank you,

Henrik

Alex- · ‎2025-10-09

We switched a large VSX environment to a large Maestro + VSX environment. 😀

A lot of work, much had to be done and (re-)learned, but at one point you just need to jump in.

The community here was of tremendous help.

Gennady · ‎2025-10-09

Hello!

I highly encourage you to ask your questions! Answering questions from our customer allowed me to create the article.

We have a 4-year-old problem about out_of_state drops 1 hour after switchover. The article was written to organize my thoughts on the latest debug. At the moment I assume that the root cause of our issue is loss of SET update which leads to incorrect sync_mask on a new active SGM -> bad REFRESH distribution and connection entry expiration on a Backup SGM.

I would like to push RnD for a hotfix that makes the Connection Owner follow the switchover process and always reside on an Active Site. A sync update is much less likely to be lost when it comes from an SGM via the backplane than from an SGM on another Site via inter-site sync. RnD takes time to review the same debugs which were used to write the text above...

The next article about Flush and Ack is on its way, it is also based on a problem investigation on the same Lab.