I had 2 clusters, both managed by the same management server, with 2 gateways each.
I've since removed the 2nd cluster and re-added its gateways to my first cluster, still under the same management server.
Re-established SIC after renaming them. Pushed policy, etc. All seemed to go well.
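For reference, re-establishing SIC was just the usual reset, roughly (exact menu text may differ by version):

# On each moved member, from expert mode:
cpconfig    # choose "Secure Internal Communication" and set a new one-time activation key

# Then in SmartConsole: open the gateway object > Communication > Reset,
# enter the same activation key, Initialize, and install policy.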
I can't be sure this was the cause, but ever since then the cluster appears to be flapping, and on inspection I see this in /var/log/messages:
Jan 6 14:47:01 2023 cp-fw4-site2 xpand[11083]: Configuration changed from localhost by user admin by the service dbset
Jan 6 14:47:01 2023 cp-fw4-site2 last message repeated 6 times
Jan 6 14:47:12 2023 cp-fw4-site2 fwk: CLUS-120001-1: Cluster policy installation started (old/new Policy ID: 913558753/716459846)
Jan 6 14:47:13 2023 cp-fw4-site2 xpand[11083]: admin localhost t +volatile:clish:admin:6141 t
Jan 6 14:47:13 2023 cp-fw4-site2 clish[6141]: User admin running clish -c with ReadWrite permission
Jan 6 14:47:13 2023 cp-fw4-site2 clish[6141]: cmd by admin: Start executing : ver (cmd md5: 0812f14f43315611dd0ef462515c9d00)
Jan 6 14:47:13 2023 cp-fw4-site2 clish[6141]: cmd by admin: Processing : ver (cmd md5: 0812f14f43315611dd0ef462515c9d00)
Jan 6 14:47:13 2023 cp-fw4-site2 xpand[11083]: admin localhost t -volatile:clish:admin:6141
Jan 6 14:47:13 2023 cp-fw4-site2 clish[6141]: User admin finished running clish -c from CLI shell
Jan 6 14:47:14 2023 cp-fw4-site2 spike_detective: spike info: type: thread, thread id: 3263, thread name: fw_full, start time: 06/01/23 14:47:07, spike duration (sec): 6, initial cpu usage: 79, average cpu usage: 79, perf taken: 0
Jan 6 14:47:15 2023 cp-fw4-site2 fwk: CLUS-120005-4: Cluster policy installation finished - no change was done (Type-2)
Jan 6 14:47:15 2023 cp-fw4-site2 fwk: CLUS-120125-4: CCP Encryption turned ON
Jan 6 14:47:19 2023 cp-fw4-site2 xpand[11083]: Configuration changed from localhost by user admin by the service dbset
Jan 6 14:47:22 2023 cp-fw4-site2 kernel: fw_full (3202) used greatest stack depth: 12040 bytes left
Jan 6 14:47:28 2023 cp-fw4-site2 xpand[11083]: Configuration changed from localhost by user admin by the service dbset
Jan 6 14:47:28 2023 cp-fw4-site2 last message repeated 6 times
Jan 6 14:47:50 2023 cp-fw4-site2 kernel: fwk0_0[11551]: segfault at 7fb0e86d5cb8 ip 00007fb0a90f972e sp 00007fb058f3c1d0 error 6 in libfw_kern_64_us_0.so[7fb0a8188000+1cbf000]
Jan 6 14:47:54 2023 cp-fw4-site2 spike_detective: spike info: type: cpu, cpu core: 1, top consumer: fw_full, start time: 06/01/23 14:46:50, spike duration (sec): 63, initial cpu usage: 99, average cpu usage: 96, perf taken: 0
Jan 6 14:47:54 2023 cp-fw4-site2 spike_detective: spike info: type: thread, thread id: 6094, thread name: fw_full, start time: 06/01/23 14:47:42, spike duration (sec): 11, initial cpu usage: 86, average cpu usage: 85, perf taken: 0
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f118000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4d1000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4d2000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4d3000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4d4000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4d5000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4d6000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4d7000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4d8000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4d9000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4da000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4db000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4dc000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4dd000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4de000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4df000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4e0000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4e1000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4e2000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4e3000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4e4000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4e5000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4e6000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4e7000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4e8000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4e9000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4ea000, vm_start 0x7fb05f118000)
Jan 6 14:47:57 2023 cp-fw4-site2 kernel: [fw4_0];fwzeco_vm_ops_shinfo_fault: fwk0_0, bad kernel_address (user_address 0x7fb05f4eb000, vm_start 0x7fb05f118000)
So far I've tried disabling/re-enabling Content Awareness and disabling CoreXL.
The gateways that hit this most often are the 2 that were re-added, but the first 2 members also occasionally experience the same segfault.
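In case anyone wants to check along, something like the following (run from the expert shell on an affected member; paths assume a standard Gaia install) shows the cluster state and whether the fwk0_0 segfaults are leaving core files behind:

# Cluster member state and critical devices (pnotes)
cphaprob stat
cphaprob list

# Confirm what is actually enabled after the CoreXL change
fw ctl multik stat
fwaccel stat

# How often the segfaults and the fwzeco faults are occurring
grep -c segfault /var/log/messages
grep -c fwzeco_vm_ops_shinfo_fault /var/log/messages

# User-mode core dumps from the crashing fwk0_0 worker, if crash dumps are enabled
ls -lh /var/log/dump/usermode/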
I also see the "Configuration changed" messages - that isn't me doing anything. From a different SK I read that this is the update manager?
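To see how often that fires and whether it lines up with the policy installs and the segfaults, a quick correlation (just a grep sketch, nothing Check Point specific) is:

grep -E 'Configuration changed|CLUS-12|segfault' /var/log/messages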
Can anyone shed any light on what could be causing this? I've already opened a case with TAC, but it's dragging a bit.