Hello,
We recently upgraded a cluster of HP open servers from Gaia R77.30 to R80.40 (CPUSE clean install).
Before the upgrade, the open servers used the following settings:
- on-board network interfaces
- license for 4 cpu cores
- CoreXL settings: 2/2 (2x SND / 2x FW Instance)
- blade: FW only
Before the upgrade, the affinity was as follows:
[Expert@sg1:0]# fw ctl affinity -l -r -v -a
CPU 0: eth0 (irq 91) eth3 (irq 107)
CPU 1: eth1 (irq 123) eth2 (irq 131)
CPU 2: fw_1
CPU 3: fw_0
CPU 4:
CPU 5:
CPU 6:
CPU 7:
CPU 8:
CPU 9:
CPU 10:
CPU 11:
CPU 12:
CPU 13:
CPU 14:
CPU 15:
All: rtmd mpdaemon fwd cpd cprid
After the upgrade to R80.40 we noticed that only one SND CPU core is utilized:
[Expert@sg1:0]# fw ctl affinity -l -r -v -a
CPU 0: eth0 (irq 96) eth1 (irq 111) eth2 (irq 101) eth3 (irq 106)
CPU 1:
CPU 2: fw_1 in.asessiond rtmd mpdaemon fwd lpd cprid cpd
CPU 3: fw_0 in.asessiond rtmd mpdaemon fwd lpd cprid cpd
CPU 4:
CPU 5:
CPU 6:
CPU 7:
CPU 8:
CPU 9:
CPU 10:
CPU 11:
CPU 12:
CPU 13:
CPU 14:
CPU 15:
All:
The current license permits the use of CPUs 0, 1, 2, 3 only.
cpview showed the state of the second SND CPU core as OTHER.
We also noticed repeated CPU spikes on the one SND core in use, logged in /var/log/messages:
Feb 19 09:40:48 2021 sg1 kernel: [fw4_1];CLUS-120200-1: Starting CUL mode because CPU-00 usage (81%) on the local member increased above the configured threshold (80%).
Feb 19 09:40:58 2021 sg1 kernel: [fw4_1];CLUS-120202-1: Stopping CUL mode after 10 sec (short CUL timeout), because no member reported CPU usage above the configured threshold (80%) during the last 10 sec.
Feb 19 09:45:00 2021 sg1 kernel: [fw4_1];CLUS-120200-1: Starting CUL mode because CPU-00 usage (86%) on the local member increased above the configured threshold (80%).
Feb 19 09:45:10 2021 sg1 kernel: [fw4_1];CLUS-120202-1: Stopping CUL mode after 10 sec (short CUL timeout), because no member reported CPU usage above the configured threshold (80%) during the last 10 sec.
Feb 19 09:46:56 2021 sg1 kernel: [fw4_1];CLUS-120200-1: Starting CUL mode because CPU-00 usage (82%) on the local member increased above the configured threshold (80%).
Feb 19 09:47:06 2021 sg1 kernel: [fw4_1];CLUS-120202-1: Stopping CUL mode after 10 sec (short CUL timeout), because no member reported CPU usage above the configured threshold (80%) during the last 10 sec.
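To put a number on these spikes, a rough sketch like the one below could count the "Starting CUL mode" events per CPU. The function name cul_starts_per_cpu is made up for illustration; it relies only on standard cat, grep, sort, uniq and awk, and assumes the log lines have the format shown above.

```shell
#!/bin/sh
# Count "Starting CUL mode" events per CPU in a messages-style log.
# Reads the files given as arguments, or stdin when none are passed.
# (cul_starts_per_cpu is a hypothetical helper name, not a Check Point tool.)
cul_starts_per_cpu() {
    cat "$@" \
        | grep 'Starting CUL mode' \
        | grep -o 'CPU-[0-9][0-9]*' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'   # print "CPU-NN count"
}

# Example call on the gateway (Expert mode):
#   cul_starts_per_cpu /var/log/messages
```

Each output line has the form "CPU-NN count", so a single overloaded SND core stands out immediately.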
We tried to force automatic affinity for the interfaces by:
- rebooting both security gateways
- running the sim affinity -a command
- pushing more traffic from different sources/destinations
but none of this got both SND cores working.
Because these open servers use on-board network cards, Multi-Queue is not supported.
As we didn't want to set the affinities manually, we opened a TAC ticket to investigate why automatic sim affinity isn't working.
Finally, we received the following information from TAC:
"I have checked internally and found that the feature for 'sim' to have an automatic affinity, where the interface affinity would be checked and adjusted accordingly every minute as per the load is not available on R80.40. sk98737 is also under process to be updated accordingly. From the activity, the sim affinity balancing on the interface will not be achieved as it's deprecated, We will have to set the affinity of interfaces manually, please try using:
fw ctl affinity -s -i <Interface Name> <CPU ID>"
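Applied to our four interfaces, the command TAC suggests could be generated as in the sketch below. It is a dry run that only prints the commands and applies nothing. The interface-to-CPU mapping is an assumption taken from the R77.30 layout shown earlier (eth0/eth3 on CPU 0, eth1/eth2 on CPU 1), and AFFINITY_MAP and plan_affinity are illustrative names only.

```shell
#!/bin/sh
# Dry-run sketch: print the manual affinity commands that would recreate
# the pre-upgrade interface layout. Nothing is applied by this script.
# Assumed mapping (from the R77.30 output): eth0/eth3 -> CPU 0, eth1/eth2 -> CPU 1.
AFFINITY_MAP="eth0:0 eth3:0 eth1:1 eth2:1"

plan_affinity() {
    for pair in $AFFINITY_MAP; do
        iface=${pair%%:*}   # text before the colon
        cpu=${pair##*:}     # text after the colon
        echo "fw ctl affinity -s -i $iface $cpu"
    done
}

plan_affinity
```

After reviewing the printed lines, they would be run in Expert mode and the result checked with fw ctl affinity -l -r -v -a. Whether the setting survives a reboot has not been verified here.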
Can anyone confirm that automatic sim affinity is deprecated starting from R80.40, as TAC states above?
Best Regards
Daniel Szydelko.