Collaborator

R80.40 High CPU usage on snd cores

Hello,

we are on R80.40 Take 67 using VSX VSLS.

We noticed that our SND cores are always between 80 and 100%. This didn't change even when we added more cores to Multi-Queue; we set Multi-Queue to 8 queues.

At the moment there are 2 cores at >80%. Is this normal behaviour? We didn't seem to have this issue with R80.30.

Best regards,

Jan
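
(For context, the per-core figures below were taken with top; pressing 1 toggles the per-CPU breakdown. A minimal sketch of the commands behind the outputs in this post, assuming standard Gaia/Check Point tooling:)

top                    # press 1 for the per-CPU view shown below
fw ctl affinity -l -r  # reverse view: per CPU, what is pinned to it
mq_mng -o -vv          # Multi-Queue overview, see further below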

 

 

%Cpu0 : 0.0 us, 2.3 sy, 0.0 ni, 88.7 id, 0.0 wa, 0.7 hi, 8.3 si, 0.0 st
%Cpu1 : 0.0 us, 2.5 sy, 0.0 ni, 17.2 id, 0.0 wa, 0.4 hi, 80.0 si, 0.0 st
%Cpu2 : 0.0 us, 5.4 sy, 0.0 ni, 12.8 id, 0.0 wa, 0.4 hi, 81.4 si, 0.0 st
%Cpu3 : 0.0 us, 0.3 sy, 0.0 ni, 93.3 id, 0.0 wa, 0.3 hi, 6.0 si, 0.0 st
%Cpu4 : 10.1 us, 4.0 sy, 0.0 ni, 85.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 18.5 us, 4.0 sy, 0.0 ni, 76.8 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
%Cpu6 : 11.0 us, 5.0 sy, 0.0 ni, 83.7 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
%Cpu7 : 14.8 us, 4.9 sy, 0.0 ni, 79.9 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu8 : 20.7 us, 5.0 sy, 0.0 ni, 73.6 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
%Cpu9 : 18.2 us, 4.7 sy, 0.0 ni, 76.4 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
%Cpu10 : 11.0 us, 6.3 sy, 0.0 ni, 82.0 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
%Cpu11 : 16.4 us, 5.4 sy, 0.0 ni, 77.5 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
%Cpu12 : 14.9 us, 5.3 sy, 0.0 ni, 78.9 id, 0.0 wa, 0.3 hi, 0.7 si, 0.0 st
%Cpu13 : 13.7 us, 5.3 sy, 0.0 ni, 80.7 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu14 : 19.3 us, 6.6 sy, 0.0 ni, 73.1 id, 0.0 wa, 0.3 hi, 0.7 si, 0.0 st
%Cpu15 : 8.7 us, 5.0 sy, 0.0 ni, 86.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 0.0 us, 0.3 sy, 0.0 ni, 92.7 id, 0.0 wa, 0.3 hi, 6.6 si, 0.0 st
%Cpu17 : 0.0 us, 0.3 sy, 0.0 ni, 91.0 id, 0.0 wa, 0.3 hi, 8.4 si, 0.0 st
%Cpu18 : 0.0 us, 0.0 sy, 0.0 ni, 81.9 id, 0.0 wa, 0.7 hi, 17.4 si, 0.0 st
%Cpu19 : 0.0 us, 0.3 sy, 0.0 ni, 93.4 id, 0.0 wa, 0.3 hi, 6.0 si, 0.0 st
%Cpu20 : 7.9 us, 5.0 sy, 0.0 ni, 86.8 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
%Cpu21 : 19.3 us, 3.7 sy, 0.0 ni, 76.7 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu22 : 7.6 us, 4.3 sy, 0.0 ni, 87.7 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu23 : 5.0 us, 3.0 sy, 0.0 ni, 92.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu24 : 12.6 us, 1.3 sy, 0.0 ni, 85.4 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
%Cpu25 : 4.3 us, 2.7 sy, 0.0 ni, 93.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu26 : 11.0 us, 2.0 sy, 0.0 ni, 86.6 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu27 : 9.0 us, 1.7 sy, 0.0 ni, 89.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu28 : 26.2 us, 4.6 sy, 0.0 ni, 66.9 id, 0.0 wa, 0.3 hi, 2.0 si, 0.0 st
%Cpu29 : 16.7 us, 2.0 sy, 0.0 ni, 80.6 id, 0.0 wa, 0.3 hi, 0.3 si, 0.0 st
%Cpu30 : 12.1 us, 3.6 sy, 0.0 ni, 83.0 id, 0.0 wa, 0.3 hi, 1.0 si, 0.0 st
%Cpu31 : 26.9 us, 3.7 sy, 0.0 ni, 67.8 id, 0.0 wa, 0.7 hi, 1.0 si, 0.0 st

 

[Expert@fw-inet-01:0]# mq_mng -o -vv
Total 32 cores. Multiqueue 8 cores: 0,16,1,17,2,18,3,19
i/f type state mode cores
------------------------------------------------------------------------------------------------
Mgmt igb Up Auto (2/2)* 0(58),0(182)
eth1-01 ixgbe Up Manual (8/8) 0(70),1(71),2(74),3(75),16(76),17(77),18(78),19(79)
eth1-02 ixgbe Up Manual (8/8) 0(81),1(82),2(83),3(89),16(90),17(91),18(92),19(93)
eth3-01 ixgbe Up Manual (8/8) 0(65),1(66),2(95),3(96),16(97),17(98),18(99),19(100)
eth3-02 ixgbe Up Manual (8/8) 0(67),1(68),2(102),3(103),16(104),17(105),18(112),19(113)
* Management interface
------------------------------------------------------------------------------------------------
Mgmt <igb> max 2 cur 2
07:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
------------------------------------------------------------------------------------------------
eth1-01 <ixgbe> max 16 cur 8
87:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
------------------------------------------------------------------------------------------------
eth1-02 <ixgbe> max 16 cur 8
87:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
------------------------------------------------------------------------------------------------
eth3-01 <ixgbe> max 16 cur 8
04:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
------------------------------------------------------------------------------------------------
eth3-02 <ixgbe> max 16 cur 8
04:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

core interfaces queue irq rx packets tx packets
------------------------------------------------------------------------------------------------
0 eth3-02 eth3-02-TxRx-0 67 5066 1743166
eth3-01 eth3-01-TxRx-0 65 3103828 5212580
eth1-01 eth1-01-TxRx-0 70 4297313 4704791
eth1-02 eth1-02-TxRx-0 81 5924371 3096900
Mgmt Mgmt-TxRx-0 58 120897 260792
Mgmt-TxRx-1 182 24383 29745
1 eth3-02 eth3-02-TxRx-1 68 33 1595748
eth3-01 eth3-01-TxRx-1 66 370712775 370857057
eth1-01 eth1-01-TxRx-1 71 3439626 7456325
eth1-02 eth1-02-TxRx-1 82 6387267 1973380
2 eth3-02 eth3-02-TxRx-2 102 0 1022588
eth3-01 eth3-01-TxRx-2 95 363411661 363079854
eth1-01 eth1-01-TxRx-2 74 3832140 4880679
eth1-02 eth1-02-TxRx-2 83 4962696 2656322
3 eth3-02 eth3-02-TxRx-3 103 2038 1322402
eth3-01 eth3-01-TxRx-3 96 2782309 4946975
eth1-01 eth1-01-TxRx-3 75 3354513 4134583
eth1-02 eth1-02-TxRx-3 89 4446265 1362053
16 eth3-02 eth3-02-TxRx-4 104 2024 1312522
eth3-01 eth3-01-TxRx-4 97 4133164 4228302
eth1-01 eth1-01-TxRx-4 76 3947081 6611134
eth1-02 eth1-02-TxRx-4 90 5499293 1598963
17 eth3-02 eth3-02-TxRx-5 105 0 1653859
eth3-01 eth3-01-TxRx-5 98 3609174 5663060
eth1-01 eth1-01-TxRx-5 77 4522628 7028742
eth1-02 eth1-02-TxRx-5 91 5599895 1303251
18 eth3-02 eth3-02-TxRx-6 112 19 1588342
eth3-01 eth3-01-TxRx-6 99 22066519 23223659
eth1-01 eth1-01-TxRx-6 78 4633862 4358520
eth1-02 eth1-02-TxRx-6 92 6150507 1791513
19 eth3-02 eth3-02-TxRx-7 113 4519 2089422
eth3-01 eth3-01-TxRx-7 100 4126068 4193200
eth1-01 eth1-01-TxRx-7 79 4193682 4987312
eth1-02 eth1-02-TxRx-7 93 5728181 1414670

 

[Expert@fw-inet-01:0]# fw ctl affinity -l
VS_0 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_1 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_2 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_3 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_4 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_5 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_6 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_7 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_8 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_9 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_22 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
VS_24 fwk: CPU 4 5 6 7 8 9 10 11 12 13 14 15 20 21 22 23 24 25 26 27 28 29 30 31
Interface Mgmt: has multi queue enabled
Interface Sync: has multi queue enabled
Interface eth3-01: has multi queue enabled
Interface eth3-02: has multi queue enabled
Interface eth1-01: has multi queue enabled
Interface eth1-02: has multi queue enabled

12 Replies
Employee+

Hi @Jan_Kleinhans ,

 

According to your outputs, we can see that eth3-01 carries most of the traffic:

 

eth3-01 eth3-01-TxRx-1 66 370712775 370857057

eth3-01 eth3-01-TxRx-2 95 363411661 363079854

 

eth3-01 ixgbe Up Manual (8/8) 0(65),1(66),2(95),3(96),16(97),17(98),18(99),19(100)

So from the above it looks like CPU 1 and CPU 2 are working hard.

Can you tell which traffic is running on eth3-01? Is it different traffic than it was in R80.30?
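
In the meantime, the per-queue split can also be cross-checked directly from the driver counters (a quick sketch, assuming the ixgbe driver exposes its usual per-queue statistics via ethtool):

ethtool -S eth3-01 | grep -E 'rx_queue_[0-9]+_packets'   # received packets per RSS queue
ethtool -S eth1-01 | grep -E 'rx_queue_[0-9]+_packets'   # compare with the other bond member

If only one or two rx_queue counters grow quickly, a small number of flows is being hashed onto those queues.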

 

Thanks,

Ilya 

Collaborator

Hello,

Traffic is the same as it was before. eth3-01 is part of an LACP bond with eth1-01.

 

Thanks,

Jan

Employee+

Can you share the output of "cat /proc/net/bonding/<bond-name>" and, from clish, "show bonding groups"?

Collaborator

@Ilya_Yusupov 

 

[Expert@fw-inet-02:0]# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 00:1c:7f:66:69:d8
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 15
Partner Key: 32773
Partner Mac Address: 00:23:04:ee:be:02

Slave Interface: eth1-01
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 00:1c:7f:66:69:d8
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 00:1c:7f:66:69:d8
port key: 15
port priority: 255
port number: 1
port state: 61
details partner lacp pdu:
system priority: 32667
system mac address: 00:23:04:ee:be:02
oper key: 32773
port priority: 32768
port number: 261
port state: 61

Slave Interface: eth3-01
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 4
Permanent HW addr: 00:1c:7f:66:42:1f
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 00:1c:7f:66:69:d8
port key: 15
port priority: 255
port number: 2
port state: 61
details partner lacp pdu:
system priority: 32667
system mac address: 00:23:04:ee:be:02
oper key: 32773
port priority: 32768
port number: 16645
port state: 61
[Expert@fw-inet-02:0]#

 

[Expert@fw-inet-02:0]# cat /proc/net/bonding/bond2
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 00:1c:7f:66:69:d9
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 15
Partner Key: 32782
Partner Mac Address: 00:23:04:ee:be:02

Slave Interface: eth1-02
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 00:1c:7f:66:69:d9
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 00:1c:7f:66:69:d9
port key: 15
port priority: 255
port number: 1
port state: 61
details partner lacp pdu:
system priority: 32667
system mac address: 00:23:04:ee:be:02
oper key: 32782
port priority: 32768
port number: 270
port state: 61

Slave Interface: eth3-02
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 3
Permanent HW addr: 00:1c:7f:66:42:20
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 00:1c:7f:66:69:d9
port key: 15
port priority: 255
port number: 2
port state: 61
details partner lacp pdu:
system priority: 32667
system mac address: 00:23:04:ee:be:02
oper key: 32782
port priority: 32768
port number: 16654
port state: 61

Champion

Is your LACP bond supposed to be load sharing between eth3-01 and eth1-01? It isn't; it looks like pretty much everything is coming through eth3-01. Any Remote Access VPNs in use? If so, see here: sk165853: High CPU usage on one CPU core when the number of Remote Access users is high

Beyond that you may have some heavy/elephant flows stuck to queues eth3-01-TxRx-1 and eth3-01-TxRx-2.
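
One way to check for that is to look for heavy connections and at the acceleration split (a sketch, assuming the heavy-connection tooling in recent R80.x takes; verify the commands on your JHF level):

fw ctl multik print_heavy_conn   # connections that CoreXL has flagged as heavy
fwaccel stats -s                 # accelerated vs. slow-path traffic share

Keep in mind that a single large flow (backup, replication, tunneled traffic) stays on one RSS queue no matter how many queues you configure.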

Gaia 3.10 Immersion Self-paced Video Series
now available at http://www.maxpowerfirewalls.com
Collaborator

Hello @Timothy_Hall 

Here is the configuration of the bonds on the Check Point and the Cisco Nexus side:

 

add bonding group 1
add bonding group 2
add bonding group 1 interface eth1-01
add bonding group 1 interface eth3-01
add bonding group 2 interface eth1-02
add bonding group 2 interface eth3-02
set bonding group 1 mode 8023AD
set bonding group 1 lacp-rate slow
set bonding group 1 xmit-hash-policy layer3+4
set bonding group 2 mode 8023AD
set bonding group 2 lacp-rate slow
set bonding group 2 xmit-hash-policy layer3+4

interface Ethernet1/5
description fw-inet-01 eth1-01
switchport mode trunk
switchport trunk allowed vlan 1,640-641,850,904-905,907,909,913-915,918-919,950-952,954-955,959-960,963-964,983,991-992
spanning-tree port type edge trunk
channel-group 5 mode active
no shutdown

interface port-channel5
speed 10000
description fw-inet-01 eth1-01
switchport mode trunk
switchport trunk allowed vlan 1,640-641,850,904-905,907,909,913-915,918-919,950-952,954-955,959-960,963-964,983,991-992
spanning-tree port type edge trunk
vpc 5

 

We added the kernel parameter cphwd_medium_path_qid_by_mspi=0, but this doesn't change anything.
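
(For reference, the usual way to apply and persist such a parameter is shown below; a sketch that assumes it is an fw kernel parameter, so fwkern.conf is the right place; if it turns out to be a SecureXL/sim parameter it would belong in simkern.conf instead:)

fw ctl set int cphwd_medium_path_qid_by_mspi 0                              # apply at runtime
echo 'cphwd_medium_path_qid_by_mspi=0' >> $FWDIR/boot/modules/fwkern.conf   # persist across reboots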

As we have upgraded from R80.30, should we enable Priority Queues (prioq), which are currently disabled?

Regards,

 

Jan


Hi @Jan_Kleinhans 

As described by @Ilya_Yusupov, you can see that the MQ CPUs are very busy. The ixgbe network card driver supports a maximum of 16 queues, and you are currently using 8. Because your CoreXL instances are not fully utilized, you can use more cores for MQ. I would use a 12/20 (first step) or 16/16 (second step) core distribution here.
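
Roughly, that means shrinking the fwk affinity so the freed CPUs become SND/MQ cores. A sketch only, with assumed VS IDs and core lists; verify the exact fw ctl affinity syntax for your VSX setup before applying:

# Idea for a 12/20 split on this box: keep CPUs 0-5 and 16-21 for SND/MQ,
# pin the fwk instances of all VSs to the remaining cores (assumed syntax)
fw ctl affinity -s -d -vsid 0-24 -cpu 6-15 22-31
# verify the result
fw ctl affinity -l
mq_mng -o -vv

In Auto mode Multi-Queue follows the SND cores; with the Manual (8/8) setting shown above, the queue count may need to be raised separately.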

Employee+

I agree with @HeikoAnkenbrand that you can add more cores to MQ, but I also suspect, as @Timothy_Hall mentioned, that bond load sharing is not working well.

Collaborator

We changed Multi-Queue to 12 cores. Now the 2 high CPUs are at 70%, while the others stay at <20%.

LACP is configured as src-ip-dst on the Cisco Nexus switches and layer3+4 on the Check Point side.
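
Since the hash on the switch decides which bond member each flow arrives on, an IP-only hash on the Nexus side can concentrate a few busy address pairs onto one member (which would fit most traffic arriving via eth3-01). Aligning it with the layer3+4 policy would roughly look like this (a sketch; the exact keyword set varies by Nexus platform and NX-OS version):

port-channel load-balance src-dst ip-l4port

This only balances traffic across the bond members; how received traffic spreads across the RSS queues of a single interface is decided by the NIC hash, and a single large flow always stays on one queue.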

 

Regards,

Jan

Collaborator

Hello @HeikoAnkenbrand,

Thanks for your response. But why are only 2 MQ cores at high CPU? The others are at <20%.

 

Employee+

@Jan_Kleinhans ,

 

I will contact you offline for further investigation.

Advisor

Hello, was there ever a fix for this issue? I'm now facing a situation where an R80.40 Take 89 VSX has one SND core constantly around 100% while the others are around 10%.
