Kaspars_Zibarts
Authority

VSX VR (Virtual Router) high CPU on R80.30 (3.10) T155

It's been a while since I last worked with VRs (probably R67 VSX), so I'd like some opinions on virtual router CPU resources. We just retired our "jumbo jet" 41k chassis in favour of a 26k appliance. The actual upgrade went flawlessly despite the complexity: R76SP50 VSX > R80.30, a HW and SW upgrade, and the introduction of a VR for inter-VS traffic.

But now I'm facing a challenge with CPU consumption on the newly introduced virtual router. It seems to be single-threaded and is already chewing nearly 100% CPU:

[Screenshot: image.png]

 

I couldn't find any means of enabling multithreading on it.

It carries under 10 Gbps of traffic, combined across all interfaces.

 

And that's only about 1/3 of our regular traffic, because of Corona restrictions.

Any ideas if there's anything we can do about it?

5 Replies
Alex-
Advisor

Hi Kaspars,

 

I'm also planning to install a VR following a migration from blade chassis to 26K, so this concerns me a bit. I will follow this thread.

Overall I find VS operations noticeably slower on 3.10 than on 2.6 (cpstop/cpstart, vsx_util reconfigure).

I haven't had to use a VR in production yet. Can't they be fine-tuned like a VS in terms of capacity?

 

Kind regards,

Alex

Timothy_Hall
Champion

That does seem odd.  But one of the benefits of moving the Firewall Workers into process space is the ability to look more closely at what they are doing.  The worker could be busy-waiting for something or trying to access some kind of contentious resource over and over.  Try these tools to gain some insight into what the heck that process is so busy doing:

lsof -p 180383 (show all open file descriptors)

top -Hbn1 -p 180383 (check utilization of individual threads of a process)

pstree (to see child processes and relationships)

peekfd (new tool in Gaia 3.10 to monitor process file descriptors read/writes)

strace (monitor all process system calls in real-time, not installed by default so get it here for Gaia 3.10: http://vault.centos.org/3.8/os/x86_64/RedHat/RPMS/strace-4.5.14-0.EL3.1.x86_64.rpm)
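As a rough illustration of the `top -Hbn1` step in the list above, the per-thread output can be filtered down to just the busy threads. This is only a sketch: `hot_threads` is a hypothetical helper name, the 50% threshold is arbitrary, and it assumes the default `top` column layout in which %CPU is the 9th field and the command name is the 12th.

```shell
# Hypothetical helper: print "PID %CPU COMMAND" for every thread whose %CPU
# exceeds a threshold, reading `top -H -b -n 1` output on stdin.
# Assumes the default top layout: field 9 is %CPU, field 12 is COMMAND.
hot_threads() {
  local threshold="${1:-50}"
  awk -v t="$threshold" '
    $1 == "PID" { in_table = 1; next }   # header row marks the start of the thread table
    in_table && NF >= 12 && ($9 + 0) > t { print $1, $9, $12 }
  '
}

# Usage on the gateway (PID taken from the post above):
# top -Hbn1 -p 180383 | hot_threads 50
```

If only one thread ever shows up above the threshold, that would be consistent with the worker being effectively single-threaded; several busy threads would point elsewhere.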

 

New 2021 IPS/AV/ABOT Immersion Self-Guided Video Series
now available at http://www.maxpowerfirewalls.com
Kaspars_Zibarts
Authority

@Alex- Unfortunately there is no official option to add more workers to a VR. But I just talked to R&D, and we might do exactly that manually on the GW as a workaround while the actual root cause is investigated. And yes, I totally agree: VSX start/stop times are horrendous on the 3.10 kernel. I'm hoping that will improve soon; startup appears to be entirely sequential, and hopefully it can be made to run in parallel.

--

Had a long session with R&D. It seems there is a bug when traffic is passed from Node-A > VS1 > VR > VS2 > Node-B: in that case traffic is not accelerated in the VR and gets passed on to the VR's fwk, which is how we managed to push it to 100% with approx. 10 Gbps of traffic.

The expectation, of course, was that traffic would be accelerated in the VR and never hit the fwk.

It works correctly for "external" case: Node-A > VS1 > VR > Ext_router > Node-B or vice-versa. In this case traffic is accelerated via VR.

As an alternative workaround, we are considering re-introducing a virtual switch between the two busiest VSes, as acceleration would then work.

I'll send an update once I have something useful.

 

Alex-
Advisor

Many thanks for the update @Kaspars_Zibarts, I will take that into account for the solution.

 

Kaspars_Zibarts
Authority

I need to correct my earlier statement about the potential issue. R&D says:

"we believe that the problem is that when packet arrive to VR on SND it doesn’t accelerate but rather F2F"
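One way to see whether packets are going F2F rather than being accelerated is to look at the SecureXL summary counters in the VR's context. The sketch below is an assumption-heavy illustration, not something from R&D: `f2f_percent` is a hypothetical helper, it assumes `fwaccel stats -s` prints a line of the form `F2Fed pkts/Total pkts : 900/1000 (90%)`, and the VS ID in the usage comment is a placeholder for whatever ID the VR has on your box.

```shell
# Hypothetical helper: pull the F2F percentage out of `fwaccel stats -s`
# style output on stdin.
# Assumes a line shaped like: "F2Fed pkts/Total pkts : 900/1000 (90%)".
f2f_percent() {
  sed -n 's/^F2Fed pkts\/Total pkts[[:space:]]*:[[:space:]]*[0-9]*\/[0-9]*[[:space:]]*(\([0-9]*\)%).*/\1/p'
}

# Usage in the VR's context (VS ID 5 is a placeholder, not from this thread):
# vsenv 5 && fwaccel stats -s | f2f_percent
```

A value close to 100 for the VR while inter-VS traffic is flowing would match the behaviour described in the quote; a low value would suggest the load comes from somewhere else.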

We have a workaround in place for now: we connected the two busiest VSes using a virtual switch instead.

I'll update once we have a permanent solution. Will write a separate article about 26k performance thoughts 🙂
