R80.10 SMS slowness

Cegeka_Networki · 2018-03-04

Hello,

One of our customers is using an R80.10 SMS in the Amazon cloud (AWS).

We are experiencing slowness in most operational tasks: connecting with SmartConsole takes more than a minute, moving between pages inside SmartConsole takes 5-10 seconds, loading a policy for display takes 5-10 seconds, loading objects in Object Explorer takes more than 10 seconds, etc.

The virtual machine has 32 GB of RAM, 8 CPUs, and 100 GB of disk space. See more in the attached file.

The SMS manages 11 gateways and also acts as the log server.

I'm wondering if that slowness can be reduced.

Could you please advise how to approach investigating this slowness, and whether there are any tips or guidelines we should check?

Thank you!

Adrian

Timothy_Hall · 2018-03-04

Your processor and memory specs look good, so it is probably an issue with the performance of the disk path. While performing operations in SmartConsole, run top. Do you see a lot of CPU time spent in "wa"? That means the CPU is sitting idle, blocked waiting for the disk path to respond. You can also look at this historically by running sar with no options; "wa" is labelled "%iowait" in the sar output.
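If you'd rather not eyeball the whole report, here is a minimal Python sketch that averages the %iowait column of a plain sar capture. It assumes the default sysstat CPU report layout (a header row naming %iowait, one row per interval for CPU "all", and an "Average:" summary row); the function name mean_iowait is purely illustrative.

```python
def mean_iowait(sar_text: str) -> float:
    """Average the %iowait column over the per-interval "all" CPU rows of `sar` output."""
    samples = []
    offset = None  # position of %iowait counted from the END of a row
    for line in sar_text.splitlines():
        fields = line.split()
        if "%iowait" in fields:
            # Header row: count from the right so an optional AM/PM token in
            # the timestamp column does not shift the index.
            offset = len(fields) - fields.index("%iowait")
            continue
        if offset is None or not fields or fields[0].startswith("Average"):
            continue  # preamble, blank line, or the summary row
        if "all" in fields and len(fields) >= offset:
            try:
                samples.append(float(fields[-offset]))
            except ValueError:
                pass  # skip rows where that column is not a plain number
    if not samples:
        raise ValueError("no %iowait samples found")
    return sum(samples) / len(samples)
```

For example, save the report on the SMS with sar > sar.txt, copy it to any machine with Python 3, and call mean_iowait(open("sar.txt").read()). Counting the column from the right keeps the same code working whether or not the timestamps carry AM/PM.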

Also, I assume your 8 CPUs are actually 8 discrete cores and not 4 hyperthreaded cores, as it is not currently recommended to enable hyperthreading on an SMS for performance reasons. This may not matter or be relevant in AWS, but I thought I'd throw it out there.
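One way to check the discrete-versus-hyperthreaded question is to compare logical processors with physical cores in /proc/cpuinfo. The sketch below is a Python illustration that assumes the "physical id" and "core id" fields are present; some hypervisors hide or flatten them, in which case the function falls back to reporting one core per logical processor. The name cpu_topology is hypothetical.

```python
def cpu_topology(cpuinfo_text: str):
    """Return (logical_processors, physical_cores) from /proc/cpuinfo text.

    Physical cores are counted as distinct ("physical id", "core id") pairs.
    If the hypervisor does not expose those fields, every logical processor
    is counted as its own core, so hyperthreading cannot be detected that way.
    """
    logical = 0
    cores = set()
    phys = core = None
    # Append an empty line so the last processor block is also closed.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one processor block.
            if phys is not None and core is not None:
                cores.add((phys, core))
            phys = core = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            phys = value.strip()
        elif key == "core id":
            core = value.strip()
    physical = len(cores) if cores else logical
    return logical, physical
```

Fed the contents of /proc/cpuinfo, a result such as (8, 4) would point to 4 cores with hyperthreading enabled, while (8, 8) suggests 8 discrete cores as far as the guest can tell. The first number should match grep -c ^processor /proc/cpuinfo.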

--
Second Edition of my "Max Power" Firewall Book
Now Available at http://www.maxpowerfirewalls.com

Attend my Gateway Performance Optimization R81.20 course
CET (Europe) Timezone Course Scheduled for July 1-2

Cegeka_Networki · 2018-03-04

Thank you, Tim, for the response.

I checked the wa values while performing some operations in SmartConsole. CPU 0 consistently showed wa values above zero, but never more than a few percent (I estimate the average was around 1%). CPU 1 also occasionally showed wa values above zero.

I have attached the output of the sar command. Do these values suggest the disk is the cause of the slowness?

Also please advise if there are other things we should check.

Many thanks,

Adrian

Timothy_Hall · 2018-03-04

Those numbers on your SMS look fine; I'm not sure why your access is so slow. Most of the heavy lifting is done on the SMS end, and SmartConsole just displays and manipulates what is sent to it. I assume the network between the SmartConsole workstation and the SMS is good? Try running some continuous large pings to the SMS from the same workstation that runs SmartConsole, while executing some particularly slow SmartConsole operations, as described in my book:

A great trick to help you determine whether a particular network path is experiencing
latency or loss is to send extra-large test packets with the ping command, which have a
knack for irritating any underlying network problems thus making them more
pronounced and easier to identify:

Gaia/Linux: ping -s 1400 129.82.102.32
Windows: ping -l 1400 129.82.102.32

Better yet, most Linux-based versions of the ping command also support a flood
option (-f) which instead of sending one echo request per second, will send a flood of
them as fast as it can and note how much loss and/or latency is encountered.
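To make the "loss or wild swings in latency" check less subjective, the captured ping output can be summarized with a short Python sketch. It assumes the usual reply formats of Gaia/Linux ping ("time=150 ms") and Windows ping ("time=150ms"), and that you know how many echo requests were sent; summarize_ping is an illustrative name, and its jitter figure is a simple mean of consecutive RTT differences rather than any standard estimator.

```python
import re
import statistics

# Per-reply RTT: "time=150 ms" (Gaia/Linux), "time=150ms" or "time<1ms" (Windows).
# The summary lines ("time 3004ms", "Average = 150ms") do not match.
RTT_RE = re.compile(r"time[=<]\s*([\d.]+)\s*ms")

def summarize_ping(output: str, sent: int) -> dict:
    """Loss and latency figures from captured ping output.

    `sent` is the number of echo requests issued. "jitter_ms" is the mean
    absolute difference between consecutive RTTs, a simple approximation
    rather than the RFC 3550 interarrival-jitter estimator.
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    rtts = [float(v) for v in RTT_RE.findall(output)]
    loss_pct = 100.0 * max(sent - len(rtts), 0) / sent
    if not rtts:
        return {"loss_pct": loss_pct}
    diffs = [abs(b - a) for a, b in zip(rtts, rtts[1:])]
    return {
        "loss_pct": loss_pct,
        "min_ms": min(rtts),
        "avg_ms": statistics.mean(rtts),
        "max_ms": max(rtts),
        "jitter_ms": statistics.mean(diffs) if diffs else 0.0,
    }
```

A healthy path would show near-zero loss and a jitter value that is small relative to the average RTT; large jitter, or a max_ms far above avg_ms, is the "wild swings" case.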

Do you see any packet loss or wild swings in latency? Also, please post the output of the following commands run on the SMS:

free -m

netstat -ni

grep -c ^processor /proc/cpuinfo

netstat -s


Cegeka_Networki · 2018-03-05

Tim,

I have attached the output of those commands as well as the ping results.

Indeed, there is a considerable distance between the PCs where we run SmartConsole (one in the UK and another in Sydney, Australia) and the SMS server, which is in the USA. No pings were dropped, as can be seen in the attached file.

I will follow up with the customer to see whether they have a server in AWS from which we can run SmartConsole, to check whether SmartConsole performs better from there.

Best Regards,

Adrian

Timothy_Hall · 2018-03-05

The network latency is a little high (~150 ms) but seems to be relatively stable with low jitter. There is no significant fragmentation, and everything looks healthy at the network level. There does seem to be a slightly unusual number of TCP RSTs, but I doubt they matter:

251183 connection resets received
173534 resets sent
111605 connections reset due to unexpected data
136180 connections reset due to early user close
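These counters are cumulative since the last boot, so a single snapshot says little about whether resets coincide with slow SmartConsole operations. A minimal Python sketch, assuming the net-tools netstat -s layout where each counter line starts with the number, can extract the reset counters from two snapshots (taken before and after a slow operation) and report the difference. The helper names tcp_reset_counters and reset_delta are hypothetical.

```python
import re

# A counter line in Linux `netstat -s` output: leading count, then a description.
COUNTER_RE = re.compile(r"^\s*(\d+)\s+(.*\S)\s*$")

def tcp_reset_counters(netstat_s: str) -> dict:
    """Map each reset-related counter description to its value."""
    counters = {}
    for line in netstat_s.splitlines():
        m = COUNTER_RE.match(line)
        if m and "reset" in m.group(2):
            counters[m.group(2)] = int(m.group(1))
    return counters

def reset_delta(before: str, after: str) -> dict:
    """Increase in each reset counter between two `netstat -s` snapshots."""
    old, new = tcp_reset_counters(before), tcp_reset_counters(after)
    return {name: value - old.get(name, 0) for name, value in new.items()}
```

If the deltas stay flat while SmartConsole is crawling, that would support the view that the resets are unrelated to the slowness.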

I suspect CPMI could be one of those protocols that do not handle relatively high latency very well, doing a lot of waiting around for application-level acknowledgements of operations. I'm very curious to see what happens when you run SmartConsole from inside AWS against the SMS, which is also inside AWS, instead of sending the CPMI traffic across a relatively high-latency Internet path.

For grins, run tail -f on $FWDIR/log/cpm.elg and $FWDIR/log/fwm.elg on the SMS while attempting some slow SmartConsole operations, to see whether any interesting error messages or warnings are being barfed into these files at that time.

One last thing to look into *might* be increasing FWASYNC_MAXBUF, but unless you are seeing the "fwasync_connbuf_realloc: Connection buffer overflow" warnings mentioned in sk109236 (High CPU / process crashes / timeout due to large database / first time operations / load ...), arbitrarily increasing this value is not recommended.
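Watching both files by eye works, but a small Python sketch can do roughly what tail -f does and highlight lines worth a closer look. The keyword list (error, warning, exception, fail) is an assumption, since the exact format of cpm.elg and fwm.elg messages is not shown in this thread; the only exact string matched is the fwasync_connbuf_realloc overflow warning quoted from sk109236. The names follow and classify are illustrative.

```python
import os
import re
import time
from typing import Iterator, Optional

# Exact warning text quoted from sk109236; everything else is a generic guess.
OVERFLOW_TEXT = "fwasync_connbuf_realloc: Connection buffer overflow"
SUSPICIOUS = re.compile(
    r"\b(errors?|warn(ings?)?|exceptions?|fail(ed|ure|s)?)\b", re.IGNORECASE
)

def classify(line: str) -> Optional[str]:
    """Label one log line: 'overflow', 'suspicious', or None if it looks routine."""
    if OVERFLOW_TEXT in line:
        return "overflow"
    if SUSPICIOUS.search(line):
        return "suspicious"
    return None

def follow(path: str, poll_s: float = 0.5) -> Iterator[str]:
    """Yield lines appended to `path` after it is opened, roughly like `tail -f`.

    Does not handle log rotation; a line still being written may be
    yielded in pieces.
    """
    with open(path, "r", errors="replace") as f:
        f.seek(0, os.SEEK_END)  # start at the current end of the file
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                time.sleep(poll_s)  # nothing new yet; poll again shortly
```

For example, iterate over follow(os.path.expandvars("$FWDIR/log/cpm.elg")) and print any line for which classify returns a label. Any "overflow" hits would be the signal that FWASYNC_MAXBUF is worth investigating; their absence argues for leaving it alone.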


Cegeka_Networki · 2018-03-06

Thank you for your great support.

The SmartConsole experience from a server in AWS is much better than from the UK or Sydney.

There was no noticeable delay.

I have attached the output of cpm.elg captured while performing some operational tasks in SmartConsole.

Please let me know if you see anything relevant in there. Afterwards, from my point of view, we can close this thread.

Timothy_Hall · 2018-03-06

There's nothing unusual going on in cpm.elg either; I'd say your SMS itself is fine. As I theorized earlier, CPMI must be one of those protocols whose operations can't be pipelined or parallelized, so high network latency slows it down: data is sent, and some kind of application-level acknowledgement must be received before proceeding. I seem to recall the ability to use compressed connections with older versions of SmartDashboard, but since the issue in this case is not a lack of bandwidth, I don't see how that would help. Early versions of SMB/CIFS suffered performance issues over high-latency networks as well, and were mentioned briefly in my book.
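The latency argument can be made concrete with a back-of-the-envelope model in Python. It assumes an operation consists of a fixed number of request/acknowledgement exchanges and ignores server processing and transfer time; the round-trip count is hypothetical, since nothing in this thread measures how many CPMI exchanges a SmartConsole operation actually needs. The pipelined variant shows what allowing several outstanding requests would buy, which, per the theory above, CPMI does not do.

```python
def serialized_time_s(round_trips: int, rtt_ms: float) -> float:
    """Network wait when each request must be acknowledged before the next is sent."""
    return round_trips * rtt_ms / 1000.0

def pipelined_time_s(round_trips: int, rtt_ms: float, window: int) -> float:
    """Idealized network wait when up to `window` requests may be in flight at once."""
    if window < 1:
        raise ValueError("window must be at least 1")
    batches = -(-round_trips // window)  # ceiling division
    return batches * rtt_ms / 1000.0
```

With a hypothetical 50 exchanges at the ~150 ms RTT measured here, serialized_time_s(50, 150.0) gives 7.5 seconds, which happens to land in the 5-10 second range reported; at an assumed 1 ms RTT inside AWS the same operation drops to 0.05 seconds. Bandwidth never appears in the formula, which is why compression or TCP tweaks would not be expected to help.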

I'll go ahead and tag Dameon Welch Abernathy here. Dameon, are there any internal resources on tuning CPMI for performance over high-latency networks? I suspect the delays are caused by how CPMI itself works, and that tweaking TCP (smaller MSS, more RTT tolerance, etc.) would not help.


PhoneBoy · 2018-03-06

I'm not sure much can be done to tune CPMI for higher-latency networks, but it can't hurt to ask around.

