Kaspars Zibarts

Security Gateway Performance Optimization - VSX

Discussion created by Kaspars Zibarts on Oct 1, 2018
Latest reply on Nov 21, 2018 by Kaspars Zibarts

This is feedback on the awesome session by Tim Hall, TechTalk: Security Gateway Performance Optimization with Tim Hall. It was a great presentation, thanks heaps!

I thought it would be worthwhile adding some comments on VSX, as it has its own little idiosyncrasies when it comes to SecureXL (SXL), CoreXL, SMT (hyper-threading) and Multi-Queue (MQ). In general, every technology point Tim discussed applies to VSX too, and it is vital to understand all of them before you start digging into VSX tuning.

As always, VSX is deployed in so many different ways that general recommendations are impossible; these are merely some "gotchas" learned over years of using it.



Watch out for under-dimensioned VSes. By default, a VS connections table is limited to 15,000 concurrent connections, which is rather low. When aggressive aging (AA) kicks in, it may impact legitimate traffic: for example, we had issues with some very specific RDP-over-SSL traffic once aggressive aging started. Keep a close eye on fw ctl pstat.
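To put numbers on the sizing concern, here is a minimal Python sketch that compares a VS's current and peak connection counts with its table limit. The 15,000 default comes from the text above; the 80% aggressive-aging trigger, the class and function names, and the sample figures are all assumptions for illustration, so check the real thresholds configured for your VSes.

```python
from dataclasses import dataclass

# Default per-VS connections table limit mentioned in the post.
DEFAULT_CONN_LIMIT = 15_000
# ASSUMPTION: aggressive aging starts at 80% of the table; check your own settings.
ASSUMED_AA_THRESHOLD_PCT = 80


@dataclass
class VsConnections:
    name: str
    current: int  # concurrent connections now
    peak: int     # highest concurrent connections observed
    limit: int = DEFAULT_CONN_LIMIT


def headroom_report(vs: VsConnections, aa_pct: int = ASSUMED_AA_THRESHOLD_PCT) -> dict:
    """Summarise table utilisation and whether the peak hit the assumed AA trigger."""
    return {
        "vs": vs.name,
        "current_util_pct": round(100 * vs.current / vs.limit, 1),
        "peak_util_pct": round(100 * vs.peak / vs.limit, 1),
        # Integer comparison avoids float rounding: peak / limit >= aa_pct / 100
        "aa_risk": vs.peak * 100 >= aa_pct * vs.limit,
        # Smallest limit that keeps the observed peak strictly below the trigger.
        "suggested_min_limit": (vs.peak * 100) // aa_pct + 1,
    }


if __name__ == "__main__":
    # Hypothetical figures, e.g. noted by hand while watching fw ctl pstat on the VS.
    print(headroom_report(VsConnections("VS3", current=9_000, peak=13_000)))
```

With these hypothetical figures the peak sits at about 86.7% of the table, so the sketch flags it and suggests a limit of at least 16,251. In practice you would round that up generously rather than size to the exact minimum.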



Make sure you are using R80+ to the full by switching VSes to 64-bit mode (if this is not already the default), which avoids running short of memory for connections.



By default, all your VSes share the same pool of fw workers. On one hand this is an easy-to-maintain approach, but if you have critical VSes, I would recommend protecting them by allocating them dedicated CPU resources, so that "elephant connections" and/or "elephant VSes" cannot kill access to your critical resources.


Default split between SXL (0-3) and CoreXL (4-23) on VSX


And a dedicated approach: VS0, 1, 2 and 5 on core 4; VS3, 4 and 7 on core 7; VS6 on cores 12-21; and VS8 using the default (all). It's just an example!
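One way to keep such an allocation readable is to treat it as a map from VS ID to the set of cores its fw workers may use. The Python sketch below models the example allocation (the helper names are hypothetical) and inverts it to show which VSes compete on each core. It assumes the default pool is CoreXL cores 4-23 from the default split above, and that this pool is not shrunk when other VSes get dedicated cores; it does not read or apply real affinity settings.

```python
from collections import defaultdict

# ASSUMPTION: default CoreXL pool taken from the default split (SXL 0-3, CoreXL 4-23).
DEFAULT_POOL = frozenset(range(4, 24))

# The example allocation from the post; None means "use the default pool".
ALLOCATION = {
    0: {4}, 1: {4}, 2: {4}, 5: {4},
    3: {7}, 4: {7}, 7: {7},
    6: set(range(12, 22)),  # cores 12-21 inclusive
    8: None,
}


def effective_cpus(allocation):
    """Resolve each VS to the concrete set of cores it may use."""
    return {vs: frozenset(cpus) if cpus is not None else DEFAULT_POOL
            for vs, cpus in allocation.items()}


def vs_per_core(allocation):
    """Invert the map: for each core, which VSes compete for it."""
    per_core = defaultdict(set)
    for vs, cpus in effective_cpus(allocation).items():
        for cpu in cpus:
            per_core[cpu].add(vs)
    return dict(per_core)


def neighbours(allocation, vs):
    """VSes that share at least one core with the given VS."""
    resolved = effective_cpus(allocation)
    mine = resolved[vs]
    return {other for other, cpus in resolved.items() if other != vs and cpus & mine}


if __name__ == "__main__":
    cores = vs_per_core(ALLOCATION)
    print("core 4:", sorted(cores[4]))
    print("core 12:", sorted(cores[12]))
    print("VS6 shares cores with:", sorted(neighbours(ALLOCATION, 6)))
```

Under those assumptions, VS8 on the default pool still shares cores with every dedicated VS (core 4 with VS0, 1, 2 and 5; core 12 with VS6), which is worth keeping in mind if VS8 is the one that turns into an "elephant".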



If you do decide to go with strict resource allocation, i.e. dedicated CPU cores for each VS, create a "map" of your CPUs to visualize the allocation. This is especially important with hyper-threaded CPU cores, as the numbering is no longer sequential. A couple of rules to keep in mind for the best performance:

  • do not split one VS across two physical cores (see VS4 and VS5 below)
  • allocate both the "main" core and its HT sibling, not just main cores or just HT instances (for example, below VS0 has cores 4 and 28 allocated, not 4 and 5)
  • keep highly loaded VSes on the same physical CPU as SXL, as you may gain from caching
  • we saw the best performance with a one-to-one mapping of each fw worker of a VS to a CPU core, but that can eat up all your available cores quickly.
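The HT sibling rule is easy to check mechanically once you know how siblings are numbered. This Python sketch assumes 48 logical CPUs where CPU n and CPU n + 24 are siblings, which matches the "4 and 28" example but should be verified on your own gateway (on Linux-based systems the sysfs file /sys/devices/system/cpu/cpuN/topology/thread_siblings_list typically lists them). VS9 and all function names are hypothetical.

```python
from __future__ import annotations

# ASSUMPTION: 48 logical CPUs numbered so that CPU n and CPU n + 24 are HT
# siblings (consistent with the "4 and 28" example). Verify on your own box.
LOGICAL_CPUS = 48
PHYSICAL_CORES = LOGICAL_CPUS // 2


def sibling(cpu: int) -> int:
    """Return the HT sibling of a logical CPU under the assumed numbering."""
    if not 0 <= cpu < LOGICAL_CPUS:
        raise ValueError(f"CPU {cpu} is out of range")
    return (cpu + PHYSICAL_CORES) % LOGICAL_CPUS


def parse_cpu_list(text: str) -> set[int]:
    """Parse a Linux-style CPU list such as '4,28' or '12-21,36-45'."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus


def sibling_violations(allocation: dict[int, set[int]]) -> dict[int, set[int]]:
    """Per VS, the allocated CPUs whose HT sibling is not allocated to the same VS."""
    problems: dict[int, set[int]] = {}
    for vs, cpus in allocation.items():
        missing = {cpu for cpu in cpus if sibling(cpu) not in cpus}
        if missing:
            problems[vs] = missing
    return problems


if __name__ == "__main__":
    allocation = {
        0: parse_cpu_list("4,28"),  # main core plus its HT sibling: follows the rule
        9: parse_cpu_list("6,7"),   # hypothetical VS on two main cores: breaks it
    }
    print(sibling_violations(allocation))
```

Here VS0 passes, while the hypothetical VS9, given two main cores (6 and 7) instead of a sibling pair, is reported with both of its CPUs.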

Below is a sample map of a 48-core hyper-threaded system.



We learned the hard way, especially before the R80.10 release, that pepd and pdpd (the Identity Awareness daemons) can go "nuts" and consume a lot of CPU, so to avoid any impact on fw workers we put them on dedicated cores.
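Reserving cores for pepd and pdpd is ultimately bookkeeping: whatever you carve out is no longer available to fw workers. This hypothetical Python sketch builds a role map for a 24-CPU gateway using the default SXL cores 0-3 shown earlier and, purely as an assumption, CPUs 22-23 for the daemons. It is for planning only; the actual affinity is set on the gateway with Check Point's own tooling, such as fw ctl affinity.

```python
from __future__ import annotations

from collections import Counter

# ASSUMPTION: a 24-CPU VSX gateway with SecureXL on CPUs 0-3 (the default split)
# and two hypothetical cores set aside for pepd/pdpd.
TOTAL_CPUS = 24
SXL_CPUS = frozenset(range(0, 4))
DAEMON_CPUS = frozenset({22, 23})


def build_cpu_map(total: int, sxl: frozenset[int], daemons: frozenset[int]) -> dict[int, str]:
    """Label each logical CPU; whatever is not reserved is left for fw workers."""
    overlap = sxl & daemons
    if overlap:
        raise ValueError(f"CPUs reserved twice: {sorted(overlap)}")
    out_of_range = {cpu for cpu in sxl | daemons if not 0 <= cpu < total}
    if out_of_range:
        raise ValueError(f"CPUs out of range: {sorted(out_of_range)}")
    roles = {}
    for cpu in range(total):
        if cpu in sxl:
            roles[cpu] = "SXL"
        elif cpu in daemons:
            roles[cpu] = "pepd/pdpd"
        else:
            roles[cpu] = "fw worker"
    return roles


if __name__ == "__main__":
    cpu_map = build_cpu_map(TOTAL_CPUS, SXL_CPUS, DAEMON_CPUS)
    print(Counter(cpu_map.values()))
    print([cpu for cpu, role in cpu_map.items() if role == "fw worker"])
```

With these assumptions, 18 CPUs (4-21) remain for fw workers; the same map is a convenient starting point for the per-VS allocation discussed above.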


I realise that I'm digging my own grave here, but I'll probably learn something new.