@Wolfgang has done a great job of assessing your situation. After looking at all your outputs there are a couple of different things slowing you down:
1) You said that the DirectAccess traffic is IPSec tunneled inside HTTPS, but I strongly suspect it is the other way around: HTTPS tunneled inside IPSec. Only TCP and UDP traffic can be throughput-accelerated (or templated) by SecureXL, so the IPSec traffic traversing the firewall is one cause of the high PXL even though only the "fw" blade is enabled; PXL traffic must always be handled on a firewall worker/instance.
2) As Wolfgang said if you have non-tunneled CIFS traffic traversing the firewall, CIFS is its own protocol that can't be accelerated by SecureXL and will account for even more PXL traffic.
3) The IPSec traffic in PXL appears to be getting handled on the unfortunate fw worker instance #27, which is either CPU #2 or #3 depending on which command you are running. I suspect that instance is saturated because all the IPSec traffic runs between a single pair of IP addresses and is essentially one big elephant flow, as Wolfgang mentioned. That single firewall instance hitting 100% is your bottleneck; the rest of the firewall is very idle, as you noted.
4) In your R80.30 release the PXL packets of a single connection can only be handled on one firewall worker at a time, and load cannot be shared or shed from a saturated instance. All the Dynamic Dispatcher can do is steer new connections clear of the saturated instance; the connections/flows already on it are stuck there. You can use the fw ctl multik gconn command to see which connections are assigned to the saturated instance (number 27 in your scenario), and I can pretty much guarantee you will see the IPSec "connection" elephant flow there. The fw ctl multik print_heavy_conn command shows all current connections the firewall has classified as heavy (elephant flows), as well as those detected over the last 24 hours.
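To illustrate, the two commands would be run in expert mode on the gateway roughly like this (exact output columns vary by Jumbo HFA level, so treat this as a sketch rather than a guaranteed output format):

```shell
# List current connections and the firewall worker instance each one
# is assigned to; look for entries on the saturated instance (27)
fw ctl multik gconn

# Show connections the firewall has classified as heavy (elephant
# flows), both active now and those detected over the last 24 hours
fw ctl multik print_heavy_conn
```

The 5-tuple you see for the heavy IPSec flow here is what you will need when building a fast_accel rule later.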
So that brings us to what you can do about it now with your current code level.
1) As Wolfgang mentioned, you will want to identify the attributes of the elephant flow and use the fast_accel feature to fastpath it through SecureXL on an SND/IRQ core. Because the IPSec traffic is not going F2F this should work. Your performance will definitely increase, but you may then saturate a single SND/IRQ core, which can cause other nasty effects such as packet loss due to RX-DRP.
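A minimal sketch of the fast_accel setup, assuming the sk156672 syntax (src dst dport proto); the two addresses below are placeholders for your actual DirectAccess endpoints, so substitute your own values and double-check the wildcard syntax against sk156672 for your Jumbo HFA level:

```shell
# Turn on the fast_accel feature (sk156672)
fw ctl fast_accel enable

# Fastpath the elephant flow between the two DirectAccess endpoints;
# 192.0.2.10 / 198.51.100.20 are placeholder addresses
fw ctl fast_accel add 192.0.2.10 198.51.100.20 any any

# Confirm the rule was added to the fast_accel table
fw ctl fast_accel show_table
```

Note that in R80.30 these rules may not survive a reboot, so plan to reapply them (or script them) after maintenance windows.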
2) So I'd recommend enabling Multi-Queue on all interfaces handling this IPSec elephant flow to avoid RX-DRP. To some degree Multi-Queue should be able to spread the load out a bit among the SND/IRQ cores.
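On R80.30 the cpmq tool manages Multi-Queue; a rough sketch of checking and enabling it (the set operation is menu-driven on this release, and my recollection is that a reboot is needed for the change to take effect, so verify against the R80.30 Performance Tuning guide first):

```shell
# Show current Multi-Queue status for all supported interfaces
cpmq get -a

# Enable Multi-Queue on the interfaces carrying the elephant flow
# (interactive menu on R80.30; reboot afterward)
cpmq set
```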
3) As a result of doing this, hopefully almost all traffic through the firewall will be fully accelerated and handled by the 4 SND/IRQ cores with the 28 instances sitting idle since there is now little PXL and F2F traffic. You will probably want to change your 4/28 CoreXL split to 8/24 but this may not be necessary if you don't see any SND/IRQ cores getting saturated after the change.
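Before touching the split, it is worth confirming where things stand; these are standard Gaia/Check Point commands for inspecting the current core layout and live utilization:

```shell
# Show which cores are assigned as SND/IRQ vs. firewall worker instances
fw ctl affinity -l -r

# Watch per-core utilization live (CPU screen) to see whether the
# SND/IRQ cores are actually getting saturated after the change
cpview

# The CoreXL split itself is adjusted via the cpconfig menu
# ("Check Point CoreXL"), followed by a reboot
cpconfig
```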
Next let's discuss how this situation is handled in future releases such as R80.40+.
1) R80.40 introduced the concept of Dynamic Split but it is disabled by default; it is enabled by default in R81 and later. This will auto-adjust your core split on the fly if all SND/IRQ cores or all Firewall Workers start getting saturated. This wouldn't really help your current scenario.
2) What would help is that R80.40 introduced a set of new paths tagged with the "pipeline" moniker (visible with fwaccel stats -s) that I believe are still disabled by default in the latest R80.40 Jumbo HFAs. However I think these new pipeline paths will be enabled by default in R81.10 (they may be in R81 GA as well, not sure). The pipeline paths allow limited "spreading" of elephant flow load across multiple worker cores to avoid a single worker getting saturated with all the connections trapped on it.
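The path breakdown mentioned above is visible with a single command; the path names shown (and whether the pipeline paths appear at all) depend on the release and Jumbo HFA level:

```shell
# Summarize what percentage of traffic is being handled in each
# SecureXL path (accelerated, PXL/PSLXL, F2F, and on R80.40+ the
# new pipeline paths when they are enabled)
fwaccel stats -s
```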
3) In R80.40 and later Multi-Queue is enabled on all interfaces by default (except the management interface in R80.40 GA and the early Jumbo HFAs).
Whew that got a lot longer than I intended, let me know if you have any questions.
Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com