Indeed, in R81.20 and earlier versions, the fwd process does most of its work on a single main thread.
BTW, fwd is often busy handling logs, but it also handles other functions, such as cluster sync.
In R82 we introduced a significant architectural improvement that we internally call "fwd scale-out". Instead of a single fwd process, there are now multiple fwd worker processes that take over most of the log processing from the main one.
The main benefit is higher log sending throughput, especially on gateways that have many cores and handle a lot of traffic. Another benefit is resiliency, since heavy logging is less likely to impact the other functionality that runs in fwd.
The improved log sending capacity pairs well with another feature on the Management / Log Server side called "log distribution", which lets you configure a gateway to distribute its logs across multiple Log Servers. This also improves resiliency and increases log ingestion capacity.