HeikoAnkenbrand
Champion Champion
Champion

High CPU utilization during process fwk0_dev_0 (UMFW vs. KMFW)

What could cause the process fwk0_dev_0 to run at 300%-400% CPU utilization on a 16-core open server running R80.30? In older versions, a process named "fwk0_dev" indicates VSX, but VSX is not enabled on this system.

What does this process do?

Who can help here?

# top

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 19437 admin      0 -20 18.402g 0.015t 538544 S 309.9  1.0   8626:21 fwk0_dev_0
 19831 admin     20   0  755376 186344  41628 S   5.6  0.0 459:42.51 fw_full

# pstree -l | grep -A 15 -B 15 fwk0_dev_0

-cprid_wd---cprid
     |-cpwd-+-AutoUpdaterServ-+-AutoUpdater---22*[{AutoUpdater}]
     |      |                 `-sleep
     |      |-DAService_scrip-+-DAService---5*[{DAService}]
     |      |                 `-sleep
     |      |-avi_del_tmp_fil---sleep
     |      |-ci_http_server
     |      |-cpd---4*[{cpd}]
     |      |-cphamcset
     |      |-cpview_services
     |      |-cpviewd---{cpviewd}
     |      |-4*[dlpu]
     |      |-fw_full-+-in.acapd---{in.acapd}
     |      |         |-in.asessiond---{in.asessiond}
     |      |         |-in.msd---3*[{in.msd}]
     |      |         |-stormd
     |      |         |-ted---temain---46*[{temain}]
     |      |         |-usrchkd
     |      |         |-vpnd---3*[{vpnd}]
     |      |         `-10*[{fw_full}]
     |      |-fwk_forker---fwk0_dev_0---33*[{fwk0_dev_0}]
     |      |-fwk_wd
     |      |-lpd

➜ CCSM Elite, CCME, CCTE
57 Replies
_Val_
Admin
Admin

An additional note: this discussion has already been escalated to R&D, but it is already the weekend in TA, so an official response is expected next week.

Thanks for your understanding.

HeikoAnkenbrand
Champion Champion
Champion

Thanks!

 

➜ CCSM Elite, CCME, CCTE
Wolfgang
Authority
Authority

This is not just a misconfiguration problem. We have 4 open server installations, and all of them have UMFW enabled.

They have only 16 and 8 cores (real cores, not just licensed ones). No one enabled UMFW on these systems; the installation script did.

This behaviour and the high CPU utilization should be explained by R&D or TAC.

We have had this problem for more than a week, and the only response so far has been this discussion here on CheckMates. I'm happy that I'm not the only one affected, and I hope we get an answer soon. Knowing that R&D is already involved is good news.

Wolfgang

PS: For my personal convenience, because it is Christmas 😉: please cool down, guys. This is only a technical problem; it has to be solved and explained by Check Point, and we should not fight here with words.

Samuel_T_
Explorer

We have the same problem with some of our customers' firewalls.

Zoltan_Polowsky
Participant

FYI:

Same issue on HPE DL380 Gen10

- 2 x Intel Xeon-Platinum 8268 (2.9 GHz/24-core/205 W)

- Check Point license for 16 cores

During installation of R80.30 with kernel 3.10, USFW is selected on the firewall with 48 cores, but only 16 of the 48 cores are used for the firewall.

 

 

Timothy_Hall
Champion
Champion

Thanks. So it sounds like USFW is enabled if more than 40 cores are present during install/upgrade, even if fewer cores actually end up being licensed later on open hardware, and the number of licensed cores is not checked again later as far as USFW activation is concerned. This is similar to the trial license "core crunch" effect described in the second edition of my "Max Power" book.

Of course this never happens with Check Point gateway appliances, which are always licensed for the same number of cores as are physically present...

 

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
HeikoAnkenbrand
Champion Champion
Champion

Hi @Timothy_Hall,

To confirm this, I called a friend (he's an HP dealer) and asked him if he had an HP DL380 with more than 40 cores in his company :-) Two hours later we were sitting in his lab installing R80.30 on this system.

Result of my test:

GAIA version / Kernel / Cores             | Firewall mode                                        | Check
R80.30 kernel 3.10, more than 35 cores    | UMFW is enabled                                      | checked on HP DL 380 G10, 2 * Platinum 8180M processor with 28 cores = 56 cores
R80.30 kernel 3.10, fewer than 35 cores   | KMFW is enabled                                      | checked on HP DL 380 G10, 1 * Platinum 8180M processor with 28 cores
R80.30 kernel 2.6                         | KMFW is enabled                                      | checked on VMWare with 30 cores and with 46 cores
R80.40 EA kernel 3.10                     | UMFW is enabled (I think UMFW is enabled by default) | checked on VMWare with 4 cores

 

If any of this information is incorrect, please let me know and I will change it in the article.

If you install R80.30 kernel 3.10 on a system with more than 40 cores, UMFW is used. If there are fewer cores, the "Kernel Mode Firewall" is used. In my tests, only the number of cores at installation time mattered for the selection of the mode (UMFW vs. KMFW). We tested this, as described above, with 28 cores versus 56 cores.

➜ CCSM Elite, CCME, CCTE
DE_Hans1205
Explorer

Hello, Heiko,

I think I have read the whole thread by now, but I haven't seen this tip:
In a remote session, a support technician showed me that you can switch the view in "top" with SHIFT + H, so that the single fwk0_dev_0 process with very high CPU load is shown as separate fwk0_x entries again. Perhaps quite helpful.

Furthermore, CP recommends staying in user mode even though we only have 8 licensed cores, unfortunately without any explanation. Opinions?

Izhar_Shoshani_
Employee
Employee

Hi,

 

In R80.30 with kernel 3.10, open servers always load in USFW mode.

If the open server has fewer than 35 fw instances, it is safe to move to kernel mode even on R80.30 with kernel 3.10.

The number of fw instances is derived from the number of cores on the server and the number of cores defined by the license.

If you want to change the mode from USFW to kernel mode, this can be done by changing the registry parameter FwIsUsermode with the cpprod_util command.
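As a minimal sketch, the current mode can be read with the `cpprod_util FwIsUsermode` query that is used in the CLI transcript in this thread, where a returned value of 1 means USFW is enabled. The helper name `describe_fw_mode` is hypothetical, and treating 0 as kernel mode is an assumption. The exact syntax for changing the parameter is not given in this thread, so it is not shown here.

```shell
#!/bin/sh
# Translate the value printed by `cpprod_util FwIsUsermode` into a label.
# 1 = USFW, as shown in this thread. 0 = kernel mode is an assumption.
describe_fw_mode() {
    case "$1" in
        1) echo "USFW (user mode firewall)" ;;
        0) echo "KMFW (kernel mode firewall)" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Query the registry only if the Check Point tool exists on this machine.
if command -v cpprod_util >/dev/null 2>&1; then
    describe_fw_mode "$(cpprod_util FwIsUsermode)"
fi
```

On a gateway this prints one line describing the active mode; on any other machine it prints nothing.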

 

In USFW, the fw instances are threads of the fwd_dev, so by default top shows the CPU utilization of all threads under the main thread.

top can also display the utilization per thread.

USFW and VSX share the same architecture of fwk instances, so the fact that you see fwk doesn't mean you have a VSX gateway.

 

 

_Val_
Admin
Admin

@Izhar_Shoshani_ thanks for this

I have a couple of follow-up questions:

 

1. Why is UMFW enabled by default when installing open servers?

2. Heiko and others in this thread say that in UMFW mode CPU usage is higher than in kernel mode for the same type of traffic. Is this true? If it is, why?

B_P
Advisor

@_Val_ wrote:

I have a couple of follow-up questions:
1. Why is UMFW enabled by default when installing open servers?
2. Heiko and others in this thread say that in UMFW mode CPU usage is higher than in kernel mode for the same type of traffic. Is this true? If it is, why?

@_Val_ did you ever find answers to your two questions?

I would also like to find out if there is any real benefit to running in UMFW mode even when you have <35 cores.

_Val_
Admin
Admin
HeikoAnkenbrand
Champion Champion
Champion

Hi @Izhar_Shoshani_ 

thanks for your answer. I also have a couple of follow-up questions:

For my tests I switched a lab firewall (R80.30 kernel 3.10, 4 cores, under VMWare) to UMFW.

1) Is this really the process fwd_dev you describe? I can't find any process named fwd_dev with "top" or "ps" on the firewall. I think this is the process "fwk0_dev_0". I can see more of its threads with top if USFW is enabled (see CLI commands).

2) What are the following processes responsible for, and what is their function?

    - fwk0_X
    - fwk0_dev_X
    - fwk0_kissd
    - fwk0_hp

Thinking out loud:

If I use cpconfig to change the number of CoreXL cores, the number of fwk0_X and fwk0_dev_X entries (X stands for the CoreXL instance) also changes. I think those are the firewall worker threads in USFW. If this is the case, I can use the command "top -Hbn1 -p <process ID of fwk0_dev_0>" to see how the firewall worker threads are being utilized. When I look at the process fwk0_dev_0 with top, I see a very high total load. This could be the sum of the utilization of all threads of the process fwk0_dev_0.

A small sample calculation for the utilization of process fwk0_dev_0:

$$\text{fwk0\_dev\_0} = \sum_{x=0}^{\text{max\_CoreXL\_number}} \text{fwk0\_}x + \sum_{x=0}^{\text{max\_CoreXL\_number}} \text{fwk0\_dev\_}x + \text{fwk0\_kissd} + \text{fwk0\_hp}$$

This would explain the high utilization of the process. If that is the case, I have answered one of my questions myself 🙂

I'd appreciate it if we could work this out. Thanks in advance.

 

CLI commands:

-------------------------------------------------------------------------

[Expert@GW2:0]# cpprod_util FwIsUsermode
1        >>> USFW is enabled

[Expert@GW2:0]# top

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10219 admin 0 -20 1070m 449m 134m S 2 24.0 0:17.19 fwk0_dev_0        >>> fwk0_dev_0 process id 10219

[Expert@GW2:0]# top -Hbn1 -p 10219        >>> threads for process 10219 (fwk0_dev_0)

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10219 admin 0 -20 1070m 449m 134m S 0 24.0 0:03.49 fwk0_dev_0
10220 admin 0 -20 1070m 449m 134m S 0 24.0 0:00.00 fwk0_kissd
10436 admin 0 -20 1070m 449m 134m S 0 24.0 0:00.57 fwk0_0
10437 admin 0 -20 1070m 449m 134m S 0 24.0 0:00.64 fwk0_1
10438 admin 0 -20 1070m 449m 134m S 0 24.0 0:00.67 fwk0_2
10439 admin 0 -20 1070m 449m 134m S 0 24.0 0:00.80 fwk0_3
10440 admin RT -20 1070m 449m 134m S 0 24.0 0:00.76 fwk0_hp
10441 admin 0 -20 1070m 449m 134m S 0 24.0 0:00.15 fwk0_dev_1
10442 admin 0 -20 1070m 449m 134m S 0 24.0 0:00.09 fwk0_dev_2
10443 admin 0 -20 1070m 449m 134m S 0 24.0 0:00.09 fwk0_dev_3

# pstree -p 10219
fwk0_dev_0(10219)-+-{fwk0_dev_0}(10220)
                  |-{fwk0_dev_0}(10436)
                  |-{fwk0_dev_0}(10437)
                  |-{fwk0_dev_0}(10438)
                  |-{fwk0_dev_0}(10439)
                  |-{fwk0_dev_0}(10440)
                  |-{fwk0_dev_0}(10441)
                  |-{fwk0_dev_0}(10442)
                  `-{fwk0_dev_0}(10443)
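
To check the "sum of all threads" idea against real numbers, the per-thread %CPU values from `top -H` batch output can be added up and compared with the process-level value. The following is only a sketch: the function name `sum_thread_cpu` is hypothetical, it assumes %CPU is the ninth column as in the top layout shown above, and the sample values are invented for illustration, not measured.

```shell
#!/bin/sh
# Sum the %CPU column (field 9 in the top layout shown above) over all
# thread rows of `top -H -b -n 1 -p <PID>` output read from stdin.
sum_thread_cpu() {
    awk '
        $1 == "PID" { in_table = 1; next }           # header row marks the start of the thread table
        in_table && $1 ~ /^[0-9]+$/ { total += $9 }  # add the %CPU value of each thread row
        END { printf "%.1f\n", total }
    '
}

# Hypothetical sample values, for illustration only (not measured data).
sample='PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10219 admin 0 -20 1070m 449m 134m S 1.0 24.0 0:03.49 fwk0_dev_0
10436 admin 0 -20 1070m 449m 134m S 80.5 24.0 0:00.57 fwk0_0
10437 admin 0 -20 1070m 449m 134m S 75.5 24.0 0:00.64 fwk0_1
10440 admin RT -20 1070m 449m 134m S 2.0 24.0 0:00.76 fwk0_hp'

printf '%s\n' "$sample" | sum_thread_cpu   # prints 159.0
```

On a gateway the same function could be fed live data, for example `top -H -b -n 1 -p 10219 | sum_thread_cpu`, with the PID replaced by that of fwk0_dev_0.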

 

 

 

 

➜ CCSM Elite, CCME, CCTE
Izhar_Shoshani_
Employee
Employee

Hi,

 

Yes, it is fwk0_dev. It was a typo. In VSX we have an fwk_dev per VS, so it is fwk0_dev, fwk1_dev ...

Your calculation is right, it is the sum of the threads 🙂

 

Thanks,

Izhar

 

HeikoAnkenbrand
Champion Champion
Champion

Hi @Izhar_Shoshani_,

THX

One question!

What are the following processes responsible for and what is the function?

- fwk0_X
- fwk0_dev_X
- fwk0_kissd
- fwk0_hp

 

➜ CCSM Elite, CCME, CCTE
HeikoAnkenbrand
Champion Champion
Champion

Hi @Izhar_Shoshani_,

any news in this case?

 

➜ CCSM Elite, CCME, CCTE
Izhar_Shoshani_
Employee
Employee

- fwk0_X - fw instance thread that takes care of packet processing
- fwk0_dev_X - the thread that takes care of communication between the fw instances and other CP daemons
- fwk0_kissd - legacy (obsolete)
- fwk0_hp - (high priority) cluster thread

HeikoAnkenbrand
Champion Champion
Champion

Hi @Izhar_Shoshani_,

Thanks for the information. You were able to clarify some interesting questions here.

 

➜ CCSM Elite, CCME, CCTE
_Val_
Admin
Admin

@Izhar_Shoshani_ Thanks a lot for your valuable input here

uror
Contributor

Hello @Izhar_Shoshani_ 

I also have a question.

From how many cores upward should USFW be used?
Is there a recommendation here?

Izhar_Shoshani_
Employee
Employee

If you want to run more than 40 fw instances, you can't do it in Kernel mode.

So this is the time to move to USFW.

 

Timothy_Hall
Champion
Champion

Actually, it is less than 40 in my experience. Example: a 26000 model, which has SMT and USFW enabled by default on R80.30 Take 111 Gaia 3.10. If you turn off USFW and reboot it, it never comes back onto the network. Looking at the console output, a failure occurs when attempting to load the 34th kernel-based Firewall Worker, and it drops to a console shell prompt. This appears to happen before the network interfaces are initialized, so for all intents and purposes the box is dead network-wise. I assume the failure happened because it bumped into the 2GB kernel limit during worker load; setting 32 Firewall Worker instances and rebooting with USFW disabled on the 26000 came up fine.

 

HeikoAnkenbrand
Champion Champion
Champion

This becomes exciting with R80.40, where UMFW is enabled by default. Since hyper-threading is now allowed on open servers, more cores are possible, for example on an HP DL 360 with 28 cores per CPU.

So the following values are achievable:

2 CPUs * 28 Cores *2 HT = 112 Cores

➜ CCSM Elite, CCME, CCTE
HeikoAnkenbrand
Champion Champion
Champion

Information added to the article:

R80.x - Performance Tuning Tip – User Mode Firewall vs. Kernel Mode Firewall  

➜ CCSM Elite, CCME, CCTE
Khalid_Aftas
Contributor

We have a 16000 series appliance (16 cores, 32 with HT). USFW mode is enabled by default, and we see high CPU usage of fwk dev. With Shift+H, the individual fwk threads are quite a bit higher than before (maxed out most of the time) compared to 77.30.

Is switching to KMFW a possible solution for this abnormal performance? (We are on a call with TAC; this seems new to them.)

Khalid_Aftas
Contributor

The SK says that, in our case, UMFW should be on (based on accel stats).

We will try to get R&D to look at it, but at the moment we don't have a solution (moving 445 and 443 traffic with fast_accel only puts the load on the SND cores...)

 

 

shais
Employee
Employee

Hi,

As already mentioned here, the reason you see 300% is that top aggregates the CPU usage of all the threads.
If you still see high CPU utilization per thread, I suggest opening a ticket with support to look into it. If you already have one, I would appreciate it if you could send me the SR number (privately).

 
