Ankur_Datta
Collaborator

Installation failed. Reason: Load on Module failed - failed to load security policy

Hi All,

We have an environment where management is on R80 and the gateway is on R75.40 SPLAT.

 

We sometimes get the following error when we install policy on the gateway:

"Installation failed. Reason: Load on Module failed - failed to load security policy"

 

Clearing the string_dictionary_table usually resolves the issue, but this time it didn't. We increased the limit of string_dictionary_table from 65,536 to 131,072 because the table had already reached its peak limit. Even after doing this, we still get the same error when installing the policy.

 

We ran a cpd debug and also debugged the process of fetching the local policy from the temporary directory. We get the following error:

Fetching the local policy:

 

fw_atomic_download: sizeof struct fwatomload 872
fw_atomic_download: FWATOMICLOAD 40047a03
fw_atomic_download: FWATOMICLOAD done ret=-1
fw_atomic_download: FWATOMICLOAD failed: Invalid argument
fw_atomic_download: unlocking mutex: install_policy_mutex

Failed to Load Security Policy: Invalid argument
fw_rfetchx_local_ex: failed to load Security Policy
update_load_connection: no connection
In fwa_vrfy_db restore = -1
logo_directory_restore: dir=/opt/CPsuite-R75.40/fw1/state/__tmp/FW1/
Failed to Load Security Policy: Invalid argument
Fetching Security Policy Failed
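
As a side note on the FWATOMICLOAD lines above: the value 40047a03 looks like an ioctl request code, and "Invalid argument" is the standard text for errno EINVAL. The Python sketch below splits the code into fields, assuming the generic Linux _IOC bit layout as used on x86 (8-bit number, 8-bit type, 14-bit size, 2-bit direction). Whether the firewall kernel module actually follows that layout is an assumption, and the helper name decode_ioctl is made up for illustration.

```python
import errno
import os

# Generic Linux _IOC layout as used on x86. ASSUMPTION: the firewall
# kernel module encodes its ioctl request codes the same way.
NR_BITS, TYPE_BITS, SIZE_BITS = 8, 8, 14
DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(request):
    """Split a 32-bit ioctl request code into its _IOC fields (hypothetical helper)."""
    nr = request & ((1 << NR_BITS) - 1)
    type_code = (request >> NR_BITS) & ((1 << TYPE_BITS) - 1)
    size = (request >> (NR_BITS + TYPE_BITS)) & ((1 << SIZE_BITS) - 1)
    direction = (request >> (NR_BITS + TYPE_BITS + SIZE_BITS)) & 0x3
    return {
        "dir": DIRECTIONS[direction],
        "size": size,
        "type": chr(type_code),
        "nr": nr,
    }


if __name__ == "__main__":
    # Request code copied from the fw_atomic_download line above.
    print(decode_ioctl(0x40047A03))
    # The symbolic errno name behind the "Invalid argument" text.
    print(errno.errorcode[errno.EINVAL], "->", os.strerror(errno.EINVAL))
```

Under that assumption the code decodes to a write-direction request of type 'z', number 3, with a 4-byte argument. The ret=-1 / "Invalid argument" pair therefore only says that the kernel side rejected the load with EINVAL; it does not say why, so the reason has to come from a kernel debug.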

 

CPD logs:

 

[22 Nov 13:21:26] Installing Security Policy XXXXXXX on all.all@XXXXXXX

[22 Nov 13:21:26] fwasync_mux_timeout: 281: timed out after 100000 miliseconds
[22 Nov 13:21:26] fwasync_mux_timeout: 281: inbuf: 0/12 outbuf: 0/0 state: 77f1f440 1
[22 Nov 13:21:26] fwasync_mux_timeout: 281: calling handler 77f1f640
[22 Nov 13:21:26] resched timeout to conn_id=281, conn=6d5ea280, comm=6d200738, due to 1 active sessions
[22 Nov 13:21:28] opsec_send_datagram_e: SESSION ID:4 is sending DG_ID=4 DG_TYPE=0x1701(???)
[22 Nov 13:21:28] pushing dgtype=1701 len=18828 to list=0x8f44adc
[22 Nov 13:21:28] pulling dgtype=1701 len=18828 to list=0x8f44adc
[22 Nov 13:21:28] demultiplex type=1701 session-id=4
[22 Nov 13:21:28] amon_client_handle_reply: return code - 0
[22 Nov 13:21:28] opsec_comm_notify: COM 0x8f4adb0 got signal 131074
[22 Nov 13:21:30] Failed to Load Security Policy: Invalid argument

[22 Nov 13:21:31] ckpSSL_do_read: read 12 bytes
[22 Nov 13:21:31] fwasync_conn_get: get max buffer size (1048576) .
[22 Nov 13:21:31] ckpSSL_InputPending 1 pending bytes
[22 Nov 13:21:31] ckpSSL_InputPending 1 pending bytes
[22 Nov 13:21:31] ckpSSL_do_read: read 8 bytes
[22 Nov 13:21:31] fwasync_conn_get: get max buffer size (1048576) .
[22 Nov 13:21:31] demultiplex type=d session-id=7
[22 Nov 13:21:31] opsec_got_ping_peer_request
[22 Nov 13:21:31] got_peer_req: sess: 7, peer_dg_id:2, query:0
[22 Nov 13:21:31] ckpSSL_do_write: write 20 bytes
[22 Nov 13:21:31] opsec_comm_notify: COM 0x6d2af208 got signal 131074
[22 Nov 13:21:31] cpd_server_signal_handler: session=0x6d259ba0, event=135683
[22 Nov 13:21:31] Failed to Load Security Policy: Invalid argument

[22 Nov 13:21:31] Fetching Security Policy Failed

[22 Nov 13:21:31]

[22 Nov 13:21:31] Commit_exec_cb : RTPM_SUCCESS - l_nRetCode = 11
[22 Nov 13:21:31] Commit_exec_cb : Executable Failed, returned Load on Module failed - failed to load Security Policy.
[22 Nov 13:21:31] sendDatagramOfCommitInstall: policy commit failed
[22 Nov 13:21:31] readMessagesFile: file with messages doesn't exist, there are no commit messages
[22 Nov 13:21:31] removeMessageFile: Removing file with warnings
[22 Nov 13:21:31] removeMessageFile: File doesn't exist, nothing to do
[22 Nov 13:21:31] opsec_send_datagram_e: SESSION ID:7 is sending DG_ID=7 DG_TYPE=0x1202(???)
[22 Nov 13:21:31] ckpSSL_do_write: write 18 bytes
[22 Nov 13:21:31] opsec_comm_notify: COM 0x6d2af208 got signal 131074
[22 Nov 13:21:31] cpd_server_signal_handler: session=0x6d259ba0, event=135683
[22 Nov 13:21:31] ckpSSL_do_read: read 12 bytes
[22 Nov 13:21:31] fwasync_conn_get: get max buffer size (1048576) .
[22 Nov 13:21:31] demultiplex type=3 session-id=7
[22 Nov 13:21:31] Destroying session (6d259ba0) id 7 (ent=8a82690) reason=PEER_ENDED
[22 Nov 13:21:31] SESSION ID:7 already resumed read
[22 Nov 13:21:31] All sessions removed from comm 0x6d2af208. Peer may close it.
[22 Nov 13:21:31] opsec_send_datagram_e: is sending DG_ID=0 DG_TYPE=0xa(DGTYPE_MAY_CLOSE_COMM)
[22 Nov 13:21:31] ckpSSL_do_write: write 12 bytes
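
To pull the failure lines out of a longer CPD.elg capture, something like the following Python sketch may help. It assumes every relevant line starts with a timestamp in the "[22 Nov 13:21:30]" form seen above; the marker strings are taken from this excerpt, and the function name find_failures is hypothetical.

```python
import re

# Timestamp format taken from the excerpt above, e.g. "[22 Nov 13:21:30] message".
LINE_RE = re.compile(r"^\[(\d{1,2} \w{3} \d{2}:\d{2}:\d{2})\]\s*(.*)$")

# Substrings that mark a failure in the excerpt above; extend as needed.
FAILURE_MARKERS = (
    "Failed to Load Security Policy",
    "Fetching Security Policy Failed",
    "policy commit failed",
    "Executable Failed",
)


def find_failures(lines):
    """Return (timestamp, message) pairs for lines that contain a failure marker."""
    hits = []
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip blank lines and lines without a timestamp
        timestamp, message = match.groups()
        if any(marker in message for marker in FAILURE_MARKERS):
            hits.append((timestamp, message))
    return hits


# A few lines copied from the CPD log above.
SAMPLE = """\
[22 Nov 13:21:26] Installing Security Policy XXXXXXX on all.all@XXXXXXX
[22 Nov 13:21:26] fwasync_mux_timeout: 281: timed out after 100000 miliseconds
[22 Nov 13:21:30] Failed to Load Security Policy: Invalid argument
[22 Nov 13:21:31] Fetching Security Policy Failed
[22 Nov 13:21:31] Commit_exec_cb : Executable Failed, returned Load on Module failed - failed to load Security Policy.
[22 Nov 13:21:31] sendDatagramOfCommitInstall: policy commit failed
""".splitlines()

if __name__ == "__main__":
    for timestamp, message in find_failures(SAMPLE):
        print(timestamp, "|", message)
```

The first hit gives the time at which the load failed on the gateway, which is the window to line up against a kernel debug.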

 

I checked sk33893 but didn't find any solution that can be applied.

 

Device model - UTM 3070

Management - MDS - Smart-1 50

 

Any suggestions on how to resolve this, please?

 

Thanks

 

11 Replies
G_W_Albrecht
Legend

Let us be honest first: R75.40 SPLAT has been out of support since at least Apr 2016. Fetching the local policy is not what the usual policy install does, so I can see no connection between your "Installation failed" issue and the cpd debug you presented.

First of all, I would do a reboot, a very old remedy when we encounter "Load on Module failed" situations. Or was there a prior policy change that is making the Methuselah GW choke?

CCSE CCTE CCSM SMB Specialist
Ankur_Datta
Collaborator

Hi Albrecht,

At this time, I installed the security policy:
[22 Nov 13:21:26] Installing Security Policy XXXXXXX on all.all@XXXXXXX

and I get this log a few lines later in the CPD.elg file:

[22 Nov 13:21:30] Failed to Load Security Policy: Invalid argument

Could this be the reason the policy installation is failing?

The device will be replaced with a new gateway soon.


mdjmcnally
Advisor

Found some SK articles referring to IPS Updates

 

sk101559 possibly.

However not 100% certain

Ankur_Datta
Collaborator

We are using only the FW blade, but IPS signatures are being updated automatically.

Jerry
Mentor

Echoing G_W on that, plus:

1. Do the upgrade ASAP, otherwise loads of things like that will happen ...
2. T-shooting R75.40 now is like trying to find a particular problem with a 20-year-old vehicle - a waste of time!
3. Timeouts happen when stuff is overloaded and simply slowing down because resources are overutilized, mismanaged, or misconfigured; it all depends.
4. Are you really going to t-shoot this issue on R75.40?

I wonder why there are so many SGs worldwide still running R65 ... 😞
Jerry
Ankur_Datta
Collaborator

Hi Jerry,

The replacement of the device and the upgrade are already in the pipeline.

Can we find out the root cause of this issue, please?

 

Regards

Jerry
Mentor

fw_atomic_download: sizeof struct fwatomload 872
fw_atomic_download: FWATOMICLOAD 40047a03
fw_atomic_download: FWATOMICLOAD done ret=-1
fw_atomic_download: FWATOMICLOAD failed: Invalid argument
fw_atomic_download: unlocking mutex: install_policy_mutex

Failed to Load Security Policy: Invalid argument
fw_rfetchx_local_ex: failed to load Security Policy
update_load_connection: no connection
In fwa_vrfy_db restore = -1
logo_directory_restore: dir=/opt/CPsuite-R75.40/fw1/state/__tmp/FW1/
Failed to Load Security Policy: Invalid argument
Fetching Security Policy Failed

--

The above is all you need to understand.

You apparently have, or have had, connectivity issues on either SIC or MDS->SG.
Please find out whether that "sync" works in between, and t-shoot your networking first.
When Layer 3-4 is just FINE between the MGMT and the gateway(s), then we'll move on, alright?
First things first.

cheers
Jerry
Ankur_Datta
Collaborator

Hi Jerry,

 

I checked, and there is no network connectivity issue between the MDS and the gateway. SIC shows as communicating for both GW objects.

 

 

Timothy_Hall
Champion

There are basically two reasons why the atomic policy load ("commit") would fail out on the gateway:

1) Memory Shortage - Reboot the system and try again; I would not be surprised if there is a memory leak on a code version that old.

2) Error in the compiled policy that the SMS did not catch (but should have), so the gateway's sanity check against the compiled policy it was sent has failed. This will normally require a kernel debug during the load process to figure out what part is the problem; see Debug Step 3 here:

sk84700: Methodology for debugging "Load on Module Failed" error

Depending on what the debug says you may be able to modify your policy and work around whatever the issue is.

There is a laundry list of compiled policy error situations you can go through in the following SK, but you can easily spend hours going through all the scenarios; quickly scan through them and see if any correlate to recent changes that you might have made.  I'd just go for the debug during load as mentioned above.

sk33893: 'Installation failed. Reason: Load on Module failed - failed to load security policy' error...

 

Gateway Performance Optimization R81.20 Course
now available at maxpowerfirewalls.com
Ankur_Datta
Collaborator

Hi Tim,

 

I followed sk84700 and ran the debug as you advised.

I found something, and I suspect this is why we are facing the problem:

 

fw_atomic_add_ws_sgen_buffers: struct_buffer is now full, size 225761 bytes

fw_atomic_add_rad_sgen_buffers: SGen_rad_global_struct_to_buffer

fw_atomic_add_rad_sgen_buffers:rad add buffer is OK

fw_atomic_download: sizeof struct fwatomload 872
[fw_atomic_download: FWATOMICLOAD 40047a03
[fwioctl:FWATOMICLOAD] [Start]
[fwioctl:FWATOMICLOAD] [End]
fw_atomic_download: FWATOMICLOAD done ret=-1

fw_atomic_download: FWATOMICLOAD failed: Invalid argument

 

I also ran a kernel debug on the fw module using the filter flag:

 

26Nov2019 12:57:39.964681;[cpu_0];[fw_0];FW-1: Attempting to create an already existing table: cmik_loader_sync_htab_table (7999);
;26Nov2019 12:57:39.964684;[cpu_0];[fw_0];fwk_cmi_prepare: failed to create cmik_loader_sync_htab_table table.;
;26Nov2019 12:57:39.964684;[cpu_0];[fw_0];fwk_atomic_load_prepare: fwk_cmi_prepare failed;

This error leads to sk86961.

I can see that IPS was updated automatically on the Management server after the last policy was installed. We are not using the IPS blade, and neither the Application Control blade nor the URL Filtering blade is enabled. Only the FW blade is being used. Could this be related to article sk86961?
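
For anyone sifting through a larger kernel debug file for the same symptom, here is a minimal Python sketch that extracts the "already existing table" conflicts. The field layout (optional leading semicolon, timestamp, [cpu_N], [fw_N], message) is inferred only from the three lines quoted above, and the function names are hypothetical.

```python
import re

# Field layout ASSUMED from the three kernel debug lines quoted above:
# [;]<date> <time>;[cpu_N];[fw_N];<message>;
KDEBUG_RE = re.compile(
    r"^;?(?P<stamp>\d{1,2}\w{3}\d{4} \d{2}:\d{2}:\d{2}\.\d+);"
    r"\[cpu_(?P<cpu>\d+)\];\[fw_(?P<instance>\d+)\];(?P<message>.*?);?$"
)
EXISTING_TABLE_RE = re.compile(
    r"Attempting to create an already existing table: (?P<name>\S+) \((?P<id>\d+)\)"
)


def parse_kdebug_line(line):
    """Return the fields of one debug line as a dict, or None if it does not match."""
    match = KDEBUG_RE.match(line.strip())
    return match.groupdict() if match else None


def existing_table_conflicts(lines):
    """Return (table_name, table_id) for every 'already existing table' message."""
    conflicts = []
    for line in lines:
        record = parse_kdebug_line(line)
        if record is None:
            continue
        match = EXISTING_TABLE_RE.search(record["message"])
        if match:
            conflicts.append((match.group("name"), int(match.group("id"))))
    return conflicts


# The three lines quoted above, copied verbatim.
SAMPLE = [
    "26Nov2019 12:57:39.964681;[cpu_0];[fw_0];FW-1: Attempting to create an already existing table: cmik_loader_sync_htab_table (7999);",
    ";26Nov2019 12:57:39.964684;[cpu_0];[fw_0];fwk_cmi_prepare: failed to create cmik_loader_sync_htab_table table.;",
    ";26Nov2019 12:57:39.964684;[cpu_0];[fw_0];fwk_atomic_load_prepare: fwk_cmi_prepare failed;",
]

if __name__ == "__main__":
    for name, table_id in existing_table_conflicts(SAMPLE):
        print(f"table {name} (id {table_id}) already exists")
```

The table name it reports is the string to search for in the SK articles.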

Thanks

 

Timothy_Hall
Champion

Yep sk86961 looks like the issue for sure.  Unfortunately the fix is to upgrade to R75.45 or load a hotfix.  But please provide the output of these commands:

fw ctl get int appi_referrer_ghtab_hash_size
fw ctl get int appi_referrer_ghtab_limit_size

The value of both of these variables in R80.30 is 5000; I'm curious to see what it is in your version. I think these are the data structures that you are running out of. It MAY be possible to manually increase these, but the SK seems to imply that they are already at their maximum allowed value and that the hotfix allows them to be increased further. DO NOT try to change these just yet, as very bad things might happen.
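
Once the two commands have been run, the comparison against the R80.30 value can be scripted. The Python sketch below is read-only and assumes the command output has the form "name = value", one variable per line, which is an assumption about the output format. The function names are hypothetical, and the sample text reuses the R80.30 value of 5000 mentioned above; it is not output from the R75.40 gateway.

```python
R80_30_REFERENCE = 5000  # value quoted above for both variables on R80.30


def parse_get_int_output(text):
    """Parse lines of the ASSUMED form 'name = value' into a dict of ints."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # no '=' on this line
        try:
            values[name.strip()] = int(value.strip())
        except ValueError:
            continue  # value is not a plain integer
    return values


def compare_to_reference(values, reference=R80_30_REFERENCE):
    """Label each variable as 'below', 'equal' or 'above' the reference value."""
    report = {}
    for name, value in values.items():
        if value < reference:
            report[name] = "below"
        elif value > reference:
            report[name] = "above"
        else:
            report[name] = "equal"
    return report


# Placeholder text using the R80.30 value; replace with the real command output.
SAMPLE = """\
appi_referrer_ghtab_hash_size = 5000
appi_referrer_ghtab_limit_size = 5000
"""

if __name__ == "__main__":
    for name, verdict in compare_to_reference(parse_get_int_output(SAMPLE)).items():
        print(f"{name}: {verdict} the R80.30 value")
```

This only reads and compares values; it does not change any kernel parameter.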

 
