API 1.9 run-script ignores the timeout setting
Hi,
FW1 build number:
This is Check Point Security Management Server R81.20 - Build 440
This is Check Point's software version R81.20 - Build 703
I've recently come across what I assume is a bug, or at least a limitation, of the API run-script call.
My use case is invoking a script that uploads log files (*.log and *.adtlog) from an MDS running R81.20 to a remote server using the rsync command. This can easily take a very long time, since every day we need to upload several GB of data from several firewalls.
This server is a fresh install used only for testing.
My previous tests with API 1.8 on an older R81.10 server failed because of the hard-coded 5-minute API timeout, but since version 1.9 claims the timeout is now configurable, I thought it could provide an elegant solution to this problem.
I wrote two scripts: the first talks to the MDS API, and the second is uploaded to the server and run using the run-script endpoint.
The relevant snippets from the first script are:
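```bash
function do_login {
  echo "login to server"
  # Log in and keep only the session ID (sid) from the JSON response
  SID=$(https ${HTTPS_OPTS} POST "${checkpoint_host}/web_api/login" user="$username" password="$password" session-timeout:=3600 | jq -r '.sid')
  # Build (without sending) a request carrying the X-chkp-sid header, so the
  # httpie options in $HTTPS_OPTS_WITH_SESSION can reuse it in later calls
  https --offline ${HTTPS_OPTS_WITH_SESSION} "${checkpoint_host}" X-chkp-sid:"$SID" Content-Type:application/json Accept:application/json
}
```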
This is simple login logic with a session timeout of one hour. The SID is then carried by $HTTPS_OPTS_WITH_SESSION, which is used for every subsequent call in the script.
```bash
function do_run_script () {
  # Start the script on the target; run-script returns a task ID right away
  TASKID=$(https ${HTTPS_OPTS_WITH_SESSION} \
    POST "${checkpoint_host}/web_api/run-script" \
    script-name=checkpoint-log-backup \
    targets="$checkpoint_object_name" \
    script=@${script_file} \
    timeout:=3600 \
    args="some_secret '$RSYNCD_ARGS_BASE64'" | \
    tee script-log | \
    jq --raw-output '.tasks[0]["task-id"]'
  )
  echo "created task: $TASKID"
  # Poll the task until it leaves the "in progress" state
  while true; do
    https ${HTTPS_OPTS_WITH_SESSION} POST "${checkpoint_host}/web_api/show-task" \
      task-id="$TASKID" details-level=full | \
      jq '.tasks[0]' > out.json
    status=$(jq --raw-output '.status' out.json)
    progress=$(jq --raw-output '.["progress-percentage"]' out.json)
    echo "$status... ${progress}%"
    if [[ "$status" == "in progress" ]]; then
      # Keep the API session alive while the task runs
      https ${HTTPS_OPTS_WITH_SESSION} POST "${checkpoint_host}/web_api/keepalive" &> /dev/null
      sleep 3
    else
      # Print the task output with show_task_log.jq (filter not included here)
      jq --from-file show_task_log.jq --raw-output out.json
      break
    fi
  done
}
```
This creates the task, captures its task ID, and then polls while the task is in progress. The timeout is set to 3600 seconds and the API accepts that value; however, the task always fails with a timeout status if the uploaded script takes more than 5 minutes to run.
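The loop above ends only when the task leaves the "in progress" state, so a task that never finishes would keep it polling forever. A minimal sketch of the same polling pattern with an added client-side deadline, assuming the show-task lookup is wrapped in a command that prints just the status string; `poll_until_done` and `get_task_status` are hypothetical names:

```bash
#!/usr/bin/env bash
# poll_until_done DEADLINE INTERVAL CMD [ARGS...]
# Hypothetical helper: runs CMD until it prints something other than
# "in progress", or until DEADLINE seconds have passed on the client side.
# Prints the final status and returns 0, or prints "client timeout" and returns 1.
poll_until_done() {
  local deadline=$1 interval=$2
  shift 2
  local start=$SECONDS status
  while true; do
    status=$("$@")
    if [[ "$status" != "in progress" ]]; then
      printf '%s\n' "$status"
      return 0
    fi
    if (( SECONDS - start >= deadline )); then
      printf 'client timeout\n'
      return 1
    fi
    sleep "$interval"
  done
}

# Hypothetical status command for the Management API, reusing the variables
# from do_run_script (a keepalive call could be added here as well)
get_task_status() {
  https ${HTTPS_OPTS_WITH_SESSION} POST "${checkpoint_host}/web_api/show-task" \
    task-id="$TASKID" | jq --raw-output '.tasks[0].status'
}
```

For example, `poll_until_done 3600 3 get_task_status` would poll every 3 seconds for at most an hour. This bounds only the client side; it cannot extend the server-side run-script timeout.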
This is the output of the trial runs:
```text
$ time bash test.sh
...snipped
created task: 823d93fd-f643-4c62-b018-dd6d3fc80c07
in progress... 10%
...snipped
in progress... 10%
failed... 100%
STDOUT
Operation timed out
END STDOUT
STDERR
END STDERR
logout from server
real 5m5.171s
user 0m36.334s
sys 0m3.925s
```
Notice that the progress stays at 10% throughout. I assume the API calculates the progress from the 3600-second timeout.
However, when the 5-minute mark is reached, the task is marked as failed with "Operation timed out", even though the script keeps running on the server.
In fact, I found that the upload always completes, regardless of how long it takes. This suggests that something in the API code ignores the timeout setting and eagerly marks the task as timed out, without killing the process.
I've confirmed this also happens with mgmt_cli:
```bash
mgmt_cli --format json --user admin --password some_password run-script script-name teste script "sleep 600" timeout 1200 targets.1 mds-target
```
```text
---------------------------------------------
Time: [14:44:39] 8/8/2023
---------------------------------------------
"tlx-mds-lab - teste" in progress (10%)
---------------------------------------------
Time: [14:44:49] 8/8/2023
---------------------------------------------
"tlx-mds-lab - teste" in progress (10%)
...
---------------------------------------------
Time: [14:49:30] 8/8/2023
---------------------------------------------
"tlx-mds-lab - teste" in progress (10%)
---------------------------------------------
Time: [14:49:40] 8/8/2023
---------------------------------------------
"tlx-mds-lab - teste" failed (100%)
{
  "tasks" : [ {
    "uid" : "1144c12c-8b16-4888-b45d-a23199578739",
    "name" : "mds-target - teste",
    "type" : "CdmTaskNotification",
    "domain" : {
      "uid" : "a0eebc99-afed-4ef8-bb6d-fedfedfedfed",
      "name" : "System Data",
      "domain-type" : "mds"
    },
    "task-id" : "4cf21a37-cdbd-4860-9cfb-9907eed08540",
    "task-name" : "mds-target - teste",
    "status" : "failed",
    "progress-percentage" : 100,
    "start-time" : {
      "posix" : 1691502278741,
      "iso-8601" : "2023-08-08T14:44+0100"
    },
    "last-update-time" : {
      "posix" : 1691502578913,
      "iso-8601" : "2023-08-08T14:49+0100"
    },
    "suppressed" : false,
    "task-details" : [ {
      "uid" : "8c03189b-f28a-49d4-8e83-7848d98041d0",
      "domain" : {
        "uid" : "a0eebc99-afed-4ef8-bb6d-fedfedfedfed",
        "name" : "System Data",
        "domain-type" : "mds"
      },
      "color" : "black",
      "statusCode" : "failed",
      "statusDescription" : "Operation timed out",
      "taskNotification" : "1144c12c-8b16-4888-b45d-a23199578739",
      "gatewayId" : "7ea4f133-9497-0d43-8c8d-736a85da2e53",
      "gatewayName" : "",
      "transactionId" : 713187613,
      "responseMessage" : "T3BlcmF0aW9uIHRpbWVkIG91dA==\n",
      "responseError" : "",
      "meta-info" : {
        "validation-state" : "ok",
        "last-modify-time" : {
          "posix" : 1691502579540,
          "iso-8601" : "2023-08-08T14:49+0100"
        },
        "last-modifier" : "admin",
        "creation-time" : {
          "posix" : 1691502278774,
          "iso-8601" : "2023-08-08T14:44+0100"
        },
        "creator" : "admin"
      },
      "tags" : [ ],
      "icon" : "General/globalsNa",
      "comments" : "",
      "display-name" : ""
    } ],
    "comments" : "Failed",
    "color" : "black",
    "icon" : "General/globalsNa",
    "tags" : [ ],
    "meta-info" : {
      "lock" : "unlocked",
      "validation-state" : "ok",
      "last-modify-time" : {
        "posix" : 1691502578940,
        "iso-8601" : "2023-08-08T14:49+0100"
      },
      "last-modifier" : "admin",
      "creation-time" : {
        "posix" : 1691502278768,
        "iso-8601" : "2023-08-08T14:44+0100"
      },
      "creator" : "admin"
    },
    "read-only" : false,
    "available-actions" : {
      "edit" : "true",
      "delete" : "true",
      "clone" : "false"
    }
  } ]
}
```
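The timestamps and the response message in this output can be checked directly. A minimal sketch in plain shell, with the values copied from the JSON above and two hypothetical helper functions; it shows the task was marked failed about 300 seconds after it started, even though `timeout 1200` was requested:

```bash
#!/usr/bin/env bash
# task_elapsed START_MS END_MS: print the seconds (with milliseconds) between
# two "posix" timestamps, which the API reports in milliseconds.
task_elapsed() {
  local ms=$(( $2 - $1 ))
  printf '%d.%03d\n' $(( ms / 1000 )) $(( ms % 1000 ))
}

# decode_response B64: decode the base64-encoded "responseMessage" field.
decode_response() {
  printf '%s' "$1" | base64 -d
}

# Values copied from the show-task output above
task_elapsed 1691502278741 1691502578913           # prints 300.172
decode_response 'T3BlcmF0aW9uIHRpbWVkIG91dA=='; echo # prints Operation timed out
```

Against a live response, the same field could be extracted with jq first, e.g. `jq --raw-output '.tasks[0]["task-details"][0].responseMessage'` piped into `base64 -d`.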
This is the process tree on the server while the script is running:
```text
244 `- udevd
7305 `- auditd
9991 `- pm
10009 `- confd
10010 `- searchd
10012 `- rconfd
6179 `- rconfd-temp-scr
6464 `- rsync
```
The long-running rsync command (excerpt):
```bash
--password-file=${RSYNC_PASSWORD_FILE} \
--stats \
--itemize-changes \
--out-format="%n" \
--delete \
--delete-after \
--archive \
--bwlimit=500000 \
"${STAGING_DIR}" ${RSYNC_USER}@${RSYNC_HOST}::${RSYNC_MODULE} && \
retval=0 || retval=1
```
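The `&& retval=0 || retval=1` idiom collapses every rsync failure into a single value. Since this job ships log directories, it may be worth keeping the real exit code: rsync documents exit code 24 as a partial transfer because source files vanished, which can happen when logs are rotated mid-transfer. A minimal sketch of a hypothetical classifier (how each result should be handled is an assumption for this use case):

```bash
#!/usr/bin/env bash
# classify_rsync_exit CODE: hypothetical mapping of rsync exit codes.
#   0  -> ok       (success)
#   24 -> partial  (some source files vanished during the transfer)
#   anything else is treated as a failure here
classify_rsync_exit() {
  case "$1" in
    0)  echo ok ;;
    24) echo partial ;;
    *)  echo fail ;;
  esac
}

# Usage in the upload script (the rsync arguments are the ones shown above):
#   rsync ... "${STAGING_DIR}" ${RSYNC_USER}@${RSYNC_HOST}::${RSYNC_MODULE}
#   retval=$?
#   echo "rsync result: $(classify_rsync_exit "$retval") (exit code $retval)"
```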
Is there something I'm missing?
Any help is appreciated, thank you.
Accepted Solutions
The run-script command timeout is 5 minutes.
You can't set it to more than 5 minutes.
If the script on the gateway (GW) takes a long time, alternative approaches should be examined.
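One alternative that stays within the 5-minute limit, offered as an untested assumption rather than a documented Check Point pattern: the first run-script call only starts the long job in the background and returns at once, and later short run-script calls check on it. Whether a background process survives the end of the run-script task on Gaia is itself something to verify. The function names and state directory below are hypothetical:

```bash
#!/usr/bin/env bash
# start_detached STATE_DIR CMD [ARGS...]
# Starts CMD in the background under nohup, sends its output to
# STATE_DIR/output.log, records its PID, and writes its exit code to
# STATE_DIR/exit_code once it finishes. Returns immediately.
start_detached() {
  local dir=$1
  shift
  mkdir -p "$dir"
  rm -f "$dir/exit_code"
  # $0 inside the inner shell is the state directory; "$@" is the command.
  # The exit code goes to a temp file first, then is moved into place.
  nohup bash -c '"$@"; echo $? > "$0/exit_code.tmp" && mv "$0/exit_code.tmp" "$0/exit_code"' \
    "$dir" "$@" > "$dir/output.log" 2>&1 < /dev/null &
  echo $! > "$dir/pid"
}

# job_status STATE_DIR: print "running" or "done <exit code>"
job_status() {
  local dir=$1
  if [[ -f "$dir/exit_code" ]]; then
    echo "done $(<"$dir/exit_code")"
  else
    echo "running"
  fi
}
```

For example, the payload of the first call might run `start_detached /var/tmp/log-backup /path/to/upload.sh`, and later calls `job_status /var/tmp/log-backup` until it prints `done 0`; both paths are placeholders.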
Please open a TAC request for this: https://help.checkpoint.com
Hi @_Val_, thanks for your reply.
I'll be back from vacation on the 21st of August and will open the TAC request then.
I am not sure why you would run such a script from the API directly.
In Ansible I have created jobs that copy such scripts to the target and configure the scripts and the cron entry on a daily basis.
So Ansible is in control of HOW it is done exactly and it makes sure it is there on each node that qualifies (say on each SmartCenter that we manage).
The unit itself takes care of doing the job as instructed, such as making a backup file daily.
Ansible then comes in and collects the results.
A similar approach can be taken through the API, since you can copy a script file to the unit.
So rethinking the workflow might be the better solution.
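The approach described above (deploy the script, then add a daily cron entry) relies on that step being idempotent, so repeated deployments don't stack duplicate entries. A minimal plain-shell sketch of that step, roughly what Ansible's `lineinfile` or `cron` modules take care of; the function name, schedule, script path, and log path are hypothetical:

```bash
#!/usr/bin/env bash
# ensure_line FILE LINE: append LINE to FILE only if an identical line is not
# already there, so running the deployment twice leaves a single entry.
ensure_line() {
  local file=$1 line=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical daily entry: run the backup script at 02:00 and keep its output
CRON_LINE='0 2 * * * /home/admin/checkpoint-log-backup.sh >> /var/log/checkpoint-log-backup.out 2>&1'
```

How the entry is actually registered on the unit (a user crontab or Gaia's own scheduled-job tooling) is left open here.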
Hi @Hugo_vd_Kooij, thanks for taking the time to respond,
I understand your suggestion, and I also considered Ansible for this; however, I wanted to try the leanest possible approach first.
Using the API alone fits our environment perfectly, since the jobs would be triggered from our CI/CD platform, which is already tightly integrated with our daily workflows (our single pane of glass).
This would also require only HTTPS access to the boxes, whereas with Ansible we would need SSH.
With Ansible we would need to provision and manage more than what I currently think is reasonable for such a simple task.
The API docs clearly document a timeout setting for run-script, so I assume the behaviour I encountered is unexpected, which is why I posted this.
