ClaranetPT
Explorer

API 1.9 run-script ignores the timeout setting

Hi,

 


 FW1 build number:
This is Check Point Security Management Server R81.20 - Build 440
This is Check Point's software version R81.20 - Build 703

I've recently come across something I assume is a bug or at least a limitation of the API run-script call.

My use case is invoking a script that uploads log files (*.log and *.adtlog) from an MDS running R81.20 to a remote server using the rsync command. This can easily take a very long time, since we need to upload several GB of data from several firewalls every day.

This server is a fresh install used only for testing.

My previous tests with API 1.8 on an older R81.10 failed because of the hardcoded 5-minute API timeout. Since version 1.9 claims the timeout is now configurable, I thought it could provide an elegant solution to this problem.

I wrote two scripts: the first talks to the MDS API, and the second is uploaded to the server and run using the run-script endpoint.

The relevant snippets from the first script are:

function do_login {
    echo login to server
    SID=$(https ${HTTPS_OPTS} POST ${checkpoint_host}/web_api/login user="$username" password="$password" session-timeout:=3600 | jq '.sid' -r)
    https --offline ${HTTPS_OPTS_WITH_SESSION} ${checkpoint_host} X-chkp-sid:"$SID" Content-Type:application/json Accept:application/json
}

This is simple login logic with a session timeout of one hour. The SID is then set to the variable $HTTPS_OPTS_WITH_SESSION, which is used throughout the rest of the script.

function do_run_script () {
    TASKID=$(https ${HTTPS_OPTS_WITH_SESSION} \
        POST ${checkpoint_host}/web_api/run-script \
        script-name=checkpoint-log-backup \
        targets="$checkpoint_object_name" \
        script=@${script_file} \
        timeout:=3600 \
        args="some_secret '$RSYNCD_ARGS_BASE64'" | \
        tee script-log | \
        jq --raw-output '.tasks[0]["task-id"]'
    )

    echo "created task: $TASKID"
    while true; do
        https ${HTTPS_OPTS_WITH_SESSION} POST ${checkpoint_host}/web_api/show-task \
            task-id="$TASKID" details-level=full | \
            jq '.tasks[0]' > out.json
        status=$(cat out.json | jq --raw-output '.status')
        progress=$(cat out.json | jq --raw-output '.["progress-percentage"]')
        echo "$status... ${progress}%"
        if [[ "$status" == "in progress" ]]; then
            https ${HTTPS_OPTS_WITH_SESSION} POST ${checkpoint_host}/web_api/keepalive &> /dev/null
            sleep 3
        else
            cat out.json | jq --from-file show_task_log.jq --raw-output
            break
        fi
    done
}

This creates the task, takes the task-id, and then loops while the task is in progress. The timeout is set to 3600 seconds and the API accepts it as such; however, the task always fails with a timeout status if the uploaded script takes more than 5 minutes to run.

 

This is the output of the trial runs:

$ time bash test.sh
...snipped
created task: 823d93fd-f643-4c62-b018-dd6d3fc80c07
in progress... 10%
...snipped
in progress... 10%
failed... 100%
STDOUT
Operation timed out
END STDOUT

STDERR

END STDERR
logout from server

real    5m5.171s
user    0m36.334s
sys     0m3.925s

Notice how the progress is always set to 10%. I assume that the API is calculating the progress using the timeout of 3600.

However, when the 5-minute mark is reached, the status is set to failed with the description "Operation timed out", even though the script continues to run on the server.

In fact I found that the upload always completes, regardless of the time it takes. This suggests that something in the API code is ignoring the timeout setting and eagerly marking the process as timed-out, but not killing it...

I've confirmed this also happens with the mgmt_cli:

mgmt_cli --format json --user admin --password some_password run-script script-name teste script "sleep 600" timeout 1200 targets.1 mds-target 


---------------------------------------------
Time: [14:44:39] 8/8/2023
---------------------------------------------
"tlx-mds-lab - teste"  in progress  (10%)  


---------------------------------------------
Time: [14:44:49] 8/8/2023
---------------------------------------------
"tlx-mds-lab - teste"  in progress  (10%)  
...
---------------------------------------------
Time: [14:49:30] 8/8/2023
---------------------------------------------
"tlx-mds-lab - teste"  in progress  (10%)  


---------------------------------------------
Time: [14:49:40] 8/8/2023
---------------------------------------------
"tlx-mds-lab - teste"  failed  (100%)  
{
  "tasks" : [ {
    "uid" : "1144c12c-8b16-4888-b45d-a23199578739",
    "name" : "mds-target - teste",
    "type" : "CdmTaskNotification",
    "domain" : {
      "uid" : "a0eebc99-afed-4ef8-bb6d-fedfedfedfed",
      "name" : "System Data",
      "domain-type" : "mds"
    },
    "task-id" : "4cf21a37-cdbd-4860-9cfb-9907eed08540",
    "task-name" : "mds-target - teste",
    "status" : "failed",
    "progress-percentage" : 100,
    "start-time" : {
      "posix" : 1691502278741,
      "iso-8601" : "2023-08-08T14:44+0100"
    },
    "last-update-time" : {
      "posix" : 1691502578913,
      "iso-8601" : "2023-08-08T14:49+0100"
    },
    "suppressed" : false,
    "task-details" : [ {
      "uid" : "8c03189b-f28a-49d4-8e83-7848d98041d0",
      "domain" : {
        "uid" : "a0eebc99-afed-4ef8-bb6d-fedfedfedfed",
        "name" : "System Data",
        "domain-type" : "mds"
      },
      "color" : "black",
      "statusCode" : "failed",
      "statusDescription" : "Operation timed out",
      "taskNotification" : "1144c12c-8b16-4888-b45d-a23199578739",
      "gatewayId" : "7ea4f133-9497-0d43-8c8d-736a85da2e53",
      "gatewayName" : "",
      "transactionId" : 713187613,
      "responseMessage" : "T3BlcmF0aW9uIHRpbWVkIG91dA==\n",
      "responseError" : "",
      "meta-info" : {
        "validation-state" : "ok",
        "last-modify-time" : {
          "posix" : 1691502579540,
          "iso-8601" : "2023-08-08T14:49+0100"
        },
        "last-modifier" : "admin",
        "creation-time" : {
          "posix" : 1691502278774,
          "iso-8601" : "2023-08-08T14:44+0100"
        },
        "creator" : "admin"
      },
      "tags" : [ ],
      "icon" : "General/globalsNa",
      "comments" : "",
      "display-name" : ""
    } ],
    "comments" : "Failed",
    "color" : "black",
    "icon" : "General/globalsNa",
    "tags" : [ ],
    "meta-info" : {
      "lock" : "unlocked",
      "validation-state" : "ok",
      "last-modify-time" : {
        "posix" : 1691502578940,
        "iso-8601" : "2023-08-08T14:49+0100"
      },
      "last-modifier" : "admin",
      "creation-time" : {
        "posix" : 1691502278768,
        "iso-8601" : "2023-08-08T14:44+0100"
      },
      "creator" : "admin"
    },
    "read-only" : false,
    "available-actions" : {
      "edit" : "true",
      "delete" : "true",
      "clone" : "false"
    }
  } ]
}
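
As a quick sanity check on the JSON above, the following sketch computes the time between start-time and last-update-time and decodes the base64 responseMessage. The variable names are illustrative only; the three values are copied from the output above, with the trailing "\n" of responseMessage left out.

```shell
#!/usr/bin/env bash
# Values copied from the show-task output above (illustrative variable names).
start_ms=1691502278741
last_update_ms=1691502578913
response_message='T3BlcmF0aW9uIHRpbWVkIG91dA=='

# Elapsed time between task start and its final update, in whole seconds.
elapsed_s=$(( (last_update_ms - start_ms) / 1000 ))
echo "elapsed: ${elapsed_s}s"

# responseMessage is base64-encoded; decode it to read the text.
decoded=$(printf '%s' "$response_message" | base64 -d)
echo "message: ${decoded}"
```

The task was marked as failed almost exactly 300 seconds after it started, which matches the 5-minute limit rather than the requested timeout of 1200.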


This is the process tree on the server while the script is running:

1 init
244 `- udevd
7305 `- auditd
9991 `- pm
10009 `- confd
10010 `- searchd
10012 `- rconfd
6179 `- rconfd-temp-scr
6464 `- rsync

The long-running command:

rsync \
--password-file=${RSYNC_PASSWORD_FILE} \
--stats \
--itemize-changes \
--out-format="%n" \
--delete \
--delete-after \
--archive \
--bwlimit=500000 \
"${STAGING_DIR}" ${RSYNC_USER}@${RSYNC_HOST}::${RSYNC_MODULE} && \
retval=0 || retval=1


Is there something I'm missing?

Any help is appreciated, thank you.

 


Accepted Solutions
Youssef_Obeidal
Employee

Run script command timeout is 5 minutes.
You can't set it to more than 5 minutes.

If the script on the GW takes a lot of time, maybe alternatives should be examined.


5 Replies
_Val_
Admin

Please open a TAC request for this: https://help.checkpoint.com

ClaranetPT
Explorer

Hi @_Val_ thanks for your reply,

I'll return from vacation on the 21st of August and open the TAC request then

Hugo_vd_Kooij
Advisor

I am not sure why you would run such a script from the API directly.

In Ansible I have created jobs that copy such scripts to the target and configure the scripts and the cron entry on a daily basis.

So Ansible is in control of HOW it is done exactly and it makes sure it is there on each node that qualifies (say on each SmartCenter that we manage).

The unit itself takes care of doing the job as instructed, such as making a daily backup file.

Then Ansible comes in and collects the results.

A similar approach can be taken through the API, as you can copy a script file to the unit.

So rethinking the workflow might be the better solution.
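
One way to sketch this "start the job, come back later for the result" idea in bash is shown below. It is only a minimal sketch under stated assumptions: the names start_job, job_status and STATE_DIR, and the path /tmp/log-upload-job, are hypothetical, and whether run-script returns promptly when the uploaded script leaves a detached child behind has not been verified on Gaia.

```shell
#!/usr/bin/env bash
# Hypothetical helper; names and paths are assumptions, not Check Point conventions.
STATE_DIR="${STATE_DIR:-/tmp/log-upload-job}"

# Start a long-running command detached from the caller and return immediately.
start_job() {
    mkdir -p "$STATE_DIR"
    rm -f "$STATE_DIR/exit_code"
    # The inner shell runs the command, then records its exit code atomically.
    nohup bash -c '"$@"; rc=$?; echo "$rc" > "$0/exit_code.tmp"; mv "$0/exit_code.tmp" "$0/exit_code"' \
        "$STATE_DIR" "$@" > "$STATE_DIR/output.log" 2>&1 < /dev/null &
    echo $! > "$STATE_DIR/pid"
}

# Print "done <exit code>", "running", or "unknown".
job_status() {
    if [[ -f "$STATE_DIR/exit_code" ]]; then
        echo "done $(cat "$STATE_DIR/exit_code")"
    elif [[ -f "$STATE_DIR/pid" ]] && kill -0 "$(cat "$STATE_DIR/pid")" 2>/dev/null; then
        echo "running"
    else
        echo "unknown"
    fi
}

# Optional dispatch so one script can serve both calls; does nothing without arguments.
case "${1:-}" in
    start)  shift; start_job "$@" ;;
    status) job_status ;;
esac
```

A first short call would pass "start" followed by the long-running command, and later short calls would pass "status" until the result is "done 0", so each individual call stays well under the 5-minute limit.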

<< We make miracles happen while you wait. The impossible jobs take just a wee bit longer. >>
ClaranetPT
Explorer

Hi @Hugo_vd_Kooij, thanks for taking the time to respond,

I understand your suggestion, and I also considered Ansible for this; however, I wanted to try the leanest possible approach first.

Usage of the API alone fits our environment perfectly, since the jobs would be triggered from our CI/CD platform, which is already tightly integrated with our daily workflows (our single pane of glass).

This would also require only HTTPS access to the boxes, whereas with Ansible we would need SSH.

With Ansible we would need to provision and manage more than what I currently think is reasonable for such a simple task.

The API docs clearly document a timeout parameter for run-script, so I assume the behaviour I encountered is unexpected, which is why I posted this.

