marcyn
Collaborator

Issues with ThreatExtraction via API to ThreatCloud and to local gateway

Hi guys,

I've been struggling with ThreatExtraction for a long time now.
Unfortunately TAC was not helpful in this case, so I decided to ask the Community... who knows, maybe someone has faced this issue as well.

I'm using an eval API key to test ThreatPrevention mechanisms in the ThreatCloud environment and I have no issues with AntiVirus and ThreatEmulation ... but I can't figure out how to deal with ThreatExtraction.

Regarding ThreatExtraction:
I followed https://app.swaggerhub.com/apis/Check-Point/Threat-Prevention-API/1.0 to get a well-formatted request, and I'm using curl to send API calls to ThreatCloud.
What I expected was a cleaned file (without any macros, malicious links /for example to phishing sites/, embedded images, etc.) with a header saying "Check Point Threat Extraction secured this document" and a link to "Get Original".
But for some reason I'm getting a cleaned file that is almost the same as my source file (the file size is different, so it is not the same file: ThreatCloud created a new one based on the file that I sent), still with macros, links and embedded images, and without this header.
I've been struggling with TAC for more than two months now, and based on my experience so far I have no hope.
With ThreatEmulation and AntiVirus, as I wrote at the top, I have absolutely no issues; they work exactly as I expected.

As you can see, there is no appliance involved in this process; it's 100% cloud based.
Because of that, I thought I would check how it looks on an appliance (who knows, maybe TEX doesn't work in the cloud and needs a local appliance ... I doubt that, but I wanted to check it as well).

Of course I followed sk113599 and even sk137032 (but in my opinion the latter is obsolete).
So the TEX API should be enabled locally (without an api-key in TPAPI.ini I receive "{"response":[{"protocol_version":"1.1","src_ip":""}]}"; with an api-key I get 404).
But ... it doesn't work either.
There is probably some minor error, something I forgot to configure ... maybe you will be able to point me in the right direction.

Below are a couple of requests and responses with TEX (for ThreatCloud and for the local gateway):

1) ThreatCloud request:

curl -X 'POST' \
  'https://te-api.checkpoint.com/tecloud/api/v1/file/query' \
  -H 'accept: application/json' \
  -H 'Authorization: TE_API_KEY_123456789qwertyusdfg' \
  -H 'Content-Type: application/json' \
  -d '{
  "request": {
    "sha1": "430da81c8cdcd6aab6f16b875bfc22a5efa4aa49",
    "features": [
      "extraction"
    ],
    "file_name": "makro.docm",
    "extraction": {
      "extracted_parts_codes": [
        1034, 1026, 1019, 1018, 1139, 1142, 1143, 1141, 1150, 1151, 1137, 1021
      ],
      "method": "clean"
    }
  }
}'
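
For anyone scripting this step rather than typing curl by hand, the query body above could be assembled roughly as follows. This is only a sketch: `build_query_body` is a hypothetical helper name, the field names are copied from the request above, and leaving out `extracted_parts_codes` assumes the server falls back to its default set.

```python
import hashlib
import json


def build_query_body(file_bytes, file_name, method="clean", parts_codes=None):
    """Build the JSON body for a /file/query call (hypothetical helper)."""
    extraction = {"method": method}
    if parts_codes is not None:
        # Only sent when given; otherwise the server default is assumed to apply.
        extraction["extracted_parts_codes"] = list(parts_codes)
    return {
        "request": {
            "sha1": hashlib.sha1(file_bytes).hexdigest(),
            "features": ["extraction"],
            "file_name": file_name,
            "extraction": extraction,
        }
    }


# Example with placeholder file content; a real script would read the document from disk.
body = build_query_body(b"example content", "makro.docm", parts_codes=[1034, 1026])
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d, or sent with any HTTP client.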

2) ThreatCloud response (after I uploaded this file):

{
  "response": {
    "status": {
      "code": 1001,
      "label": "FOUND",
      "message": "The request has been fully answered."
    },
    "sha1": "430da81c8cdcd6aab6f16b875bfc22a5efa4aa49",
    "md5": "08a90a0170de5d8e4f1715d98ffda24e",
    "sha256": "00ea9b303473910be6f40208fac3f7779fd354b21f83baa672ee37f5086bfba7",
    "file_type": "",
    "file_name": "makro.docm",
    "features": [
      "extraction"
    ],
    "extraction": {
      "method": "clean",
      "extract_result": "CP_EXTRACT_RESULT_SUCCESS",
      "extracted_file_download_id": "94d66bcb-1a0d-401a-ba5e-9dd104e84be0",
      "output_file_name": "makro.cleaned.docm.docx",
      "time": "0.529",
      "extract_content": "Macros and Code",
      "extraction_data": {
        "input_extension": "docm",
        "input_real_extension": "docm",
        "message": "OK",
        "output_file_name": "makro.cleaned.docm.docx",
        "protection_name": "Extract potentially malicious content",
        "protection_type": "Content Removal",
        "protocol_version": "",
        "real_extension": "docm",
        "risk": 5,
        "scrub_activity": "Active content was extracted - DOCM file was saved as DOCX",
        "scrub_method": "Clean Document",
        "scrub_result": 0,
        "scrub_time": "0.529",
        "scrubbed_content": "Macros and Code"
      },
      "tex_product": false,
      "status": {
        "code": 1001,
        "label": "FOUND",
        "message": "The request has been fully answered."
      }
    }
  }
}

As you can see, everything looks perfect, exactly as it should.

But after I download the file with the id mentioned above ("extracted_file_download_id": "94d66bcb-1a0d-401a-ba5e-9dd104e84be0") ... I get what I get (already explained above).
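
A script can pull the download id out of a query response like the one above before calling the download endpoint. The sketch below checks only the two fields visible in this thread (status code 1001 and `CP_EXTRACT_RESULT_SUCCESS`); `extraction_download_id` is a hypothetical name, and other result values may exist that it does not handle.

```python
import json

FOUND = 1001  # status code from the response shown above


def extraction_download_id(response_text):
    """Return the download id if extraction finished successfully, else None."""
    response = json.loads(response_text)["response"]
    if response.get("status", {}).get("code") != FOUND:
        return None
    extraction = response.get("extraction", {})
    if extraction.get("extract_result") != "CP_EXTRACT_RESULT_SUCCESS":
        return None
    return extraction.get("extracted_file_download_id")


# Trimmed copy of the response above, keeping only the fields that are read.
sample = json.dumps({
    "response": {
        "status": {"code": 1001, "label": "FOUND"},
        "extraction": {
            "extract_result": "CP_EXTRACT_RESULT_SUCCESS",
            "extracted_file_download_id": "94d66bcb-1a0d-401a-ba5e-9dd104e84be0",
        },
    }
})
print(extraction_download_id(sample))
```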

3) ThreatCloud - download:

curl -X 'GET' \
  'https://te-api.checkpoint.com/tecloud/api/v1/file/download?id=94d66bcb-1a0d-401a-ba5e-9dd104e84be0' \
  -H 'accept: */*' \
  -H 'Authorization: TE_API_KEY_123456789qwertyusdfg'

 

Now for the local TEX part:
1) TEX local request (query):

curl -X POST -k 'https://10.1.1.1:18194/tecloud/api/v1/file/query' \
  -H 'accept: application/json' \
  -H 'Authorization: DT235ffzwEz8u777wSuIJmTq34D3VL' \
  -H 'Content-Type: application/json' \
  -d '{
  "request": {
    "sha1": "f38abb67d47a4f69536ae67aa9c6df7287c08869",
    "features": [
      "extraction"
    ],
    "file_name": "mal/0d01b24f7666f9bccf0f16ea97e41e0bc26f4c49cdfb7a4dabcc0a494b44ec9b.docx",
    "extraction": {
      "method": "clean",
      "extracted_parts_codes": [ 1025, 1026, 1034, 1137, 1139, 1141, 1142, 1143, 1150, 1151, 1018, 1019, 1021 ]
    }
  }
}'

2) TEX local response (query):

{
   "response" : {
      "features" : [
         "extraction"
      ],
      "file_name" : "mal/0d01b24f7666f9bccf0f16ea97e41e0bc26f4c49cdfb7a4dabcc0a494b44ec9b.docx",
      "md5" : "3f326da2affb0f7f2a4c5c95ffc660cc",
      "sha1" : "f38abb67d47a4f69536ae67aa9c6df7287c08869",
      "sha256" : "0d01b24f7666f9bccf0f16ea97e41e0bc26f4c49",
      "status" : {
         "code" : 1004,
         "label" : "NOT_FOUND",
         "message" : "Couldn't find the requested file, please upload it"
      }
   }
}

3) So I upload the file. Upload request:

curl -X POST -k 'https://10.1.1.1:18194/tecloud/api/v1/file/upload' \
  -H 'accept: application/json' \
  -H 'Authorization: DT235ffzwEz8u777wSuIJmTq34D3VL' \
  -H 'Content-Type: multipart/form-data' \
  -F 'request={
  "request": {
    "file_name": "mal/0d01b24f7666f9bccf0f16ea97e41e0bc26f4c49cdfb7a4dabcc0a494b44ec9b.docx",
    "file_type": "docx",
    "features": [
      "extraction"
    ],
    "extraction": {
      "method": "clean",
      "extracted_parts_codes": [ 1025, 1026, 1034, 1137, 1139, 1141, 1142, 1143, 1150, 1151, 1018, 1019, 1021 ]
    }
  }
}' \
  -F 'file=@mal/0d01b24f7666f9bccf0f16ea97e41e0bc26f4c49cdfb7a4dabcc0a494b44ec9b.docx;type=application/msword'

4) And to my surprise, here is the response to this request:

{
   "response" : {
      "features" : [
         "extraction"
      ],
      "file_name" : "mal/0d01b24f7666f9bccf0f16ea97e41e0bc26f4c49cdfb7a4dabcc0a494b44ec9b.docx",
      "file_type" : "docx",
      "md5" : "3f326da2affb0f7f2a4c5c95ffc660cc",
      "sha1" : "f38abb67d47a4f69536ae67aa9c6df7287c08869",
      "sha256" : "0d01b24f7666f9bccf0f16ea97e41e0bc26f4c49cdfb7a4dabcc0a494b44ec9b",
      "status" : {
         "code" : 1004,
         "label" : "NOT_FOUND",
         "message" : "Couldn't find the requested file, please upload it"
      }
   }
}

As you can see ... exactly the same as for the query.

I've been struggling with this for a long time.
What am I missing ... what is wrong with my approach?
Again ... no issues at all with AV and TE (to ThreatCloud and to the local appliance as well).

BTW
Of course I saw:
https://community.checkpoint.com/t5/Security-Gateways/Best-Practices-for-Threat-Prevention-API-Calls...   (but it is obsolete in my opinion ... these Python files use the old method from sk137032)
https://community.checkpoint.com/t5/Threat-Prevention/Demonstration-of-Threat-Prevention-API-on-a-lo...
But unfortunately these two articles are focused on TE.

Anybody ? 🙂

--
Best
m.

2 Replies
StuartGreen
Employee

Hi Marcyn, I'm a bit confused about your testing: you mention you're using cloud only, but one of the examples shows a local appliance on port 18194 with a /tecloud formatted query. I've just tested extraction via SandBlast Cloud directly, and it works as expected both for cleaning the original doc and for converting to PDF. My queries are almost the same as yours, but on the upload request you don't need to specify which content types to remove (I left mine at the default), and the file_name potentially has an invalid character (you've got a "/" in there; you can also leave the filename blank and the server will figure it out for you).

Extraction is fine for me and you can verify that it has been changed by comparing the hash values of the file uploaded against the one downloaded. If the only piece missing is that you're looking for the 'cleaned by Check Point' header - I think that's only possible using TEX on a local appliance. 
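
The hash comparison mentioned above can be scripted along these lines. This is a minimal sketch using SHA-1 (the API responses in this thread report sha1); the function names and file paths are placeholders.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def was_modified(original_path, cleaned_path):
    """True when the downloaded file differs from the uploaded one."""
    return sha1_of_file(original_path) != sha1_of_file(cleaned_path)


# Usage (placeholder paths):
# print(was_modified("makro.docm", "makro.cleaned.docm.docx"))
```

A differing hash only shows that the file was rewritten. It does not say which parts were removed, so macros and links still have to be checked in the document itself.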

 

Stu

marcyn
Collaborator

Hi Stuart,

Thank you for your feedback.

Regarding the "/" in the file name: this is because the file was in a subfolder. But yes, it is an invalid character for file_name, so I modified my script a little and now it is fine.
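
One way to make that change in a script is to send only the base name of the local path as file_name. This is a sketch of the idea, not the actual script used here; `api_file_name` is a hypothetical helper.

```python
import os


def api_file_name(local_path):
    """Strip directory components so file_name contains no "/"."""
    return os.path.basename(local_path)


print(api_file_name("mal/sample.docx"))  # sample.docx
```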

After sending a new request to ThreatCloud, everything looks the same.
I realized that macros are indeed removed (even before), but there is still a link to a malicious site.
Probably TEX doesn't remove links. Personally I thought that links in general would not be removed, but those pointing to sites that ThreatCloud knows to be malicious (e.g. phishing) would be "cleaned" (e.g. turned into regular text instead of a clickable hyperlink).
But it looks like links are not taken into consideration at all.

As for "extracted_parts_codes": yes, I know they are not relevant for method=pdf, but they can be selected for method=clean (though they are not needed, as there is a default setting) 🙂

Maybe someone knows of a document that clearly indicates which parts are cleaned by TEX (I mean something more specific than "dynamic content").

As for my local TEX: yes, I'm using the same syntax as for ThreatCloud, because as far as I know it is the valid syntax these days.
The previous syntax (https://[SG_IP]/UserCheck/TPAPI) is obsolete, isn't it?
Anyway, I also tried this "old one", but there we only have "QueryFile" and "UploadFile" ... there is no "DownloadFile" 🙂
It doesn't work for me either ... and even if it worked, this syntax has a major flaw: the file has to be uploaded in the POST variable "file_enc_data" as base64, which in almost all cases ends up with "too much data". The "new" syntax allows posting the file as multipart/form-data.
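
To put a number on the base64 overhead: base64 emits 4 characters for every 3 input bytes, so the encoded payload is about a third larger than the file. The sketch below only illustrates that size growth. Whether this alone triggers the "too much data" error is an assumption, since the size limit of the old interface is not stated in this thread.

```python
import base64


def base64_encoded_length(raw_size):
    """Length of the padded base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


raw = b"x" * 1000  # placeholder payload instead of a real document
encoded = base64.b64encode(raw)
print(len(raw), len(encoded), base64_encoded_length(len(raw)))
```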

OK, so we can consider that ThreatCloud indeed works ... I don't know why I saw a macro in a cleaned file before ... but these links make me curious 🙂
What about TEX on a local appliance: has any of you faced this issue (using the "tecloud" syntax) with the upload response that I mentioned in my first message?

As for the header message, you're probably right; I also think that it is added only when using a local appliance.
I just want to confirm that ... but unfortunately I have this issue with the upload response on the local appliance 🙂

Maybe I'm wrong, and the syntax that is used for ThreatCloud is not valid for local appliances?
I think it should be valid, based on this:

 

Accessing the API
URL Format
A request format depends on the required API Web service.
Access the API with the URL:
https://<service_address>/tecloud/api/<version>/file/<API_name>

service address:
te-api.checkpoint.com for the Check Point Threat Prevention cloud services, or the IP address of a TE appliance.
Note - To run Check Point Threat Prevention API on a local gateway, you must specify the port. The port is 18194.
https://<service_address>:18194/tecloud/api/file/query
(...)

 

(taken from API reference guide for ThreatPrevention: https://app.swaggerhub.com/apis/Check-Point/Threat-Prevention-API/1.0)
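
The URL format quoted above can be written as a small helper, which makes the cloud and local-gateway variants easy to compare. This is a sketch: `tp_api_url` is a hypothetical name, and "v1" is taken from the curl examples earlier in this thread.

```python
def tp_api_url(service_address, api_name, version="v1", port=None):
    """Build https://<service_address>[:port]/tecloud/api/<version>/file/<API_name>."""
    host = service_address if port is None else f"{service_address}:{port}"
    return f"https://{host}/tecloud/api/{version}/file/{api_name}"


# Cloud service: no port needed.
print(tp_api_url("te-api.checkpoint.com", "query"))
# Local gateway: port 18194, as stated in the quoted guide.
print(tp_api_url("10.1.1.1", "upload", port=18194))
```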

--
Best
m.
