cjrnz
Participant

CPotelcol (OpenTelemetry Collector) automatic update gone awry?

Today one of our old CloudGuard gateways started burning an abnormal amount of CPU.

The process I found was "sklnctl".

The full command line was:

/opt/CPotelcol/sklnctl collector --upgrade /opt/CPotelcol/config.json --skip-validation --deprecated

It appeared to have been launched by /var/log/AutoUpdater/metadata/diagnostics/CPotelcol/CPotelcol_AutoUpdate/68/product_scripts/post_action.sh

I ended up killing the process after it had been running for 3 hours, as I suspect it was never going to finish whatever it was trying to do. I did not manage to work out where it was logging to, though, so I cannot tell precisely why it was not completing.
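For anyone triaging the same symptom, here is a minimal sketch of how a runaway process and its launcher could be spotted. It assumes a procps-style ps is available on the box; the helper name hot_procs and the 90% threshold are illustrative choices, not part of any Check Point tooling.

```shell
#!/bin/bash
# hot_procs: read "PID PPID %CPU COMMAND..." rows on stdin (a header row is
# tolerated) and print the rows whose %CPU is at or above the threshold.
# The threshold defaults to 90 and is an arbitrary illustrative value.
hot_procs() {
    local threshold="${1:-90}"
    awk -v t="$threshold" '$1 ~ /^[0-9]+$/ && $3 + 0 >= t + 0 { print }'
}

# Typical use on a live system (procps ps assumed):
#   ps -eo pid,ppid,pcpu,args | hot_procs 90
# Then look up the PPID of a reported row to see what launched it:
#   ps -o pid,args -p <PPID>
```

The second ps call is what would reveal a launcher such as the post_action.sh script mentioned above, provided the parent is still alive.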
 
Is there any reason why the OpenTelemetry Collector is being pushed to old product versions that don't support it?  I can't help but wonder if this should not have been there in the first place, let alone actively running scripts.
 
(R80.10 CloudGuard VM which, before you ask, cannot be upgraded to anything newer, it's stuck there unsupported until the platform is retired.)
 
Cheers.
Chris.

Accepted Solutions
Elad_Chomsky
Employee

Hi all. A new Skyline release was unintentionally pushed through AutoUpdater to legacy, out-of-support versions (R80.10, R80.20, R80.30), whose older kernel does not support Skyline (2.16). Please run the following workaround to resolve the issue:

1) kill -9 $(pidof sklnctl)

2) autoupdatercli disable CPotelcol

We are working on an immediate fix to this problem that should resolve the issue.
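The two commands above can be wrapped with a guard so that kill is not called with an empty argument when sklnctl is not running. This is only a sketch: the two commands are taken from the post, while the wrapper, the function names and the DRY_RUN switch are illustrative additions.

```shell
#!/bin/bash
# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# skyline_workaround: kill sklnctl if present, then disable CPotelcol updates.
skyline_workaround() {
    local pids
    # pidof exits non-zero when nothing matches, so tolerate that.
    pids=$(pidof sklnctl) || pids=""
    if [ -n "$pids" ]; then
        # Word splitting is intentional: pidof may return several PIDs.
        run kill -9 $pids
    else
        echo "sklnctl is not running"
    fi
    run autoupdatercli disable CPotelcol
}
```

Calling it as DRY_RUN=1 skyline_workaround first prints the commands without executing them.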


Elad_Chomsky
Employee

We are working on releasing a new version in the coming days that should prevent this issue from occurring. We will also publish an SK on it.


18 Replies
Elad_Chomsky
Employee

Hi @cjrnz ,

We are looking into this and will update once we have more info.

Thanks, Elad

Douglas_Rich
Contributor

This is also happening on one of my lab VMs, a simple R80.30 firewall.

I got alerts in VMware for CPU usage, and top shows sklnctl at 300% CPU.

cjrnz
Participant

Thanks, I see another one doing the same thing, I can leave it a while if more info is needed but I can't leave it like that when we enter business hours in ~90min or so.

cjrnz
Participant

Belay that, two more now pegged at 100%.

cjrnz
Participant

Found the log: /opt/CPInstLog/AutoUpdateLogs/CPotelcol

It looks like it is stuck in an install/revert cycle that repeats every 3 hours.
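A rough way to confirm such a cycle is to count how often a keyword shows up in the logs. The sketch below assumes the log lines contain a word like "revert"; the actual wording of the AutoUpdater logs is not shown in this thread, so the keyword and the helper name count_log_hits are guesses to adjust.

```shell
#!/bin/bash
# count_log_hits: count lines under a log path (file or directory) that
# contain a keyword, case-insensitively. The keyword is an assumption;
# change it to whatever the logs actually print.
count_log_hits() {
    local path="$1" keyword="$2"
    # grep exits 1 when nothing matches; wc still reports 0 in that case.
    { grep -ri -- "$keyword" "$path" 2>/dev/null || true; } | wc -l | tr -d ' '
}

# Example (path from the post above, keyword assumed):
#   count_log_hits /opt/CPInstLog/AutoUpdateLogs/CPotelcol revert
```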

Rodrigo_Mezetti
Participant

I'm experiencing this problem on 4 x 15400 appliances running R80.10. The sklnctl process started using 500% CPU about 10 hours ago. I had already noticed the process, and now, following the workaround in this post, I have safely killed it and disabled the component. It's back to normal.

Let's wait for new information. Thanks

_Val_
Admin

@Rodrigo_Mezetti Please apply the workaround Elad mentioned to fix this. Also, R80.10 has been out of support for a while now.

MiguelHernandez
Employee

Hi Elad, please share the SK once it is published.

A customer is asking us for an official statement on the issue.

Thanks.

stva
Employee

I had a remote session with Elad earlier today. During the session Elad noticed that terminating sklnctl on its own is insufficient, as AutoUpdater keeps trying to re-install it. As a result, the steps Elad suggested were to comment out the Check Point Open Telemetry Collector component in /opt/AutoUpdater/latest/conf/products_config.xml and kill the AutoUpdater process (which will restart).

Below is the action plan that my customer enacted on their gateways:



Step 1) Disable CPotelcol AutoUpdater Component

[Expert@R8030host:0]# autoupdatercli disable CPotelcol
Updates state changed to off for component CPotelcol
[Expert@R8030host:0]#

Step 2) Edit /opt/AutoUpdater/latest/conf/products_config.xml and comment out the section for CPotelcol

[Expert@R8030host:0]# cd /opt/AutoUpdater/latest/conf
[Expert@R8030host:0]# vi products_config.xml

[Expert@R8030host:0]# diff -u products_config.xml.orig products_config.xml
--- products_config.xml.orig 2023-10-03 11:55:12.000000000 +0000
+++ products_config.xml 2023-10-03 11:56:11.000000000 +0000
@@ -415,7 +415,7 @@
  </Product>
  <Product name="diagnostics">
  <Components>
- <Component name="CPotelcol" display_name="Check Point Open Telemetry Collector">
+ <!-- <Component name="CPotelcol" display_name="Check Point Open Telemetry Collector">
  <Versions>
  <unversioned branch="CPotelcol_AutoUpdate"/>
  </Versions>
@@ -430,7 +430,7 @@
  </postActions>
  <postVerify script="post_verify.sh"/>
  </InstallationPlan>
- </Component>
+ </Component> -->
  <Component name="CPviewExporter" display_name="Check Point CPview Metrics Exporter">
  <Versions>
  <unversioned branch="CPviewExporter_AutoUpdate"/>
[Expert@R8030host:0]#


Step 3) Terminate the sklnctl and AutoUpdater processes (AutoUpdater process will restart)

[Expert@R8030host:0]# kill -9 $(pidof sklnctl) $(pidof AutoUpdater)
[Expert@R8030host:0]#
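The diff in Step 2 implies that a products_config.xml.orig copy was made before editing. A minimal sketch of that backup, plus a quick check that the edit took, might look like the following. The function names are illustrative, and the check is a simple line match modelled on the diff above, not a real XML parse.

```shell
#!/bin/bash
# backup_once: copy FILE to FILE.orig unless a backup already exists, so
# the pristine copy is never overwritten by a second run.
backup_once() {
    local file="$1"
    [ -e "$file.orig" ] || cp -p "$file" "$file.orig"
}

# component_commented: succeed if the opening <Component name="NAME" tag
# sits directly behind an XML comment opener on the same line, as in the
# diff above. Line-based check only; it does not verify the closing "-->".
component_commented() {
    local file="$1" name="$2"
    grep -Eq "<!--[[:space:]]*<Component name=\"$name\"" "$file"
}

# Example on a gateway (path from Step 2):
#   cfg=/opt/AutoUpdater/latest/conf/products_config.xml
#   backup_once "$cfg"        # before editing with vi
#   component_commented "$cfg" CPotelcol && echo "CPotelcol is commented out"
```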

 

HTH.  HAND.

Patrick_Jung
Participant

Hello Stva,

I am an engineer at a CCSP, and we have encountered the same issue on our customer's system (15600 HA) as previously described by Elad Chomsky. Following his instructions, I terminated the sklnctl process with the kill command and then disabled CPotelcol with autoupdatercli.

Because I initially followed Elad's post, I have not edited products_config.xml or killed the AutoUpdater process. However, your post suggests that these two additional steps are necessary to prevent AutoUpdater from re-installing the component.

Could you clarify whether this will impact other daemon processes? Our customer uses FW, HTTPS inspection, APCL, and URLF, so I am particularly concerned about any unintended consequences of editing products_config.xml and terminating the AutoUpdater process.

Fortunately, roughly an hour after I killed the sklnctl process and disabled CPotelcol, sklnctl has not reappeared.

May I request a prompt final resolution on this matter?

Best regards.

patrick.

Elad_Chomsky
Employee

Hi all, I am working with AutoUpdater R&D to confirm the recommended guidelines. If you still have any issues, please follow the guidelines above. An SK on the issue should be released today or tomorrow and will include all possible workarounds.

Elad_Chomsky
Employee

After talking to the AutoUpdater R&D team, the following was recommended:

1) To check that the disable workaround took effect, run kill -9 $(pidof AutoUpdater) and inspect the AutoUpdater log (/opt/AutoUpdater/AutoUpdater.log). If it is still trying to install the component (there should be log records showing this), redo the workaround using the steps @stva mentioned.

2) Note that if your AutoUpdater version is below 990180269 (run 'cpvinfo /opt/AutoUpdater/latest/bin/AutoUpdater' to check), it is recommended to start immediately with the workaround mentioned by @stva.
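The two checks above can be sketched as small helpers. The build threshold and the log path come from the post; the function names are illustrative, the build number is expected to be read off the cpvinfo output by hand (its exact layout is not assumed here), and whether the log refers to the component as "CPotelcol" is also an assumption.

```shell
#!/bin/bash
# Builds below this number should go straight to the full workaround,
# per the post above.
MIN_BUILD=990180269

# needs_full_workaround: succeed when the given AutoUpdater build number
# is lower than MIN_BUILD; return 2 on non-numeric input.
needs_full_workaround() {
    local build="$1"
    case "$build" in
        ''|*[!0-9]*) echo "not a number: $build" >&2; return 2 ;;
    esac
    [ "$build" -lt "$MIN_BUILD" ]
}

# recent_component_lines: print the last N log lines (default 20) that
# mention a component name.
recent_component_lines() {
    local log="$1" component="$2" n="${3:-20}"
    { grep -- "$component" "$log" || true; } | tail -n "$n"
}

# Example:
#   needs_full_workaround 990180100 && echo "use the full workaround"
#   recent_component_lines /opt/AutoUpdater/AutoUpdater.log CPotelcol
```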

Rodrigo_Mezetti
Participant

Guys, after I carried out the @stva procedure, the scanengine_k process started using a lot of CPU:
7341 admin 15 0 827m 629m 99m S 834 1.0 150:28.85 scanengine_k

Is this related, or was it a coincidence? Has anyone else seen this?

Elad_Chomsky
Employee

It is not related to this issue. I recommend opening a ticket with Check Point Support so we can help you investigate it.

Elad_Chomsky
Employee

https://support.checkpoint.com/results/sk/sk181528

The SK on this issue is now released.

Patrick_Jung
Participant

Thank you for providing the SK so quickly.
