Skyline - a new version is now released

Elad_Chomsky · ‎2024-01-17

A new version of Skyline is now released. You can see here and here for the required AutoUpdater packages.

What’s new in Skyline?

Skyline is now supported as EA on SMB Quantum Spark, starting with R81.10.08
On Quantum – Support for additional third-party export targets, and for multiple export targets. See sk181863. (Supported are DynaTrace, SolarWinds, Splunk, and variants of Prometheus: VictoriaMetrics and AWS Managed Prometheus.)
On Quantum – The ability to filter metrics (see sk178566)

Coming soon (first half of 2024):

On Quantum – The ability to use custom scripts to configure your metrics
On Quantum – New metrics for Skyline, including:

Information on installed jumbos
License status on blades
On Multi-Domain Server – The domain names will be added as labels
Information on processes (memory, usage, disk, and uptime)
Information from the CheckPoint "WatchDog" on process activation

Alexander_Wilke · ‎2024-01-17

I miss the pull model from Prometheus.

It makes sure that the target device is "up". You can monitor the status of the OpenTelemetry connection and you can be sure you receive data.

You can centrally define the scrape interval. You can use "metric_relabel_configs" to add additional custom labels per scrape.
You can monitor scrape durations and see if a scrape takes too long.

With Prometheus PULL, all targets in a job are load-balanced over the scrape_interval. This prevents the situation where all devices are scraped at the same time. At the moment, if all GWs send metrics at the same time, it may lead to congestion on the Prometheus side.
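
To illustrate the point about spreading scrapes over the interval, here is a minimal sketch in Python. It is not Prometheus code; it only shows the general idea of deriving a stable per-target offset from a hash of the target name. The function names and the example hostnames are hypothetical.

```python
import hashlib


def scrape_offset(target: str, interval_s: float) -> float:
    """Return a stable offset in [0, interval_s) derived from the target name."""
    digest = hashlib.sha256(target.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer and map it into the interval.
    value = int.from_bytes(digest[:8], "big")
    return (value % 1_000_000) / 1_000_000 * interval_s


def schedule(targets, interval_s):
    """Return (offset, target) pairs sorted by the time of the first scrape."""
    return sorted((scrape_offset(t, interval_s), t) for t in targets)


if __name__ == "__main__":
    # Hypothetical gateway names, used only for illustration.
    gateways = [f"gw-{i:02d}.example.net" for i in range(1, 6)]
    for offset, name in schedule(gateways, 15.0):
        print(f"{name}: first scrape at +{offset:.2f}s, then every 15s")
```

Because each offset depends only on the target name, every gateway keeps its own slot inside the interval instead of all of them reporting at the same moment.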

PS:
Can you send different sets of metrics to different targets or only one set to all targets?

I would like to see different metric sets that I can scrape at different intervals.
I do not need jumbo version, licenses, disk usage and uptime, enabled blades, or IPS signature status every 15s. I need these every hour or once a day. This would reduce the load of generating metrics on the GW.

In addition, it would allow me to scrape a very small set of metrics, e.g. CPU usage and throughput, every 5s.
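
A minimal sketch of what such metric sets could look like, assuming a collector that ticks once per second and checks which sets are due. The set names, metric names, and intervals are hypothetical and do not correspond to any existing Skyline configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSet:
    name: str
    interval_s: int
    metrics: tuple


def due_sets(sets, elapsed_s):
    """Return the names of the sets whose interval divides the elapsed time."""
    return [s.name for s in sets if elapsed_s % s.interval_s == 0]


# Hypothetical sets: a small fast one and a slow inventory one.
SETS = [
    MetricSet("fast", 5, ("cpu_usage", "throughput")),
    MetricSet("inventory", 3600, ("jumbo_version", "license_status", "enabled_blades")),
]
```

With these sets, only the small "fast" set is collected on most ticks, and the expensive inventory data is generated once per hour.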

PPS:
If you are worried that someone may scrape the GW too often, then just create a cache with a lower limit of 15s: if Prometheus scrapes the GW and the cache is less than 15s old, serve the cached metrics instead of collecting a new set.
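
The cache idea can be sketched in a few lines of Python. This is only an illustration under the assumption that collection is a single callable; the class name, the `collect` callback, and the injectable clock are hypothetical.

```python
import time


class CachedCollector:
    """Serve cached metrics unless the cache is at least min_age_s old."""

    def __init__(self, collect, min_age_s=15.0, clock=time.monotonic):
        self._collect = collect      # callable that gathers a fresh metric set
        self._min_age_s = min_age_s  # lower limit between two real collections
        self._clock = clock          # injectable so the behaviour can be tested
        self._cached = None
        self._cached_at = None

    def scrape(self):
        now = self._clock()
        # Collect only on the first scrape or when the cache has expired.
        if self._cached_at is None or now - self._cached_at >= self._min_age_s:
            self._cached = self._collect()
            self._cached_at = now
        return self._cached
```

However often a client scrapes, the expensive collection runs at most once per 15s; scrapes in between get the previous result.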

Elad_Chomsky · ‎2024-01-17

Hi @Alexander_Wilke ,

1) Unfortunately, the Prometheus PULL method and scraping are not on the roadmap. Some security considerations were brought up, which caused us to put it aside for now.

2) The new metrics will come with a new method of data collection from the machine, which will have a set interval per collection. We will take your feedback and see what we can do to allow setting it, or to change the initial intervals.

3) For now, the filtering applies to all the exporters, but as part of the roadmap we will add the option to set a separate filter.

4) CPView metrics currently have a set interval. We will try to see if we can add functionality there as part of our roadmap.

Alexander_Wilke · ‎2024-01-17

Hello Elad,

I would be interested in the "Security considerations", because in our DataCenter security design we have security concerns if a perimeter Firewall or DMZ Firewall is allowed to send traffic from risky/untrusted zones into a secure internal zone.

Our idea is that DMZ/perimeter devices can be reached only from the internal network. If an attacker gets access to the perimeter Firewall, he may be able to initiate connections to the internal device.

In addition, the PULL method allows us to monitor the device actively. If we cannot reach it, the "up" status in Prometheus will change, we can get an alert, and we can check what is happening. At the moment we must trust that the data will arrive.

Maybe you can add some metrics and information, at least on the gateway, which tell us the status of the connection:

- samples sent

- time to collect data on the GW

- time to push data to the targets

- push interval

- timestamps of the last 10 disconnects (target/Prometheus not reachable)

- timestamps of the last 10 successful reconnects AFTER a failed connection to Prometheus

- maybe add this information as an additional metric with labels which will be pushed to Prometheus, too.

- maybe add the possibility to generate a syslog entry if the "OpenTelemetry" system on the gateway restarted, connected, or disconnected.
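
A minimal sketch of how such connection status could be tracked on the sending side, written in Python for illustration. It is not part of Skyline; the class, field, and method names are hypothetical, and it covers only the sample counter and the last 10 disconnect and reconnect timestamps.

```python
from collections import deque


class ExportStatus:
    """Track samples sent and the most recent disconnect/reconnect times."""

    def __init__(self, history=10):
        self.samples_sent = 0
        self.connected = None                     # unknown until the first push
        self.disconnects = deque(maxlen=history)  # keeps only the newest entries
        self.reconnects = deque(maxlen=history)

    def record_push(self, ok, samples, timestamp):
        if ok:
            self.samples_sent += samples
            # A success directly after a failure counts as a reconnect.
            if self.connected is False:
                self.reconnects.append(timestamp)
            self.connected = True
        else:
            # Record only the first failure of an outage, not every retry.
            if self.connected is not False:
                self.disconnects.append(timestamp)
            self.connected = False
```

The same counters could then be exposed as additional metrics or written to syslog on each state change.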

Alexander_Wilke · ‎2024-01-18

Hi again,

I want to add a note. Please keep in mind that there are customers out in the real world who are not allowed to connect their gateways to the internet. It is annoying to download:

CPdepInst
AutoUpdater
CPOtelcol
CPviewExporter
4 packages as offline packages to do the upgrades.

And all this information is spread across 4 different SKs.

Maybe add a bundle. CPOtelcol and CPviewExporter sound like we always need both.

Elad_Chomsky · ‎2024-01-18

Hi @Alexander_Wilke ,

Please open an RFE with CheckPoint support, and we will see what we can do to promote this. The current status is that this is not part of the future plans for now.

Regarding your suggestions: I have noted them, and we will see what we can do to push them as part of the roadmap.

Hugo_vd_Kooij · ‎2024-01-18

I think a lot of us have serious security considerations when it comes to devices sending data to a server in a more secure network.

<< We make miracles happen while you wait. The impossible jobs take just a wee bit longer. >>

Alexander_Wilke · ‎2024-01-22

It seems the latest version of Telemetry Collector and CPviewExporter has issues with the "sklnctl" commands.

If you add the same payload several times, you end up with additional certificates when you run "sklnctl --show_open_telemetry".

In addition, if the payload enables or disables the configuration, this is not respected: "sklnctl --show_open_telemetry" still shows enabled.

So, in case someone else has the same issue, I used this command as a workaround:

/opt/CPotelcol/REST.py --set_open_telemetry "$(cat /home/admin/skyline_payload.json)"

It is deprecated, but it saves the correct values. However, I now have around 20 copies of the same certificate in "show_open_telemetry".

I will open a ticket.

Elad_Chomsky · ‎2024-01-22

Hi @Alexander_Wilke ,

As the range of supported targets grew, we changed the behavior so that each call aggregates the certificates instead of replacing or removing them as before, since the old behavior did not work well with multiple export targets of different types. So this is expected behavior in the new tool.

We will take the command issue internally to see how we can present it better, and maybe also add a flag to allow you to clean the file.

Alexander_Wilke · ‎2024-01-22

Do not add the exact same certificate several times. If the certificates are different, it is ok (maybe), but an identical certificate should not be added again.
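
A minimal Python sketch of such a duplicate check, assuming the certificates are available as PEM strings. How sklnctl actually stores them is not known here, so the function name and the input format are assumptions.

```python
import hashlib


def dedupe_certificates(pem_certs):
    """Return the certificates in order, dropping exact repeats."""
    seen = set()
    unique = []
    for pem in pem_certs:
        # Strip all whitespace so formatting differences do not hide duplicates.
        normalised = "".join(pem.split())
        fingerprint = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(pem)
    return unique
```

Different certificates are all kept, so multiple export targets with their own certificates would still work; only identical copies are skipped.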

Elad_Chomsky · ‎2024-01-23

Thank you. We will take the suggestion and see if we can implement it.

Vincent_Bacher · ‎2024-02-03

Metrics using custom scripts will be very interesting and useful for us, so that we can get rid of the Zabbix agent.

and now to something completely different - CCVS, CCAS, CCTE, CCCS, CCSM elite

milunb · ‎2024-02-27

Hi Elad,

Can you tell me what the problem is? When I configure Orchestrator to send data to multiple locations, it only sends to one.

Br

Elad_Chomsky · ‎2024-02-27

Hi @milunb ,

In case of any issues, try to use the old method: /opt/CPotelcol/REST.py --set_open_telemetry "$(cat payload.json)"

milunb · ‎2024-02-27

Hi, it is working now.
Thanks
