Hello Elad,
I would be interested in the "Security considerations", because our data center security design raises concerns whenever a perimeter firewall or DMZ firewall is allowed to send traffic from risky/untrusted zones into a secure internal zone.
Our approach is that DMZ/perimeter devices can be reached only from the internal network. If an attacker gains access to the perimeter firewall, he may be able to initiate connections to the internal device.
However, the PULL method allows us to monitor the device actively. If we cannot reach it, the "up" status in Prometheus changes, and we can get an alert and check what is happening. At the moment we must trust that the data will arrive.
Maybe you could add some metrics and information, at least on the gateway, that tell us the status of the connection:
- samples sent
- time to collect data on the GW
- time to push data to the targets
- push interval
- timestamps of the last 10 disconnects (target/Prometheus not reachable)
- timestamps of the last 10 successful reconnects AFTER a failed connection to Prometheus
- maybe add this information as an additional metric with labels which will be pushed to Prometheus, too.
- maybe add the possibility to generate a syslog entry when the "OpenTelemetry" system on the gateway restarts, connects, or disconnects.
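
To make the request more concrete, here is a minimal Python sketch of the kind of bookkeeping I have in mind. It is only an illustration under my own assumptions: the class `PushStatus`, its method names, and the metric names in `as_metrics()` are hypothetical and are not part of any existing gateway or OpenTelemetry API. It counts sent samples, remembers the last collect and push durations, and keeps the timestamps of the last 10 disconnects and reconnects.

```python
import time
from collections import deque


class PushStatus:
    """Hypothetical tracker for the push-connection status items listed above."""

    def __init__(self, push_interval_seconds=60.0, history=10):
        self.push_interval_seconds = push_interval_seconds
        self.samples_sent = 0
        self.last_collect_seconds = 0.0
        self.last_push_seconds = 0.0
        self.connected = True
        # Bounded queues: only the newest `history` timestamps are kept.
        self.disconnects = deque(maxlen=history)
        self.reconnects = deque(maxlen=history)

    def record_push(self, ok, samples, collect_seconds, push_seconds, now=None):
        """Record the outcome of one push attempt to the target."""
        now = time.time() if now is None else now
        self.last_collect_seconds = collect_seconds
        self.last_push_seconds = push_seconds
        if ok:
            self.samples_sent += samples
            if not self.connected:
                # First success AFTER a failed connection counts as a reconnect.
                self.reconnects.append(now)
            self.connected = True
        else:
            if self.connected:
                # Only the transition from connected to failed is a disconnect.
                self.disconnects.append(now)
            self.connected = False

    def as_metrics(self):
        """Return the status as a flat dict; the metric names are made up."""
        return {
            "gw_push_samples_sent_total": self.samples_sent,
            "gw_push_collect_seconds": self.last_collect_seconds,
            "gw_push_duration_seconds": self.last_push_seconds,
            "gw_push_interval_seconds": self.push_interval_seconds,
            "gw_push_connected": 1 if self.connected else 0,
            "gw_push_last_disconnects": list(self.disconnects),
            "gw_push_last_reconnects": list(self.reconnects),
        }
```

The same transitions (connected to failed, failed to connected) would also be the natural places to write the syslog entry. Repeated failures in a row are recorded as a single disconnect, so the history is not flooded during one long outage.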