David_Evans
Contributor

Skyline "Cluster Object" which has the stats for active member

I've just started working with our Skyline install, and there is one feature I was really hoping for that I have been unable to find: some object that would let Grafana understand a cluster, so that you could show data such as CPU and network stats for the active member, whichever member that is, seamlessly before and after a cluster failover.

Has anyone found a way, or an already existing value, that can easily display the data from the active member only, over time?
Given the access level of the scripts that produce the data and send it on to Prometheus, it would seem like this could be a value that is set, i.e. just the cluster name from within SmartConsole.


use case:
I have implemented on several platforms what I call "anomaly detection lite". In graphs I overlay the real-time data from this hour with the same time slot one week prior. I also really like to be able to overlay the average of the same time slot from the previous 3 weeks onto the current real-time data. This instantly gives support a view of "is this normal?".
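A minimal sketch of the three-week baseline described above, in Python. The function name `weekly_baseline` and the sample data are hypothetical and not part of Skyline or Grafana. It assumes samples are keyed by naive timestamps aligned to fixed slots, so it does not handle the time-change and holiday cases mentioned further down.

```python
from datetime import datetime, timedelta
from statistics import mean

def weekly_baseline(samples, when, weeks=3):
    """Average the values seen at the same time slot in each of the previous `weeks` weeks.

    `samples` maps a slot timestamp to a value. Weeks with no sample are skipped.
    Returns None when no prior week has data.
    """
    prior = [
        samples[when - timedelta(weeks=k)]
        for k in range(1, weeks + 1)
        if when - timedelta(weeks=k) in samples
    ]
    return mean(prior) if prior else None

# Hypothetical CPU readings for one time slot across four weeks.
now = datetime(2024, 3, 25, 14, 0)
cpu = {
    now - timedelta(weeks=1): 40.0,
    now - timedelta(weeks=2): 50.0,
    now - timedelta(weeks=3): 60.0,
    now: 85.0,
}
print(weekly_baseline(cpu, now))  # 50.0
```

Plotting the returned baseline next to the live value for each slot gives the overlay; passing `weeks=1` reproduces the simpler "one week prior" comparison.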

If you get some good thresholds over time with this, you can even do anomaly alerting with this kind of data.
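As a rough illustration of threshold-based alerting against such a baseline, the check can be as small as the sketch below. The name `is_anomalous` and the 25% default tolerance are assumptions chosen for the example; real thresholds would have to be tuned per metric over time.

```python
def is_anomalous(current, baseline, tolerance=0.25):
    """Return True when `current` deviates from `baseline` by more than `tolerance` (a fraction)."""
    if baseline == 0:
        # No meaningful ratio exists; treat any non-zero reading as a deviation.
        return current != 0
    return abs(current - baseline) / abs(baseline) > tolerance

print(is_anomalous(85.0, 50.0))  # True: 70% above the baseline
print(is_anomalous(55.0, 50.0))  # False: 10% above the baseline
```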

However, this does not work if your active and standby members switch and you're using something that just queries the real IPs of your cluster members.
(It also has issues with things like time changes, holidays, etc., but there are workarounds for those.)

10 Replies
Arik_Ovtracht
Employee

Hi @David_Evans

Did you check out the cluster dashboard here? While it was not created by us (the Skyline dev team), I do think it should help you at least get started.

Also check the ClusterXL metrics here.

David_Evans
Contributor

I've seen that info, but how do I find the other two pieces of the data below if I only have one? The cluster stats dashboard doesn't have any way to find that, or to display just the active member.

1. ClusterName
2. ClusterMemberA
3. ClusterMemberB

I want to be able to select ClusterName in a dashboard and have it show me the data for ONLY the active member of the cluster over time. If I choose 1 day for the time frame, and that cluster has failed over between A and B 5 times during that day, I want to see one CPU graph that is seamless over time and shows just the active member.
I'd also like to be able to select ClusterMemberA in a dashboard and know what ClusterName and ClusterMemberB are, without having to know anything about our naming convention to make it work. I'd use this either to display info about the other member in the current dashboard or to provide links that spawn a whole different dashboard.
Most importantly, I want consistent alerting and visibility after something like a CDT rollout of a new Jumbo to hundreds of firewalls, where the active member has changed on most of them over a weekend. That is when it is very nice to be able to see quickly how a Jumbo or OS upgrade affected the hardware.
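To make the "one seamless graph" request concrete, here is a small Python sketch of the stitching logic. It assumes two inputs that this thread does not confirm Skyline exposes in this form: a per-member metric series, and a record of which member was active at each timestamp. The function name and sample values are hypothetical.

```python
def active_member_series(metric, active_by_time):
    """Build one series holding the active member's value at each timestamp.

    metric:         {member: {timestamp: value}}
    active_by_time: {timestamp: member that was active then}
    Timestamps with no sample from the active member are left out (a gap in the graph).
    """
    stitched = {}
    for ts in sorted(active_by_time):
        member = active_by_time[ts]
        value = metric.get(member, {}).get(ts)
        if value is not None:
            stitched[ts] = value
    return stitched

# Hypothetical data: a failover from A to B happens between t=60 and t=120.
cpu = {
    "ClusterMemberA": {0: 42, 60: 45, 120: 3, 180: 2},
    "ClusterMemberB": {0: 2, 60: 3, 120: 47, 180: 44},
}
active = {0: "ClusterMemberA", 60: "ClusterMemberA",
          120: "ClusterMemberB", 180: "ClusterMemberB"}
print(active_member_series(cpu, active))
# {0: 42, 60: 45, 120: 47, 180: 44}
```

The resulting series follows the load across the failover instead of dropping to the standby's idle values, which is what keeps week-over-week comparisons meaningful.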

JozkoMrkvicka
Authority

I second this idea !

In the case of a cluster, it would be nice to have some way to display data only from the active member without switching between different members. With SNMP monitoring it is easy: just use the VIP of the cluster and you always get data from the active member. Some similar logic would be nice to have for Skyline when working with clusters (and VSX too).

Kind regards,
Jozko Mrkvicka
Arik_Ovtracht
Employee

Ah, understood.

Unfortunately, the current version of Skyline does not provide the Cluster ID (which I believe is what you meant when you referred to "Cluster Name"), but we do intend to add it soon.

For now, you have the option to define both members of the cluster as the same 'environment', and then view each one in a 'Single Machine' dashboard (or create your own dashboard which shows all the members in a layout you define).

David_Evans
Contributor

Yes, cluster ID would work. Yes, there are some currently available ways to work around the issue, but none are as convenient or dynamic as just having the ID to tie the members together.
One big plus I was hoping for with Skyline was easy access to this kind of data, so that the workarounds and custom code I have had to implement in other monitoring tools to understand clustering, VSX, and Maestro would not be needed in Skyline.
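As an illustration of what a shared cluster ID would enable, the sketch below builds a lookup from (member, cluster ID) pairs so that selecting any member yields its cluster and its peers. The record format and function names are hypothetical. Until Skyline supplies the ID, the pairs would have to come from a hand-maintained inventory.

```python
from collections import defaultdict

def build_cluster_index(records):
    """Index (member, cluster_id) pairs in both directions."""
    members_by_cluster = defaultdict(list)
    cluster_by_member = {}
    for member, cluster in records:
        members_by_cluster[cluster].append(member)
        cluster_by_member[member] = cluster
    return cluster_by_member, dict(members_by_cluster)

def peers_of(member, cluster_by_member, members_by_cluster):
    """Return the member's cluster ID and the other members of that cluster."""
    cluster = cluster_by_member[member]
    return cluster, [m for m in members_by_cluster[cluster] if m != member]

# Hypothetical inventory using the placeholder names from earlier in the thread.
records = [("ClusterMemberA", "ClusterName"), ("ClusterMemberB", "ClusterName")]
by_member, by_cluster = build_cluster_index(records)
print(peers_of("ClusterMemberA", by_member, by_cluster))
# ('ClusterName', ['ClusterMemberB'])
```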

I see that VSX has some of this implemented, but it still suffers from the cluster issue.
Maestro is also partially there, but for every SG it lists all the blades I don't have set up as "lost".

These are the kinds of problems I was hoping to not have to solve in a custom provided monitoring tool.


Do we have any kind of an ETA for the ID to be included?

Arik_Ovtracht
Employee

@David_Evans I can't promise anything right now, but I believe it should be available in a few months.

David_Evans
Contributor

Any updates on when we might see this added to Skyline?

Elad_Chomsky
Employee

Hi @David_Evans ,

We are still aiming to add cluster data as part of our roadmap. It was delayed, and we don't have a concrete ETA yet.

In the meantime, we are aiming to release support for custom scripts in February or March, so you will be able to add any metrics that are not yet included as a temporary solution on your side.

David_Evans
Contributor

We have had issues doing this when you also want to get the data from the real IP. The current active member then gets queried twice, once via the VIP and once via its real IP. We have seen cases where SNMP just becomes unresponsive or overloaded. That is where you wind up having to get creative with how SNMP pulls the data and how the data is displayed over time, with 'virtual objects' etc., or by checking early in the SNMP cycle for the active member and then stopping the rest of the queries. But all that complexity is what I was hoping to avoid with Skyline.
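The "check early for the active member, then stop" approach could look roughly like the Python sketch below. The `is_active` and `fetch_metrics` callables are hypothetical stand-ins for whatever SNMP client is in use; no real SNMP library calls or OIDs are shown, and the addresses come from the documentation range.

```python
def poll_active_only(members, is_active, fetch_metrics):
    """Probe each member once, then run the full metric query only on the active one.

    Returns (member, metrics), or (None, None) if no member reports active.
    """
    for member in members:
        if is_active(member):
            return member, fetch_metrics(member)
    return None, None

# Fake backends for illustration; no SNMP traffic is sent.
state = {"192.0.2.1": "standby", "192.0.2.2": "active"}
calls = []

def fake_is_active(ip):
    return state[ip] == "active"

def fake_fetch(ip):
    calls.append(ip)  # record which members received the heavy query
    return {"cpu": 47}

print(poll_active_only(list(state), fake_is_active, fake_fetch))
# ('192.0.2.2', {'cpu': 47})
print(calls)  # ['192.0.2.2']
```

Each member receives one cheap state probe and only the active member receives the full query, so nothing is polled through both the VIP and its real IP.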

JozkoMrkvicka
Authority

I see. It makes more sense to monitor only the real IPs of the cluster members. It might be tricky to figure out which one was active at a specific time, but with some additional SNMP queries it should be possible without losing too much time checking each and every member individually (in theory up to 13 members). Another option is just to overlay graphs of specific data from all members (like the number of concurrent connections). One member is supposed to handle all the traffic (split brain is another story 😄).

Another dirty solution would be to use a different, dedicated SNMP tool to get data only from the VIP.

Kind regards,
Jozko Mrkvicka