Charles_Palmer
Contributor

Management Server High-CPU post upgrade to R80.30 from R80.10

About a month ago, I upgraded my Smart-1 410 Management Server from R80.10 to R80.30 and installed Take 50 immediately. I did an upgrade, not a clean install. I had a few issues with high CPU and contacted support, and we ended up installing Take 76 on the management server to address a high-CPU issue with Java. That seems to have corrected the Java issue, but I still had high CPU from postgres processes. After a few hours, those settled down and the server operated normally for the rest of the week.

On Saturday, the processors spiked to near 100% and stayed that way until late Monday/early Tuesday, and then it cleared up again. It was the postgres processes that were consuming the processor. While I was observing it, the postgres processes consumed the processor for about 45 minutes out of every hour, with a break of about 15 minutes. This is enough for my Indeni monitoring to put the management server into cooldown and then start monitoring it again, only to have the CPU spike again while in cooldown, at which point Indeni stops its normal interrogation and limits itself to CPU and Memory monitoring. I have tried to address this with support, and they don't have any further guidance for me thus far. This is the third weekend since my upgrade where this has happened.

This screams of some scheduled process that consumes a lot of CPU while it runs, but I don't know what it might be. I may have just reached the end of the cycle for this week, as it has been almost 20 minutes since the CPU stopped being high this time. But it has generally been 2-3 days of mostly high CPU on my management server, starting sometime on Saturday.

Thank you for any guidance or assistance in what I should check to figure out what is causing this high-CPU condition each week.
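
One way to test the "scheduled process" theory is to turn CPU samples into explicit busy windows and compare their start times against whatever is scheduled on the server. The following is only a minimal sketch: it assumes you already have timestamped CPU readings from some source (a monitoring export, for example), the function name busy_windows and the 90% / 30-minute thresholds are arbitrary choices for illustration, and the samples are synthetic, not data from this thread.

```python
from datetime import datetime, timedelta


def busy_windows(samples, threshold=90.0, min_duration=timedelta(minutes=30)):
    """Return (start, end) pairs where CPU stayed at or above `threshold`.

    samples: iterable of (datetime, cpu_percent) tuples, sorted by time.
    Runs shorter than `min_duration` are ignored.
    """
    windows = []
    start = last = None
    for ts, cpu in samples:
        if cpu >= threshold:
            if start is None:
                start = ts          # a new busy run begins here
            last = ts               # extend the current run
        else:
            if start is not None and last - start >= min_duration:
                windows.append((start, last))
            start = last = None     # run ended, reset
    # Close a run that is still open at the end of the data.
    if start is not None and last - start >= min_duration:
        windows.append((start, last))
    return windows


# Synthetic samples (NOT real measurements): one reading every 5 minutes,
# busy for the first 45 minutes of each hour, idle for the remaining 15.
t0 = datetime(2020, 1, 4, 0, 0)
samples = [
    (t0 + timedelta(minutes=m), 98.0 if m % 60 < 45 else 10.0)
    for m in range(0, 120, 5)
]

for start, end in busy_windows(samples):
    print(start.strftime("%H:%M"), "->", end.strftime("%H:%M"))
```

If the windows always begin at the same minute of the hour, or on the same weekday, that is a hint to look at scheduled jobs that start around those times.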

9 Replies
PhoneBoy
Admin

I know there are some periodic processes that can consume a lot of CPU but at low priority.
Which means: if something else needs the CPU, it backs off, but if nothing needs the CPU, it will use it.
That might be the case here, but without seeing the output of ps -auxwww, I can't be 100% sure.
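
As a rough illustration of the point about low-priority work: in the BSD-style output of ps, the STAT column includes an "N" for processes running at low priority (niced). The sketch below parses saved ps output and separates busy processes into niced and normal-priority groups. The helper names, the 50% cut-off, and the sample text are all made up for illustration; the sample is not output from the server discussed here, and real output may differ in column layout.

```python
def parse_ps_aux(text):
    """Parse `ps aux`-style text (header line first) into a list of dicts."""
    rows = []
    for line in text.strip().splitlines()[1:]:
        # 11 columns; the last one (COMMAND) may itself contain spaces.
        parts = line.split(None, 10)
        if len(parts) < 11:
            continue
        user, pid, cpu, _mem, _vsz, _rss, _tty, stat, _start, _time, cmd = parts
        rows.append({
            "user": user,
            "pid": int(pid),
            "cpu": float(cpu),
            "stat": stat,
            "niced": "N" in stat,   # "N" in STAT marks a low-priority process
            "command": cmd,
        })
    return rows


def split_by_priority(rows, min_cpu=50.0):
    """Return (niced, normal) lists of processes using at least `min_cpu` %CPU."""
    busy = [r for r in rows if r["cpu"] >= min_cpu]
    niced = [r for r in busy if r["niced"]]
    normal = [r for r in busy if not r["niced"]]
    return niced, normal


# Hypothetical sample text, invented for this example.
SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
admin     4321 92.0  3.1 900000 50000 ?        RN   Jan04 120:10 java -Xmx1g example-indexer
postgres  5001 88.5  2.0 400000 30000 ?        Rs   Jan04 300:02 postgres: example worker
admin     6000  0.3  0.1  10000  2000 pts/0    S+   10:15   0:00 ps -auxwww
"""

niced, normal = split_by_priority(parse_ps_aux(SAMPLE))
print("low priority:", [r["pid"] for r in niced])
print("normal priority:", [r["pid"] for r in normal])
```

Busy processes that land in the niced group should yield to other work; busy processes in the normal group are the ones more likely to matter.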
Charles_Palmer
Contributor

Thank you for your reply.

I will keep an eye on it, and if the problem happens this weekend for the fourth time, I will run "ps -auxwww", save the output to a file, and upload it for your review. You may be right that it is running at low priority, because I haven't noticed any performance issues that would have sent me hunting. What brought it to my attention was Indeni reporting the sustained high CPU, which repeatedly put it into its CPU-and-Memory-only monitoring mode.

Is there anything else I should run and collect besides that? (It looks like that already dumps a pretty comprehensive chunk of data.)

 

Chris_Atkinson
Employee

By way of background, are SmartEvent and/or the Compliance Blade enabled on the Smart-1 410? How about any OPSEC connections?

CCSM R77/R80/ELITE
Charles_Palmer
Contributor

Yes, SmartEvent Server and Correlation Unit are both enabled, as well as Compliance. Additionally, Network Policy Management, Logging & Status/Identity Logging, and User Directory are also checked. Endpoint Policy Management is not checked because we don't use it. Workflow is greyed out and unchecked, while Provisioning is greyed out but checked.

 

Charles_Palmer
Contributor

I have the high-CPU situation this morning (though not as extreme as it was the previous three weekends) and I ran the ps -auxwww as requested. I have it saved to a file. Should I just post the contents into one of these messages? If not, how shall I get the results to you?

Charles_Palmer
Contributor

I missed the little paperclip until I had already clicked the Post button.

Timothy_Hall
Legend

Looking at your ps output, there is some low-priority SOLR log indexing going on, but the number of postgres processes and their CPU utilization look far too high for the resource profile assigned by default to a Smart-1 410. I'm not sure whether spawned postgres processes are getting "stuck" (parent process IDs are not shown in your output), but I'd say a TAC case is definitely in order here, as that doesn't look right to me.

Are you using any third-party log analysis tools that might account for the postgres activity?
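
Since parent process IDs were missing from the earlier output, one option is to capture them with something like ps -eo pid,ppid,pcpu,args and then summarize the postgres processes per parent. The sketch below is only an illustration: the function name, the assumption that the command line begins with "postgres", and the sample text are all hypothetical and would need adjusting to what the server actually prints.

```python
from collections import defaultdict


def postgres_by_parent(text):
    """Summarize postgres processes per parent PID.

    text: output in the style of `ps -eo pid,ppid,pcpu,args`, header line first.
    Returns {ppid: {"count": n, "cpu": total_percent}}.
    """
    summary = defaultdict(lambda: {"count": 0, "cpu": 0.0})
    for line in text.strip().splitlines()[1:]:
        # 4 columns; the last one (the command line) may contain spaces.
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        _pid, ppid, pcpu, args = parts
        # Assumption: the command line starts with "postgres".
        if args.startswith("postgres"):
            entry = summary[int(ppid)]
            entry["count"] += 1
            entry["cpu"] += float(pcpu)
    return dict(summary)


# Hypothetical sample text, invented for this example.
SAMPLE_PS = """\
  PID  PPID %CPU COMMAND
 2000     1  0.1 postgres -D /example/data
 2101  2000 45.0 postgres: example worker A
 2102  2000 40.5 postgres: example worker B
 3000     1 12.0 java example
"""

for ppid, info in sorted(postgres_by_parent(SAMPLE_PS).items()):
    print(f"parent {ppid}: {info['count']} process(es), {info['cpu']:.1f}% CPU")
```

Comparing such summaries taken a few minutes apart would show whether the same child PIDs linger under one parent or whether new ones keep being spawned.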

 

Charles_Palmer
Contributor

Unfortunately, I have already tried to address this with TAC twice, and they aren't seeing any problem, or at least don't have an explanation for it. I do not have any third-party log analysis running currently. I do have Indeni doing performance and best-practices monitoring, but it doesn't touch the logs as far as I am aware. Indeni is what tipped me off to the problem initially: on the first three weekends, my CPU was pegged for 45+ minutes out of every hour, which sent Indeni into cooldown monitoring, where it only checks CPU and Memory until the CPU is no longer pegged. I didn't get the same email explosion from Indeni this weekend that I did the previous three weekends (it was emailing me about hourly about the issue on the previous weekends and I only got one notification this month). This tells me that while the CPU is high right now, it isn't staying high for most of the hour like it was before. Maybe whatever it was is settling down now that I am a month into my upgrade from R80.10.

 

Chris_Atkinson
Employee

If you have the option to run without the Compliance Blade enabled for a period, that is something you could try as a way of isolating the symptoms further.

Additionally, there are further SmartEvent CPU optimizations in JHF T107 (ongoing).

If the load persists longer term without resolution, you may need to look at distributing roles (SmartEvent) onto other VMs or hardware to alleviate it.

Refer also: R80.X Security Management Performance Tuning Guide

CCSM R77/R80/ELITE
