
R80.10 take 42 - SmartEvent - log_indexer keeps crashing (often 100% CPU usage)

Question asked by Simon Drapeau on Dec 10, 2017
Latest reply on Jan 11, 2018 by Simon Drapeau

R80.10 distributed architecture

 

  • 2 x 15600 appliances in a VSX VSLS cluster (one VS currently active, 7 more to come)
  • SMART-1 3150 appliance (management server)

 

Since last week, SmartEvent has not been able to index all the logs in time. Under heavy load, the indexed logs lag 3-4 hours behind real time.

 

DIAGNOSTIC : 

 

I get a verbose error message about log_indexer when the logs are loaded just after the process crashes:

  • A message in the file /opt/Cprt-R80/log_indexer.elg indicates that an error might have occurred. The message is: [log_indexer 13211 4053793680]@fw_name[Date time] SolrClient::Send: connection failure with 127.0.0.1:8210 (curl error: )(curl error number:56). This message indicates that the indexer process (log_indexer) couldn't send the logs to the Log Database engine.
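
For anyone who wants to see how often this failure shows up, here is a minimal sketch that counts the matching lines in the log and prints the latest one. The function names are made up for illustration, and the path in the example is simply the one quoted above. For reference, curl error 56 is CURLE_RECV_ERROR (failure while receiving network data).

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical, not Check Point tools.

# Print how many lines of the given log file report a SolrClient connection failure.
count_solr_failures() {
    # grep -c prints the match count; it exits non-zero when the count is 0,
    # so "|| true" keeps callers running under "set -e" from aborting.
    grep -c 'SolrClient::Send: connection failure' "$1" || true
}

# Print the most recent failure line, if any, to see when the last one happened.
last_solr_failure() {
    grep 'SolrClient::Send: connection failure' "$1" | tail -n 1
}

# Example (path as quoted in the post):
#   count_solr_failures /opt/Cprt-R80/log_indexer.elg
#   last_solr_failure   /opt/Cprt-R80/log_indexer.elg
```

Comparing the count before and after a crash gives a rough idea of whether the failures line up with the log_indexer restarts.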

 

I tried raising the priority of this process so that the system would favor indexing.

 

The default nice value of the log_indexer process is 19, which is the lowest scheduling priority. I raised its priority by setting the nice value to 0, the normal default for processes.

  • renice -n 0 -p <pid number>
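
The same step can be wrapped so that it does not depend on looking up the PID by hand. This is only a sketch: `renice_pids` is a hypothetical helper, and the process name `log_indexer` is taken from the post.

```shell
#!/bin/sh
# Sketch only: renice_pids is a hypothetical helper, not a Check Point command.

# Set the nice value of every PID given after the first argument.
# Usage: renice_pids <nice-value> <pid> [<pid> ...]
renice_pids() {
    value=$1
    shift
    for pid in "$@"; do
        renice -n "$value" -p "$pid"
    done
}

# Example, run as root in expert mode (lowering a nice value needs root):
#   renice_pids 0 $(pgrep -x log_indexer)
#   ps -o pid,ni,comm -C log_indexer   # confirm the NI column shows the new value
```

Since log_indexer is restarted after each crash, the new process gets a new PID and presumably starts again at its default nice value, so the change would have to be reapplied after every restart.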

 

No significant improvement.

 

I opened a ticket, and TAC said:

  • Log_indexer process crashing every 5-20 minutes.
  • Log_indexer.elg shows “I’m sleep” / connection failed to 127.0.0.1
  • All other processes appear to be working correctly
  • Log_Indexer consuming 100% CPU

TROUBLESHOOTING:

  • Referenced previous tickets all point to fresh install.
  • R&D will be our next step.

 

Any hints regarding this issue? I am out of ideas.

 

Regards,

 

Simon 
