Script to check health on SMB

Hello SMB Admins!

I am writing a script that checks some key SMB sensors and sends an e-mail should any of them go outside pre-defined thresholds.

It is currently evaluating:

- Cooling fan speed
- CPU temperature
- Motherboard temperature
- Amount of free OS memory
- Free disk space on /storage, /logs and /pfrm2.0 volumes
- Load average (1 min)
- Voltage readings
- Core dump
- ICMP probes
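
To give an idea of what one of these checks boils down to, here is a minimal shell sketch of the load average test. It is not taken from the attached Lua script: the function name `load_exceeds` and the threshold of 4.0 are made up for the example, and it assumes a Linux-style /proc/loadavg.

```shell
#!/bin/sh
# Hypothetical helper, not part of smb_health_check.lua.
# Reads a /proc/loadavg-style line on stdin and succeeds (exit 0)
# only when the 1-minute load average is above the given threshold.
load_exceeds() {
    threshold="$1"
    # awk does the floating-point comparison; field 1 is the 1-minute average.
    awk -v t="$threshold" '
        NR == 1 && $1 + 0 > t + 0 { hit = 1 }
        END { exit (hit ? 0 : 1) }
    '
}

# Usage: 4.0 is an assumed threshold, pick one that suits your appliance.
if [ -r /proc/loadavg ] && load_exceeds 4.0 < /proc/loadavg; then
    echo "WARNING: 1-minute load average is above 4.0"
fi
```

The comparison is done in awk because plain sh arithmetic only handles integers.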

The script is written in Lua, which is a very light and fast embedded scripting language. You can review it with any text editor.

It currently works on 14x0 appliances only.

If you want to try it:

1. Copy the two attached files to the appliance. You may put them in /logs/smb_health_check/ so that they are not deleted on firmware upgrade, but any other location works too.

2. Using vi, edit smb_health_check.cfg. At a minimum, set the e-mail sender and recipient and the IP address of your mail server (which must be reachable from the appliance, of course).

3. You may check pre-defined thresholds for the different sensors and adjust them as needed for your environment. 

4. Test-run the script:

   expert# chmod 700 smb_health_check.lua
   expert# ./smb_health_check.lua

5. If you like, run it as a cron job every 5 minutes or so.
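
For the cron job, something along these lines could work. It is only a sketch: the path assumes you used /logs/smb_health_check/ from step 1, the helper name `ensure_cron_line` is made up, and whether `crontab` behaves this way on your firmware is an assumption you should check first. If the script expects to find its .cfg in the current directory, the cron line may also need a `cd` in front of it.

```shell
#!/bin/sh
# Sketch only: path and 5-minute schedule follow the steps above.
CRON_LINE='*/5 * * * * /logs/smb_health_check/smb_health_check.lua'

# Hypothetical helper: reads a crontab on stdin and writes it to stdout,
# appending CRON_LINE only when an identical line is not already present.
ensure_cron_line() {
    awk -v line="$CRON_LINE" '
        { print }
        $0 == line { found = 1 }
        END { if (!found) print line }
    '
}

# To apply it (uncomment once you have confirmed crontab works on your box):
# crontab -l 2>/dev/null | ensure_cron_line | crontab -
```

Because the helper only adds the line when it is missing, it is safe to run again whenever the crontab has been reset.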

Careful with the commas in the config file, or the script will fail :)

Questions or suggestions are welcome. Mind that I am still working on it so not everything is polished yet.

3 Replies

If you are going to place that script in crontab, you will need to do it again after a firmware upgrade because the crontab will be reset. Keep that in mind.


Hello Hristo,

Thanks for sharing the script. We were testing it and had a question about the thresholds used: did you take them from a Check Point doc/sk or some other reference? I was looking for something similar to sk119232 for SMB appliances to adjust the thresholds, but had no luck.


Hello Daniel,

If you are talking about the temperature and voltage thresholds, I took them from the output of the 'show diag' CLI command:

...

On board temperature: 50.0C (valid: -5C ~ 85C)
CPU temperature: 55.0C (valid: 0C ~ 105C)
Voltage VDD_3P3V: 3.3880V (valid: 3.1255V ~ 3.4755V)
Voltage VDD_1P05V: 1.064V (valid: 0.988V ~ 1.113V)
Voltage VDD_5V: 5.135V (valid: 4.722V ~ 5.282V)
Voltage DDR_VTT_0P75V: 0.756V (valid: 0.703V ~ 0.798V)
Voltage VDD_DDR_1P5V: 1.5140V (valid: 1.4155V ~ 1.5855V)
Voltage VDD_1P03V: 1.028V (valid: 0.969V ~ 1.092V)
Voltage VDD_CPU: 1.080V (valid: 0.855V ~ 1.128V)
Voltage VDD_0P9V: 0.9100V (valid: 0.8455V ~ 0.9555V)
Voltage VDD_1P8V: 1.8220V (valid: 1.7005V ~ 1.9005V)
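
If you want to compare readings against these ranges yourself, the lines are regular enough to parse. Below is a rough shell sketch, not the code from the attached script. The helper name `check_diag_ranges` is made up, it assumes every line has the exact "name: reading (valid: low ~ high)" shape shown above, and it leaves it to you how to capture the 'show diag' output and feed it to stdin.

```shell
#!/bin/sh
# Hypothetical helper, not part of the attached script.
# Reads "show diag"-style lines on stdin, prints every reading that falls
# outside its "(valid: lo ~ hi)" range, and exits non-zero if any did.
check_diag_ranges() {
    awk '
        /\(valid:/ {
            # Split "name: reading (valid: lo ~ hi)" at the first ": ".
            sep  = index($0, ": ")
            name = substr($0, 1, sep - 1)
            split(substr($0, sep + 2), f, " ")
            sub(/\)$/, "", f[5])
            # Adding 0 keeps the leading number and drops the unit suffix.
            value = f[1] + 0; lo = f[3] + 0; hi = f[5] + 0
            if (value < lo || value > hi) {
                print name ": " f[1] " outside " f[3] " ~ " f[5]
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage with two lines in the format above (the 110.0C reading is invented):
check_diag_ranges <<'EOF' || echo "at least one sensor is out of range"
CPU temperature: 110.0C (valid: 0C ~ 105C)
Voltage VDD_5V: 5.135V (valid: 4.722V ~ 5.282V)
EOF
```

Lines without a "(valid: ...)" part are simply skipped, so the whole command output can be piped in unfiltered.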
 
