check_netapp_shelfenv (Shelf-Environment)

Checks the shelf-status and various shelf-specific metrics and states on a NetApp-filer.

Description

This plugin checks status and collects metrics of shelves and their environment (temperature, cooling-devices, power-supplies, voltage-sensors, current-sensors).

Most of the checks take no thresholds of their own but rely on those set in Data ONTAP. The temperature check is the exception; see Advanced Examples below.

The string or pattern given to the --exclude|-X and --include|-I parameters is matched against the instance's full name (e.g. ‘SBXTEST-01 channel0a shelf0 temp1’). This makes it possible to check specific channels or shelves, but also single elements. See also the section Advanced Examples.
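
The matching described above can be sketched in Python as follows. This is an illustration only, not the plugin's code: the function name select_instances is hypothetical, and the sketch assumes that patterns are applied as unanchored regular-expression searches against the full name, which is how the examples in this document appear to behave. The leading ~ that the examples put in front of a pattern is left out here.

```python
import re

def select_instances(names, include=None, exclude=None):
    """Return the names that pass the include/exclude filters.

    Hypothetical helper; assumes an unanchored regex search on the full name.
    """
    selected = []
    for name in names:
        if include is not None and not re.search(include, name):
            continue  # an include pattern is set and this name does not match it
        if exclude is not None and re.search(exclude, name):
            continue  # this name matches the exclude pattern
        selected.append(name)
    return selected

# Instance names as they appear in the plugin output in this document.
names = [
    "SBXTEST-01 channel0a shelf0 temp1",
    "SBXTEST-01 channel0a shelf0 temp2",
    "SBXTEST-01 channel0a shelf0 temp3",
]

print(select_instances(names, exclude=r"temp2$"))
print(select_instances(names, include=r"shelf0 temp1"))
```

Because the whole name is searched, a pattern can target a filer, a channel, a shelf, a single element, or any combination of them.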

Available Subcommands

check_netapp_shelfenv allows you to monitor several aspects of the shelves. Not all of them are explained with examples here. At the time of this writing there are:

  • bay → Bay operational status
  • boot-device → Boot device operational status
  • coin-battery → Coin battery operational status and voltage (mV)
  • current → Current sensor operational status and current (mA)
  • dimm → DIMM (dual in-line memory module) operational status
  • fan → Fan (cooling device) operational status and rpm
  • help → Help about any command
  • psu → Power supply unit’s operational status
  • shelf-status → Shelf’s operational status
  • temperature → Temperature sensor’s operational status and temperature reading (°C)
  • voltage → Voltage sensor operational status and voltage (V)

Please use the main command's --help for the latest list of checks.

Examples

Simple Examples

$ check_netapp_shelfenv shelf-status -H filer
NETAPP SHELFENVIRONMENT OK - 1 shelf checked
SBXTEST-01 channel0a shelf0: normal

Checks all shelves. Returns CRITICAL if the status is not ‘normal’.


$ check_netapp_shelfenv temperature -H filer
NETAPP SHELFENVIRONMENT OK - 7 temperature-sensors checked
SBXTEST-01 channel0a shelf0 temp1: ok normal_temperature_range(26°C)
SBXTEST-01 channel0a shelf0 temp2: ok normal_temperature_range(34°C)
SBXTEST-01 channel0a shelf0 temp3: ok normal_temperature_range(30°C)
[...]
| SBXTEST-01_channel0a_shelf0_temp1=26°C;;;; SBXTEST-01_channel0a_shelf0_temp2=34°C;;;; SBXTEST-01_channel0a_shelf0_temp3=30°C;;;; [...]

Checks the temperature-sensors in all shelves. Returns CRITICAL if one or more sensors report an error.


$ check_netapp_shelfenv temperature -H filer --perfdata-uom-string=empty
NETAPP SHELFENVIRONMENT OK - 7 temperature-sensors checked
[...]
| SBXTEST-01_channel0a_shelf0_temp1=26;;;; SBXTEST-01_channel0a_shelf0_temp2=34;;;; SBXTEST-01_channel0a_shelf0_temp3=30;;;; [...]

Same as above, but removes the potentially troublesome ‘°C’ (degree symbol) from the perfdata's unit of measurement (UOM).
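
To see what a graphing engine has to cope with, the following Python sketch splits a perfdata string into label, numeric value and UOM. It assumes the usual Nagios perfdata layout label=value[UOM];warn;crit;min;max and labels without spaces, as in the output above. The function name parse_perfdata is hypothetical and not part of the plugin.

```python
import re

# label=value[UOM]; the threshold fields after the first ';' are ignored here.
PERF_RE = re.compile(r"^(?P<label>[^=]+)=(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[^;]*)")

def parse_perfdata(perfdata):
    """Split a perfdata string into (label, value, uom) tuples.

    Hypothetical helper; assumes space-separated entries with unquoted labels.
    """
    items = []
    for token in perfdata.split():
        match = PERF_RE.match(token)
        if match:  # tokens such as "[...]" are skipped
            items.append(
                (match.group("label"), float(match.group("value")), match.group("uom"))
            )
    return items

with_uom = "SBXTEST-01_channel0a_shelf0_temp1=26°C;;;; SBXTEST-01_channel0a_shelf0_temp2=34°C;;;;"
without_uom = "SBXTEST-01_channel0a_shelf0_temp1=26;;;; SBXTEST-01_channel0a_shelf0_temp2=34;;;;"

for label, value, uom in parse_perfdata(with_uom) + parse_perfdata(without_uom):
    print(label, value, repr(uom))
```

With --perfdata-uom-string=empty the third field is simply an empty string, so a consumer that cannot handle non-ASCII units never sees the degree symbol.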


$ check_netapp_shelfenv fan -H filer
NETAPP SHELFENVIRONMENT OK - 4 fans checked
SBXTEST-01 channel0a shelf0 fan1: ok (2970rpm)
SBXTEST-01 channel0a shelf0 fan2: ok (3000rpm)
SBXTEST-01 channel0a shelf0 fan3: ok (3000rpm)
SBXTEST-01 channel0a shelf0 fan4: ok (3000rpm)
 | SBXTEST-01_channel0a_shelf0_fan1=2970rpm;;;0; SBXTEST-01_channel0a_shelf0_fan2=3000rpm;;;0; [...]

Checks all cooling-fans in all shelves.


$ check_netapp_shelfenv psu -H filer
NETAPP SHELFENVIRONMENT OK - 2 power-supplies checked
SBXTEST-01 channel0a shelf0 psu1(type: 9C): ok
SBXTEST-01 channel0a shelf0 psu2(type: 9C): ok

Checks all power-supplies in all shelves.


$ check_netapp_shelfenv voltage -H filer
NETAPP SHELFENVIRONMENT OK - 4 voltage-sensors checked
SBXTEST-01 channel0a shelf0 volt1: ok normal_operating_range (5.70V)
SBXTEST-01 channel0a shelf0 volt2: ok normal_operating_range (12.300V)
SBXTEST-01 channel0a shelf0 volt3: ok normal_operating_range (5.70V)
SBXTEST-01 channel0a shelf0 volt4: ok normal_operating_range (12.180V)
| SBXTEST-01_channel0a_shelf0_volt1=5.70V;;;; SBXTEST-01_channel0a_shelf0_volt2=12.300V;;;; [...]

Checks all voltage-sensors in all shelves.

Consider setting --perfdata-uom-string=empty if the ‘V’ (volts) UOM confuses your monitoring system's graphing engine.


$ check_netapp_shelfenv current -H filer
NETAPP SHELFENVIRONMENT OK - 4 current-sensors checked
SBXTEST-01 channel0a shelf0 current1: ok normal_operating_range (4.29A)
SBXTEST-01 channel0a shelf0 current2: ok normal_operating_range (5.58A)
SBXTEST-01 channel0a shelf0 current3: ok normal_operating_range (4.57A)
SBXTEST-01 channel0a shelf0 current4: ok normal_operating_range (0A)
 | SBXTEST-01_channel0a_shelf0_current1=4.29A;;;; SBXTEST-01_channel0a_shelf0_current2=5.58A;;;;  [...]

Checks all current-sensors in all shelves.

Consider setting --perfdata-uom-string=empty if the ‘A’ (amperes) UOM confuses your monitoring system's graphing engine.


$ check_netapp_shelfenv coin-battery -H filer
NETAPP SHELFENVIRONMENT OK - 2 coin-batteries checked
SBXTEST-01.shelf21.1.coin-battery2, status: normal
SBXTEST-01.shelf21.1.coin-battery1, status: normal

Checks the status of all coin batteries in all shelves.


Advanced Examples

Including and Excluding Instances

$ check_netapp_shelfenv cool -H filer
NETAPP SHELFENVIRONMENT OK - 4 cooling-elements checked.
TOASTER-01 channel0a shelf0 cool1: ok (2970rpm)
TOASTER-01 channel0a shelf0 cool2: ok (3000rpm)
TOASTER-01 channel0a shelf1 cool1: ok (2940rpm)
TOASTER-01 channel0a shelf1 cool2: ok (3000rpm)

Checks all cooling-elements in all shelves on the cluster.


$ check_netapp_shelfenv cool -H filer -X ~cool2$
NETAPP SHELFENVIRONMENT OK - 2 cooling-elements checked.
TOASTER-01 channel0a shelf0 cool1: ok (2970rpm)
TOASTER-01 channel0a shelf1 cool1: ok (2940rpm)

Excludes any element whose name ends with ‘cool2’.

$ check_netapp_shelfenv cool -H filer -X ~shelf0\ cool2$
NETAPP SHELFENVIRONMENT OK - 3 cooling-elements checked.
TOASTER-01 channel0a shelf0 cool1: ok (2970rpm)
TOASTER-01 channel0a shelf1 cool1: ok (2940rpm)
TOASTER-01 channel0a shelf1 cool2: ok (3000rpm)

Excludes ‘cool2’ on shelf0 only.


$ check_netapp_shelfenv cool -H filer -I ~shelf1\ cool
NETAPP SHELFENVIRONMENT OK - 2 cooling-elements checked.
TOASTER-01 channel0a shelf1 cool1: ok (2940rpm)
TOASTER-01 channel0a shelf1 cool2: ok (3000rpm)

Checks only the cooling-elements on shelf1.

Note the backslash-escaped space after the shelf's name. Without it, shelf10, shelf11 etc. would also be checked.
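
The effect of the trailing space can be reproduced with a few lines of Python. The names for shelf10 and shelf11 are hypothetical and only serve the illustration. The sketch again assumes an unanchored regular-expression search on the full name.

```python
import re

# Hypothetical instance names; shelf10 and shelf11 do not appear in the examples above.
names = [
    "TOASTER-01 channel0a shelf1 cool1",
    "TOASTER-01 channel0a shelf10 cool1",
    "TOASTER-01 channel0a shelf11 cool1",
]

def matching(pattern):
    """Names that an unanchored regex search would select."""
    return [name for name in names if re.search(pattern, name)]

print(matching(r"shelf1"))       # selects all three names
print(matching(r"shelf1 cool"))  # selects only the shelf1 name
```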


Thresholds for the temperature

The temperature check accepts thresholds, which are evaluated in addition to the sensor's internal status.

$ check_netapp_shelfenv temperature -H filer --warning=40
NETAPP SHELFENVIRONMENT WARNING - 7 temperature-sensors checked
SBXTEST-01 channel0a shelf0 temp1: ok normal_temperature_range(26°C)
SBXTEST-01 channel0a shelf0 temp2: warning normal_temperature_range(41°C)
SBXTEST-01 channel0a shelf0 temp3: ok normal_temperature_range(30°C)
[...]
| SBXTEST-01_channel0a_shelf0_temp1=26°C;;;; SBXTEST-01_channel0a_shelf0_temp2=41°C;;;; SBXTEST-01_channel0a_shelf0_temp3=30°C;;;; [...]

Checks the temperature-sensors in all shelves. Returns CRITICAL if one or more temperature-sensors report an error, and additionally returns WARNING if any of them is over 40 degrees.


$ check_netapp_shelfenv temperature -H filer --critical=50
NETAPP SHELFENVIRONMENT CRITICAL - 7 temperature-sensors checked
SBXTEST-01 channel0a shelf0 temp1: critical normal_temperature_range(52°C)
SBXTEST-01 channel0a shelf0 temp2: ok normal_temperature_range(34°C)
SBXTEST-01 channel0a shelf0 temp3: ok normal_temperature_range(30°C)
[...]
| SBXTEST-01_channel0a_shelf0_temp1=52°C;;;; SBXTEST-01_channel0a_shelf0_temp2=34°C;;;; SBXTEST-01_channel0a_shelf0_temp3=30°C;;;; [...]

Checks the temperature-sensors in all shelves. Returns CRITICAL if one or more sensors report an error, or if any of them is over 50 degrees.
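
How the internal status and the thresholds combine can be sketched as below. This is a simplified model and not the plugin's implementation: the function names are hypothetical, and thresholds are treated as plain "greater than" limits, following the wording "over 40 degrees" above. The plugin itself may accept a richer threshold syntax.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin return codes

def sensor_state(sensor_ok, temp_c, warning=None, critical=None):
    """State of one sensor from its internal status and optional thresholds."""
    if not sensor_ok:
        return CRITICAL  # the sensor itself reports an error
    if critical is not None and temp_c > critical:
        return CRITICAL
    if warning is not None and temp_c > warning:
        return WARNING
    return OK

def overall_state(readings, warning=None, critical=None):
    """Worst state over all (sensor_ok, temp_c) readings."""
    states = (sensor_state(ok, temp, warning, critical) for ok, temp in readings)
    return max(states, default=OK)

# Readings modelled on the --warning=40 example above.
readings = [(True, 26), (True, 41), (True, 30)]
print(overall_state(readings, warning=40))
```

The worst individual state determines the overall result, which is why a single sensor at 41°C turns the whole check to WARNING.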

Allow empty bays

The plugin changes the state of bays that report an unknown state and hold no disk to ‘empty’. This allows the check to be configured so that empty bays are no longer reported as CRITICAL.

$ check_netapp_shelfenv bay -H filer --ok-state='~^(normal|empty)$'
NETAPP SHELFENVIRONMENT OK - 132 bays checked
nac06-01.shelf1.0.bay11: normal
[...]
nac06-01.shelf1.2.bay31: empty
[...]
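
The pattern passed to --ok-state above can be tried out in Python. The helper bay_is_ok is hypothetical, and the state ‘abnormal’ is made up purely to show why the ^ and $ anchors matter.

```python
import re

def bay_is_ok(state, ok_pattern=r"^(normal|empty)$"):
    """True if the bay state matches the pattern for acceptable states."""
    return re.search(ok_pattern, state) is not None

for state in ("normal", "empty", "unknown", "abnormal"):
    print(state, bay_is_ok(state))

# Without the anchors the made-up state 'abnormal' would also be accepted,
# because it contains the substring 'normal'.
print(bay_is_ok("abnormal", ok_pattern=r"normal|empty"))
```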