check_netapp_health

Checks the health state of the system and various subsystems.

Description

This plugin monitors the:

  • health state of the system
  • health state of the subsystems

A typical output would look like:

$ ./check_netapp_health --host=sim96
NETAPP HEALTH OK - 7 (sub)systems checked
subsystem.fhm_bridge (ok)
subsystem.fhm_switch (ok)
subsystem.metrocluster_node (ok)
subsystem.metrocluster (ok)
subsystem.cifs_ndo (ok)
subsystem.switch_health (ok)
system.health (ok)

The patterns given to the --exclude|-X and --include|-I parameters restrict the check to specific (sub)systems.
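
As a rough illustration of how such patterns select names, the sketch below filters some of the (sub)system names from the sample output with grep -E. It is not the plugin's code: the function name filter_subsystems is hypothetical, the order (include first, then exclude) is an assumption, and the plugin's own regex dialect may differ from grep's extended regular expressions.

```shell
#!/bin/sh
# Hypothetical helper, not part of check_netapp_health.
# Reads names on stdin, keeps those matching the include pattern ($1),
# then drops those matching the exclude pattern ($2, optional).
filter_subsystems() {
    include=$1
    exclude=${2:-}
    if [ -n "$exclude" ]; then
        grep -E -- "$include" | grep -Ev -- "$exclude"
    else
        grep -E -- "$include"
    fi
}

# Names taken from the sample output above.
names='subsystem.fhm_bridge
subsystem.metrocluster
subsystem.cifs_ndo
system.health'

# Keep only subsystems, but skip the metrocluster one.
printf '%s\n' "$names" | filter_subsystems '^subsystem' 'metrocluster'
```

With these inputs only subsystem.fhm_bridge and subsystem.cifs_ndo remain.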

With --ok-status=<regex> you can define which statuses are considered ok, for example to accept confirmed errors. See also the section Examples.
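
To get a feeling for which statuses a given pattern would accept, it can be tried out with grep -E before passing it to the plugin. This is only a sketch: status_is_ok is a hypothetical helper, and it assumes the plugin matches the pattern against the bare status string in a way comparable to grep's extended regular expressions.

```shell
#!/bin/sh
# Hypothetical helper, not part of check_netapp_health.
# Succeeds (exit 0) if the status string ($1) matches the pattern ($2).
status_is_ok() {
    printf '%s\n' "$1" | grep -Eq -- "$2"
}

pattern='^(ok|ok_with_suppressed)$'
for status in ok ok_with_suppressed degraded; do
    if status_is_ok "$status" "$pattern"; then
        echo "$status: counted as ok"
    else
        echo "$status: counted as a problem"
    fi
done
```

Note the anchors ^ and $: without them, a pattern such as ok would also match any status that merely contains "ok".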

Examples

Simple Examples

$ ./check_netapp_health --host=sim96            
NETAPP HEALTH OK - 7 (sub)systems checked
subsystem.fhm_bridge (ok)
subsystem.fhm_switch (ok)
subsystem.metrocluster_node (ok)
subsystem.metrocluster (ok)
subsystem.cifs_ndo (ok)
subsystem.switch_health (ok)
system.health (ok)

Checks all subsystems and the overall system health status on sim96. Returns CRITICAL if at least one of the subsystems or the system itself has a state other than ok.
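
When the plugin is called from a script rather than from the monitoring system, the result can be read from the exit code. The sketch below assumes the plugin follows the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); state_name is a hypothetical helper.

```shell
#!/bin/sh
# Map a Nagios-style plugin exit code to its state name.
# Assumes the conventional codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Usage sketch (commented out because it needs a reachable host):
# ./check_netapp_health --host=sim96
# echo "plugin state: $(state_name $?)"

state_name 2
```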


$ ./check_netapp_health --host=sim96 --ok-status='^(ok|ok_with_suppressed)$'
NETAPP HEALTH OK - 7 (sub)systems checked
subsystem.fhm_bridge (ok)
subsystem.fhm_switch (ok)
subsystem.metrocluster_node (ok)
subsystem.metrocluster (ok)
subsystem.cifs_ndo (ok)
subsystem.switch_health (ok)
system.health (ok_with_suppressed)

Same as above, but also returns OK for confirmed errors (status ok_with_suppressed).

Advanced Examples

$ ./check_netapp_health -H sim97 -I '^subsystem' --alarm-limit=WARNING
NETAPP_PRO HEALTH WARNING - 6 (sub)systems checked, 1 WARNING
subsystem.fhm_bridge (ok)
subsystem.fhm_switch (ok)
subsystem.metrocluster_node (WARNING) (degraded)
subsystem.metrocluster (ok)
subsystem.cifs_ndo (ok)
subsystem.switch_health (ok)

Checks only the subsystems and returns WARNING instead of CRITICAL if at least one of them is not ok.
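
The effect of --alarm-limit can be pictured as capping the severity of the result. The following sketch shows that idea on Nagios-style exit codes; cap_exit_code is a hypothetical helper, and both the capping logic and the pass-through of UNKNOWN are assumptions, not a description of the plugin's internals.

```shell
#!/bin/sh
# Hypothetical helper: cap a Nagios-style exit code at a limit.
# $1 = exit code from the check, $2 = highest code allowed (1 = WARNING).
# Assumption: UNKNOWN (3) is passed through unchanged.
cap_exit_code() {
    code=$1
    limit=$2
    if [ "$code" -ne 3 ] && [ "$code" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$code"
    fi
}

# A CRITICAL result (2) capped at WARNING (1).
cap_exit_code 2 1
```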