Checks the health state of the system and various subsystems.
This plugin monitors the overall system health (system.health) and the individual health subsystems (subsystem.*).
A typical output would look like:
$ ./check_netapp_health --host=sim96
NETAPP HEALTH OK - 7 (sub)systems checked
subsystem.fhm_bridge (ok)
subsystem.fhm_switch (ok)
subsystem.metrocluster_node (ok)
subsystem.metrocluster (ok)
subsystem.cifs_ndo (ok)
subsystem.switch_health (ok)
system.health (ok)
The patterns given to the --exclude|-X and --include|-I parameters restrict the check to specific (sub)systems.
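How the two pattern options might combine can be sketched in Python. This is an illustration only, not the plugin's code: the helper name select_checks is hypothetical, and the sketch assumes that the patterns are unanchored regular expressions, that the include pattern is applied before the exclude pattern, and that an omitted pattern imposes no restriction. The plugin's actual matching rules may differ.

```python
import re

def select_checks(names, include=None, exclude=None):
    """Return the names kept under the assumed include/exclude semantics."""
    selected = []
    for name in names:
        # Assumption: an include pattern keeps only the names it matches.
        if include is not None and not re.search(include, name):
            continue
        # Assumption: an exclude pattern then drops the names it matches.
        if exclude is not None and re.search(exclude, name):
            continue
        selected.append(name)
    return selected

# Names taken from the sample output in this document.
names = [
    "subsystem.fhm_bridge",
    "subsystem.fhm_switch",
    "subsystem.metrocluster_node",
    "subsystem.metrocluster",
    "subsystem.cifs_ndo",
    "subsystem.switch_health",
    "system.health",
]

only_subsystems = select_checks(names, include=r"^subsystem")
without_fhm = select_checks(names, exclude=r"fhm")
```

Under these assumptions, the pattern ^subsystem keeps the six subsystem entries and leaves out system.health, because that name does not start with "subsystem".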
With --ok-status=<regex> you can define which statuses are considered ok (e.g. to accept confirmed errors). See also the Examples section.
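The effect of such a regular expression can be tried out in isolation. The following Python sketch only shows which status strings a pattern like ^(ok|ok_with_suppressed)$ would accept. The helper is_ok is hypothetical, and the sketch assumes the plugin applies the pattern as an ordinary regular-expression search against the status text.

```python
import re

# Pattern as it could be passed to --ok-status.
OK_STATUS = re.compile(r"^(ok|ok_with_suppressed)$")

def is_ok(status):
    """True if the status text matches the ok pattern (assumed semantics)."""
    return OK_STATUS.search(status) is not None

results = {s: is_ok(s) for s in ["ok", "ok_with_suppressed", "degraded"]}

# Without the anchors ^ and $, the bare pattern "ok" would also match
# any status that merely contains "ok".
unanchored_match = re.search(r"ok", "ok_with_suppressed") is not None
```

Anchoring the pattern with ^ and $ keeps it from accepting statuses that merely contain the text "ok".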
$ ./check_netapp_health --host=sim96
NETAPP HEALTH OK - 7 (sub)systems checked
subsystem.fhm_bridge (ok)
subsystem.fhm_switch (ok)
subsystem.metrocluster_node (ok)
subsystem.metrocluster (ok)
subsystem.cifs_ndo (ok)
subsystem.switch_health (ok)
system.health (ok)
Checks all subsystems and the overall system-health status on sim96. Returns CRITICAL if at least one of the subsystems or the system has a state other than ok.
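The rule described here can be written down as a small Python sketch. It is not the plugin's implementation: the function overall_state and the sample dictionaries are hypothetical. The exit codes are the standard Nagios plugin return codes.

```python
# Standard Nagios plugin exit codes.
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}

def overall_state(statuses):
    """Return 'CRITICAL' if any status differs from 'ok', else 'OK'."""
    if any(status != "ok" for status in statuses.values()):
        return "CRITICAL"
    return "OK"

# Hypothetical check results, keyed by (sub)system name.
healthy = {"subsystem.fhm_bridge": "ok", "system.health": "ok"}
degraded = {"subsystem.metrocluster_node": "degraded", "system.health": "ok"}

healthy_state = overall_state(healthy)
degraded_state = overall_state(degraded)
```

A monitoring system such as Nagios evaluates the exit code, so a single non-ok entry is enough to make the whole check return 2 (CRITICAL).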
$ ./check_netapp_health --host=sim96 --ok-status='^(ok|ok_with_suppressed)$'
NETAPP HEALTH OK - 7 (sub)systems checked
subsystem.fhm_bridge (ok)
subsystem.fhm_switch (ok)
subsystem.metrocluster_node (ok)
subsystem.metrocluster (ok)
subsystem.cifs_ndo (ok)
subsystem.switch_health (ok)
system.health (ok_with_suppressed)
Same as above, but also returns OK for confirmed errors (status ok_with_suppressed).
$ ./check_netapp_health -H sim97 -I ^subsystem --alarm-limit=WARNING
NETAPP_PRO HEALTH WARNING - 6 (sub)systems checked, 1 WARNING
subsystem.fhm_bridge (ok)
subsystem.fhm_switch (ok)
subsystem.metrocluster_node (WARNING) (degraded)
subsystem.metrocluster (ok)
subsystem.cifs_ndo (ok)
subsystem.switch_health (ok)
Checks only the subsystems and returns WARNING instead of CRITICAL if at least one of them is not ok.
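One way to picture the effect of --alarm-limit is as a cap on the severity that the check may report. The Python sketch below rests on that assumption. The function apply_alarm_limit is hypothetical, and the plugin may implement the limit differently.

```python
# Severities in ascending order.
SEVERITY = ["OK", "WARNING", "CRITICAL"]

def apply_alarm_limit(state, limit=None):
    """Cap the state at the given limit (assumed --alarm-limit semantics)."""
    if limit is None:
        return state
    # Pick whichever of the two is less severe.
    return min(state, limit, key=SEVERITY.index)

capped = apply_alarm_limit("CRITICAL", "WARNING")
unchanged = apply_alarm_limit("OK", "WARNING")
unlimited = apply_alarm_limit("CRITICAL")
```

Under this reading, a state that would otherwise be CRITICAL is reported as WARNING, while an OK result is left untouched.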