check_eseries_health

Checks the health state of the system and of its individual health aspects. The health check is actively triggered on the target system.

Running the health check too often may affect the performance of the monitored device. Consider a longer interval between checks and disabling retries.

Description

This plugin monitors:

  • the health state of the system
  • the health state of the components

A typical output would look like:

NETAPP ESERIES HEALTH CRITICAL - 16 health aspects checked, 1 CRITICAL
netapp04.missingVolumes: notCompleted (CRITICAL)
netapp04.integratedHealthCheck: ok
netapp04.dbSubRecordsValidation: ok
netapp04.melEventCheck: ok
netapp04.validPassword: ok
netapp04.failedDrivesPresent: ok
netapp04.exclusiveOperations: ok
netapp04.driveCheck: ok
netapp04.nvsramDisableCfwDownloads: ok
netapp04.hotSparesInUse: ok
netapp04.controllerStatusOptimal: ok
netapp04.volumeGroupsComplete: ok
netapp04.objectGraphSyncCheck: ok
netapp04.configurationDatabaseCheck: ok
netapp04.spmDatabaseVerification: ok
netapp04.storageDeviceAccessible: ok

The patterns given to the --exclude|-X and --include|-I parameters restrict the check to specific aspects.
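
The filtering idea can be sketched in plain shell. This is only an illustration and not the plugin's code: it assumes that the patterns are regular expressions matched against the aspect name and that the exclude pattern is applied after the include pattern. The helper filter_aspects is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of check_eseries_health.
# Assumption: patterns are extended regular expressions matched against
# the aspect name; include is applied first, then exclude.
filter_aspects() {
    include=$1   # keep only names matching this pattern (empty = keep all)
    exclude=$2   # drop names matching this pattern (empty = drop none)
    while IFS= read -r name; do
        if [ -n "$include" ] && ! printf '%s\n' "$name" | grep -Eq -e "$include"; then
            continue
        fi
        if [ -n "$exclude" ] && printf '%s\n' "$name" | grep -Eq -e "$exclude"; then
            continue
        fi
        printf '%s\n' "$name"
    done
}

# Keep aspects ending in "Check", then drop those containing "mel".
printf '%s\n' missingVolumes driveCheck melEventCheck | filter_aspects 'Check$' 'mel'
```

With these three aspect names only driveCheck is left. Quote the patterns on the command line so that the shell does not interpret characters such as $ or |.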

With --ok-status=<regex> you can define which status values are considered ok (e.g. notCompleted). See also the section Examples.
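
How such a regex separates ok from non-ok status values can be tried out in plain shell. This is a minimal sketch, not the plugin's implementation: classify_status is a hypothetical helper, someOtherState is a made-up status value, and the sketch assumes that grep -E treats this simple pattern the same way the plugin's regex engine does.

```shell
#!/bin/sh
# Hypothetical helper, not part of check_eseries_health.
# Assumption: the plugin's regex behaves like a POSIX extended regex
# for simple patterns such as the one below.
classify_status() {
    status=$1
    ok_regex=$2
    if printf '%s\n' "$status" | grep -Eq -e "$ok_regex"; then
        echo "ok"
    else
        echo "CRITICAL"
    fi
}

# "someOtherState" is an invented value used only for illustration.
for s in ok notCompleted someOtherState; do
    printf '%s -> %s\n' "$s" "$(classify_status "$s" '^(ok|notCompleted)$')"
done
```

The anchors ^ and $ make the pattern match the whole status value, so a status that merely contains "ok" is not accepted by accident.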

Examples

Simple Examples

$ ./check_eseries_health --host=netapp04
NETAPP ESERIES HEALTH CRITICAL - 16 health aspects checked, 1 CRITICAL
netapp04.missingVolumes: notCompleted (CRITICAL)
netapp04.integratedHealthCheck: ok
netapp04.dbSubRecordsValidation: ok
netapp04.melEventCheck: ok
netapp04.validPassword: ok
netapp04.failedDrivesPresent: ok
netapp04.exclusiveOperations: ok
netapp04.driveCheck: ok
netapp04.nvsramDisableCfwDownloads: ok
netapp04.hotSparesInUse: ok
netapp04.controllerStatusOptimal: ok
netapp04.volumeGroupsComplete: ok
netapp04.objectGraphSyncCheck: ok
netapp04.configurationDatabaseCheck: ok
netapp04.spmDatabaseVerification: ok
netapp04.storageDeviceAccessible: ok

Checks all aspects on netapp04. Returns CRITICAL if the system or at least one of its sub-systems has a state other than ok.


$ ./check_eseries_health --host=netapp04 --ok-status='^(ok|notCompleted)$'
NETAPP ESERIES HEALTH OK - 16 health aspects checked
netapp04.missingVolumes: notCompleted 
netapp04.integratedHealthCheck: ok
netapp04.dbSubRecordsValidation: ok
netapp04.melEventCheck: ok
netapp04.validPassword: ok
netapp04.failedDrivesPresent: ok
netapp04.exclusiveOperations: ok
netapp04.driveCheck: ok
netapp04.nvsramDisableCfwDownloads: ok
netapp04.hotSparesInUse: ok
netapp04.controllerStatusOptimal: ok
netapp04.volumeGroupsComplete: ok
netapp04.objectGraphSyncCheck: ok
netapp04.configurationDatabaseCheck: ok
netapp04.spmDatabaseVerification: ok
netapp04.storageDeviceAccessible: ok

Same as above, but returns OK even if checks report notCompleted.

Advanced Examples

$ ./check_eseries_health --host=netapp04 --alarm-limit=WARNING
NETAPP ESERIES HEALTH WARNING - 16 health aspects checked, 1 WARNING
netapp04.missingVolumes: notCompleted (WARNING)
netapp04.integratedHealthCheck: ok
netapp04.dbSubRecordsValidation: ok
netapp04.melEventCheck: ok
netapp04.validPassword: ok
netapp04.failedDrivesPresent: ok
netapp04.exclusiveOperations: ok
netapp04.driveCheck: ok
netapp04.nvsramDisableCfwDownloads: ok
netapp04.hotSparesInUse: ok
netapp04.controllerStatusOptimal: ok
netapp04.volumeGroupsComplete: ok
netapp04.objectGraphSyncCheck: ok
netapp04.configurationDatabaseCheck: ok
netapp04.spmDatabaseVerification: ok
netapp04.storageDeviceAccessible: ok

Returns only WARNING instead of CRITICAL if at least one of the health aspects is not ok.
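
The capping behaviour of --alarm-limit can be pictured as taking the lower of the raised severity and the limit. The following is a rough sketch under that assumption and not the plugin's code. The helpers severity_rank and cap_severity are hypothetical, and only OK, WARNING and CRITICAL are covered; the ranks follow the usual Nagios plugin exit codes (0, 1, 2).

```shell
#!/bin/sh
# Hypothetical helpers, not part of check_eseries_health.
# Ranks follow the common Nagios exit codes: 0 OK, 1 WARNING, 2 CRITICAL.
severity_rank() {
    case $1 in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        return 1 ;;   # other states are not handled in this sketch
    esac
}

# Assumption: a severity above the limit is lowered to the limit,
# anything at or below the limit is passed through unchanged.
cap_severity() {
    raised=$1
    limit=$2
    raised_rank=$(severity_rank "$raised") || { echo "$raised"; return; }
    limit_rank=$(severity_rank "$limit") || { echo "$raised"; return; }
    if [ "$raised_rank" -gt "$limit_rank" ]; then
        echo "$limit"
    else
        echo "$raised"
    fi
}

cap_severity CRITICAL WARNING
cap_severity OK WARNING
```

The first call yields WARNING and the second one OK, which matches the example above where a notCompleted aspect is reported as WARNING rather than CRITICAL.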