check_eseries_health

Checks the overall health state of the system and of its individual health aspects. The health check is actively triggered on the target system.

Consider the performance impact on the monitored device if the health check runs too often. You may want to choose a longer check interval and disable retries.

Description

This plugin monitors:

  • the health state of the system
  • the health state of the components

Typical output looks like this:

NETAPP ESERIES HEALTH CRITICAL - 16 health aspects checked, 1 CRITICAL
netapp04.missingVolumes: notCompleted (CRITICAL)
netapp04.integratedHealthCheck: ok
netapp04.dbSubRecordsValidation: ok
netapp04.melEventCheck: ok
netapp04.validPassword: ok
netapp04.failedDrivesPresent: ok
netapp04.exclusiveOperations: ok
netapp04.driveCheck: ok
netapp04.nvsramDisableCfwDownloads: ok
netapp04.hotSparesInUse: ok
netapp04.controllerStatusOptimal: ok
netapp04.volumeGroupsComplete: ok
netapp04.objectGraphSyncCheck: ok
netapp04.configurationDatabaseCheck: ok
netapp04.spmDatabaseVerification: ok
netapp04.storageDeviceAccessible: ok

The patterns given to the --exclude|-X and --include|-I parameters restrict the check to specific aspects.

With --ok-result=<regex> you can define which result values are considered ok (e.g. notCompleted). See also the Examples section below.
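
The effect of --ok-result can be pictured with a small shell sketch. This is not the plugin's code: the function name is_ok_result is hypothetical, the value "failed" is made up for illustration, and the sketch assumes the regular expression is matched against the bare result value. grep -E is used only because a simple anchored alternation behaves the same in most regex dialects.

```shell
#!/bin/sh
# Hypothetical helper, not part of check_eseries_health.
# Usage: is_ok_result <regex> <result>
# Succeeds (exit 0) when <result> matches <regex>.
is_ok_result() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# "failed" is an invented result value, used only to show a non-match.
for result in ok notCompleted failed; do
    if is_ok_result '^(ok|notCompleted)$' "$result"; then
        echo "$result: treated as ok"
    else
        echo "$result: not ok"
    fi
done
```

Anchoring the pattern with ^ and $ keeps it from accidentally accepting longer values that merely contain "ok".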

On some controllers, running the health checks can take a long time. If you get unexpected errors or no results, set the --debug switch and look for messages like “Timeout exceeded”. In these cases a longer timeout (e.g. --timeout=300) may fix the problem.
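
A minimal sketch of how the debug output could be scanned for such a message. The helper name saw_timeout is hypothetical, and the sketch assumes that the message text contains "Timeout exceeded" and that debug output may appear on either stdout or stderr; the exact wording and stream may differ on your installation.

```shell
#!/bin/sh
# Hypothetical helper, not part of check_eseries_health.
# Succeeds (exit 0) when the text on stdin mentions a timeout.
saw_timeout() {
    grep -qi 'Timeout exceeded'
}

# Assumed usage (both output streams are captured):
#   ./check_eseries_health --host=netapp04 --debug 2>&1 | saw_timeout \
#       && echo "consider re-running with --timeout=300"
```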

Examples

Simple Examples

$ ./check_eseries_health --host=netapp04
NETAPP ESERIES HEALTH CRITICAL - 16 health aspects checked, 1 CRITICAL
netapp04.missingVolumes: notCompleted (CRITICAL)
netapp04.integratedHealthCheck: ok
netapp04.dbSubRecordsValidation: ok
netapp04.melEventCheck: ok
netapp04.validPassword: ok
netapp04.failedDrivesPresent: ok
netapp04.exclusiveOperations: ok
netapp04.driveCheck: ok
netapp04.nvsramDisableCfwDownloads: ok
netapp04.hotSparesInUse: ok
netapp04.controllerStatusOptimal: ok
netapp04.volumeGroupsComplete: ok
netapp04.objectGraphSyncCheck: ok
netapp04.configurationDatabaseCheck: ok
netapp04.spmDatabaseVerification: ok
netapp04.storageDeviceAccessible: ok

Checks all aspects on netapp04. Returns CRITICAL if the system or at least one of its sub-systems has a state other than ok.
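
When only the failing aspects are of interest, the output can be filtered. The following is a sketch, assuming the "name: result (STATE)" line format shown above stays stable; flagged_aspects is a hypothetical helper name, not something the plugin provides.

```shell
#!/bin/sh
# Hypothetical helper, not part of check_eseries_health.
# Reads plugin output on stdin and prints the names of all aspects
# that are flagged as WARNING or CRITICAL.
flagged_aspects() {
    grep -E ' \((WARNING|CRITICAL)\)$' | cut -d: -f1
}

# Example with two lines copied from the output above:
printf '%s\n' \
    'netapp04.missingVolumes: notCompleted (CRITICAL)' \
    'netapp04.integratedHealthCheck: ok' | flagged_aspects
# prints: netapp04.missingVolumes
```

The summary line is not matched because it does not end in a parenthesised state.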


$ ./check_eseries_health --host=netapp04 --ok-result='^(ok|notCompleted)$'
NETAPP ESERIES HEALTH OK - 16 health aspects checked
netapp04.missingVolumes: notCompleted
netapp04.integratedHealthCheck: ok
netapp04.dbSubRecordsValidation: ok
netapp04.melEventCheck: ok
netapp04.validPassword: ok
netapp04.failedDrivesPresent: ok
netapp04.exclusiveOperations: ok
netapp04.driveCheck: ok
netapp04.nvsramDisableCfwDownloads: ok
netapp04.hotSparesInUse: ok
netapp04.controllerStatusOptimal: ok
netapp04.volumeGroupsComplete: ok
netapp04.objectGraphSyncCheck: ok
netapp04.configurationDatabaseCheck: ok
netapp04.spmDatabaseVerification: ok
netapp04.storageDeviceAccessible: ok

Same as above, but returns OK even if checks report notCompleted.

Advanced Examples

$ ./check_eseries_health --host=netapp04 --alarm-limit=WARNING
NETAPP ESERIES HEALTH WARNING - 16 health aspects checked, 1 WARNING
netapp04.missingVolumes: notCompleted (WARNING)
netapp04.integratedHealthCheck: ok
netapp04.dbSubRecordsValidation: ok
netapp04.melEventCheck: ok
netapp04.validPassword: ok
netapp04.failedDrivesPresent: ok
netapp04.exclusiveOperations: ok
netapp04.driveCheck: ok
netapp04.nvsramDisableCfwDownloads: ok
netapp04.hotSparesInUse: ok
netapp04.controllerStatusOptimal: ok
netapp04.volumeGroupsComplete: ok
netapp04.objectGraphSyncCheck: ok
netapp04.configurationDatabaseCheck: ok
netapp04.spmDatabaseVerification: ok
netapp04.storageDeviceAccessible: ok

Returns WARNING instead of CRITICAL if at least one of the health aspects is not ok.
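
The capping behaviour of --alarm-limit=WARNING can be sketched as follows. This only illustrates the behaviour described above and is not the plugin's implementation; apply_alarm_limit is a hypothetical helper, and only the WARNING limit shown in this example is modelled.

```shell
#!/bin/sh
# Hypothetical helper, not part of check_eseries_health.
# Usage: apply_alarm_limit <state> <limit>
# Prints <state>, downgraded to WARNING when the limit is WARNING
# and the state would otherwise be CRITICAL.
apply_alarm_limit() {
    state=$1
    limit=$2
    if [ "$state" = "CRITICAL" ] && [ "$limit" = "WARNING" ]; then
        echo "WARNING"
    else
        echo "$state"
    fi
}

apply_alarm_limit CRITICAL WARNING   # prints: WARNING
apply_alarm_limit OK WARNING         # prints: OK
```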