Checks the overall health state of the storage system and of its individual health aspects. The health check is actively triggered on the target system.
Running the health check too often can affect performance on the monitored device, so consider a longer interval
between checks and disabling retries.
This plugin monitors the:
A typical output would look like:
NETAPP ESERIES HEALTH CRITICAL - 16 health aspects checked, 1 CRITICAL
netapp04.missingVolumes: notCompleted (CRITICAL)
netapp04.integratedHealthCheck: ok
netapp04.dbSubRecordsValidation: ok
netapp04.melEventCheck: ok
netapp04.validPassword: ok
netapp04.failedDrivesPresent: ok
netapp04.exclusiveOperations: ok
netapp04.driveCheck: ok
netapp04.nvsramDisableCfwDownloads: ok
netapp04.hotSparesInUse: ok
netapp04.controllerStatusOptimal: ok
netapp04.volumeGroupsComplete: ok
netapp04.objectGraphSyncCheck: ok
netapp04.configurationDatabaseCheck: ok
netapp04.spmDatabaseVerification: ok
netapp04.storageDeviceAccessible: ok
The patterns given to the --exclude|-X and --include|-I parameters restrict the check to specific aspects.
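To get a feel for how include and exclude patterns narrow down the list of aspects, the following shell sketch filters aspect names with grep -E. It assumes that the patterns behave like extended regular expressions matched against the aspect name; the plugin's actual matching rules may differ. The helper filter_aspects is hypothetical and not part of the plugin.

```shell
#!/bin/sh
# Hypothetical helper: emulate include/exclude filtering on aspect names.
# Assumption: patterns are extended regular expressions (not confirmed
# for the plugin itself).
# Usage: filter_aspects INCLUDE_REGEX EXCLUDE_REGEX < aspect-names
filter_aspects() {
    include=$1
    exclude=$2
    if [ -n "$exclude" ]; then
        grep -E -- "$include" | grep -Ev -- "$exclude"
    else
        grep -E -- "$include"
    fi
}

# A few aspect names taken from the sample output above.
aspects='missingVolumes
driveCheck
failedDrivesPresent
nvsramDisableCfwDownloads
hotSparesInUse'

# Keep only the drive-related aspects.
printf '%s\n' "$aspects" | filter_aspects '[Dd]rive' ''

# Keep everything except the firmware download lock.
printf '%s\n' "$aspects" | filter_aspects '.' 'nvsram'
```

Under these assumptions, the first call keeps driveCheck and failedDrivesPresent, and the second call keeps every aspect except nvsramDisableCfwDownloads.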
With --ok-result=<regex> you can define which states are considered ok (e.g. notCompleted). See also the Examples section below.
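The effect of such a regular expression can be tried out without touching the storage system. The sketch below assumes that the regex is applied to the state string as an extended regular expression; is_ok_state is a hypothetical helper, not a function of the plugin. Note the single quotes: parentheses, | and $ are special to the shell and must be protected.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a reported state counts as ok.
# Assumption: the regex is matched against the state string as an
# extended regular expression.
# Usage: is_ok_state STATE REGEX
is_ok_state() {
    state=$1
    ok_regex=$2
    printf '%s\n' "$state" | grep -Eq -- "$ok_regex"
}

# Classify three states with the regex used in the Examples section.
for state in ok notCompleted failed; do
    if is_ok_state "$state" '^(ok|notCompleted)$'; then
        echo "$state: ok"
    else
        echo "$state: alarm"
    fi
done
```

With this regex, ok and notCompleted are accepted, while failed still raises an alarm. The anchors ^ and $ keep the pattern from accidentally matching longer state names.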
Unlike our other eSeries checks, the health check sends POST requests. A user (-u user) with read-only access may therefore lack sufficient permissions.
On some controllers the health checks can take quite long to run. If you get strange errors or no results, set the --debug switch and look for messages like “Timeout exceeded”. In that case a longer timeout (e.g. --timeout=300) may help.
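One way to look for timeouts is to run the check once with --debug, capture everything it prints, and search for the message. The following is a minimal sketch: the host name and timeout value are taken from the examples in this document, has_timeout is a hypothetical helper, and the exact wording and output stream of the debug message may differ on your installation.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the text on stdin mentions a timeout.
# Assumption: the debug message contains the words "Timeout exceeded".
has_timeout() {
    grep -qi 'timeout exceeded'
}

# Only run when the plugin is present in the current directory.
if [ -x ./check_eseries_health ]; then
    if ./check_eseries_health --host=netapp04 --debug 2>&1 | has_timeout; then
        echo "Timeout seen, retrying with --timeout=300"
        ./check_eseries_health --host=netapp04 --timeout=300
    fi
fi
```

Keep in mind that every run triggers the health check on the controller, so this is meant for interactive troubleshooting rather than regular scheduling.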
$ ./check_eseries_health --host=netapp04
NETAPP ESERIES HEALTH CRITICAL - 16 health aspects checked, 1 CRITICAL
netapp04.missingVolumes: notCompleted (CRITICAL)
netapp04.integratedHealthCheck: ok
netapp04.dbSubRecordsValidation: ok
netapp04.melEventCheck: ok
netapp04.validPassword: ok
netapp04.failedDrivesPresent: ok
netapp04.exclusiveOperations: ok
netapp04.driveCheck: ok
netapp04.nvsramDisableCfwDownloads: ok
netapp04.hotSparesInUse: ok
netapp04.controllerStatusOptimal: ok
netapp04.volumeGroupsComplete: ok
netapp04.objectGraphSyncCheck: ok
netapp04.configurationDatabaseCheck: ok
netapp04.spmDatabaseVerification: ok
netapp04.storageDeviceAccessible: ok
Checks all aspects on netapp04. Returns CRITICAL if the system or at least one of its sub-systems has a state other than ok.
$ ./check_eseries_health --host=netapp04 --ok-result='^(ok|notCompleted)$'
NETAPP ESERIES HEALTH OK - 16 health aspects checked
netapp04.missingVolumes: notCompleted
netapp04.integratedHealthCheck: ok
netapp04.dbSubRecordsValidation: ok
netapp04.melEventCheck: ok
netapp04.validPassword: ok
netapp04.failedDrivesPresent: ok
netapp04.exclusiveOperations: ok
netapp04.driveCheck: ok
netapp04.nvsramDisableCfwDownloads: ok
netapp04.hotSparesInUse: ok
netapp04.controllerStatusOptimal: ok
netapp04.volumeGroupsComplete: ok
netapp04.objectGraphSyncCheck: ok
netapp04.configurationDatabaseCheck: ok
netapp04.spmDatabaseVerification: ok
netapp04.storageDeviceAccessible: ok
Same as above but returns OK even with notCompleted checks.
$ ./check_eseries_health --host=netapp04 --alarm-limit=WARNING
NETAPP ESERIES HEALTH WARNING - 16 health aspects checked, 1 WARNING
netapp04.missingVolumes: notCompleted (WARNING)
netapp04.integratedHealthCheck: ok
netapp04.dbSubRecordsValidation: ok
netapp04.melEventCheck: ok
netapp04.validPassword: ok
netapp04.failedDrivesPresent: ok
netapp04.exclusiveOperations: ok
netapp04.driveCheck: ok
netapp04.nvsramDisableCfwDownloads: ok
netapp04.hotSparesInUse: ok
netapp04.controllerStatusOptimal: ok
netapp04.volumeGroupsComplete: ok
netapp04.objectGraphSyncCheck: ok
netapp04.configurationDatabaseCheck: ok
netapp04.spmDatabaseVerification: ok
netapp04.storageDeviceAccessible: ok
Returns WARNING instead of CRITICAL if at least one of the health aspects is not ok.
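For scripts that call the plugin directly, the alarm level is usually read from the exit status rather than from the text. The sketch below assumes that the plugin follows the common monitoring-plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL, anything else = UNKNOWN); status_name is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: map an exit status to its conventional status name.
# Assumption: the plugin uses the usual monitoring-plugin exit codes.
status_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Only run when the plugin is present in the current directory.
if [ -x ./check_eseries_health ]; then
    ./check_eseries_health --host=netapp04 --alarm-limit=WARNING
    rc=$?
    echo "exit status $rc means $(status_name "$rc")"
fi
```

Under this assumption, a run with --alarm-limit=WARNING would be expected to exit with status 1 where the same run without the option exits with status 2.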
StorageGRID eSeries controllers that are not in maintenance mode have a security feature intended to prevent upgrades during operation. This may result in a false-positive alarm, e.g.:
$ ./check_eseries_health -H sg01 --system-id=1
NETAPP ESERIES HEALTH CRITICAL - 16 health checks checked, 1 CRITICAL
StorageGRID-SG01.nvsramDisableCfwDownloads: failed (CRITICAL)
StorageGRID-SG01.integratedHealthCheck: ok
StorageGRID-SG01.dbSubRecordsValidation: ok
StorageGRID-SG01.melEventCheck: ok
StorageGRID-SG01.validPassword: ok
StorageGRID-SG01.failedDrivesPresent: ok
StorageGRID-SG01.exclusiveOperations: ok
StorageGRID-SG01.driveCheck: ok
StorageGRID-SG01.missingVolumes: notCompleted
StorageGRID-SG01.hotSparesInUse: ok
StorageGRID-SG01.controllerStatusOptimal: ok
StorageGRID-SG01.volumeGroupsComplete: ok
StorageGRID-SG01.objectGraphSyncCheck: ok
StorageGRID-SG01.configurationDatabaseCheck: ok
StorageGRID-SG01.spmDatabaseVerification: ok
StorageGRID-SG01.storageDeviceAccessible: ok
Use --exclude|-X <aspect> to skip this aspect, e.g.:
$ ./check_eseries_health -H sg01 --system-id=1 -X nvsramDisableCfwDownloads
NETAPP ESERIES HEALTH OK - 15 health checks checked
StorageGRID-SG01.integratedHealthCheck: ok
StorageGRID-SG01.dbSubRecordsValidation: ok
StorageGRID-SG01.melEventCheck: ok
StorageGRID-SG01.validPassword: ok
StorageGRID-SG01.failedDrivesPresent: ok
StorageGRID-SG01.exclusiveOperations: ok
StorageGRID-SG01.driveCheck: ok
StorageGRID-SG01.missingVolumes: notCompleted
StorageGRID-SG01.hotSparesInUse: ok
StorageGRID-SG01.controllerStatusOptimal: ok
StorageGRID-SG01.volumeGroupsComplete: ok
StorageGRID-SG01.objectGraphSyncCheck: ok
StorageGRID-SG01.configurationDatabaseCheck: ok
StorageGRID-SG01.spmDatabaseVerification: ok
StorageGRID-SG01.storageDeviceAccessible: ok