The dockerized plugins started by run.sh won't be killed by the monitoring daemon after the global timeout (service_check_timeout). Instead of the plugin process, the daemon would kill run.sh. Therefore we implemented a global minimum default of 120 seconds on the container level, which is raised if an explicit --timeout higher than 120 seconds is set.
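The rule can be sketched in plain shell (effective_timeout is an illustrative helper, not a real function of run.sh):

```shell
# Sketch of the timeout rule described above: the container-level timeout
# is at least 120 seconds, but an explicit --timeout above 120 wins.
effective_timeout() {
    user_timeout="${1:-0}"          # value of an explicit --timeout, 0 if unset
    if [ "$user_timeout" -gt 120 ]; then
        echo "$user_timeout"        # explicit higher --timeout is honored
    else
        echo 120                    # global minimum default
    fi
}

effective_timeout 60     # prints 120 (global minimum wins)
effective_timeout 300    # prints 300 (explicit --timeout wins)
```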
You have changed the store directory from its default by setting --storedir=/var/my/dir on the command line, but the checks do not find the files in this directory.
$ ./run.sh check_netapp AutosizeMode -H filer --storedir=/tmp
No store for host 'filer' and object 'volume'. You may want to check the collector-checks.

$ ls -l /tmp/filer/
-rw-r--r-- 1 ila wheel 83149 May 13 18:02 volume.store
...
The directory (/tmp in the above example) exists twice: one is on the host and the other inside the container. Both have the same path and name, but they are not the same directory.
Change the mapping in run.sh or create an additional one.
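Such a mapping is a docker bind-mount; a sketch of what it could look like in run.sh's docker invocation (the surrounding options are whatever your run.sh already uses, only the -v line matters here):

```shell
# bind-mount the store directory under the same path inside the container,
# so --storedir=/var/my/dir refers to the same files on both sides:
docker run --rm \
    -v /var/my/dir:/var/my/dir \
    ...
```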
$ perl get_netapp_perfdata.pl -H sim812 --mode=7m -o lif
perf-object-counter-list-info failed: can't get instance names for lif

$ perl get_netapp_perfdata.pl -H sim821n1 --mode=cm -o ifnet
perf-object-instance-list-info-iter failed: Object "ifnet" was not found.
The perf-object lif is new in DataONTAP 8.2.1. It replaces ifnet, which can be used for any filer running DataONTAP 8.1.x or older, plus 7m-filers running even newer versions of DataONTAP. Therefore use get_netapp_*.pl --object=ifnet together with PerfIf for cmode-filers older than 8.2.1 and for all 7m-filers. Use get_netapp_*.pl --object=lif together with PerfLif only for cluster-mode filers with DataONTAP 8.2.1 or later.
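Put together, the working combinations look like this (hosts and modes taken from the failing examples above; the check invocations are illustrative):

```shell
# 7m-filers and cmode-filers older than 8.2.1:
$ perl get_netapp_perfdata.pl -H sim812 --mode=7m -o ifnet
$ ./check_netapp_pro.pl PerfIf -H sim812 ...

# cluster-mode filers with DataONTAP 8.2.1 or later:
$ perl get_netapp_perfdata.pl -H sim821n1 --mode=cm -o lif
$ ./check_netapp_pro.pl PerfLif -H sim821n1 ...
```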
When installing or configuring collector-checks, the collector may run later than the check-scripts that depend on it. This will temporarily (typically right after installing the checks) result in several UNKNOWNs. To avoid that, run the collector from the command line before you restart Nagios.
--max_age and the check_interval of the collectors must be configured consistently. E.g. if you set the checks' --max_age to 2 minutes but collect every 30 minutes, the checks will return UNKNOWN most of the time. For performance-collectors, the checks' --delta must also fit into the check_interval. For more details see Configuring
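A consistent pairing could look like this (Nagios object syntax; command names, values and the --max_age syntax are illustrative):

```shell
define service {
    service_description   collect volume          ; the collector-check
    check_command         get_netapp_cm!volume
    check_interval        5                       ; collect every 5 minutes
}

define service {
    service_description   NetApp volume usage     ; the depending check
    check_command         check_netapp_pro_usage!volume --max_age=15m
    check_interval        5                       ; --max_age is comfortably
}                                                 ; larger than the collector's interval
```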
The collectors (a.k.a. getters) collect and process the data in memory; if any data was collected, the store phase starts and all data is immediately written to disk. The store duration is monitored by the getter and printed as perf-data to stdout.
Both getters and checks use file-locking, if supported by the underlying file-system.
If you want to get the latest result from a monitored device, you have to hit "reschedule next service check" for both the corresponding collectors (there may be more than one) and the check itself, in that order, with a small delay in between!
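The same two reschedules can be triggered without the GUI through Nagios' external command file, using the standard SCHEDULE_FORCED_SVC_CHECK command (the host name, service descriptions, and command-file path below are examples):

```shell
now=$(date +%s)
# 1. the collector service(s) first:
printf '[%s] SCHEDULE_FORCED_SVC_CHECK;filer01;collect volume;%s\n' \
    "$now" "$now" > /usr/local/nagios/var/rw/nagios.cmd
# 2. then, after a small delay, the check itself:
sleep 30
printf '[%s] SCHEDULE_FORCED_SVC_CHECK;filer01;NetApp volume usage;%s\n' \
    "$(date +%s)" "$(date +%s)" > /usr/local/nagios/var/rw/nagios.cmd
```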
If a check terminates on the command line complaining about a missing 'hostname' (as argument or environment variable 'NAGIOS_HOSTNAME'), this is due to the rm_ack feature, which needs these environment variables to be set by Nagios. This is not a bug. Either ignore it on the command line or set the variables manually.
If you get such an error message from your monitoring system: make sure that it exports the required environment variables. Icinga in particular exports ICINGA_HOSTNAME etc. instead of NAGIOS_HOSTNAME.
These environment-variables are exported by the monitoring system (e.g. Nagios, Icinga, ...) when it executes the check; they tell the check the name of the host and the service-description. Host and service-description are needed by the check to reset the correct service acknowledgement in case of a reason change.
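To see the mechanism, here is a minimal stand-in: the `check` function below merely simulates how a real check reads the variables on startup (NAGIOS_SERVICEDESC is the matching standard Nagios export for the service-description):

```shell
# Simulated check: a stand-in that only prints which host/service it would
# act on, the way a real check reads NAGIOS_HOSTNAME when it starts.
check() {
    echo "host=${NAGIOS_HOSTNAME:-unset} service=${NAGIOS_SERVICEDESC:-unset}"
}

check                                    # prints: host=unset service=unset
NAGIOS_HOSTNAME=filer01 NAGIOS_SERVICEDESC='volume usage' check
# prints: host=filer01 service=volume usage
```

This is exactly why the variables cannot be seen in an interactive shell: they exist only in the environment the daemon builds for the check.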
Some monitoring systems do not export these environment variables by default, or export them in a way that deviates from the Nagios standard, but provide a configuration setting to change this. Please ask your monitoring system's support how to change these settings. In case you want to search for the setting yourself: in Nagios XI you would have to set enable_environment_macros=1 (nagios.cfg). Please inform yourself about the performance implications of changing this value!
For the curious: you cannot check this on the command line, not even with su <monitoring-user>! Again: the variables are exported by the monitoring daemon when the check is run.
This error occurs occasionally because these values are only needed in case of a reason-change, which - by its nature - happens occasionally.
For more background I recommend reading the blog articles about rm_ack.
--rm_ack=off is not a generally recommended solution, since it may result in masked alarms. But there are situations where this setting is acceptable. This blog may help you to fully understand the possible consequences.
The output in the GUI contains HTML-tags like colors, <br> or </br>.
Adjust cgi.cfg so that HTML-tags are not escaped any more:
# ESCAPE HTML TAGS
# This option determines whether HTML tags in host and service
# status output is escaped in the web interface. If enabled,
# your plugin output will not be able to contain clickable links.
escape_html_tags=0
This could be a consequence of modules not being in @INC. The following may help:
/usr/local/lib/perl/5.10.1$ sudo ln -s /usr/local/nagios/plugins/check_netapp_pro/ILanti/
/usr/local/lib/perl/5.10.1$ ls -l
drwxr-xr-x 19 root root 4096 2013-10-29 11:34 auto
lrwxrwxrwx  1 root root   46 2013-10-29 14:13 ILanti -> /usr/local/nagios/plugins/check_netapp_pro/ILanti/
-rw-rw-r--  1 root root 5287 2013-10-29 11:34 perllocal.pod
drwxrwxr-x  2 root root 4096 2013-10-29 11:26 version
-r--r--r--  1 root root 6619 2013-09-03 01:49 version.pm
-r--r--r--  1 root root 9852 2013-08-16 14:55 version.pod
Check whether the getter has run within the last e.g. 15 minutes (depends on the delta you have set with --delta). Also consider that some getters depend on the data of other getters (e.g. vol_snapshot depends on volume). See also section Collector Checks vs. Stand-Alone
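A quick way to spot a stale store from the shell (the path follows the <storedir>/<host>/<object>.store layout shown earlier; the 15-minute threshold is just an example, adjust it to your --delta):

```shell
# Report a store file that has not been rewritten for more than 15 minutes.
store=/tmp/filer/volume.store          # <storedir>/<host>/<object>.store
if [ -n "$(find "$store" -mmin +15 2>/dev/null)" ]; then
    echo "stale: $store - check whether the getter is still running"
fi
```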
First of all: DataONTAP 7.x is not fully supported by check_netapp_pro, but some checks work. E.g. Usage can check the aggregate usage if you set the vserver name to the host name:
$ ./get_netapp_7m.pl -H 10.25.0.22 -o aggregate --explore
Existing data for object 'aggregate'
Node: 10.25.0.22
Instance: aggr0 (uuid n/a)
aggregate-name = aggr0
home-name = n/a
nodes = n/a
...
Explore done - now configure your nagios ...

$ ./check_netapp_pro.pl Usage -H 10.25.0.22 -o aggregate -s 10.25.0.22
NETAPP_PRO USAGE WARNING - 1 aggregate checked, 0 critical and 1 warning
aggr0: 842.1GiB (WARNING) | aggr0=904163696640B;808456046182.4;1039443487948.8;0;1154937208832
$ ./get_netapp_cm.pl -H filer ... -o vol_snapshot
No store for host 'filer' and object 'volume'. You may want to check the collector-checks.
The getter for the volume-snapshots depends on the getter for the volumes. So the following should work if typed in this order:
$ ./get_netapp_cm.pl -H filer ... -o volume
$ ./get_netapp_cm.pl -H filer ... -o vol_snapshot
Under some configurations, the getters for at least the snapmirror and snapmirror-destination objects stop working because they see instances they have already collected. Example error message:

Instance 'NETAPP01-SVM01:NETAPP01_SVM01_xxxx01_vol' already exists - can not continue!
Consider using the switch --skip_duplicates (introduced in v5.2.1).
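For example (the object name is illustrative; use the one from your failing getter call):

```shell
$ ./get_netapp_cm.pl -H filer -o snapmirror --skip_duplicates
```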
Although a running getter and an up-to-date store file for the disk object are in place, you see one of these error messages:
No store (type: store) for host 'filer' and object 'disk'. You may want to check the collector-checks.
Store file (filer.disk) is out of date!
The DiskPaths check is not supported for all ONTAP versions and requires a dedicated ZAPI getter to collect its data. Please consult the documentation of the DiskPaths check (--help) for further details.
Using the newer, universal getter get_netapp with the volume object sometimes returns fewer volumes than the older Perl-based get_netapp_cm.pl -o volume.
The modern RESTful API does not return all volumes if the official volume endpoint is used. By default, the universal get_netapp prefers the REST API over the older ZAPI. As long as the ONTAP version still supports the ZAPI, you can force the getter to use it with the corresponding command-line switch.
Please consider sending us a short report if you think you need the volumes not returned by the official RESTful API endpoint. We are aware of another, private endpoint which could return a complete list.
One or both of the following occurs with a user account:

The unigetter (get_netapp) returns with an authentication error asking you to check the credentials (--user, --pass or --authfile), although the credentials are proven to be OK.

Some getters return 0 instances although instances would be available on the filer.
These effects disappear if you use an admin account.
The api-detection is confused because of a missing or otherwise wrong configuration of the monitoring user on the filer.
Disable the api-detection and explicitly set the api.
Check your filer's configuration (see Typescript Monitoring User for cdot). If you find a capability missing in the typescript, please report it together with the ONTAP version to the developers.