Documentation
Usage-Check
Checks used space on aggregates or volumes.
Usage
$ check_netapp_pro.pl Usage -H <host> -o volume|aggregate [...] [--help]
Description
This plugin checks used space on either aggregates or volumes (depending on
--object|-o aggregate|volume). Specific volumes or aggregates can be included
or excluded by using --include|-I or --exclude|-X with a regex.
The -I and -X parameters can also be used to include or exclude specific nodes
or SVMs. The plugin automatically prefixes the node or SVM name to the
instance name, so this method does not depend on a naming convention.
Depending on the switch --metric=relative|absolute, either percentages or
absolute values (bytes) will be checked. Together with --prefix=ki|Mi|Gi|...
and relative thresholds like --warning=MAX/100*70, quite sophisticated
results are possible, which some trending tools require. You'll find
more information in the description of the mentioned switches and in the
examples below.
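To illustrate how a relative threshold such as MAX/100*70 turns into an absolute byte limit, here is a minimal Python sketch. It is not the plugin's implementation: the function name `state` is hypothetical, the critical threshold of 90% is an assumption, and the byte values are the size-total and size-used figures from the explore output shown further down this page.

```python
# Values taken from the example explore output (volume lun1).
MAX = 19922944    # size-total in bytes
used = 15941632   # size-used in bytes

# Relative thresholds expressed against MAX, evaluated in bytes.
warning = MAX / 100 * 70
critical = MAX / 100 * 90   # 90% is an assumption for this sketch

def state(used_bytes: float, warn: float, crit: float) -> str:
    """Compare an absolute value against byte thresholds derived from MAX."""
    if used_bytes > crit:
        return "CRITICAL"
    if used_bytes > warn:
        return "WARNING"
    return "OK"

print(state(used, warning, critical))
```

The value being checked stays absolute (bytes), which is what a trending tool would record, while the limits scale with the size of each volume.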
Offline or restricted volumes cannot, by definition, be checked and will be skipped.
Example output if offline or restricted volumes exist:
NETAPP OK - 30 volumes checked. [...]
3 volumes skipped
[...]
Performance-data will be printed for all checked volumes/aggregates.
Consider using -v for inspection if you are not satisfied with the result.
What do we check here?
This plugin does its checks based on the following values:
Usage in Bytes/Percent:
- size-total
- size-used
- percentage-size-used (vol)
- percent-used-capacity (aggr)
- state (to detect offline-volumes)
Inodes:
- files-total
- files-used
Free Space instead of used space
Checking free space instead of used space may be useful when aggregates are extended and you want to avoid adjusting absolute thresholds. This is possible with the variable threshold MAX: use MAX to get the maximum value of the aggregate and subtract the amount you would like to keep free. There are two difficulties. First, the calculation is done in bytes, so you may want to use parentheses to document your thresholds clearly (which requires escaping them from the shell). Second, you have to apply the SI factor yourself. See the chapter Advanced Examples below for more details.
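The arithmetic behind such a free-space threshold can be sketched in Python. This only illustrates the calculation described above and is not the plugin's code; the function name `used_limit_for_free` and the aggregate sizes are hypothetical.

```python
MIB = 1024 * 1024   # binary prefix: 1 MiB in bytes
MB = 1000 * 1000    # decimal prefix: 1 MB in bytes
TIB = 1024 ** 4     # 1 TiB in bytes

def used_limit_for_free(max_bytes: int, min_free_bytes: int) -> int:
    """Return the used-space limit (bytes) that keeps min_free_bytes free."""
    return max_bytes - min_free_bytes

# Hypothetical aggregate, before and after being extended by 1 TiB.
before = used_limit_for_free(5 * TIB, 100 * MIB)
after = used_limit_for_free(6 * TIB, 100 * MIB)

print(before)                 # used-space limit for the 5 TiB aggregate
print(after - before)         # the limit moves with the aggregate size
print(100 * MIB - 100 * MB)   # bytes by which 100 MiB exceeds 100 MB
```

Because the limit is derived from MAX, it follows the aggregate when it grows, so the amount kept free stays constant without touching the threshold.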
Exploring your filer's performance-data
For a comprehensive definition, including the present values from your filer, you can run
$ get_netapp_*.pl -H <your filer> --explore --descriptive
An example output would be:
Existing data for object 'volume'
Node: vfiler0
Instance: lun1 (272914ba-0198-11e3-be4e-123478563412)
...
------------------------------------------------------------
files-total = 566
Total user-visible file (inode) count, i.e., current maximum
number of user-visible files (inodes) that this volume can
currently hold.
------------------------------------------------------------
files-used = 103
Number of user-visible files (inodes) used. This field is
valid only when the volume is online.
------------------------------------------------------------
...
------------------------------------------------------------
percentage-size-used = 80
Percentage of the volume size that is used. If the volume
is restricted or offline, a value of 0 is returned.
------------------------------------------------------------
size-available = 3981312
Number of bytes still available in the volume. If the volume
is restricted or offline, a value of 0 is returned.
------------------------------------------------------------
size-total = 19922944
Total usable size (in bytes) of the volume, not including
WAFL reserve or volume snapshot reserve. This field is valid
only when the volume is online.
------------------------------------------------------------
size-used = 15941632
The size (in bytes) that is used in the volume. This field
is valid only when the volume is online.
------------------------------------------------------------
...
If you do not see the description, you have to collect the data again
without the --fast switch.
Dependencies
This check reads its data from one or more local stores. The collector scripts (also known as "getters") that build these stores must already have been run for the volume and aggregate objects; otherwise this check will not find any data. See the documentation and cfg-example-files which came with this script.
In case you are using the --check_only switch with one of the lun-parameters
(with_lun / without_lun), this check also depends on the collector for the
lun-object.
Simple Examples
Checks all volumes using relative thresholds of 70% and 90%. Warns if more than 70% of a volume's space is used and sends a critical alert if more than 90% is used.
Checks all volumes using the default settings. This means retrieving absolute values but setting relative thresholds. That combination may seem weird but is useful if you need absolute values for trending while still checking all of the filer's volumes at the same time. Absolute thresholds would not be helpful for an overall check if the volumes differ in size. The combination of absolute values and relative thresholds can lead to surprising results, so do not use it until you know that you really need it for your trending system.
Checks all aggregates using absolute thresholds of 500 and 800 GiB.
Checks all aggregates using relative thresholds of 70% and 90%. Warn if more than 70% of the available inodes are in use and send a critical-alarm if more than 90% of the inodes are occupied.
Monitor all volumes, and return just the top 3 sorted results.
Advanced Examples
Hint: The input for advanced examples is shortened. Replace the '...' below with -H <your filer's IP or host-name>
Check the inodes on volume vol0. Warn if more than 5000 inodes are in use and send a critical-alarm if more than 9000 are occupied.
Check the inodes on volume vol0. Warn if more than 5120 inodes are in use and send a critical-alarm if more than 9216 are occupied. Hint: Note the difference between 'k' and 'ki' (5 * 1024 = 5120); more details about these SI multipliers are in the help for the --factor switch.
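The difference between the decimal prefix 'k' and the binary prefix 'ki' in the two inode examples can be checked with a small Python sketch. The `scale` helper and the prefix tables are hypothetical illustrations, not part of the plugin.

```python
# Decimal (SI) prefixes multiply by powers of 1000.
DECIMAL = {"k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4}
# Binary (IEC) prefixes multiply by powers of 1024.
BINARY = {"ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}

def scale(value: int, prefix: str) -> int:
    """Convert a prefixed threshold value into its plain count."""
    factors = {**DECIMAL, **BINARY}
    return value * factors[prefix]

print(scale(5, "k"), scale(9, "k"))    # 5000 9000
print(scale(5, "ki"), scale(9, "ki"))  # 5120 9216
```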
Checks all aggregates. Returns a warning, if at least one aggregate is more than 50% used and a critical-alarm if more than 98% of space is used.
Checks only vol2. Uses the default metric and thresholds.
Checks any volume containing the string vol2. E.g. vol2, vol20, vol22, ...
Checks all volumes, except vol4. Uses the default metric and thresholds.
Same as above, but makes sure to exclude only vol4 and not vol40, vol4blabla, some_vol4, ... .
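The effect of anchoring an exclude regex can be sketched in Python. The patterns `vol4` and `^vol4$` and the helper `exclude` are assumptions for illustration. The sketch matches bare volume names; since the plugin prefixes the node or SVM name to the instance name, a real pattern may need to account for that prefix.

```python
import re

volumes = ["vol3", "vol4", "vol40", "vol4blabla", "some_vol4"]

def exclude(names, pattern):
    """Return the names that do NOT match the given regex."""
    rx = re.compile(pattern)
    return [name for name in names if not rx.search(name)]

# Unanchored: drops every name that contains 'vol4' anywhere.
print(exclude(volumes, r"vol4"))
# Anchored at both ends: drops only the volume named exactly 'vol4'.
print(exclude(volumes, r"^vol4$"))
```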
Checks all volumes, except vol3 and vol4. Uses the default metric and thresholds.
Returns a warning, if vol2 is more than 50% used.
Returns a warning if more than 700 GiB are used on vol2. Perfdata is in bytes.
Same as above, since metric is set to absolute by default.
Same as above, but thresholds and output are in terabytes.
Same as above, but perfdata is in TiB.
Returns absolute values by message and perf-data. Thresholds are relative: Warning if greater than 70%, critical if greater than 85%.
Prints the status after each volume listed in the output. E.g.
NETAPP WARNING - 3 volumes checked, 1 critical and 1 warning. Max. used volume: vol0 (91.0%)
vol0: 91.0% (CRITICAL), nfs: 82.0% (WARNING), vol1: 1.4%, | [...] .
Tip: not_ok is the default. To avoid even the CRITICAL- and WARNING-status use none.
If --show_status were none, the above output would (partly) change to:
vol0: 91.0%, nfs: 82.0%, vol1: 1.4%
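The difference between the two output styles can be reproduced with a short Python sketch. It covers only the values not_ok and none mentioned above; the function names and the thresholds of 70% and 90% are assumptions chosen to match the example output, and this is not the plugin's code.

```python
def status(pct: float, warn: float, crit: float) -> str:
    """Classify a usage percentage against warning/critical limits."""
    if pct > crit:
        return "CRITICAL"
    if pct > warn:
        return "WARNING"
    return "OK"

def render(usage: dict, warn: float, crit: float, show_status: str = "not_ok") -> str:
    """Build the per-volume list; append the status only when requested."""
    parts = []
    for name, pct in usage.items():
        text = f"{name}: {pct:.1f}%"
        st = status(pct, warn, crit)
        if show_status == "not_ok" and st != "OK":
            text += f" ({st})"
        parts.append(text)
    return ", ".join(parts)

usage = {"vol0": 91.0, "nfs": 82.0, "vol1": 1.4}
print(render(usage, 70, 90))                      # statuses shown for non-OK volumes
print(render(usage, 70, 90, show_status="none"))  # plain percentages only
```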
Monitor all volumes whose name starts with 'abc' but does not end in 'bak'.
Examples:
'abc' => included
'xabc' => excluded
'xxx' => excluded
'abcdef' => included
'abcbakax' => included
'abc.bak' => excluded
'abcd.bak' => excluded
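The table above can be reproduced by combining an include regex with an exclude regex. The following Python sketch assumes the patterns `^abc` and `bak$`, which are illustrative guesses rather than the patterns of the original command, and it matches bare volume names without the node or SVM prefix the plugin adds.

```python
import re

INCLUDE = re.compile(r"^abc")   # name must start with 'abc' (assumed pattern)
EXCLUDE = re.compile(r"bak$")   # name must not end in 'bak' (assumed pattern)

def is_monitored(name: str) -> bool:
    """A volume is monitored if it matches INCLUDE and does not match EXCLUDE."""
    return bool(INCLUDE.search(name)) and not EXCLUDE.search(name)

for name in ["abc", "xabc", "xxx", "abcdef", "abcbakax", "abc.bak", "abcd.bak"]:
    verdict = "included" if is_monitored(name) else "excluded"
    print(f"'{name}' => {verdict}")
```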
Monitor vol00, vol01, ... vol39 and vol43, vol44, ... vol99.
In other words: exclude 'vol40', 'vol41' and 'vol42'
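A character class is one way to express such a numeric range in a regex. The Python sketch below assumes the exclude pattern `^vol4[0-2]$`, which is an illustration and not necessarily the pattern of the original command; the volume names are generated for the example and carry no node or SVM prefix.

```python
import re

# Assumed exclude pattern: 'vol4' followed by exactly one of 0, 1 or 2.
EXCLUDE = re.compile(r"^vol4[0-2]$")

volumes = [f"vol{i:02d}" for i in range(100)]   # vol00 ... vol99
monitored = [v for v in volumes if not EXCLUDE.search(v)]
skipped = [v for v in volumes if EXCLUDE.search(v)]

print(skipped)          # ['vol40', 'vol41', 'vol42']
print(len(monitored))   # 97
```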
Monitor all volumes from finance and sales, but exclude the backups. This example assumes a naming convention where the volume's name starts with the department's name and backups have the string 'bak' at the end.
Monitor all volumes of project_a - any volume, whose name contains the string 'project_a'.
Monitor the project_a-volume (singular!)
Monitor all volumes, but not if their name contains 'project_a'
Checks for free space instead of used space. Warn if less than 1.8 TB are free, critical if less than 1 TB.
Checks for free space instead of used space. Warn if less than 100 MiB are free. (1 MiB is 1024*1024 Bytes whereas 1 MB would be 1000*1000 Bytes)
Same as above (checks for free space), but with different thresholds. Warn if less than 1700 MiB are free - strange threshold here, but we just want to see that it works.
Check only "normal" volumes (without a LUN) and warn if they are more than 93% full.
Check only volumes which have a LUN on them and warn if they are more than 99% full.
Check only small volumes (up to 500 GiB). This allows for different thresholds depending on the volume's total size.
Check only medium volumes (between 0.5 and 10 TiB).
Check only large volumes (over 10 TiB).
Check all aggregates but alarm only if they are overcommitted. In other words: aggregates with an overcommitment below 100% are always OK, even if their usage is over the --warning or --critical thresholds.