Documentation

Usage-Check

Checks used space on aggregates or volumes.

Usage

$ check_netapp_pro.pl Usage -H <host> -o volume|aggregate [...] [--help]

Description

This plugin checks used space on either aggregates or volumes (depending on --object|-o aggregate|volume). Specific volumes or aggregates can be included or excluded by using --include|-I or --exclude|-X with a regex.

The -I and -X parameters can also be used to include or exclude specific nodes or SVMs. The plugin automatically prefixes the node or SVM name to the instance name, so this method does not depend on a naming convention.
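To illustrate why the prefix makes filtering independent of naming conventions, here is a minimal Python sketch. The `select` helper is hypothetical and not the plugin's code. The `<node-or-SVM>.<instance>` naming is taken from the perf-data labels in the examples further down. The rule "matches any include pattern and no exclude pattern" is an assumption.

```python
import re

def select(names, include=(), exclude=()):
    """Keep names that match any include pattern (or all names if no include
    pattern is given) and that match no exclude pattern.
    Hypothetical helper for illustration, not the plugin's implementation."""
    kept = []
    for name in names:
        included = not include or any(re.search(p, name) for p in include)
        excluded = any(re.search(p, name) for p in exclude)
        if included and not excluded:
            kept.append(name)
    return kept

# Assumed naming: "<node-or-SVM>.<instance>"
instances = ["srv1.vol1", "srv1.vol11", "srv1.vol12", "srv2.vol2"]

only_srv1 = select(instances, include=[r"^srv1\."])
without_srv1 = select(instances, exclude=[r"^srv1\."])
print(only_srv1)
print(without_srv1)
```

Because the pattern is anchored on the prefix, it selects everything on srv1 regardless of how the volumes themselves are named.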

Depending on the switch --metric=relative|absolute, either percentages or absolute values (bytes) are checked. Combined with --prefix=ki|Mi|Gi|... and relative thresholds like --warning==MAX/100*70, this allows quite sophisticated results, which some trending tools require. You'll find more information in the descriptions of these switches and in the examples below.

Offline or restricted volumes cannot, by definition, be checked and will be skipped.

Example output if offline or restricted volumes exist:

NETAPP OK - 30 volumes checked. [...]
3 volumes skipped
[...]

Performance-data will be printed for all checked volumes/aggregates.

Consider using -v for inspection, if you are not satisfied with the result.

What do we check here?

This plugin does its checks based on the following values:

Usage in Bytes/Percent:

size-total
size-used
percentage-size-used (vol)
percent-used-capacity (aggr)
state (to detect offline-volumes)

Inodes:

files-total
files-used

Free Space instead of used space

Checking free space instead of used space can be useful when aggregates are extended and you want to avoid adjusting absolute thresholds. This is possible with the variable threshold MAX: use MAX to get the maximum size of the aggregate and subtract the amount you want to keep free. There are two pitfalls. First, the calculation is done in bytes, so you may want to use parentheses to document your thresholds clearly (which requires escaping them from the shell). Second, you have to account for the SI factor yourself. See the chapter Advanced Examples below for more details.
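The arithmetic behind a MAX-based free-space threshold can be sketched in Python as follows. The helper name and the 50 TB aggregate size are hypothetical. The sketch only mirrors the subtraction described above, not the plugin's own evaluation of the threshold expression.

```python
TB = 1000**4    # decimal terabyte in bytes
TiB = 1024**4   # binary tebibyte in bytes

def used_threshold_for_free(max_bytes, min_free_bytes):
    """Used-space threshold in bytes that is exceeded once less than
    min_free_bytes remain free (the MAX-minus-reserve idea from the text)."""
    return max_bytes - min_free_bytes

max_bytes = 50 * TB  # hypothetical aggregate size
warn_before = used_threshold_for_free(max_bytes, 2 * TB)

# After extending the aggregate by 10 TB the threshold follows automatically.
warn_after_extension = used_threshold_for_free(max_bytes + 10 * TB, 2 * TB)

# Writing the reserve with the wrong factor shifts the threshold noticeably.
warn_tib = used_threshold_for_free(max_bytes, 2 * TiB)

print(warn_before, warn_after_extension, warn_before - warn_tib)
```

The last value shows how far the threshold moves when a 2 TB reserve is accidentally written as 2 TiB, which is why the SI factor needs attention.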

Exploring your filer's performance-data

For a comprehensive definition, including the current values from your filer, you can run

$ get_netapp_*.pl -H <your filer> --explore --descriptive

An example output would be:

Existing data for object 'volume'
Node: vfiler0
Instance: lun1 (272914ba-0198-11e3-be4e-123478563412)
...
------------------------------------------------------------
files-total = 566
Total user-visible file (inode) count, i.e., current maximum
number of user-visible files (inodes) that this volume can
currently hold.
------------------------------------------------------------
files-used = 103
Number of user-visible files (inodes) used. This field is
valid only when the volume is online.
------------------------------------------------------------
...
------------------------------------------------------------
percentage-size-used = 80
Percentage of the volume size that is used. If the volume
is restricted or offline, a value of 0 is returned.
------------------------------------------------------------
size-available = 3981312
Number of bytes still available in the volume. If the volume
is restricted or offline, a value of 0 is returned.
------------------------------------------------------------
size-total = 19922944
Total usable size (in bytes) of the volume, not including
WAFL reserve or volume snapshot reserve. This field is valid
only when the volume is online.
------------------------------------------------------------
size-used = 15941632
The size (in bytes) that is used in the volume. This field
is valid only when the volume is online.
------------------------------------------------------------
...

If you do not see the description, you have to collect the data again without the --fast switch.
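The values in the example output above are consistent with each other, which a short Python check can show. The variable names are illustrative. How the filer rounds the percentage is not stated in the output; rounding and truncating both give 80 for these numbers.

```python
# Values copied from the example output above
size_total = 19922944
size_used = 15941632
size_available = 3981312
files_total = 566
files_used = 103

percent_used = size_used / size_total * 100
inode_percent = files_used / files_total * 100

print(round(percent_used))                       # compare with percentage-size-used
print(size_total - size_used == size_available)  # available = total - used
print(f"{inode_percent:.1f}% of inodes used")
```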

Dependencies

This check reads its data from one or more local stores. The collector-scripts (also known as "getters") that build these stores must already have been run for the volume and aggregate objects - otherwise this check will not find any data. See the documentation and cfg-example-files which came with this script.

If you use the --check_only switch with one of the lun-parameters (with_lun / without_lun), this check also depends on the collector for the lun-object.

Simple Examples

[S1] ./check_netapp_pro.pl Usage -H filer -o volume --metric=relative -w 70 -c 90

Checks all volumes using relative thresholds of 70% and 90%. Warns if more than 70% of a volume's space is used and sends a critical alert if more than 90% is used.

NETAPP OK - 4 volumes checked. Max. used volume: vol0 (23.0%)
vol0: 23.0%, nfs: 1.0%, vol1: 0.0%, vfiler1: 0.0%, | vol0=23%;70;90;0;100 vol1=0%;70;90;0;100 nfs=1%;70;90;0;100 vfiler1=0%;70;90;0;100

[S2] ./check_netapp_pro.pl Usage -H filer -o volume

Checks all volumes using the default settings. This means retrieving absolute values but setting relative thresholds. That combination may seem weird, but it is useful if you need absolute values for trending while still checking all of the filer's volumes at the same time. Absolute thresholds would not be helpful for an overall check if the volumes differ in size. The combination of absolute values and relative thresholds can lead to surprising results, so do not use it unless you know that you really need it for your trending-system.

NETAPP_PRO USAGE CRITICAL - 5 volumes checked, 1 critical and 0 warning
vol0: 0.7GiB (CRITICAL)
vol12: 0.0GiB
vol11: 0.0GiB
vol1: 0.0GiB
vol2: 0.0GiB
 | sim91-01.vol0=770220032B;562920243.2;723754598.4;0;804171776 srv1.vol1=536576B;13946060.8;17930649.6;0;19922944 srv1.vol11=577536B;13946060.8;17930649.6;0;19922944 srv1.vol12=5709824B;13946060.8;17930649.6;0;19922944 srv2.vol2=466944B;13946060.8;17930649.6;0;19922944
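The warning and critical values in the perf-data above appear to be 70% and 90% of each volume's maximum. That reading is inferred from the sample output, not from a documented default. The following Python sketch recomputes them; `relative_threshold` is a hypothetical helper.

```python
def relative_threshold(max_bytes, percent):
    """Absolute threshold in bytes for a percentage of the maximum size."""
    return max_bytes * percent / 100

# (label, maximum in bytes) taken from the perf-data above
volumes = [("sim91-01.vol0", 804171776), ("srv1.vol1", 19922944)]

for label, max_bytes in volumes:
    warn = relative_threshold(max_bytes, 70)
    crit = relative_threshold(max_bytes, 90)
    print(f"{label}: warn={warn:.1f} crit={crit:.1f}")
```

Since every volume gets its own byte thresholds, the perf-data carries absolute values while the alerting still scales with the volume size.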

[S3] ./check_netapp_pro.pl Usage -H filer -o aggregate --metric=absolute --factor=Gi -w 500 -c 800

Checks all aggregates using absolute thresholds of 500 and 800 GiB.

NETAPP OK - 2 aggregates checked. Max. used aggregate: aggr1 (1.2GiB)
aggr1: 1.2GiB, aggr0: 0.8GiB, | aggr0=853942272B;536870912000;858993459200;0;896532480 aggr1=1317511168B;536870912000;858993459200;0;1793064960

[S4] ./check_netapp_pro.pl Usage -H filer -o aggregate --metric=inodes_relative -w 70 -c 90

Checks all aggregates using relative thresholds of 70% and 90%. Warns if more than 70% of the available inodes are in use and sends a critical alert if more than 90% of the inodes are occupied.

[S5] ./check_netapp_pro.pl Usage -H filer -o volume --top=3 -w 3 -c 4 --factor=MB

Monitor all volumes, and return just the top 3 sorted results.


NETAPP_PRO USAGE CRITICAL - 6 volumes checked, 1 critical and 0 warning 
vserv_a.vol1: 4.3MB (CRITICAL)
vserv_a.vol0: 2.5MB 
vserv_b.vol1: 1.7MB
[...] 
| vserv_a.vol0=2535347B;30000000;4000000;0; vserv_a.vol1=4345377B;30000000;4000000;0; vserv_a.vol2=45377B;30000000;4000000;0; vserv_a.vol3=85307B;30000000;4000000;0; vserv_b.vol0=5357B;30000000;4000000;0; vserv_b.vol1=1715397B;30000000;4000000;0;
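The relationship between the perf-data (all six volumes) and the three lines shown in the message can be reproduced with a short Python sketch. It assumes that --top sorts by used bytes in descending order, which matches the sample output but is not confirmed here, and that MB means 1000*1000 bytes.

```python
# Used bytes per volume, copied from the perf-data above
usage_bytes = {
    "vserv_a.vol0": 2535347,
    "vserv_a.vol1": 4345377,
    "vserv_a.vol2": 45377,
    "vserv_a.vol3": 85307,
    "vserv_b.vol0": 5357,
    "vserv_b.vol1": 1715397,
}

# Assumption: "top" means the largest consumers first
top3 = sorted(usage_bytes.items(), key=lambda item: item[1], reverse=True)[:3]

for label, used in top3:
    print(f"{label}: {used / 1_000_000:.1f}MB")
```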

Advanced Examples

Hint: The input for the advanced examples is shortened. Replace the '...' below with -H <your filer's IP or host name>

[A1] ./check_netapp_pro.pl Usage ... -o volume -I ^vol0$ --metric=inodes_absolute -w 5 -c 9 --factor=k

Check the inodes on volume vol0. Warn if more than 5000 inodes are in use and send a critical-alarm if more than 9000 are occupied.

[A2] ./check_netapp_pro.pl Usage ... -o volume -I ^vol0$ --metric=inodes_absolute -w 5 -c 9 --factor=ki

Check the inodes on volume vol0. Warn if more than 5120 inodes are in use and send a critical-alarm if more than 9216 are occupied. Hint: Note the difference between 'k' and 'ki': 5 * 1000 = 5000, whereas 5 * 1024 = 5120. More details about these SI multipliers are in the help for the --factor switch.
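The difference between decimal and binary factors in [A1] and [A2] comes down to the multiplier, as this Python sketch shows. The `FACTORS` table is an illustrative assumption; the spellings the plugin really accepts are listed in the help for the --factor switch.

```python
# Illustrative multiplier table (assumption, not the plugin's own list)
FACTORS = {
    "k": 1000,     "ki": 1024,
    "M": 1000**2,  "Mi": 1024**2,
    "G": 1000**3,  "Gi": 1024**3,
    "T": 1000**4,  "Ti": 1024**4,
}

def scale(value, factor):
    """Convert a threshold given with a factor into the base unit."""
    return value * FACTORS[factor]

print(scale(5, "k"), scale(9, "k"))     # thresholds of [A1]
print(scale(5, "ki"), scale(9, "ki"))   # thresholds of [A2]
print(scale(500, "Gi"))                 # warning threshold of [S3] in bytes
```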

[A3] ./check_netapp_pro.pl Usage ... -o aggregate --metric=relative -w 50 -c 98

Checks all aggregates. Returns a warning, if at least one aggregate is more than 50% used and a critical-alarm if more than 98% of space is used.

[A4] ./check_netapp_pro.pl Usage ... -o volume -I ^vol2$

Checks only vol2. Uses the default metric and thresholds.

[A5] ./check_netapp_pro.pl Usage ... -o volume -I vol2

Checks any volume containing the string vol2. E.g. vol2, vol20, vol22, ...

[A6] ./check_netapp_pro.pl Usage ... -o volume -X vol4

Checks all volumes, except vol4. Uses the default metric and thresholds.

[A7] ./check_netapp_pro.pl Usage ... -o volume -X ^vol4$

Same as above, but makes sure to exclude only vol4 and not vol40, vol4blabla, some_vol4, ...

[A8] ./check_netapp_pro.pl Usage ... -o volume -X vol3 -X vol4

Checks all volumes, except vol3 and vol4. Uses the default metric and thresholds.

[A9] ./check_netapp_pro.pl Usage ... -o volume -I ^vol2$ --metric=relative -w 50 -c 80

Returns a warning, if vol2 is more than 50% used.

[A10] ./check_netapp_pro.pl Usage ... -o volume -I ^vol2$ --metric=absolute -w 700 -c 850

Returns a warning if more than 700GiB are used on vol2. Perf-data is in bytes.

[A11] ./check_netapp_pro.pl Usage ... -o volume -I ^vol2$ -w 700 -c 850

Same as above, since metric is set to absolute by default.

[A12] ./check_netapp_pro.pl Usage ... -o volume -I ^vol2$ -w 5 -c 8 --factor=Ti

Same as above, but thresholds and output are in TiB.

[A13] ./check_netapp_pro.pl Usage ... -o volume -I ^vol2$ -w 5TiB -c 8TiB --factor=Ti --perf_data_factor=Ti

Same as above, but perfdata is in TiB.

[A14] ./check_netapp_pro.pl Usage ... -o volume --metric=absolute -w =MAX*70/100 -c =MAX*85/100

Returns absolute values in the message and in the perf-data. Thresholds are relative: Warning if greater than 70%, critical if greater than 85%.

[A15] ./check_netapp_pro.pl Usage ... -o volume --metric=relative -w 80 -c 90 --show_status=not_ok

Prints the status only after those volumes in the output that are not OK. E.g.

NETAPP WARNING - 3 volumes checked, 1 critical and 1 warning. Max. used volume: vol0 (91.0%)
vol0: 91.0% (CRITICAL), nfs: 82.0% (WARNING), vol1: 1.4%, | [...] .

Tip: not_ok is the default. To suppress even the CRITICAL and WARNING status, use none. With --show_status=none, the above example would (partly) change to:

vol0: 91.0%, nfs: 82.0%, vol1: 1.4%

[A16] ./check_netapp_pro.pl Usage ... -o volume --metric=relative -w 80 -c 90 --show_status=all

Prints the status after each volume listed in the output. E.g.

NETAPP WARNING - 3 volumes checked, 1 critical and 1 warning. Max. used volume: vol0 (91.0%)
vol0: 91.0% (CRITICAL), nfs: 82.0% (WARNING), vol1: 1.4% (OK), | [...]

[A17] ./check_netapp_pro.pl Usage ... -o volume -I ^abc -X bak$

Monitor all volumes whose name starts with 'abc' but does not end in 'bak'.

Examples:

'abc' => included
'xabc' => excluded
'xxx' => excluded
'abcdef' => included
'abcbakax' => included
'abc.bak' => excluded
'abcd.bak' => excluded
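The list above can be reproduced with the two patterns from [A17] in a few lines of Python. Python's `re` module stands in for the plugin's regex engine here, which is an assumption; for patterns this simple the behaviour is the same in Perl.

```python
import re

INCLUDE = r"^abc"   # -I ^abc
EXCLUDE = r"bak$"   # -X bak$

def is_checked(name):
    """True if the name matches the include pattern and not the exclude pattern."""
    return bool(re.search(INCLUDE, name)) and not re.search(EXCLUDE, name)

names = ["abc", "xabc", "xxx", "abcdef", "abcbakax", "abc.bak", "abcd.bak"]
result = {name: is_checked(name) for name in names}

for name, checked in result.items():
    print(f"'{name}' => {'included' if checked else 'excluded'}")
```

Note that 'abcbakax' stays included because 'bak' is not at the end of the name.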

Web tip: Regular Expressions to Filter Instances

[A18] ./check_netapp_pro.pl Usage ... -o volume -I '^vol\d\d' -X 'vol4[012]'

Monitor vol00, vol01, ... vol39 and vol43, vol44, ... vol99.

In other words: exclude 'vol40', 'vol41' and 'vol42'

[A19] ./check_netapp_pro.pl Usage ... -o volume -I ^finance.* -I ^sales.* -X bak$

Monitor all volumes from finance and sales, but exclude the backups. This example assumes a naming convention where the volume's name starts with the department's name and backups have the string 'bak' at the end.

[A20] ./check_netapp_pro.pl Usage ... -o volume -I project_a

Monitor all volumes of project_a - any volume, whose name contains the string 'project_a'.

[A21] ./check_netapp_pro.pl Usage ... -o volume -I ^project_a$

Monitor the project_a-volume (singular!)

[A22] ./check_netapp_pro.pl Usage ... -o volume -X project_a

Monitor all volumes, but not if their name contains 'project_a'

[A23] ./check_netapp_pro.pl Usage ... -o volume -I vol_cifs_law --factor=G -w =MAX-\(1800*1000*1000*1000\) -c =MAX-\(1000*1000*1000*1000\)

Checks for free space instead of used space. Warn if less than 1.8 TB are free, critical if less than 1 TB.

NETAPP_PRO USAGE WARNING - 2 volumes checked, 0 critical and 2 warning
vol_cifs_law: 48316.4GB (WARNING)
vol_cifs_law_mirror: 48300.0GB (WARNING)
 | vol_cifs_law=48316383297536B;48271759532032;49071759532032;0;50071759532032 vol_cifs_law_mirror=48299960770560B;48271759532032;49071759532032;0;50071759532032

[A24] ./check_netapp_pro.pl Usage ... -o aggregate -I ^aggr1$ --metric=absolute --factor=Mi -w =MAX-\(100*1024*1024\) -c =MAX-\(50*1024*1024\)

Checks for free space instead of used space. Warn if less than 100 MiB are free. (1 MiB is 1024*1024 Bytes whereas 1 MB would be 1000*1000 Bytes)

NETAPP OK - used space on aggregate aggr1: 105.6 MiB (max: 1800.0 MiB)
 | aggr1=110714880B;1782579200;1835008000;0;1887436800

[A25] ./check_netapp_pro.pl Usage ... -o aggregate -I ^aggr1$ --metric=absolute --factor=Mi -w =MAX-\(1700*1024*1024\) -c =MAX-\(1650*1024*1024\)

Same as above (checks for free space), but with different thresholds. Warn if less than 1700 MiB are free - strange threshold here, but we just want to see that it works.

NETAPP WARNING - used space on aggregate aggr1: 105.6 MiB (max: 1800.0 MiB)
 | aggr1=110714880B;104857600;157286400;0;1887436800
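Why the same usage is OK in [A24] but WARNING in [A25] can be traced with the numbers from the perf-data. The Python sketch below assumes that a status is raised when the used bytes are strictly greater than the threshold; `classify` is a hypothetical helper, not the plugin's code.

```python
MiB = 1024 * 1024

def classify(used, warn, crit):
    """Assumed rule: alert when used space exceeds the threshold."""
    if used > crit:
        return "CRITICAL"
    if used > warn:
        return "WARNING"
    return "OK"

max_bytes = 1887436800   # MAX of aggr1 (1800 MiB), from the perf-data
used = 110714880         # used bytes of aggr1, from the perf-data

# [A24]: keep at least 100 MiB (warning) or 50 MiB (critical) free
a24 = classify(used, max_bytes - 100 * MiB, max_bytes - 50 * MiB)

# [A25]: keep at least 1700 MiB (warning) or 1650 MiB (critical) free
a25 = classify(used, max_bytes - 1700 * MiB, max_bytes - 1650 * MiB)

print(a24, a25)
```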

[A26] ./check_netapp_pro.pl Usage ... -o volume --check_only=without_lun -w 93 -c 97 --metric=relative

Check only "normal" volumes (without LUN) and warn if they are more than 93% full.

[A27] ./check_netapp_pro.pl Usage ... -o volume --check_only=with_lun -w 99 -c 100 --metric=relative

Check only volumes that contain a LUN and warn if they are more than 99% full.

[A28] ./check_netapp_pro.pl Usage ... -o volume --check_only=..500GiB -w 80 -c 90 --metric=relative

Check only small volumes (up to 500GiB). This allows for different thresholds depending on the volume's total size.

[A29] ./check_netapp_pro.pl Usage ... -o volume --check_only=500GiB..10TiB -w 90 -c 95 --metric=relative

Check only medium volumes (between 500GiB and 10TiB).

[A30] ./check_netapp_pro.pl Usage ... -o volume --check_only=10TiB.. -w 95 -c 98 --metric=relative

Check only large volumes (over 10TiB).
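How the three size ranges from [A28] to [A30] partition a set of volumes can be sketched in Python. The parsing and the inclusive boundaries are assumptions for illustration only. How the plugin treats a volume that is exactly 500GiB or 10TiB in size is not stated here.

```python
import re

UNITS = {"GiB": 1024**3, "TiB": 1024**4}

def parse_size(text):
    """Parse a size such as '500GiB' into bytes (only GiB/TiB in this sketch)."""
    match = re.fullmatch(r"(\d+)(GiB|TiB)", text)
    if not match:
        raise ValueError(f"unsupported size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]

def parse_range(spec):
    """Parse 'LOW..HIGH', '..HIGH' or 'LOW..' into a (low, high) pair of bytes."""
    low, high = spec.split("..")
    return (parse_size(low) if low else 0,
            parse_size(high) if high else float("inf"))

def in_range(size_bytes, spec):
    low, high = parse_range(spec)
    return low <= size_bytes <= high   # inclusive bounds are an assumption

RANGES = {"small": "..500GiB", "medium": "500GiB..10TiB", "large": "10TiB.."}

# Hypothetical volume sizes
sizes = {"vol_a": 100 * 1024**3, "vol_b": 2 * 1024**4, "vol_c": 20 * 1024**4}
for name, size in sizes.items():
    matching = [label for label, spec in RANGES.items() if in_range(size, spec)]
    print(name, matching)
```

With three such service checks, each size class gets its own pair of thresholds.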

[A31] ./check_netapp_pro.pl Usage ... -o aggregate --aggr_over=100 -w 95 -c 98 --metric=relative

Check all aggregates but alarm only if they are overcommitted. In other words: Aggregates with an overcommitment below 100% are always ok, even if their usage is over the --warning or --critical thresholds.
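The gating described above can be expressed as a small decision function. This Python sketch is an assumption-laden illustration: the function name and the way the overcommitment percentage is obtained are hypothetical, and the strict "greater than" comparison for the usage thresholds is assumed.

```python
def aggregate_status(used_percent, overcommit_percent, warn, crit, aggr_over):
    """Usage thresholds only apply once the overcommitment reaches aggr_over."""
    if overcommit_percent < aggr_over:
        return "OK"            # not overcommitted enough: always ok
    if used_percent > crit:
        return "CRITICAL"
    if used_percent > warn:
        return "WARNING"
    return "OK"

# Values as in [A31]: --aggr_over=100 -w 95 -c 98
print(aggregate_status(99, 80, warn=95, crit=98, aggr_over=100))
print(aggregate_status(99, 120, warn=95, crit=98, aggr_over=100))
print(aggregate_status(96, 100, warn=95, crit=98, aggr_over=100))
```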