check_netapp_ems (Event Management System log)

Checks the rate of specific events in the Event Management System log.

Usage

$ check_netapp_ems event-rate -H <host> [...] [--help] 

Description

This plugin reads the event log and counts the number of events within a given lookbehind-period.

A typical usage scenario is counting the number of autosize events within the last few hours. A high rate of such events can indicate that volumes are becoming too small.

Important Parameters

--name Name of the EMS events whose rate should be calculated. If omitted, all events are counted. A string prefixed with a tilde (~string) is treated as a regular expression. See the examples below.

--lookbehind Time period for calculating the rate of matching EMS events. Must be a positive integer followed by a time unit: s(econd), min(ute), h(our), d(ay), w(eek). Defaults to 1h.

--warning / --critical: Thresholds for the rate.

For all other parameters, consult --help on the command line.
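
The --lookbehind format described above can be illustrated with a small Python sketch. This is not the plugin's own code: the function name parse_lookbehind and the table UNIT_SECONDS are hypothetical, and only the unit names and the 1h default are taken from the description.

```python
import re

# Seconds per unit, following the units listed for --lookbehind.
# Illustrative only; the plugin's internal parsing may differ.
UNIT_SECONDS = {"s": 1, "min": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_lookbehind(value: str = "1h") -> int:
    """Convert a string such as '90min' into a number of seconds."""
    match = re.fullmatch(r"(\d+)(s|min|h|d|w)", value)
    # The period must be a positive integer followed by a known unit.
    if match is None or int(match.group(1)) == 0:
        raise ValueError(f"invalid lookbehind period: {value!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]
```

With this sketch, parse_lookbehind("1d") yields 86400 seconds, and a value such as "0h" or "1x" is rejected.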

Calculation

The lookbehind period is measured backwards from the latest matching event. All matching events within this period are counted, and the count is divided by the length of the period in seconds. This rate is then converted to the unit selected with --rate (for example per_hour or per_day) and finally displayed as events per time unit in the check's output.
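
The calculation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the plugin's implementation: the function event_rate and the table RATE_SECONDS are hypothetical, the key per_minute is assumed from the default output shown in the examples, and whether an event exactly on the window boundary is counted is an assumption as well.

```python
# Seconds per --rate unit. per_hour and per_day appear in the examples;
# per_minute is an assumed name for the default unit.
RATE_SECONDS = {"per_minute": 60, "per_hour": 3600, "per_day": 86400}


def event_rate(timestamps, lookbehind_seconds=3600, rate="per_minute"):
    """Return the rate of events, given their Unix timestamps (seconds)."""
    if not timestamps:
        return 0.0
    # The window is anchored at the latest matching event.
    latest = max(timestamps)
    window_start = latest - lookbehind_seconds
    # Count events inside the window (boundary included by assumption).
    count = sum(1 for t in timestamps if t >= window_start)
    # Events per second, then scaled to the requested unit.
    per_second = count / lookbehind_seconds
    return per_second * RATE_SECONDS[rate]
```

For example, two events inside a one-hour window give 2 / 3600 events per second, which corresponds to 48 events per day.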

Examples

Simple Examples

$ ./check_netapp_ems event-rate -H sim96
Rate of EMS events during the last hour: 4.45/minute
...

A first, probably not very useful example: it simply calculates the number of events (any event!) per minute within the last hour.


$ ./check_netapp_ems event-rate -H filer --name=wafl.vol.autoSize.done
Rate of wafl.vol.autoSize.done EMS events during the last hour: 0.01/minute
...

Monitors the number of wafl.vol.autoSize.done events.


$ ./check_netapp_ems event-rate -H filer --name=wafl.vol.autoSize.done --rate=per_day
Rate of wafl.vol.autoSize.done EMS events during the last hour: 14.40/day
...

Same as above, but displays the rate as the number of autosize events per day.

The calculation is still based on the last hour (the default value for --lookbehind). See the next example for how to change that.


$ ./check_netapp_ems event-rate -H filer --name=wafl.vol.autoSize.done --rate=per_day --lookbehind=1d
Rate of wafl.vol.autoSize.done EMS events during the last 24 hours: 13.82/day
...

Advanced Examples

Matching a Name (regex)

A regular expression (regex) makes it possible to monitor events that are similar but not identical. For example, to monitor any raid event:

$ check_netapp_ems event-rate -H sim96 --name="~^raid\." --rate=per_hour
Rate of ~^raid\. EMS events during the last hour: 157.27/hour

Extending the pattern to raid.rg.media_scrub narrows the count to media-scrub events only:

$ check_netapp_ems event-rate -H sim96 --max-age=0 --name="~^raid\.rg\.media_scrub" --rate=per_hour
Rate of ~^raid\.rg\.media_scrub EMS events during the last hour: 127.67/hour

Setting --name to a plain string (no tilde in front) counts only events whose name is exactly equal to that string:

$ check_netapp_ems event-rate -H sim96 --name=raid.rg.media_scrub.done --rate=per_hour
Rate of raid.rg.media_scrub.done EMS events during the last hour: 37.44/hour

Putting the ^ sign in front of the expression anchors it to the beginning of the text and ensures that only events whose name starts with raid are counted. Omitting the ^ would make the regex also match a name like sometext.raider.something, which is probably not what you intended.
Also remember to escape characters that have a special meaning in a regex, such as the dot (which would otherwise match any character). On the command line in particular, you should also quote the whole regex, as seen in the examples above.
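
The matching rules for --name can be illustrated with a short Python sketch. It is an approximation for explanation only: the helper name_matches is hypothetical, and Python's re module is assumed to behave closely enough to the plugin's regex engine for these simple patterns.

```python
import re


def name_matches(pattern, event_name):
    """Mimic the documented --name rules (illustrative only)."""
    if pattern is None:
        # No --name given: every event is counted.
        return True
    if pattern.startswith("~"):
        # Tilde prefix: the rest is a regex that may match anywhere
        # in the name unless it is anchored with ^.
        return re.search(pattern[1:], event_name) is not None
    # Plain string: the name must be exactly equal.
    return pattern == event_name


# Anchored and escaped: matches only names starting with "raid."
print(name_matches(r"~^raid\.", "raid.rg.media_scrub.done"))
# Unanchored: also matches an unrelated name containing "raid"
print(name_matches("~raid", "sometext.raider.something"))
# Plain string: a longer event name is not an exact match
print(name_matches("raid.rg.media_scrub", "raid.rg.media_scrub.done"))
```

The three calls print True, True and False, which mirrors the anchoring and exact-match behaviour described above.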