Checks the rate of specific events in the Event Management System log.
$ check_netapp_ems event-rate -H <host> [...] [--help]
This plugin reads the event log and counts the number of events within a given lookbehind-period.
A typical usage scenario is counting the number of autosize events within the last few hours. A high rate of such events can be a sign that volumes are getting too small.
--name
Name of the EMS events whose rate should be calculated. If omitted, all events are counted. A string prefixed with a tilde (~string) is treated as a regular expression. See the examples below.
--lookbehind
Time period for calculating the rate of matching EMS events. Must be a positive integer followed by a time unit: s(econd), min(ute), h(our), d(ay), w(eek). Defaults to 1h.
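The --lookbehind format described above can be illustrated with a minimal Python sketch. It is not the plugin's actual code; the names parse_lookbehind and UNIT_SECONDS are hypothetical and only show how such a value could be converted to seconds.

```python
import re

# Seconds per time unit, following the units listed for --lookbehind.
# (Hypothetical table; the plugin's internal representation is not documented here.)
UNIT_SECONDS = {"s": 1, "min": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_lookbehind(value: str) -> int:
    """Convert a value like '1h' or '30min' to a number of seconds."""
    match = re.fullmatch(r"(\d+)(s|min|h|d|w)", value)
    if match is None:
        raise ValueError(f"invalid lookbehind period: {value!r}")
    count = int(match.group(1))
    if count <= 0:
        # The documentation requires a positive integer.
        raise ValueError(f"lookbehind period must be positive: {value!r}")
    return count * UNIT_SECONDS[match.group(2)]


if __name__ == "__main__":
    for example in ("1h", "30min", "1d"):
        print(example, "->", parse_lookbehind(example), "seconds")
```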
--rate
Rate used for presenting the result in the message and for the thresholds. Can be per_second, per_minute, per_hour, per_day or per_week.
--warning / --critical
Thresholds for the rate. Each threshold is written as a plain number without any unit. The unit of the thresholds is taken from the --rate parameter.
Examples:
--rate=per_second --warning=3
→ warns if more than 3 events per second
--rate=per_week --warning=3
→ warns if more than 3 events per week
This has changed since v1.1.0 of the plugins! Please check existing configurations from older versions.
For all other parameters consult --help on the command line.
The lookbehind period starts from the latest matching event. All matching events within this period are counted, and the count is divided by the length of the period in seconds. This rate is then converted according to --rate and finally displayed as events per time unit in the check's output.
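The calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the plugin's implementation: the names event_rate and RATE_SECONDS are hypothetical, and the count of 267 events is an invented figure chosen because it yields a rate of 4.45 per minute over one hour.

```python
# Seconds per rate unit, following the values accepted by --rate.
# (Hypothetical table for illustration.)
RATE_SECONDS = {
    "per_second": 1,
    "per_minute": 60,
    "per_hour": 3600,
    "per_day": 86400,
    "per_week": 604800,
}


def event_rate(event_count: int, lookbehind_seconds: int, rate: str) -> float:
    """Divide the event count by the period length, then scale to the rate unit."""
    events_per_second = event_count / lookbehind_seconds
    return events_per_second * RATE_SECONDS[rate]


if __name__ == "__main__":
    # Assumed input: 267 matching events within a lookbehind period of one hour.
    print(f"{event_rate(267, 3600, 'per_minute'):.2f}/minute")
    print(f"{event_rate(267, 3600, 'per_day'):.2f}/day")
```

Note that changing the rate unit only rescales the result; the period over which events are counted is still determined by --lookbehind.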
./check_netapp_ems event-rate -H sim96
Rate of EMS events during the last hour: 4.45/minute
...
A first, probably not very useful example. It just calculates the number of events (any event!) per minute within the last hour.
$ ./check_netapp_ems event-rate -H filer --name=wafl.vol.autoSize.done
Rate of wafl.vol.autoSize.done EMS events during the last hour: 0.01/minute
...
Monitors the number of wafl.vol.autoSize.done events.
$ ./check_netapp_ems event-rate -H filer --name=wafl.vol.autoSize.done --rate=per_day
Rate of wafl.vol.autoSize.done EMS events during the last hour: 14.40/day
...
Same as above, but displays the rate as the number of autosize events per day.
The calculation is still based on the last hour (the default value for --lookbehind). See the next example for how to change that.
$ ./check_netapp_ems event-rate -H filer --name=wafl.vol.autoSize.done --rate=per_day --lookbehind=1d
Rate of wafl.vol.autoSize.done EMS events during the last 24 hours: 13.82/day
...
Using a regular expression (regex) allows monitoring events that are similar but not identical. For example, to monitor any raid event:
$ check_netapp_ems event-rate -H sim96 --name="~^raid\." --rate=per_hour
Rate of ~^raid\. EMS events during the last hour: 157.27/hour
Narrowing the expression to raid.rg.media_scrub reduces that to counting media-scrub events only:
$ check_netapp_ems event-rate -H sim96 --name="~^raid\.rg\.media_scrub" --rate=per_hour
Rate of ~^raid\.rg\.media_scrub EMS events during the last hour: 127.67/hour
And setting --name to a plain string (no tilde in front) will count only events whose name matches it exactly:
$ check_netapp_ems event-rate -H sim96 --name=raid.rg.media_scrub.done --rate=per_hour
Rate of raid.rg.media_scrub.done EMS events during the last hour: 37.44/hour
Using the ^ sign in front of the expression anchors it to the beginning of the text and ensures that only events whose name starts with raid are counted. Omitting the ^ would make the regex also match a name like sometext.raider.something, which is probably not what you intended.
Also do not forget to escape characters that have a special meaning in regular expressions, such as the dot (unescaped, it matches any character). On the command line in particular, you should also quote the whole regex, as seen in the examples above.
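The matching rules for --name can be tried out with a small Python sketch. It only mirrors the behaviour described in this section (tilde prefix means regex, anything else means exact match); the function name_matches is hypothetical, and the plugin's own regex engine may differ from Python's re module in details.

```python
import re
from typing import Optional


def name_matches(name_option: Optional[str], event_name: str) -> bool:
    """Mimic the documented --name rules (illustrative only)."""
    if name_option is None:
        # No --name given: all events are counted.
        return True
    if name_option.startswith("~"):
        # Tilde prefix: the rest is a regex, matched anywhere unless anchored.
        return re.search(name_option[1:], event_name) is not None
    # Plain string: the event name must be exactly equal.
    return name_option == event_name


if __name__ == "__main__":
    checks = [
        (r"~^raid\.", "raid.rg.media_scrub.done"),
        (r"~^raid\.", "sometext.raider.something"),
        ("~raid", "sometext.raider.something"),   # missing ^ anchor
        ("~^raid.", "raidX"),                     # unescaped dot
        ("raid.rg.media_scrub", "raid.rg.media_scrub.done"),
    ]
    for option, event in checks:
        print(f"{option!r} vs {event!r}: {name_matches(option, event)}")
```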