Advanced Configurations

Collectors for Performance-Checks

The getters for performance-checks are a bit special: they have to collect some history before the checks can use the collected data (a check that does not find enough history will tell you so and exit with UNKNOWN). In addition, the getter's check_interval must be configured to match the check's --delta.

check_interval is a directive in the monitoring daemon's configuration, whereas --delta is a parameter of the check.

E.g. if PerfVolume is configured to check over a 5-minute delta with a tolerance of 3 minutes (--delta=300 --tolerance=180), the performance-getter for the volume-object must run every 5 minutes (check_interval 5).

The check's parameters --delta and --tolerance are command-line arguments for the check-script and mostly appear as $ARGn$ in a commands.cfg, whereas the interval of the getter is configured with the monitoring system's check_interval directive, typically in a services.cfg. Unless you have changed the interval_length directive from its default value of 60, the number after check_interval is in minutes.
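
The relationship between --delta (seconds) and check_interval (units of interval_length seconds) can be sketched as a small calculation. This is only an illustration: the function name check_interval_for_delta is hypothetical and is not part of the plugin or of the monitoring system.

```python
def check_interval_for_delta(delta_seconds: int, interval_length: int = 60) -> int:
    """Return the check_interval value that matches a check's --delta.

    delta_seconds   -- the value passed to the check as --delta (seconds)
    interval_length -- the monitoring system's interval_length (default 60)

    Hypothetical helper for illustration only.
    """
    if delta_seconds <= 0 or interval_length <= 0:
        raise ValueError("delta and interval_length must be positive")

    intervals, remainder = divmod(delta_seconds, interval_length)
    if remainder:
        # check_interval is counted in whole units of interval_length here,
        # so a delta that is not a multiple cannot be matched exactly.
        raise ValueError(
            f"--delta={delta_seconds} is not a multiple of "
            f"interval_length={interval_length}"
        )
    return intervals


if __name__ == "__main__":
    # --delta=300 with the default interval_length of 60
    print(check_interval_for_delta(300))
```

With --delta=300 and the default interval_length of 60 this yields 5, which matches the check_interval 5 from the example above.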

Distributed Monitoring

If you are using distributed monitoring (e.g. op5 configured as a peer cluster), you will face the challenge of keeping the store files in sync. This can be done either by keeping them on a common network-share or by implementing some rsync logic. Please check our blog for updates on this topic.

Configuring History and Overcommitment Checks (UsageTrend, OvercommitAggr)

These checks require a longer short-term-memory (history) so that they can extrapolate historical trends into the future. Whenever you use one of these checks, do not forget to set an appropriate value for the short-term-memory in the corresponding getter.

E.g. for --lookbehind=1d in UsageTrend, the volume- or aggregate-getter needs an equally long short-term-memory (--stm=1d or, even better, 25h).
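
A quick sanity check for this rule could look like the following sketch. It assumes that durations are written as a whole number followed by one of the suffixes s, m, h or d; only the d and h forms appear in this document, the rest is an assumption. The function names to_seconds and stm_covers_lookbehind are hypothetical and not part of the plugin.

```python
import re

# Assumed duration suffixes; only "d" and "h" are confirmed by the text above.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(duration: str) -> int:
    """Convert a duration such as '1d' or '25h' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", duration.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def stm_covers_lookbehind(stm: str, lookbehind: str) -> bool:
    """Return True if the getter's --stm is at least as long as --lookbehind."""
    return to_seconds(stm) >= to_seconds(lookbehind)


if __name__ == "__main__":
    for stm in ("12h", "1d", "25h"):
        print(stm, stm_covers_lookbehind(stm, "1d"))
```

In this sketch --stm=12h fails against --lookbehind=1d, while --stm=1d and --stm=25h both pass; 25h presumably leaves an hour of margin on top of the required day.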