Right now, provisioning new monitoring alert checks is unnecessarily burdensome. Moreover, it's currently very inconsistent between local & remote checks (master & NRPE), making it confusing and hard to switch from one type to the other.
We currently have:
- monitoring::service, thin wrapper over the nagios_service native puppet resource, to be consumed by naggen on the Icinga master
- nrpe::check, creates a /etc/nagios/nrpe.d config file snippet, to be included in the target host. The plugin is *not* provisioned by the definition; one has to explicitly place it under /usr/local/lib/nagios/plugins and potentially write a sudo definition as well.
- nrpe::monitor_service, thin wrapper over monitoring::service with nrpe_check + nrpe::check
- nagios_common::check_command, copies plugins to /usr/lib/nagios/plugins/$title and command definitions to /etc/icinga/commands/$title.cfg, but with many hardcoded assumptions that make it difficult to use outside of nagios_common::commands and the monitoring "masters" (icinga & shinken).
- manually setting up checkcommands by writing Nagios config text to modules/nagios_common/files/check_commands/$title.cfg instead of having a native puppet definition.
Deploying a new check from the monitoring master to a target host needs:
- Copying the check to modules/nagios_common/files/check_commands/$title
- Writing a new modules/nagios_common/files/check_commands/$title.cfg by hand (or, alternatively, adding to modules/nagios_common/files/checkcommands.cfg), which usually is just a silly, mostly unnecessary abstraction over the actual check parameters using positional arguments.
- Editing modules/nagios_common/manifests/commands.pp and adding it to the list
- Invoking monitoring::service from the role class or module with the check_command that was defined in the second step above.
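For illustration, the last step might look roughly like the sketch below. This is not taken from the tree: the check name, the description and the thresholds are made up, and the `description` and `check_command` parameter names are assumed to mirror the underlying nagios_service resource.

```puppet
# Hypothetical example: 'check_example' stands in for whatever $title was
# used in the previous steps. Parameter names are assumptions.
monitoring::service { 'example_check':
    description   => 'Example check',
    # Thresholds travel as positional arguments separated by '!', which the
    # hand-written $title.cfg then maps back onto the plugin's real options.
    check_command => 'check_example!80!90',
}
```

Note how the meaning of 80 and 90 is only visible in the hand-written .cfg file, not at the call site.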
Deploying a new NRPE check requires a whole different process that involves putting the plugin in /usr/local/lib/nagios/plugins with a separate File resource & using nrpe::monitor_service. base::monitoring::host is a good example of this.
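A minimal sketch of that NRPE path, for comparison. The module name, plugin name and thresholds are hypothetical, and the `description` and `nrpe_command` parameter names are assumptions about the wrapper's interface rather than a copy of base::monitoring::host.

```puppet
# Hypothetical plugin shipped by hand, since nrpe::check does not provision it.
file { '/usr/local/lib/nagios/plugins/check_example':
    ensure => file,
    owner  => 'root',
    group  => 'root',
    mode   => '0555',
    source => 'puppet:///modules/example/check_example',
}

# Parameter names below are assumptions; the command line uses the plugin's
# own named options instead of positional arguments.
nrpe::monitor_service { 'example_check':
    description  => 'Example check',
    nrpe_command => '/usr/local/lib/nagios/plugins/check_example --warning 80 --critical 90',
    require      => File['/usr/local/lib/nagios/plugins/check_example'],
}
```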
Switching a check from local to remote is a PITA. One basically has to repeat all the steps, deal with /usr/lib/nagios vs. /usr/local/lib/nagios, rewrite positional arguments into proper arguments again, make sure the NRPE check isn't exposed on the Icinga master so that the two monitoring::service resources won't clash, etc.
This is too difficult (it took me a while to fully grasp, and even when I did, it was hard to recall well enough to document above). We should abstract all this away and provide a sensible API inside our tree.
Ideas are welcome, but I vote that we should just have: a) a single definition to deploy a new check, whether it's local or remote, that would just DTRT based on Puppet dependencies; b) a single definition to use it, with a boolean parameter that specifies whether it should run locally or remotely.
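To make the proposal concrete, here is one possible shape for such an API. Everything in this sketch is hypothetical: neither `monitoring::plugin` nor `monitoring::check` exists today, and the parameter names (`source`, `plugin`, `arguments`, `remote`) are placeholders for discussion.

```puppet
# (a) Hypothetical: a single definition that deploys the plugin, placing it
# wherever it is needed based on which resources depend on it.
monitoring::plugin { 'check_example':
    source => 'puppet:///modules/example/check_example',
}

# (b) Hypothetical: a single definition that uses the plugin. Arguments are
# written once, as the plugin's own named options.
monitoring::check { 'example':
    description => 'Example check',
    plugin      => 'check_example',
    arguments   => '--warning 80 --critical 90',
    # Flipping this boolean would be the only change needed to move the
    # check between running locally and running remotely.
    remote      => false,
    require     => Monitoring::Plugin['check_example'],
}
```

With something like this, switching a check from one mode to the other becomes a one-line change instead of a repeat of the whole provisioning process.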