Last time we found that Nagios runs a host check whenever a service check comes back with a non-OK state, so it is important to keep your host checks as fast as possible.
As part of that investigation, we also found that a host check is initiated when a passive service check result arrives with a non-OK state. We don't think this is necessary: if the passive result came from the host, the host must be up.
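To make the difference concrete, here is a minimal Python model of the two behaviours. It is only an illustration under stated assumptions. It is not the Nagios source or our patch, which change C code inside Nagios itself. The function names are hypothetical. The state values are the standard plugin return codes, 0 (OK) to 3 (UNKNOWN).

```python
# Hypothetical model of when a host check is triggered; the function names
# are illustrative and are not taken from the Nagios source or the patch.

# Standard Nagios plugin return codes for service states.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def stock_triggers_host_check(state: int, passive: bool) -> bool:
    """Behaviour we observed in Nagios 2.x: any non-OK service result,
    active or passive, leads to an on-demand check of the host."""
    return state != OK


def patched_triggers_host_check(state: int, passive: bool) -> bool:
    """Proposed behaviour: a passive result is taken as proof that the
    host is up, so only non-OK *active* results trigger a host check."""
    return state != OK and not passive


for passive in (False, True):
    print(f"passive={passive}: "
          f"stock={stock_triggers_host_check(CRITICAL, passive)}, "
          f"patched={patched_triggers_host_check(CRITICAL, passive)}")
```

Running it shows that a CRITICAL passive result triggers a host check in the stock model but not in the patched one. The whole change rests on the `not passive` term, which encodes the assumption that a passive result proves the host is up.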
What if the passive check is not related to the host? For instance, suppose a host is set up as a central syslog-ng server and a passive check result is submitted to report a failed login on a client. The result may then be recorded against the syslog-ng server rather than against the originating client. But in this case, checking the syslog-ng server adds no value either!
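Passive results usually reach Nagios as lines written to its external command file, set by `command_file` in `nagios.cfg`. The Python sketch below builds a `PROCESS_SERVICE_CHECK_RESULT` line for the syslog-ng scenario. The host name `syslog01`, the service `Security Logins`, the client `client42` and the timestamp are all hypothetical. The point to notice is that the host field names the syslog-ng server, not the client where the login failed.

```python
import time
from typing import Optional


def service_result_command(host: str, service: str, return_code: int,
                           output: str, timestamp: Optional[int] = None) -> str:
    """Build a Nagios external command line that submits a passive
    service check result. The result is attached to `host`, which need
    not be the machine the event is actually about."""
    if timestamp is None:
        timestamp = int(time.time())
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


# Hypothetical names: the failed login happened on "client42", but the
# result is recorded against the central syslog-ng server "syslog01".
line = service_result_command("syslog01", "Security Logins", 2,
                              "Login failed for root on client42",
                              timestamp=1191974400)
print(line, end="")
```

A non-OK result like this one would, in stock Nagios 2.x, trigger a check of `syslog01`, even though the server's own health has nothing to do with the failed login.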
So we think this is a safe patch to apply. We've tested it on Nagios 2.2 and Nagios 2.3.1. We'll see if Ethan agrees.
Update: Jason Martin points out that the main assumption (that a passive check comes from the host, so the host must be up) does not hold with distributed monitoring.
Distributed monitoring can be set up in two ways:
1. the master receiving passive service checks and checking the host (the only way for Nagios 1.x)
2. the master receiving passive service and host check results from the slave (http://nagios.sourceforge.net/docs/2_0/distributed.html)
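This patch is fine for case 2, but would break case 1: there, the master relies on checking the host itself, and with the patch a non-OK passive service result would no longer prompt it to do so. So this is not quite a safe patch after all.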
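One way to express the refined condition is the Python sketch below. It skips the host check for a non-OK passive service result only when the master is also fed host results by the slave (case 2). This is a hypothetical illustration of the reasoning, not a tested Nagios patch. The `slave_sends_host_results` flag is an assumed stand-in for however your configuration distinguishes the two setups.

```python
OK = 0  # standard Nagios plugin return code for OK


def master_should_check_host(state: int, passive: bool,
                             slave_sends_host_results: bool) -> bool:
    """Hypothetical refinement of the patch.

    Case 1 (slave_sends_host_results=False): the master only gets passive
    service results and must check the host itself, so a non-OK result
    still triggers a host check.
    Case 2 (slave_sends_host_results=True): the slave also forwards host
    results, so a non-OK passive service result need not trigger one.
    """
    if state == OK:
        return False
    if passive and slave_sends_host_results:
        return False
    return True
```

For example, `master_should_check_host(2, True, False)` returns `True` (case 1 keeps its host check), while `master_should_check_host(2, True, True)` returns `False` (case 2 skips it).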
I can also see problems arising if the passive check is warning of an immediate system shutdown (an environmental alarm forcing a shutdown, etc.).
Posted by: Larry Low | October 10, 2007 at 01:39 AM
Larry: Possibly. The host check could still pass (if it is a ping), as network connectivity is shut down last.
Posted by: tonvoon | October 10, 2007 at 08:31 PM