As part of our ongoing work to speed up Opsview, we found a bug in how NSCA handles aggregate writes when run in --single mode.
The specific failure scenario is this:
- NSCA and Nagios are told to start up
- A send_nsca request is received by NSCA before Nagios has created the nagios.cmd command pipe
- NSCA tries to open the command file for writing, but finds it is not there
- NSCA opens the alternate dump file instead
Normally, once Nagios does create the nagios.cmd file, NSCA switches to it ... unless aggregate mode is on and daemon mode is --single. In that case, it keeps using the alternate dump file, so Nagios never sees the results from the slaves.
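To make the failure mode concrete, here is a minimal Python model of it. This is only a sketch: NSCA itself is written in C, and none of the names below (`pick_target`, `CachingWriter`, `RecheckingWriter`, `nsca.dump`) come from its source or from the patch. The assumption is simply that the faulty path decides where to write once and keeps that decision, while the intended behaviour re-checks for the command file on every write.

```python
import os
import tempfile


def pick_target(command_file, alternate_dump_file):
    """Return the command file if it exists, else the alternate dump file."""
    return command_file if os.path.exists(command_file) else alternate_dump_file


class CachingWriter:
    """Models the faulty behaviour: the target is decided once and reused."""

    def __init__(self, command_file, alternate_dump_file):
        self.command_file = command_file
        self.alternate_dump_file = alternate_dump_file
        self._target = None

    def target(self):
        if self._target is None:
            self._target = pick_target(self.command_file, self.alternate_dump_file)
        return self._target


class RecheckingWriter(CachingWriter):
    """Models the intended behaviour: the target is re-evaluated on every write."""

    def target(self):
        return pick_target(self.command_file, self.alternate_dump_file)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        cmd = os.path.join(tmp, "nagios.cmd")   # hypothetical paths for the demo
        dump = os.path.join(tmp, "nsca.dump")
        stale, fresh = CachingWriter(cmd, dump), RecheckingWriter(cmd, dump)
        # First result arrives before Nagios has created the command file.
        before = (stale.target() == dump, fresh.target() == dump)
        # Nagios starts; a regular file stands in for the named pipe here.
        open(cmd, "w").close()
        after = (stale.target() == cmd, fresh.target() == cmd)
        return before, after


if __name__ == "__main__":
    print(demo())
```

In this model both writers fall back to the dump file at first, but only the re-checking writer moves over to nagios.cmd once it appears, which mirrors the symptom described above.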
Here's the patch, which we've also added into our source for Opsview.
As we are very keen on good testing, we've managed to recreate the failing behaviour in a test script. You also need a test configuration file and a patch to the test framework. Run the test before applying the patch and it will show the error; once the patch is applied, the test should pass.
....or just start NSCA in the nagios init script :)
Posted by: morten | February 19, 2008 at 11:34 AM