We ran across a problem with NSCA 2.6 yesterday. It turned out that when the nsca daemon runs in single mode (--single), it only handles the first packet of data from send_nsca and hangs on subsequent calls.
This was actually first discovered by Rudolf van der Leeden, and it looks like it has been with us since NSCA 2.6 was first released in April 2006, right through to the current NSCA 2.7. We never picked it up until we ran NSCA at a customer site that was configured to use --single.
The fix is as Rudolf suggests: restore the if statement that had been commented out. Our patch is here.
How do we know it works? Well, we've written a series of test scripts for NSCA.
We've always been big fans of testing. We love using the Test Anything Protocol (TAP) in Perl. CPAN encourages you to write good tests to make sure your Perl modules work, which is why we know that the modules we've uploaded to CPAN have kept working while we've been updating them. We've also provided quite a few fixes to CPAN modules whose tests fail (and some failures just suggest that we have a broken version of Perl).
Here are the test scripts for NSCA. They are really functional tests: they check that the daemon can start up and accept messages, and compare the output in a dummy nagios.cmd file with the data that was sent. Unit testing is trickier for C code, though libtap is being used for the Nagios Plugins.
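The comparison at the heart of these tests can be sketched roughly as follows. This is a minimal Python illustration, not our Perl test code, and the helper names are hypothetical. It assumes send_nsca's default tab delimiter on stdin, and that NSCA writes each result to nagios.cmd as a standard Nagios external command prefixed with a bracketed timestamp (PROCESS_SERVICE_CHECK_RESULT or PROCESS_HOST_CHECK_RESULT).

```python
import re


def send_nsca_line(host, service, code, output):
    """Format one passive check as send_nsca reads it on stdin (tab-delimited)."""
    if service is None:
        return f"{host}\t{code}\t{output}\n"  # host check: host, code, output
    return f"{host}\t{service}\t{code}\t{output}\n"  # service check


def expected_command(host, service, code, output):
    """External command we assume NSCA writes to nagios.cmd, minus the timestamp."""
    if service is None:
        return f"PROCESS_HOST_CHECK_RESULT;{host};{code};{output}"
    return f"PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}"


# Assumed line shape: "[1199145600] PROCESS_...". Strip the "[timestamp] " prefix,
# since the check time is filled in by send_nsca rather than by the test.
TIMESTAMP = re.compile(r"^\[\d+\]\s*")


def missing_results(checks, cmd_file_text):
    """Return the sent checks whose expected command never reached the file."""
    received = {TIMESTAMP.sub("", line) for line in cmd_file_text.splitlines()}
    return [check for check in checks if expected_command(*check) not in received]
```

A test then pipes the send_nsca_line() output for each check into send_nsca, reads the dummy nagios.cmd back, and expects missing_results() to return an empty list.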
To use the test scripts, put them in the top level of the NSCA directory after you've compiled NSCA, cd into nsca_tests and run prove *.t. You will need several CPAN modules: Test::More, Class::Struct, Clone and Parallel::Forker, though some of these will already be included with your Perl distribution.
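If you want to check for those modules before running prove, a small helper along these lines would do. It is a hypothetical Python sketch, not part of the test scripts: it simply asks perl to load each module with perl -M<Module> -e 1 and reports the ones that fail to load.

```python
import shutil
import subprocess

# The CPAN modules the NSCA test scripts need.
REQUIRED_MODULES = ("Test::More", "Class::Struct", "Clone", "Parallel::Forker")


def missing_perl_modules(modules=REQUIRED_MODULES, perl="perl"):
    """Return the modules that `perl -M<module> -e 1` cannot load."""
    if shutil.which(perl) is None:
        # No perl interpreter found at all, so every module counts as missing.
        return list(modules)
    missing = []
    for module in modules:
        # perl exits non-zero if the module can't be found in @INC.
        result = subprocess.run(
            [perl, f"-M{module}", "-e", "1"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            missing.append(module)
    return missing
```

If missing_perl_modules() returns an empty list, cd into nsca_tests and run prove *.t as described above; otherwise install whatever it reports from CPAN first.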
There are 3 tests at the moment:
- basic - just sends a few passive checks and makes sure that the nagios.cmd file receives them
- multiple - runs the same as basic, but several times to check the daemon can handle multiple requests
- simultaneous - runs lots of send_nscas at the same time (well, nearly). Uses Parallel::Forker to set up all the sends, then executes them all at once. Expect about 200 extra processes to hit your server!
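The "set everything up, then fire it all at once" idea behind the simultaneous test can be sketched in Python with the standard subprocess module instead of Parallel::Forker. This is an illustrative sketch, not our test code: run_simultaneously() is a hypothetical helper, and the timeout handling is our own addition so that a hung daemon shows up as a result rather than as a test run that never finishes.

```python
import subprocess
import time


def run_simultaneously(argv, inputs, timeout=30.0):
    """Run one copy of argv per input, all at (nearly) the same time.

    Every process is started before any of them is given its input; the
    inputs are then written in quick succession. Returns the exit codes in
    order. A process still running when the shared deadline passes is killed
    and reported as None (for example, a client stuck on a hung daemon).
    """
    # Set up: start every process before any of them gets input.
    procs = [
        subprocess.Popen(argv, stdin=subprocess.PIPE,
                         stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for _ in inputs
    ]
    # Execute: hand each process its data and close stdin so it sees EOF.
    for proc, data in zip(procs, inputs):
        try:
            proc.stdin.write(data.encode())
            proc.stdin.close()
        except BrokenPipeError:
            pass  # the process exited without reading; its exit code says why
    # Collect: one deadline for the whole batch, so one hang can't stall the run.
    deadline = time.monotonic() + timeout
    codes = []
    for proc in procs:
        try:
            codes.append(proc.wait(timeout=max(0.0, deadline - time.monotonic())))
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
            codes.append(None)
    return codes
```

For the real thing you would pass something like ["./send_nsca", "-H", "localhost", "-c", "send_nsca.cfg"] as argv (the paths are placeholders) and one block of tab-delimited check results per process, then compare nagios.cmd against everything that was sent. Against an unpatched daemon in --single mode, you would expect some entries to come back as None.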
You'll find that the multiple and simultaneous tests fail with the stock NSCA 2.6 and 2.7. But when the patch is applied, all the tests pass.
The tests can obviously be extended, but this is a start and covers the basic functionality.
We hope Ethan will look into adding this to the NSCA distribution.
We're upset that something like this got to one of our customers, but we're more upset with ourselves for not catching it much earlier. This should be a good step towards better QA of future NSCA releases.
Update: Ethan has released NSCA 2.7.1 to fix this problem, and the tests are included as well!