
November 2, 2006

Caching NSCA data from slaves to master

Comments

Andre Bergei

It's always nice to see solutions to this problem. If I may, I would like to suggest a couple of other workarounds.
First, there is the OCSP Sweeper (hosted at nagiosexchange). It's been around for ages, and lets you bulk-send send_nsca checks when a FIFO limit is reached and/or at a time interval.

If your distributed Nagios solution is really big, then one idea is to send only state changes to your central monitoring server. You can always do your perfdata parsing on your slaves, and make a drill-down link to that data on your central server with a quick Apache proxy definition.
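
To make the state-change idea concrete, here is a minimal sketch, assuming the slave can see both the current and the previous state ID. The should_forward helper is hypothetical, and NAGIOS_LASTSERVICESTATEID is an assumption: whether that variable is exported depends on your Nagios version, so check your macro list first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 when a result should be forwarded to the master: either the
# state ID changed, or no previous state is known. Return 0 otherwise.
sub should_forward {
    my ($current_state, $last_state) = @_;
    return 1 unless defined $last_state && length $last_state;
    return $current_state != $last_state ? 1 : 0;
}

# Hypothetical wiring into an OCSP command. NAGIOS_LASTSERVICESTATEID is
# assumed to be available; it is not used by the scripts in this thread.
if (defined $ENV{NAGIOS_SERVICESTATEID}) {
    my $forward = should_forward($ENV{NAGIOS_SERVICESTATEID},
                                 $ENV{NAGIOS_LASTSERVICESTATEID});
    print $forward ? "forward\n" : "skip\n";
}
```

Note that the master would then only hear about changes, so it can no longer assume it has a fresh copy of every state.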

tonvoon

Andre,

Thanks for the pointer to OCSP Sweeper. We discounted options using another daemon - we didn't really want to add another daemon onto our system. Ideally, the "caching" should be done at the Nagios level so if we feel there is a need for a persistent caching mechanism, we'll probably look at amending Nagios (though I think Nagios 3 should cater for this - we haven't looked in detail yet).

I like the idea re: sending only state changes - we have to give it some more thought. At the moment, there is an assumption that the master knows every state, not just failures, so this would be a major change for Opsview. Also, we've designed distributed monitoring so that only a single SSH port is required between the master and slaves. Redirecting perf data to the slave would require an HTTP port open as well.

We're also betting heavily on NDOUtils, which has other techniques for the status of slaves (send to a single database or a database per slave) which we haven't investigated fully yet.

Oliver Hookins

Hi, great script and great idea. I had the idea as well after I saw the terribly serial behaviour of the OCSP and OCHP commands, but then I found your script so I didn't have to spend days writing my own solution! Thank you!

I made a few changes you might find handy, like enabling the script to be used for both the OCSP and OCHP commands, not just for services. Here's my updated version:

#!/usr/bin/perl
#
# SYNTAX:
# send_nsca_cached [cache_time]
#
# DESCRIPTION:
# Used to pass passive results. Caches results and submits at 5 second
# intervals by default. The cache time can be specified on
# command line - 0 to send immediately
#
# Requires Nagios 2.0+
#
# Warning: this script needs to be invoked for a send_nsca to occur, so
# if you only have 1 service on a slave that is run every minute, the
# minimum time between sends is 1 minute, regardless of the cache_time setting.
# So you should only use on a busy slave.
#
# Warning 2: Do not use a cache time that is too large. Even a cache time of
# 1 second will help performance dramatically on a busy slave.
#
# AUTHORS:
# Copyright (C) 2006 Altinity Limited
#
# This file is part of Opsview
#
# Opsview is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# Opsview is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Opsview; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#
# CHANGELOG:
# v1.0.1 - Oliver Hookins, Anchor Systems, 07/03/2008
# - Replaced hard coded nsca command with a single variable
# - Changed file paths to reflect RHEL standards
# - Altered output sub to handle both OCSP and OCHP commands
#
# v1.0.0 - Altinity Limited, 07/03/2008
# - original downloaded version from http://altinity.blogs.com/dotorg/send_nsca_cached

use strict;

my $cache_time = shift @ARGV;
$cache_time = 5 unless defined $cache_time;

my $nsca_command = "/usr/sbin/send_nsca -H nagios-server -c /etc/nagios/send_nsca.cfg";

if ($cache_time == 0) {
    # No caching requested: send this result straight away
    open SEND_NSCA, "| $nsca_command";
    print SEND_NSCA &output;
    close SEND_NSCA;
    exit;
}

my $cache_file = "/var/log/nagios/send_nsca.cache";
my $now = time;
my $last_updated;

if (-e $cache_file) {
    open CACHE, "+<", $cache_file;
    # First line of the cache file holds the time of the last send
    $last_updated = <CACHE>;
    #print "Last updated: ", scalar localtime $last_updated, $/;
} else {
    open CACHE, "+>", $cache_file;
    print CACHE $now, $/;
    $last_updated = time;
    #print "New cache",$/;
}

if ($now - $last_updated < $cache_time) {
    seek CACHE, 0, 2; # Goto end
    print CACHE &output;
} else {
    # Send everything still in the cache, plus the current result
    open SEND_NSCA, "| $nsca_command";
    print SEND_NSCA <CACHE>, &output;
    close SEND_NSCA;
    #print "Will send:", $/;
    #print <CACHE>;
    #close CACHE;
    #print "Plus this one:", &output;

    # Reset time
    open CACHE, ">", $cache_file;
    print CACHE time, $/;

    # Update send_nsca status ($? is the exit status of the send_nsca pipe)
    my $status_file = "/var/log/nagios/ocsp.status";
    open STATUS, ">", $status_file;
    if ($? == 0) {
        print STATUS "0";
    } else {
        print STATUS "2";
    }
    close STATUS;
}

close CACHE;
exit;

# Build one send_nsca input line: host results when no service
# description is set, service results otherwise
sub output {
    if ($ENV{NAGIOS_SERVICEDESC} eq "") {
        return "$ENV{NAGIOS_HOSTNAME}\t$ENV{NAGIOS_HOSTSTATEID}\t$ENV{NAGIOS_HOSTOUTPUT}\n";
    } else {
        return "$ENV{NAGIOS_HOSTNAME}\t$ENV{NAGIOS_SERVICEDESC}\t$ENV{NAGIOS_SERVICESTATEID}\t$ENV{NAGIOS_SERVICEOUTPUT}\n";
    }
}
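
To show what the output sub hands to send_nsca in each case, here is a small standalone sketch. It takes the macro values from a hash instead of %ENV so it can be run outside Nagios; format_result and the sample values are made up for illustration and are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one tab-separated send_nsca input line from a hash of Nagios
# macro values. Host results have three fields, service results four.
sub format_result {
    my ($env) = @_;
    my $service = $env->{NAGIOS_SERVICEDESC};
    if (!defined $service || $service eq "") {
        return join("\t", $env->{NAGIOS_HOSTNAME},
                          $env->{NAGIOS_HOSTSTATEID},
                          $env->{NAGIOS_HOSTOUTPUT}) . "\n";
    }
    return join("\t", $env->{NAGIOS_HOSTNAME},
                      $service,
                      $env->{NAGIOS_SERVICESTATEID},
                      $env->{NAGIOS_SERVICEOUTPUT}) . "\n";
}

# Sample values only; in the real script these come from the environment.
print format_result({
    NAGIOS_HOSTNAME    => "web01",
    NAGIOS_HOSTSTATEID => 0,
    NAGIOS_HOSTOUTPUT  => "PING OK",
});
print format_result({
    NAGIOS_HOSTNAME       => "web01",
    NAGIOS_SERVICEDESC    => "HTTP",
    NAGIOS_SERVICESTATEID => 2,
    NAGIOS_SERVICEOUTPUT  => "Connection refused",
});
```

The same sub can therefore serve both the OCSP and OCHP commands, since a host check simply has no service description set.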

Cédric

I've tried this in a large system (1000 hosts, 5000 services) and it works fine!

Some comments:
* we need to cache services and hosts
* sending all the data takes some time, so to avoid blocking Nagios scheduling, it is better to fork another process.

So here is another version:


#!/usr/bin/perl
#
# SYNTAX:
# send_nsca_cached [cache_time]
#
# DESCRIPTION:
# Used to pass passive results. Caches results and submits at 5 second
# intervals by default. The cache time can be specified on
# command line - 0 to send immediately
#
# Requires Nagios 2.0+
#
# Warning: this script needs to be invoked for a send_nsca to occur, so
# if you only have 1 service on a slave that is run every minute, the
# minimum time between sends is 1 minute, regardless of the cache_time setting.
# So you should only use on a busy slave.
#
# Warning 2: Do not use a cache time that is too large. Even a cache time of
# 1 second will help performance dramatically on a busy slave.
#
# AUTHORS:
# Copyright (C) 2006 Altinity Limited
#
# This file is part of Opsview
#
# Opsview is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# Opsview is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Opsview; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#
# CHANGELOG:
# v1.0.2 - cedric.cabessa@uperto.com, 13/01/2009
# - fork before sending data
# v1.0.1 - Oliver Hookins, Anchor Systems, 07/03/2008
# - Replaced hard coded nsca command with a single variable
# - Changed file paths to reflect RHEL standards
# - Altered output sub to handle both OCSP and OCHP commands
#
# v1.0.0 - Altinity Limited, 07/03/2008
# - original downloaded version from http://altinity.blogs.com/dotorg/send_nsca_cached

use strict;

my $cache_time = shift @ARGV;
$cache_time = 5 unless defined $cache_time;

my $nsca_command = "/usr/sbin/send_nsca -H nagios-server -c /etc/nagios/send_nsca.cfg";

if ($cache_time == 0) {
    # No caching requested: send this result straight away
    open SEND_NSCA, "| $nsca_command";
    print SEND_NSCA &output;
    close SEND_NSCA;
    exit;
}

my $cache_file = "/var/log/nagios/send_nsca.cache";
my $now = time;
my $last_updated;

if (-e $cache_file) {
    open CACHE, "+<", $cache_file;
    # First line of the cache file holds the time of the last send
    $last_updated = <CACHE>;
    #print "Last updated: ", scalar localtime $last_updated, $/;
} else {
    open CACHE, "+>", $cache_file;
    print CACHE $now, $/;
    $last_updated = time;
    #print "New cache",$/;
}

if ($now - $last_updated < $cache_time) {
    seek CACHE, 0, 2; # Goto end
    print CACHE &output;
} else {
    # Child sends the data, parent exits
    my $pid = fork();
    if (not defined $pid) {
        print STDERR "FATAL cannot fork\n";
    } elsif ($pid == 0) {
        # Send everything still in the cache, plus the current result
        open SEND_NSCA, "| $nsca_command";
        print SEND_NSCA <CACHE>, &output;
        close SEND_NSCA;
        #print "Will send:", $/;
        #print <CACHE>;
        #close CACHE;
        #print "Plus this one:", &output;

        # Reset time
        open CACHE, ">", $cache_file;
        print CACHE time, $/;

        # Update send_nsca status ($? is the exit status of the send_nsca pipe)
        my $status_file = "/var/log/nagios/ocsp.status";
        open STATUS, ">", $status_file;
        if ($? == 0) {
            print STATUS "0";
        } else {
            print STATUS "2";
        }
        close STATUS;

        close CACHE;
    }
}
exit;

# Build one send_nsca input line: host results when no service
# description is set, service results otherwise
sub output {
    if ($ENV{NAGIOS_SERVICEDESC} eq "") {
        return "$ENV{NAGIOS_HOSTNAME}\t$ENV{NAGIOS_HOSTSTATEID}\t$ENV{NAGIOS_HOSTOUTPUT}\n";
    } else {
        return "$ENV{NAGIOS_HOSTNAME}\t$ENV{NAGIOS_SERVICEDESC}\t$ENV{NAGIOS_SERVICESTATEID}\t$ENV{NAGIOS_SERVICEOUTPUT}\n";
    }
}
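
For anyone who wants the fork step on its own, here is a minimal sketch of the pattern, pulled out of the script so it can be tried in isolation. run_in_background is a hypothetical helper, and the code ref stands in for the send_nsca pipe; this shows the general technique rather than the exact code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Run $work in a child process and return the child's PID to the caller
# straight away. Returns undef if fork fails, so the caller can decide
# whether to do the work inline instead of dropping it.
sub run_in_background {
    my ($work) = @_;
    my $pid = fork();
    return unless defined $pid;          # fork failed
    if ($pid == 0) {
        # Child: do the slow work, then leave without returning to the caller
        my $ok = eval { $work->(); 1 };
        POSIX::_exit($ok ? 0 : 1);
    }
    return $pid;                         # Parent: carry on immediately
}

# Hypothetical usage: the sleep stands in for a slow send_nsca run.
my $pid = run_in_background(sub { sleep 1 });
print defined $pid ? "parent continues, child is $pid\n" : "fork failed\n";
```

Returning undef on a failed fork leaves room for a fallback, whereas the script above only logs the error and the current result is not sent.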
