SNMP is one of those useful but misunderstood technologies. I think it doesn't help that the name is Simple Network Management Protocol, yet when you first start, you get hit by these ridiculous OIDs like .1.3.6.1.2.1.1.1.0 for the system description. It just doesn't feel simple. Sigh.
However, every networking device manufacturer supports it - and there's an open source network management system based on it - so we looked into how we could integrate SNMP into Opsview. Polling SNMP devices is already supported through active checks. The next step was receiving traps, which are passive by nature.
There's a good article in Sysadmin magazine by Francois Meehan, where he describes how to get SNMP traps integrated into Nagios. His design is:
- snmptrap received by snmptrapd
- snmptrapd calls snmptt (snmp trap translator)
- snmptt defines what alert levels each trap should take and then writes to syslog
- SEC can handle correlation of events, but in this case is configured to read syslog and then pass any single event to a custom python script called snmptraphandling.py
- snmptraphandling.py then puts an entry on Nagios' command file based on the hostname and the alert level
That's a lot of layers! I'm a big fan of the KISS approach, so we went further into how these things worked.
Snmptrapd is from the Net-SNMP project. Though there are other (mainly commercial) implementations, this seems to be the most popular. You configure snmptrapd to invoke a command, called a traphandle, when it receives an SNMP trap. The interface to the traphandle is simple: snmptrapd runs any executable you name and passes it the following on stdin, one item per line:
- the host name of the originating packet
- the IP address of the originating packet
- the contents of the packet
An example packet:
cisco2611.lon.altinity
192.168.10.20
RFC1213-MIB::sysUpTime.0 0:18:14:45.66
SNMPv2-MIB::snmpTrapOID.0 IF-MIB::linkDown
RFC1213-MIB::ifIndex.2 2
RFC1213-MIB::ifDescr.2 "Serial0/0"
RFC1213-MIB::ifType.2 ppp
OLD-CISCO-INTERFACES-MIB::locIfReason.2 "administratively down"
SNMP-COMMUNITY-MIB::snmpTrapAddress.0 192.168.10.20
SNMP-COMMUNITY-MIB::snmpTrapCommunity.0 "public"
SNMPv2-MIB::snmpTrapEnterprise.0 CISCO-SMI::ciscoProducts.186
However, snmptt's documentation suggests that you run snmptrapd with the -On flag, which means "do not translate OIDs to names".
So the equivalent of the packet above would be received by snmptt as:
cisco2611.lon.altinity
192.168.10.20
.1.3.6.1.2.1.1.3.0 0:18:13:59.95
.1.3.6.1.6.3.1.1.4.1.0 .1.3.6.1.6.3.1.1.5.3
.1.3.6.1.2.1.2.2.1.1.2 2
.1.3.6.1.2.1.2.2.1.2.2 "Serial0/0"
.1.3.6.1.2.1.2.2.1.3.2 ppp
.1.3.6.1.4.1.9.2.2.1.1.20.2 "administratively down"
.1.3.6.1.6.3.18.1.3.0 192.168.10.20
.1.3.6.1.6.3.18.1.4.0 "public"
.1.3.6.1.6.3.1.1.4.3.0 .1.3.6.1.4.1.9.1.186
The reason for this is that snmptt has its configuration file indexed by OID. If you do not use the -On flag, snmptt has to translate the names back into OIDs before it can find the right entry.
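As a rough illustration of how little is needed to consume the traphandle input shown in the two examples above, here is a minimal sketch in Python (our own handler is written in perl and is not shown here). The function names parse_trap and trap_name are made up for this example, and it assumes the simple layout described above: host name, then IP address, then one "name value" pair per line.

```python
import io

# Both spellings of snmpTrapOID.0: translated name and numeric OID (-On).
TRAP_OID_KEYS = ("SNMPv2-MIB::snmpTrapOID.0", ".1.3.6.1.6.3.1.1.4.1.0")


def parse_trap(stream):
    """Parse the text snmptrapd feeds to a traphandle on stdin.

    Line 1 is the host name, line 2 the IP address, and every following
    line is "<name or OID> <value>", split on the first space.
    """
    lines = [line.rstrip("\n") for line in stream if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected at least a host name and an IP address")
    varbinds = []
    for line in lines[2:]:
        name, _, value = line.partition(" ")
        varbinds.append((name, value.strip()))
    return {"hostname": lines[0], "ip": lines[1], "varbinds": varbinds}


def trap_name(trap):
    """Return the value of snmpTrapOID.0, which identifies the trap."""
    for name, value in trap["varbinds"]:
        if name in TRAP_OID_KEYS:
            return value
    return None


# Shortened copy of the example packet from the post.
SAMPLE = """cisco2611.lon.altinity
192.168.10.20
RFC1213-MIB::sysUpTime.0 0:18:14:45.66
SNMPv2-MIB::snmpTrapOID.0 IF-MIB::linkDown
RFC1213-MIB::ifIndex.2 2
RFC1213-MIB::ifDescr.2 "Serial0/0"
"""

trap = parse_trap(io.StringIO(SAMPLE))
print(trap["hostname"], trap_name(trap))
```

In a real traphandle the stream would be sys.stdin rather than a string buffer.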
In order for snmptt to know the OIDs, you have to import MIBs into snmptt and then define what the message and alert level is, using the OID as the key. It will then give you a set of macros which you can use to define your message.
Here's where we disagreed with snmptt's design - why bother importing MIBs? Obviously, snmptrapd needs to understand MIBs and it does a good job of translating OIDs. Giving snmptt that MIB information too means maintaining MIB imports in two places.
When I get stuck trying to understand the point of something, I ask myself: What is the custom data? This is important because the custom data is what needs to be maintained, and it leads to the answer to the next question: What is the value?
Snmptt's value is that lookup between the OID and the message and alert level (and the default message is not that helpful - it takes the 1st line of the description of the MIB and adds the arguments at the end). This is called the snmptt_conf_files in their language, but I'll call it the message catalogue.
But there is a performance impact with parsing the message catalogue. If snmptrapd calls a perl script which is reading this catalogue at every invocation, then there's going to be a hit if there are lots of traps being received. This is why snmptt has a daemon mode. The last thing we want is another daemon!
So then we thought: "What about leaving snmptrapd to do the translation?" Instead of indexing by OID, we could index by the trapname itself. This leaves all the MIB information at the snmptrapd level - removing our administrative nightmare - and our glue code would just be text parsing, which perl, our tool of choice, is ideally suited for.
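To make the idea of indexing by trap name concrete, here is a small sketch in Python rather than perl. Everything in it is an assumption made for illustration: the CATALOGUE layout, the alert levels chosen for each trap, and the $name placeholder convention are invented here and are not the snmptt or SNMP::Trapinfo macro syntax.

```python
from string import Template

# Hypothetical message catalogue: trap name -> (alert level, message template).
# The levels and the $ifDescr / $locIfReason placeholders are this sketch's
# own convention, chosen only to show a lookup keyed by trap name.
CATALOGUE = {
    "IF-MIB::linkDown": ("CRITICAL", "Interface $ifDescr is down ($locIfReason)"),
    "IF-MIB::linkUp": ("OK", "Interface $ifDescr is up"),
}


def short_name(varbind_name):
    """'RFC1213-MIB::ifDescr.2' -> 'ifDescr' (drop MIB prefix and instance)."""
    return varbind_name.split("::")[-1].split(".")[0]


def lookup(trap_name, varbinds):
    """Return (level, message) for a trap, or None if it is not catalogued."""
    entry = CATALOGUE.get(trap_name)
    if entry is None:
        return None
    level, template = entry
    values = {short_name(name): value.strip('"') for name, value in varbinds}
    # safe_substitute leaves unknown placeholders in place instead of raising.
    return level, Template(template).safe_substitute(values)


varbinds = [
    ("RFC1213-MIB::ifIndex.2", "2"),
    ("RFC1213-MIB::ifDescr.2", '"Serial0/0"'),
    ("OLD-CISCO-INTERFACES-MIB::locIfReason.2", '"administratively down"'),
]
print(lookup("IF-MIB::linkDown", varbinds))
```

Nothing in the lookup needs a MIB: the names arrive already translated by snmptrapd, so the glue code is plain string handling.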
This message catalogue is precisely the type of Nagios configuration data that we want Opsview to excel at. In fact, snmptt missed a trick in that it doesn't know which host/service to submit the passive check to. This is left to the snmptraphandling.py script, which simply keys on the hostname and then the alert level (so every host has 3 and only 3 services for snmptraps).
Our traphandle, which we call snmptrap2nagios, therefore needs to:
- be fast - it could be invoked hundreds of times a minute
- process the textual data to convert to a message and an alert level
- know which service on which host wants this alert
- submit a passive check to Nagios
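The last step in the list above, submitting a passive check, comes down to writing one PROCESS_SERVICE_CHECK_RESULT line to the Nagios command file. The following Python sketch shows the shape of that line. The function names, the "SNMP Trap" service description and the timestamp are placeholders for this example, and the location of the command file depends on how Nagios was installed, so it is left as a parameter.

```python
import time

# Standard Nagios return codes for service states.
RETURN_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_check_line(host, service, level, message, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    # A newline inside the message would end the command early.
    clean = message.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, RETURN_CODES[level], clean)


def submit(line, command_file):
    """Write the command to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


# "SNMP Trap" is a hypothetical service description for this example.
line = passive_check_line(
    "cisco2611.lon.altinity", "SNMP Trap", "CRITICAL",
    "Interface Serial0/0 is down", now=1152712440)
print(line, end="")
```

Calling submit(line, path_to_command_file) would hand the result to Nagios. It is not called here because the path differs between installations.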
Since snmptt has some useful code regarding macros, we need to emulate that. This is generic information and is not tied to the rest of Opsview, so we've written this as a perl module called SNMP::Trapinfo and we've published this on CPAN.
In Francois' design, SEC was not used for any filtering so we've removed it. This removes the need to write to syslog as well.
So now the architecture looks like this:
- SNMP packet received by snmptrapd
- snmptrapd's traphandle calls snmptrap2nagios
- snmptrap2nagios, if applicable, will write to the Nagios command file
Much cleaner!
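For readers wondering how the first two steps are wired together: snmptrapd is pointed at a handler with a traphandle line in snmptrapd.conf. The fragment below is only a sketch, and the path to the script is a placeholder rather than a real install location.

```
# snmptrapd.conf
# Run the handler for every trap that has no more specific traphandle entry.
# /path/to/snmptrap2nagios is a placeholder, not a real location.
traphandle default /path/to/snmptrap2nagios
```

Depending on the Net-SNMP version, snmptrapd may also need an access-control line (for example an authCommunity entry allowing execute) before it will run a handler, so check the snmptrapd.conf man page for your release.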
Stay tuned for the next post when we discuss how we handle filtering and exceptions.
Update: We forgot to credit Alex Burger for his work on SNMPTT, which lots of users appreciate. Also, Ethan has got a page on integration of SNMPtraps in the Nagios documentation which we didn't see until recently.
Update: Part 2 posted here.
Very interesting article. However SNMP still is not that 'simple' to me.
I'm trying to configure the steps described, but I don't know how to run snmptrap2nagios from snmptrapd. I suppose the snmptrapd.conf has to be edited, but calling a perl module is new for me.
Help would be appreciated.
Posted by: Paul Verhoeven | July 12, 2006 at 01:54 PM
Paul,
I agree, SNMP is wrongly named!
Look up snmptrapd.conf for information on how to call an external command.
You can't call a perl module from snmptrapd. snmptrap2nagios is our perl script that we use to process the trap. Most of the logic is in SNMP::Trapinfo and we publish that. But we haven't, and don't plan to, publish snmptrap2nagios.
Quite a few people seem to misunderstand this. I'll try and clarify this later.
Ton
Posted by: tonvoon | July 12, 2006 at 02:36 PM
Hello Ton,
I was wondering if you have read the article at http://cerebro.victoriacollege.edu/hobbit-trap.html
It deals with the same thing as the Sysadmin article. That solution uses net-snmp, snmptt and SEC for trap handling, but with Hobbit as the frontend rather than Nagios.
Is this solution better than Sysadmin's? I ask mainly because of the mode snmptt runs in, which is standalone, and as you pointed out that is not a good option
for receiving traps from MIBs with tons of trap objects. That is my case: I am working with a Cisco AS5300 access server, which has an extremely large MIB, but I only want to be notified about the E1 interface.
Posted by: ulises | January 30, 2007 at 12:01 AM
I am having a little trouble getting your script to work with Hobbit. I have the MySQL server logging the traps, and I can use trap.php to view them. I just cannot get Hobbit to place a trap column next to the device that I put “trap” against in my bbhost file.
Thanks
Posted by: gary | August 27, 2008 at 10:48 AM