#188 Linux  02.10.2010

Nagios 3 with failover on SLES 11 with NConf, NRPE, NSCA, PNP4Nagios and NagVis


In this tutorial Nagios 3 will be installed in a VMware guest with SLES 11.
These add-ons will be installed too:
NConf which is a graphical configuration front-end
NRPE for monitoring private parameters
NSCA for passive checks
PNP4Nagios for rrdtool graphs of Nagios performance data
NagVis for visualization of the network

Afterwards the VMware guest will be cloned and configured so that the clone takes over automatically if the master Nagios server goes down.
Everywhere in this article where a password is needed you find the word PASSWORD. You have to replace it accordingly.


VMware SLES 11
DSH
Physical host on which the VMs are running
Installation Nagios 3 core
Installation Nagios plugins
Setup the Apache vhost
NConf
Logos
Skin for Nagios 3 (optional)
NagVis
PNP4Nagios
NRPE
NSCA
Example of an snmp check
Monitoring log files
Notifications
General hints
Failover setup
Debugging



VMware SLES 11
Download the VMware player from http://www.vmware.com/products/player/. It is a single file of the type .bundle.

Make it executable and run it
chmod u+x ...bundle
Then download the SLES 11 ISO image from Novell and create a new virtual machine with the VMware player. It is best to use X/GNOME because otherwise the VMware tools cannot be installed. After the installation is finished, select the VM options and set the network to bridged (directly connected to the physical network). The last step is to set the desired IP, default route and nameserver within the VM. When this is done the guest is reachable from the network like a discrete server.

In Yast within the VM add a new repository:
http://packages.vmware.com/tools/esx/4.0/sles11/x86_64
and install the package vmware-tools.



DSH
To make changes on several monitored servers at once you can use Puppet or Cfengine. An easier way is the Dancer's Shell (DSH):
cd /home/test/build
wget http://www.netfort.gr.jp/~dancer/software/downloads/libdshconfig-0.20.9.tar.gz
tar xfvp libdshconfig-0.20.9.tar.gz
cd libdshconfig-0.20.9
./configure
make
make install

cd /home/test/build
wget http://www.netfort.gr.jp/~dancer/software/downloads/dsh-0.25.9.tar.gz
tar xfvp dsh-0.25.9.tar.gz
cd dsh-0.25.9
./configure
make
make install
ldconfig

vi /etc/profile.local
export MANPATH=$MANPATH:/usr/local/share/man
. /etc/profile.local

vi /usr/local/etc/dsh.conf
verbose=0
Change the remote shell from rsh to ssh (the remoteshell setting).

Then enter the list of IP addresses.
vi /usr/local/etc/machines.list
root@IP1
root@IP2
...

As an example execute the command 'uname -a' on a single host:
dsh -m root@192.168.1.34 -- 'uname -a'

Executing 'uname -a' on all hosts of the machines.list file:
dsh -a -M -- 'uname -a'
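
Before running a command on every host it can help to preview which targets would be hit. The following is only a sketch and not part of DSH: the function name preview_targets is made up for illustration, and it assumes that machines.list holds one user@host entry per line.

```shell
#!/bin/bash
# preview_targets LISTFILE COMMAND
# Hypothetical helper (not part of DSH): prints one line per entry in
# LISTFILE showing the command that would be sent to that host.
preview_targets() {
    local list=$1 cmd=$2 target
    while IFS= read -r target || [ -n "$target" ]; do
        # Skip empty lines so that they do not show up as bogus targets.
        [ -z "$target" ] && continue
        printf '%s: %s\n' "$target" "$cmd"
    done < "$list"
}
```

Called as preview_targets /usr/local/etc/machines.list 'uname -a' it only prints the target list and does not connect anywhere.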



Physical host on which the VMs are running
For monitoring, a correct system time is essential. This must be configured on the physical host. One way is to set up an NTP client via Yast, which opens additional ports. Another way is to just run ntpdate.



Installation Nagios 3 Core
Install with Yast apache2, gcc-c++, apache2-mod_php5, gd-devel, libpng-devel, libjpeg-devel. (If gd-devel, libpng-devel and libjpeg-devel are not installed, you don't see any maps and trends)
Switch off the firewall completely.
su -
mkdir /home/test/build
cd /home/test/build
useradd nagios -p 'nagios' -m
groupadd nagios
usermod -G nagios nagios
groupadd nagcmd
usermod -G nagcmd nagios
usermod -G nagcmd wwwrun

wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.0.tar.gz
tar xfvp nagios-3.2.0.tar.gz

cd nagios-3.2.0
./configure --prefix=/opt/nagios --enable-event-broker=yes --with-command-group=nagcmd
make all

make install
make install-init
make install-config
make install-commandmode

For the default contact enter your own email address:
sed -i -e 's/^.*email.*$/        email        YOUREMAIL@DOMAIN/' /opt/nagios/etc/objects/contacts.cfg

make install-webconf
(The file /etc/apache2/conf.d/nagios.conf will be created.)

htpasswd2 -b -c /opt/nagios/etc/htpasswd.users nagiosadmin 'PASSWORD'
/etc/init.d/apache2 restart

vi /etc/hosts
127.0.0.1       localhost nagios.localnet nagios
#127.0.0.2

Set the correct permissions for the external command file.
chown nagios:nagcmd /opt/nagios/var/rw
chmod u+rwx /opt/nagios/var/rw
chmod g+rwx /opt/nagios/var/rw
chmod g+s /opt/nagios/var/rw
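
If external commands from the web front-end fail later on, the permissions of this directory are a common place to look. As a small sketch, the following function checks that a directory is group-writable with the setgid bit set. The name check_cmd_dir is made up for illustration, and the stat -c option assumes GNU coreutils.

```shell
#!/bin/bash
# check_cmd_dir DIR
# Hypothetical helper: succeeds if DIR exists and its group permissions
# are "rws" (read, write, setgid), as set by the chmod commands above.
check_cmd_dir() {
    local dir=$1 mode
    [ -d "$dir" ] || { echo "missing: $dir"; return 1; }
    mode=$(stat -c '%A' "$dir")    # e.g. drwxrwsr-x
    case $mode in
        d???rws*) echo "ok: $dir ($mode)" ;;
        *)        echo "wrong mode: $dir ($mode)"; return 1 ;;
    esac
}
```

On the Nagios server it would be called as check_cmd_dir /opt/nagios/var/rw.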

Create a backup of the original Nagios configuration
cp -rp /opt/nagios/etc /opt/nagios/etc_orig

Insert Nagios in the default runlevels so that it starts automatically on system start.
insserv nagios



Installation Nagios plugins
cd /home/test/build
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.14.tar.gz
tar xfvp nagios-plugins-1.4.14.tar.gz
cd nagios-plugins-1.4.14
./configure --prefix=/opt/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-openssl=/usr/include/openssl
(If later on some plugins are missing or features are not available look here if and how the plugins have been built.)
make
make install

Now it's time to make a first syntax check of the Nagios configuration.
/opt/nagios/bin/nagios -v /opt/nagios/etc/nagios.cfg
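
The syntax check is also useful as a guard before every restart. The following sketch only restarts if the check succeeds. The function name restart_if_valid is made up, both commands are passed in as arguments, and it assumes that nagios -v exits with a non-zero status when it finds errors.

```shell
#!/bin/bash
# restart_if_valid VERIFY_CMD RESTART_CMD
# Hypothetical wrapper: runs RESTART_CMD only if VERIFY_CMD succeeds.
# Both commands are given as strings and are split on whitespace.
restart_if_valid() {
    local verify=$1 restart=$2
    if $verify > /dev/null 2>&1; then
        $restart
    else
        echo "configuration check failed, not restarting" >&2
        return 1
    fi
}
```

Example call: restart_if_valid '/opt/nagios/bin/nagios -v /opt/nagios/etc/nagios.cfg' '/etc/init.d/nagios restart'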

Switches/routers can be monitored with check_snmp (for this plugin you need the package net-snmp-devel).

If you want to monitor Windows servers, the plugin check_nt is needed on the Nagios host and NSClient++ on the remote host.



Setup the Apache vhost
cd /etc/apache2/vhosts.d/
cp vhost.template nagios.conf
vi nagios.conf
<VirtualHost *:80>
    ServerAdmin YOUREMAIL@DOMAIN
    ServerName nagios.localnet
    DocumentRoot /srv/www/vhosts/nagios.localnet/htdocs

    ErrorLog /var/log/apache2/nagios.localnet-error_log
    CustomLog /var/log/apache2/nagios.localnet-access_log combined

    HostnameLookups Off
    UseCanonicalName Off
    ServerSignature On

    <IfModule mod_userdir.c>
        UserDir public_html
        Include /etc/apache2/mod_userdir.conf
    </IfModule>

    <Directory "/srv/www/vhosts/nagios.localnet/htdocs">
        Options -Indexes -FollowSymLinks
        AllowOverride AuthConfig
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>

mkdir -p /srv/www/vhosts/nagios.localnet/htdocs

/etc/init.d/nagios start
/etc/init.d/apache2 restart

Now it is possible to open the web front-end and see the default checks for localhost.
http://nagios.localnet/nagios/

Login:
nagiosadmin
PASSWORD



NConf
Use Yast to install php5-mysql, perl-DBD-mysql.
/etc/init.d/apache2 restart

vi /etc/php5/apache2/php.ini
date.timezone = Europe/Berlin
short_open_tag = On
register_globals = Off
magic_quotes_gpc = Off 

cd /srv/www/vhosts/nagios.localnet/htdocs/
wget http://downloads.sourceforge.net/project/nconf/nconf/1.2.5-0/nconf-1.2.5-0.tgz
tar xfv nconf-1.2.5-0.tgz
rm nconf-1.2.5-0.tgz
cd nconf
chown wwwrun ./config ./output/ ./static_cfg ./temp

/etc/init.d/mysql start
mysqladmin -uroot password 'PASSWORD'
mysql -uroot -p
  create database nconf;
  grant all on nconf.* to nconf@localhost identified by 'PASSWORD';

insserv mysql
http://nagios.localnet/nconf/INSTALL.php
(Use the data from grant, see above, no authentication)

rm -r INSTALL INSTALL.php UPDATE UPDATE.php 

vi .htaccess
AuthType Basic
AuthName "Welcome to NConf"
AuthUserFile /opt/nagios/etc/htpasswd.users
require valid-user 

vi /etc/apache2/default-server.conf
ServerName nagios.localnet
/etc/init.d/apache2 restart

chown nagios:nagcmd /opt/nagios/var/spool/checkresults

To activate a new configuration from NConf you can use a simple script. The reason for this is that NConf does not overwrite the Nagios configuration. Instead it creates a single tar archive which you have to unpack in place of the existing Nagios configuration.
cd /srv/www/vhosts/nagios.localnet/htdocs
vi nconfrun.sh

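#!/bin/bash
# Activates a configuration exported by NConf. Run as root.

backupdir=/home/test/backup
backupname="etc_$(date +%Y-%m-%d_%H:%M:%S)"

/etc/init.d/nagios stop

# Keep a timestamped copy of the current configuration.
mkdir -p "$backupdir"
cp -rp /opt/nagios/etc "$backupdir/$backupname"

# Remove the previously deployed NConf output.
rm -rf /opt/nagios/etc/NagiosConfig.tgz
rm -rf /opt/nagios/etc/Default_collector
rm -rf /opt/nagios/etc/global

# Deploy and unpack the new export.
cp /srv/www/vhosts/nagios.localnet/htdocs/nconf/output/NagiosConfig.tgz /opt/nagios/etc
cd /opt/nagios/etc || exit 1
tar xfvp ./NagiosConfig.tgz

/etc/init.d/nagios restart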

chown wwwrun ./nconfrun.sh

To use it just run it as root from a shell after you have exported a configuration from NConf.
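
Because the script removes the old directories before unpacking, it can be worth checking first that the export really contains them. This is a sketch under the assumption that the archive holds the directories global and Default_collector at its top level. The function name archive_has_dirs is made up for illustration.

```shell
#!/bin/bash
# archive_has_dirs ARCHIVE DIR...
# Hypothetical helper: succeeds only if every DIR appears as a
# top-level entry in the tar archive.
archive_has_dirs() {
    local archive=$1 dir listing
    shift
    # GNU tar detects the compression on its own when reading.
    listing=$(tar tf "$archive") || return 1
    for dir in "$@"; do
        # Entries may be listed as "global/..." or "./global/...".
        if ! printf '%s\n' "$listing" | grep -q -E "^(\./)?$dir(/|\$)"; then
            echo "missing in archive: $dir" >&2
            return 1
        fi
    done
}
```

In the script it could be called right after stopping Nagios, for example: archive_has_dirs /srv/www/vhosts/nagios.localnet/htdocs/nconf/output/NagiosConfig.tgz global Default_collector || exit 1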



Logos
To get some fancy graphics for each operating system we install an additional package.
cd /srv/www/vhosts/nagios.localnet/htdocs/nconf/img/logos
wget http://www.monitoringexchange.org/attachment/download/Artwork/Image-Packs/Base-Images/imagepak-base.tar.tar
mv imagepak-base.tar.tar ./imagepak-base.tar.gz
gunzip imagepak-base.tar.gz
tar xfv imagepak-base.tar


NConf only exports its configuration if a syntax check with this configuration is successful. For this syntax check it needs access to the Nagios binary. This is a security risk, so you have to find an appropriate solution for your environment. For testing purposes you can make the binary world executable. Another way is to copy the binary to a place where it can be executed by the web server.
chmod o+x /opt/nagios/bin/nagios
ln -s /opt/nagios/bin/nagios /srv/www/vhosts/nagios.localnet/htdocs/nconf/bin/nagios
Enter NConf:
http://nagios.localnet/nconf
nagiosadmin
PASSWORD
Top left in the menu there is "Generate Nagios config". If you select this, the syntax check is executed. If there are no errors, the following file is generated:
/srv/www/vhosts/nagios.localnet/htdocs/nconf/output/NagiosConfig.tgz

Remove the original Nagios configuration and unpack the new one from NConf:
rm -r /opt/nagios/etc/*
cp /opt/nagios/etc_orig/cgi.cfg /opt/nagios/etc
cp /opt/nagios/etc_orig/nagios.cfg /opt/nagios/etc
cp /opt/nagios/etc_orig/htpasswd.users /opt/nagios/etc
cp /opt/nagios/etc_orig/resource.cfg /opt/nagios/etc
(Make sure that these files have o+r set)
cp /srv/www/vhosts/nagios.localnet/htdocs/nconf/output/NagiosConfig.tgz /opt/nagios/etc/
cd /opt/nagios/etc
tar xfv NagiosConfig.tgz 

vi /opt/nagios/etc/nagios.cfg
Add these lines:
cfg_dir=/opt/nagios/etc/global
cfg_dir=/opt/nagios/etc/Default_collector
Comment out all other cfg_dir AND cfg_file lines! (so that Nagios only uses the ones from NConf)
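
The same change can be scripted. The following sketch disables every active cfg_file and cfg_dir line by putting a # in front of it and then appends the two NConf directories. The function name use_nconf_dirs is made up, sed -i assumes GNU sed, and the function is meant to be run only once per file because it appends on every call.

```shell
#!/bin/bash
# use_nconf_dirs NAGIOS_CFG
# Hypothetical helper: keeps a .bak copy, disables all existing
# cfg_file=/cfg_dir= lines and adds the directories exported by NConf.
use_nconf_dirs() {
    local cfg=$1
    cp -p "$cfg" "$cfg.bak"
    # Put a '#' in front of every active cfg_file= and cfg_dir= line.
    sed -i -e 's/^[[:space:]]*\(cfg_file=\)/#\1/' \
           -e 's/^[[:space:]]*\(cfg_dir=\)/#\1/' "$cfg"
    # Append the two directories that NConf generates.
    printf '%s\n' 'cfg_dir=/opt/nagios/etc/global' \
                  'cfg_dir=/opt/nagios/etc/Default_collector' >> "$cfg"
}
```

Example call: use_nconf_dirs /opt/nagios/etc/nagios.cfg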

Provide the logos with Nagios too:
cp -rp /srv/www/vhosts/nagios.localnet/htdocs/nconf/img/logos /opt/nagios/share/images/
/etc/init.d/nagios restart



Skin for Nagios 3 (optional)
If you want you can install a different style for the Nagios web page. I personally prefer the original one.

Download Vautour Style
http://www.monitoringexchange.org/cgi-bin/page.cgi?g=Detailed%2F2969.html;d=1
and copy it to /opt/nagios/share.
cp -rp /opt/nagios/share /opt/nagios/share_orig
cd /opt/nagios/share
unzip vautour_style.zip

In my browser the left frame was too narrow. Thus I increased the width of the sidebar frames in index.html and index.php from 200 to 210:
<frameset frameborder="0" framespacing="0" cols="210,*">



NagVis
For NagVis you need SLES SDK 11 (for the libmysqlclient-devel package). This must be downloaded from Novell.

With Yast install libmysqlclient-devel, php5-gd, php5-gettext, php5-mbstring and tcpd-devel.

First of all the NDO utils must be installed. The NDO utils store Nagios data in a database. But this does not replace the text file based configuration in Nagios! Nagios itself only accesses the text files.

If you encounter problems with the following commands you may have to install some additional 32 bit development packages.
cd /home/test/build
wget -O ndoutils-1.4b9.tar.gz http://sourceforge.net/projects/nagios/files/ndoutils-1.x/ndoutils-1.4b9/ndoutils-1.4b9.tar.gz/download
tar xfv ndoutils-1.4b9.tar.gz
cd ndoutils-1.4b9
./configure --prefix=/opt/nagios/ndo --with-init-dir=/etc/init.d --enable-mysql
make all
make fullinstall

mysql -uroot -p
create database ndo;
grant all on ndo.* to ndo@localhost identified by 'PASSWORD';
cd ./db
./installdb -uroot -pPASSWORD -d ndo

cd /opt/nagios/ndo/etc
mv ndo2db.cfg-sample ndo2db.cfg
mv ndomod.cfg-sample ndomod.cfg

In /opt/nagios/etc/nagios.cfg insert the following line:
broker_module=/opt/nagios/ndo/bin/ndomod.o config_file=/opt/nagios/ndo/etc/ndomod.cfg

chown nagios.nagios /opt/nagios/ndo/bin/*
chmod 0775 /opt/nagios/ndo/bin/*

A template for an init script comes with the package (daemon-init.in); install it as /etc/init.d/ndo. You can also build your own from the skeleton in /etc/init.d/. Basically NDO is started like this:
/opt/nagios/ndo/bin/ndo2db -c /opt/nagios/ndo/etc/ndo2db.cfg

chmod 755 /etc/init.d/ndo
insserv ndo

mkdir /opt/nagios/ndo/var
chown nagios.nagios /opt/nagios/ndo
chown nagios.nagios /opt/nagios/ndo/var

vi /opt/nagios/ndo/etc/ndo2db.cfg
db_name=ndo
db_user=ndo
db_pass=PASSWORD

vi /etc/my.cnf
bind-address = 127.0.0.1
/etc/init.d/mysql restart

/etc/init.d/ndo start

The Nagios daemon must be able to write to this socket:
chown nagios.nagios /opt/nagios/ndo/var/ndo.sock

Always start NDO first, then Nagios.
A successful start looks like this (/var/log/messages):
ndomod: NDOMOD 1.4b9 (10-27-2009) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Successfully connected to MySQL database
ndomod: Successfully connected to data sink.  0 queued items to flush.
Event broker module '/opt/nagios/ndo/bin/ndomod.o' initialized successfully.
Finished daemonizing... (New PID=3852)
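
To check for these messages without reading the whole log, a small function can search for two of them. This is only a sketch: the name ndo_started_ok is made up, and since it searches the complete file it also matches entries from earlier starts.

```shell
#!/bin/bash
# ndo_started_ok LOGFILE
# Hypothetical helper: succeeds if LOGFILE contains both the data sink
# message and the event broker message shown above.
ndo_started_ok() {
    local log=$1
    grep -q 'ndomod: Successfully connected to data sink' "$log" &&
    grep -q "Event broker module '.*ndomod.o' initialized successfully" "$log"
}
```

Example call: ndo_started_ok /var/log/messages && echo "NDO looks fine"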

Now we can continue with the actual NagVis installation.
cd /home/test/build
wget http://downloads.sourceforge.net/project/nagvis/NagVis%201.4%20%28stable%29/NagVis-1.4.4/nagvis-1.4.4.tar.gz
tar xfv nagvis-1.4.4.tar.gz
cd nagvis-1.4.4/
chmod +x install.sh
./install.sh -m /opt/nagios/ndo/bin/ndo2db
 Do you want to use backend ndo2db [n]: y

vi /opt/nagios/nagvis/etc/nagvis.ini.php
[paths]
base="/opt/nagios/nagvis/"
htmlbase="/nagvis"
htmlcgi="/nagios/cgi-bin"

[automap]
showinlists=1

[wui]
allowedforconfig=EVERYONE

[backend_ndomy_1]
backendtype="ndomy"
dbhost="localhost"
dbport=3306
dbname="ndo"
dbuser="ndo"
dbpass="PASSWORD"
dbprefix="nagios_"
dbinstancename="default"
maxtimewithoutupdate=180
htmlcgi="/nagios/cgi-bin"

[rotation_demo]
maps="demo,Demo2:demo2"
interval=15

vi /etc/apache2/conf.d/nagios.conf 
Add:

Alias /nagios "/opt/nagios/share"
Alias /nagvis "/opt/nagios/nagvis"

<Directory "/opt/nagios/share">
#  SSLRequireSSL
   Options None
   AllowOverride None
   Order allow,deny
   Allow from all
#  Order deny,allow
#  Deny from all
#  Allow from 127.0.0.1
   AuthName "Nagios Access"
   AuthType Basic
   AuthUserFile /opt/nagios/etc/htpasswd.users
   Require valid-user
</Directory>

<Directory "/opt/nagios/nagvis">
#  SSLRequireSSL
   Options None
   AllowOverride None
   Order allow,deny
   Allow from all
#  Order deny,allow
#  Deny from all
#  Allow from 127.0.0.1
   AuthName "Nagios Access"
   AuthType Basic
   AuthUserFile /opt/nagios/etc/htpasswd.users
   Require valid-user
</Directory>

/etc/init.d/apache2 restart
/etc/init.d/nagios restart
/etc/init.d/ndo stop (or kill the ndo2db process as long as there is no working init script)
/etc/init.d/ndo start

chown wwwrun /opt/nagios/nagvis/var
chown wwwrun /opt/nagios/nagvis/var/*
chown wwwrun /opt/nagios/nagvis/etc/maps
chown wwwrun /opt/nagios/nagvis/etc/maps/*

Using NagVis:
http://nagios.localnet/nagvis/wui/index.php
http://nagios.localnet/nagvis/nagvis/index.php

Look in /var/log/messages for errors.

The NagVis automap configuration file:
vi /opt/nagios/nagvis/etc/maps/__automap.cfg
The automap provides an automatically generated map of the Nagios objects.

Create a new map:
http://nagios.localnet/nagvis/wui/index.php
right mouse click, manage, maps

On http://nagios.localnet/nagvis/wui/index.php for example a host can be added to a map:
right mouse click, Open Map, name of the map, right mouse click, Add object, Icon, Host
Select localhost
iconset: std_medium

Back to the WUI: Select map: Overview, Edit current map.

A map on the overview page can only be selected if it has a background set!

The speedometer gadget:
Gadgets can only be used for services.
Edit the service in NagVis and select gadget for view_type and std_speedometer.php for gadget_url.



PNP4Nagios
Here I will show the "synchronous mode". It is the easiest but also the slowest way to set up PNP4Nagios. It can be used for small Nagios installations. For larger setups the "bulk mode with NPCD" is better.

Install rrdtool, rrdtool-devel, php5-zlib with Yast.

cd /home/test/build
wget http://sourceforge.net/projects/pnp4nagios/files/PNP-0.6/pnp4nagios-0.6.rc6.tar.gz
tar xfv pnp4nagios-0.6.rc6.tar.gz
cd pnp4nagios-0.6.rc6

mkdir -p /opt/nagios/pnp4nagios/spool

./configure \
--prefix=/opt/nagios/pnp4nagios \
--libdir=/opt/nagios/pnp4nagios/lib \
--libexecdir=/opt/nagios/pnp4nagios/libexec \
--with-layout=suse \
--with-perfdata-logfile=/opt/nagios/pnp4nagios/pnp4nagios.log \
--with-perfdata-dir=/opt/nagios/pnp4nagios \
--with-perfdata-spool-dir=/opt/nagios/pnp4nagios/spool \
--with-rrdtool=/usr/bin/rrdtool \
--with-debug

make all
make install
make install-config
make install-webconf

vi /etc/apache2/conf.d/pnp4nagios.conf
AuthUserFile /opt/nagios/etc/htpasswd.users

vi /opt/nagios/etc/nagios.cfg
process_performance_data=1
enable_environment_macros=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata

vi /opt/nagios/etc/global/misccommands.cfg
Change to:
define command {
       command_name    process-service-perfdata
       command_line    /usr/bin/perl /opt/nagios/pnp4nagios/libexec/process_perfdata.pl
}

define command {
       command_name    process-host-perfdata
       command_line    /usr/bin/perl /opt/nagios/pnp4nagios/libexec/process_perfdata.pl -d HOSTPERFDATA
}

cp /opt/nagios/pnp4nagios/etc/process_perfdata.cfg-sample /opt/nagios/pnp4nagios/etc/process_perfdata.cfg
mkdir -p /opt/nagios/pnp4nagios/var/perfdata

vi /opt/nagios/pnp4nagios/etc/process_perfdata.cfg
LOG_LEVEL=2

Verify the PNP4Nagios configuration. For other modes you have to set -m xxx differently.
vi /opt/nagios/pnp4nagios/libexec/verify_pnp_config.pl
Adapt:
my $basedir = "/opt/nagios";
my $PNPdir = "/opt/nagios/pnp4nagios";  # PNP root directory

/etc/init.d/nagios restart

/opt/nagios/pnp4nagios/libexec/verify_pnp_config.pl -m sync

The Apache module mod_rewrite is necessary.
vi /etc/sysconfig/apache2 
APACHE_MODULES="... rewrite"

SuSEconfig 
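
Whether the module really ended up in the list can be checked with a few lines of shell. This is a sketch that assumes the variable is written on a single line in double quotes, as in the example above. The function name apache_module_enabled is made up for illustration.

```shell
#!/bin/bash
# apache_module_enabled MODULE [SYSCONFIG_FILE]
# Hypothetical helper: succeeds if MODULE is listed in APACHE_MODULES.
apache_module_enabled() {
    local module=$1 file=${2:-/etc/sysconfig/apache2} modules
    # Take the last APACHE_MODULES= assignment and strip the quotes.
    modules=$(sed -n 's/^APACHE_MODULES="\(.*\)"[[:space:]]*$/\1/p' "$file" | tail -n 1)
    # Pad with spaces so that only whole module names match.
    case " $modules " in
        *" $module "*) return 0 ;;
        *)             return 1 ;;
    esac
}
```

Example call: apache_module_enabled rewrite || echo "mod_rewrite is not enabled"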

cd /opt/nagios/pnp4nagios
chown -R nagios.nagios *

/etc/init.d/apache2 restart
Enter PNP4Nagios with this URL:
http://nagios.localnet/pnp4nagios/
If everything is green:
mv /opt/nagios/pnp4nagios/share/install.php /opt/nagios/pnp4nagios/share/install.php_off

To access the right performance graph directly from Nagios you can use action urls. Then for all objects in Nagios which have the action url parameter set a new symbol appears in the web front-end. This symbol is a link to PNP4Nagios.
Open /opt/nagios/etc/Default_collector/hosts.cfg
and add at the top:
define host {
   name       host-pnp
   action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=_HOST_
   register   0
}
In /opt/nagios/etc/Default_collector/services.cfg
add at the top:
define service {
   name       srv-pnp
   action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
   register   0
}

Another way is to put these lines in their own template file and configure this file in nagios.cfg.

Then apply the new templates to all hosts and services you want:
hosts.cfg:
use host-pnp

services.cfg:
use srv-pnp

/etc/init.d/nagios restart



NRPE
Nagios-Host:
On the Nagios server the plugin check_nrpe is needed, nothing more.
cd /home/test/build
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nrpe-2.12.tar.gz
tar xfvp nrpe-2.12.tar.gz
cd nrpe-2.12
./configure --enable-ssl --enable-command-args
make all
cp src/check_nrpe /opt/nagios/libexec/
chown nagios.nagios /opt/nagios/libexec/check_nrpe

On every host which shall be monitored with NRPE the NRPE daemon must be installed:
cd /home/test/build
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nrpe-2.12.tar.gz
tar xfvp nrpe-2.12.tar.gz
cd nrpe-2.12
./configure --enable-ssl --enable-command-args
make all
mkdir -p /opt/nrpe/bin
cp src/nrpe /opt/nrpe/bin
mkdir /etc/nrpe
cp ./sample-config/nrpe.cfg /etc/nrpe/nrpe.cfg

Create an init script with the following part:
#!/bin/bash
/opt/nrpe/bin/nrpe -c /etc/nrpe/nrpe.cfg -d

insserv nrpe
chmod 0755 /etc/init.d/nrpe

vi /etc/nrpe/nrpe.cfg
server_port=5666
server_address=192.168.1.34
# nrpe cannot be started as root.
# If needed create some sudo records for the checks.
nrpe_user=nrpe
nrpe_group=nrpe
allowed_hosts=192.168.1.55,192.168.1.50
debug=1
dont_blame_nrpe=1
command[check_users]=/opt/nagios/libexec/check_users -w $ARG1$ -c $ARG2$
command[check_load]=/opt/nagios/libexec/check_load -w $ARG1$ -c $ARG2$
command[check_disk]=/opt/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
command[check_procs]=/opt/nagios/libexec/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$

It makes sense to install the Nagios plugins as well. They can be executed by NRPE.
cd /home/test/build
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.14.tar.gz
tar xfvp nagios-plugins-1.4.14.tar.gz
cd nagios-plugins-1.4.14
mkdir /opt/nagios
./configure --prefix=/opt/nagios --with-nagios-user=nagios --with-nagios-group=nagios
make
make install

useradd nrpe
/etc/init.d/nrpe start

In case of problems:
rm /var/run/nrpe.pid; /etc/init.d/nrpe restart
tail -f /var/log/daemon.log
tail -f /var/log/debug

Run a manual test from the Nagios host:
/opt/nagios/libexec/check_nrpe -H 192.168.1.34 -c check_users -a 8 10
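
Besides the text output, the exit status of the plugin is what Nagios evaluates: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. The following sketch translates the status into its name. The function name plugin_state is made up, and mapping every other value to UNKNOWN is a simplification.

```shell
#!/bin/bash
# plugin_state EXIT_CODE
# Hypothetical helper: prints the Nagios state name for a plugin exit code.
plugin_state() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;   # 3 is UNKNOWN; other values are treated the same here
    esac
}
```

To use it, run the check_nrpe command from above and call plugin_state $? directly afterwards.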

The Nagios configuration looks like this:
define command{
        command_name    check_nrpe
        command_line    /opt/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$
        }

define service{
        host_name                             remotehost
        service_description                   nrpe_check_users_8_10
        check_command                         check_nrpe!check_users!8 10
        check_period                          24x7
        notification_period                   24x7
        contact_groups                        admins
        max_check_attempts                    3
        normal_check_interval                 5
        retry_check_interval                  1
        notification_interval                 15
        notification_options                  w,u,c,r
        active_checks_enabled                 1
        passive_checks_enabled                0
        notifications_enabled                 1
        check_freshness                       0
        freshness_threshold                   86400
        }

define host {
                host_name                             remotehost
                alias                                 remotehost.localnet
                address                               192.168.1.34
                check_command                         check-host-alive
                max_check_attempts                    3
                notification_interval                 15
                notification_options                  d,u,r
                active_checks_enabled                 1
                passive_checks_enabled                0
                notifications_enabled                 1
                check_period                          24x7
                notification_period                   24x7
                contact_groups                        admins
}



NSCA
Nagios host:
Install the libmcrypt-devel package.

cd /home/test/build
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nsca-2.7.2.tar.gz
tar xfvp nsca-2.7.2.tar.gz
cd nsca-2.7.2
./configure
make all

mkdir -p /opt/nsca/bin/
cp src/nsca /opt/nsca/bin
mkdir /etc/nsca
cp sample-config/nsca.cfg /etc/nsca
chmod 0444 /etc/nsca/nsca.cfg

Insert the IP address of the server on which the NSCA daemon runs, which is the Nagios server:
vi /etc/nsca/nsca.cfg
server_address=192.168.1.55
debug=1
command_file=/opt/nagios/var/rw/nagios.cmd
alternate_dump_file=/opt/nagios/var/rw/nsca.dump
password=PASSWORD
decryption_method=1

Create an init script with the following part:
#!/bin/bash
/opt/nsca/bin/nsca -c /etc/nsca/nsca.cfg --daemon

insserv nsca

Configure Nagios for passive checks:
vi /opt/nagios/etc/nagios.cfg
accept_passive_service_checks=1

For every designated host add this line in hosts.cfg:
vi /opt/nagios/etc/Default_collector/hosts.cfg
passive_checks_enabled 1

The NSCA daemon itself should be monitored too.


Install the libmcrypt-devel package on every host which has to send check results to the Nagios server.
cd /home/test/build
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nsca-2.7.2.tar.gz
tar xfvp nsca-2.7.2.tar.gz
cd nsca-2.7.2
./configure
make all

mkdir -p /opt/nagios/bin
cp src/send_nsca /opt/nagios/bin
mkdir /etc/nsca
cp sample-config/send_nsca.cfg /etc/nsca/
chmod 0444 /etc/nsca/send_nsca.cfg

vi /etc/nsca/send_nsca.cfg
password=PASSWORD
encryption_method=1

Example script to send the disk usage check result.
vi nscadisk.sh
#!/bin/bash
send_nsca=/opt/nagios/bin/send_nsca
send_nsca_cfg=/etc/nsca/send_nsca.cfg
nagioshost=192.168.1.55
host=$1
service=$2
plugin=/opt/nagios/libexec/check_disk
# thresholds refer to FREE disk space: warning below 20% free, critical below 10% free
warn=20%
crit=10%
output=`$plugin -w $warn -c $crit`
# return value of the last command ($plugin -w $warn -c $crit)
rc=$?
# pipe the result into send_nsca, separated by tabs
echo -e "$host\t$service\t$rc\t$output"|$send_nsca -H $nagioshost -c $send_nsca_cfg
exit 0

On the Nagios host you must create a service so that Nagios knows the service for which the data arrives. You can set check_period to none.
define service{
         host_name              remotehost
         service_description    remotedisk
         active_checks_enabled  0
         passive_checks_enabled 1                               ; We want only passive checking
         flap_detection_enabled 0
         register               1
         is_volatile            0
         check_period           none
         max_check_attempts     1
         normal_check_interval  5
         retry_check_interval   1
         check_freshness        0
         contact_groups         admins
         check_command          check_dummy!0
         notification_interval  120
         notification_period    24x7
         notification_options   w,u,c,r
         stalking_options       w,c,u
         }

define command{
        command_name check_dummy
        command_line $USER1$/check_dummy $ARG1$
        }



We proceed on the remote host.
chmod u+x ./nscadisk.sh

Run a manual check and send it to the Nagios server.
./nscadisk.sh remotehost remotedisk
1 data packet(s) sent to host successfully.
(remotehost and remotedisk must exactly match the settings in Nagios for host_name and service_description so that Nagios knows for which object the data is:
host_name remotehost
service_description remotedisk
)
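
The line that is piped into send_nsca consists of four fields separated by tabs: host name, service description, return code and plugin output. As a sketch, the line can also be built with printf, which leaves backslashes in the plugin output untouched, whereas echo -e would interpret them. The function name nsca_line is made up for illustration.

```shell
#!/bin/bash
# nsca_line HOST SERVICE RETURN_CODE OUTPUT
# Hypothetical helper: prints one tab-separated passive check result.
nsca_line() {
    # %s inserts the arguments literally; only the \t and \n of the
    # format string itself are turned into tab and newline.
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}
```

In nscadisk.sh the last command would then read: nsca_line "$host" "$service" "$rc" "$output" | $send_nsca -H $nagioshost -c $send_nsca_cfg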



Example of an snmp check
cp  /usr/share/doc/packages/net-snmp/EXAMPLE.conf /etc/snmp/snmpd.conf 

vi /etc/snmp/snmpd.conf 
Set COMMUNITY (local) to public
/etc/init.d/snmpd restart

This is just to show the principle. It checks the number of NICs.
snmpget -v1 -c public localhost interfaces.ifNumber.0
You find more examples in snmpd.conf.

Find out the numeric OID:
snmpwalk -v2c -On -c public localhost IF-MIB::ifNumber.0
.1.3.6.1.2.1.2.1.0 = INTEGER: 2

/opt/nagios/libexec/check_snmp -H localhost -o .1.3.6.1.2.1.2.1.0 -w 1 -c 2 -C public -P 1
SNMP WARNING - *2* | IF-MIB::ifNumber.0=2 
Now this can be put into the Nagios configuration.
define command {
                command_name                          check_snmp_interfaces
                command_line                          $USER1$/check_snmp -H $HOSTADDRESS$ -o .1.3.6.1.2.1.2.1.0 -w 2 -c 3 -C public -P 1
}

define service {
                service_description                   check_snmp_interfaces
                check_command                         check_snmp_interfaces
                host_name                             localhost
                check_period                          24x7
                notification_period                   24x7
                contact_groups                        admins
                max_check_attempts                    3
                normal_check_interval                 5
                retry_check_interval                  1
                notification_interval                 15
                notification_options                  w,u,c,r
                active_checks_enabled                 1
                passive_checks_enabled                0
                notifications_enabled                 1
                check_freshness                       0
                freshness_threshold                   86400
}
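
The manual call above returned WARNING for the value 2 with -w 1 -c 2, while the command definition uses -w 2 -c 3. The following sketch shows how such plain numeric thresholds are evaluated. It assumes non-negative integer values and no range syntax, and the function name classify is made up for illustration.

```shell
#!/bin/bash
# classify VALUE WARN CRIT
# Hypothetical helper: a state is raised when VALUE is greater than
# the threshold; the critical threshold is checked first.
classify() {
    local value=$1 warn=$2 crit=$3
    if [ "$value" -gt "$crit" ]; then
        echo CRITICAL
    elif [ "$value" -gt "$warn" ]; then
        echo WARNING
    else
        echo OK
    fi
}
```

classify 2 1 2 prints WARNING like the manual call, whereas classify 2 2 3 prints OK, so a host with two interfaces is fine with the thresholds of the command definition.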



Monitoring log files
http://labs.consol.de/nagios/check_logfiles/
cd /home/test/build
wget http://labs.consol.de/wp-content/uploads/2009/09/check_logfiles-3.0.4.tar.gz
tar xfvp check_logfiles-3.0.4.tar.gz
cd check_logfiles-3.0.4
./configure
make
cp plugins-scripts/check_logfiles /opt/nagios/libexec/
cp t/etc/check_somelogfiles.cfg /opt/nagios/etc/check_logfiles.cfg

Hint:
The following patterns are regular expressions.
The plugin checks only if there are NEW records since the last check.
This means if there is a new matching record and a service check is executed the result is critical. If the check runs a second time without a new record the result is ok.
vi /opt/nagios/etc/check_logfiles.cfg

# where the state information will be saved.
$seekfilesdir = '/opt/nagios/tmp';

# where protocols with found patterns will be stored.
$protocolsdir = '/opt/nagios/tmp';

# where scripts will be searched for.
$scriptpath = '/opt/nagios/tmp';

@searches = (
  {
    tag => 'ssh logins',
    logfile => '/var/log/messages',
    rotation => 'linux',
    criticalpatterns => ['Accepted keyboard-interactive/pam for root'],
    warningpatterns => ['Authentication failure for root'],
  }
);

This example looks for root login attempts in /var/log/messages. The result is a warning state
if root enters a wrong password. If root successfully logs into the machine the result is a critical state.
mkdir /opt/nagios/tmp
chown nagios /opt/nagios/tmp

To run a manual test execute:
/opt/nagios/libexec/check_logfiles -f /opt/nagios/etc/check_logfiles.cfg
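
The principle that only NEW records count can be illustrated with a few lines of shell. This is not how check_logfiles is implemented, just a sketch of the idea: the position reached in the log is stored in a state file and the next run starts reading from there. The function name count_new_matches and the state file are made up for illustration.

```shell
#!/bin/bash
# count_new_matches LOGFILE STATEFILE PATTERN
# Hypothetical helper: prints how many lines added to LOGFILE since the
# previous call match PATTERN, then remembers the current file size.
count_new_matches() {
    local log=$1 state=$2 pattern=$3 offset=0 size
    [ -f "$state" ] && offset=$(cat "$state")
    size=$(wc -c < "$log")
    # A file smaller than last time suggests rotation: start from the top.
    [ "$size" -lt "$offset" ] && offset=0
    # tail -c +N starts at byte N (counted from 1), hence offset + 1.
    # grep -c prints 0 but exits non-zero without a match, hence "|| true".
    tail -c "+$((offset + 1))" "$log" | grep -c -- "$pattern" || true
    echo "$size" > "$state"
}
```

Two calls in a row without a new record print the number of matches first and then 0, which corresponds to the critical result followed by ok described above.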

In Nagios the check can be configured like this:
define command {
                command_name                          check_ssh_root_logins
                command_line                          $USER1$/check_logfiles -f /opt/nagios/etc/check_logfiles.cfg
}

define service {
                service_description                   check_ssh_root_logins
                check_command                         check_ssh_root_logins
                host_name                             localhost
                check_period                          24x7
                notification_period                   24x7
                contact_groups                        admins
                max_check_attempts                    3
                normal_check_interval                 5
                retry_check_interval                  1
                notification_interval                 15
                notification_options                  w,u,c,r
                active_checks_enabled                 1
                passive_checks_enabled                0
                notifications_enabled                 1
                check_freshness                       0
                freshness_threshold                   86400
}

Again this is only to show the principle. Usually Nagios cannot read /var/log/messages and shouldn't be able to do so.



Notifications
For the notifications Postfix can be configured as a relay client.

Open the existing main.cf and remove the lower part (where no comments are). Replace it with:
# sample_directory: The location of the Postfix sample configuration files.
# This parameter is obsolete as of Postfix 2.1.
#
sample_directory = /usr/share/doc/packages/postfix-doc/samples

myhostname = nagios.localnet
smtp_helo_name = nagios.localnet
relayhost = [MAILSERVER]
smtp_sasl_auth_enable=yes
smtp_sasl_password_maps=hash:/etc/postfix/sasl_passwd
smtp_sasl_mechanism_filter = plain, login
smtp_sasl_security_options = noanonymous
sender_canonical_maps = hash:/etc/postfix/sender_canonical

vi /etc/postfix/sasl_passwd
[smtp.servername.of.your.provider]  username:password

Create the Postfix map. The key in sasl_passwd must match the relayhost value in main.cf exactly, including the brackets.
postmap hash:/etc/postfix/sasl_passwd

Rewrite the sender address.
vi /etc/postfix/sender_canonical
localuser@yourserver.domain emailaddressofrelayaccount

And again a new map:
postmap hash:/etc/postfix/sender_canonical
postfix reload
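
Postfix looks up the SASL credentials by the relayhost value, so a mismatch between main.cf and sasl_passwd leads to mail being sent without authentication. The following sketch checks that both files agree. The function name relay_has_credentials is an assumption for this example; it only handles the simple one-line relayhost form used above:

```shell
# relay_has_credentials MAIN_CF SASL_PASSWD   (hypothetical helper)
# Succeeds if the relayhost value from main.cf is the first field
# of some line in sasl_passwd.
relay_has_credentials() {
    # Take the last relayhost assignment, as later ones override earlier ones.
    relay=$(sed -n 's/^relayhost[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1)
    [ -n "$relay" ] || return 1
    awk -v key="$relay" '$1 == key { found = 1 } END { exit !found }' "$2"
}

# Usage on the Nagios server:
#   relay_has_credentials /etc/postfix/main.cf /etc/postfix/sasl_passwd && echo match
# The password is stored in clear text, so restricting access is advisable:
#   chmod 600 /etc/postfix/sasl_passwd /etc/postfix/sasl_passwd.db
```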

Finally change the notification commands (this is necessary for some distributions, not for SLES):

!!! Warning !!!
vi /opt/nagios/etc/global/contacts.cfg
Remove the option 'n' (none) from the following two lines. Otherwise the contact does not receive notifications at all!
host_notification_options d,u,r,f,n
service_notification_options w,u,c,r,f, n
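
To illustrate what the corrected option lists look like, here is a small sketch that removes 'n' (and stray spaces) from such a list. The function name strip_none is made up for this example; it works on the option string only and does not edit contacts.cfg:

```shell
# strip_none OPTIONS   (hypothetical helper)
# Prints the comma-separated option list without the 'n' (none) entry.
strip_none() {
    printf '%s\n' "$1" | tr -d ' ' | tr ',' '\n' | grep -vx 'n' | paste -sd, -
}

strip_none 'd,u,r,f,n'
strip_none 'w,u,c,r,f, n'
```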



General hints
For parents in the Nagios configuration the host name must be used. It is not possible to use the IP address.

Availability report
Values in front of the parentheses: percentage of the total time, including unknown times.
Values inside the parentheses: percentage of the total time WITHOUT unknown times.
Unknown time: those times when the Nagios daemon did not run.
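
The difference between the two values can be shown with a short calculation. This is only a sketch with invented numbers (one day, 77760 seconds up, 4320 seconds unknown); the function name availability is an assumption. The second value, printed in parentheses here, leaves out the unknown time:

```shell
# availability UP_SECONDS TOTAL_SECONDS UNKNOWN_SECONDS   (hypothetical helper)
# Prints the up time as a percentage of the total time and, in parentheses,
# as a percentage of the time without the unknown part.
availability() {
    awk -v up="$1" -v total="$2" -v unknown="$3" 'BEGIN {
        printf "%.3f%% (%.3f%%)\n", 100 * up / total, 100 * up / (total - unknown)
    }'
}

availability 77760 86400 4320
# prints: 90.000% (94.737%)
```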



Failover setup
This is a simple way to set up failover functionality in Nagios. It follows the official Nagios documentation.

The starting point is the NRPE daemon, which is used together with check_nagios to verify that Nagios is running. The Nagios slave runs this check. The IP of the slave must be listed in the nrpe.cfg on the master, otherwise the slave cannot run checks via NRPE on the Nagios master. The IP of the master must be listed too.
vi /etc/nrpe/nrpe.cfg
allowed_hosts=192.168.1.56,192.168.1.55
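
A forgotten entry in allowed_hosts is a common reason for a refused NRPE connection. The sketch below tests whether an address is listed. The function name nrpe_allows is made up for this example, and it only compares plain addresses; network ranges or host names in allowed_hosts are not handled:

```shell
# nrpe_allows NRPE_CFG IP   (hypothetical helper)
# Succeeds if IP is one of the comma-separated entries of allowed_hosts.
nrpe_allows() {
    grep '^allowed_hosts=' "$1" | cut -d= -f2 | tr ',' '\n' | tr -d ' ' | grep -qxF "$2"
}

# Usage on the Nagios master:
#   nrpe_allows /etc/nrpe/nrpe.cfg 192.168.1.56 && echo "slave is allowed"
```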

The manual call of this plugin looks like this:
/opt/nagios/libexec/check_nagios -F /opt/nagios/var/status.dat -e 1 -C '/opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg'
This must be set in the nrpe.cfg accordingly.
NRPE must run as a daemon on the Nagios master.
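
NRPE command definitions have the form command[name]=command line. The sketch below prints such a line for the call above. The function name nrpe_command is made up for this example; the command name check_nagios is the one the slave script passes to check_nrpe with -c. It assumes that NRPE hands the line to a shell, so that the quoting of the -C argument is kept:

```shell
# nrpe_command NAME COMMAND_LINE   (hypothetical helper)
# Prints a command definition in nrpe.cfg syntax.
nrpe_command() {
    printf 'command[%s]=%s\n' "$1" "$2"
}

nrpe_command check_nagios "/opt/nagios/libexec/check_nagios -F /opt/nagios/var/status.dat -e 1 -C '/opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg'"
```

The printed line can be appended to /etc/nrpe/nrpe.cfg on the master, followed by a restart of the NRPE daemon.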

On the Nagios slave the Nagios daemon runs too, but it is configured so that it neither executes checks nor sends out notifications (nagios.cfg):
execute_service_checks=0
enable_notifications=0
check_external_commands=1

Now we create a cron job on the slave which executes a script that checks the status of the Nagios master. If it does not run or is unavailable it activates checks and notifications on the slave:
vi /etc/crontab
* * * * * nagios /opt/nagios/libexec/setslavestatus.sh

vi /opt/nagios/libexec/setslavestatus.sh

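#!/bin/sh
# Switches checks and notifications on the slave on or off,
# depending on the state of Nagios on the master.

# Only the return code is evaluated, the plugin output is not used.
RESULT=`/opt/nagios/libexec/check_nrpe -H 192.168.1.55 -c check_nagios`
RV=$?

now=`date +%s`
commandfile='/opt/nagios/var/rw/nagios.cmd'

case "$RV" in
0|1)
# Master is OK or WARNING: the slave stays passive.
/usr/bin/printf "[%lu] DISABLE_NOTIFICATIONS\n" "$now" > "$commandfile"
/usr/bin/printf "[%lu] STOP_EXECUTING_HOST_CHECKS\n" "$now" > "$commandfile"
/usr/bin/printf "[%lu] STOP_EXECUTING_SVC_CHECKS\n" "$now" > "$commandfile"
;;
2)
# Master is CRITICAL: the slave takes over.
/usr/bin/printf "[%lu] ENABLE_NOTIFICATIONS\n" "$now" > "$commandfile"
/usr/bin/printf "[%lu] START_EXECUTING_HOST_CHECKS\n" "$now" > "$commandfile"
/usr/bin/printf "[%lu] START_EXECUTING_SVC_CHECKS\n" "$now" > "$commandfile"
;;
# Any other return code (e.g. 3, UNKNOWN) leaves the slave unchanged.
esac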

chown nagios:nagios /opt/nagios/libexec/setslavestatus.sh
chmod u+x /opt/nagios/libexec/setslavestatus.sh
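
The decision logic of the script can be tried out without touching the command file. The following sketch prints the external commands for a given return code instead of writing them to nagios.cmd. The function name failover_commands is made up for this example. Like the script, it does nothing for return codes other than 0, 1 and 2:

```shell
# failover_commands RETURN_CODE TIMESTAMP   (hypothetical helper)
# Prints the external commands the slave would send for the given
# return code of the check_nagios call on the master.
failover_commands() {
    case "$1" in
        0|1) set -- "$2" DISABLE_NOTIFICATIONS STOP_EXECUTING_HOST_CHECKS STOP_EXECUTING_SVC_CHECKS ;;
        2)   set -- "$2" ENABLE_NOTIFICATIONS START_EXECUTING_HOST_CHECKS START_EXECUTING_SVC_CHECKS ;;
        *)   return 0 ;;
    esac
    ts=$1
    shift
    for cmd in "$@"; do
        printf '[%s] %s\n' "$ts" "$cmd"
    done
}

# Dry run for a master in CRITICAL state:
failover_commands 2 "$(date +%s)"
```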

Tip:
To find the name of a command for the external command file, trigger the action in the web front-end and then look in the nagios.log.
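
The command names can also be filtered out of the log. This sketch assumes that external commands are logged (log_external_commands=1 in nagios.cfg) and that the entries contain the text "EXTERNAL COMMAND:" followed by the command name. The function name external_commands and the log path in the usage line are assumptions:

```shell
# external_commands LOGFILE   (hypothetical helper)
# Lists the distinct external command names found in a Nagios log.
external_commands() {
    sed -n 's/^.*EXTERNAL COMMAND: \([A-Z_]*\).*$/\1/p' "$1" | sort -u
}

# Usage on the Nagios server (path is an assumption):
#   external_commands /opt/nagios/var/nagios.log
```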



Debugging
In case you want to test if a Nagios module can be successfully loaded you can use this little C program.
vi test.c

#include <dlfcn.h>
#include <stdlib.h>
#include <stdio.h>

int main(){

  /* Try to load the module and resolve all its symbols immediately. */
  void *module_handle=dlopen("/home/test/build/libtooltest/ndomod.o",RTLD_NOW|RTLD_GLOBAL);

  if(module_handle==NULL){

    /* dlerror() tells why the module could not be loaded. */
    printf("error: %s\n",dlerror());

  }else{

    printf("ok\n");
    dlclose(module_handle);
  }

  exit(0);
}

gcc -o test test.c -ldl
./test

