Nagios Setup, Configuration, and Monitoring Script Development

Copyright notice: this is an original article by the blogger and may not be reposted without the blogger's permission. https://blog.csdn.net/hiyun9/article/details/51881839

1. Nagios Server Installation

1.1 Environment

          OS: CentOS 6.4

          nagios-server:172.16.27.57

          nagios-client:172.16.27.43

1.2 Install Base Packages

# yum install -y gcc glibc glibc-common gd gd-devel xinetd openssl-devel zlib* gd*

1.3 Create the nagios User and Group

# useradd -s /sbin/nologin nagios
# mkdir /usr/local/nagios
# chown -R nagios:nagios /usr/local/nagios

1.4 Install Nagios

First download the required packages from my shared cloud drive:

http://yun.baidu.com/share/link?shareid=2600302438&uk=1226090734

Download every file in the nagios directory.

For basic Nagios concepts and background, see nagios.ppt in the same share. The concepts are not covered in detail here.

Install the Nagios core:

[root@nagios-server opt]# tar -zxvf nagios-3.4.3.tar.gz
[root@nagios-server opt]# cd nagios
[root@nagios-server nagios]# ./configure --prefix=/usr/local/nagios
[root@nagios-server nagios]# make all
[root@nagios-server nagios]# make install
 make install-init
     - This installs the init script in /etc/rc.d/init.d

  make install-commandmode
     - This installs and configures permissions on the
       directory for holding the external command file

  make install-config
     - This installs sample config files in /usr/local/nagios/etc

[root@nagios-server nagios]# make install-init
[root@nagios-server nagios]# make install-commandmode
[root@nagios-server nagios]# make install-config

# Start Nagios at boot
[root@nagios-server nagios]# chkconfig --add nagios
[root@nagios-server nagios]# chkconfig --level 35 nagios on
[root@nagios-server nagios]# chkconfig --list nagios
nagios          0:off  1:off  2:off  3:on  4:on  5:on  6:off

Install the Nagios plugins:

[root@nagios-server nagios]# cd /opt/
[root@nagios-server opt]# tar -zxvf nagios-plugins-2.1.1.tar.gz
[root@nagios-server opt]# cd nagios-plugins-2.1.1
[root@nagios-server nagios-plugins-2.1.1]# ./configure --prefix=/usr/local/nagios
[root@nagios-server nagios-plugins-2.1.1]# make && make install

Install Apache:

[root@nagios-server nagios-plugins-2.1.1]# cd /opt/
[root@nagios-server opt]# tar -zxvf httpd-2.2.23.tar.gz
[root@nagios-server opt]# cd httpd-2.2.23
[root@nagios-server httpd-2.2.23]# ./configure --prefix=/usr/local/apache2 --enable-so --enable-rewrite
[root@nagios-server httpd-2.2.23]# make && make install

Install PHP:

[root@nagios-server httpd-2.2.23]# cd /opt/
[root@nagios-server opt]# yum install libxml2 libxml2-devel libpng-devel libtool libtool-ltdl-devel -y
[root@nagios-server opt]# tar -jxvf php-5.3.28.tar.bz2
[root@nagios-server opt]# cd php-5.3.28
[root@nagios-server php-5.3.28]# ./configure --prefix=/usr/local/php/ --with-apxs2=/usr/local/apache2/bin/apxs \
--with-gd --with-zlib --enable-sockets
[root@nagios-server php-5.3.28]# make
[root@nagios-server php-5.3.28]# make install

Configure Apache:

[root@nagios-server php-5.3.28]# vim /usr/local/apache2/conf/httpd.conf

1.

<IfModule !mpm_netware_module>
<IfModule !mpm_winnt_module>
Change to (add the two LoadModule lines above the existing IfModule lines):
LoadModule php5_module     modules/libphp5.so
LoadModule rewrite_module    modules/mod_rewrite.so
<IfModule !mpm_netware_module>
<IfModule !mpm_winnt_module>

2.

User daemon
Group daemon
Change to:
User nagios 
Group nagios

3.

#ServerName www.example.com:80
Change to:
ServerName 127.0.0.1

4.

<IfModule dir_module>
    DirectoryIndex index.html
</IfModule>
Change to:
<IfModule dir_module>
    DirectoryIndex index.html index.php index.php3 default.php
</IfModule>

5.

AddType application/x-compress .Z
AddType application/x-gzip .gz .tgz
Change to:
AddType application/x-compress .Z
AddType application/x-gzip .gz .tgz
AddType application/x-httpd-php .php .php3 .htm .phtml .php4
AddType application/x-httpd-php-source .phps

6.

#setting for nagios 
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin" 
<Directory "/usr/local/nagios/sbin">                         # directory holding the Nagios CGI programs
     AuthType Basic 
     Options ExecCGI 
     AllowOverride None 
     Order allow,deny 
     Allow from all 
     AuthName "Nagios Access" 
     AuthUserFile /usr/local/nagios/etc/htpasswd             # password file used to authenticate access to this directory
     Require valid-user 
</Directory> 
Alias /nagios "/usr/local/nagios/share" 
<Directory "/usr/local/nagios/share">                        # directory holding the Nagios HTML files
     AuthType Basic 
     Options None 
     AllowOverride None 
     Order allow,deny 
     Allow from all 
     AuthName "nagios Access" 
     AuthUserFile /usr/local/nagios/etc/htpasswd 
     Require valid-user 
</Directory> 

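# Note: do not copy the trailing comments into the configuration file. Apache does not allow a comment on the same line as a directive and will report an error.


Create the Apache authentication file: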

[root@nagios-server php-5.3.28]# /usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin

# This creates the htpasswd authentication file under /usr/local/nagios/etc.
# Visiting http://IP/nagios now requires this user name and password.

[root@nagios-server share]# cat /usr/local/nagios/etc/htpasswd
nagiosadmin:$apr1$GHjt0nqB$VNnfnBpdxXoiHYh.PXuMS0

Start the Apache service:

[root@nagios-server share]# /usr/local/apache2/bin/apachectl start
httpd: Syntax error on line 57 of /usr/local/apache2/conf/httpd.conf: module rewrite_module is built-in and can't be loaded
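If you see:
httpd: Syntax error on line 55 of /usr/local/apache2/conf/httpd.conf:
module rewrite_module is built-in and can't be loaded
the module is compiled into httpd and does not need to be loaded. Comment out the line:
#LoadModule rewrite_module modules/mod_rewrite.so
##############################################


Open the Nagios web interface:

[root@nagios-server share]# service iptables stop

# Stop the firewall first, or add a rule that allows inbound access to port 80.

The browser first asks for the user name and password:

[Screenshot: login prompt]

Then the Nagios home page appears:

[Screenshot: Nagios home page]

That completes the server installation. If you follow the steps in order there should be no problems. The harder part is the configuration,

which we look at next.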

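2. Nagios Server Configuration

[root@nagios-server etc]# cd /usr/local/nagios/etc/
[root@nagios-server etc]# ls
cgi.cfg  htpasswd  nagios.cfg  objects  resource.cfg

# We go through the configuration files one by one.

a.
cgi.cfg:

# This file controls the CGI scripts. To run actions from the Nagios web interface,
# such as restarting the Nagios process, disabling notifications, or stopping host checks,
# the logged-in user must be authorized in cgi.cfg.
# The web interface authenticates as nagiosadmin,
# so it is enough to grant that user the permissions shown below.
# To authorize other users as well, append them to the lists, for example hiyun9.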

default_user_name=nagiosadmin
authorized_for_system_information=nagiosadmin,hiyun9
authorized_for_configuration_information=nagiosadmin,hiyun9
authorized_for_system_commands=nagiosadmin,hiyun9
authorized_for_all_services=nagiosadmin,hiyun9
authorized_for_all_hosts=nagiosadmin,hiyun9
authorized_for_all_service_commands=nagiosadmin,hiyun9
authorized_for_all_host_commands=nagiosadmin,hiyun9

b.
resource.cfg

$USER1$=/usr/local/nagios/libexec

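# The file contains a single line that sets the $USER1$ macro, which holds the path to the Nagios plugins. If the plugins are installed
# somewhere else, change the path here. Note that a macro must be defined before other configuration files can reference it.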
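
To make the role of $USER1$ concrete, here is a minimal shell sketch of the substitution that happens when a command_line such as the check_nrpe definition in commands.cfg is expanded. The function name expand_user1 is hypothetical and only imitates the expansion for illustration; Nagios performs it internally.

```shell
# expand_user1 PLUGIN_DIR COMMAND_LINE
# Replace every literal $USER1$ in COMMAND_LINE with PLUGIN_DIR.
# Assumption: PLUGIN_DIR contains no '|', '&' or backslash characters,
# because they would be special inside the sed replacement used here.
expand_user1() {
    plugin_dir="$1"
    command_line="$2"
    printf '%s\n' "$command_line" | sed 's|\$USER1\$|'"$plugin_dir"'|g'
}

# Example: the check_nrpe command line with the plugin path filled in.
expand_user1 /usr/local/nagios/libexec '$USER1$/check_nrpe -H 172.16.27.43 -c check_load'
```

If the plugins live in a different directory, only the value in resource.cfg changes; every command definition that uses $USER1$ follows automatically.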

 
 
 
 
c.
nagios.cfg

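# The main Nagios configuration file. I left most of it unchanged. For easier management
# I added three directories that separate Linux hosts, Windows hosts, and network devices.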

log_file=/usr/local/nagios/var/nagios.log
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_dir=/usr/local/nagios/etc/objects/linux
cfg_dir=/usr/local/nagios/etc/objects/windows
cfg_dir=/usr/local/nagios/etc/objects/switch

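# The three cfg_dir lines above load the directories mentioned earlier. Create the directories first, or the configuration check fails.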
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
....
....
....
###
service_perfdata_file=/usr/local/pnp4nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata
host_perfdata_file=/usr/local/pnp4nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata
host_perfdata_process_empty_results=1
service_perfdata_process_empty_results=1
(The perfdata settings above, highlighted in red in the original post, are the pnp4nagios configuration.)
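
The two file templates above produce one line per check, made of KEY::VALUE pairs. As a rough illustration of that layout, the sketch below pulls one field out of such a line. It assumes that Nagios writes the \t sequences in the templates as real tab characters. The helper name perfdata_field and the sample values are made up for this example.

```shell
# perfdata_field KEY
# Read perfdata lines on stdin and print the value stored under KEY
# for every line that contains that key.
perfdata_field() {
    awk -F '\t' -v key="$1" '
        {
            for (i = 1; i <= NF; i++) {
                sep = index($i, "::")
                if (sep > 0 && substr($i, 1, sep - 1) == key) {
                    print substr($i, sep + 2)
                    break
                }
            }
        }'
}

# Sample line shaped like service_perfdata_file_template (values are invented).
printf 'DATATYPE::SERVICEPERFDATA\tTIMET::1468000000\tHOSTNAME::elk\tSERVICEDESC::PING\tSERVICEPERFDATA::rta=0.5ms\n' |
    perfdata_field SERVICEDESC
```

A helper like this is handy for checking by eye that the perfdata files contain what pnp4nagios expects before looking for problems in the graphs.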

# For anything else, read the comments in the configuration file itself.

# Next, the configuration files under objects:
[root@nagios-server objects]# ls
commands.cfg  localhost.cfg  switch.cfg     timeperiods.cfg
contacts.cfg  printer.cfg    templates.cfg  windows.cfg

d.
commands.cfg
This file exists by default and works without changes. When a new command is needed, add it to this file.

# 'process-host-perfdata' command definition
define command{
    command_name    process-host-perfdata
    #command_line    /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /usr/local/nagios//var/host-perfdata.out
    command_line  /usr/local/pnp4nagios/libexec/process_perfdata.pl -d HOSTPERFDATA
    }

# The uncommented command_line entries (red in the original post) are the pnp4nagios configuration

# 'process-service-perfdata' command definition
define command{
    command_name    process-service-perfdata
    #command_line    /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /usr/local/nagios//var/service-perfdata.out
    command_line  /usr/local/pnp4nagios/libexec/process_perfdata.pl
    }
# Command definitions for the NRPE plugin
define command{
    command_name check_nrpe
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
    }

define command{
        command_name check_nrpe2
        command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t 60
        }

e.
contacts.cfg

define contact{
    contact_name       nagiosadmin                ; Short name of user
    use                 generic-contact                ; Inherit default values from generic-contact template (defined above)
    alias               Nagios Admin                ; Full name of user
    #host_notification_period     24x7    
    #service_notification_period   24x7
    email               [email protected]        ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
    }


define contactgroup{
        contactgroup_name       admins
        alias             Nagios Administrators
        members            nagiosadmin
        }

f.
timeperiods.cfg


# In China, keeping only the first definition (24x7) is enough. Consult the others as needed.
define timeperiod{
        timeperiod_name 24x7
        alias         24 Hours A Day, 7 Days A Week
        sunday         00:00-24:00
        monday         00:00-24:00
        tuesday        00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday         00:00-24:00
        saturday        00:00-24:00
        }


# 'workhours' timeperiod definition
define timeperiod{
    timeperiod_name   workhours
    alias         Normal Work Hours
    monday        09:00-17:00
    tuesday       09:00-17:00
    wednesday     09:00-17:00
    thursday     09:00-17:00
    friday        09:00-17:00
    }


# 'none' timeperiod definition
define timeperiod{
    timeperiod_name    none
    alias         No Time Is A Good Time
    }

define timeperiod{
     name us-holidays
        timeperiod_name     us-holidays
        alias          U.S. Holidays
        january 1        00:00-00:00     ; New Years
        monday -1 may      00:00-00:00     ; Memorial Day (last Monday in May)
        july 4         00:00-00:00     ; Independence Day
        monday 1 september   00:00-00:00     ; Labor Day (first Monday in September)
        thursday 4 november   00:00-00:00     ; Thanksgiving (4th Thursday in November)
        december 25       00:00-00:00     ; Christmas
        }


# This defines a modified "24x7" timeperiod that covers every day of the
# year, except for U.S. holidays (defined in the timeperiod above).
define timeperiod{
        timeperiod_name 24x7_sans_holidays
        alias         24x7 Sans Holidays
     use         us-holidays        ; Get holiday exceptions from other timeperiod
        sunday        00:00-24:00
        monday        00:00-24:00
        tuesday        00:00-24:00
        wednesday       00:00-24:00
        thursday       00:00-24:00
        friday        00:00-24:00
        saturday       00:00-24:00
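        }

g.
templates.cfg
Nagios monitors host resources and services, which its configuration calls objects. To avoid defining the same attributes repeatedly,
Nagios lets you define shared attributes once in a template and reference it from many objects. That is the purpose of templates.cfg.
define contact{
        name                generic-contact ; template name
        service_notification_period     24x7 ; time period in which service problem notifications are sent; "24x7" is defined in timeperiods.cfg
        host_notification_period      24x7 ; time period in which host problem notifications are sent; "24x7" is defined in timeperiods.cfg
        service_notification_options    w,u,c,r,f,s
        host_notification_options      d,u,r,f,s
        service_notification_commands   notify-service-by-email  ; how service notifications are sent (e-mail or SMS are possible); here it is e-mail
        host_notification_commands     notify-host-by-email  ; how host notifications are sent (e-mail or SMS are possible); here it is e-mail
        register                        0
        }

The notification option letters mean:

w: warning state

u: unknown state (for hosts, u means unreachable)

c: critical state

d: down (the host is down)

r: recovery (back to normal)

f: flapping (the state changes frequently)

s: scheduled downtime starts or ends

n: none (no notifications are sent)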


define host{
        name                   generic-host    ; template name; it does not have to match the real host name of any machine
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        failure_prediction_enabled      1
        process_perf_data             1    ; 0 or 1; enables performance data output. When 1, Nagios writes the collected data to a file for later processing
        retain_status_information       1
        retain_nonstatus_information    1
        notification_period         24x7    ; time period in which notifications may be sent to contacts
        register                 0
        }

define host{
    name                 linux-server
    use                 generic-host    ; inherit every setting from generic-host
    check_period             24x7
    check_interval          5      ; interval between regular host checks, here 5 minutes
    retry_interval          1      ; interval between retry checks, in minutes
    max_check_attempts       10     ; maximum number of checks before a problem is treated as real; Nagios retries several
                                    ; times because brief network congestion or a similar glitch may be the cause
    check_command         check-host-alive ; command used to check the host; check-host-alive is defined in commands.cfg
    notification_period      workhours
    notification_interval     30     ; minutes to wait before notifying again while the problem remains unresolved
    notification_options     d,u,r
    contact_groups           admins       ; contact group to notify; "admins" is defined in contacts.cfg
    register             0
    }
define host{
    name                  windows-server
    use                   generic-host  
    check_period             24x7        
    check_interval           5      
    retry_interval           1      
    max_check_attempts       10        
    check_command            check-host-alive    
    notification_period       24x7      
    notification_interval     30        
    notification_options     d,r      
    contact_groups         admins        
    hostgroups         windows-servers
    register                0      
    }
define host{
    name                  generic-printer    
    use                   generic-host    
    check_period             24x7        
    check_interval          5        
    retry_interval         1      
    max_check_attempts     10      
    check_command         check-host-alive  
    notification_period     workhours      
    notification_interval   30        
    notification_options     d,r    
    contact_groups         admins     
    register         0      
    }


define host{
    name                 generic-switch    
    use                 generic-host    
    check_period         24x7     
    check_interval         5      
    retry_interval          1        
    max_check_attempts       10      
    check_command         check-host-alive    
    notification_period     24x7     
    notification_interval   30      
    notification_options     d,r      
    contact_groups         admins      
    register               0      
    }


define service{
        name                generic-service     
        active_checks_enabled        1               
        passive_checks_enabled       1                 
        parallelize_check          1             
        obsess_over_service         1               
        check_freshness           0               
        notifications_enabled        1               
        event_handler_enabled        1             
        flap_detection_enabled       1           
        failure_prediction_enabled     1               
        process_perf_data          1             
        retain_status_information      1               
        retain_nonstatus_information    1           
        is_volatile             0             
        check_period            24x7          
        max_check_attempts         3            
        normal_check_interval        10          
        retry_check_interval        2          
        contact_groups           admins            
     notification_options         w,u,c,r          
        notification_interval        60            
        notification_period         24x7          
        register              0               
        }


define service{
    name                 local-service         
     use                 generic-service      
      max_check_attempts      4          
      normal_check_interval     5            
      retry_check_interval      1        
      register             0        
    }
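
To see how max_check_attempts and the retry interval work together: after the first failed check, Nagios rechecks at the retry interval until max_check_attempts checks have failed, and only then treats the problem as confirmed and notifies. The sketch below estimates that delay. It assumes the default interval_length of 60 seconds (so intervals are minutes) and ignores scheduling latency and check run time. The function name is hypothetical.

```shell
# minutes_until_hard_state MAX_CHECK_ATTEMPTS RETRY_INTERVAL
# Rough time between the first failed check and the confirmed problem state:
# the first failure counts as attempt 1, each further attempt waits one retry interval.
minutes_until_hard_state() {
    max_check_attempts="$1"
    retry_interval="$2"
    echo $(( ($max_check_attempts - 1) * $retry_interval ))
}

minutes_until_hard_state 10 1   # linux-server host template
minutes_until_hard_state 3 2    # generic-service template
minutes_until_hard_state 4 1    # local-service template
```

Lowering max_check_attempts shortens the delay before an alert at the cost of more false alarms from short glitches.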
## pnp4nagios configuration follows

define host {
name host-pnp
register 0
action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=_HOST_' class='tips' rel='/pnp4nagios/index.php/popup?host=$HOSTNAME$&srv=_HOST_
#action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$
#process_perf_data 1
}


define service {
name srv-pnp
register 0
action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$' class='tips' rel='/pnp4nagios/index.php/popup?host=$HOSTNAME$&srv=$SERVICEDESC$
#action_url /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
#process_perf_data 1
}

h.
localhost.cfg

# Define a host for the local machine

define host{
        use                  linux-server,host-pnp      
        host_name               localhost
        alias                 localhost
        address           127.0.0.1
     statusmap_image         linux40.png
        }


# Define an optional hostgroup for Linux machines

define hostgroup{
        hostgroup_name      linux-servers            
        alias               Linux Servers            
        members             localhost,linux_server ; add other Linux hosts to this group here
        }

# Define a service to "ping" the local machine

define service{
        use                 local-service                        
        host_name              localhost
        service_description         PING
        check_command             check_ping!100.0,20%!500.0,60%
        action_url                /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$ ; link to the pnp4nagios graph
        }


# Define a service to check the disk space of the root partition
# on the local machine.  Warning if < 20% free, critical if
# < 10% free space on partition.

define service{
        use                 local-service                    
        host_name             localhost
        service_description         Root Partition
        check_command             check_local_disk!20%!10%!/
        action_url                /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
        }



# Define a service to check the number of currently logged in
# users on the local machine.  Warning if > 20 users, critical
# if > 50 users.

define service{
        use                 local-service                      
        host_name              localhost
        service_description         Current Users
        check_command             check_local_users!20!50
        action_url                /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
        }


# Define a service to check the number of currently running procs
# on the local machine.  Warning if > 250 processes, critical if
# > 400 users.

define service{
        use                 local-service                 
        host_name              localhost
        service_description         Total Processes
        check_command             check_local_procs!250!400!RSZDT
        action_url                /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
        }


# Define a service to check the load on the local machine.

define service{
        use                 local-service        
        host_name             localhost
        service_description         Current Load
        check_command             check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
        action_url                /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
        }


# Define a service to check the swap usage the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free

define service{
        use                 local-service        
        host_name              localhost
        service_description         Swap Usage
        check_command             check_local_swap!20!10
        action_url                /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
        }



# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.

define service{
        use                 local-service        
        host_name              localhost
        service_description         SSH
        check_command             check_ssh
        notifications_enabled       0
        action_url                /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
        }


# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.

define service{
        use                local-service        
        host_name             localhost
        service_description        HTTP
       check_command            check_http
        notifications_enabled       0
        action_url                /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
        }
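# New hosts are added for monitoring in the same way, so the details are not repeated here.

Verify the Nagios configuration:

# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

This check is very useful: error messages normally name the offending configuration file and the line number, which makes problems easy to locate. Warnings can usually be ignored because they are mostly advisory.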
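
One way to make the check a habit is to wrap it so that a restart only happens when verification succeeds. The following is a minimal sketch, not part of Nagios: verify_then_restart is a hypothetical helper, and both commands are passed in as arguments so the function can be tried on a machine without Nagios.

```shell
# verify_then_restart VERIFY_COMMAND RESTART_COMMAND
# Run VERIFY_COMMAND quietly; run RESTART_COMMAND only if verification succeeded.
verify_then_restart() {
    verify_cmd="$1"
    restart_cmd="$2"
    if sh -c "$verify_cmd" >/dev/null 2>&1; then
        sh -c "$restart_cmd"
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}

# Intended use on the Nagios server (shown as a comment, not run here):
# verify_then_restart "/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg" "service nagios restart"
```

When the check fails, run the nagios -v command by hand to read the full error output, since the wrapper hides it.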

# service nagios start
If you see the following error:
[root@localhost etc]# service nagios restart
Running configuration check...done.
Stopping nagios: done.
Starting nagios:This account is currently not available.
done.

the init script could not switch to the nagios user because its login shell is /sbin/nologin. Edit /etc/passwd and change
nagios:x:501:501::/home/nagios:/sbin/nologin
to
nagios:x:501:501::/home/nagios:/bin/bash

Chinese localization (skip this if you prefer the English interface):

[root@nagios-server opt]# tar -zxvf nagios-cn-3.2.3.tar.gz
[root@nagios-server opt]# cd nagios-cn-3.2.3
[root@nagios-server nagios-cn-3.2.3]# ./configure --prefix=/usr/local/nagios/
[root@nagios-server nagios-cn-3.2.3]# make all
[root@nagios-server nagios-cn-3.2.3]# make install

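After installation the monitoring pages look somewhat unattractive:
/usr/local/nagios/share/stylesheets
The stylesheets directory holds the CSS that defines the look of the interface. You can swap the stylesheets there between the original and the localized versions.
[root@localhost share]# cd /usr/local/nagios/share/ssi
[root@train ssi]# mv common-footer.ssi common-footer.ssi.bak
Renaming common-footer.ssi removes the footer line "感谢使用nagios-cn工程,工程代码主要源自Nagios工程和Nagiosgraph项目。" ("Thank you for using the nagios-cn project; its code derives mainly from the Nagios and Nagiosgraph projects.").

Install pnp4nagios to view graphs:

Install rrdtool
[root@nagios-server nagios-cn-3.2.3]# cd /opt/
[root@nagios-server opt]# yum install rrdtool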
[root@nagios-server opt]# tar -zxvf pnp4nagios-0.6.14.tar.gz
[root@nagios-server opt]# cd pnp4nagios-0.6.14
[root@nagios-server pnp4nagios-0.6.14]# yum install -y  perl-Time-HiRes
[root@nagios-server pnp4nagios-0.6.14]# ./configure --prefix=/usr/local/pnp4nagios --with-nagios-user=nagios --with-nagios-group=nagios
[root@nagios-server pnp4nagios-0.6.14]# make all
[root@nagios-server pnp4nagios-0.6.14]# make fullinstall
[root@nagios-server pnp4nagios-0.6.14]# cd /usr/local/pnp4nagios/etc/
[root@nagios-server etc]# mv misccommands.cfg-sample misccommands.cfg
[root@nagios-server etc]# mv nagios.cfg-sample nagios.cfg
[root@nagios-server etc]# mv rra.cfg-sample rra.cfg
[root@nagios-server etc]# mv pages/web_traffic.cfg-sample pages/web_traffic.cfg
[root@nagios-server etc]# cd check_commands/
[root@nagios-server check_commands]# mv check_all_local_disks.cfg-sample check_all_local_disks.cfg
[root@nagios-server check_commands]# mv check_nrpe.cfg-sample check_nrpe.cfg
[root@nagios-server check_commands]# mv check_nwstat.cfg-sample check_nwstat.cfg

Append the following to the end of /usr/local/apache2/conf/httpd.conf:
Alias /pnp4nagios "/usr/local/pnp4nagios/share"

<Directory "/usr/local/pnp4nagios/share">
        AllowOverride None
        Order allow,deny
        Allow from all
        #
        # Use the same value as defined in nagios.conf
        #
        AuthName "Nagios Access"
        AuthType Basic
        AuthUserFile /usr/local/nagios/etc/htpasswd
        Require valid-user
        <IfModule mod_rewrite.c>
                # Turn on URL rewriting
                RewriteEngine On
                Options FollowSymLinks
                # Installation directory
                RewriteBase /pnp4nagios/
                # Protect application and system files from being viewed
                RewriteRule ^(application|modules|system) - [F,L]
                # Allow any files or directories that exist to be displayed directly
                RewriteCond %{REQUEST_FILENAME} !-f
                RewriteCond %{REQUEST_FILENAME} !-d
                # Rewrite all other URLs to index.php/URL
                RewriteRule .* index.php/$0 [PT,L]
        </IfModule>
</Directory>


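Copy the SSI files that show the graph preview on mouse-over:

[root@localhost share]# cp /opt/pnp4nagios-0.6.14/contrib/ssi/* /usr/local/nagios/share/ssi/
At this point, clicking the graph icon (the small red sun) reports one problem:

PHP magic_quotes_gpc PHP magic_quotes_gpc is deprecated

[root@localhost php]# cp /opt/php-5.3.28/php.ini-production /usr/local/php/lib/php.ini
[root@nagios-server opt]# cd /usr/local/pnp4nagios/share/
[root@localhost share]# mv install.php install.php.bak

# The configuration itself is given earlier in this article.
# All configuration files are in the nagios package on Baidu cloud as etc.tar.gz, if you want to inspect them.


[Screenshot: Nagios monitoring interface after localization]

[Screenshot: Nagios host monitoring view]

[Screenshot: Nagios service monitoring view]

[Screenshot: graph page shown after clicking the red sun icon]

## The remaining pages are not shown one by one.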


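Install the check_nrpe plugin (server side):


First, how NRPE works:


1. NRPE consists of two parts:
check_nrpe plugin, installed on the monitoring host
NRPE daemon, running on the remote Linux host (usually the monitored machine)


2. How NRPE works:

When Nagios needs to check a service or resource on a remote Linux host:

2.1 Nagios runs the check_nrpe plugin and tells it what to check;
2.2 check_nrpe connects to the remote NRPE daemon over SSL;
2.3 the NRPE daemon runs the corresponding Nagios plugin to perform the check;
2.4 the NRPE daemon returns the result to check_nrpe, which hands it to Nagios for processing.

Note: the NRPE daemon requires the Nagios plugins to be installed on the remote Linux host. Without them the daemon cannot run any checks.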

[root@nagios-server opt]# tar -zxvf nrpe-2.14.tar.gz
[root@nagios-server opt]# cd nrpe-2.14
[root@nagios-server nrpe-2.14]# ./configure
[root@nagios-server nrpe-2.14]# make all
[root@nagios-server nrpe-2.14]# make install-plugin

# To monitor switches you need SNMP support, which requires rebuilding the plugins
[root@nagios-server nrpe-2.14]# yum install net-snmp-devel net-snmp-utils
[root@nagios-server opt]# cd /opt/nagios-plugins-2.1.1
[root@nagios-server nagios-plugins-2.1.1]# ./configure --prefix=/usr/local/nagios --with-snmpget-command=/usr/bin/snmpwalk --with-snmpgetnext-command=/usr/bin/snmpwalk
[root@nagios-server nagios-plugins-2.1.1]# make
[root@nagios-server nagios-plugins-2.1.1]# find / -name check_snmp
/opt/nagios-plugins-2.1.1/plugins/check_snmp
[root@nagios-server nagios-plugins-2.1.1]# cp /opt/nagios-plugins-2.1.1/plugins/check_snmp /usr/local/nagios/libexec/


Install the check_nrpe plugin (client side):

[root@elk ~]# useradd nagios
[root@elk ~]# passwd nagios
[root@elk ~]# cd /opt/
[root@elk opt]# scp [email protected]:/opt/nagios-plugins-2.1.1.tar.gz ./
[root@elk opt]# tar -zxvf nagios-plugins-2.1.1.tar.gz
[root@elk opt]# cd nagios-plugins-2.1.1
[root@elk nagios-plugins-2.1.1]# ./configure --prefix=/usr/local/nagios
[root@elk nagios-plugins-2.1.1]# make && make install

# After this step, /usr/local/nagios/ contains three directories: include, libexec, and share.
[root@elk nagios-plugins-2.1.1]# chown nagios:nagios /usr/local/nagios/
[root@elk opt]# scp [email protected]:/opt/nrpe-2.14.tar.gz /opt/
[root@elk opt]# tar -zxvf nrpe-2.14.tar.gz
[root@elk opt]# cd nrpe-2.14
[root@elk nrpe-2.14]# ./configure
[root@elk nrpe-2.14]# make all
[root@elk nrpe-2.14]# make install-plugin
[root@elk nrpe-2.14]# make install-daemon
[root@elk nrpe-2.14]# make install-daemon-config

# /usr/local/nagios/ now contains five directories
[root@elk nrpe-2.14]# ls /usr/local/nagios/
bin  etc  include  libexec  share
[root@elk nrpe-2.14]# vim /usr/local/nagios/etc/nrpe.cfg
allowed_hosts=127.0.0.1,172.16.27.57

# Add the IP address of the Nagios server to allowed_hosts

# Start nrpe:
[root@elk nrpe-2.14]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
[root@elk nrpe-2.14]# ps -ef |grep nrpe
nagios   29271     1  0 11:16 ?        00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
root     29273  7990  0 11:17 pts/2    00:00:00 grep nrpe
[root@elk nrpe-2.14]# netstat -lanp|grep 5666
tcp        0      0 0.0.0.0:5666      0.0.0.0:*       LISTEN      29271/nrpe
[root@elk nrpe-2.14]# /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.14
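# Seeing the NRPE version means the daemon works
# Then test from the server as well; remember to stop the firewall on the client (or allow port 5666):
[root@nagios-server opt]# /usr/local/nagios/libexec/check_nrpe -H 172.16.27.43
NRPE v2.14
# Seeing the NRPE version means the connection works

## Next, add a configuration file on the server that tells Nagios which client to monitor through NRPE.
[root@nagios-server objects]# cd /usr/local/nagios/etc/objects/linux/
[root@nagios-server linux]# cp ../localhost.cfg ./elk.cfg # copy an existing file and edit it; choose any file name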
[root@nagios-server linux]# vim elk.cfg

define host{
        use           linux-server,host-pnp                                                                 ; This host definition will inherit all variables that are defined
        host_name        elk
        alias          elk
hostgroups      linux-servers
        address         172.16.27.43
      }

#define hostgroup{
# hostgroup_name  linux-servers
# alias          Linux Servers
# members         localhost,elk
# }

define service{
        use            local-service       
        host_name         elk
        service_description    PING
        check_command      check_ping!100.0,20%!500.0,60%
        action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
         }

define service{
        use            local-service    
        host_name         elk
        service_description    Root Partition
        check_command       check_nrpe!check_disk
        action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
        }

define service{
        use           local-service      
        host_name         elk
        service_description    Current Users
        check_command       check_nrpe!check_users
        action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
        }

define service{
        use            local-service       
        host_name         elk
        service_description    Total Processes
        check_command       check_nrpe!check_total_procs
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}

define service{
        use           local-service      
        host_name         elk
        service_description    Current Load
        check_command       check_nrpe!check_load
        action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
        }

define service{
        use            local-service         ; Name of service template to use
        host_name         elk
        service_description    Swap Usage
        check_command       check_nrpe!check_swap
        action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
        }

define service{
        use            local-service         ; Name of service template to use
        host_name         elk
        service_description    SSH
        check_command       check_ssh
        notifications_enabled   0
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}

define service{
        use           local-service         ; Name of service template to use
        host_name        elk
        service_description    HTTP
        check_command       check_http
        notifications_enabled   0
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}

# Then edit nrpe.cfg on the client

[root@elk etc]# vim /usr/local/nagios/etc/nrpe.cfg

# Comment out the existing command[...] lines at the bottom of the file and replace them with the following.
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
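
The link between the two machines is the command name: check_nrpe!check_disk in elk.cfg sends the name check_disk, and the NRPE daemon runs whatever command[check_disk] is set to in the client's nrpe.cfg. The lookup can be imitated with a few lines of shell. This is only a sketch of the idea (the daemon does this internally), and lookup_nrpe_command is a hypothetical helper name.

```shell
# lookup_nrpe_command CFG_FILE COMMAND_NAME
# Print the command line configured for COMMAND_NAME in an nrpe.cfg style file.
# Prints nothing when the name is not defined.
lookup_nrpe_command() {
    cfg_file="$1"
    command_name="$2"
    sed -n 's/^command\['"$command_name"']=//p' "$cfg_file" | head -n 1
}

# Demo on a temporary file that holds two of the entries shown above.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
EOF
lookup_nrpe_command "$cfg" check_disk
rm -f "$cfg"
```

If a service on the server names a command that has no matching entry on the client, the check cannot run, so the names on both sides must match exactly.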




# elk must be added to the hostgroup in localhost.cfg, or you can group hosts in your own way
#define hostgroup{
#        hostgroup_name  linux-servers
#        alias      Linux Servers
#        members     localhost,elk
#        }



[root@nagios-server linux]# sed -i 's/localhost/elk/g' elk.cfg # replace every localhost with elk
[root@nagios-server ~]# service nagios restart

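Now check the result in the web interface:

[Screenshot: services of host elk in the web interface]

elk has no web server installed, so the HTTP check shows a warning.

That completes the Nagios configuration. With a little experimenting you should be able to extend it yourself.


3. Developing Nagios Plugins

Here is one very simple example to get started; it should be clear at a glance. Nagios plugins can be written in many languages, such as shell or Python.

This example uses shell.


# As shown above, monitoring data on the client is collected through NRPE and passed back to the Nagios server, so a custom plugin
# is also written on the client, and its result is returned through NRPE.

# Here we write a script that monitors the memcached service.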
[root@elk libexec]# cd /usr/local/nagios/libexec/
[root@elk libexec]# vim check_memcached
#!/bin/bash
# Nagios plugin: report whether memcached is listening on port 11211.

STATE_OK=0
STATE_CRITICAL=2
# Count listening TCP sockets whose local address ends in :11211
W=$(netstat -lnt | grep -c ':11211 ')

if [ "$W" -ge 1 ]; then
    echo "OK, Memcached is working!"
    exit $STATE_OK
else
    echo "CRITICAL, Memcached is not working!"
    exit $STATE_CRITICAL
fi

# The script is very simple: it checks whether memcached is listening. It exits with 0 if so and with 2 otherwise.

[root@elk libexec]# chmod 755 check_memcached
[root@elk libexec]# ./check_memcached
CRITICAL, Memcached is not working!
[root@elk libexec]# service memcached start
Starting memcached:         [  OK  ]
[root@elk libexec]# ./check_memcached
OK, Memcached is working!
[root@elk libexec]# service memcached stop
Stopping memcached:           [  OK  ]
[root@elk libexec]# ./check_memcached
CRITICAL, Memcached is not working!

# The test shows that the script tracks whether memcached is running.
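
The same idea can be written as a reusable function that follows the usual Nagios plugin conventions more closely: exit status 0 for OK, 2 for CRITICAL, 3 for UNKNOWN, and optional performance data after a | character that pnp4nagios can graph. This is a sketch under stated assumptions rather than a finished plugin. The function name check_port_listening is hypothetical, and it expects `netstat -lnt` style text on stdin, where the fourth column is the local address and the last column is the state.

```shell
STATE_OK=0
STATE_CRITICAL=2
STATE_UNKNOWN=3

# check_port_listening PORT
# Read "netstat -lnt" style output on stdin, print one status line with
# perfdata, and return the matching Nagios exit status.
check_port_listening() {
    port="$1"
    case "$port" in
        ''|*[!0-9]*)
            echo "UNKNOWN - invalid port: '$port'"
            return "$STATE_UNKNOWN"
            ;;
    esac
    # Count LISTEN lines whose local address (4th column) ends in :PORT.
    listeners=$(awk -v suffix=":$port" '
        $NF == "LISTEN" && length($4) >= length(suffix) &&
        substr($4, length($4) - length(suffix) + 1) == suffix { n++ }
        END { print n + 0 }')
    if [ "$listeners" -ge 1 ]; then
        echo "OK - port $port is listening | listeners=$listeners"
        return "$STATE_OK"
    fi
    echo "CRITICAL - nothing is listening on port $port | listeners=0"
    return "$STATE_CRITICAL"
}

# Try it on a captured sample line instead of the live system.
printf 'tcp        0      0 0.0.0.0:11211      0.0.0.0:*       LISTEN\n' | check_port_listening 11211
```

In a real plugin file the last two lines would be `netstat -lnt | check_port_listening 11211` followed by `exit $?`, so that NRPE receives the status code. Matching on the end of the address column avoids false hits from other ports or process IDs that happen to contain the same digits.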

[root@elk libexec]# vim /usr/local/nagios/etc/nrpe.cfg
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200

command[check_memcached]=/usr/local/nagios/libexec/check_memcached -c 2

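# The last line, check_memcached, is the newly added command; the others are unchanged. The script takes no options, so the trailing -c 2 is ignored.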

[root@elk libexec]# ps -ef |grep nrpe
nagios   30360     1  0 14:03 ?        00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
root     32258 31396  0 15:51 pts/3    00:00:00 grep nrpe
[root@elk libexec]# kill -9 30360
[root@elk libexec]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

# The client side is done. Now add the service on the server:

[root@nagios-server linux]# vim /usr/local/nagios/etc/objects/linux/elk.cfg
# Append this block at the end of the file
define service{
        use                 local-service         ; Name of service template to use
        host_name              elk
        service_description          Check Memcached
        check_command             check_nrpe!check_memcached
        action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}


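The result:

[Screenshot: service status while memcached is stopped]

[Screenshot: service status while memcached is running]

You can explore custom check scripts further on your own. The approach stays the same, and the rest depends on your scripting skills.

Finally done. The setup itself did not take long, but writing it up took two days.

Thanks to the many people who published their notes online. I drew on their work and summarized it, so any similarities are to be expected.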
