Solutions when zabbix is not running

Reprinted from:  http://fengzhige.blog.51cto.com/3691377/1034485

On Linux, almost every running service writes a log, and a program usually prints an error message when something goes wrong. Even when nothing is printed, you can run "echo $?" to check whether the last command succeeded. I've been using zabbix for a while, so here is a summary of the problems I've run into and how I solved them.
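As a minimal sketch of the exit-status check mentioned above: the helper below (check_status is a name made up for this example, not a zabbix tool) runs a command quietly and reports the same status value that "echo $?" would print.

```shell
# check_status is a hypothetical helper, not part of zabbix.
# It runs the given command, hides its output, and reports the exit status.
check_status() {
    if "$@" >/dev/null 2>&1; then
        echo "ok"
    else
        status=$?                  # the value "echo $?" would print
        echo "failed ($status)"
        return "$status"
    fi
}

# Example (commented out so the block has no side effects):
# check_status /etc/init.d/zabbix_agent_ctl start
```

A status of 0 means success; any other value means the command failed.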

The zabbix logs are stored under /tmp: the server-side log is zabbix_server.log, and the log on the monitored host is zabbix_agentd.log.

1. Check whether the zabbix services have started successfully

Check whether zabbix processes are already running:
  # ps aux |grep zabbix
Check whether the system is listening on port 10050 (zabbix agent) and port 10051 (zabbix server):
  # netstat -nplut |grep zabbix
If not, start the services:
  # /etc/init.d/zabbix_server_ctl start
  # /etc/init.d/zabbix_agent_ctl start
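The port check above can also be scripted. This is only a sketch: missing_zabbix_ports is a made-up helper name, and it assumes the usual Linux "netstat -nplut" column layout (local address in column 4, state in column 6).

```shell
# missing_zabbix_ports is a hypothetical helper, not part of zabbix.
# It reads "netstat -nplut" output on stdin and reports which of the two
# zabbix ports has no listening TCP socket.
missing_zabbix_ports() {
    awk '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, parts, ":")   # local address, e.g. 0.0.0.0:10050 or :::10051
            seen[parts[n]] = 1          # the last piece is the port number
        }
        END {
            if (!("10050" in seen)) print "10050 (zabbix agent) not listening"
            if (!("10051" in seen)) print "10051 (zabbix server) not listening"
        }
    '
}

# Usage (commented out so the block has no side effects):
# netstat -nplut | missing_zabbix_ports
```

No output means both ports are being listened on.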

Note in particular that after every change to a configuration file, the corresponding zabbix server or zabbix agentd must be restarted.

Some init scripts fail to stop zabbix when restarting, so the service does not actually restart. In that case, use the kill command to kill the zabbix processes, then start the service again.
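A rough sketch of the kill-and-restart step, assuming the standard "ps aux" layout (PID in column 2, command starting in column 11). zabbix_pids is a made-up helper name.

```shell
# zabbix_pids is a hypothetical helper, not part of zabbix.
# It reads "ps aux" output on stdin and prints the PID of every
# zabbix_server or zabbix_agentd process. The "grep zabbix" process
# itself is skipped because its command column is "grep".
zabbix_pids() {
    awk '$11 ~ /zabbix_(server|agentd)/ { print $2 }'
}

# Usage (commented out so the block has no side effects;
# -r is a GNU xargs option that skips kill when there are no PIDs):
# ps aux | zabbix_pids | xargs -r kill
# /etc/init.d/zabbix_server_ctl start
# /etc/init.d/zabbix_agent_ctl start
```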

2. Errors that appear in zabbix_server.log

2009:20121023:193549.354 Sending list of active checks to [192.168.30.3] failed: host [CentOS-3] not found

This happens because the Hostname in the zabbix_agentd.conf configuration file does not match the host name configured in the web interface. The two must be identical.
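To compare the two names quickly, something like the following could be used. Both function names are made up for this sketch, and the configuration file path in the usage comment is an assumption about where the agent configuration lives.

```shell
# Hypothetical helpers, not part of zabbix.

# Print the Hostname value set in a zabbix_agentd.conf-style file
# (commented-out lines are ignored; the last setting wins).
agent_hostname() {
    sed -n 's/^[[:space:]]*Hostname[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Read server log lines on stdin and print the host name from each
# "host [NAME] not found" message.
host_not_found() {
    sed -n 's/.*host \[\(.*\)\] not found.*/\1/p'
}

# Usage (commented out so the block has no side effects):
# agent_hostname /etc/zabbix/zabbix_agentd.conf      # run on the agent
# host_not_found < /tmp/zabbix_server.log            # run on the server
```

The name printed on the agent side is the one that has to exist as a host in the web interface.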

3. Errors shown on the web page

1)

Get value from agent failed: cannot connect to [[192.168.30.2]:10050]: [111] Connection refused


192.168.30.2 is my zabbix server, which also runs an agent to monitor itself. This error occurred because I forgot to start zabbix_agentd on the zabbix server. "Last 20 issues" shows a hint as well:

Last 20 issues

Host           Issue                                Last change           Age     Ack  Actions
Zabbix server  Server Zabbix server is unreachable  23 Oct 2012 18:42:14  6m 57s  No   -

Solution: start zabbix_agentd.

2)

Get value from agent failed: cannot connect to [[192.168.30.3]:10050]: [113] No route to host

The message "No route to host" points to a network connectivity problem. Troubleshoot as follows:

a) Check if the machine 192.168.30.3 is powered on

b) Ping this machine on the zabbix server side to see if the network is connected

c) Use telnet to connect to ports 10050 and 10051 to see whether the host accepts connections on these two ports

d) Check whether the iptables firewall rules block ports 10050 and 10051
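The checks above can be run against the host and port named in the error message. A small sketch follows; agent_endpoint is a made-up helper name, and the commands in the comments are one possible way to carry out steps b) to d).

```shell
# agent_endpoint is a hypothetical helper, not part of zabbix.
# It pulls "HOST PORT" out of a "cannot connect to [[HOST]:PORT]" message
# passed as its first argument.
agent_endpoint() {
    printf '%s\n' "$1" |
        sed -n 's/.*cannot connect to \[\[\(.*\)\]:\([0-9]*\)\].*/\1 \2/p'
}

# Possible follow-up checks for the printed host and port
# (commented out so the block has no side effects):
#   ping -c 3 192.168.30.3        # b) is the network connected?
#   telnet 192.168.30.3 10050     # c) does the port accept connections?
#   iptables -L -n                # d) do firewall rules block 10050/10051?
```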

3)

The following red messages keep appearing on the web page:

zabbix server is not running: the information displayed may not be current.

zabbix server is running | No.

I checked /tmp/zabbix_server.log and /tmp/zabbix_agentd.log and found nothing abnormal, and the zabbix_server and zabbix_agentd processes and ports all looked normal. After some googling and experimenting, I finally found the solution!

http://www.zabbix.com/forum/showthread.php?t=23878&page=3 mentions that SELinux can cause zabbix to show this error message.

http://www.zabbix.com/forum/showthread.php?t=25321 describes changing the hostname to an IP address.

What I did, specifically:

① Check the log produced by SELinux. It does contain denials:

# tail -f /var/log/audit/audit.log

type=AVC msg=audit(1351863204.990:32): avc:  denied  { name_connect } for  pid=1575 comm="httpd" dest=10051 scontext=system_u:system_r:httpd_t:s0 tcontext=system_u:object_r:port_t:s0 tclass=tcp_socket

type=SYSCALL msg=audit(1351863204.990:32): arch=40000003 syscall=102 success=no exit=-13 a0=3 a1=bfd494b0 a2=b76b0ad8 a3=d items=0 ppid=1434 pid=1575 auid=4294967295 uid=48 gid=48 euid=48 suid=48 fsuid=48 egid=48 sgid=48 fsgid=48 tty=(none) ses=4294967295 comm="httpd" exe="/usr/sbin/httpd" subj=system_u:system_r:httpd_t:s0 key=(null)

② Then tell SELinux to allow the connection:

# setsebool -P httpd_can_network_connect on

③ Edit zabbix.conf.php and change the value of $ZBX_SERVER to this machine's IP address:

$ZBX_SERVER                     = '192.168.30.2'; ####### use the IP instead of the hostname

④ Done.
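For step ①, rather than watching the audit log by eye, the denials can be counted. This is only a sketch: httpd_denials_to_10051 is a made-up helper name that matches the fields visible in the AVC line above.

```shell
# httpd_denials_to_10051 is a hypothetical helper, not part of zabbix
# or SELinux. It reads audit log lines on stdin and prints how many of
# them are denials for httpd connecting to port 10051.
httpd_denials_to_10051() {
    grep 'denied' | grep 'comm="httpd"' | grep -c 'dest=10051'
}

# Usage (commented out so the block has no side effects):
# httpd_denials_to_10051 < /var/log/audit/audit.log
# getsebool httpd_can_network_connect    # shows whether the boolean is on
```

A count above 0 means SELinux is blocking the web frontend from reaching the zabbix server port.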

 

 

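Monitoring with user-defined scripts:

a)

Sometimes a user-defined script takes a long time to run, for example more than 10 or 20 seconds. Running zabbix_agentd -p or zabbix_agentd -t may then print "Alarm clock" and fail to return the expected result. This is because Timeout in the zabbix agentd configuration file defaults to 3 seconds, so the error appears whenever the script takes longer than 3 seconds to return a result.

Solution: edit the configuration file /etc/zabbix/zabbix_agentd.conf, find "Timeout", and set it to 30 seconds or less.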
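To see which Timeout is actually in effect before and after editing, a sketch like this may help. conf_timeout is a made-up helper name, and it falls back to the 3-second default described above when no Timeout line is set.

```shell
# conf_timeout is a hypothetical helper, not part of zabbix.
# It prints the Timeout value set in a zabbix configuration file, or 3
# (the default) when the file has no active Timeout line.
conf_timeout() {
    value=$(sed -n 's/^[[:space:]]*Timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1)
    echo "${value:-3}"
}

# Usage (commented out so the block has no side effects):
# conf_timeout /etc/zabbix/zabbix_agentd.conf
```

Remember to restart the agent after changing the value.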

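b)

For the case in a), the configuration on the zabbix server side also needs attention. Take a check I defined myself:

UserParameter=ping.avgtime,ping 192.168.30.2 -c 10 -w 29 |grep 'avg' |awk -F "/" '{print $5}'

It pings 192.168.30.2 10 times and takes the average. The -w option limits ping to 29 seconds.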
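To show how the grep and awk part of that check picks out the average, here is the same pipeline wrapped in a function (ping_avg is a made-up name) so it can be tried on a saved ping summary line. The sample line in the comment uses made-up numbers in the format Linux ping prints.

```shell
# ping_avg is a hypothetical wrapper around the grep/awk part of the
# UserParameter. It reads ping output on stdin and prints the average
# round-trip time.
#
# The summary line looks like:
#   rtt min/avg/max/mdev = 0.045/0.052/0.061/0.006 ms
# Splitting on "/" gives: "rtt min", "avg", "max", "mdev = 0.045",
# "0.052", ... so field 5 is the average.
ping_avg() {
    grep 'avg' | awk -F "/" '{print $5}'
}

# Usage (commented out so the block has no side effects):
# ping 192.168.30.2 -c 10 -w 29 | ping_avg
```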

The ping.avgtime check takes roughly 10 seconds to run. On the agent side, zabbix_agentd -t does return a result, but the zabbix server log keeps showing:

 

1762:20121023:191941.360 resuming Zabbix agent checks on host [Zabbix server]: connection restored

  1761:20121023:191952.149 Zabbix agent item [ping.avgtime] on host [CentOS-3] failed: first network error, wait for 15 seconds

  1762:20121023:192010.610 Zabbix agent item [ping.avgtime] on host [CentOS-3] failed: another network error, wait for 15 seconds

  1762:20121023:192028.628 Zabbix agent item [ping.avgtime] on host [CentOS-3] failed: another network error, wait for 15 seconds

 

With errors like these in the log, no graph is drawn for the item in the web interface.
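To find out which items keep hitting this error, the server log can be summarized. This is a sketch only: failing_items is a made-up helper name that matches the message format shown in the log excerpt above.

```shell
# failing_items is a hypothetical helper, not part of zabbix.
# It reads zabbix_server.log lines on stdin and prints, for each host and
# item with "network error" failures, a line of the form:
#   COUNT HOST ITEM
failing_items() {
    sed -n 's/.*Zabbix agent item \[\(.*\)\] on host \[\(.*\)\] failed: .* network error.*/\2 \1/p' |
        awk '{ count[$0]++ } END { for (key in count) print count[key], key }'
}

# Usage (commented out so the block has no side effects):
# failing_items < /tmp/zabbix_server.log
```

Items that show up repeatedly are candidates for the Timeout problem described in a).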

Solution:

① On the zabbix server, edit the configuration file /etc/zabbix/zabbix_server.conf, find "Timeout", and set it to 30 seconds or less.

② If similar messages still appear, the memory configured for the zabbix server is probably too small. Increase the server memory.

 

 

1:   http://www.jincon.com/archives/169/

 

2:   http://fengzhige.blog.51cto.com/3691377/1034485
