Problems encountered in a Zabbix production environment

1) The Zabbix monitoring interface reports the error "Lack of free swap space on Zabbix server"

The dashboard of our company's Zabbix 3.0 deployment reports "Lack of free swap space on Zabbix server" for hosts that have no swap configured at all.

The steps to resolve this issue are as follows:

        Go to Configuration -> Templates, click Triggers next to Template OS Linux, open the trigger "Lack of free swap space on {HOST.NAME}", and in the trigger editing page change the Expression from the original

{Template OS Linux:system.swap.size[,pfree].last(0)}<50

to

{Template OS Linux:system.swap.size[,pfree].last(0)}<50 and {Template OS Linux:system.swap.size[,free].last(0)}<>0

The added condition " and {Template OS Linux:system.swap.size[,free].last(0)}<>0" checks that the host actually has swap space. On a host without swap, {Template OS Linux:system.swap.size[,free].last(0)} is 0, the expression no longer evaluates to true, and no alert is raised. After saving, the "Lack of free swap space" problems that Zabbix had already reported are automatically marked as Resolved in the next update cycle.
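
To double-check what the agent actually reports for these items on a host without swap, the values can be queried directly with zabbix_get (the IP below is a placeholder and the binary path is assumed to match the installation used elsewhere in this article):

# /usr/local/zabbix/bin/zabbix_get -s 192.168.1.10 -k "system.swap.size[,free]"
# /usr/local/zabbix/bin/zabbix_get -s 192.168.1.10 -k "system.swap.size[,pfree]"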


2) The "Zabbix poller processes more than 75% busy" alarm appears in the zabbix monitoring world

After the Zabbix monitoring environment had been running in production for a while, the alarm "Zabbix poller processes more than 75% busy" suddenly appeared.

There are many kinds of Zabbix alarms; the common ones are memory exhaustion, network failures, slow IO, and this "Zabbix poller processes more than 75% busy". At first I did not pay much attention, because it did not affect normal use and cleared by itself after a while. But as the database grew, Zabbix consumed more and more memory, and the poller processes became busy every day.

It turns out that this problem is easy to solve.

You can increase the number of poller processes started when Zabbix Server boots. Doing so directly increases the polling load on the server, so only do it when there is enough memory available.


Specifically, edit the Zabbix Server configuration file /etc/zabbix/zabbix_server.conf and find the StartPollers section:

### Option: StartPollers
#       Number of pre-forked instances of pollers.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartPollers=5


Uncomment the StartPollers line and raise the value from 5 to 10 or more [because the machine here has 64 GB of memory, I set it to 60 or 80].
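
For example, on this 64 GB server the uncommented line ends up as follows (the exact value is a judgment call based on available memory and the number of monitored hosts):

StartPollers=60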

After modification, restart zabbix_server

# pkill -9 zabbix_server
# /usr/local/zabbix/sbin/zabbix_server
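
To confirm the server came back up, check the process and the server port (10051 is assumed here, the default trapper port):

# ps -ef | grep zabbix_server | grep -v grep
# lsof -i:10051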

After a while, no similar warnings appeared in the trigger list anymore.


Of course, you can also use a script to restart zabbix_server at an off-peak time to reduce the load.

Below is the script /root/zabbix-restart.sh

#!/bin/bash
# Stop the running zabbix_server processes, wait for them to exit, then start the server again
/usr/bin/pkill zabbix_server
sleep 5
/usr/local/zabbix/sbin/zabbix_server

Then add a crontab entry to run it on schedule (here, 3:00 every morning):

0 3 * * * /bin/bash -x /root/zabbix-restart.sh > /dev/null 2>&1


3) Zabbix reports "Too many processes on {HOST.NAME}"

Solution: raise the threshold in the corresponding trigger (the default is 300; it can be changed to 3000, for example).

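In Template OS Linux the trigger expression looks roughly like the following (the exact item key may differ slightly in your template); raise the constant on the right-hand side:

{Template OS Linux:proc.num[].avg(5m)}>300

change to

{Template OS Linux:proc.num[].avg(5m)}>3000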

4) No data is shown in a monitoring graph

You can first test with the following command on the Zabbix server's command line:

# /usr/local/zabbix/bin/zabbix_get -s 192.168.1.10 -p 10050 -k "mysql.status[Uptime]"

where:

        -s is followed by the IP address of the monitored machine;

        -p is followed by the agent port (10050 by default);

        -k is followed by the key of the monitoring item, which can be found in the corresponding item on the Zabbix web page.

If this command returns data on the server but the graph on the Zabbix web page still shows nothing, the problem is most likely in the web-side configuration (host, item, or template settings).
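
It can also help to test the key locally on the monitored machine with the agent's test mode (the agent binary path is assumed to match the installation used in this article):

# /usr/local/zabbix/sbin/zabbix_agentd -t "mysql.status[Uptime]"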


5) The zabbix_server service shuts down because its cache runs out of memory

138401:20170630:172159.850 using configuration file: /data/zabbix/etc/zabbix_server.conf
138401:20170630:172159.854 current database version (mandatory/optional): 03020000/03020000
138401:20170630:172159.854 required mandatory version: 03020000
138401:20170630:172200.238 __mem_malloc: skipped 0 asked 48 skip_min 4294967295 skip_max 0
138401:20170630:172200.238 [file:strpool.c,line:53] zbx_mem_malloc(): out of memory (requested 42 bytes)
138401:20170630:172200.238 [file:strpool.c,line:53] zbx_mem_malloc(): please increase CacheSize configuration parameter


Solution:

Open zabbix_server.conf and find the Option: CacheSize section.

Uncomment the original # CacheSize=8M line and change 8M to 1024M. Adjust this value according to the server's available memory.


# vim /data/zabbix/etc/zabbix_server.conf

......

CacheSize=1024M


Then restart zabbix_server
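
A minimal restart sequence, assuming the same /data/zabbix install prefix that appears in the log above:

# pkill zabbix_server
# /data/zabbix/sbin/zabbix_server -c /data/zabbix/etc/zabbix_server.conf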


6) Connections fail because the Zabbix database hits its connection limit

mysql> show variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 152   |
+-----------------+-------+
1 row in set (0.00 sec)
 
Here the limit is only 152 connections. It can be raised as follows:
1) Temporary modification
mysql> set GLOBAL max_connections=1024;
mysql> show variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 1024  |
+-----------------+-------+
1 row in set (0.00 sec)

 

2) Permanent modification

Add the following parameter under the [mysqld] section of the my.cnf file:

[mysqld]
max_connections=1000

 

Restart the mysql service
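
On this CentOS 6 setup the restart would typically be (the service name may be mysql or mysqld depending on how MySQL was installed):

# service mysqld restart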



7) The load shown in the CPU monitoring graph in the Zabbix web interface is around 0.002-0.0014, which is obviously wrong and does not match the actual CPU load reported by uptime on the server.


Solution:

Modify the template: Template OS Linux -> Items -> Processor load (1 min average per core), and change the key from

system.cpu.load[percpu,avg1] to system.cpu.load[all,avg1]

The percpu variant divides the load average by the number of CPU cores, so on a many-core machine the graphed value is much smaller than what uptime shows; the all variant reports the raw load average.
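
To cross-check on the host, query both keys with zabbix_get and compare with uptime (the IP and binary path are placeholders matching the earlier examples):

# /usr/local/zabbix/bin/zabbix_get -s 192.168.1.10 -k "system.cpu.load[percpu,avg1]"
# /usr/local/zabbix/bin/zabbix_get -s 192.168.1.10 -k "system.cpu.load[all,avg1]"
# uptime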


8) The following error appears in zabbix_server.log:


95213:20180101:154323.271 cannot send list of active checks to "10.0.8.20": host [jumpserver01.kevin.cn] not found
95212:20180101:154323.549 cannot send list of active checks to "10.0.56.21": host [cx-app02.kevin.cn] not found
95216:20180101:154324.768 cannot send list of active checks to "10.0.54.21": host [bl2-app02.kevin.cn] not found
95212:20180101:154325.072 cannot send list of active checks to "10.0.52.22": host [nc-app02.kevin.cn] not found


Cause Analysis:

The Hostname configured in the agent's zabbix_agentd.conf does not match the host name configured in the Zabbix web interface under Configuration -> Hosts. Make the two identical.
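
A quick way to see what the agent is actually sending (the agent config path is assumed; adjust it to your installation):

# grep "^Hostname=" /usr/local/zabbix/etc/zabbix_agentd.conf
Hostname=jumpserver01.kevin.cn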


9) The following error appears in zabbix_server.log:


95219:20180101:162139.869 fping failed: /usr/local/sbin/fping: can't create raw socket (must run as root?) : Operation not permitted
95219:20180101:162140.871 fping failed: /usr/local/sbin/fping: can't create raw socket (must run as root?) : Operation not permitted
95219:20180101:162141.874 fping failed: /usr/local/sbin/fping: can't create raw socket (must run as root?) : Operation not permitted


Solution:

1) Make sure the zabbix user on the Zabbix agent client has sudo permissions

[root@web01 ~]# chattr -i /etc/sudoers
[root@web01 ~]# chmod 640 /etc/sudoers
[root@web01 ~]# echo "zabbix  ALL=(ALL)      NOPASSWD: ALL" >> /etc/sudoers
[root@web01 ~]# chmod 440 /etc/sudoers
[root@web01 ~]# chattr +i /etc/sudoers

 

2) Set the setuid bit on fping on the Zabbix server side; this step is very important!

[root@zabbix01 ~]# ll /usr/local/sbin/fping
-rwxr-xr-x 1 root root 67110 Dec 11 17:18 /usr/local/sbin/fping
[root@zabbix01 ~]# chmod u+s /usr/local/sbin/fping
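
After chmod u+s, the setuid bit should show up in the permissions:

[root@zabbix01 ~]# ll /usr/local/sbin/fping
-rwsr-xr-x 1 root root 67110 Dec 11 17:18 /usr/local/sbin/fping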

 

Then switch to the zabbix user for testing

[root@zabbix01 ~]# su - zabbix
[zabbix@zabbix01 ~]$ /usr/local/sbin/fping -s oa-mob01.kevin.cn
oa-mob01.kevin.cn is alive
 
       1 targets
       1 alive
       0 unreachable
       0 unknown addresses
 
       0 timeouts (waiting for response)
       1 ICMP Echos sent
       1 ICMP Echo Replies received
       0 other ICMP received
 
 0.58 ms (min round trip time)
 0.58 ms (avg round trip time)
 0.58 ms (max round trip time)
        0.001 sec (elapsed real time)

 

If it returns "XX.XX.XX.XX is alive", everything is OK.
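
If fping lives in a non-default location, as it does here, also check that the FpingLocation parameter in zabbix_server.conf points at it (the value below matches the path used above):

FpingLocation=/usr/local/sbin/fping
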

10) Problem description: zabbix_agentd cannot be started on a monitored server (64-bit CentOS 6.8, 64 GB of memory); the process does not start and port 10050 does not come up.


Starting the zabbix_agentd process printed no error, but port 10050 did not come up:

[root@ctl ~]# /usr/local/zabbix/sbin/zabbix_agentd
[root@ctl ~]# ps -ef|grep zabbix_agent
root 27506 27360 0 11:07 pts/5 00:00:00 grep --color zabbix
[root@ctl etc]# lsof -i:10050


Check the /usr/local/zabbix/logs/zabbix_agentd.log log and find that the error is as follows:

................

27667:20161027:111554.851 cannot allocate shared memory of size 657056: [28] No space left on device

27667:20161027:111554.851 cannot allocate shared memory for collector

..............


Cause Analysis:

This is due to the kernel's limit on share memory.


Troubleshooting record:

[root@ctl logs]# ipcs -l
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 1940588
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 100
semaphore max value = 32767
------ Messages: Limits --------
max queues system wide = 32768
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536


You can see from the results of the above command:

The maximum size of a single segment is only about 1.9 GB (max seg size = 1940588 KB) and the maximum total shared memory is 8 GB (8388608 KB). On this host these limits were not enough for zabbix_agentd to allocate the shared memory its collector needs at startup (compare the defaults on similar hosts below).


To view the current shared memory settings:

[root@ctl logs]# sysctl -a|grep shm
kernel.shmmax = 1987162112
kernel.shmall = 2097152
kernel.shmmni = 4096
kernel.shm_rmid_forced = 0
vm.hugetlb_shm_group = 0


Here kernel.shmall is the total amount of shared memory that can be allocated, counted in pages (2097152 pages, i.e. 8 GB with 4 KB pages), and kernel.shmmax is the maximum size in bytes of a single segment (1987162112 bytes, roughly 1.9 GB). Compared with the defaults on other 64-bit hosts (see below), these values are clearly too small.
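
Note that kernel.shmall is counted in pages, so converting it to bytes means multiplying by the page size (usually 4096):

# getconf PAGE_SIZE
4096
# echo $(( 2097152 * 4096 ))
8589934592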


Then check /etc/sysctl.conf

[root@ctl logs]# cat /etc/sysctl.conf
........
kernel.shmall = 2097152
kernel.shmmax = 1987162112


Clearly, the kernel.shmall and kernel.shmmax values set in the sysctl.conf file are too small.


--------------------------------------------------------------------------------------------------------------------------------------------------

This machine is a 64-bit CentOS 6.8 system with 64 GB of memory. Checking other monitored servers running the same system shows:

[root@bastion-IDC ~]# cat /etc/sysctl.conf 
........
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
[root@ctl logs]# ipcs -l
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 67108864
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 100
semaphore max value = 32767
------ Messages: Limits --------
max queues system wide = 32768
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536


That is, on a 64-bit CentOS 6 system (64 GB of memory) the defaults for these two parameters are kernel.shmmax = 68719476736 (64 GB per segment) and kernel.shmall = 4294967296 (pages), large enough to cover all of the machine's memory.

---------------------------------------------------------------------------------------------------------------------------------------------------


Now you only need to increase these two parameters on this machine to solve the problem.

[root@ctl logs]# cat /etc/sysctl.conf
........
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
kernel.msgmnb = 65536
kernel.msgmax = 65536

Execute sysctl -p for the changes to take effect:

[root@ctl logs]# sysctl -p


Checking again shows that the modification has taken effect:

[root@ctl logs]# sysctl -a|grep shm
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
kernel.shmmni = 4096
kernel.shm_rmid_forced = 0
vm.hugetlb_shm_group = 0
[root@ctl logs]# ipcs -l
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 67108864
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 100
semaphore max value = 32767
------ Messages: Limits --------
max queues system wide = 32768
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536


Finally, restart zabbix_agentd and confirm that port 10050 now starts successfully:

[root@ctl ~]# /usr/local/zabbix/sbin/zabbix_agentd
[root@ctl logs]# ps -ef|grep zabbix
zabbix 27776 1 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd
zabbix 27777 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: collector [idle 1 sec]
zabbix 27778 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: listener #1 [waiting for connection]
zabbix 27779 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: listener #2 [waiting for connection]
zabbix 27780 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: listener #3 [waiting for connection]
zabbix 27781 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: active checks #1 [idle 1 sec]
root 28188 27360 0 11:48 pts/5 00:00:00 grep --color zabbix
[root@ctl logs]# lsof -i:10050
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
zabbix_ag 27776 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)
zabbix_ag 27777 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)
zabbix_ag 27778 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)
zabbix_ag 27779 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)
zabbix_ag 27780 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)
zabbix_ag 27781 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)

