Raising the monitoring capacity of a single Zabbix server by modifying the source code

With the recent rapid growth of the business, the NVPS (required server performance, new values per second) handled by our Zabbix server has passed six figures, which may be the maximum throughput of a single server. Yet the key performance indicators of the Zabbix server host, such as CPU load, memory usage, network I/O, disk I/O, and MySQL QPS, are not high, mainly because the hardware is generous: 64 cores, 128 GB of memory, and SSD storage. The architecture is a single server, a single MySQL database, and multiple agent nodes. The data queues of several agents became delayed and alerting behaved abnormally. When that happens, the only temporary remedy is to clear the data on the affected agent nodes.

Looking at the Zabbix server's own performance metrics, several stand out whenever the queue delay occurs. "Zabbix busy history syncer processes" reaches 100% (these processes take the data received from the agents and flush it into the database), while "Zabbix history write cache free" and "Zabbix value cache free" are both close to 0. The corresponding settings in the Zabbix server configuration file were already at their maximum allowed values. Since the load on the host and on the Zabbix server itself is low, the server's resources are clearly not being put to good use. If the upper limits of these parameters could be raised in the source code, that might buy some time before the Zabbix server has to be split. Searching for the parameter names led to src/zabbix_server/server.c, which defines the minimum and maximum values of the server configuration parameters. I raised the maximums, recompiled, and restarted the server. After a few days of observation the earlier symptoms had disappeared, and MySQL QPS had risen by nearly 30%.

One more metric comes under growing pressure as the number of items and hosts increases: "Zabbix busy configuration syncer processes, in %" (the process that synchronizes monitoring configuration data from the server to the agents). The source code fixes it at a single process. In testing, this value could not usefully be changed: with more than one process, each one queries the database at the same time, the same SQL is executed several times, and efficiency drops. I suggest that the Zabbix developers rework this synchronization so that the data is queried once and then distributed by multiple processes, which would improve the performance of configuration synchronization.

I also suggest that the Zabbix developers raise the maximum values of these caches appropriately, so that server resources can be used sensibly and the maximum throughput of a Zabbix server improves. A UI for managing multiple servers in one place would also make operations easier. From an operations point of view, the single-server architecture is the least trouble, but once the business grows past a certain size the server has to be split, and secondary development such as unified management of monitoring data is needed to keep the monitoring service stable. In my experience, even after tuning the database, hardware, and server configuration parameters, a single server should not be asked to handle more than 50,000 NVPS (new values per second); beyond that, operations and maintenance become especially demanding.

Finally, if the server hardware is modest and the monitoring volume is small, there is no point in applying this modification.
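To judge which side of the 50,000 guideline an installation falls on, NVPS can be roughly estimated from item counts and update intervals, since each item contributes about one value per interval. The sketch below is only an approximation under that assumption (it ignores items without a fixed interval); the `item_group_t` type and both function names are hypothetical and not part of Zabbix.

```c
#include <stddef.h>

/* Hypothetical description of a group of items sharing one update interval. */
typedef struct
{
	unsigned long	item_count;
	unsigned int	interval_sec;	/* update interval in seconds, must be > 0 */
}
item_group_t;

/* The single-server ceiling suggested in the text (a rule of thumb). */
#define NVPS_GUIDELINE	50000.0

/* Estimate new values per second: each item yields 1/interval values per second. */
static double	estimate_nvps(const item_group_t *groups, size_t count)
{
	double	nvps = 0.0;

	for (size_t i = 0; i < count; i++)
	{
		if (0 == groups[i].interval_sec)
			continue;	/* skip groups with an invalid interval */

		nvps += (double)groups[i].item_count / groups[i].interval_sec;
	}

	return nvps;
}

/* Return 1 if the estimate is above the guideline, 0 otherwise. */
static int	exceeds_guideline(double nvps)
{
	return nvps > NVPS_GUIDELINE;
}
```

For example, 3,000,000 items polled every 60 seconds come to 50,000 NVPS on their own, which sits exactly at the guideline; adding 600,000 items polled every 30 seconds pushes the estimate to 70,000.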

Maximum values changed in src/zabbix_server/server.c (parameter: original maximum -> new maximum):

StartDBSyncers: 100 -> 1000

CacheSize: 8 -> 64 (unit: G)

HistoryCacheSize: 2 -> 64 (unit: G)

HistoryIndexCacheSize: 2 -> 64 (unit: G)

TrendCacheSize: 2 -> 64 (unit: G)
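The effect of raising these limits can be illustrated with a minimal sketch. It is not the actual Zabbix code: the `param_limit_t` type, the `check_param` function, and the lower bounds of 1 are assumptions made for illustration; only the parameter names and the stock and patched maximums come from the list above. It models a configuration parser that rejects any value outside a per-parameter range, which is the kind of check the edit to server.c relaxes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GIB	(UINT64_C(1) << 30)

/* Hypothetical, simplified stand-in for one row of a configuration table:
 * a parameter name plus the range of values the parser will accept. */
typedef struct
{
	const char	*name;
	uint64_t	min;
	uint64_t	max;
}
param_limit_t;

/* Stock maximums as listed above; the minimums are placeholders for this sketch. */
static const param_limit_t	stock_limits[] = {
	{"StartDBSyncers",		1,	100},
	{"CacheSize",			1,	8 * GIB},
	{"HistoryCacheSize",		1,	2 * GIB},
	{"HistoryIndexCacheSize",	1,	2 * GIB},
	{"TrendCacheSize",		1,	2 * GIB},
};

/* The same table with the raised maximums. */
static const param_limit_t	patched_limits[] = {
	{"StartDBSyncers",		1,	1000},
	{"CacheSize",			1,	64 * GIB},
	{"HistoryCacheSize",		1,	64 * GIB},
	{"HistoryIndexCacheSize",	1,	64 * GIB},
	{"TrendCacheSize",		1,	64 * GIB},
};

#define LIMIT_COUNT	(sizeof(stock_limits) / sizeof(stock_limits[0]))

/* Return 1 if the value is within range, 0 if it is out of range,
 * and -1 if the parameter name is not in the table. */
static int	check_param(const param_limit_t *limits, size_t count, const char *name, uint64_t value)
{
	for (size_t i = 0; i < count; i++)
	{
		if (0 == strcmp(limits[i].name, name))
			return (limits[i].min <= value && value <= limits[i].max) ? 1 : 0;
	}

	return -1;
}
```

Under these assumptions, `check_param(stock_limits, LIMIT_COUNT, "HistoryCacheSize", 16 * GIB)` returns 0, while the same call against `patched_limits` returns 1. A larger value in the configuration file only takes effect once the compiled-in ceiling has been raised.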


Origin blog.51cto.com/wangwei007/2675900