023 - Zabbix Performance Optimization: Several Practical Suggestions

As Zabbix becomes widely used, many people find that their Zabbix server has hit a performance bottleneck, or expect it to hit one in the future. The sections below discuss a few simple and effective optimizations.

Server Hardware

Hoping that a few simple configuration changes will multiply server performance is a nice idea, but it is basically unrealistic. Simply put, you need a better CPU, more memory, and faster disks. If the budget permits, consider buying SSDs, which help more than a better CPU or more memory; otherwise consider 15K SAS drives in a RAID array, and so on. In short, if you are not willing to touch the configuration or invest in more hardware, do not rack your brains searching for articles on how to optimize Zabbix: you are wasting your time.

Operating System

Use the latest operating system, and tune or customize the kernel. This should have some effect, but it will certainly not be dramatic.

Database Optimization

DBSocket optimization

If MySQL and the Zabbix server run on the same host, connecting through a Unix socket is faster than connecting over TCP.

Database Separation

Put the database on its own server so that the database and Zabbix do not compete for the same resources. For example, you can buy a managed RDS instance.

Database Engine

Use MySQL 5.6 or later; since Oracle acquired MySQL, its performance has improved a lot. Be sure to choose InnoDB rather than MyISAM: for Zabbix, InnoDB is about 1.5 times faster than MyISAM, and MyISAM is unsafe. Zabbix monitoring produces a large volume of data, and once a table is corrupted, it is a disaster.

MySQL partitioning: for tables that hold a large amount of data, such as the history tables, try partitioning to improve performance.
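As a rough sketch of what time-based partitioning of a history table could look like, the Python snippet below only generates the MySQL DDL text; it does not connect to a database. It assumes daily partitions on the `clock` column (the Unix timestamp column of the Zabbix history tables), UTC day boundaries, and a hypothetical `pYYYY_MM_DD` naming scheme. None of these choices come from the original article, so adapt them to your own schema and retention policy.

```python
from datetime import date, datetime, timedelta, timezone


def daily_partitions(first_day: date, days: int) -> list[str]:
    """Return one PARTITION clause per day, bounded by the next UTC midnight."""
    clauses = []
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        # Rows with clock < upper bound (next midnight, UTC) fall into this partition.
        upper = datetime.combine(day + timedelta(days=1),
                                 datetime.min.time(), tzinfo=timezone.utc)
        clauses.append(
            f"PARTITION p{day:%Y_%m_%d} VALUES LESS THAN ({int(upper.timestamp())})"
        )
    return clauses


def partition_ddl(table: str, first_day: date, days: int) -> str:
    """Build an ALTER TABLE statement that range-partitions `table` by clock."""
    body = ",\n  ".join(daily_partitions(first_day, days))
    return f"ALTER TABLE {table} PARTITION BY RANGE (clock) (\n  {body}\n);"


if __name__ == "__main__":
    # Hypothetical start date and partition count, for illustration only.
    print(partition_ddl("history", date(2024, 1, 1), 3))
```

Dropping an old partition is much cheaper than deleting the same rows one by one, which is why partitioning is usually paired with turning off housekeeping for the partitioned tables.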

Other optimizations

1. Shorten the history retention period

2. Lengthen item collection intervals (collect less often)

3. Remove unnecessary items

When the earlier methods are not feasible or have no effect, be sure to consider these three suggestions. In monitoring environments these are mistakes everyone makes: most items do not need their data kept for so long, some items are simply meaningless, and some collection intervals are too short. I am guilty of this myself, but because my Zabbix has performed well so far I have not corrected it yet; more data points are better than fewer, aren't they?
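To see how much the first two suggestions matter, here is a minimal Python sketch of a history storage estimate: values per day, times days retained, times bytes per value. The default of 90 bytes per numeric value is a rough sizing assumption (real row size depends on the value type and the database), and the item counts in the example are made up.

```python
def history_bytes(items: int, interval_s: float, retention_days: float,
                  bytes_per_value: int = 90) -> float:
    """Estimate history storage in bytes.

    bytes_per_value is an assumed average row size, not a measured one.
    """
    values_per_day = items * 86400 / interval_s
    return values_per_day * retention_days * bytes_per_value


if __name__ == "__main__":
    gib = 1024 ** 3
    # Hypothetical installation: 3000 items, 60 s interval, 30 days of history.
    baseline = history_bytes(3000, 60, 30)
    longer_interval = history_bytes(3000, 120, 30)   # collect half as often
    shorter_retention = history_bytes(3000, 60, 15)  # keep half as long
    print(f"baseline:          {baseline / gib:.1f} GiB")
    print(f"interval doubled:  {longer_interval / gib:.1f} GiB")
    print(f"retention halved:  {shorter_retention / gib:.1f} GiB")
```

Doubling the interval or halving the retention period each cuts the estimate in half, and removing an item removes its share entirely.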


1. Symptoms of poor Zabbix performance:

  • The Zabbix queue holds too many delayed items (see Administration → Queue)
  • Zabbix graphs often have gaps, and some items have no data
  • Triggers that use the nodata() function raise false alarms
  • The frontend pages respond slowly or not at all

    The load on the unreachable pollers rises in two situations:
    a. A device is monitored through the Zabbix agent, but the machine crashes or the agent dies for some other reason, so the server cannot obtain any data.
    b. A device is monitored through the Zabbix agent, but the agent takes too long to return data to the server and often exceeds the server's timeout.

   How to measure Zabbix performance:

         Zabbix performance is measured in NVPS (new values per second). The Zabbix dashboard shows a rough estimate of this number.
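The NVPS figure can be approximated by hand: every item contributes one value per collection interval. The following Python sketch does that sum. The mapping from interval to item count is hypothetical input that you would take from your own item configuration; this is an approximation of the idea, not the frontend's actual calculation.

```python
def nvps(items_by_interval: dict[int, int]) -> float:
    """Approximate new values per second.

    items_by_interval maps a collection interval in seconds to the number
    of enabled items that use that interval.
    """
    return sum(count / interval for interval, count in items_by_interval.items())


if __name__ == "__main__":
    # Hypothetical mix: 3000 items every 60 s and 600 items every 30 s.
    print(nvps({60: 3000, 30: 600}))
```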

  

2. Some principles of Zabbix performance optimization:

  • Make sure the performance of Zabbix's own internal components is monitored (this is the basis of tuning!)
  • Use server hardware that performs well enough
  • Separate the different roles onto their own servers
  • Use a distributed deployment
  • Tune MySQL performance
  • Tune the Zabbix configuration itself

3. Common reasons why Zabbix is slow:

  • Zabbix server hardware: a better CPU, more memory, and faster disks are recommended
  • Zabbix architecture: if the overall deployment is too large, use distributed proxies so that each server works independently
  • Too much data: the NVPS is too high for Zabbix to keep up
  • The housekeeper is misconfigured, so the database keeps growing
  • The frontend shows too many hosts, so it queries too much data
  • Item operating mode and trigger design: triggers that are too complex

   

3.1 Understand the current working state of Zabbix

     Get the internal state of Zabbix:

         zabbix[wcache,values,all]

         zabbix[queue,1m] ---- number of items delayed by more than 1 minute


    Get the running state of the Zabbix internal processes (the value is the percentage of time a process type spends in the given state, for example busy):

       zabbix[process,type,mode,state]

    The available parameters are:

  • type: trapper, discoverer, escalator, alerter, etc.
  • mode: avg, count, min, max
  • state: busy, idle
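A small Python helper can assemble these internal item keys consistently. This is only an illustrative sketch: the function name `process_key` is made up, and only `mode` and `state` are validated, because the list of process types above is open-ended.

```python
MODES = {"avg", "count", "min", "max"}
STATES = {"busy", "idle"}


def process_key(proc_type: str, mode: str = "avg", state: str = "busy") -> str:
    """Build a zabbix[process,type,mode,state] internal item key."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if state not in STATES:
        raise ValueError(f"unknown state: {state!r}")
    return f"zabbix[process,{proc_type},{mode},{state}]"


if __name__ == "__main__":
    for proc in ("trapper", "discoverer", "escalator", "alerter"):
        print(process_key(proc))
```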


3.2 Zabbix performance optimization: item operating mode and trigger optimization

  • Add proxy nodes to reduce the load on the server. (Use this when the methods below do not help.)
  • Zabbix items work in passive mode by default; switching them to active mode can improve server performance.

   This section focuses on active mode. To use active checks:

  ① Adjust zabbix_agentd.conf

LogFile=/tmp/zabbix_agentd.log
# IP address of the Zabbix server
Server=xxx.xxx.xxx.xxx
# Where the agent sends the data it collects
ServerActive=xxx.xxx.xxx.xx
# Agent hostname; must match the host name used when the host was added on the server
Hostname=yyy.yyy.yyy.yyy
RefreshActiveChecks=60
BufferSize=10000
MaxLinesPerSecond=200
Timeout=30

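     The most important parameters are ServerActive and Hostname. ServerActive specifies where the agent sends the data it collects. Hostname must match the host name used when the host was added on the server, so that the server can map incoming data to the right host. To stay compatible with passive mode, StartAgents is not set to 0 here; if you use active mode from the start, it is recommended to set StartAgents to 0 to disable passive checks.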
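As a quick sanity check before rolling out active mode, a script along the following lines could parse an agent configuration and flag the two parameters discussed above. This is a minimal sketch: the function names are hypothetical, the parser handles only plain `Key=Value` lines and `#` comments, and a missing `Hostname` is reported only because this article recommends setting it explicitly.

```python
def parse_conf(text: str) -> dict[str, str]:
    """Parse simple Key=Value lines, skipping blanks and # comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def active_mode_problems(params: dict[str, str]) -> list[str]:
    """List settings that this article asks for in active mode but are missing."""
    problems = []
    for required in ("ServerActive", "Hostname"):
        if not params.get(required):
            problems.append(f"{required} is not set")
    return problems


if __name__ == "__main__":
    # Placeholder address and hostname, for illustration only.
    sample = """
    # comment lines are ignored
    Server=192.0.2.10
    ServerActive=192.0.2.10
    StartAgents=0
    """
    print(active_mode_problems(parse_conf(sample)))
```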
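  ② Adjust zabbix_server.conf

    # Fewer poller processes (which actively collect data): reduced from the original 500 to 100
    StartPollers=100
    # More trapper processes (which handle data pushed by agents): raised from the original 50
    StartTrappers=200

  ③ Adjust the template

    a. Take any existing template as an example: clone it and rename it, say to TEST
    b. In template TEST, change the type of every item, including the items in discovery rules, to "Zabbix agent (active)"

    The active-checks agent deployment is now complete, and the template's items can be viewed in Overview.

    Among trigger functions, last() and nodata() are the fastest, while min(), max(), and avg() are the slowest. Use the fast functions wherever possible.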

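3.3 Too much data: the NVPS is too high for Zabbix to keep up

    The queue overview shows which items cause the slowdown: if the "More than 10 minutes" column contains entries, the corresponding items produce more data than Zabbix can process in time.

Solutions:

  • Modify the items
  • Adjust the item intervals (the main approach), and increase the Zabbix agent timeout

Note:

To change how often unsupported items are rechecked, go to Administration → General, choose Other from the drop-down menu on the right, and change the value of "Refresh unsupported items (in sec)", which means "recheck the not-supported items every this many seconds".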

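3.4 Tune MySQL performance

 Once a distributed architecture is in place, the performance bottleneck is most likely to be in the database.

    • Disable the housekeeper and partition the history tables
    • Raise the StartDBSyncers parameter in zabbix_server.conf; it sets the number of processes that write data from Zabbix into the database
      • Background: over the past few days, Zabbix alerts had been taking a long time to recover and the pages were laggy. A packet capture showed that recent normal values were in fact being received, but the dashboard did not update; it only caught up after the zabbix_server process was restarted.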

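        1. Zabbix performance overview

        When Zabbix performance is poor, several symptoms appear: the frontend pages stop responding or become laggy, the queue fails to update, graphs frequently show gaps or no data at all, some items receive no data, and most of the items in the queue are delayed.

        How to judge zabbix-server performance

        The Zabbix status panel on the home page shows the number of hosts, items, and triggers. Performance is gauged by the NVPS (new values per second) figure, which is computed in the PHP frontend code and reflects the overall processing speed of zabbix-server.

        The database load that a given NVPS produces is directly related to the History and Trends retention periods. In the Zabbix status of my installation there was still plenty of room for improvement: host templates can be adjusted, and disabled and unsupported items and triggers can be cleaned up.

        The numbers on my system are very low because the server is fairly old, and so are its Zabbix and MySQL versions.

        Zabbix's monitoring of its own server queue shows which type of item is causing the performance problem. The more items are waiting in the queue and the longer they wait, the worse zabbix-server is performing. You can then adjust the affected item types, for example by changing the item intervals or by building templates tailored to each item type and trimming them to the minimum. If none of this helps, the Zabbix server is overloaded; build a distributed architecture with proxies so that part of the server's load moves to the proxies.

        (Figure: the queue after my adjustments.)

        (Figure: the queue before the adjustments.)

        Before the adjustments, several items showed delays of as much as 3 years.

        First disable the items with these extremely long delays, then check whether the queue changes.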

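        2. Zabbix configuration file optimization

        Zabbix's built-in templates also monitor the state of each worker process, which makes it possible to analyze performance during data collection: how busy the collection processes are and how much cache space is in use. Pay particular attention to:

        Zabbix busy housekeeper processes, in %  ## percentage of time the housekeeper processes are busy

        Zabbix busy history syncer processes, in %  ## percentage of time the history syncers (which write to the database) are busy

        Zabbix busy poller processes, in %  ## percentage of time the poller processes are busy

        Zabbix busy unreachable poller processes, in %  ## percentage of time the unreachable pollers are busy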

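        [root@localhost ~]# vim /etc/zabbix/zabbix_server.conf

        # The first part of the configuration file holds the basic parameters that are set when Zabbix is first installed. Tuning starts at the advanced parameters line; the parameters relevant to tuning are annotated below.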

        ############ ADVANCED PARAMETERS #################

        ### Option: StartPollers

        # Number of pre-forked instances of pollers.

        #

        # Mandatory: no

        # Range: 0-1000

        # Default:

        StartPollers=5

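        # Range 0-1000, default 5. Number of poller processes that handle items. Raising it too far hurts the server's own performance, so keep this value as low as possible; about 80 is enough for roughly 20,000 items.

        StartIPMIPollers=0
        # Number of IPMI poller instances. Change this if the server monitors over IPMI; the default of 0 disables them.

        StartPollersUnreachable=10
        # Number of pollers for unreachable hosts. This setting is particularly expensive; 10-20 is enough. Default 1.

        StartTrappers=5
        # Number of processes that handle data pushed by agents and proxies. Default 5; increase it if many of your items are Zabbix agent items.

        StartPingers=1
        # Number of ICMP pinger instances. Default 1; a value between 20 and 50 is suggested, depending on your workload.

        StartDiscoverers=1
        # Number of discoverer instances. Default 1, range 0-250.

        StartHTTPPollers=1
        # Number of HTTP poller instances. Default 1, range 0-1000. Keep the default if you do little web monitoring.

        HousekeepingFrequency=1
        # How often Zabbix runs the housekeeper (in hours) to delete outdated data from the database. To keep the housekeeper from being overloaded (for example, when the history and trend periods are greatly reduced), no more than 4 times HousekeepingFrequency hours of outdated data are deleted per item in one housekeeping cycle. So if HousekeepingFrequency is 1, no more than 4 hours of outdated data are deleted per cycle. To lower the load at server start, housekeeping is postponed for 30 minutes after startup. Default 1, maximum 24; 0 disables automatic housekeeping.

        MaxHousekeeperDelete=5000
        # Maximum number of rows deleted in one housekeeping cycle. Default 5000. If set to 0, the number of deleted rows is unlimited, which in most cases will bring the database down.

        CacheSize=8M
        # Cache size, in bytes. Size of the shared memory that stores host, item, and trigger data. Default 8M, maximum 8G. Set it according to the needs of your own Zabbix installation.

        CacheUpdateFrequency=60
        # How often the Zabbix cache is updated, in seconds. Range 1-3600.

        HistoryCacheSize=1024M
        # History data cache size, in bytes.

        TrendCacheSize=256M
        # Trend data cache size, in bytes. Size of the shared memory used to store trend data.

        ValueCacheSize=1024M
        # History value cache size, in bytes. Size of the shared memory used to cache item history data requests. Setting it to 0 disables the value cache (not recommended).

        Timeout=3
        # Timeout, in seconds, for agent, SNMP device, and external checks. Range 1-30.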
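The HousekeepingFrequency rule described above (at most 4 times HousekeepingFrequency hours of outdated data removed per item in each cycle) means that a large backlog takes many cycles to clear. The Python sketch below works through that arithmetic. It is a simplification: it ignores the additional data that becomes outdated while the housekeeper is catching up, and it ignores the MaxHousekeeperDelete row limit.

```python
import math


def cycles_to_clear(backlog_hours: float, housekeeping_frequency_h: int = 1) -> int:
    """Housekeeping cycles needed to delete a backlog of outdated data.

    Each cycle removes at most 4 * HousekeepingFrequency hours per item.
    """
    per_cycle_hours = 4 * housekeeping_frequency_h
    return math.ceil(backlog_hours / per_cycle_hours)


def hours_to_clear(backlog_hours: float, housekeeping_frequency_h: int = 1) -> int:
    """Elapsed hours until the backlog is gone, with one cycle per frequency period."""
    return cycles_to_clear(backlog_hours, housekeeping_frequency_h) * housekeeping_frequency_h


if __name__ == "__main__":
    # Hypothetical case: history retention cut from 90 days to 30 days,
    # leaving 60 days (1440 hours) of outdated data per item.
    backlog = 60 * 24
    print(cycles_to_clear(backlog), "cycles,", hours_to_clear(backlog), "hours")
```

Under these assumptions a sharp cut in retention is worked off gradually over days rather than in one pass.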

        The above are the main parameters for tuning. The other settings are briefly described in the configuration file itself and can be changed according to business needs. Reasonable parameter values keep Zabbix in a healthy working state; the larger the values, the higher the CPU and memory consumption. After modifying the configuration file, restart the zabbix-server process so that the new configuration takes effect.
        When the methods above are not effective, consider clearing the history and trend data; note that TRUNCATE permanently deletes every row in the table. If you need to keep the data, the only option is sharding (which this article does not cover).

        truncate table history;
        truncate table history_str;
        truncate table history_uint;
        truncate table trends;
        truncate table trends_uint;
         


Origin www.cnblogs.com/xuefy/p/11447956.html