Building a High-End MySQL Monitoring Platform

Overview

For MySQL monitoring, plenty of options already exist: the Lepus (天兔) monitoring system, as well as custom development on top of zabbix, and I expect many colleagues have already tried them. My own choice is prometheus + grafana. In short, I run prometheus in production, and grafana meets my day-to-day needs. For an introductory overview and installation instructions, see: https://blog.51cto.com/cloumn/detail/77

1, first, a look at the finished monitoring dashboards. MySQL master-slave replication:
[screenshot: master-slave replication dashboard]

2, MySQL status:
[screenshots: MySQL status dashboard]

3, buffer pool status:
[screenshot: buffer pool status dashboard]

Deploying the exporter

1, install the exporter:

[root@controller2 opt]# wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.10.0/mysqld_exporter-0.10.0.linux-amd64.tar.gz
[root@controller2 opt]# tar -xf mysqld_exporter-0.10.0.linux-amd64.tar.gz 

2, add the MySQL account (the password must match the one in .my.cnf in the next step):

GRANT SELECT, PROCESS, SUPER, REPLICATION CLIENT, RELOAD ON *.* TO 'exporter'@'%' IDENTIFIED BY '123456';
flush privileges;

3, create the client credentials file:

[root@controller2 mysqld_exporter-0.10.0.linux-amd64]# cat /opt/mysqld_exporter-0.10.0.linux-amd64/.my.cnf 
[client]
user=exporter
password=123456

4, create the systemd unit file:

[root@controller2 mysqld_exporter-0.10.0.linux-amd64]# cat /etc/systemd/system/mysql_exporter.service 
[Unit]
Description=MySQL Monitoring System (mysqld_exporter)
Documentation=https://github.com/prometheus/mysqld_exporter

[Service]
ExecStart=/opt/mysqld_exporter-0.10.0.linux-amd64/mysqld_exporter \
         -collect.info_schema.processlist \
         -collect.info_schema.innodb_tablespaces \
         -collect.info_schema.innodb_metrics  \
         -collect.perf_schema.tableiowaits \
         -collect.perf_schema.indexiowaits \
         -collect.perf_schema.tablelocks \
         -collect.engine_innodb_status \
         -collect.perf_schema.file_events \
         -collect.binlog_size \
         -collect.info_schema.clientstats \
         -collect.perf_schema.eventswaits \
         -config.my-cnf=/opt/mysqld_exporter-0.10.0.linux-amd64/.my.cnf

[Install]
WantedBy=multi-user.target

5, add the scrape job to the prometheus server configuration:

  - job_name: 'mysql'
    static_configs:
     - targets: ['192.168.1.11:9104','192.168.1.12:9104']

6, check that the exporter returns metrics:

http://192.168.1.12:9104/metrics

The mysql_up metric tells us whether the exporter can reach MySQL and monitoring is in effect; 1 means the server is up:

# HELP mysql_up Whether the MySQL server is up.
# TYPE mysql_up gauge
mysql_up 1
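
The same check can be scripted. Below is a minimal sketch, assuming the /metrics response has already been fetched as text; the helper name read_metric is my own for illustration and is not part of the exporter. It only handles samples without labels, such as mysql_up.

```python
def read_metric(exposition_text, name):
    """Return the value of the first unlabelled sample called `name`, or None."""
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # An unlabelled sample looks like "<name> <value>".
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


# Sample text copied from the exporter output shown above.
sample = """\
# HELP mysql_up Whether the MySQL server is up.
# TYPE mysql_up gauge
mysql_up 1
"""

mysql_is_up = read_metric(sample, "mysql_up") == 1.0
```

Returning None for a missing metric keeps "exporter did not report mysql_up at all" distinct from "mysql_up is 0".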

Key metrics to monitor

Before monitoring anything, we need to understand which metrics are actually meaningful and help us watch the service. For MySQL, we usually measure its operation through the following: master-slave replication status, query throughput, slow queries, number of connections, buffer pool usage, and query execution performance.

Master-slave replication metrics:

1, monitoring the replication threads:

Most companies run a master-slave replication environment, so monitoring the two replication threads is very important. We usually check them in the mysql client:


MariaDB [(none)]> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.1.1
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000045
          Read_Master_Log_Pos: 72904854
               Relay_Log_File: mariadb-relay-bin.000127
                Relay_Log_Pos: 72905142
        Relay_Master_Log_File: mysql-bin.000045
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

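# When both Slave_IO_Running and Slave_SQL_Running are Yes, the replication threads are running normally and the cluster is healthy.

In the samples returned by MySQLD Exporter, the health of the replication threads can be read from mysql_slave_status_slave_sql_running (and likewise mysql_slave_status_slave_io_running for the IO thread):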

# HELP mysql_slave_status_slave_sql_running Generic metric from SHOW SLAVE STATUS.
# TYPE mysql_slave_status_slave_sql_running untyped
mysql_slave_status_slave_sql_running{channel_name="",connection_name="",master_host="172.16.1.1",master_uuid=""} 1

2, master-slave replication lag time:

The output of show slave status contains a key field, Seconds_Behind_Master. It represents the delay between the SQL thread and the IO thread on the slave. In a MySQL replication environment, the slave first pulls the master's binlog to local storage (via the IO thread) and then replays it via the SQL thread; Seconds_Behind_Master measures the part of the local relay log that has not yet been executed. So once everything the slave has pulled into its relay log (which is really binlog, just called relay log on the slave) has been executed, show slave status reports 0:

Seconds_Behind_Master: 0

In the samples returned by MySQLD Exporter, this value is exposed as mysql_slave_status_seconds_behind_master:

# HELP mysql_slave_status_seconds_behind_master Generic metric from SHOW SLAVE STATUS.
# TYPE mysql_slave_status_seconds_behind_master untyped
mysql_slave_status_seconds_behind_master{channel_name="",connection_name="",master_host="172.16.1.1",master_uuid=""} 0

Query throughput:

How do we measure throughput?
In general, we can look at MySQL's insert, select, delete and update operations.

For an overall figure, MySQL has an internal counter called Questions (in MySQL terms, a server status variable) that is incremented by one for every statement a client sends. The client-centered view given by Questions is often easier to interpret than the related Queries counter, which also counts statements executed inside stored programs as well as commands such as PREPARE and DEALLOCATE PREPARE that run as part of server-side prepared statements. You can query it with:

MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Questions";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Questions     | 15071 |
+---------------+-------+

In the samples returned by MySQLD Exporter, mysql_global_status_questions reflects the current value of the Questions counter:

# HELP mysql_global_status_questions Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_questions untyped
mysql_global_status_questions 13253

Because prometheus has a rich query language, we can use rate() to see how fast this counter grows over a short window and build threshold alarms on it. For example, the average number of queries per second over the last 2 minutes:

rate(mysql_global_status_questions[2m])
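
To make clear what such a rate means, here is a simplified sketch of a per-second rate computed from raw counter samples. It is not Prometheus's actual implementation (the real rate() also extrapolates to the edges of the window); the function name and the sample values are assumptions for illustration only.

```python
def per_second_rate(samples):
    """samples: list of (unix_timestamp, counter_value) pairs, oldest first."""
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. mysqld restarted):
        # count the new value as the increase since the reset.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical Questions samples taken 60 seconds apart.
questions = [(0, 13253), (60, 13853), (120, 14453)]
qps = per_second_rate(questions)
```

Handling the reset case matters for counters such as Questions, which start again from zero whenever the MySQL server restarts.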

The figure above is a total. We can also break it down into reads and writes to better understand the database workload and find possible bottlenecks. Typically, read queries are counted by Com_select, while a write increments one of three status variables depending on the statement:

Writes = Com_insert + Com_update + Com_delete

Here we look at the insert count with:

MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Com_insert";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Com_insert    | 10578 |
+---------------+-------+

In the /metrics samples returned by MySQLD Exporter, the number of executions of each command type on the current instance is available through mysql_global_status_commands_total:

# HELP mysql_global_status_commands_total Total number of executed MySQL commands.
# TYPE mysql_global_status_commands_total counter
mysql_global_status_commands_total{command="create_trigger"} 0
mysql_global_status_commands_total{command="create_udf"} 0
mysql_global_status_commands_total{command="create_user"} 1
mysql_global_status_commands_total{command="create_view"} 0
mysql_global_status_commands_total{command="dealloc_sql"} 0
mysql_global_status_commands_total{command="delete"} 3369
mysql_global_status_commands_total{command="delete_multi"} 0
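
The read/write split described earlier (Writes = Com_insert + Com_update + Com_delete) can be sketched from these per-command counters. This is only an illustration: it assumes the counters have already been collected into a dict keyed by the command label, and the select and update values below are made up.

```python
def read_write_split(commands):
    """commands: dict mapping the `command` label to its counter value."""
    # Reads correspond to Com_select.
    reads = commands.get("select", 0)
    # Writes = Com_insert + Com_update + Com_delete.
    writes = sum(commands.get(c, 0) for c in ("insert", "update", "delete"))
    return reads, writes


# insert and delete are taken from the samples above; select and update are hypothetical.
counters = {"select": 100, "insert": 10578, "update": 5, "delete": 3369}
reads, writes = read_write_split(counters)
```

Since these are cumulative counters, in practice the split is more informative when applied to rates over a time window than to the raw totals.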

Slow query performance

For query performance, slow queries are another important alarm metric. MySQL provides the Slow_queries counter, which is incremented each time a query's execution time exceeds long_query_time. The default is 10 seconds; the current long_query_time can be checked with:

MariaDB [(none)]> SHOW VARIABLES LIKE 'long_query_time';
+-----------------+-----------+
| Variable_name   | Value     |
+-----------------+-----------+
| long_query_time | 10.000000 |
+-----------------+-----------+
1 row in set (0.00 sec)

# Of course, we can also change this threshold

MariaDB [(none)]> SET GLOBAL long_query_time = 5;
Query OK, 0 rows affected (0.00 sec)

Then we query the Slow_queries count of the MySQL instance with:

MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Slow_queries";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Slow_queries  | 0     |
+---------------+-------+
1 row in set (0.00 sec)

In the samples returned by MySQLD Exporter, mysql_global_status_slow_queries shows the current value of Slow_queries:

# HELP mysql_global_status_slow_queries Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_slow_queries untyped
mysql_global_status_slow_queries 0

Similarly, with prometheus we can query how fast slow queries are growing over a period of time:

rate(mysql_global_status_slow_queries[5m])

Connection monitoring

Monitoring client connections is very important: once the available connections are exhausted, new client connections are rejected. By default MySQL limits the number of connections to 151.


MariaDB [(none)]> SHOW VARIABLES LIKE 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 151   |
+-----------------+-------+

We can raise this value in the configuration file. Its counterpart is the number of current connections: when that exceeds the configured maximum, clients see the error Too many connections. The current number of connections can be checked with:

MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Threads_connected";
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Threads_connected | 41    |
+-------------------+-------+

MySQL also provides the Threads_running metric, which lets you distinguish the threads that are actively processing queries at a given moment from connections that are open but idle.

MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Threads_running";
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| Threads_running | 10    |
+-----------------+-------+

If the server really does reach the max_connections limit, it starts rejecting new connections. In that case the Connection_errors_max_connections counter starts to increase, as does Aborted_connects, which tracks all failed connection attempts.

Samples returned by MySQLD Exporter:

# HELP mysql_global_variables_max_connections Generic gauge metric from SHOW GLOBAL VARIABLES.
# TYPE mysql_global_variables_max_connections gauge
mysql_global_variables_max_connections 151         

# Indicates the maximum number of connections

# HELP mysql_global_status_threads_connected Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_threads_connected untyped
mysql_global_status_threads_connected 41

# Indicates the current number of connections

# HELP mysql_global_status_threads_running Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_threads_running untyped
mysql_global_status_threads_running 1

# Indicates the number of connections currently active

# HELP mysql_global_status_aborted_connects Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_aborted_connects untyped
mysql_global_status_aborted_connects 31

# Cumulative number of failed connection attempts

# HELP mysql_global_status_connection_errors_total Total number of MySQL connection errors.
# TYPE mysql_global_status_connection_errors_total counter
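mysql_global_status_connection_errors_total{error="internal"} 0
# Errors caused inside the server, e.g. running out of memory or disk
mysql_global_status_connection_errors_total{error="max_connections"} 0
# Errors caused by exceeding the maximum number of connections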

With a PromQL expression, we can check the number of connections still available:

mysql_global_variables_max_connections - mysql_global_status_threads_connected
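
The same arithmetic, written out as a small sketch using the sample values shown above. The helper name connection_headroom is my own; the used fraction is an extra figure that can be handier for thresholds than the absolute remainder.

```python
def connection_headroom(max_connections, threads_connected):
    """Return (remaining connections, fraction of max_connections in use)."""
    remaining = max_connections - threads_connected
    used_fraction = threads_connected / max_connections
    return remaining, used_fraction


# Values taken from the exporter samples above: max_connections=151, threads_connected=41.
remaining, used = connection_headroom(151, 41)
```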

To query the failed connection attempts recorded by MySQL:

mysql_global_status_aborted_connects

Buffer pool usage:

InnoDB, MySQL's default storage engine, uses an area of memory called the buffer pool to cache table and index data. Buffer pool metrics are resource metrics rather than work metrics, so they are mainly useful for investigating (rather than detecting) performance problems. If database performance starts to degrade while disk I/O rises, enlarging the buffer pool often restores performance.
By default the buffer pool is relatively small, at 128MiB. MySQL's recommendation, however, is that on a dedicated database server it can be raised to as much as 80% of physical memory. We can check the current size:

MariaDB [(none)]> show global variables like 'innodb_buffer_pool_size';
+-------------------------+-----------+
| Variable_name           | Value     |
+-------------------------+-----------+
| innodb_buffer_pool_size | 134217728 |
+-------------------------+-----------+

In the samples returned by MySQLD Exporter, this is exposed as mysql_global_variables_innodb_buffer_pool_size:

# HELP mysql_global_variables_innodb_buffer_pool_size Generic gauge metric from SHOW GLOBAL VARIABLES.
# TYPE mysql_global_variables_innodb_buffer_pool_size gauge
mysql_global_variables_innodb_buffer_pool_size 1.34217728e+08

Innodb_buffer_pool_read_requests records the number of read requests made to the buffer pool. It can be viewed with the following command:

MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Innodb_buffer_pool_read_requests";
+----------------------------------+-------+
| Variable_name                    | Value |
+----------------------------------+-------+
| Innodb_buffer_pool_read_requests | 38465 |
+----------------------------------+-------+

In the samples returned by MySQLD Exporter, this is exposed as mysql_global_status_innodb_buffer_pool_read_requests:

# HELP mysql_global_status_innodb_buffer_pool_read_requests Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_innodb_buffer_pool_read_requests untyped
mysql_global_status_innodb_buffer_pool_read_requests 2.7711547168e+10

When a request cannot be satisfied from the buffer pool, MySQL has to read the data from disk. Innodb_buffer_pool_reads records the number of such disk reads. Reading from memory is generally much faster than reading from disk, so a growing Innodb_buffer_pool_reads may indicate a database performance problem. Its current value can be checked with:

MariaDB [(none)]> SHOW GLOBAL STATUS LIKE "Innodb_buffer_pool_reads";
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Innodb_buffer_pool_reads | 138   |
+--------------------------+-------+
1 row in set (0.00 sec)

In the samples returned by MySQLD Exporter, this is exposed as mysql_global_status_innodb_buffer_pool_reads:

# HELP mysql_global_status_innodb_buffer_pool_reads Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_innodb_buffer_pool_reads untyped
mysql_global_status_innodb_buffer_pool_reads 138

With the metrics above and the actual monitoring scenario in mind, we can use PromQL to build monitoring queries quickly. For example, the rate of disk reads over the last two minutes:

rate(mysql_global_status_innodb_buffer_pool_reads[2m])
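
The two buffer pool counters are often combined into a hit ratio, which this article does not compute itself. The sketch below is my own addition under that assumption: it treats Innodb_buffer_pool_reads as the misses among Innodb_buffer_pool_read_requests and uses the sample values from the SHOW GLOBAL STATUS output above.

```python
def buffer_pool_hit_ratio(read_requests, disk_reads):
    """Fraction of read requests served from memory instead of disk."""
    if read_requests <= 0:
        # No requests yet, so no meaningful ratio.
        return None
    return 1.0 - disk_reads / read_requests


# Values from the SHOW GLOBAL STATUS samples above.
ratio = buffer_pool_hit_ratio(read_requests=38465, disk_reads=138)
```

Because both inputs are counters accumulated since server start, a ratio over a recent window (computed from rates) reflects current behaviour better than the lifetime figure.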

Official dashboard template IDs

The above is only a short list of metrics. Here we use grafana to add monitoring charts for MySQLD_Exporter:

  • Master-slave replication monitoring (template 7371)
  • MySQL status monitoring (template 7362)
  • Buffer pool status (template 7365)

Simple alarm rules

1, dashboards alone do not make the monitoring complete; we also need alarm rules. The rules we use are listed below:

groups:
- name: MySQL-rules
  rules:
  - alert: MySQL Status
    expr: mysql_up == 0
    for: 5s
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: MySQL has stopped !!!"
      description: "Checks whether the MySQL database is running"

  - alert: MySQL Slave IO Thread Status
    expr: mysql_slave_status_slave_io_running == 0
    for: 5s 
    labels:
      severity: warning
    annotations: 
      summary: "{{$labels.instance}}: MySQL Slave IO Thread has stopped !!!"
      description: "Checks the state of the MySQL replication IO thread"

  - alert: MySQL Slave SQL Thread Status 
    expr: mysql_slave_status_slave_sql_running == 0
    for: 5s 
    labels:
      severity: warning
    annotations: 
      summary: "{{$labels.instance}}: MySQL Slave SQL Thread has stopped !!!"
      description: "Checks the state of the MySQL replication SQL thread"

  - alert: MySQL Slave Delay Status
    expr: mysql_slave_status_seconds_behind_master > 30
    for: 5s
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: MySQL Slave Delay is more than 30s !!!"
      description: "Checks the MySQL master-slave replication delay"

  - alert: Mysql_Too_Many_Connections
    expr: mysql_global_status_threads_connected > 200
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: too many connections"
      description: "{{$labels.instance}}: too many connections, please investigate (current value is: {{ $value }})"

  - alert: Mysql_Too_Many_slow_queries
    expr: rate(mysql_global_status_slow_queries[5m]) > 3
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{$labels.instance}}: too many slow queries, please investigate"
      description: "{{$labels.instance}}: Mysql slow_queries is more than 3 per second ,(current value is: {{ $value }})"

2, add the rule file to the prometheus configuration:


rule_files:
  - "rules/*.yml" 

3, open the web UI to confirm that the rules are in effect:

[screenshot: alert rules in the prometheus web UI]

Summary

This completes the basic monitoring of MySQL status. You can extend it with further MySQL metrics as needed. This is the setup I use in my production environment, offered here for reference.


Origin blog.51cto.com/xiaoluoge/2476375