Statistics Record Tables | A Comprehensive Understanding of the MySQL System Library

In the previous issue, "Database Object Information Record Table | A Comprehensive Understanding of the MySQL System Library", we introduced the metadata record tables in the mysql system library in detail. This issue is the fourth part of the series, "Statistics Record Tables | A Comprehensive Understanding of the MySQL System Library". Follow along as we continue the systematic tour of the mysql system library.

1 | Overview of Statistics

How to configure persistent optimizer statistics.

  • The persistent statistics feature stores in-memory statistics on disk so that they can be read back quickly after the database restarts, without being recalculated. This lets the query optimizer choose execution plans consistently based on the persisted statistics. Without persistence, the in-memory statistics are lost when the database restarts and must be recalculated the next time a table is accessed; because the recalculated estimates may differ, the query plan may change, and query performance with it. How is persistence of statistics enabled? It is enabled when innodb_stats_persistent = ON, or when a table is created with the table option STATS_PERSISTENT = 1. The former enables persistent statistics globally for all tables, and the innodb_stats_persistent system variable is enabled by default. The latter applies to a single table only and takes effect regardless of the innodb_stats_persistent setting. To turn off persistent statistics for an individual table, use the statement ALTER TABLE tbl_name STATS_PERSISTENT = 0.

  • Persistent statistics are stored in the mysql.innodb_table_stats and mysql.innodb_index_stats tables. The former stores statistics related to table structure and data rows, and the latter stores statistics related to index values.
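The settings described above can be inspected and changed with a few statements. This is only a minimal sketch: the table name t1 is a placeholder chosen for illustration, and changing the global variable requires sufficient privileges.

```sql
-- Check the global default for persistent statistics.
SELECT @@GLOBAL.innodb_stats_persistent;

-- Enable persistent statistics for every table that does not override the setting.
SET GLOBAL innodb_stats_persistent = ON;

-- Turn persistent statistics off for one table only (t1 is a hypothetical table).
ALTER TABLE t1 STATS_PERSISTENT = 0;

-- Let the table follow the global setting again.
ALTER TABLE t1 STATS_PERSISTENT = DEFAULT;
```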

How to configure automatic recalculation of persistent optimizer statistics.

  • The innodb_stats_auto_recalc system variable controls whether statistics are recalculated automatically. It is enabled by default. When it is enabled, a recalculation is triggered once more than 10% of the data in a table has changed. If the innodb_stats_auto_recalc variable is not enabled, you can still configure automatic recalculation for a single table with the STATS_AUTO_RECALC clause of the CREATE TABLE or ALTER TABLE statement.

  • Automatic recalculation runs in the background, so even when innodb_stats_auto_recalc is enabled, statistics may not be recalculated immediately after DML operations have changed more than 10% of a table. In some cases the recalculation may be delayed by a few seconds. If up-to-date statistics are required right away, run the ANALYZE TABLE statement manually to make sure the optimizer statistics are accurate.

  • When a new index is added to a table, index statistics are calculated and added to the innodb_index_stats table regardless of the value of the innodb_stats_auto_recalc system variable. Note, however, that this triggers recalculation of the index statistics only, not of the table and data statistics in mysql.innodb_table_stats. If you want the data statistics in mysql.innodb_table_stats to be updated when an index is added, you need to enable the innodb_stats_auto_recalc system variable or the STATS_AUTO_RECALC option of the table, or run the ANALYZE TABLE statement on the table.
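As a rough illustration of the points above, the following sketch turns automatic recalculation off for one table and then refreshes the statistics by hand. The schema name test and the table name t1 are assumptions made for the example.

```sql
-- Check whether automatic recalculation is enabled globally.
SELECT @@GLOBAL.innodb_stats_auto_recalc;

-- Disable automatic recalculation for a single (hypothetical) table.
ALTER TABLE t1 STATS_AUTO_RECALC = 0;

-- After a large data change, recalculate the statistics manually.
ANALYZE TABLE t1;

-- Confirm that the table-level statistics row was refreshed.
SELECT last_update, n_rows
FROM mysql.innodb_table_stats
WHERE database_name = 'test' AND table_name = 't1';
```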

How to configure persistent optimizer statistics for a single table.

  • innodb_stats_persistent, innodb_stats_auto_recalc and innodb_stats_persistent_sample_pages are global system variables. To override these global values and configure persistent statistics for an individual table, use the table options STATS_PERSISTENT, STATS_AUTO_RECALC and STATS_SAMPLE_PAGES. These options can be specified in the CREATE TABLE or ALTER TABLE statement.

    * STATS_PERSISTENT: Specifies whether persistent statistics are enabled for an InnoDB table. If not set, the value is DEFAULT, which means that the setting for the table is determined by the innodb_stats_persistent system variable. A value of 1 enables persistent statistics for the table, and a value of 0 turns them off. If you enable persistent statistics through a CREATE TABLE or ALTER TABLE statement, run the ANALYZE TABLE statement to calculate the statistics after representative data has been loaded into the table.

    * STATS_AUTO_RECALC: Specifies whether the persistent statistics of an InnoDB table are recalculated automatically. The default value is DEFAULT, which means that the behavior for the table is determined by the innodb_stats_auto_recalc system variable. A value of 1 enables automatic recalculation, so that statistics are recalculated when 10% of the data in the table has changed. A value of 0 turns automatic recalculation off for the table. In that case, if the data in the table changes substantially, run the ANALYZE TABLE statement manually to recalculate the statistics; otherwise inaccurate statistics may lead to poor execution plans.

    * STATS_SAMPLE_PAGES: Sets the number of index pages to sample when estimating cardinality and other statistics for an indexed column (for example, the pages sampled when ANALYZE TABLE calculates statistics).

  • The following example uses all three table options:

CREATE TABLE `t1` (
`id` int(8) NOT NULL auto_increment,
`data` varchar(255),
`date` datetime,
PRIMARY KEY (`id`),
INDEX `DATE_IX` (`date`)
) ENGINE=InnoDB,
  STATS_PERSISTENT=1,
  STATS_AUTO_RECALC=1,
  STATS_SAMPLE_PAGES=25;
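The same options can also be changed on an existing table with ALTER TABLE. The sketch below assumes the t1 table created above; the value 50 is an arbitrary illustration, not a recommendation.

```sql
-- Change the per-table sample size and recalculate the statistics.
ALTER TABLE t1 STATS_SAMPLE_PAGES = 50;
ANALYZE TABLE t1;

-- Hand the sample size back to innodb_stats_persistent_sample_pages.
ALTER TABLE t1 STATS_SAMPLE_PAGES = DEFAULT;

-- The table options currently in effect appear in the table definition.
SHOW CREATE TABLE t1;
```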

How to configure the number of sampling pages for InnoDB optimizer statistics.

  • The MySQL query optimizer uses estimated statistics about index key values to calculate index selectivity, and chooses the indexes for an execution plan based on that selectivity. Where do these statistics come from? When an operation such as ANALYZE TABLE runs, InnoDB samples random pages from each index of the table to estimate the cardinality of the index (a technique called random sampling). The number of sampled pages is set by the system variable innodb_stats_persistent_sample_pages, a dynamic variable whose default value is 20. Normally there is no need to change it. Increasing the value may make sampling take longer because more pages have to be read, but if you determine that the default sample size produces inaccurate index statistics, you can increase the value gradually until the statistics are accurate enough. To check the accuracy, compare the actual cardinality of an index (obtained by running SELECT DISTINCT on the index columns) with the estimate stored in the mysql.innodb_index_stats persistent statistics table.
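The accuracy check described above might look like the following sketch. The table orders, its column customer_id and its index idx_customer are hypothetical names, the table is assumed to have no STATS_SAMPLE_PAGES override, and the value 40 is only an example. The two numbers can legitimately differ somewhat, because the stored value is an estimate and NULL handling may differ between the two counts.

```sql
-- Actual number of distinct values in the indexed column.
SELECT COUNT(DISTINCT customer_id) AS actual_cardinality
FROM orders;

-- Estimate stored by InnoDB for the first column of the index.
SELECT stat_value AS estimated_cardinality, sample_size
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 'orders'
  AND index_name = 'idx_customer'
  AND stat_name = 'n_diff_pfx01';

-- If the estimate is too far off, raise the sample size step by step and recalculate.
SET GLOBAL innodb_stats_persistent_sample_pages = 40;
ANALYZE TABLE orders;
```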

How to include delete-marked records in persistent statistics calculations.

  • By default, InnoDB reads uncommitted data when calculating statistics. When an uncommitted transaction has deleted rows from a table, InnoDB excludes those delete-marked records when estimating row and index statistics, which can lead to imprecise execution plans for other transactions that query the table concurrently. To avoid this, enable the system variable innodb_stats_include_delete_marked so that InnoDB includes delete-marked records when calculating persistent statistics. With innodb_stats_include_delete_marked enabled, ANALYZE TABLE counts delete-marked records when it recalculates statistics. Note that innodb_stats_include_delete_marked is a global variable and cannot be set for an individual table. It was introduced in MySQL 5.7.16.
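A short sketch of enabling this behavior, assuming a hypothetical table t1 and an account that is allowed to change global variables:

```sql
-- Count delete-marked records when persistent statistics are calculated.
SET GLOBAL innodb_stats_include_delete_marked = ON;

-- Recalculate so that the new setting is reflected in the stored statistics.
ANALYZE TABLE t1;

-- Verify the current value.
SELECT @@GLOBAL.innodb_stats_include_delete_marked;
```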

Persistent statistics depend on the innodb_table_stats and innodb_index_stats tables in the mysql database, which are set up automatically by all installation, upgrade, and build-from-source procedures.

  • The innodb_table_stats and innodb_index_stats tables both contain the last_update column, which indicates the time when InnoDB last updated index statistics.

  • The innodb_table_stats and innodb_index_stats tables are ordinary tables and can be updated manually. By updating the statistics manually, you can force a specific query optimization plan or test alternative plans without modifying the database. Note: after updating statistics manually, run the FLUSH TABLE tbl_name statement so that MySQL reloads the updated statistics.

  • Persistent statistics are considered local information because they relate to the server instance itself. Changes that automatic recalculation makes to the innodb_table_stats and innodb_index_stats tables are therefore not replicated between the primary and the standby. However, if ANALYZE TABLE is run manually to trigger a recalculation, the ANALYZE TABLE statement itself is replicated, so the standby recalculates its own statistics as well (unless binary logging was turned off for the session on the primary, for example with SET sql_log_bin = 0).
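The manual update and the replication behavior described above can be sketched as follows. The schema test, the table t1, the row count 1000000 and the EXPLAIN query are all illustrative assumptions; running ANALYZE TABLE afterwards replaces the hand-edited values with freshly calculated ones.

```sql
-- Pretend the (hypothetical) table is much larger than it is, to test an alternative plan.
UPDATE mysql.innodb_table_stats
SET n_rows = 1000000
WHERE database_name = 'test' AND table_name = 't1';

-- Make MySQL reload the edited statistics.
FLUSH TABLE t1;

-- Inspect the plan the optimizer now chooses (example query only).
EXPLAIN SELECT * FROM t1 WHERE `date` >= '2018-01-01';

-- Restore real statistics without writing the statement to the binary log,
-- so the standby does not recalculate its own statistics.
ANALYZE NO_WRITE_TO_BINLOG TABLE t1;
```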

2 | Statistics Tables in Detail

2.1. innodb_table_stats

This table provides statistics about table data.

The following is the information stored in the table.

root@localhost : test 08:00:46> use mysql
Database changed
root@localhost : mysql 08:01:30> select * from innodb_table_stats where table_name='test'\G
*************************** 1. row ***************************
           database_name: test
              table_name: test
             last_update: 2018-05-24 20:00:50
                  n_rows: 6
    clustered_index_size: 1
sum_of_other_index_sizes: 2
1 row in set (0.00 sec)

Meaning of the table columns.

  • database_name: database name.

  • table_name: table name, partition name or sub-partition name.

  • last_update: Indicates the timestamp of the last time InnoDB updated this statistics row.

  • n_rows: The estimated number of rows in the table.

  • clustered_index_size: The estimated size of the primary key index, in pages.

  • sum_of_other_index_sizes: The estimated total size of the other (non-primary key) indexes, in pages.
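Because both size columns are expressed in pages, they can be converted to bytes by multiplying by the page size. The sketch below reuses the test.test table from the listing above; with a 16 KB page size, the sample row (1 and 2 pages) would correspond to 16384 and 32768 bytes.

```sql
-- Convert the page counts in innodb_table_stats to approximate sizes in bytes.
SELECT database_name,
       table_name,
       n_rows,
       clustered_index_size * @@innodb_page_size     AS clustered_index_bytes,
       sum_of_other_index_sizes * @@innodb_page_size AS other_index_bytes
FROM mysql.innodb_table_stats
WHERE database_name = 'test' AND table_name = 'test';
```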

2.2. innodb_index_stats

This table provides statistics about indexes.

The following is the information stored in the table.

root@localhost : mysql 08:01:34> select * from innodb_index_stats where table_name='test';
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| test          | test       | PRIMARY    | 2018-05-24 20:00:50 | n_diff_pfx01 |          5 |           1 | a                                 |
| test          | test       | PRIMARY    | 2018-05-24 20:00:50 | n_diff_pfx02 |          6 |           1 | a,b                               |
| test          | test       | PRIMARY    | 2018-05-24 20:00:50 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| test          | test       | PRIMARY    | 2018-05-24 20:00:50 | size         |          1 |        NULL | Number of pages in the index      |
| test          | test       | i1         | 2018-05-24 20:00:50 | n_diff_pfx01 |          5 |           1 | c                                 |
| test          | test       | i1         | 2018-05-24 20:00:50 | n_diff_pfx02 |          5 |           1 | c,d                               |
| test          | test       | i1         | 2018-05-24 20:00:50 | n_diff_pfx03 |          6 |           1 | c,d,a                             |
| test          | test       | i1         | 2018-05-24 20:00:50 | n_diff_pfx04 |          6 |           1 | c,d,a,b                           |
| test          | test       | i1         | 2018-05-24 20:00:50 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| test          | test       | i1         | 2018-05-24 20:00:50 | size         |          1 |        NULL | Number of pages in the index      |
| test          | test       | i2uniq     | 2018-05-24 20:00:50 | n_diff_pfx01 |          6 |           1 | e                                 |
| test          | test       | i2uniq     | 2018-05-24 20:00:50 | n_diff_pfx02 |          6 |           1 | e,f                               |
| test          | test       | i2uniq     | 2018-05-24 20:00:50 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| test          | test       | i2uniq     | 2018-05-24 20:00:50 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
14 rows in set (0.00 sec)

Meaning of the table columns.

  • database_name: database name.

  • table_name: table name, partition table name, sub-partition table name.

  • index_name: Index name.

  • last_update: Indicates the timestamp of the last time InnoDB updated this statistics row.

  • stat_name: The name of the statistic whose value is stored in the stat_value column.

  • stat_value: The value of the statistic named in the stat_name column.

  • sample_size: The number of pages sampled to produce the estimate in the stat_value column.

  • stat_description: A description of the statistic named in the stat_name column.

From the query results above, we can see:

  • The stat_name column holds the following statistics.

    * size: When stat_name is size, stat_value is the total number of pages in the index.

    * n_leaf_pages: When stat_name is n_leaf_pages, stat_value is the number of leaf pages in the index.

    * n_diff_pfxNN: NN is a number (for example 01, 02, and so on). When stat_name is n_diff_pfxNN, stat_value is the number of unique values in the first NN columns of the index, counted in index definition order. For example, when NN is 01, stat_value is the number of unique values in the first column of the index; when NN is 02, it is the number of unique value combinations of the first and second columns, and so on. In addition, when stat_name = n_diff_pfxNN, the stat_description column shows a comma-separated list of the index columns that were counted.

  • The stat_description values "a" and "a,b" in the rows where index_name is PRIMARY show that the statistics for the primary key index cover exactly the columns defined in the index.

  • The stat_description values "e" and "e,f" in the rows where index_name is i2uniq show that the statistics for a unique index also cover exactly the columns defined in the index.

  • The stat_description values up to "c,d,a,b" in the rows where index_name is i1 show that the statistics for an ordinary index (a non-unique secondary index) cover the primary key columns in addition to the defined index columns. That is, for the statistics recorded for a non-unique index, InnoDB appends the primary key columns.

  • Note: In MySQL 5.7, the default number of pages sampled for persistent statistics, as defined by the system variable innodb_stats_persistent_sample_pages, is 20. The sample_size value in this example is 1 because the table holds so little data that it fits in a single page, so only 1 page was actually sampled. If the table were large enough, the value shown here would be the one specified by the innodb_stats_persistent_sample_pages system variable.
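One way to put the n_diff_pfxNN values to use is to relate them to the estimated row count, which gives a rough selectivity figure for each index prefix. The query below is only a sketch against the test.test table from the listing above; the column aliases are made up for illustration, and both inputs are estimates, so the ratio is approximate.

```sql
-- Approximate selectivity of each index prefix: distinct values / estimated rows.
SELECT i.index_name,
       i.stat_description               AS column_prefix,
       i.stat_value                     AS distinct_values,
       t.n_rows,
       i.stat_value / NULLIF(t.n_rows, 0) AS approx_selectivity
FROM mysql.innodb_index_stats AS i
JOIN mysql.innodb_table_stats AS t
  ON t.database_name = i.database_name
 AND t.table_name = i.table_name
WHERE i.database_name = 'test'
  AND i.table_name = 'test'
  AND i.stat_name LIKE 'n_diff_pfx%'
ORDER BY i.index_name, i.stat_name;
```

A value close to 1 suggests that the prefix is nearly unique, while a value close to 0 suggests many duplicate key values.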

PS: We can combine the number of index pages recorded in this table with the value of the system variable innodb_page_size to calculate the size of each index in bytes, as follows:

root@localhost : mysql 08:31:14> SELECT SUM(stat_value) pages, index_name, SUM(stat_value)*@@innodb_page_size size FROM mysql.innodb_index_stats WHERE table_name='dept_emp' AND stat_name = 'size' GROUP BY index_name;
+-------+------------+----------+
| pages | index_name | size     |
+-------+------------+----------+
|   737 | PRIMARY    | 12075008 |
|   353 | dept_no    |  5783552 |
|   353 | emp_no     |  5783552 |
+-------+------------+----------+
3 rows in set (0.01 sec)

That concludes this issue. The reference link for this issue is as follows:

https://dev.mysql.com/doc/refman/5.7/en/innodb-persistent-stats.html


"Climb over this mountain, and you will see the sea!" Keep following our "A Comprehensive Understanding of the MySQL System Library" series, and you can learn the topic systematically. Thank you for reading, and see you in the next issue!

| About the author

Luo Xiaobo·ScaleFlux Database Technology Expert

One of the authors of "A Thousand Gold Recipes-MySQL Performance Optimization Pyramid Rule", "Data Ecology: MySQL Replication Technology and Production Practice".

Familiar with MySQL architecture and skilled at overall database tuning. Enjoys studying open source technology and is keen to promote it; has given many public talks on database topics online and offline, and has published nearly 100 database-related research articles.

The full text is over.

Enjoy MySQL :)

Origin blog.csdn.net/n88Lpo/article/details/110507419