MySQL must-know: index selection

In production, a table often has several indexes. So how does MySQL decide which index to use when executing a SQL statement, or whether to scan the whole table instead? MySQL makes this choice based on the estimated cost of each option.

The cost of executing a SQL statement is roughly divided into two parts:

  • IO cost. Data pages live on disk, so before any row can be checked, its page must first be loaded into memory. By default, MySQL sets the cost of loading one page to 1.0.
  • CPU cost. On top of the IO cost there is the cost of evaluating the query conditions. In the previous example, each loaded row has to be checked against name = '赵六'. By default, MySQL sets the cost of checking one row to 0.2.

Full table scan cost calculation

For a full table scan, the cost is calculated roughly as follows. MySQL keeps statistics on each table. These statistics are approximate rather than exact, and you can view them with show table status like 'table_name'.

The statistics include the estimated number of rows in the table (rows) and the number of bytes occupied by the clustered index (data_length). Since the default page size is 16 KB, the approximate number of data pages is data_length/1024/16. The cost of a full table scan is therefore:

rows * 0.2 + data_length/1024/16 * 1.0
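As a minimal sketch, the formula above can be written out in Python. The constant names, the function name, and the sample statistics are hypothetical and only illustrate the arithmetic; they are not MySQL's own code, and the real optimizer applies further adjustments.

```python
# Cost constants quoted in the text above (assumed defaults).
IO_COST_PER_PAGE = 1.0
CPU_COST_PER_ROW = 0.2
PAGE_SIZE_BYTES = 16 * 1024  # default page size of 16 KB


def full_scan_cost(rows: int, data_length: int) -> float:
    """Approximate full table scan cost: CPU cost per row plus IO cost per page."""
    pages = data_length / PAGE_SIZE_BYTES
    return rows * CPU_COST_PER_ROW + pages * IO_COST_PER_PAGE


# Hypothetical statistics: 10,000 rows and 97 pages of clustered index data.
print(full_scan_cost(rows=10_000, data_length=97 * PAGE_SIZE_BYTES))
```

With these made-up numbers the CPU part (2000) dominates the IO part (97), which shows how strongly the row estimate drives the result.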

Secondary index + table lookup cost calculation

The cost of using a secondary index plus table lookups (going back to the clustered index to fetch the full row) is more complicated: it depends on the number of scan ranges and the number of table lookups. To make scan ranges easier to describe, here is the earlier picture and query again:

select * from `user` where name = '赵六';

Look at the picture!

The condition name = '赵六' produces one scan range, from the Zhao Liu row with id=4 to the Zhao Liu row with id=6. If the condition were name > '赵六', the scan range would run from the Liu Qi row with id=7 to the end of the data (the Wang Jiu row with id=9). If the condition were name < '李四' or name > '赵六', there would be two scan ranges: one from the Zhang San row with id=2 to the Zhang San row with id=3, and one from the Liu Qi row with id=7 to the end of the data. A scan range, in other words, is a range of index records that satisfy the query conditions.

When costing a secondary index, MySQL treats reading one range as costing the same as the IO for one page, 1.0. Once the ranges are known, it estimates from the statistics how many rows fall inside them. Those rows have to be read, at a cost of roughly number of rows * 0.2. So the cost of scanning the secondary index is number of ranges * 1.0 + number of rows * 0.2.

These rows then have to be looked up in the table (if necessary). MySQL treats the IO cost of each table lookup as the same as reading one page, again 1.0. After each lookup, the remaining query conditions must be checked against the row fetched from the clustered index. This is CPU cost, roughly number of rows * 0.2.

So the cost of the table lookups is roughly number of rows * 1.0 + number of rows * 0.2

So the approximate cost of the secondary index plus table lookups is number of ranges * 1.0 + number of rows * 0.2 + number of rows * 1.0 + number of rows * 0.2
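The combined formula above can be sketched in Python as follows. The names are hypothetical and the constants are the ones quoted in the text; this is an illustration of the arithmetic under those assumptions, not the optimizer's actual implementation.

```python
# Cost constants quoted in the text above (assumed defaults).
RANGE_COST = 1.0          # reading one scan range costs the same as one page of IO
LOOKUP_IO_COST = 1.0      # each trip back to the clustered index
CPU_COST_PER_ROW = 0.2    # checking one row


def secondary_index_cost(ranges: int, rows: int) -> float:
    """Approximate cost of a secondary index scan plus going back to the table."""
    # Scanning the secondary index: one unit per range, CPU cost per row read.
    index_part = ranges * RANGE_COST + rows * CPU_COST_PER_ROW
    # Going back to the table: IO per row, then CPU to check remaining conditions.
    lookup_part = rows * LOOKUP_IO_COST + rows * CPU_COST_PER_ROW
    return index_part + lookup_part


# One range matching three rows, as for name = '赵六' in the example.
print(round(secondary_index_cost(ranges=1, rows=3), 2))
```

Each matching row contributes 1.4 to the total, so the estimated row count matters far more than the number of ranges.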

Once the cost of each index and of the full table scan has been calculated, MySQL picks the cheapest option to execute. MySQL also fine-tunes these results, but the adjustments are very small, so I have omitted them here. This is only a general introduction to the cost rules. The real calculation is more complicated, for example for queries involving multiple tables, and interested readers can consult the relevant material.
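The selection step itself amounts to taking a minimum over the candidate costs. A minimal sketch, with hypothetical plan names and made-up cost values:

```python
from typing import Dict


def choose_plan(costs: Dict[str, float]) -> str:
    """Return the name of the candidate with the lowest estimated cost."""
    return min(costs, key=costs.get)


# Hypothetical estimates: the index matches only a few rows, so it wins.
print(choose_plan({"full_scan": 2097.0, "idx_name": 5.2}))

# Hypothetical estimates: the index matches half the table, so the scan wins.
print(choose_plan({"full_scan": 2097.0, "idx_name": 7001.0}))
```

The second call illustrates why a full table scan can be chosen even when an index exists: if many rows match, the per-row cost of going back to the table outweighs reading every page once.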

Summary

In short, the point of this section is this: when MySQL chooses an index, it uses statistics and its cost rules to estimate the cost of each candidate, then executes the query with the cheapest one.

The original text comes from Sanyou's java diary

Origin blog.csdn.net/qq_28165595/article/details/131030953