[Notes] MySQL Practice 45 Lectures - Practical Articles (5)

MySQL Join

  • Background: t1 and t2 have the same table structure (fields id, a, b; PRIMARY KEY (id), KEY a (a))
  • Index Nested-Loop Join
    • Example: select * from t1 straight_join t2 on (t1.a=t2.a);

      • straight_join forces MySQL to execute the query with a fixed join order (here t1 is the driving table and t2 the driven table)
    • Execution flow of the Index Nested-Loop Join algorithm

      img

      • Read a row of data R from table t1

      • Take field a from row R and use it to look up in table t2's index a

      • Take the rows of t2 that satisfy the condition and join them with R to form rows of the result set

      • Repeat steps 1 to 3 until the scan of table t1 reaches the end

      • This process is similar to a nested-loop query we would write in application code, and it can use the index on the driven table

      • While the join statement executes, the driving table is scanned in full, and the driven table is accessed by tree search

    • Conclusions

      • Executing a join statement performs better than splitting it into multiple single-table SQL statements
      • When using join, the smaller table should be the driving table
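The steps above can be sketched in application code. This is a minimal illustration of the Index Nested-Loop Join idea, not MySQL internals: the driven table's index is modeled as a plain dict from join-key value to rows, and the function name and row format are made up for the example.

```python
# A sketch of Index Nested-Loop Join: full scan of the driving table t1,
# and for each row an index probe (here a dict lookup standing in for a
# B+-tree search) on the driven table t2.
def index_nested_loop_join(t1_rows, t2_index):
    """t1_rows: list of row dicts; t2_index: dict mapping a-value -> list of t2 rows."""
    result = []
    for r in t1_rows:                       # full scan of the driving table
        for s in t2_index.get(r["a"], []):  # "tree search" on the driven table's index a
            result.append((r, s))           # matched pair becomes a result-set row
    return result
```

Each t1 row costs one index probe instead of a scan of t2, which is why the driven table's index matters so much here.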
  • Simple Nested-Loop Join
    • Example: select * from t1 straight_join t2 on (t1.a=t2.b);
    • Because field b of table t2 has no index, every match against t2 requires a full table scan
      • By this calculation, this SQL statement would scan table t2 up to 100 times, scanning 100 * 1000 = 100,000 rows in total
    • In this scenario MySQL does not choose this algorithm; it chooses the Block Nested-Loop Join algorithm instead
  • Block Nested-Loop Join
    • Execution flow of the Block Nested-Loop Join algorithm

      img

      • Read table t1's data into the thread's join_buffer; since the statement is select *, the whole of table t1 is put into memory

      • Scan table t2, take each row of t2, and compare it with the data in join_buffer; rows that satisfy the join condition are returned as part of the result set

      • If the data to be read from t1 is too large for join_buffer to hold at once

        • join_buffer's size is set by the join_buffer_size parameter; the default value is 256k

          • The more rows that fit, the fewer segments are needed, and the fewer full scans of the driven table
        • If all of table t1's data does not fit, the strategy is simple: put it in segments

          img

          • After the first segment has been matched, empty join_buffer, and repeat the earlier steps until matching finishes
    • Conclusions

      • In terms of time complexity, the two algorithms are the same; but the latter's comparisons are in-memory operations, which are much faster, so its performance is better
      • When join_buffer_size is not big enough (the common case), you should choose the smaller table as the driving table
        • When join_buffer_size is large enough, the two tables are equivalent as the driving table
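The segmented-buffer flow above can be sketched as follows. This is a hypothetical simulation, not MySQL code: `join_buffer_size` is counted in rows rather than bytes for simplicity, and the row format is invented for the example.

```python
# A sketch of Block Nested-Loop Join: the driving table is loaded into a
# fixed-size join_buffer segment by segment, and the driven table is fully
# scanned once per segment, with all comparisons done in memory.
def block_nested_loop_join(t1_rows, t2_rows, join_buffer_size):
    result = []
    # load the driving table into join_buffer in segments
    for start in range(0, len(t1_rows), join_buffer_size):
        join_buffer = t1_rows[start:start + join_buffer_size]
        for s in t2_rows:                   # one full scan of t2 per segment
            for r in join_buffer:           # in-memory comparison against the buffer
                if r["a"] == s["b"]:
                    result.append((r, s))
    return result
```

With a larger buffer there are fewer segments and therefore fewer full scans of t2, which is exactly the trade-off the bullet points describe.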
  • Should join be used at all, and if so, which table should be the driving table?
    • If the index on the driven table can be used, the join statement still has its advantages
    • If the index on the driven table cannot be used, only the Block Nested-Loop Join algorithm can be used; try not to use such statements
    • When using join, make the small table the driving table
      • Small table: the table whose data volume is smaller after the query's filter conditions are applied, i.e., compute the total data volume of the fields participating in the join for each table, and the one with less data is the "small table"

Join Optimization

  • Multi-Range Read optimization

    • Goal: use sequential disk reads as much as possible

    • MRR execution process

      img

      • Using index a, locate the records that satisfy the condition and put their id values into read_rnd_buffer

      • Sort the ids in read_rnd_buffer in ascending order

        • read_rnd_buffer's size is controlled by the read_rnd_buffer_size parameter; if it fills up, the "segmenting" approach is still used
      • Using the sorted id array, look up records on the primary key index id by id, and return them as the result

      • To use MRR optimization stably, you need to set optimizer_switch = "mrr_cost_based=off"

        • The official documentation says that the current optimizer strategy tends not to use MRR when judging by cost
      • After this, the Extra field in the explain output shows Using MRR, indicating that MRR optimization is used

        • With MRR, the result set is returned in ascending order of primary key id
    • The core of MRR's performance gain

      • Get a batch of primary keys from the secondary index's leaf nodes, sort these primary keys, and then do the table lookups in batch
      • When InnoDB fetches records, it reads the whole page and then locates the needed rows within the page (similar to the read-ahead characteristics of an array)
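The MRR flow above amounts to "buffer, sort, then fetch". A minimal sketch, assuming a dict stands in for the primary key (clustered) index and the buffer size is counted in ids:

```python
# A sketch of Multi-Range Read: ids found via the secondary index are
# buffered, sorted ascending, and only then looked up on the primary key
# index, turning random reads into mostly sequential ones.
def mrr_fetch(ids_from_secondary_index, primary_index, read_rnd_buffer_size):
    result = []
    buf = ids_from_secondary_index
    # "segmenting" when read_rnd_buffer fills up
    for start in range(0, len(buf), read_rnd_buffer_size):
        segment = sorted(buf[start:start + read_rnd_buffer_size])  # ascending id order
        for pk in segment:
            result.append(primary_index[pk])  # near-sequential clustered-index reads
    return result
```

Note the returned rows come out in ascending id order per segment, which matches the observation above about MRR's result ordering.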
  • Batched Key Access

    • MySQL introduced the Batched Key Access (BKA) algorithm starting from version 5.6

    • Batched Key Access Process

      img

      • The old NLJ algorithm takes the value of a from the driving table t1 row by row and joins it against the driven table t2
      • The BKA algorithm instead takes multiple values into join_buffer at once and uses MRR's batch-matching feature
      • Likewise, if join_buffer is not big enough, the "segmenting" approach is still used
    • Enabling BKA: set optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';

      • The first two parameters enable MRR; this is because the BKA algorithm depends on MRR
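Combining the two previous sketches gives a rough picture of BKA. This is an illustration of the idea only (batch keys, sort them MRR-style, then probe the driven table's index), with invented names and a row-count buffer size, not MySQL's implementation:

```python
# A sketch of Batched Key Access: take a batch of join keys from the driving
# table into join_buffer, sort the batch by the join key (the MRR idea),
# then probe the driven table's index, instead of probing row by row.
def bka_join(t1_rows, t2_index, join_buffer_size):
    result = []
    for start in range(0, len(t1_rows), join_buffer_size):  # segment the driving table
        batch = t1_rows[start:start + join_buffer_size]     # rows now in join_buffer
        for r in sorted(batch, key=lambda row: row["a"]):   # MRR-style ordered probing
            for s in t2_index.get(r["a"], []):
                result.append((r, s))
    return result
```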
  • Performance issues of the BNL algorithm

    • It may scan the driven table multiple times, occupying disk IO resources
      • This impact is temporary; once the statement finishes, the impact on IO is over
    • Evaluating the join condition requires M * N comparisons; for large tables this consumes a lot of CPU resources
    • It may cause hot data in the Buffer Pool to be evicted, hurting the memory hit rate
      • If the driven table is a large cold table, and the multiple scans during execution take more than 1 second
        • If the cold table's data is less than 3/8 of the whole Buffer Pool, the cold table's data pages are moved to the head of the LRU list
        • If the cold table is large, the data pages of normal business accesses get no chance to enter the young area
      • The impact on the Buffer Pool is lasting, and subsequent queries are needed to slowly restore the memory hit rate
  • Converting BNL to BKA

    • General idea: let the join statement use the index on the driven table, so the BKA algorithm is triggered and query performance improves

    • Build the index directly on the driven table, and the join can then be converted directly to the BKA algorithm (but if the business uses this query infrequently, creating the index is wasteful)

    • Create a temporary table with indexes

      • General idea

        • Put the data of table t2 that satisfies the condition into a temporary table tmp_t
        • To let the join use the BKA algorithm, add an index on field b of the temporary table tmp_t
        • Let table t1 and tmp_t do the join operation
      • Execution results

        img

        • The total execution time of this process is less than one second, compared with 1 minute 11 seconds before; performance has improved tremendously
  • Extension: hash join

    • If join_buffer held a hash table instead of an unordered array, the efficiency of lookup matching would improve significantly
    • One reason MySQL's optimizer and executor have long been criticized: they do not support hash join
    • MySQL's official roadmap has also not yet put this optimization on the agenda
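Since the server (in the version this course covers) lacks hash join, the same idea can be simulated on the application side. A minimal sketch with invented names; the two-phase build/probe structure is the standard hash-join technique:

```python
# A sketch of a hash join done in application code: build a hash table from
# the smaller table's rows, then probe it once per row of the larger table,
# so matching is O(1) per row instead of a scan.
def hash_join(small_rows, big_rows, small_key, big_key):
    # build phase: hash the small table on its join key
    table = {}
    for r in small_rows:
        table.setdefault(r[small_key], []).append(r)
    # probe phase: one pass over the big table
    result = []
    for s in big_rows:
        for r in table.get(s[big_key], []):
            result.append((r, s))
    return result
```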

Temporary Tables with the Same Name

  • The difference between memory tables and temporary tables

    • A memory table is a table using the Memory engine; its create syntax is create table ... engine=memory
      • The data of such a table is stored in memory and is cleared when the system restarts, but the table structure remains
    • A temporary table can use any engine type (if it uses the Memory engine, it has the characteristics above)
      • If a temporary table uses the InnoDB or MyISAM engine, its data is written to disk
  • Characteristics of a temporary table

    img

    • Temporary tables have the following characteristics in use
      • The create syntax is create temporary table ...
      • A temporary table can only be accessed by the session that created it and is invisible to other threads (it is deleted automatically when the session ends)
      • A temporary table can have the same name as an ordinary table
      • When session A has both a temporary table and an ordinary table with the same name, show create statements and CRUD statements access the temporary table
      • The show tables command does not display temporary tables
    • Temporary tables are especially suitable for the join optimization scenario at the beginning of this article
      • Same-named temporary tables in different sessions are distinct (so multiple sessions can run the join optimization simultaneously)
      • There is no need to worry about deleting the data (it is reclaimed automatically when the session ends)
  • Application of temporary tables

    • Since there is no need to worry about name conflicts between threads, temporary tables are often used in optimizing complex queries

      • Among them, cross-database queries in a sharded (sub-database, sub-table) system are a typical usage scenario
    • A sketch of sharding (sub-database, sub-table)

      • In the usual sharding scenario, a large logical table is distributed across different database instances

        img

      • In this architecture, the partition key is chosen on the basis of "reducing cross-database and cross-table queries"

        • Queries whose conditions include the partition field are the statement form most favored by sharding schemes
      • Solutions for the scenario where the query conditions do not include the partition field

        • The first idea is to implement the sorting logic in the process of the proxy layer

          • The advantage of this approach is processing speed: after the data is fetched from each shard, the computation happens directly in memory
          • Shortcomings
            • The development effort required is relatively large (if the operations are complex, the middle layer needs stronger development capability)
            • The pressure on the proxy side is relatively large; in particular, insufficient memory and CPU bottlenecks are likely problems
        • Another idea is to summarize the data fetched from each shard into a table on one MySQL instance and do the logic operations there

          • Schematic of the cross-database query process

            img

          • In practice, we often find that the computation on each shard is not saturated

          • Therefore, the temporary table temp_ht can be put directly onto one of the 32 shards

  • Why temporary tables can have the same name

    • Example: create temporary table temp_t(id int primary key) engine=innodb;
    • For this InnoDB table, MySQL creates an frm file to store the table structure definition, plus a place to store the table data
      • The frm file is in the temporary file directory (the command select @@tmpdir shows the instance's temporary file directory)
        • The file name's suffix is .frm, and the prefix is "#sql{process id}_{thread id}_serial number"
      • The way the table data is stored differs across MySQL versions
        • In version 5.6 and earlier, MySQL creates a file with the same prefix and the suffix .ibd in the temporary file directory to store the data
        • From version 5.7 on, MySQL introduced a temporary table space dedicated to storing the data of temporary files
    • When maintaining data tables in memory, MySQL also has a mechanism to distinguish different tables: each table corresponds to a table_def_key
      • An ordinary table's table_def_key is obtained from "database name + table name"
      • For a temporary table, the table_def_key adds "server_id + thread_id" on top of "database name + table name"
        • So temporary tables named t1 created in different sessions have different table_def_keys, and their disk file names differ too
    • In the implementation, each thread maintains its own list of temporary tables
      • When a table is operated on within the session, this list is traversed first; only if the table is not found there is the ordinary table operated on
      • When a session ends, for each temporary table in its list, a "DROP TEMPORARY TABLE + table name" operation is performed
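The table_def_key scheme above can be sketched in a few lines. This is a simplified illustration of the keying rule, not server code; the function and tuple layout are invented for the example:

```python
# A sketch of the table_def_key idea: ordinary tables are keyed by
# (db, table), while temporary tables additionally include server_id and
# thread_id, so same-named temporary tables in different sessions never collide.
def table_def_key(db, table, temporary=False, server_id=None, thread_id=None):
    if temporary:
        return (db, table, server_id, thread_id)
    return (db, table)
```

Two sessions creating `temp_t` on the same instance produce distinct keys because their thread ids differ.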
  • Temporary tables and primary-replica replication

    • Question: a temporary table can only be accessed within its own thread, so why must DROP TEMPORARY TABLE be written to the binlog?
      • If temporary-table operations were not recorded, executing insert into t_normal select * from temp_t on the replica would fail
    • Under different binlog formats, the replication strategies for temporary-table operations differ
      • With binlog in row format, the operations on the data are recorded (a write_row event records "insert the row (1,1)")
      • Only when binlog_format=statement/mixed are the operations on the temporary table recorded in the binlog
        • The primary deletes the temporary table automatically when the session exits, but the replica's apply thread runs continuously, so the delete operation must still be written to the binlog
    • Another question: why is the synchronized statement rewritten into the standard format DROP TABLE t_normal /* generated by server */?
      • The drop table command can delete multiple tables; if binlog_format=row, replicating it as-is could stop the replica's apply thread
        • Because table temp_t does not exist on the replica, when the drop table command is written to the binlog, the statement has to be rewritten (to drop only the tables the replica has)
    • The next question: on the primary, different threads may create same-named temporary tables, but how does the replica handle them when they are replicated?
      • When MySQL writes the binlog, the primary also writes into it the thread id of the thread executing the statement
      • The replica's apply thread uses this thread id to construct the temporary table's table_def_key
        • session A's temporary table t1, on the replica: database name + t1 + "primary's serverid" + "session A's thread_id";
        • session B's temporary table t1, on the replica: database name + t1 + "primary's serverid" + "session B's thread_id";
      • Since the table_def_keys differ, the two tables do not conflict in the replica's apply thread

In-Memory Temporary Tables

  • union execution process

    • Example: (select 1000 as f) union (select id from t1 order by id desc limit 2);

    • union execution process

      img

      • When the second subquery executes, each row obtained triggers a uniqueness-constraint check as it is inserted into the temporary table
      • The in-memory temporary table here acts as temporary storage for the data, and the temporary table's primary key enforces the uniqueness constraint
    • If union is replaced by union all, the temporary table is no longer needed; the subquery results are returned directly to the client as part of the result set

  • group by execution process

    • Example: select id%10 as m, count(*) as c from t1 group by m;

    • The explain result of the group by statement

      img

      • Using index indicates that the statement uses a covering index; it chooses index a and does not need to go back to the table;
      • Using temporary indicates that a temporary table is used;
      • Using filesort indicates that sorting is needed;
    • group by execution process

      img

      • Create an in-memory temporary table with two fields m and c; m is the primary key;
      • Scan table t1's index a, take the id values from the leaf nodes one by one, compute id % 10, and denote the result x;
        • If the temporary table has no row with primary key x, insert the record (x, 1);
        • If the table already has a row with primary key x, add 1 to that row's c value;
      • After the traversal completes, sort by field m and return the result set to the client
    • If the requirement does not need the result sorted, add order by null at the end of the SQL statement

    • The size of an in-memory temporary table is limited; the tmp_table_size parameter controls this memory size, and the default is 16M

      • When the threshold is exceeded, the in-memory temporary table is converted into a disk temporary table; disk temporary tables use the InnoDB engine by default
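The group by flow above can be sketched with a dict playing the role of the in-memory temporary table keyed on m. A simplified illustration with invented names, not server code:

```python
# A sketch of group by via an in-memory temp table: a dict keyed by m
# accumulates counts, then the rows are sorted by m before being returned
# (order by null skips that final sort).
def group_by_count(ids, modulo=10, order_by_null=False):
    temp_table = {}                              # m (primary key) -> count c
    for i in ids:                                # id values arrive via index a
        x = i % modulo
        temp_table[x] = temp_table.get(x, 0) + 1 # insert (x, 1) or c += 1
    rows = list(temp_table.items())
    if not order_by_null:
        rows.sort()                              # final sort on field m
    return rows
```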
  • group by optimization method - index

    • Why does executing a group by statement need a temporary table?
      • The semantics of group by: count the occurrences of each distinct value
      • Because each row's id % 100 arrives unordered, a temporary table is needed to record and accumulate the counts
    • If the input data is guaranteed to be ordered, computing group by only requires a single left-to-right sequential scan
      • MySQL 5.7 supports the generated column mechanism, which keeps a column's data associated and automatically updated
        • Example: alter table t1 add column z int generated always as(id % 100), add index(z);
      • Using index z, neither temporary tables nor sorting is needed (just read index z sequentially)
  • group by optimization method - direct sorting

    • Background: when there is no way to create an index, and the amount of data to put on the temporary table is particularly large

      • Original process: first use an in-memory temporary table; after inserting some data, discover memory is not enough, then convert to a disk temporary table
      • Optimization: use the SQL_BIG_RESULT hint to tell the optimizer the amount of data is large, so it goes directly to a disk temporary table
        • A disk temporary table uses B+ tree storage, which is less efficient than an array; so considering this, the optimizer stores the data directly in an array instead
    • Execution flow with SQL_BIG_RESULT

      img

      • Initialize sort_buffer, with a single integer field in it, denoted m;
      • Scan table t1's index a, take out the id values one by one, and store the value of id % 100 into sort_buffer;
      • After the scan completes, sort sort_buffer by field m (if memory is not enough, use disk temporary files to assist the sort);
      • After the sort completes, an ordered array is obtained;
      • From the ordered array, obtain the distinct values and the number of occurrences of each value, assemble the result set, and return it
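The sort-based flow above can be sketched as "fill an array, sort it, count runs in one pass". A simplified illustration with invented names; `sorted` stands in for the sort_buffer sort (with its possible disk-assisted merge):

```python
# A sketch of the SQL_BIG_RESULT flow: skip the temp table, put all m values
# into a flat array (sort_buffer), sort it, then aggregate the ordered array
# in a single left-to-right pass.
def group_by_sorted(ids, modulo=100):
    sort_buffer = sorted(i % modulo for i in ids)  # array of m values, sorted
    result = []
    for m in sort_buffer:                          # one pass over the ordered array
        if result and result[-1][0] == m:
            result[-1] = (m, result[-1][1] + 1)    # same value: bump the count
        else:
            result.append((m, 1))                  # new value starts a new group
    return result
```

The single counting pass is also why an ordered input (e.g. via the generated-column index z above) removes the need for a temporary table entirely.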
  • When does MySQL use internal temporary tables?

    • If, while executing a statement, MySQL can read the data and deliver results at the same time, no extra memory is needed

      • Otherwise, extra memory is needed to hold the intermediate results
    • join_buffer is an unordered array, sort_buffer is an ordered array, and a temporary table is a two-dimensional table structure

    • If the execution logic needs the features of a two-dimensional table, a temporary table is used by preference

      • union needs the uniqueness constraint of an index, and group by additionally needs another field to save the accumulated count
  • Guidelines for using group by

    • If the group by result does not need to be sorted, add order by null at the end of the statement;
    • Try to let group by use the table's indexes (confirmed when the explain result has neither Using temporary nor Using filesort);
    • If the amount of data group by needs to aggregate is small, try to use only in-memory temporary tables;
      • The tmp_table_size parameter can be increased appropriately to avoid disk temporary tables;
    • If the amount of data really is too large, use the SQL_BIG_RESULT hint (directly tell the optimizer to use the sort algorithm to get the result)

The Impact of Memory Tables

  • The main reasons memory tables are not recommended in production environments

    • Lock granularity problem
    • Data persistence problem
  • The data organization structure of memory tables

    • The data organization of the Memory engine differs from that of InnoDB
      • The InnoDB engine puts the data on the primary key index, and other indexes store the primary key id; this is called an index-organized table
      • The Memory engine stores the data separately, with indexes storing the positions of the data; this is called a heap-organized table
    • Some typical differences between the two engines
      • InnoDB table data is always stored in order, while memory table data is stored in insertion order
      • When the data file has holes, an InnoDB table can only write new values at fixed positions to keep the data ordered, while a memory table can insert a new value wherever it finds free space
      • When a data position changes, an InnoDB table only needs to modify the primary key index, while a memory table must modify all indexes
      • Querying an InnoDB table via the primary key needs one index lookup, and via a secondary index needs two; a memory table has no such distinction, since all indexes have the same "status"
      • InnoDB supports variable-length data types, so different records may have different lengths; memory tables do not support Blob and Text fields, and even a declared varchar(N) is actually stored as char(N), i.e., a fixed-length string, so every row of a memory table has the same length
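The heap-organized layout above can be sketched as follows. This is an illustrative toy, not the Memory engine: rows live in an insertion-ordered list, and every index is just a mapping from value to row positions, so all indexes really do have equal status.

```python
# A sketch of a heap-organized table: data is stored in insertion order,
# and each index records row positions, so any index lookup is one hop
# (position -> row), unlike an index-organized (InnoDB-style) table where
# secondary-index lookups must go through the primary key.
class HeapTable:
    def __init__(self, indexed_fields):
        self.rows = []                                  # data in insertion order
        self.indexes = {f: {} for f in indexed_fields}  # all indexes equal-status
    def insert(self, row):
        pos = len(self.rows)                            # any free slot would do
        self.rows.append(row)
        for f, idx in self.indexes.items():
            idx.setdefault(row[f], []).append(pos)      # indexes store positions
    def lookup(self, field, value):
        return [self.rows[p] for p in self.indexes[field].get(value, [])]
```

Note how `insert` has to touch every index, matching the point above that a memory table must modify all indexes when a row's position changes.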
  • hash indexes and B-Tree indexes

    • Memory tables support not only hash indexes but also B-Tree indexes
      • Example: alter table t1 add index a_btree_index using btree (id);
    • Memory tables' advantage is speed; one reason is that the Memory engine supports hash indexes
      • The more important reason is that all memory table data is kept in memory, and memory reads and writes are always faster than disk
  • Locks on memory tables

    • Memory tables do not support row locks, only table locks (i.e., updates to the data in a table can only be serialized)
      • The table lock here is different from the MDL lock; it locks the data in the table (the MDL lock protects the table structure)
    • Compared with row locks, table locks do not support concurrent access well
      • The lock granularity of memory tables means their performance when handling concurrent transactions will not be very good
  • The data persistence problem

    • Keeping data in memory is a memory table's advantage, but also a disadvantage (when the database restarts, all memory tables are emptied)

    • Because MySQL worries that the primary and standby would become inconsistent after a primary restart, the implementation does the following

      • After a database restart, it writes a line DELETE FROM t1 into the binlog

      • If a dual-M (master-master) topology is used at this point

        img

        • When the standby restarts, the delete statement in the standby's binlog is sent to the primary (deleting the contents of the primary's memory table)
        • When the primary uses the table again, it finds the memory table's data suddenly emptied...
    • Memory tables are not suitable as ordinary data tables in a production environment

      • How to view the claim that "memory tables execute fast"
        • If your table has a large update volume, concurrency is a very important metric (row locks >> table locks)
        • The amount of data that fits in a memory table is never large, and InnoDB has the InnoDB Buffer Pool to guarantee read performance
      • It is recommended to replace ordinary memory tables with InnoDB tables
        • Exception: when the data volume is controllable and will not consume too much memory, you can consider using memory tables

          • Why in-memory temporary tables work better when the data volume is controllable
            • Compared with an InnoDB table, a memory table does not need to write to disk, so writing data to table temp_t is faster
            • Index b uses a hash index, so lookups are faster than with a B-Tree index
            • The temporary table has only 2000 rows of data, so the memory it occupies is limited
        • In-memory temporary tables happen to avoid the two shortcomings of memory tables

          • A temporary table is not accessed by other threads, so there is no concurrency problem
          • A temporary table needs to be deleted after a restart anyway, so the data-clearing problem does not exist
          • The standby's temporary tables also do not affect the primary's user threads

Origin blog.csdn.net/YangDongChuan1995/article/details/103981677