MySQL Join
- Background: tables t1 and t2 have the same structure (fields id, a, b; PRIMARY KEY (id), KEY a (a))
- Index Nested-Loop Join
    - Example: select * from t1 straight_join t2 on (t1.a=t2.a);
        - straight_join forces MySQL to execute the query with a fixed join order: here t1 is the driving table
    - Execution flow of the Index Nested-Loop Join algorithm
        - Read a row of data R from table t1
        - Take field a from row R and use it to look up table t2
        - Take the rows of t2 that satisfy the condition and combine each with R to form a row, as part of the result set
        - Repeat steps 1 to 3 until the loop over table t1 ends
    - This process is similar to a nested loop we might write in a program, and it can use the index on the driven table
    - While the join statement executes, the driving table is scanned in full, and the driven table is searched by index tree
    - Conclusion
        - Using a join statement performs better than forcibly splitting it into multiple single-table SQL statements
        - When using a join statement, make the small table the driving table
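The Index Nested-Loop Join flow above can be sketched as follows (a minimal illustration with hypothetical in-memory tables; the index on t2.a is modeled as a dict standing in for a B-tree lookup):

```python
# Sketch of Index Nested-Loop Join: full scan of the driving table t1,
# index lookup on the driven table t2 for each row.

def index_nested_loop_join(t1_rows, t2_index):
    """For each row R of the driving table, probe the index on t2.a."""
    result = []
    for r in t1_rows:                                 # step 1: read a row R from t1
        for t2_row in t2_index.get(r["a"], []):       # step 2: index lookup on t2.a
            # step 3: combine R with each matching t2 row into a result row
            result.append({**r, **{"t2_" + k: v for k, v in t2_row.items()}})
    return result                                     # step 4: loop until t1 ends

t1 = [{"id": 1, "a": 10}, {"id": 2, "a": 20}]
t2_index = {10: [{"id": 7, "a": 10}], 30: [{"id": 8, "a": 30}]}
print(index_nested_loop_join(t1, t2_index))
```

Only the row with a=10 matches, so one joined row is produced; the row with a=20 finds nothing in the index and contributes nothing.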
- Simple Nested-Loop Join
    - Example: select * from t1 straight_join t2 on (t1.a=t2.b);
    - Because field b of table t2 has no index, every match against t2 requires a full table scan
    - Calculated this way, this SQL request would scan table t2 up to 100 times, scanning 100 * 1000 = 100,000 rows in total
    - In this scenario MySQL does not choose this algorithm; it chooses the Block Nested-Loop Join algorithm instead
- Block Nested-Loop Join
    - Execution flow of the Block Nested-Loop Join algorithm
        - Read the data of table t1 into the thread's join_buffer; since the statement is select *, the whole of table t1 is put into memory
        - Scan table t2, take each row of t2 in turn, compare it with the data in join_buffer, and return the rows that satisfy the join condition as part of the result set
    - If the data to be read from t1 is too large to fit into join_buffer at once
        - The size of join_buffer is set by the join_buffer_size parameter; the default value is 256k
            - The more rows that fit at a time, the fewer the segments, and the fewer full scans of the driven table
        - If table t1's data does not all fit, the strategy is simple: put it in segments
            - After the first segment is matched, empty join_buffer and repeat the previous operation until the matching ends
    - Conclusion
        - In time complexity the two algorithms are the same; however, the latter's comparisons are in-memory operations, which are much faster and perform better
        - When join_buffer_size is not big enough (which is quite common), you should choose the smaller table as the driving table
        - When join_buffer_size is large enough, choosing either of the two tables gives the same result
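The segmented Block Nested-Loop Join flow can be sketched like this (hypothetical tables; for simplicity the buffer capacity is counted in rows rather than bytes):

```python
# Sketch of Block Nested-Loop Join with a size-limited join_buffer:
# each segment of the driving table triggers one full scan of the driven table.

def block_nested_loop_join(t1_rows, t2_rows, join_buffer_size):
    result = []
    for start in range(0, len(t1_rows), join_buffer_size):
        # fill join_buffer with the next segment of t1 (cleared each round)
        join_buffer = t1_rows[start:start + join_buffer_size]
        for t2_row in t2_rows:            # full scan of the driven table t2
            for r in join_buffer:         # in-memory comparison against the buffer
                if r["a"] == t2_row["b"]:
                    result.append((r["id"], t2_row["id"]))
    return result

t1 = [{"id": i, "a": i} for i in range(1, 5)]
t2 = [{"id": 100 + i, "b": i} for i in range(1, 3)]
print(block_nested_loop_join(t1, t2, join_buffer_size=2))  # 2 segments -> t2 scanned twice
```

With join_buffer_size=2 the four t1 rows are processed in two segments, so t2 is scanned twice; a large enough buffer would need only one scan, which is exactly why fewer segments mean fewer full scans of the driven table.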
- Can join be used? If it is used, which table should be the driving table?
    - If the driven table can use an index, the join statement still has its advantages
    - If the driven table cannot use an index, only the Block Nested-Loop Join algorithm can be used; try not to write such statements
    - When using join, make the small table the driving table
        - Small table: after applying the query's filter conditions, compute the total data volume of the fields each table contributes to the join; the table with the smaller volume is the "small table"
Join Optimization
- Multi-Range Read optimization
    - Goal: make disk reads as sequential as possible
    - MRR execution flow
        - On index a, locate the records that satisfy the condition and put their id values into read_rnd_buffer
        - Sort the ids in read_rnd_buffer in ascending order
            - The size of read_rnd_buffer is controlled by the read_rnd_buffer_size parameter; if it fills up, the "segmented" approach is still used
        - With the sorted id array, look up records on the primary key index id by id, and return them as the result
    - To use MRR optimization stably, you need to set optimizer_switch="mrr_cost_based=off"
        - The official documentation says that under the current optimizer strategy, when judging cost, the optimizer tends not to use MRR
    - After explain, Using MRR in the Extra field indicates that MRR optimization is used
        - After using MRR, the result set is obtained in ascending order of the primary key id
    - The core of MRR's performance gain
        - Get primary keys in batches from the secondary index's leaf nodes, sort these primary keys, and finally do the table lookups in batches
        - When InnoDB fetches a record it reads the whole page, then locates the required row within the page (similar to the read-ahead characteristics of an array)
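The MRR flow can be sketched as follows (hypothetical in-memory primary key index; the point is only the sort-before-lookup ordering, including the segmented case when read_rnd_buffer fills up):

```python
# Sketch of MRR: sort the ids gathered from the secondary index before the
# primary-key lookups, so page reads happen in ascending (mostly sequential) order.

def mrr_lookup(ids_from_index_a, primary_key_index, read_rnd_buffer_size):
    results = []
    # "Segmented" approach: fill read_rnd_buffer, sort it, look up, repeat.
    for start in range(0, len(ids_from_index_a), read_rnd_buffer_size):
        read_rnd_buffer = sorted(ids_from_index_a[start:start + read_rnd_buffer_size])
        for pk in read_rnd_buffer:        # ascending ids -> near-sequential reads
            results.append(primary_key_index[pk])
    return results

pk_index = {i: {"id": i, "a": i % 5} for i in range(10)}
print(mrr_lookup([9, 1, 5, 3], pk_index, read_rnd_buffer_size=4))
```

With a buffer of 4 the whole id list fits in one segment, so the rows come back in full ascending id order; with a smaller buffer each segment is sorted independently, which is why MRR's ordering guarantee applies per batch.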
- Batched Key Access
    - MySQL introduced the Batched Key Access (BKA) algorithm starting with version 5.6
    - Batched Key Access process
        - The old NLJ algorithm takes the value of a from the driving table t1 one row at a time and joins it against the driven table t2
        - The BKA algorithm instead puts a batch of values into join_buffer at once, exploiting MRR's batch-matching characteristics
        - Likewise, if join_buffer is not big enough, the "segmented" approach is still used
    - Enabling BKA: set optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';
        - The first two parameters enable MRR; this is because the BKA algorithm depends on MRR
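The BKA idea can be sketched like this (hypothetical in-memory structures; the MRR part shows up as probing the driven table's index in sorted key order rather than row by row):

```python
# Sketch of Batched Key Access: batch the driving table's join keys into
# join_buffer, sort them (the MRR part), then probe the driven table's index
# in ascending key order.

def batched_key_access_join(t1_rows, t2_index_on_a, join_buffer_size):
    result = []
    for start in range(0, len(t1_rows), join_buffer_size):   # segmented if needed
        batch = t1_rows[start:start + join_buffer_size]
        # MRR-style: probe t2's index in ascending order of the join key
        for r in sorted(batch, key=lambda row: row["a"]):
            for t2_row in t2_index_on_a.get(r["a"], []):
                result.append((r["id"], t2_row["id"]))
    return result

t1 = [{"id": 1, "a": 30}, {"id": 2, "a": 10}, {"id": 3, "a": 20}]
t2_index = {10: [{"id": 9}], 20: [{"id": 8}], 30: [{"id": 7}]}
print(batched_key_access_join(t1, t2_index, join_buffer_size=3))
```

Note how the output follows ascending key order (10, 20, 30) rather than t1's row order; that reordering is exactly what the MRR dependency buys.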
- BNL algorithm performance issues
    - The driven table may be scanned multiple times, occupying disk IO resources
        - This impact is only temporary; once the statement finishes executing, the impact on IO is over
    - Evaluating the join condition requires M * N comparisons; with large tables this occupies a lot of CPU resources
    - It may cause hot data in the Buffer Pool to be evicted, hurting the memory hit rate
        - If the driven table is a large cold table, and the repeated scans make the statement run for more than 1 second, then re-scanning the cold table moves its data pages to the head of the LRU list
            - This happens when the cold table's data is smaller than 3/8 of the whole Buffer Pool, so it fits entirely in the old area
            - If the cold table is very large, another situation arises: the data pages of normal business accesses get no chance to enter the young area
        - The impact on the Buffer Pool is sustained; subsequent queries are needed to slowly restore the memory hit rate
- Converting BNL to BKA
    - General idea: let the join statement use an index on the driven table, to trigger the BKA algorithm and improve query performance
    - Build the index directly on the driven table; the join can then be converted to BKA directly (but if the business rarely uses this field, a permanent index is wasteful)
    - Create a temporary table with an index
        - General idea
            - Put the rows of table t2 that satisfy the condition into a temporary table tmp_t
            - To let the join use the BKA algorithm, add an index on field b of the temporary table tmp_t
            - Let table t1 join with tmp_t
        - Result
            - The total execution time of this process is less than one second, compared with 1 minute 11 seconds before; a huge performance improvement
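The three-step temporary-table rewrite can be sketched with SQLite standing in for MySQL (table names t1, t2, tmp_t follow the text; SQLite's optimizer differs from MySQL's and has no BKA, so this only illustrates the shape of the rewrite, not the algorithm switch):

```python
# Sketch of the BNL -> BKA rewrite: filter the driven table into an indexed
# temporary table, then join against it (SQLite as a stand-in for MySQL).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, b INTEGER);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, b INTEGER);
    INSERT INTO t1 VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO t2 VALUES (1, 100), (2, 999), (3, 300);

    -- step 1: copy the qualifying rows of t2 into a temporary table
    CREATE TEMPORARY TABLE tmp_t AS SELECT * FROM t2 WHERE b < 500;
    -- step 2: index the join field so the join can use index lookups
    CREATE INDEX temp.idx_tmp_b ON tmp_t(b);
""");
# step 3: join t1 against the indexed temporary table
rows = conn.execute(
    "SELECT t1.id, tmp_t.id FROM t1 JOIN tmp_t ON t1.b = tmp_t.b"
).fetchall()
print(sorted(rows))
```

The temporary table lives only in this connection and is dropped automatically when the connection closes, which mirrors the session-scoped behavior the text relies on.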
- Extension: hash join
    - If the unordered array maintained in join_buffer were replaced with a hash table, query matching efficiency would improve significantly
    - This is one reason the MySQL optimizer and executor have been criticized: no support for hash join
    - MySQL's official roadmap has yet to put this optimization on the agenda
Temporary Tables with the Same Name
- The difference between memory tables and temporary tables
    - A memory table is a table that uses the Memory engine; the create-table syntax is create table ... engine=memory
        - The data of such a table is all stored in memory and is cleared when the system restarts, but the table structure remains
    - A temporary table can use any engine type (if it uses the Memory engine, it also has the characteristics above)
        - If a temporary table uses the InnoDB or MyISAM engine, its data is written to disk
- Characteristics of temporary tables
    - A temporary table has the following characteristics in use
        - The create-table syntax is create temporary table ...
        - A temporary table can only be accessed by the session that created it and is invisible to other threads (it is automatically dropped at the end of the session)
        - A temporary table can have the same name as an ordinary table
        - When session A contains both a temporary table and an ordinary table with the same name, show create statements and CRUD statements access the temporary table
        - The show tables command does not display temporary tables
    - Temporary tables are particularly suitable for the join optimization scenario at the beginning of this article
        - Temporary tables in different sessions can have the same name (so multiple sessions can run the join optimization simultaneously)
        - There is no need to worry about deleting the data (it is automatically reclaimed at the end of the session)
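The session-scoped naming can be demonstrated with SQLite as an analogy for MySQL (two connections to the same database file play the role of two sessions; file and table names here are hypothetical):

```python
# Two "sessions" on one database: each may create a temp table with the same
# name, and each sees only its own copy.
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "demo.db")   # shared database file
session_a = sqlite3.connect(db)
session_b = sqlite3.connect(db)

# Same temp-table name in both sessions: no conflict, because each temporary
# table is private to its own connection.
session_a.execute("CREATE TEMPORARY TABLE temp_t (id INTEGER PRIMARY KEY)")
session_b.execute("CREATE TEMPORARY TABLE temp_t (id INTEGER PRIMARY KEY)")
session_a.execute("INSERT INTO temp_t VALUES (1)")

print(session_a.execute("SELECT COUNT(*) FROM temp_t").fetchone())  # (1,)
print(session_b.execute("SELECT COUNT(*) FROM temp_t").fetchone())  # (0,)
```

Session A's insert is invisible to session B, which is exactly the property that lets multiple sessions run the same join-optimization procedure concurrently.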
- Applications of temporary tables
    - Since name conflicts between threads are not a concern, temporary tables are often used while optimizing complex queries
        - Among these, cross-database queries in a sharded (sub-database, sub-table) system are a typical usage scenario
    - Sketch of sharding
        - In a typical sharding scenario, a large logical table is distributed across different database instances
        - In this architecture, the partition key is chosen on the basis of "reducing cross-database and cross-table queries"
            - If every query can carry the partition field in its conditions, this is the most popular form of sharding scheme
        - Solutions for scenarios where the query condition does not use the partition field
            - The first idea is to implement the sorting logic in the proxy layer's process code
                - The advantage of this approach is processing speed: after the data is fetched from each sub-database, the computation is done directly in memory
                - Shortcomings
                    - The required development effort is relatively large (if the operations are complex, it demands strong development capability in the middle layer)
                    - The pressure on the proxy side is relatively large; in particular, insufficient memory and CPU bottlenecks are prone to occur
            - Another idea is to summarize the data fetched from each sub-database into a table on one MySQL instance, and do the logic there
                - Schematic of the cross-database query process
                - In practice, we often find that the computation on each sub-database is far from saturated
                - Therefore the temporary table temp_ht can be put directly on one of the 32 sub-databases
- Why temporary tables can have the same name
    - Example: create temporary table temp_t(id int primary key)engine=innodb;
    - For this statement, MySQL creates a frm file to save the table structure definition, plus a place to store the table data
        - The frm file is placed in the temporary file directory (the select @@tmpdir command shows the instance's temporary file directory)
        - The file name suffix is .frm; the prefix is "#sql{process id}_{thread id}_{serial number}"
    - How the table data is stored differs across MySQL versions
        - In version 5.6 and earlier, MySQL creates a file with the same prefix and the suffix .ibd in the temporary file directory, used to store the data
        - Starting from version 5.7, MySQL introduced a temporary tablespace, dedicated to storing the data of temporary files
    - In memory, MySQL also needs a mechanism to distinguish different tables when maintaining data tables: each table corresponds to a table_def_key
        - An ordinary table's table_def_key value is obtained from "database name + table name"
        - For a temporary table, the table_def_key adds "server_id + thread_id" on top of "database name + table name"
        - That is, temporary tables named t1 created in different sessions have different table_def_keys, and their disk file names also differ
    - In the implementation, each thread maintains its own temporary table list
        - When a table is operated on within the session, this list is traversed first; only if the table is not found there is it treated as an ordinary table
        - When each session ends, a "DROP TEMPORARY TABLE + table name" operation is performed for every table in its temporary table list
- Temporary tables and primary/standby replication
    - Problem: a temporary table can only be accessed within its own thread, so why does DROP TEMPORARY TABLE need to be written to the binlog?
        - If operations on temporary tables were not recorded, then on the standby, executing insert into t_normal select * from temp_t would fail
        - Under different binlog formats, the standby's replication strategy for temporary table operations differs
            - Row-format binlog records the operation's effect on data (e.g. a write_row event records inserting the row (1,1))
            - Only when binlog_format=statement/mixed are operations on temporary tables recorded in the binlog
        - The primary automatically drops temporary tables when a session exits, but the standby's replication thread keeps running, so the drop operations still need to be written
    - Another problem: why is the synchronized statement rewritten into the standard format DROP TABLE `t_normal` /* generated by server */?
        - The drop table command can delete multiple tables at once; since table temp_t does not exist on the standby, the command must be rewritten when it is recorded in the binlog, otherwise the standby's replication thread would stop on error
    - The next problem: on the primary, different threads may create temporary tables with the same name; how does the standby handle this when the statements propagate to it?
        - When MySQL writes the binlog, the primary records the thread id of the thread executing the statement
        - The standby's applier thread uses this thread id to construct the temporary table's table_def_key
            - session A's temporary table t1 on the standby: database name + t1 + "M's serverid" + "session A's thread_id"
            - session B's temporary table t1 on the standby: database name + t1 + "M's serverid" + "session B's thread_id"
        - Because the table_def_keys differ, the two tables do not conflict in the standby's applier thread
Internal Temporary Tables
- union execution process
    - Example: (select 1000 as f) union (select id from t1 order by id desc limit 2);
    - Execution flow of the union
        - When the second subquery executes, the uniqueness constraint is checked while trying to insert the fetched data into the temporary table
        - In this scenario the memory temporary table serves as temporary storage for the data, and the temporary table's primary key uniqueness constraint implements the union's deduplication semantics
    - If union is replaced by union all, there is no deduplication semantics, so the temporary table is no longer needed; the subquery results are returned directly to the client as part of the result set
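The union flow above can be sketched with the "memory temporary table" modeled as a dict, whose key uniqueness plays the role of the primary key constraint:

```python
# Sketch of union vs union all: union deduplicates through a keyed temporary
# structure; union all streams rows straight through with no temporary table.

def union_rows(subquery1_rows, subquery2_rows):
    temp_table = {}                    # primary key f -> row (uniqueness constraint)
    for f in subquery1_rows:
        temp_table.setdefault(f, f)    # duplicate key: insert fails, row is skipped
    for f in subquery2_rows:
        temp_table.setdefault(f, f)
    return list(temp_table)            # read the temporary table back out

def union_all_rows(subquery1_rows, subquery2_rows):
    # union all has no dedup semantics, so no temporary table is needed
    return list(subquery1_rows) + list(subquery2_rows)

print(union_rows([1000], [1000, 999]))      # [1000, 999]
print(union_all_rows([1000], [1000, 999]))  # [1000, 1000, 999]
```

In the example from the text, the second subquery's first row (1000) violates the uniqueness constraint and is discarded, while 999 is inserted successfully.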
- group by execution process
    - Example: select id%10 as m, count(*) as c from t1 group by m;
    - explain results of the group by statement
        - Using index: the statement uses a covering index; it chose index a and does not need to go back to the table
        - Using temporary: a temporary table is used
        - Using filesort: sorting is needed
    - Execution flow of the group by
        - Create a memory temporary table with two fields, m and c; m is the primary key
        - Scan index a of table t1, take the id values from the leaf nodes one by one, and compute id % 10, denoting the result x
        - If the temporary table has no row with primary key x, insert the record (x, 1)
        - If the table has a row with primary key x, add 1 to that row's c value
        - After the traversal completes, sort by field m and return the result set to the client
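The group by flow above can be sketched with the memory temporary table (primary key m, counter c) modeled as a dict:

```python
# Sketch of the group-by flow: fill a keyed temporary structure while scanning,
# then sort by the group key m before returning.

def group_by_mod(ids, mod=10):
    temp_table = {}                     # m -> c (m is the primary key)
    for i in ids:                       # scan index a, taking id values in turn
        x = i % mod                     # compute id % mod, denote it x
        if x not in temp_table:
            temp_table[x] = 1           # no row with primary key x: insert (x, 1)
        else:
            temp_table[x] += 1          # row exists: c += 1
    return sorted(temp_table.items())   # finally sort by m and return

print(group_by_mod(range(1, 8)))  # [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)]
```

The final `sorted` call corresponds to the sort step that order by null would remove.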
    - If the result does not need to be sorted, you can add order by null at the end of the SQL statement
    - The size of a memory temporary table is limited; the tmp_table_size parameter controls this memory size, and the default is 16M
        - Once the threshold is exceeded, the memory temporary table is converted into a disk temporary table; disk temporary tables use the InnoDB engine by default
- group by optimization: indexes
    - Why does executing a group by statement need a temporary table?
        - The logical semantics of group by is to count the numbers of occurrences of different values
        - Because the result of id % 100 for each row is unordered, a temporary table is needed to record and accumulate the counts
        - If the input data is guaranteed to be ordered, then computing the group by only needs a single sequential left-to-right scan
    - MySQL 5.7 supports the generated column mechanism, which implements automatically maintained derived column values
        - Example: alter table t1 add column z int generated always as(id % 100), add index(z);
        - Using index z, group by needs neither a temporary table nor sorting (index z is read sequentially)
- group by optimization: direct sorting
    - Background: a scenario that cannot use an index, where the amount of data to put into the temporary table is particularly large
        - Original process: first use a memory temporary table; after inserting some data, discover the memory temporary table is not big enough, then convert to a disk temporary table
        - Optimization: use the SQL_BIG_RESULT hint to tell the optimizer the result is large, so it goes directly to sort-based execution
            - A disk temporary table is stored as a B+ tree, which is less space-efficient than an array; considering disk space, the data is stored directly in an array instead
    - Execution flow with SQL_BIG_RESULT
        - Initialize sort_buffer, with a single integer field, denoted m
        - Scan index a of table t1, take the id values one by one, and put the value of id % 100 into sort_buffer
        - After the scan completes, sort sort_buffer by field m (if memory is insufficient, disk temporary files assist the sort)
        - After the sort completes, an ordered array is obtained
        - Walk the ordered array to obtain the distinct values and the number of occurrences of each, and assemble and return the result set to the client
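The SQL_BIG_RESULT flow above can be sketched as sort-then-count-runs over a flat array standing in for sort_buffer:

```python
# Sketch of the direct-sorting group by: collect id % mod values into a flat
# array, sort it, then count runs of equal values in one left-to-right pass.

def group_by_direct_sort(ids, mod=100):
    sort_buffer = [i % mod for i in ids]   # one integer field m per row
    sort_buffer.sort()                     # sort (external sort if memory is short)
    result = []
    for m in sort_buffer:                  # walk the ordered array, counting runs
        if result and result[-1][0] == m:
            result[-1][1] += 1
        else:
            result.append([m, 1])
    return [tuple(pair) for pair in result]

print(group_by_direct_sort([101, 1, 201, 2]))  # [(1, 3), (2, 1)]
```

Because the array is ordered, each distinct value occupies one contiguous run, so the counting pass needs no lookup structure at all; this is the same property the generated-column index exploits.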
- When MySQL uses internal temporary tables
    - If a statement can return results directly while reading the data during execution, no extra memory is needed
        - Otherwise, extra memory is needed to hold intermediate results
    - join_buffer is an unordered array, sort_buffer is an ordered array, and a temporary table is a two-dimensional table structure
    - If the execution logic needs the features of a two-dimensional table, an internal temporary table is the preferred choice
        - For example, union needs an index's uniqueness constraint, and group by additionally needs another field to hold the accumulated count
- Guidelines for using group by
    - If the group by statement's result does not need sorting, add order by null at the end of the statement
    - Try to let the group by use the table's indexes (confirm in the explain result that there is no Using temporary and no Using filesort)
    - If the amount of data to aggregate is not large, try to use only the memory temporary table
        - You can increase the tmp_table_size parameter appropriately to avoid falling back to the disk temporary table
    - If the amount of data really is too large, use the SQL_BIG_RESULT hint (directly tell the optimizer to use the sorting algorithm to obtain the result)
Memory Tables
- The main reasons memory tables are not recommended in production
    - Lock granularity problems
    - Data persistence problems
- Data organization of memory tables
    - The Memory and InnoDB engines organize data differently
        - The InnoDB engine puts the data on the primary key index, and other indexes store the primary key id; this is called an index-organized table
        - The Memory engine stores the data separately, with indexes storing the positions of the data; this is called a heap-organized table
    - Some typical differences between the two engines
        - An InnoDB table's data is always stored in order, while a memory table's data is stored in insertion order
        - When the data file has holes, an InnoDB table can only write a new value at a fixed position in order to keep the data ordered, while a memory table can insert a new value wherever it finds free space
        - When a data position changes, an InnoDB table only needs to modify the primary key index, while a memory table must modify all indexes
        - With an InnoDB table, a primary key query takes one index lookup and a secondary index query takes two; a memory table has no such difference, since all indexes have the same "status"
        - InnoDB supports variable-length data types, so different records may have different lengths; memory tables do not support Blob and Text fields, and even a column defined as varchar(N) is actually stored as char(N), i.e. a fixed-length string, so every row of a memory table has the same length
- hash indexes and B-Tree indexes
    - Memory tables support not only hash indexes but also B-Tree indexes
        - Example: alter table t1 add index a_btree_index using btree (id);
    - The strength of memory tables is speed; one reason is that the Memory engine supports hash indexes
    - A more important reason is that all of a memory table's data is kept in memory, and memory reads and writes are always faster than disk
- Memory table locks
    - Memory tables do not support row locks, only table locks (i.e. updates to the data within one table can only be serialized)
        - The table lock here is different from the MDL lock: it locks the data inside the table (the MDL lock protects the table structure)
    - Compared with row locks, table locks support concurrent access poorly
    - The lock granularity of memory tables means their performance when handling concurrent transactions will not be good either
- Data persistence problems
    - Keeping data in memory is the memory table's advantage, but it is also a weakness (when the database restarts, all memory tables are emptied)
    - Because MySQL worries that a primary restart would leave primary and standby inconsistent, the implementation does the following
        - After the database restarts, it writes a line DELETE FROM t1 into the binlog
    - If a dual-M (master-master) topology is in use
        - When the standby restarts, the delete statement in the standby's binlog is sent to the primary (deleting the contents of the primary's memory table)
        - The next time the primary is used, you discover the memory table's data has suddenly been emptied...
- Memory tables are not suitable as ordinary data tables in a production environment
    - How to view the claim that "memory tables execute fast"
        - If your table has a high update volume, then concurrency is a very important consideration (row locks >> table locks)
        - The amount of data that fits in a memory table is never large, and InnoDB has the InnoDB Buffer Pool to guarantee read performance
        - Recommendation: replace ordinary memory tables with InnoDB tables
    - Exception: when the data volume is controllable and will not consume too much memory, you can consider using memory tables
        - Why memory temporary tables work better when the data volume is controllable
            - Compared with an InnoDB table, a memory table does not need to write to disk, so writing data into table temp_t is faster
            - Index b uses a hash index, so lookups are faster than with a B-Tree index
            - The temporary table has only 2000 rows, so the memory it occupies is limited
        - Memory temporary tables happen to be immune to the two shortcomings of memory tables
            - A temporary table is not accessed by other threads, so there are no concurrency problems
            - A temporary table needs to be deleted after a restart anyway, so the data-clearing problem does not exist
            - The standby's temporary tables also do not affect the primary's user threads