MySQL notes (SQL statement execution process → Index chapter → Transaction chapter → Lock chapter)

MySQL Basics 1.1 What happens when executing a SQL statement

 1. MySQL architecture is divided into two layers: the server layer and the storage engine layer (InnoDB by default)

     The server layer handles most of the execution flow: the connector, the query cache, SQL parsing (parser), and SQL execution (preprocessor, optimizer, executor).

    Storage engine layer: the index data structures are implemented here. InnoDB supports B+ tree indexes; both the primary key index and the secondary indexes created on a table are implemented as B+ trees.

     What happens when you execute a SQL statement

1. Connector: Establishes a TCP connection with the client (via the three-way handshake), usually kept as a long connection

                    Verify username and password

                    After the username is verified, subsequent permission checks are made against the privileges read at connection time.

       TCP long connections will not be easily disconnected, so the advantage of using long connections is to reduce the process of establishing and disconnecting connections.

      However, it will also occupy too much memory over time, so the following two methods will be used to disconnect:

               1) Disconnect long connections regularly

               2) The client actively resets the connection

 2. Query cache: If the query statement hits the query cache, the result is returned directly; otherwise execution continues. This module was removed in MySQL 8.0, because any update to a table invalidates every cached entry for that table, making the cache nearly useless on write-heavy workloads.

3. Parse SQL: two steps:

              1. Lexical analysis: scan the input characters and identify keywords and tokens.

              2. Syntax analysis: based on the tokens from lexical analysis, check whether the statement conforms to MySQL grammar and build a SQL syntax tree, which lets later modules obtain the SQL type, table names, and field names.

4. Execute SQL 

    1. Preprocessor:

                      Check whether the tables and fields in the SQL statement exist

                      Expand * in SQL statements to all columns in the table

    2. Optimizer:

                      Based on query cost considerations, select the execution plan with the smallest cost. (Mainly choose primary key index or secondary index)

   3. Executor:

                       Execute SQL query statements according to the execution plan, read records from the storage engine, and return them to the client;
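The plan the optimizer picks and the executor runs can be inspected with EXPLAIN. The `t_order` table below is a hypothetical example, not from the original text:

```sql
-- Hypothetical example table; id is the clustered (primary key) index,
-- product_no gets a secondary index.
CREATE TABLE t_order (
  id INT AUTO_INCREMENT PRIMARY KEY,
  product_no VARCHAR(32) NOT NULL,
  price DECIMAL(10, 2) NOT NULL,
  KEY idx_product_no (product_no)
) ENGINE = InnoDB;

-- EXPLAIN shows the execution plan the optimizer chose: the `key`
-- column reports which index (PRIMARY or idx_product_no) the executor
-- will ask the storage engine to read from.
EXPLAIN SELECT * FROM t_order WHERE product_no = 'P0001';
```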

How is a row of records stored in MySQL?

In which file are MySQL data files stored?

There are three files in total: db.opt stores the default character set and collation rules of the current database.

                              t_order.frm stores the table structure (in MySQL 8.0 the structure moved into the data dictionary and .frm files were removed)

                              t_order.ibd stores the table data (this is where the rows actually live)
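The location of these files and the file-per-table setting can be checked directly (standard MySQL system variables):

```sql
-- Directory where MySQL stores its data files:
SELECT @@datadir;

-- When innodb_file_per_table is ON (the default since MySQL 5.6.6),
-- each table's data and indexes live in their own .ibd file:
SHOW VARIABLES LIKE 'innodb_file_per_table';
```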

File structure of table space:

 Row: The data in a table is stored row by row. Each record is laid out according to the row format in use.

Page: Reading only one row at a time will be very inefficient, so the database reads data in units of pages. The default size of each page is 16KB.

Extent: The InnoDB storage engine stores data as a B+ tree, and the pages on each level of the tree are linked in a doubly linked list. If adjacent pages in that list were allocated one at a time, they could end up far apart on disk, hurting sequential I/O. So once a table grows large, disk space is allocated in units of extents, keeping the pages physically close together; each extent is 1MB (64 contiguous 16KB pages).

Segment: A tablespace is composed of multiple segments, and a segment is composed of multiple extents. Segments include the data segment, index segment, rollback segment, etc.

Index segment: the collection of extents that store the non-leaf nodes of the B+ tree

Data segment: the collection of extents that store the leaf nodes of the B+ tree

Rollback segment: the collection of extents that store rollback (undo) data.

InnoDB row format: COMPACT

InnoDB's row format is divided into two parts: record extra information and record real data

The record extra information is divided into three parts: the variable-length field length list, the NULL value list, and the record header information.

Real data recorded:

The front contains three fields:

row_id: Present only when the table defines neither a primary key nor a non-NULL unique key; if either exists, there is no hidden row_id field.

trx_id: transaction id, indicating which transaction the current data was generated by.

roll_pointer: Pointer to the previous version of this record. 

Summarize:

How MySQL stores NULL values:

NULL columns are marked in the NULL value list of the COMPACT row format. NULL is not stored in the real-data part of the row.

How MySQL knows the actual data size occupied by varchar(n):

Via the variable-length field length list in the row format, which records the real byte length of each variable-length column.

After a row overflows, how is the excess data handled?

When one page cannot hold all of a row's data, the excess is stored in overflow pages, and 20 bytes are reserved in the real-data part to store a pointer to the overflow page's address.

_____________________________________________________________________________

MySQL index

 What is an index? An index is equivalent to the table of contents of a book. Corresponding data can be quickly found through the index.

Index common interview questions:

1. Classification of indexes

Classified by data structure: B+ tree index, hash index, and full-text index (InnoDB's primary and secondary indexes are B+ trees).

Classified by physical storage: generally divided into clustered index (primary key index), secondary index (auxiliary index)

   Data is generally stored on the leaf nodes of the B+ tree of the primary key index, and all complete user data is stored on the leaf nodes of the primary key index.

   The leaf nodes of a secondary index's B+ tree store the primary key value rather than the full row. The corresponding row is then found through the primary key value, so a secondary index query may require two lookups.

   Covering index and table return:

If the first lookup on the secondary index already yields all the columns the query needs, no second lookup is required; this is called a covering index. If only the primary key value is obtained, a second lookup on the primary key index is needed; this process is called a table return.
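A sketch of both cases, assuming a hypothetical `t_order` table with a secondary index `idx_product_no` on `product_no`:

```sql
-- Covering index: id and product_no both live in the secondary index
-- B+ tree (secondary index leaves store the primary key), so no table
-- return is needed; EXPLAIN typically shows "Using index" in Extra.
EXPLAIN SELECT id, product_no FROM t_order WHERE product_no = 'P0001';

-- Table return: price is stored only in the clustered index leaves, so
-- after the secondary index lookup MySQL must go back to the primary
-- key index to fetch the full row.
EXPLAIN SELECT price FROM t_order WHERE product_no = 'P0001';
```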

Classified by field characteristics:

      Primary key index: The primary key index is an index built on the primary key. It is generally created when creating a table. A table can only have one primary key index, and the value of the column in the primary key index is not allowed to be null.

      Unique index: A unique index is an index built on the UNIQUE field. A table can have multiple unique indexes. The value of the index column must be unique and null values ​​are allowed.

      Ordinary index: An index built on ordinary fields, which neither requires the field to be a primary key index nor a unique index.

      Prefix index: A prefix index is an index built on the first few characters of a character field. The purpose of using prefix index is to reduce the memory space occupied by the index. Improve query efficiency.

Classified by number of fields:

      Single column index: an index built on a single column field

      Composite (union) index: an index built on multiple columns (matched using the leftmost-prefix principle).
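The leftmost-prefix rule can be illustrated with a hypothetical composite index; the `status` column here is an assumption for the example:

```sql
-- Hypothetical composite index on (product_no, status):
ALTER TABLE t_order ADD INDEX idx_product_status (product_no, status);

-- These can use the index (they match the leftmost column first):
SELECT * FROM t_order WHERE product_no = 'P0001';
SELECT * FROM t_order WHERE product_no = 'P0001' AND status = 1;

-- This cannot: it skips the leftmost column product_no.
SELECT * FROM t_order WHERE status = 1;
```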

2. When is it necessary to create an index/when is it not necessary to create an index?

 Benefits and disadvantages of indexing:

     The biggest benefit of an index is faster queries.

     Disadvantages of indexes: they occupy physical space, and the more indexes, the more space.

                          Creating and maintaining indexes takes time, and that time grows with the data volume. An index is a classic trade of space for time.

                          Indexes reduce the efficiency of inserts, deletes, and updates: every write must dynamically maintain the B+ tree to keep the index ordered.

When do you need to create an index:

     Fields have uniqueness restrictions, just like product codes

     Fields that often appear in WHERE conditions, which speeds up queries across the table. If more than one field is involved, a composite index covering them can be created.

      Fields that appear in ORDER BY and GROUP BY clauses, so no extra sort is needed at query time: once the index is built, the data in the B+ tree is already ordered.

When there is no need to create an index:

      Fields not used in WHERE, ORDER BY, or GROUP BY clauses.

      Fields with many duplicate values (low selectivity), such as a gender column: any lookup still matches roughly half the rows, so the index barely helps.

     When the table data is very small, an index is unnecessary.

     Frequently updated fields: the B+ tree must keep the data ordered, so frequent writes to the indexed column force constant index maintenance and hurt database performance.

3. Index optimization methods :

    Prefix index optimization method: Prefix index uses the first few characters of a certain field as an index. When some large fields are used as indexes, using prefix index can reduce the size of the index items.
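A minimal sketch of creating a prefix index; the `remark` column and the 10-character prefix length are hypothetical:

```sql
-- Index only the first 10 characters of a long varchar column,
-- trading a little selectivity for a much smaller index.
ALTER TABLE t_order ADD INDEX idx_remark_prefix (remark(10));
```

Note that a prefix index cannot serve ORDER BY, and it cannot act as a covering index, because the full column value is not stored in the index.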

    Covering index optimization: In a secondary index query, if the first lookup already returns all needed columns, the table return is avoided. Method: build a composite index, e.g. on (product ID, name, price); if all queried columns exist in the index, the query never retrieves the primary key index again, thus avoiding table returns.

   It is best for the primary key to be auto-incrementing: each new row is then appended in order at the rightmost leaf node, so no existing data needs to be moved and every insert is an append operation.

  It is best to declare index columns NOT NULL: NULL values complicate the optimizer's index selection.

A NULL value carries no meaningful data, yet a nullable column forces the row format to include a NULL value list, costing at least 1 byte per row.

  To prevent index failure:

  • When we use left fuzzy matching (`like '%xx'`) or both-sided fuzzy matching (`like '%xx%'`), the index fails;
  • When we perform calculations, functions, and type conversion operations on index columns in query conditions, these situations will cause index failure;
  • To use the joint index correctly, you need to follow the leftmost matching principle, that is, the index is matched according to the leftmost priority method, otherwise the index will become invalid.
  • In the WHERE clause, if the condition column before the OR is an index column and the condition column after the OR is not an index column, the index will fail.
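The four failure modes above can be sketched as queries. This assumes the hypothetical `t_order` table with a secondary index on `product_no` and no index on `remark`:

```sql
-- 1) Left fuzzy match: the leading characters are unknown.
SELECT * FROM t_order WHERE product_no LIKE '%001';

-- 2) A function applied to the index column.
SELECT * FROM t_order WHERE LENGTH(product_no) = 5;

-- 3) Implicit type conversion: comparing a string column to a number
--    makes MySQL cast the column, which defeats the index.
SELECT * FROM t_order WHERE product_no = 1001;

-- 4) OR with a non-indexed column on the right-hand side.
SELECT * FROM t_order WHERE product_no = 'P0001' OR remark = 'gift';
```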

Looking at B+ trees from the perspective of data pages

 How Innodb stores data

InnoDB reads data in units of data pages (reading row by row would be very inefficient). The default page size is 16KB, so each read from disk brings at least one 16KB page into memory, and each flush writes at least one 16KB page back to disk.

The main function of the data page is to store the database's records. The records in a data page form a singly linked list ordered by primary key. A singly linked list makes insertion and deletion cheap, but sequential traversal makes retrieval slow.

There is a page directory in the data page, which plays a role in record retrieval.

Each slot is equivalent to a pointer to the last record of one group.

The page directory holds multiple slots, which act as an index over groups of records. When searching for data, binary search over the slots quickly locates the right group, and then the records within that group are traversed to find the target record.

How does B+ tree perform queries?

 The number of disk I/Os is crucial to the efficiency of index usage. Therefore, using B+ tree is the most appropriate.

Each node in the B+ tree is a data page.

The non-leaf nodes in the B+ tree are only used to store directory entries for indexing, and the leaf nodes of the B+ tree are used to store data.

All nodes are sorted according to the size of the index key, forming a doubly linked list to facilitate range queries.

Why does MySQL use B+ tree to store data?

Designing a data structure suitable for MySQL index needs to meet the following points

Ability to perform I/O operations on as few disks as possible

To query a record efficiently, you must also be able to perform range searches efficiently.

 What is binary search

 Binary search halves the search range on every comparison, reducing the lookup time to O(log n).

binary search tree

All nodes in the left subtree of each node are smaller than this node. All nodes in the right subtree are greater than this node.

self-balancing binary tree

In a self-balancing binary tree, the height difference between each node's left and right subtrees is at most 1. But balanced or not, a binary tree has only two children per node, so the height still grows as more data is inserted, which means more disk I/O per lookup and worse query efficiency.

B-tree

Each node of a B-tree can have many children rather than just two, which reduces the height of the tree.

 B+tree

The main differences between B+ trees and B trees are the following points:

  • Only leaf nodes (the bottom nodes) will store actual data (index + record), and non-leaf nodes will only store indexes;
  • All indexes will appear at leaf nodes, and an ordered linked list is formed between leaf nodes;
  • The index of a non-leaf node will also exist in the child nodes and is the maximum (or minimum) of all indexes in the child nodes.
  • There are as many indexes as there are child nodes in non-leaf nodes;

_____________________________________________________________________________

MySQL Transactions 

What are the characteristics of transactions?

Atomicity:

All operations in a transaction must execute as a unit: either all of them complete or none of them do. If an error occurs midway, everything is rolled back to the initial state.

Durability:

After the transaction is completed, the modification to the data is permanent and will not change even if the system fails.

Consistency:

Data satisfies all integrity constraints both before and after the transaction.

Isolation:

The database has the ability for multiple concurrent transactions to read, write and modify data at the same time. Isolation can prevent data inconsistency caused by cross-execution when multiple concurrent transactions are executed.

In order to achieve the above four characteristics. The database implements the following mechanisms:

Durability is achieved through the redo log

Atomicity is achieved through the undo log (rollback log)

Isolation is achieved through MVCC (multi-version concurrency control) or locks

Consistency follows from durability + atomicity + isolation

 Let’s talk about isolation first:

Problems caused by concurrent transactions:

Dirty read: One transaction reads uncommitted data from another transaction

Non-repeatable read: The same data is read multiple times within one transaction, and the reads return different results.

Phantom read: A query for the number of records matching some condition is run multiple times within one transaction, and the counts differ between runs.

SQL defines four isolation levels to avoid the phenomena above. The higher the isolation level, the lower the performance.

Read uncommitted: can read another transaction's uncommitted data (dirty reads, non-repeatable reads, and phantom reads may all occur)

Read committed: can read only data another transaction has committed (non-repeatable reads and phantom reads may occur)

Repeatable read: the data a transaction reads stays consistent from start to finish (phantom reads may occur)

Serializable: transactions are executed as if serially, by locking the records they touch (none of the three phenomena can occur)
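The current level can be inspected and changed with standard MySQL statements:

```sql
-- Check the current isolation level (MySQL 8.0; older versions use
-- the tx_isolation variable instead):
SELECT @@transaction_isolation;

-- Change it for the current session:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
```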

The default isolation level of the InnoDB engine is repeatable read, but it largely avoids phantom reads, using the following two mechanisms:

1. For snapshot reading (ordinary select statement)

It uses MVCC: the data a transaction sees during execution is always consistent with what it saw when the transaction started. So even if another transaction inserts a row midway, this transaction will not see it.

2. For current reading (select ...for update)

Implemented with next-key locks (gap lock + record lock). When the statement executes, record locks and gap locks are taken on the scanned range; if another transaction tries to insert a row into a locked gap, it blocks. Since no new rows can be inserted, phantom reads are avoided.
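The two read types above, sketched on the hypothetical `t_order` table:

```sql
BEGIN;

-- Snapshot read: a plain SELECT served by MVCC; takes no row locks.
SELECT * FROM t_order WHERE id > 100;

-- Current read: reads the latest version and, under repeatable read,
-- takes next-key locks (record + gap) on the scanned range, so another
-- transaction's INSERT into that range blocks until we commit.
SELECT * FROM t_order WHERE id > 100 FOR UPDATE;

COMMIT;
```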

 Let’s talk about how ReadView works in MVCC

First understand two important knowledge of ReadView

1. Four fields of ReadView

2. Two hidden columns related to transactions in the clustered index record

 1. m_ids: the list of active transaction IDs in the database when the ReadView is created. Active transactions are those that have started but not yet committed.

 2. min_trx_id: the smallest transaction ID among the active (uncommitted) transactions when the ReadView is created.

 3. max_trx_id: the ID the database should assign to the next transaction when the ReadView is created, i.e. the current largest transaction ID + 1.

  4. creator_trx_id: the ID of the transaction that created this ReadView.

Two hidden columns in clustered index records

 trx_id 

When a transaction changes a clustered index record, the id of the transaction will be hidden in trx_id

 roll_pointer

Every time a clustered index record is modified, the old version of the record will be written to the undo log, and roll_pointer is a pointer pointing to each old version of the record, through which the record before modification can be found.

After creating ReadView, the trx_id in the record can be divided into three situations:

 When a transaction accesses records, in addition to the records it updates being always visible, there are also the following situations:

1. If the trx_id value of the record is less than min_trx_id, it means that this version of the record was generated before the ReadView was created. So the version record is visible to the transaction.

 2. If the trx_id value of the record is greater than or equal to max_trx_id, this version was generated by a transaction that started after the ReadView was created, so this version of the record is not visible to the transaction.

 3. If the trx_id of the record is between min_trx_id and max_trx_id, you need to determine whether the trx_id is in the m_ids list

         1) If it is in the m_ids list, the transaction that generated the record is still active, so the record is not visible.

         2) If it is not in the m_ids list, that transaction has already committed, so the record is visible.

This behavior of controlling multiple concurrent transactions accessing the same record through a version chain (implemented through undo logs, using roll_pointer to point to the old version of the record) is called MVCC (Multi-version Concurrency Control)

Both repeatable read and read committed are implemented through ReadViews. Under repeatable read, a transaction uses a single ReadView, created at its first read, from beginning to end.

Under read committed, a new ReadView is created before every read.
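The difference can be seen with two concurrent sessions. This timeline is illustrative, assuming `t_order` holds a row with id = 1 and price = 100:

```sql
-- Session A:
BEGIN;
SELECT price FROM t_order WHERE id = 1;   -- reads 100

-- Session B (meanwhile):
--   BEGIN;
--   UPDATE t_order SET price = 200 WHERE id = 1;
--   COMMIT;

-- Session A again:
SELECT price FROM t_order WHERE id = 1;
-- REPEATABLE READ: still sees 100 (same ReadView as the first read)
-- READ COMMITTED:  sees 200 (a fresh ReadView is built for this read)
COMMIT;
```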

_____________________________________________________________________________

MySQL locks

1. What locks does MySQL have?

       1) Global lock: once taken, the entire database becomes read-only; it is generally used for full logical backups.

       2) Table-level lock:

               Table lock:

                        Table-level shared lock: read lock     

                        Table-level exclusive lock: write lock

                Metadata lock (MDL): When we operate on the database, we will add metadata locks to the table

                           When we perform CRUD on a table, we add an MDL read lock.

                           When changing the table structure of a table, an MDL write lock is added.

               Intention lock: before taking a row-level exclusive lock for an insert, update, or delete, the transaction first takes an intention exclusive (IX) lock on the table; likewise a shared row lock is preceded by an intention shared (IS) lock on the table.

                           The purpose of adding intention locks is to quickly determine whether any records in the table are locked.

               AUTO-INC lock: table primary keys are usually auto-incrementing, declared with AUTO_INCREMENT on the column; the AUTO-INC lock serializes the allocation of auto-increment values during inserts.

    3) Row level lock

            Record lock Record lock (lock the data of a certain row in the table):

                S lock: When a transaction adds an S lock to a record, another transaction can also add an S lock, but cannot add an X lock.

                X lock: When a transaction adds X lock to a record, other transactions cannot add X lock or S lock.

            Gap lock: It is used for repeatable read isolation level. The purpose is to solve the situation of phantom reading under repeatable read isolation level.

                       Gap locks are compatible, that is, two transactions can have gap locks with a common gap range. There is no mutually exclusive relationship.

             Next-key lock: a record lock + gap lock combined; it locks a range and the record itself.
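The global and table-level locks above can be taken explicitly:

```sql
-- Global lock: the whole instance becomes read-only, typically held
-- while taking a full logical backup.
FLUSH TABLES WITH READ LOCK;
-- ... run mysqldump or a similar backup tool ...
UNLOCK TABLES;

-- Table locks:
LOCK TABLES t_order READ;    -- table-level shared (read) lock
UNLOCK TABLES;
LOCK TABLES t_order WRITE;   -- table-level exclusive (write) lock
UNLOCK TABLES;
```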

2. How does MySQL lock?

      Unique index equality query:

              When the queried record exists: after locating the record in the index tree, the next-key lock on its index entry degenerates into a record lock. Since an equality match on a unique index returns at most one row, a record lock alone is enough to prevent phantoms.

             When the queried record does not exist: after finding the first record in the index tree larger than the queried value, the next-key lock on that record's index entry degenerates into a gap lock.

     Non-unique index equality query:

  • When the queried record "exists": since the index is not unique, there may be several records with the same index value, so the query is a scan that stops at the first secondary index record that does not match the condition. Every matching secondary index record scanned gets a next-key lock; for the first non-matching secondary index record, the next-key lock degenerates into a gap lock. In addition, a record lock is taken on the primary key index entry of every record that matches the condition.
  • When the queried record "does not exist": on scanning to the first secondary index record that does not match, its next-key lock degenerates into a gap lock. Since no record matches the condition, no lock is taken on the primary key index.

     

The difference in locking rules between range queries on unique (primary key) indexes and non-unique indexes is:

  • For a unique index, under certain conditions the next-key lock degenerates into a gap lock or a record lock.
  • For a non-unique index range query, the next-key lock never degenerates into a gap lock or record lock.


Origin blog.csdn.net/weixin_55347789/article/details/131664497