MySQL 45 Lectures (Basics): Knowledge Summary (Complete)

A summary of the basic knowledge points from the Geek Time course "MySQL 45 Lectures".

Course instructor: Lin Xiaobin

1. MySQL Basic Architecture

Basic architecture

  • Connector: the connector establishes the connection with the client, authenticates it, and maintains and manages the connection.
  • Query cache: statements executed earlier and their results may be cached in memory as key-value pairs, where the key is the query statement and the value is its result. If a query hits a key in this cache, the value is returned to the client directly. (Note: the query cache is invalidated so often that MySQL 8.0 removed the feature entirely.)
  • Analyzer: the analyzer first performs "lexical analysis": your input is an SQL statement made up of strings and spaces, and MySQL must identify what each string is and what it represents. It then performs "syntax analysis": based on the lexical result, it judges against the grammar rules whether your SQL statement is valid MySQL syntax. If the statement is wrong, you receive the error "You have an error in your SQL syntax".
  • Optimizer: the optimizer decides which index to use when a table has multiple indexes, and decides the join order when a statement joins multiple tables.
  • Executor: the analyzer has determined what you want to do and the optimizer has determined how to do it, so execution begins. At the start of execution, the executor first checks whether you have permission to query the table (say, table T); if not, it returns a permission error.
  • Storage engine: the storage engine is the underlying software layer of the database; the database management system (DBMS) uses it to create, query, update, and delete data. Different DBMSs support a variety of engines, and different engines provide different storage mechanisms, indexing techniques, locking levels, and other features.


The core of MySQL is the storage engine.

Tip: InnoDB is the preferred engine for transactional databases: it supports transaction-safe (ACID) tables, row-level locking, and foreign keys. Since MySQL 5.5.5, InnoDB has been the default storage engine.
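A quick way to see this from the client side (a minimal sketch; t_demo is an illustrative table name):

    -- List the storage engines this instance supports
    SHOW ENGINES;

    -- Confirm the default engine (InnoDB on modern versions)
    SHOW VARIABLES LIKE 'default_storage_engine';

    -- The engine can also be chosen per table
    CREATE TABLE t_demo (id INT PRIMARY KEY, c INT) ENGINE = InnoDB;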



2. Log (Redo log and Binlog)

2.1 Important log module: redo log

The course's analogy: a tavern keeper first chalks a customer's credit onto the chalkboard, then copies it into the ledger later, when business is slow. This is the idea behind WAL (Write-Ahead Logging):

  • chalkboard = redo log
  • ledger = disk

With the redo log, InnoDB can guarantee that even if the database restarts abnormally, previously committed records are not lost. This capability is called crash-safe.

crash-safe

  • As long as a credit is recorded on the chalkboard or written into the ledger, then even if the shopkeeper forgets about it later, for example because the shop suddenly closes for a few days, the accounts can still be reconstructed from the ledger and the chalkboard once business resumes.

2.2 Important log module: Binlog

  • binlog records all logical operations
  • two-phase commit

Two-phase commit keeps the redo log and the binlog logically consistent:

1. Write the redo log (prepare stage) -> 2. Write the binlog -> 3. Commit the redo log
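As a concrete sketch of where the two phases sit (table and column names follow the course's illustrative example):

    -- "Add 1 to the c field of the row with ID = 2"
    UPDATE T SET c = c + 1 WHERE ID = 2;
    -- Internally: (1) InnoDB updates the page in memory and writes the redo log
    --             in the "prepare" state; (2) the server layer writes the binlog;
    --             (3) InnoDB marks the redo log entry as committed.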


2.3 Differences between redo log and binlog

  1. The redo log is specific to the InnoDB engine; the binlog is implemented at MySQL's server layer and is available to all engines.
  2. The redo log is a physical log: it records "what modification was made on a certain data page". The binlog is a logical log: it records the original logic of the statement, such as "add 1 to the c field of the row with ID = 2".
  3. The redo log is written circularly within a fixed amount of space, which is eventually reused; the binlog is appended to. "Append write" means that once a binlog file reaches a certain size, writing switches to the next file without overwriting earlier logs.

The redo log is what provides the crash-safe capability. When the parameter innodb_flush_log_at_trx_commit is set to 1, the redo log of each transaction is persisted to disk at commit time. I suggest you set this parameter to 1 to guarantee that no data is lost after an abnormal MySQL restart.


When the parameter sync_binlog is set to 1, the binlog of each transaction is persisted to disk at commit time. I also recommend setting this parameter to 1 to guarantee that the binlog is not lost after an abnormal MySQL restart.
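These two settings are commonly applied together, the so-called "double 1" configuration; a minimal sketch:

    -- "Double 1": maximum durability, at some cost in write throughput
    SET GLOBAL innodb_flush_log_at_trx_commit = 1;  -- flush redo log to disk at each commit
    SET GLOBAL sync_binlog = 1;                     -- fsync binlog at each commit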



3. Transaction isolation

3.1 Characteristics of transactions

ACID: Atomicity, Consistency, Isolation, Durability.

3.2 Isolation level

When the database executes multiple transactions at the same time, we must avoid dirty reads, non-repeatable reads, and phantom reads.

The SQL standard defines four transaction isolation levels: read uncommitted, read committed, repeatable read, and serializable.

  1. Read uncommitted means that even before a transaction commits, its changes can be seen by other transactions.
  2. Read committed means that after a transaction commits, its changes can be seen by other transactions.
  3. Repeatable read means that the data a transaction sees during execution is always consistent with what it saw when it started. Naturally, under this level, a transaction's uncommitted changes are also invisible to other transactions.
  4. Serializable, as the name implies: for the same row, a "write" takes a write lock and a "read" takes a read lock. On a read-write lock conflict, the later transaction must wait for the earlier one to finish before continuing.

The default isolation level of the Oracle database is "read committed"; MySQL's default is "repeatable read".
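To inspect or change the level in MySQL (the variable is named transaction_isolation from 5.7.20 / 8.0 on; older versions use tx_isolation):

    -- Check the isolation level of the current session
    SELECT @@transaction_isolation;

    -- Change it for the current session only
    SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;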

3.3 Implementation of transaction isolation

Every time a record is updated, a corresponding rollback operation is also recorded (the undo log). The same record can therefore have multiple versions in the system; this is the database's multi-version concurrency control (MVCC).

For example, as a record keeps being updated, each update leaves another rollback log behind, so the rollback logs will certainly grow large. But don't worry:

The rollback log will be deleted!

Well, when will it be deleted?

The system judges when no transaction needs these rollback logs any more, and only then deletes them.

And when exactly are they no longer needed?

When no read-view in the system is earlier than the rollback log.

Note: Try not to use long transactions

And why is that?

A long transaction means a very old transaction view lives in the system. Until that transaction commits, all the rollback records it might need must be kept, which can occupy a large amount of storage. Long transactions also hold lock resources, which can drag down the whole instance.

3.4 How transactions are started

  1. An explicit start-transaction statement: begin or start transaction. The matching commit statement is commit, and the rollback statement is rollback (a sketch follows this list).
  2. set autocommit=0 turns off auto-commit for the current thread. This means that if you execute even just a select statement, a transaction starts and is not committed automatically; it persists until you explicitly execute commit or rollback, or the connection is closed.
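A minimal sketch of the explicit style (table and values are illustrative):

    BEGIN;  -- or START TRANSACTION
    UPDATE T SET c = c + 1 WHERE ID = 2;
    COMMIT; -- or ROLLBACK to undo

The course also mentions commit work and chain, which commits the current transaction and immediately starts the next one, saving a round trip for code that transacts frequently.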

For example, suppose table T holds one row whose value is 1, and transactions A and B interleave as follows (reconstructed from the course's figure):

    Transaction A                       Transaction B
    start transaction, query → gets 1
                                        start transaction
                                        query → gets 1, change 1 to 2
    query → gets V1
                                        commit
    query → gets V2
    commit
    query → gets V3

Let's look at transaction A's results under the different isolation levels, i.e. the values of V1, V2, and V3 above.

  • If the isolation level is "read uncommitted", V1 is 2. Although transaction B has not committed yet, its result is already visible to A. V2 and V3 are therefore also 2.
  • If the isolation level is "read committed", V1 is 1 and V2 is 2. Transaction B's update cannot be seen by A until B commits. V3 is also 2.
  • If the isolation level is "repeatable read", V1 and V2 are 1 and V3 is 2. V2 is still 1 because of the rule that the data a transaction sees must stay consistent throughout its execution.
  • If the isolation level is "serializable", transaction B is blocked when it tries to "change 1 to 2", and can only continue after transaction A commits. From A's perspective, V1 and V2 are 1 and V3 is 2.


4. Index in simple terms (Part 1)

4.1 Index function and model

The role of an index: to improve data query efficiency.
Common index models: hash table, ordered array, search tree.

4.1.1 Hash table

Idea:

A hash table is a structure that stores data as key-value pairs: given the key you are looking for, it returns the corresponding value. The idea is simple: keep the values in an array and use a hash function to convert the key into a position in that array, then put the value at that position.

How to resolve collisions:

Multiple keys may hash to the same position. A common way to handle this is chaining: each array slot points to a linked list of the entries that landed there.


Applicable scenario:

The hash table structure is suitable only for equality queries; it cannot serve range queries.
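In MySQL this shows up, for example, in the MEMORY engine's explicit hash indexes; a sketch (table name illustrative):

    -- Hash index: fast equality lookups, no range scans
    CREATE TABLE kv (
      k INT NOT NULL,
      v VARCHAR(64),
      INDEX USING HASH (k)
    ) ENGINE = MEMORY;

    SELECT v FROM kv WHERE k = 42;             -- can use the hash index
    SELECT v FROM kv WHERE k BETWEEN 1 AND 9;  -- cannot; falls back to a scan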




4.1.2 Ordered arrays

Idea:

An ordered array stores entries in key order, so queries can use binary search, with time complexity O(log(N)).

Efficiency:

Ordered arrays are efficient to query but costly to update: inserting into the middle means shifting every record behind the insertion point.

Applicable scenario:

Ordered-array indexes are only suitable for static storage engines, i.e. data that is never modified once stored.




4.1.3 Binary Search Tree

Idea:
Each node's left child is smaller than the node itself, which in turn is smaller than its right child.


Time complexity:
query time complexity O(log(N)), update time complexity O(log(N))

Note:
Most database storage does not actually use binary trees: over a large table the tree grows too tall, and each level can cost a disk access, so databases use N-ary trees (B+Trees) to keep the height low. For example, with InnoDB's 16 KB pages, a node of an integer-keyed index can hold roughly N ≈ 1200 children, so a tree of height 4 can already index about 1200³ ≈ 1.7 billion values.

4.2 InnoDB index model

4.2.1 Index type

The leaf nodes of the primary key index store the entire row of data. The primary key index is also called a clustered index.

The content of the leaf node of the non-primary key index is the value of the primary key. The non-primary key index is also called the secondary index (secondary index).

4.2.2 The difference between primary key index and ordinary index

A query by primary key only needs to search the ID B+Tree to get the row. A query by ordinary index first searches that index's tree to get the primary key value, and then searches the primary key tree once more (goes "back to the table").

Queries based on non-primary key indexes need to scan an additional index tree

The smaller the length of the primary key, the smaller the leaf nodes of the ordinary index, and the smaller the space occupied by the ordinary index.
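A minimal sketch of the two trees, in the spirit of the course's running example (names are illustrative):

    CREATE TABLE T (
      id INT PRIMARY KEY,   -- clustered index: leaves hold the whole row
      k  INT NOT NULL,
      name VARCHAR(16),
      INDEX (k)             -- secondary index: leaves hold the primary key value
    ) ENGINE = InnoDB;

    SELECT * FROM T WHERE id = 500;  -- searches the id tree only
    SELECT * FROM T WHERE k = 5;     -- searches the k tree, then the id tree (back to the table)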

Note:

  • When a data page is full, the B+Tree algorithm allocates a new data page and moves part of the data over. This is called a page split; it degrades performance, and the pages involved drop to roughly 50% space utilization. Conversely, when two adjacent pages both have very low utilization they are merged; merging is the reverse of splitting.
  • In terms of performance and storage space, auto-increment primary keys are often a more reasonable choice.



5. Index in simple terms (Part 2)

5.1 Going back to the table

When we query via a non-primary key index, the step of going back to the primary key index tree to fetch the full row is called "going back to the table" (a table lookup).

5.2 Covering indexes

If a query can be answered directly from an index, without going back to the table, that index is said to "cover" the query.

 For example: select ID from T where k between 3 and 5

For the statement select ID from T where k between 3 and 5, only the ID value is needed, and the ID value is already present in the k index tree, so the query result can be returned without going back to the table. In other words, in this query the index k has "covered" our query requirement; this is what we call a covering index.

Note: Since covering indexes can reduce the number of tree searches and significantly improve query performance, using covering indexes is a common performance optimization method.
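A sketch of making a query covered by extending the index (the added index is illustrative):

    -- Put the selected column into the index itself
    ALTER TABLE T ADD INDEX k_name (k, name);

    -- Covered: k and name are in the index, and secondary-index leaves already
    -- store the primary key id, so no table lookup is needed. EXPLAIN shows
    -- "Using index" in the Extra column for such queries.
    SELECT id, name FROM T WHERE k BETWEEN 3 AND 5;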




6. Global locks and table locks

6.1 Global locks

As the name implies, the global lock is to lock the entire database instance

MySQL provides a way to add a global read lock, the command is

 Flush tables with read lock (FTWRL)

When you need to make the entire instance read-only, you can use this command; the following kinds of statements from other threads will then block:

  • data update statements (insert, update, delete),
  • data definition statements (creating tables, altering table structures, etc.),
  • commit statements of update transactions.

Usage scenario:
The typical usage scenario for the global lock is a logical backup of the whole database, i.e. selecting every table and saving the results as text (the commands are sketched below). The trouble is:

  • if you back up on the master, no updates can run for the duration of the backup, so the business essentially stops;
  • if you back up on a slave, the slave cannot apply the binlog synchronized from the master during the backup, causing master-slave replication lag.
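The basic incantation, as a sketch:

    FLUSH TABLES WITH READ LOCK;   -- the whole instance becomes read-only
    -- ... run the logical backup ...
    UNLOCK TABLES;                 -- also released if the client disconnects

For all-InnoDB instances, mysqldump --single-transaction avoids this global lock entirely by reading from a consistent snapshot (see the summary below).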

6.2 Table-level locks

There are two types of table-level locks in MySQL: one is the table lock, the other is the metadata lock (MDL).

6.2.1 Table locks

The syntax for a table lock is

 lock tables … read/write

You can release the locks explicitly with unlock tables; they are also released automatically when the client disconnects.

Before finer-grained locks existed, table locks were the most common way to handle concurrency.
For an engine with row locks such as InnoDB, the lock tables command is generally not used for concurrency control: locking a whole table has too large an impact.
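A sketch of the semantics (table names illustrative):

    LOCK TABLES t1 READ, t2 WRITE;
    -- This thread may now only read t1 and read/write t2, and can touch no other table;
    -- other threads are blocked from writing t1 and from reading or writing t2.
    UNLOCK TABLES;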

6.2.2 MDL (metadata lock)

MDL does not need to be used explicitly; it is taken automatically whenever a table is accessed.

The role of MDL is to ensure the correctness of reading and writing.

  • Read locks are not mutually exclusive with each other, so multiple threads can query and modify a table's data at the same time.
  • Read locks and write locks are mutually exclusive, as are write locks with each other, to guarantee the safety of table-structure changes. So if two threads try to add a field to the same table at the same time, one of them waits until the other finishes.
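The classic pitfall, as a sketch (session labels and the column are illustrative):

    -- Session A: a long transaction keeps an MDL read lock on t
    BEGIN;
    SELECT * FROM t LIMIT 1;

    -- Session B: needs the MDL write lock, so it blocks...
    ALTER TABLE t ADD COLUMN extra INT;

    -- ...and every later statement on t, even plain SELECTs, queues behind B.
    -- Mitigation: bound how long DDL waits for the metadata lock, then retry:
    SET SESSION lock_wait_timeout = 5;  -- seconds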

6.3 Summary

Global locks are mainly used for logical backups. For instances where every table uses InnoDB, I suggest the --single-transaction option instead, which is friendlier to the application.

Table locks are generally used when the engine does not support row locks. If you find statements like lock tables in your application, you need to track them down; the likely situations are:

  • either your system still uses an engine that does not support transactions, such as MyISAM, in which case you should plan an engine upgrade;
  • or the engine has been upgraded but the code has not. I have seen such a case: the final fix was simply to change lock tables and unlock tables to begin and commit.

MDL is not released until the transaction commits. When making table-structure changes, be careful not to block online queries and updates.




7. Row locks

7.1 What is a row lock?

A row lock is a lock for row records in a data table

Explanation:

For example, transaction A updates a row, and transaction B then needs to update the same row; B must wait until transaction A's operation completes.

In InnoDB transactions, row locks are taken when they are needed, but they are not released as soon as they are no longer needed: they are held until the transaction ends. This is the two-phase locking protocol.
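A practical consequence, sketched below (the schema is hypothetical): since locks are held until COMMIT, order your statements so the row most likely to conflict is locked last and therefore held for the shortest time.

    BEGIN;
    INSERT INTO ticket_log (customer_id, seat_no) VALUES (42, 'A10');  -- low-conflict work first
    UPDATE theater_account SET balance = balance + 100 WHERE id = 1;   -- hot row locked last
    COMMIT;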

When many transactions update the same rows, row-lock conflicts build up, and we need to reduce this contention.

7.2 Deadlocks and deadlock detection

7.2.1 Deadlock

When different threads in a concurrent system have cyclic resource dependencies, and the threads involved are all waiting for other threads to release resources, it will cause these threads to enter an infinite waiting state, which is called a deadlock.

When a deadlock occurs, there are two strategies:

  • One strategy is to simply wait until the lock times out. The timeout is set by the parameter innodb_lock_wait_timeout (50 seconds by default in InnoDB).
  • The other strategy is deadlock detection: when a deadlock is found, one transaction in the deadlock chain is actively rolled back so the other transactions can proceed. Setting innodb_deadlock_detect to on enables this logic (it is on by default; a two-session demo is sketched below).
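A minimal two-session deadlock (table and rows are illustrative):

    -- Session A:                        -- Session B:
    BEGIN;                               BEGIN;
    UPDATE t SET c = 0 WHERE id = 1;     UPDATE t SET c = 0 WHERE id = 2;
    UPDATE t SET c = 0 WHERE id = 2;     UPDATE t SET c = 0 WHERE id = 1;
    -- A now waits for B...              -- ERROR 1213: Deadlock found when trying
                                         -- to get lock; one session is rolled back

With innodb_deadlock_detect on, InnoDB spots the cycle immediately and rolls one transaction back rather than letting both wait out innodb_lock_wait_timeout.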

7.2.2 Deadlock detection

First of all, be aware of this: deadlock detection itself carries an extra cost.

Let's give an example to illustrate:

Every newly blocked thread must check whether its own arrival creates a deadlock, an operation with time complexity O(n). Suppose 1000 concurrent threads all want to update the same row: the total deadlock-detection work is then on the order of one million operations. Even though the final verdict is "no deadlock", an enormous amount of CPU is burned along the way, so you see high CPU utilization yet few transactions per second.

How to solve the performance problem caused by this hot row update?

  • One stopgap, treating the symptom rather than the cause, is to turn deadlock detection off temporarily, if you can guarantee this business will never deadlock. But the operation carries risk: deadlocks were never treated as a serious error in business design, because a deadlocked transaction is simply rolled back and safely retried by the business, so it does no harm. Turning detection off means lock waits may instead end in large numbers of timeouts, which does hurt the business.
  • The other idea is to control the degree of concurrency. From the analysis above, if at most, say, 10 threads update the same row at any moment, deadlock detection stays cheap and the problem never arises. The obvious approach is concurrency control on the client side, but you will quickly find it infeasible because there are too many clients: I have seen an application with 600 clients, so even if each client is limited to 5 concurrent threads, the database server can still face a peak of 3000 concurrent threads.



8. Are transactions isolated or not? (MVCC)

Since we keep talking about transaction isolation, we should first ask: are MySQL's transactions actually isolated or not?

In MySQL, there are two concepts of "views":

  • One is the view: a virtual table defined by a query statement, which executes that query and produces its result when called. The syntax for creating one is create view ..., and it is queried the same way as a table.
  • The other is the consistent read view used by InnoDB to implement MVCC, which supports the RC (Read Committed) and RR (Repeatable Read) isolation levels.

Its role is to define "what data the transaction can see" during execution.

8.1 How does "snapshot" work in MVCC?

Consistent view (reconstructed from the course's figure): at the moment a transaction starts, InnoDB records the array of transaction IDs that are currently active (started but not yet committed). The smallest ID in the array is the low-water mark; the highest transaction ID created so far, plus 1, is the high-water mark. Row versions with trx_id below the low-water mark fall in the green zone, at or above the high-water mark in the red zone, and in between in the yellow zone.

Thus, at the moment the current transaction starts, a data version's row trx_id has the following possibilities:

  1. If it falls in the green zone, this version belongs to a committed transaction or was generated by the current transaction itself, so the data is visible;
  2. If it falls in the red zone, this version was generated by a transaction started in the future, so it is definitely invisible;
  3. If it falls in the yellow zone, there are two cases:
    a. if the row trx_id is in the array, this version was generated by a transaction that has not yet committed, so it is invisible;
    b. if the row trx_id is not in the array, this version was generated by a transaction that has already committed, so it is visible.

This is why later updates elsewhere in the system have no effect on what the transaction sees: visibility is decided purely by each data version's row trx_id measured against the view.

InnoDB takes advantage of the feature that "all data has multiple versions" and realizes the ability to "create snapshots in seconds".

For a data version and a given transaction's view, beyond the rule that the transaction's own updates are always visible, there are three cases:

  1. the version is not committed: invisible;
  2. the version is committed, but was committed after the view was created: invisible;
  3. the version is committed, and was committed before the view was created: visible.

8.2 Update logic

An update reads first and then writes, and that read can only read the current value, i.e. the latest committed version; this is called a "current read".
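A two-session sketch of the difference (assumes a table t(id, k) with the row (1, 1), at the repeatable read level):

    -- Session A:
    START TRANSACTION WITH CONSISTENT SNAPSHOT;
    SELECT k FROM t WHERE id = 1;              -- 1 (consistent / snapshot read)

    -- Session B (autocommit):
    UPDATE t SET k = k + 1 WHERE id = 1;       -- commits k = 2

    -- Session A again:
    SELECT k FROM t WHERE id = 1;              -- still 1: the snapshot is unchanged
    SELECT k FROM t WHERE id = 1 FOR UPDATE;   -- 2: locking reads are current reads
    COMMIT;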

The logic of read committed is similar to that of repeatable read. The main difference between them:

  • under the repeatable read isolation level, the consistent view is created once at the start of the transaction, and every query in the transaction shares it;
  • under the read committed isolation level, a new view is computed before each statement executes.

8.3 Summary:

InnoDB row data has multiple versions, each data version has its own row trx_id, and each transaction or statement has its own consistent view. Ordinary query statements are consistent reads, which determine the visibility of data versions based on row trx_id and consistent views.

  • Under repeatable read, a query accepts only data committed before the transaction started;
  • under read committed, a query accepts only data committed before the statement started.

A current read always reads the latest committed version.




This concludes the basics; the practice chapters will be published in a later update.

Origin: blog.csdn.net/qq_54729417/article/details/124718731