High Performance MySQL Notes: MySQL Basics (1)

1. MySQL architecture and history

Notes based on the book High Performance MySQL.

1.1 MySQL logical architecture

(Figure: MySQL logical architecture)

The second layer is the core of MySQL. Most of MySQL's core functionality lives in this layer, including query parsing, analysis, optimization, caching, and all the built-in functions (for example date, time, math, and encryption functions). All functionality that works across storage engines is also implemented here: stored procedures, views, triggers, and so on.

Storage engines do not parse our SQL, and different storage engines do not communicate with each other; they simply respond to requests from the upper server layer.

InnoDB is an exception: it does parse foreign key definitions, because the MySQL server itself does not implement this functionality.

1.1.1 Connection management and security

Each client connection gets its own thread inside the server process, and the connection's queries execute only within that single thread, which in turn can run on only one CPU core or CPU at a time. The server caches threads, so a thread does not have to be created and destroyed for every new connection.

MySQL 5.5 and later versions provide an API that supports thread-pooling plugins, which can serve a large number of connections with a small pool of threads.

1.1.2 Optimization and execution

MySQL parses the query, creates an internal data structure (the parse tree), and then applies a variety of optimizations to it, including rewriting the query, determining the order in which tables are read, choosing appropriate indexes, and so on.

The optimizer does not care which storage engine a table uses, but the storage engine does affect how the query is optimized.

For SELECT statements, before even parsing the query the server first checks the query cache (Query Cache). If a matching query is found there, the server skips the whole process of parsing, optimizing, and executing the query, and simply returns the result set stored in the cache.
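
As a quick, hedged check (the query cache exists only up through MySQL 5.7; it was removed in MySQL 8.0), the server's cache configuration and hit counters can be inspected like this:

-- Query cache configuration (MySQL 5.x; removed in MySQL 8.0)
SHOW VARIABLES LIKE 'query_cache%';
-- Hit/insert counters show whether SELECTs are actually being served from the cache
SHOW STATUS LIKE 'Qcache%';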

1.2 Concurrency control

1.2.1 Read-write lock

Read locks and write locks are also called shared locks and exclusive locks.

1.2.2 Lock Granularity

One way to improve the concurrency of a shared resource is to lock more selectively: try to lock only the portion of the data that needs to be modified, rather than the whole resource.

The problem is that lock operations themselves, such as acquiring locks and checking whether locks have been released, also consume resources.

A lock strategy is a trade-off between lock overhead and data safety; of course, this trade-off also affects performance.

MySQL offers several choices: each storage engine can implement its own lock strategy and lock granularity.

Table lock

A table lock locks the entire table. Before a user can modify the table (insert, update, delete, etc.), they must first obtain a write lock, which blocks all other users' read and write operations on that table. Read locks, on the other hand, do not block each other.

Write locks also have higher priority than read locks, so a write lock request may be inserted ahead of read lock requests in the lock queue; conversely, read lock requests cannot jump ahead of write locks.
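
As a sketch only (the table name here is hypothetical), an explicit server-level table lock looks like this; while it is held, other sessions' reads and writes on the table block:

LOCK TABLES Student WRITE;   -- exclusive table lock: blocks all other readers and writers
UPDATE Student SET class = 5 WHERE student_id = 3;
UNLOCK TABLES;               -- release the table lock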

Row lock

As the name suggests, a row lock locks a single row of data, refining the granularity from the table level down to the row level.

1.3 Transactions

A transaction is a group of atomic SQL queries, an independent unit of work: either all of the statements in the transaction execute successfully, or all of them fail.
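
A minimal sketch of such a unit of work, using the classic transfer example (the checking and savings tables here are hypothetical): either both updates take effect, or neither does.

START TRANSACTION;
UPDATE checking SET balance = balance - 200 WHERE customer_id = 10;
UPDATE savings  SET balance = balance + 200 WHERE customer_id = 10;
COMMIT;   -- if anything fails before this point, ROLLBACK undoes both changes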

ACID

ACID stands for atomicity, consistency, isolation, and durability.

Atomicity:

A transaction must be regarded as an indivisible, minimal unit of work: either all of its operations commit successfully, or all of them fail and are rolled back; it is not possible to execute only part of them.

Consistency:

The database always transitions from one consistent state to another consistent state.

Isolation:

Generally speaking, the modifications made by a transaction are not visible to other transactions until the transaction finally commits.

Durability:

Once a transaction is committed, its changes are permanently saved in the database.

1.3.1 Isolation level

The SQL standard defines four isolation levels. Each level specifies which modifications made within a transaction are visible, inside the transaction and between transactions, and which are not. Lower isolation levels usually allow higher concurrency and incur lower system overhead.

Each storage engine implements the isolation levels somewhat differently.

READ UNCOMMITTED (read uncommitted)

At this level, a transaction's modifications are visible to other transactions even before they are committed.

A transaction can read uncommitted data, which is known as a dirty read, and this creates real correctness problems.

In terms of performance, READ UNCOMMITTED is not much better than the other levels, but it lacks their safety guarantees, so it is rarely used in practice.

READ COMMITTED

READ COMMITTED is the default isolation level for most database systems, but not for MySQL. READ COMMITTED satisfies the simple definition of isolation mentioned earlier:

When a transaction starts, it can only "see" changes made by transactions that have already committed. In other words, from the moment a transaction starts until it commits, any changes it makes are invisible to other transactions.

This level is also called non-repeatable read, because executing the same query twice may return different results.
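
A sketch of a non-repeatable read under READ COMMITTED (the account table is hypothetical; comments mark which session runs each statement):

-- Session A
START TRANSACTION;
SELECT balance FROM account WHERE account_id = 1;   -- returns, say, 500

-- Session B commits a change while A's transaction is still open
UPDATE account SET balance = 400 WHERE account_id = 1;

-- Session A, same transaction as above
SELECT balance FROM account WHERE account_id = 1;   -- now returns 400: a non-repeatable read
COMMIT;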

REPEATABLE READ (repeatable read)

REPEATABLE READ solves the dirty read problem and, as the name suggests, the non-repeatable read problem: this level guarantees that reading the same records multiple times within the same transaction produces consistent results.

In theory, however, the repeatable read isolation level still cannot solve another problem: phantom reads. A phantom read occurs when a transaction reads records within some range while another transaction inserts a new record into that range; when the first transaction reads the range again, it sees a phantom row. The InnoDB and XtraDB storage engines solve the phantom read problem with multi-version concurrency control (MVCC, Multiversion Concurrency Control), discussed further later in this chapter.
REPEATABLE READ is MySQL's default transaction isolation level.
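
To verify the level in use on a given server (a hedged check; the variable is named tx_isolation on older servers and transaction_isolation from MySQL 5.7.20 / 8.0 onward):

-- Shows the session/global isolation level, e.g. REPEATABLE-READ
SHOW VARIABLES LIKE '%isolation%';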

SERIALIZABLE (serializable)

SERIALIZABLE is the highest isolation level. It avoids the phantom read problem described above by forcing transactions to execute serially. Put simply, SERIALIZABLE places a lock on every row it reads, so it can cause a lot of timeouts and lock contention. This isolation level is rarely used in practice; consider it only when data consistency absolutely must be guaranteed and the lack of concurrency is acceptable.

Isolation level  | Dirty reads possible | Non-repeatable reads possible | Phantom reads possible | Locking reads
READ UNCOMMITTED | Yes                  | Yes                           | Yes                    | No
READ COMMITTED   | No                   | Yes                           | Yes                    | No
REPEATABLE READ  | No                   | No                            | Yes                    | No
SERIALIZABLE     | No                   | No                            | No                     | Yes

1.3.2 Deadlock

A deadlock occurs when two or more transactions hold locks on the same resources and each requests locks on resources held by the other, creating a vicious circle. Deadlocks can occur when transactions try to lock resources in different orders, and also when multiple transactions lock the same resources at the same time.

For example, imagine the following two transactions processing the Student table at the same time:

START TRANSACTION;
UPDATE Student SET class=5 WHERE student_id = 3 AND DATE="2021-02-12";
UPDATE Student SET class=6 WHERE student_id = 4 AND DATE="2021-02-13";
COMMIT;

START TRANSACTION;
UPDATE Student SET student_name="yyyy" WHERE student_id = 4 AND DATE="2021-02-13";
UPDATE Student SET student_name="ssss" WHERE student_id = 3 AND DATE="2021-02-12";
COMMIT;

If timing is unlucky, both transactions execute their first UPDATE statement, each updating and locking one row; then each transaction tries to execute its second UPDATE, only to find that the row is already locked by the other. Both transactions wait for the other to release its lock while holding the lock the other needs, falling into an endless loop. Unless something external intervenes, neither can break the deadlock.

To solve this problem, database systems implement various deadlock detection and deadlock timeout mechanisms. The more sophisticated systems, such as the InnoDB storage engine, detect circular dependencies among locks and immediately return an error. This approach is very effective; otherwise a deadlock would manifest as extremely slow queries. Another approach is to give up the lock request once a query exceeds the lock wait timeout setting, which is usually not as good.

InnoDB's current method of dealing with deadlocks is to roll back the transaction that holds the fewest row-level exclusive locks (this is a relatively simple deadlock rollback algorithm).
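
Two hedged diagnostics for the behavior described above: InnoDB's status output includes a LATEST DETECTED DEADLOCK section, and the lock wait timeout controls how long a blocked statement waits when no deadlock is detected.

-- The LATEST DETECTED DEADLOCK section shows the transactions involved
-- in the most recent deadlock and which one was chosen for rollback
SHOW ENGINE INNODB STATUS;
-- Seconds a statement waits for a row lock before giving up (default 50)
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';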

Lock behavior and ordering depend on the storage engine: executing the same statements in the same order, some storage engines deadlock and some do not. Deadlocks have two origins: some are caused by genuine data conflicts, which are usually hard to avoid, while others are caused purely by the way a storage engine is implemented.

Once a deadlock occurs, it can only be broken by partially or completely rolling back one of the transactions. For transactional systems this is unavoidable, so applications must be designed with deadlocks in mind. In most cases it is enough to simply re-execute the transaction that was rolled back because of the deadlock.

1.3.3 Transaction Log

The transaction log can help improve transaction efficiency.

With a transaction log, the storage engine only needs to modify its in-memory copy of the table's data and then record the modification in the transaction log, which is persisted on disk, rather than persisting the modified data itself to disk on every change. The transaction log is written in an append-only fashion, so log writes are sequential I/O within a small area of the disk, unlike random I/O that requires moving the disk head to many different locations; writing the transaction log is therefore comparatively fast.

Once the transaction log has been made durable, the modified data in memory can be flushed back to disk gradually, in the background.

Most storage engines are implemented this way today; the technique is usually called write-ahead logging, and it means modifying data requires writing to disk twice.
If a modification has been recorded in the transaction log and persisted, but the data itself has not yet been written back to disk when the system crashes, the storage engine can automatically recover the modified data when it restarts. The specific recovery method depends on the storage engine.
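
Two InnoDB settings related to this write-ahead logging behavior, shown as a hedged sketch (the names are InnoDB-specific and the defaults vary by version):

-- Size of each redo log file; larger logs allow more dirty pages to be flushed lazily
SHOW VARIABLES LIKE 'innodb_log_file_size';
-- 1 = write and flush the log to disk at every commit (full durability);
-- 0 and 2 trade some durability for speed
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';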

1.3.4 Transactions in MySQL

MySQL provides two transactional storage engines: InnoDB and NDB Cluster. In addition, some third-party storage engines also support transactions; the better-known ones include XtraDB and PBXT. Some of their respective characteristics are discussed in detail later. (MyISAM, by contrast, is not transactional.)

Autocommit (AUTOCOMMIT)

MySQL uses AUTOCOMMIT mode by default.

In other words, if a transaction is not started explicitly, each query is treated as its own transaction and committed automatically. In the current connection, autocommit mode can be enabled or disabled by setting the AUTOCOMMIT variable:

-- Check the current commit mode
SHOW VARIABLES LIKE 'AUTOCOMMIT';

(Figure: output of SHOW VARIABLES LIKE 'AUTOCOMMIT', showing autocommit = ON)

The value ON here means autocommit is enabled. To change the default value:

SET AUTOCOMMIT = 0;

With AUTOCOMMIT=0, all queries in the current session are part of one transaction until an explicit COMMIT or ROLLBACK is executed; that ends the transaction and immediately starts a new one.
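
A brief sketch of working with autocommit disabled in the current session (reusing the Student table from the deadlock example earlier):

SET AUTOCOMMIT = 0;
UPDATE Student SET class = 7 WHERE student_id = 3;   -- part of an open transaction, invisible to other sessions
COMMIT;                                              -- persists the change and implicitly begins a new transaction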

Changing AUTOCOMMIT has no effect on non-transactional tables, such as MyISAM or MEMORY tables. These tables have no concept of COMMIT or ROLLBACK; they can be thought of as always operating in autocommit mode.

Some commands force a COMMIT of the currently active transaction before they execute. A typical example is data definition language (DDL) statements that change a large amount of data, such as ALTER TABLE. Other statements, such as LOCK TABLES, have the same effect. If necessary, check the official documentation for your version for the complete list of statements that cause an implicit commit.

MySQL sets the isolation level with the SET TRANSACTION ISOLATION LEVEL command; the new level takes effect when the next transaction starts. The default isolation level for the entire server can be set in the configuration file, or the level can be changed only for the current session:

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

MySQL can recognize all four ANSI isolation levels, and the InnoDB engine also supports all isolation levels.
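
A hedged example of changing the server-wide default (it affects new sessions only; the same default can also be set permanently with the transaction-isolation option in the configuration file):

-- Requires sufficient privileges; existing sessions keep their old level
SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;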

Mixing storage engines in transactions

The MySQL server layer does not manage transactions; transactions are implemented by the underlying storage engines. Consequently, it is unreliable to mix storage engines within a single transaction.
If you mix transactional and non-transactional tables (for example, InnoDB and MyISAM tables) in one transaction, everything works as long as the transaction commits normally.
But if the transaction needs to be rolled back, the changes on the non-transactional table cannot be undone, which leaves the database in an inconsistent state. This situation is difficult to repair and makes the final outcome of the transaction uncertain, so it is very important to choose the appropriate storage engine for each table.
When performing transaction-related operations on a non-transactional table, MySQL usually gives no reminder or error. Sometimes a rollback produces only a warning along the lines of "changes to some non-transactional tables could not be rolled back"; most of the time, operations on non-transactional tables produce no prompt at all.

Implicit and explicit locking

  1. Implicit locks

    InnoDB uses a two-phase locking protocol. Locks can be acquired at any point during a transaction's execution, but they are released only when COMMIT or ROLLBACK is executed, and all locks are released at the same time. The locking described so far is implicit: InnoDB locks automatically as needed, according to the isolation level.

  2. Explicit locks

    In addition, InnoDB also supports explicit locking through specific statements, which are not part of the SQL specification:

    SELECT ... LOCK IN SHARE MODE
    SELECT ... FOR UPDATE
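    A minimal sketch of explicit locking inside a transaction (the table and column names here are hypothetical):

    START TRANSACTION;
    -- takes an exclusive lock on the matching row until COMMIT or ROLLBACK
    SELECT balance FROM account WHERE account_id = 1 FOR UPDATE;
    UPDATE account SET balance = balance - 100 WHERE account_id = 1;
    COMMIT;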
    

    MySQL also supports the LOCK TABLES and UNLOCK TABLES statements, which are implemented at the server layer and have nothing to do with the storage engine. They have their uses, but they are not a substitute for transactions. If an application needs transactions, it should still choose a transactional storage engine.

It is common to find that an application has converted tables from MyISAM to InnoDB but still explicitly uses LOCK TABLES statements. This is not only unnecessary, it can severely hurt performance; InnoDB's row-level locks work better.
The interaction between LOCK TABLES and transactions can become very complicated, and in some MySQL versions it even produces unpredictable results. Therefore the book's recommendation is: apart from the case where AUTOCOMMIT is disabled inside a transaction and LOCK TABLES may be used, do not explicitly execute LOCK TABLES at any time, no matter which storage engine is used.

1.4 Multi-version concurrency control

Most of MySQL's transactional storage engines do not implement simple row-level locks.

To improve concurrency, most of them also implement multi-version concurrency control (MVCC). This is not unique to MySQL: other database systems such as Oracle and PostgreSQL also implement MVCC, although their mechanisms differ, because MVCC has no unified implementation standard.

MVCC can be thought of as a variant of row-level locking, but in many cases it avoids locking altogether, so its overhead is lower. Although the mechanisms vary, most implementations provide non-blocking reads, and writes lock only the necessary rows.

MVCC works by keeping a snapshot of the data as of some point in time. In other words, no matter how long a transaction runs, the data it sees is consistent.

Depending on when each transaction starts, two transactions may see different data in the same table at the same moment. If you have not come across this idea before, that may sound confusing; once you are familiar with it, it is actually easy to understand.
As mentioned earlier, different storage engines implement MVCC differently; typical variants are optimistic and pessimistic concurrency control. Below, a simplified version of InnoDB's behavior is used to illustrate how MVCC works.

InnoDB implements MVCC by storing two hidden columns with each row. One of these columns stores the row's creation time and the other stores the row's expiration (or deletion) time.

Of course, what is stored is not the actual time value but the system version number. Every time a new transaction starts, the system version number is automatically incremented.

The system version number at the moment a transaction starts is used as the transaction's version number, to be compared against the version number of every row the transaction reads. Let's look at how MVCC operates under the REPEATABLE READ isolation level.

SELECT
	InnoDB checks each row against two criteria:
	a. InnoDB only looks for rows whose version is earlier than the current transaction's version (that is, the row's system version number is less than or equal to the transaction's system version number). This ensures that the rows the transaction reads either already existed before the transaction started or were inserted or modified by the transaction itself.
	b. The row's deletion version is either undefined or greater than the current transaction's version number. This ensures that the rows the transaction reads were not deleted before the transaction started.
	Only records meeting both criteria are returned as query results.

Put simply: a transaction sees only data that was already committed before it started, plus its own inserts and modifications.

INSERT
	InnoDB stores the current system version number as the row version number for each newly inserted row.
DELETE
	InnoDB stores the current system version number as the row deletion marker for each deleted row.
UPDATE
	InnoDB inserts a new row and stores the current system version number as its row version number, while also storing the current system version number in the original row as its deletion marker.

Storing these two extra system version numbers allows most read operations to avoid locking. This design makes reading data simple and fast, while still guaranteeing that only rows meeting the criteria are read.

The drawbacks are that each row needs extra storage space, more checking work is required per row, and there is some additional maintenance overhead.
MVCC only works under two isolation levels: REPEATABLE READ and READ COMMITTED. The other two isolation levels are incompatible with MVCC (MVCC has no formal standard, so each storage engine and database system implements it differently; no one can say the other implementations are wrong).

READ UNCOMMITTED is incompatible because it always reads the newest version of a row rather than the version matching the current transaction; SERIALIZABLE is incompatible because it locks every row it reads.
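
To make this concrete, a sketch (again with the hypothetical account table) of a snapshot read under REPEATABLE READ: the plain SELECT keeps seeing the data as of the transaction's snapshot, without taking any locks, even after another transaction commits a change.

-- Session A
START TRANSACTION;
SELECT balance FROM account WHERE account_id = 1;   -- snapshot value, say 500

-- Session B
UPDATE account SET balance = 400 WHERE account_id = 1;

-- Session A, same transaction
SELECT balance FROM account WHERE account_id = 1;   -- still 500: served from the MVCC snapshot
COMMIT;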

Origin: blog.csdn.net/qq_22155255/article/details/109908057