MySQL indexes, locks, and transactions: the points you must know

1. Index

An index is similar to a book's table of contents: you look up an entry and jump straight to the page that holds the content.

Advantages of indexes: 1. Data is kept naturally sorted. 2. Lookups are fast.
Disadvantages of indexes: 1. They take up space. 2. They slow down writes to the table.

Note: for small tables a full table scan is faster; for medium and large tables indexes pay off; for extremely large tables a single index is of little help and the data usually has to be partitioned or sharded.

In terms of implementation, indexes fall into two types: the clustered index and the secondary index (also called an auxiliary or non-clustered index).

Functionally, there are 6 types: ordinary index, unique index, primary key index, composite index, foreign key index, and full-text index.

A closer look at the 6 kinds of indexes:

1. Ordinary index: the most basic index, with no constraints.
2. Unique index: like an ordinary index, but with a uniqueness constraint.
3. Primary key index: a special unique index that does not allow NULL values.
4. Composite index: an index built across multiple columns, which can cover queries on those columns.
5. Foreign key index: only InnoDB tables support foreign keys, which ensure data consistency and integrity and enable cascading operations.
6. Full-text index: MySQL's built-in full-text index is supported by InnoDB and MyISAM and historically only works well for English text; in practice a dedicated full-text search engine (Elasticsearch, Solr) is usually used instead.

  • Note: a primary key is a unique index, but a unique index is not necessarily a primary key. A unique index column can contain NULL values, whereas a primary key cannot be NULL.
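As a minimal sketch, the statements below show how these index types are typically declared, using a hypothetical articles table (the table, column, and index names are made up for illustration):

```sql
CREATE TABLE articles (
    id        BIGINT NOT NULL AUTO_INCREMENT,
    slug      VARCHAR(100) NOT NULL,
    author_id BIGINT NOT NULL,
    category  VARCHAR(50),
    body      TEXT,
    PRIMARY KEY (id),                                -- primary key index
    UNIQUE KEY uk_slug (slug),                       -- unique index
    KEY idx_category (category),                     -- ordinary index
    KEY idx_author_category (author_id, category),   -- composite index
    FULLTEXT KEY ft_body (body),                     -- full-text index (InnoDB/MyISAM)
    CONSTRAINT fk_author FOREIGN KEY (author_id) REFERENCES authors (id)  -- foreign key (InnoDB only)
) ENGINE=InnoDB;
```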

In addition, InnoDB clusters the data by the primary key. If no primary key is defined, MySQL picks the first unique non-NULL index as the clustered index instead. If no such index exists either, InnoDB implicitly generates a hidden 6-byte row ID as the clustered index, which users cannot see or query.

Simply put:

  1. When a primary key is defined, a unique index is created for it automatically; if no clustered index exists yet, the primary key becomes the clustered index.
  2. When no primary key is defined, the first unique non-NULL index is chosen as the clustered index; if there is none, a hidden 6-byte row ID is generated and used instead.
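A small sketch of the second rule, using two hypothetical tables:

```sql
-- No primary key: InnoDB clusters on the first UNIQUE NOT NULL index (uk_email)
CREATE TABLE users (
    email VARCHAR(100) NOT NULL,
    name  VARCHAR(50),
    UNIQUE KEY uk_email (email)
) ENGINE=InnoDB;

-- No primary key and no unique non-NULL index: InnoDB falls back to a hidden 6-byte row ID
CREATE TABLE logs (
    message    VARCHAR(255),
    created_at DATETIME
) ENGINE=InnoDB;
```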

MySQL stores data in pages; the default page size is 16KB. A query never loads a single row by itself: the whole page containing the row is loaded into the buffer pool. This is similar to the operating system's principle of locality.
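You can verify the page size directly:

```sql
-- 16384 bytes = 16KB by default
SHOW VARIABLES LIKE 'innodb_page_size';
```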

MySQL indexes use a B+ tree structure. Before discussing the B+ tree, let's look at the B-tree. A B-tree is a multi-way balanced search tree; unlike an ordinary binary tree it cannot become severely unbalanced, and each node has many children.

The distinguishing feature of the B-tree is that it also stores row data in non-leaf nodes.

Because row data takes up space in every node, the non-leaf nodes of a B-tree can hold only a small number of keys per page.

The B+ tree optimizes exactly this point.

In a B+ tree, all row data lives in the leaf nodes, while the non-leaf nodes store only keys and pointers to child pages.

Assume a non-leaf node is one 16KB page, each key is a BIGINT primary key (8 bytes), and each child pointer is about 8 bytes. Then each page can hold roughly 1000 keys: 16KB / (8B + 8B) ≈ 1000.

How many keys can a three-level B+ tree hold? Roughly 1000 × 1000 × 1000, on the order of a billion. In practice a B+ tree is 2-4 levels high, and because the root node stays resident in memory while MySQL is running, each lookup costs only about 2-3 disk I/Os. It is fair to say the B+ tree was designed around the characteristics of mechanical disks.

Once we understand this design, a few practical points follow:

  1. The MySQL primary key should not be too large. A UUID primary key wastes space in the non-leaf nodes of the B+ tree, so each page holds fewer keys and the tree grows taller.
  2. The MySQL primary key should ideally be auto-incrementing. With UUIDs, inserts land at random positions, the B+ tree has to be rearranged on every insert, and the resulting page splits seriously hurt performance.

So what if the project uses sharding (splitting databases and tables) and needs a globally meaningful key? A common approach is to keep the auto-increment column as the physical primary key and put the logical (business) key in a unique index, as sketched below.
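A minimal sketch of that pattern, assuming a hypothetical orders table where order_no is the globically generated logical key:

```sql
CREATE TABLE orders (
    id       BIGINT NOT NULL AUTO_INCREMENT,  -- physical primary key, keeps inserts append-only
    order_no CHAR(32) NOT NULL,               -- logical / sharding key (e.g. a globally generated ID)
    amount   DECIMAL(10, 2),
    PRIMARY KEY (id),
    UNIQUE KEY uk_order_no (order_no)         -- lookups by the logical key still hit an index
) ENGINE=InnoDB;
```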

2. Locking mechanism

When MySQL locks come up, a pile of concepts gets thrown around. In fact, locks can be organized along a few dimensions; let's go through them.

1. Type dimension
  • Shared lock (read lock / S lock)

  • Exclusive lock (write lock / X lock)

    At the table level these come with intention variants:

    • Intention shared lock (IS)
    • Intention exclusive lock (IX)
  • Pessimistic lock (an explicit lock, i.e. SELECT ... FOR UPDATE)

  • Optimistic lock (a version-number column, similar in spirit to CAS; the application checks the version itself. Drawback: under high concurrency there are many wasted retries. Sketched later in this section.)

2. Lock granularity dimension
  • Table lock
  • Page lock (the BerkeleyDB engine)
  • Row lock (InnoDB)
3. Locking algorithm dimension
  • Record Lock (locks a single index record)
  • Gap Lock (locks a gap between index records, but not the records themselves)
  • Next-Key Lock (Record Lock + Gap Lock: locks a range and the record itself; this is how MySQL prevents phantom reads)
4. Is the default read operation locked?
  • No. By default, reads use the MVCC mechanism ("consistent non-locking reads") to guarantee isolation correctness at the RR level, without taking any locks.

You can also lock explicitly: SELECT ... FOR UPDATE (exclusive lock) or SELECT ... LOCK IN SHARE MODE (shared lock); these are called "consistent locking reads".
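A minimal sketch of both the pessimistic style and the version-number optimistic style from the type dimension above, assuming a hypothetical account table with balance and version columns:

```sql
-- Pessimistic lock: hold an exclusive row lock until commit
START TRANSACTION;
SELECT balance FROM account WHERE id = 1 FOR UPDATE;      -- X lock on the row
UPDATE account SET balance = balance - 100 WHERE id = 1;
COMMIT;

-- Optimistic lock: no lock held while "thinking"; conflicts are detected by the version column
SELECT balance, version FROM account WHERE id = 1;         -- e.g. returns version = 7
UPDATE account
SET    balance = balance - 100, version = version + 1
WHERE  id = 1 AND version = 7;                             -- 0 rows affected means someone else won; retry
```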

With these locks, phantom reads are avoided at the RR level; the default MVCC reads avoid them as well, as sketched below.
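A sketch of the locking case, assuming a hypothetical table t(id) under the RR level; session B's insert falls into the range covered by session A's next-key locks and has to wait:

```sql
-- Session A (REPEATABLE READ)
START TRANSACTION;
SELECT * FROM t WHERE id BETWEEN 10 AND 20 FOR UPDATE;  -- next-key locks cover the records and the gaps

-- Session B
INSERT INTO t (id) VALUES (15);   -- blocks until session A commits, so no "phantom" row can appear

-- Session A
SELECT * FROM t WHERE id BETWEEN 10 AND 20 FOR UPDATE;  -- same result set as before
COMMIT;
```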

Since RR already prevents phantom reads, what is SERIALIZABLE for?

Preventing lost updates. For example: two transactions both read the same row under RR, each computes a new value from what it read, and then both write the row back; the second commit silently overwrites the first.

In that case we need the SERIALIZABLE level, which forces the reads to serialize.
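A sketch of the lost-update scenario, assuming the same hypothetical account table with balance = 100 at the start:

```sql
-- Both sessions run under REPEATABLE READ.

-- Session A
START TRANSACTION;
SELECT balance FROM account WHERE id = 1;              -- reads 100

-- Session B
START TRANSACTION;
SELECT balance FROM account WHERE id = 1;              -- also reads 100
UPDATE account SET balance = 100 - 30 WHERE id = 1;    -- writes 70
COMMIT;

-- Session A
UPDATE account SET balance = 100 + 50 WHERE id = 1;    -- writes 150, B's change is lost
COMMIT;

-- Under SERIALIZABLE the plain SELECTs become locking reads, so the second
-- writer blocks (or a deadlock is detected) instead of silently overwriting the first.
```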

Finally, row locks are implemented by locking index records (ultimately entries in the clustered index). If a query does not hit an index, InnoDB cannot lock individual rows and ends up locking every record it scans, which behaves like a table lock.
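A sketch of the pitfall, assuming a hypothetical user table where name is initially not indexed:

```sql
-- name has no index, so this statement scans the table and locks every row it
-- scans, which is effectively a table-wide lock
UPDATE user SET status = 1 WHERE name = 'alice';

-- With an index on name, only the matching index records (and the corresponding
-- clustered-index rows) are locked
ALTER TABLE user ADD INDEX idx_name (name);
UPDATE user SET status = 1 WHERE name = 'alice';
```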

3. Transactions

Transactions are the eternal topic of databases. ACID: atomicity, consistency, isolation, durability.

Of the four properties, the most important is consistency, and consistency is guaranteed by atomicity, isolation, and durability.

  • Atomicity is guaranteed by the Undo Log. The Undo Log records the state of each row before it is changed, so the change can be rolled back when an error occurs.
  • Isolation is guaranteed by MVCC and locks, discussed below.
  • Durability is guaranteed by the Redo Log. Every modification is written to the Redo Log first, and only after the Redo Log write succeeds is the change applied to the B+ tree. If the power fails before the data pages are flushed, the data can be recovered from the Redo Log.
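A tiny sketch of atomicity in action, plus the setting that controls how aggressively the Redo Log is flushed (reusing the hypothetical account table from above):

```sql
START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE id = 1;
-- Something goes wrong: the Undo Log lets us restore the pre-change value
ROLLBACK;

-- 1 (the default) flushes the Redo Log to disk at every commit,
-- which is what gives InnoDB its durability guarantee
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
```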

Now let's talk about isolation.

Isolation level:

  1. Read Uncommitted (RU)
  2. Read Committed (RC)
  3. Repeatable Read (RR)
  4. Serializable

Each level solves different problems; the usual three are dirty reads, non-repeatable reads, and phantom reads. The classic summary:

  • Read Uncommitted: dirty reads, non-repeatable reads, and phantom reads are all possible.
  • Read Committed: prevents dirty reads; non-repeatable reads and phantom reads are still possible.
  • Repeatable Read: also prevents non-repeatable reads; phantom reads remain possible per the SQL standard.
  • Serializable: prevents all three.
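You can check and change the level per session; the sketch below assumes MySQL 8.0 (older versions expose @@tx_isolation instead):

```sql
SELECT @@transaction_isolation;                           -- REPEATABLE-READ by default
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT @@transaction_isolation;                           -- now READ-COMMITTED
```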

One point worth noting concerns phantom reads. By the SQL standard, the RR level still allows phantom reads; because of MySQL's optimizations, however, RR in MySQL does not produce them. With a plain SELECT, MySQL relies on the MVCC mechanism to guarantee that no phantoms appear; with locking reads such as FOR UPDATE (X lock) or LOCK IN SHARE MODE (S lock), MySQL uses Next-Key Locks to guarantee the same. The former is called a snapshot read, the latter a current read.
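A sketch of the difference, again assuming the hypothetical t(id) table under the RR level:

```sql
-- Session A
START TRANSACTION;
SELECT COUNT(*) FROM t WHERE id BETWEEN 10 AND 20;             -- snapshot read, say 3 rows

-- Session B
INSERT INTO t (id) VALUES (15);
COMMIT;

-- Session A
SELECT COUNT(*) FROM t WHERE id BETWEEN 10 AND 20;             -- still 3: reads the original snapshot
SELECT COUNT(*) FROM t WHERE id BETWEEN 10 AND 20 FOR UPDATE;  -- current read: sees 4 and takes next-key locks
COMMIT;
```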

Principle analysis:

  • Why RU produces dirty reads: under RU, a plain SELECT simply reads the latest version of the row, without locks and without a consistent snapshot, so it can see changes made by other transactions that have not yet committed. RC and RR instead read committed data through MVCC snapshots.
  • Why RC cannot repeat reads: RC generates a new Read View for every statement, so each read may see a different snapshot. An RR transaction uses the same Read View from beginning to end.
  • Why RR does not produce phantom reads: covered above.

What is the difference between RR and Serializable? Answer: lost updates, as already covered in the locks section above.

A word on MVCC, short for multi-version concurrency control.

Every InnoDB clustered index record carries hidden fields: a row ID (RowID, used when there is no primary key), the transaction ID of the most recent change (the core of MVCC), a roll pointer into the Undo Log (the core of isolation), and a delete mark (a deleted row is not removed immediately; it is flagged and purged asynchronously).

Essentially, MVCC is implemented with the linked chain of Undo Log versions.

How MVCC works: a transaction takes an exclusive lock to modify the row in place, writes the old version into the Undo Log, and links the row to it through the roll pointer. If the modification commits, nothing more needs to happen; if it fails, the data is restored from the Undo Log, and readers follow the pointer chain to find the version visible to their snapshot.

One more thing: MVCC is often assumed to be a form of optimistic locking based on version numbers, but that is not how InnoDB implements it. Of course, this does not affect how we use MySQL.


Source: blog.csdn.net/doubututou/article/details/109112309