The three core topics of MySQL in one article: indexes, locks, and transactions

1. Indexes

An index is like a book's table of contents: you look up the page number there and jump straight to the content.

Advantages of indexes: 1. Data is kept in sorted order. 2. Lookups are fast.

Disadvantages of indexes: 1. They take up space. 2. They slow down updates to the table.

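Note: for small tables a full table scan is faster; indexes pay off on medium and large tables. On extremely large tables, the benefit of an index diminishes considerably.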

In terms of implementation, indexes are divided into two types: clustered indexes and auxiliary indexes (also called secondary indexes or non-clustered indexes).

From a functional point of view, there are six types: ordinary index, unique index, primary key index, composite index, foreign key index, and full-text index.

The six index types in detail:

1. Ordinary index: the most basic index, without any constraints.

2. Unique index: like an ordinary index, but with a uniqueness constraint.

3. Primary key index: a special unique index that does not allow NULL values.

4. Composite index: an index created on a combination of several columns, so that one index covers multiple columns.

5. Foreign key index: only InnoDB tables can use foreign key indexes. They ensure data consistency and integrity and make cascading operations possible.

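6. Full-text index: MySQL's built-in full-text index is available only for InnoDB and MyISAM tables. Its default parser splits words on whitespace, which suits English; languages such as Chinese need the ngram parser (available since MySQL 5.7.6). In practice a dedicated full-text search engine (ES, Solr) is generally used.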

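  • Note: a primary key is a unique index, but a unique index is not necessarily a primary key. A unique index may contain NULL values (in MySQL a unique index accepts multiple NULLs, since NULL is never equal to another NULL), while a primary key cannot be NULL.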

In addition, InnoDB clusters data by primary key. If no primary key is defined, InnoDB uses the first unique index whose columns are all NOT NULL as the clustered index instead. If there is no such index, InnoDB implicitly generates a hidden 6-byte row ID and clusters on that; users cannot view or access it.

Simply put:

  1. When a primary key is defined, a unique index is created for it automatically, and the primary key is the clustered index.
  2. When no primary key is defined, a unique NOT NULL index is chosen as the clustered index. If there is none, a hidden 6-byte row ID is generated.
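The selection rule above can be written down as a small decision function. This is a toy model for illustration, not InnoDB code: the `Index` record and `choose_clustered_index` are names invented here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Index:
    """Hypothetical description of one index on a table."""
    name: str
    unique: bool = False
    nullable: bool = True     # True if any indexed column allows NULL
    primary: bool = False


def choose_clustered_index(indexes: List[Index]) -> str:
    """Return the name of the index the table would be clustered on."""
    # Rule 1: an explicit primary key always wins.
    for idx in indexes:
        if idx.primary:
            return idx.name
    # Rule 2: otherwise the first unique index with no nullable column.
    for idx in indexes:
        if idx.unique and not idx.nullable:
            return idx.name
    # Rule 3: otherwise a hidden 6-byte row ID is generated.
    return "<hidden 6-byte row id>"


print(choose_clustered_index([
    Index("uk_nickname", unique=True, nullable=True),
    Index("uk_email", unique=True, nullable=False),
]))  # uk_email
```

Note that a unique index on a nullable column is skipped: only a unique index that is entirely NOT NULL can stand in for the primary key.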

MySQL stores data in pages, and the default page size is 16 KB. A query does not load just a single row: the whole page that contains the row is loaded into memory (the buffer pool). This is similar in principle to the way the operating system reads data in pages.

MySQL indexes use a B+ tree structure. Before discussing the B+ tree, let's look at the B-tree. A B-tree is a multi-way balanced search tree: unlike an ordinary binary tree it cannot become badly unbalanced, and each node has many children rather than two.

A characteristic of the B-tree is that it also stores row data in non-leaf nodes.

You can see from the picture:

Because the data takes up room in every node, non-leaf nodes cannot hold a large number of index entries.

The B+ tree is optimized to address this. As shown below:

The B+ tree stores all row data in leaf nodes; non-leaf nodes store only index keys and pointers.

Assume a non-leaf page is 16 KB, each index key (a BIGINT primary key) is 8 bytes, and each pointer is 8 bytes. Each page can then hold about 1000 entries: 16 KB / (8 B + 8 B) = 1024.

How many index entries can a three-level B+ tree hold? As shown below:

Roughly 1 billion (about 1000 × 1000 × 1000). A B+ tree is usually 2 to 4 levels high. While MySQL is running, the root node stays resident in memory, so each lookup needs only about 2 to 3 I/O operations. The B+ tree is, in effect, designed around the characteristics of mechanical disks.
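The arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch under the same simplifications as the text: page headers are ignored, and every level (leaves included) is assumed to hold the same number of entries. The function names are made up for this example.

```python
PAGE_SIZE = 16 * 1024   # default page size in bytes
KEY_SIZE = 8            # BIGINT primary key, in bytes
POINTER_SIZE = 8        # pointer size assumed in the text, in bytes


def entries_per_page(page_size: int = PAGE_SIZE,
                     key_size: int = KEY_SIZE,
                     pointer_size: int = POINTER_SIZE) -> int:
    """Number of (key, pointer) pairs that fit in one page."""
    return page_size // (key_size + pointer_size)


def capacity(levels: int, fanout: int) -> int:
    """Entries reachable through `levels` levels if every page holds `fanout` entries."""
    return fanout ** levels


fanout = entries_per_page()
for levels in (1, 2, 3):
    print(levels, capacity(levels, fanout))
# 1 1024
# 2 1048576
# 3 1073741824
```

In a real table the leaf pages hold full rows rather than 16-byte entries, so the number of rows a three-level tree can hold is smaller than this upper bound and depends on the row size.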

Knowing how the index is designed, we can draw some further conclusions:

  1. A MySQL primary key should not be too large. A UUID key takes more space in every non-leaf node of the B+ tree, so fewer entries fit in each page.
  2. A MySQL primary key should preferably be auto-incrementing. UUID values arrive in random order, so inserts land in the middle of the B+ tree, which causes frequent page splits and seriously affects performance.
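The second point can be illustrated with a simple experiment. The sketch below does not simulate real pages or splits; it only counts how often a new key has to be placed in the middle of the already sorted keys instead of being appended at the end, which is the situation that leads to page splits. The 128-bit random integers stand in for UUIDs, and `count_mid_inserts` is a name invented here.

```python
import bisect
import random


def count_mid_inserts(keys):
    """Insert keys one by one into a sorted list; count inserts not at the end."""
    ordered = []
    mid_inserts = 0
    for key in keys:
        pos = bisect.bisect_left(ordered, key)
        if pos != len(ordered):
            mid_inserts += 1      # the key lands between existing entries
        ordered.insert(pos, key)
    return mid_inserts


n = 1000
sequential_keys = range(1, n + 1)                       # auto-increment style
rng = random.Random(42)                                 # fixed seed, repeatable run
random_keys = [rng.getrandbits(128) for _ in range(n)]  # UUID-like random keys

print("sequential:", count_mid_inserts(sequential_keys))   # sequential: 0
print("random:    ", count_mid_inserts(random_keys))
```

Sequential keys are always appended, so existing entries never move. With random keys almost every insert lands somewhere in the middle.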

So what should we do when the project shards data across multiple databases and tables, where we usually need a key of our own for sharding? In practice, we can keep the auto-increment primary key and put a unique index on the logical key.

2. Locking mechanism

MySQL locking comes with a flood of concepts. In fact, locks can be classified along several dimensions; let's go through them.

1. Type dimension

  • Shared lock (read lock / S lock)

  • Exclusive lock (write lock / X lock)

    Subtypes:

    • Intention shared lock (IS)
    • Intention exclusive lock (IX)

  • Pessimistic locking (takes real locks, e.g. for update)

  • Optimistic locking (uses a version number column, similar to the CAS mechanism, and is controlled by the application. Drawback: under high concurrency there are many wasted retries)
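Optimistic locking with a version number can be sketched in plain Python. This is an in-memory toy, not database code: `VersionedRow` and `add_with_retry` are names invented here, and the mutex only stands in for the fact that a single UPDATE statement is atomic.

```python
import threading


class VersionedRow:
    """A single row with a version column."""

    def __init__(self, balance: int):
        self.balance = balance
        self.version = 0
        self._mutex = threading.Lock()   # stands in for the atomicity of one statement

    def read(self):
        with self._mutex:
            return self.balance, self.version

    def update_if_version(self, new_balance: int, expected_version: int) -> bool:
        """Mimics: UPDATE ... SET balance = ?, version = version + 1 WHERE version = ?"""
        with self._mutex:
            if self.version != expected_version:
                return False             # someone else changed the row: 0 rows affected
            self.balance = new_balance
            self.version += 1
            return True


def add_with_retry(row: VersionedRow, amount: int) -> int:
    """Read, compute, write back; retry until the version check passes."""
    retries = 0
    while True:
        balance, version = row.read()
        if row.update_if_version(balance + amount, version):
            return retries
        retries += 1


row = VersionedRow(balance=100)
threads = [threading.Thread(target=add_with_retry, args=(row, 10)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(row.balance, row.version)   # 180 8
```

No update is lost, because a writer that read a stale version is rejected and has to start over. The retry loop is also where the drawback shows up: the more writers compete for the same row, the more attempts are thrown away.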

2. Granularity dimension

  • Table lock
  • Page lock (MySQL BerkeleyDB engine)
  • Row lock (InnoDB)

3. Algorithm dimension

  • Record Lock (locks a single row record)
  • Gap Lock (locks the range between records, but not the records themselves)
  • Next-Key Lock (Record Lock + Gap Lock: locks a range and also the record itself; this is the lock MySQL uses to prevent phantom reads)
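To make the ranges concrete, the sketch below computes the next-key interval that a given key falls into, for a hypothetical index that currently holds the values 10, 20 and 30. A next-key lock covers the gap before a record plus the record itself, that is, an interval of the form (low, high]. The function name is invented here, and which intervals a real statement actually locks also depends on the statement, the index, and the isolation level.

```python
import bisect
from typing import List, Optional, Tuple


def next_key_range(index_values: List[int], key: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (low, high) for the next-key interval (low, high] that contains key.

    None as low means "before the first record"; None as high means
    "after the last record" (the open-ended gap at the end of the index).
    """
    pos = bisect.bisect_left(index_values, key)
    low = index_values[pos - 1] if pos > 0 else None
    high = index_values[pos] if pos < len(index_values) else None
    return low, high


values = [10, 20, 30]                 # existing values in a hypothetical index
print(next_key_range(values, 15))     # (10, 20)
print(next_key_range(values, 20))     # (10, 20)
print(next_key_range(values, 35))     # (30, None)
```

A transaction holding the lock on (10, 20] blocks another transaction from inserting 15, which is how a range read stays free of phantom rows.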

4. Do ordinary reads take locks by default?

  • No. By default, reads use the MVCC mechanism ("consistent non-locking read"), which guarantees correct isolation at the RR level without taking locks.

You can also lock explicitly: select xxxx for update (exclusive lock) or select xxxx lock in share mode (shared lock). This is called a "locking read".

With these locks, phantom reads are avoided at the RR level. The default MVCC read also avoids phantom reads.

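Now that RR can prevent phantom reads, what is SERIALIZABLE for?

It prevents lost updates. For example, two transactions read the same row, each computes a new value from what it read, and the later write silently overwrites the earlier one.

In this situation we need the SERIALIZABLE level, which serializes the reads.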

Finally, row locks work by locking index records. If a query cannot use an index, InnoDB has to scan the whole table and lock every row it scans, which in effect locks the entire table.

3. Transactions

Transactions are a perennial database topic. ACID stands for atomicity, consistency, isolation, and durability.

Of the four properties, the most important is consistency, and consistency is guaranteed by atomicity, isolation, and durability.

  • Atomicity is guaranteed by the undo log. The undo log saves the record as it was before each change, so that the change can be rolled back when an error occurs.
  • Isolation is guaranteed by MVCC and locks, discussed later.
  • Durability is guaranteed by the redo log. Before data is actually modified, the change is written to the redo log; only after the redo log write succeeds is the change written to the B+ tree. If power is lost before the modified pages reach disk, committed changes can be recovered from the redo log.
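The role of the undo log in atomicity can be sketched with an in-memory table. This is a toy model: `ToyTransaction` is a name invented here, the table is a plain dict, and nothing is written to disk.

```python
_MISSING = object()   # marker for "the key did not exist before the change"


class ToyTransaction:
    """Applies changes to a dict-based table and can undo all of them."""

    def __init__(self, table: dict):
        self.table = table
        self.undo_log = []                # list of (key, previous value) entries

    def update(self, key, value):
        # Save the before-image first, then change the data.
        self.undo_log.append((key, self.table.get(key, _MISSING)))
        self.table[key] = value

    def rollback(self):
        # Replay the undo log backwards to restore every before-image.
        while self.undo_log:
            key, old = self.undo_log.pop()
            if old is _MISSING:
                del self.table[key]
            else:
                self.table[key] = old

    def commit(self):
        self.undo_log.clear()             # in this toy, before-images are simply dropped


accounts = {"alice": 100, "bob": 50}
tx = ToyTransaction(accounts)
tx.update("alice", 70)
tx.update("bob", 80)
tx.update("carol", 5)
tx.rollback()
print(accounts)   # {'alice': 100, 'bob': 50}
```

Either all three changes stay (commit) or none of them does (rollback), which is what atomicity means. The undo entries are replayed in reverse order so that a key changed twice ends up with its oldest value.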

Now for isolation.

Isolation levels:

  1. Read uncommitted (RU)
  2. Read committed (RC)
  3. Repeatable read (RR)
  4. Serializable (SERIALIZABLE)

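Each level solves different problems, usually three of them: dirty reads, non-repeatable reads, and phantom reads. The classic summary, according to the SQL standard:

  • RU: dirty reads, non-repeatable reads, and phantom reads are all possible.
  • RC: no dirty reads; non-repeatable reads and phantom reads are possible.
  • RR: no dirty reads or non-repeatable reads; phantom reads are possible.
  • SERIALIZABLE: none of the three can occur.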

One caveat about phantom reads: in the SQL standard, phantom reads can occur at the RR level, but because of MySQL's optimizations, they do not occur at MySQL's RR level. With a plain select, MySQL uses the MVCC mechanism to prevent phantom reads. With a locking read, such as for update (X lock) or lock in share mode (S lock), MySQL uses Next-Key Locks to prevent them. The former is called a snapshot read, and the latter a current read.

How it works:

  • Why RU allows dirty reads: under RU, a read takes no lock and does not use a Read View; it simply reads the latest version of the row, even if the transaction that wrote it has not committed yet. RC and RR read through a Read View, so they only see committed data.
  • Why RC allows non-repeatable reads: under RC, every SQL statement generates a new Read View, so successive reads can return different results. An RR transaction uses the same Read View from beginning to end.
  • Why RR does not suffer from phantom reads: as explained above.

What is the difference between RR and SERIALIZABLE? Answer: lost updates, as already covered in the locking section of this article.

Introduction to MVCC: the full name is multi-version concurrency control.

Each InnoDB clustered index record has four hidden fields: the row ID (DB_ROW_ID, which serves as the primary key when none is defined), the ID of the transaction that last changed the row (DB_TRX_ID, the core of MVCC), the rollback pointer into the undo log (DB_ROLL_PTR, the core of isolation), and a delete mark (a deleted row is not removed immediately; it is marked and then purged asynchronously).

In essence, MVCC is implemented with a linked list of undo log records.

How MVCC is implemented: a transaction modifies the original row under an exclusive lock, stores the pre-modification data in the undo log, and links it to the row through the rollback pointer. If the modification succeeds, nothing more is done. If it fails, the data is restored from the undo log.
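The version chain and the Read View fit together as in the sketch below. It is a simplified model of the visibility rule, not InnoDB's actual data structures: `Version`, `ReadView`, and `read` are names invented for this example, and transaction IDs are plain integers.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Version:
    value: str
    trx_id: int                       # transaction that wrote this version
    prev: Optional["Version"] = None  # rollback pointer to the older version


@dataclass(frozen=True)
class ReadView:
    creator_trx_id: int
    active_trx_ids: FrozenSet[int]    # transactions uncommitted when the view was created
    next_trx_id: int                  # first ID not yet assigned when the view was created

    def can_see(self, trx_id: int) -> bool:
        if trx_id == self.creator_trx_id:
            return True               # a transaction always sees its own changes
        if trx_id >= self.next_trx_id:
            return False              # written by a transaction that started later
        return trx_id not in self.active_trx_ids


def read(latest: Version, view: ReadView) -> Optional[str]:
    """Walk the version chain until a version visible to the view is found."""
    version = latest
    while version is not None:
        if view.can_see(version.trx_id):
            return version.value
        version = version.prev
    return None


# A row written by committed transaction 5, then overwritten by transaction 8.
v1 = Version("old", trx_id=5)
v2 = Version("new", trx_id=8, prev=v1)

# View created while transaction 8 is still running.
view_before_commit = ReadView(creator_trx_id=9, active_trx_ids=frozenset({8}), next_trx_id=10)
# View created after transaction 8 has committed.
view_after_commit = ReadView(creator_trx_id=9, active_trx_ids=frozenset(), next_trx_id=10)

print(read(v2, view_before_commit))   # old
print(read(v2, view_after_commit))    # new
```

An RR transaction keeps using `view_before_commit` for all of its reads and therefore keeps seeing "old". An RC transaction builds a fresh view for each statement, so after transaction 8 commits it sees "new", which is exactly a non-repeatable read.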

One more remark: we often think of MVCC as a form of optimistic locking based on version numbers, but InnoDB does not actually implement it that way. Of course, this does not affect how we use MySQL.


Origin juejin.im/post/5e96b3e3518825737f1a7415