MySQL~Why is it recommended to use an auto-increment id as the primary key

Page concept

In a computer, the operating system reads both memory and disk in units of pages (typically 4 KB). Each disk read also pre-reads: contiguous data beyond what was requested is loaded into memory ahead of time, which avoids several separate IOs. This is the well-known principle of locality. If a piece of data is used, the data near it is very likely to be used soon as well, so it is loaded together to avoid the cost of multiple slow IOs. How much contiguous data is read? Always an integer multiple of the operating system page size.

For the same reason, the default InnoDB page size in MySQL is 16 KB (four 4 KB operating system pages), and each node of the B+ tree is best sized to exactly one page (16 KB). That way, reading one node of the B+ tree costs only a single IO.

Someone may then ask: is a bigger page always better? A larger page lets each node hold more entries, so the tree is shorter and fewer IOs are needed. Not necessarily. InnoDB caches the pages it reads from disk in an in-memory buffer pool. If the page size is too large, the buffer pool fills up quickly, pages may be swapped between memory and disk more frequently, and performance suffers.
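A quick back-of-the-envelope check of that trade-off, as a Python sketch. It assumes the default `innodb_buffer_pool_size` of 128 MB and ignores the pool's own bookkeeping overhead; the helper name `pages_cached` is hypothetical. For a fixed pool, every doubling of the page size halves the number of pages that can be cached at once. (In MySQL, `innodb_page_size` accepts 4, 8, 16, 32 or 64 KB and can only be set when the instance is initialized.)

```python
POOL_BYTES = 128 * 1024 * 1024   # assumption: default innodb_buffer_pool_size (128 MB)

def pages_cached(page_size_kb: int, pool_bytes: int = POOL_BYTES) -> int:
    """Number of pages of the given size that fit in the buffer pool at once."""
    return pool_bytes // (page_size_kb * 1024)

for size_kb in (4, 8, 16, 32, 64):   # page sizes innodb_page_size accepts
    print(f"{size_kb:>2} KB pages -> {pages_cached(size_kb):>5} pages cached")
```

With 16 KB pages the pool holds 8192 pages; with 64 KB pages only 2048.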

From the analysis above, it is not hard to see how N should be chosen for the N-ary tree: pick it so that each node is exactly the size of one page (16 KB).

Now back to the question in the title: why is an auto-increment id recommended as the primary key? The first reason is that an auto-increment primary key is usually an INT, which takes only 4 bytes. The smaller the key, the more keys fit into one 16 KB node, so the fan-out N is large and InnoDB's B+ tree stays shallow; in practice it rarely needs more than about 4 levels.
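A rough Python estimate of that fan-out and the resulting tree height. The numbers are assumptions, not InnoDB internals: a 6-byte child-page pointer per non-leaf entry, no page-header overhead, and 16 rows per leaf page (about 1 KB per row). The names `fan_out` and `height_for` are hypothetical, and an 18-byte key stands in for an ID card number stored as an 18-byte string.

```python
PAGE_SIZE = 16 * 1024   # InnoDB default page size, in bytes
POINTER_SIZE = 6        # assumption: bytes per child-page pointer in a non-leaf entry

def fan_out(key_size: int) -> int:
    """Approximate (key, pointer) entries per non-leaf page, ignoring page headers."""
    return PAGE_SIZE // (key_size + POINTER_SIZE)

def height_for(rows: int, key_size: int, rows_per_leaf: int = 16) -> int:
    """Smallest tree height whose capacity covers `rows`; one page read per level."""
    height, capacity = 1, rows_per_leaf   # a single leaf page
    while capacity < rows:
        capacity *= fan_out(key_size)     # each extra level multiplies capacity by N
        height += 1
    return height

print(fan_out(4), fan_out(18))                                  # 1638 682
print(height_for(20_000_000, 4), height_for(20_000_000, 18))    # 3 4
```

Under these assumptions, 20 million rows fit in a 3-level tree with a 4-byte INT key but need 4 levels with the 18-byte key, i.e. one extra page read per lookup.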

The second reason comes from the following example.
Someone might say: a user's ID card number is unique, so why not use it as the primary key? What goes wrong if the ID card number is the primary key?

Page split and page merge

To keep the index ordered, the B+ tree must update it every time a record is inserted or updated. Suppose the original B+ tree is indexed by ID card number (for simplicity, assume a binary tree in which each node holds at most two keys, and only the first four digits of each ID number are shown).
Now a record whose ID card number starts with 3604 is inserted into the database, and the index must be updated. To keep the keys in order, 3604 has to be inserted right after 3504 in the left node.

Inserting 3604 after 3504 leaves that node with three keys, which violates the binary-tree condition (at most two keys per node). The page must therefore split: the node is divided and the tree adjusted so that it satisfies the condition again.
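To make the split concrete, here is a toy Python sketch. It is not InnoDB's code: the model keeps only a list of sorted leaf pages (no parent nodes), and the names `find_page`, `insert_key` and `count_splits` are hypothetical. A page that overflows is split in half, except when the new key goes past the end of the right-most page; then a fresh page is started and nothing is moved, which is similar in spirit to how InnoDB handles sequential inserts. Counting splits for ordered keys versus shuffled keys contrasts auto-increment ids with randomly ordered ID card numbers.

```python
import bisect
import random

def find_page(pages, key):
    """Index of the last page whose first key is <= key (page 0 for the smallest keys)."""
    idx = 0
    for i, page in enumerate(pages):
        if page and page[0] <= key:
            idx = i
    return idx

def insert_key(pages, key, capacity):
    """Insert key into the toy leaf level; return True if a page was split in half."""
    i = find_page(pages, key)
    page = pages[i]
    if len(page) < capacity:
        bisect.insort(page, key)                 # room left: no split needed
        return False
    if i == len(pages) - 1 and key > page[-1]:
        pages.append([key])                      # appending at the end: new page, move nothing
        return False
    bisect.insort(page, key)                     # page now holds capacity + 1 keys
    mid = len(page) // 2
    pages[i:i + 1] = [page[:mid], page[mid:]]    # split: half the keys move to a new page
    return True

def count_splits(keys, capacity=4):
    """Insert all keys into an empty toy index and return (splits, pages)."""
    pages = [[]]
    splits = 0
    for key in keys:
        if insert_key(pages, key, capacity):
            splits += 1
    return splits, pages

ids = list(range(1, 1001))            # auto-increment style: always increasing
shuffled = ids[:]
random.Random(42).shuffle(shuffled)   # random order, like ID card numbers

print("sequential splits:", count_splits(ids)[0])
print("random splits:", count_splits(shuffled)[0])
```

The sequential run reports zero splits, while the shuffled run splits pages over and over, even though both end up holding the same sorted keys.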
The adjustment caused by a page split inevitably costs performance, and it is especially bad when the ID card number is the primary key. Because ID card numbers arrive in random order, new keys land at random positions throughout the index, which causes a large number of page splits and a sharp drop in performance. With an auto-increment id as the primary key, every newly inserted id is larger than all values already in the index, so it is either appended to the last node (if that node is not yet full) or placed in a newly created node. Existing nodes never have to be split, so there is no page-splitting problem, which is why an auto-increment id is recommended.

Where there are page splits, there are also page merges. When table records are deleted, their index entries must be deleted as well, and a page merge may occur at that point. For example, when the rows with ids 7 and 9 are deleted, the index is updated: after 7 and 9 are removed, the remaining keys 8 and 10 should be combined into one node. Otherwise 8 and 10 would be scattered across two nodes, and reading them could take two IOs instead of one, which hurts lookup efficiency.

When should a page merge happen? We can set a threshold: for an N-ary tree, when a node holds fewer than N/2 elements, it is merged with a neighbouring node. Note that the merged node may end up with more than N elements, in which case it has to be split again and the parent node adjusted so that the N-ary tree condition still holds.
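A matching toy Python sketch of that merge rule, using the same simplified model (a list of sorted leaf pages, no parent nodes; `delete_key` is a hypothetical name). A page that falls below N/2 keys is combined with a neighbour; if the result would exceed N keys, it is split again, as described above. The page capacity of 4 keys in the usage lines is an assumption chosen so the N/2 rule triggers. For reference, InnoDB exposes a similar setting, `MERGE_THRESHOLD`, whose default is 50 percent of a page.

```python
def delete_key(pages, key, capacity):
    """Delete key from a toy list of sorted leaf pages and rebalance.

    Returns "none", "merged" or "redistributed" depending on what was needed.
    """
    for i, page in enumerate(pages):
        if key in page:
            break
    else:
        raise KeyError(key)
    pages[i].remove(key)
    # Still at least half full (or no neighbour to merge with): nothing to do.
    if 2 * len(pages[i]) >= capacity or len(pages) == 1:
        return "none"
    j = i + 1 if i + 1 < len(pages) else i - 1   # prefer the right-hand neighbour
    lo, hi = min(i, j), max(i, j)
    combined = pages[lo] + pages[hi]             # stays sorted: left keys < right keys
    if len(combined) <= capacity:
        pages[lo:hi + 1] = [combined]            # merge into a single page
        return "merged"
    mid = len(combined) // 2                     # too many keys for one page: split again
    pages[lo:hi + 1] = [combined[:mid], combined[mid:]]
    return "redistributed"

# Deleting 7 and then 9 leaves keys 8 and 10 on one page, as in the example above.
pages = [[7, 8], [9, 10]]
print(delete_key(pages, 7, capacity=4), pages)   # merged [[8, 9, 10]]
print(delete_key(pages, 9, capacity=4), pages)   # none [[8, 10]]
```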

In fact, in InnoDB's primary key index the non-leaf nodes store only index values, and complete row records are stored only in the leaf nodes (the last level). This greatly reduces the size of the index, and once the index value is found in a leaf, the row record has been found along with it, which also improves efficiency.

An index that stores entire row records in its leaf nodes is called a clustered index; all other indexes are called non-clustered indexes.
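A toy Python sketch of the clustered layout just described: the internal node holds only separator keys and child pointers, the leaves hold the full rows, and a lookup reads one node (one page) per level. The dict-based node structure, the sample rows and the name `search` are all hypothetical; real InnoDB pages are far more involved.

```python
import bisect

# Leaves store complete rows next to their keys (clustered index).
LEAF_A = {"keys": [1, 2], "rows": [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]}
LEAF_B = {"keys": [3, 4], "rows": [{"id": 3, "name": "carol"}, {"id": 4, "name": "dave"}]}
LEAF_C = {"keys": [5, 6], "rows": [{"id": 5, "name": "erin"}, {"id": 6, "name": "frank"}]}
# The internal (root) node stores only separator keys and child pointers, no row data.
ROOT = {"keys": [3, 5], "children": [LEAF_A, LEAF_B, LEAF_C]}

def search(node, key):
    """Return (row or None, nodes read); each node read stands for one page IO."""
    reads = 1
    while "children" in node:
        # child i covers keys in [keys[i-1], keys[i]); bisect_right selects it
        node = node["children"][bisect.bisect_right(node["keys"], key)]
        reads += 1
    i = bisect.bisect_left(node["keys"], key)
    if i < len(node["keys"]) and node["keys"][i] == key:
        return node["rows"][i], reads   # key found in the leaf: the whole row comes with it
    return None, reads

print(search(ROOT, 4))   # ({'id': 4, 'name': 'dave'}, 2)
```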


Origin blog.csdn.net/Shangxingya/article/details/114857636