Table organization and index organization in SQL Server (clustered index structure, non-clustered index structure, heap structure)

SQL Server organizes the data and index pages within its partitions in one of three ways:

1. Clustered index structure

A clustered index is organized as a B-tree, and each page in the B-tree is called an index node. Each index row contains a key value and a pointer. The pointer points either to a page at the next level down in the B-tree (for example, an index row in the root node points to an index page at an intermediate level) or to a data page at the leaf level (for example, an index row in an intermediate-level index page points to a leaf-level data page). The pages at each level of the index are linked in a doubly linked list. The pages in the data chain, and the rows within them, are ordered by the clustered index key value, so the clustered index ensures that the data in the table is stored in index-key order.
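To make the structure concrete, here is a minimal Python sketch of a clustered index as a tiny in-memory B-tree. It is a toy model under simplifying assumptions, not SQL Server's on-disk format: a "page" holds at most two rows or two index rows instead of filling 8 KB, and the names Page, build_clustered and seek are made up for illustration. Data (leaf) pages hold the full rows in key order, index pages hold (lowest key on child page, pointer to child page) rows, and pages at each level are doubly linked.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Page:
    # Data (leaf) pages hold (clustered key, full row) pairs.
    # Index pages hold (lowest key on the child page, pointer to the child page).
    entries: list[tuple[Any, Any]]
    is_leaf: bool
    # Pages at the same level form a doubly linked list.
    prev: Optional[Page] = field(default=None, repr=False, compare=False)
    next: Optional[Page] = field(default=None, repr=False, compare=False)


def link(level: list[Page]) -> None:
    """Chain the pages of one level together in key order."""
    for left, right in zip(level, level[1:]):
        left.next, right.prev = right, left


def build_clustered(rows: list[dict], key: str,
                    rows_per_page: int = 2, fanout: int = 2) -> Page:
    """Sort rows by the clustered key, pack them into data pages, add index levels."""
    ordered = sorted(((r[key], r) for r in rows), key=lambda kr: kr[0])
    level = [Page(ordered[i:i + rows_per_page], True)
             for i in range(0, len(ordered), rows_per_page)]
    link(level)
    while True:
        # One index row per page below: (lowest key on that page, pointer to it).
        entries = [(p.entries[0][0], p) for p in level]
        if len(entries) <= fanout:
            return Page(entries, False)  # root page
        level = [Page(entries[i:i + fanout], False)
                 for i in range(0, len(entries), fanout)]
        link(level)


def seek(root: Page, k: Any) -> Optional[dict]:
    """Follow index rows from the root down to the data page that could hold k."""
    page = root
    while not page.is_leaf:
        lows = [low for low, _ in page.entries]
        page = page.entries[max(bisect_right(lows, k) - 1, 0)][1]
    for row_key, row in page.entries:
        if row_key == k:
            return row
    return None


rows = [{"id": i, "name": n}
        for i, n in [(5, "e"), (1, "a"), (3, "c"), (2, "b"), (4, "d")]]
root = build_clustered(rows, "id")
print(seek(root, 4))  # {'id': 4, 'name': 'd'}
```

Built from these five rows, the tree has a root, one intermediate level and three data pages, so `seek(root, 4)` reads exactly three pages to reach the row with id 4 and never touches the other data pages.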
Supplement (PS: 2012-7-9)
As can be seen from the figure above, the leaf nodes of the clustered index are data pages, and all of the table's data is contained in the leaf nodes of the clustered index. This is why the previous blog post said, "If the index is a clustered index, then an index scan is really a table scan."
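The point that a clustered index scan is really a table scan can be shown with an even smaller sketch. It assumes the leaf pages are already built and linked, and models only the forward pointer of the leaf-level chain; DataPage and clustered_index_scan are hypothetical names used for illustration.

```python
from typing import Iterator, List, Optional


class DataPage:
    """A leaf-level page of a clustered index: it stores the data rows themselves."""

    def __init__(self, rows: List[dict], next_page: Optional["DataPage"] = None):
        self.rows = rows          # already in clustered-key order
        self.next = next_page     # forward pointer of the leaf-level chain


def clustered_index_scan(first_leaf: Optional[DataPage]) -> Iterator[dict]:
    """Walk the leaf level from left to right, yielding every row in key order.

    Because the leaf pages are the table's data pages, this visits every row,
    which is exactly what a table scan does.
    """
    page = first_leaf
    while page is not None:
        yield from page.rows
        page = page.next


last = DataPage([{"id": 3}, {"id": 4}])
first = DataPage([{"id": 1}, {"id": 2}], next_page=last)
print([row["id"] for row in clustered_index_scan(first)])  # [1, 2, 3, 4]
```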

Supplement (PS: 2012-7-13)
Today it suddenly became clear to me why a clustered index carries the actual data: the data itself is part of the index. The data rows are arranged according to a rule (the clustered index key), and that arrangement rule plus the data together form the clustered index.

For example:
The main text of a Chinese dictionary is itself a clustered index. For example, to look up the character "安", we naturally open the dictionary near the front, because the pinyin of "安" is "an", and a dictionary sorted by pinyin starts with the letter "a" and ends with "z", so "安" is naturally placed near the front. If you have read every entry beginning with "a" and still cannot find the character, it is not in your dictionary. Similarly, to look up "张", you turn to the last part of the dictionary, because its pinyin is "zhang". In other words, the main text of the dictionary is itself a directory, and you do not need to consult any other directory to find what you are looking for. When the content itself is arranged according to a rule and serves as its own directory, it is called a "clustered index".
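The dictionary analogy corresponds to a binary search over sorted entries. In this sketch, ENTRIES (a hypothetical four-character dictionary) plays the role of the dictionary's main text ordered by pinyin, and the hypothetical function look_up jumps straight to where a pinyin would sit instead of reading from the first page, concluding the character is absent as soon as that position holds something else.

```python
from bisect import bisect_left
from typing import Optional

# Hypothetical four-entry dictionary body, kept in pinyin order like a real one.
ENTRIES = sorted([("an", "安"), ("ba", "八"), ("zhang", "张"), ("ma", "马")])


def look_up(pinyin: str) -> Optional[str]:
    """Open the 'book' straight at the right spot instead of reading from page one."""
    i = bisect_left(ENTRIES, (pinyin,))  # first entry whose pinyin is >= the target
    if i < len(ENTRIES) and ENTRIES[i][0] == pinyin:
        return ENTRIES[i][1]
    # Everything from position i onward sorts after the target, so it is not here.
    return None


print(look_up("zhang"))  # 张
print(look_up("ce"))     # None
```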

2. Heap structure

A heap is a table without a clustered index. The pages of a heap are tracked by "Index Allocation Map (IAM)" pages, as shown in the figure below.
The data pages and rows in a heap are not in any particular order, and the pages are not linked together. The only logical connection between the data pages is the information recorded in the IAM pages; SQL Server uses the IAM pages to find each page in the heap's collection of data pages. From a data storage management perspective, a heap is difficult to manage for an extremely large table, so frequently used tables should generally have a clustered index.
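A rough sketch of why a heap has to be read through its IAM: a dict of page ids stands in for the data file, and a plain set of page ids stands in for the IAM. Both are simplifying assumptions (a real IAM page is a bitmap over extents), the page contents are hypothetical, and heap_scan and find are illustrative names.

```python
from typing import Callable, Dict, Iterator, List, Set

# Hypothetical data file: page id -> rows on that page. Pages of different tables
# are interleaved, and no page holds a pointer to the "next" heap page.
PAGES: Dict[int, List[dict]] = {
    10: [{"id": 7}],
    11: [{"sku": "x-1"}],          # belongs to some other table
    12: [{"id": 2}, {"id": 9}],
}

# Simplified IAM: the set of page ids allocated to this heap.
# (A real IAM page is a bitmap over extents, not a set of page ids.)
IAM: Set[int] = {12, 10}


def heap_scan(iam: Set[int], pages: Dict[int, List[dict]]) -> Iterator[dict]:
    """The IAM is the only way to know which pages belong to the heap."""
    for page_id in sorted(iam):   # file order, which says nothing about key order
        yield from pages[page_id]


def find(iam: Set[int], pages: Dict[int, List[dict]],
         pred: Callable[[dict], bool]) -> List[dict]:
    # With no ordering to exploit, every lookup has to read every page of the heap.
    return [row for row in heap_scan(iam, pages) if pred(row)]


print([row["id"] for row in heap_scan(IAM, PAGES)])   # [7, 2, 9]
print(find(IAM, PAGES, lambda row: row["id"] == 9))   # [{'id': 9}]
```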

By default, when you create a primary key on a table that does not already have a clustered index (and you do not specify NONCLUSTERED), SQL Server makes the primary key a clustered index. This means your data is physically sorted by ID in the database, which by itself does not accomplish much. The advantages of a clustered index are significant, and each table can have only one clustered index, which makes it all the more valuable. Because ID values are rarely used as query criteria, using an ID primary key as the clustered index can waste this resource.

3. Non-clustered index structure

Non-clustered indexes have the same B-tree structure as clustered indexes, but they differ in three significant ways:

Nonclustered indexes do not affect the order of the data rows.
The data rows of the base table are not sorted or stored in nonclustered key order.
The leaf level of a nonclustered index consists of index pages rather than data pages, so a nonclustered index does not change or improve how the data pages are stored.

A nonclustered index can be defined on a table or view that has a clustered index, or on a heap. Each index row in a nonclustered index contains a nonclustered key value and a row locator, which points to the data row in the clustered index or heap that contains that key value.
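These differences can be seen in a small Python sketch, assuming a heap-organized base table where a row's list position serves as its row locator (a stand-in for a real RID or clustered key). build_nonclustered and seek are illustrative names, not SQL Server APIs. Building the index produces a separate sorted list of (key value, row locator) pairs and leaves the base table's row order untouched.

```python
from bisect import bisect_left
from typing import Any, List, Tuple

# Hypothetical heap-organized base table; a row's list position stands in for its
# row locator.
TABLE: List[dict] = [
    {"id": 3, "city": "Oslo"},
    {"id": 1, "city": "Lima"},
    {"id": 2, "city": "Kyiv"},
]


def build_nonclustered(table: List[dict], column: str) -> List[Tuple[Any, int]]:
    """Leaf level of a nonclustered index: sorted (key value, row locator) pairs.

    The index is a separate structure; the base table is neither reordered nor copied.
    """
    return sorted((row[column], locator) for locator, row in enumerate(table))


def seek(index: List[Tuple[Any, int]], table: List[dict], value: Any) -> List[dict]:
    """Find matching index rows, then follow each row locator to the data row."""
    i = bisect_left(index, (value,))
    matches = []
    while i < len(index) and index[i][0] == value:
        matches.append(table[index[i][1]])
        i += 1
    return matches


city_index = build_nonclustered(TABLE, "city")
print(city_index)                        # [('Kyiv', 2), ('Lima', 1), ('Oslo', 0)]
print(seek(city_index, TABLE, "Lima"))   # [{'id': 1, 'city': 'Lima'}]
print([row["id"] for row in TABLE])      # [3, 1, 2]  (data order unchanged)
```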

The row locator in a nonclustered index row is either a pointer to the row or the row's clustered index key, depending on the following conditions:

If the table is a heap (that is, it has no clustered index), the row locator is a pointer to the row. The pointer is built from the file identifier (ID), the page number, and the number of the row (slot) on the page. The entire pointer is called a row ID (RID).
If the table has a clustered index, or the index is on an indexed view that has a clustered index, the row locator is the row's clustered index key. If the clustered index is not a unique index, SQL Server adds an internally generated value (called a uniquifier) to make all duplicate keys unique. This four-byte value is not visible to users, and it is added only when it is needed to make the clustered key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.
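The uniquifier idea can be sketched as follows, assuming duplicates are numbered in insertion order and treating the first occurrence of a key as needing no extra value (shown as 0). How SQL Server actually numbers and stores the uniquifier internally is not modeled here; add_uniquifiers is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def add_uniquifiers(keys: List[Hashable]) -> List[Tuple[Hashable, int]]:
    """Pair each clustered key with a per-key counter so duplicates become distinct.

    In this sketch the first occurrence of a key gets 0, meaning "no extra value
    needed"; later duplicates get 1, 2, ... in insertion order.
    """
    seen: Dict[Hashable, int] = defaultdict(int)
    result = []
    for key in keys:
        result.append((key, seen[key]))
        seen[key] += 1
    return result


rows = add_uniquifiers(["Smith", "Lee", "Smith", "Smith"])
print(rows)  # [('Smith', 0), ('Lee', 0), ('Smith', 1), ('Smith', 2)]
```

Every (key, uniquifier) pair in the result is distinct, which is what lets a non-unique clustered key still serve as the row locator in a nonclustered index.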

Supplement (PS: 2012-7-9)
As can be seen from the figure above, the leaf nodes of a non-clustered index are index pages. Each index row in those pages has the form "index key value + pointer". The index key value is a column of the table, or several columns for a composite index. What the pointer refers to depends on how the table is organized: if the table already has a clustered index, the pointer refers to the clustered index; if the table has no clustered index, the table is an unordered heap and the pointer gives the location of each record in the table. Either way, index rows and data rows correspond one to one. If every column in the SELECT list and every column in the WHERE condition of a query is in the index, the index covers the query. In that case there is no need to follow the index row's pointer to the data page; the contents of the index pages can be returned directly.
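Index coverage can be sketched as a check on column sets followed by a pass over the index rows. In this toy model, run_query, the column names and the table rows are hypothetical, conditions are simple equality tests, and the function also counts how many times it had to follow a pointer to the base table.

```python
from typing import Any, Dict, List, Sequence, Tuple


def run_query(index_cols: Sequence[str],
              index_rows: List[Tuple[tuple, Any]],
              table: Dict[Any, dict],
              select_cols: Sequence[str],
              where: Dict[str, Any]) -> Tuple[List[dict], int]:
    """Answer an equality query through one nonclustered index.

    index_rows: (index key values, row locator) pairs; table: locator -> data row.
    Returns the result rows and how many pointers had to be followed to the table.
    """
    covered = (set(select_cols) | set(where)) <= set(index_cols)
    results, lookups = [], 0
    for key_values, locator in index_rows:
        entry = dict(zip(index_cols, key_values))
        # Filter first on any condition columns the index row already holds.
        if any(col in entry and entry[col] != val for col, val in where.items()):
            continue
        if covered:
            row = entry               # index coverage: the index row is enough
        else:
            row = table[locator]      # follow the pointer to the data page
            lookups += 1
            if any(row[col] != val for col, val in where.items()):
                continue
        results.append({col: row[col] for col in select_cols})
    return results, lookups


# Hypothetical composite index on (city, name) over a three-row table.
TABLE = {
    0: {"city": "Oslo", "name": "Ada", "phone": "111"},
    1: {"city": "Lima", "name": "Bo", "phone": "222"},
    2: {"city": "Oslo", "name": "Cy", "phone": "333"},
}
INDEX = sorted(((r["city"], r["name"]), loc) for loc, r in TABLE.items())

print(run_query(["city", "name"], INDEX, TABLE, ["name"], {"city": "Oslo"}))
# ([{'name': 'Ada'}, {'name': 'Cy'}], 0)   covered: no trips to the table
print(run_query(["city", "name"], INDEX, TABLE, ["phone"], {"city": "Oslo"}))
# ([{'phone': '111'}, {'phone': '333'}], 2)   not covered: one lookup per match
```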


The difference between clustered and non-clustered indexes

One of the main ways to tell a clustered index from a non-clustered index is to look at the leaf nodes: if the leaf nodes hold the actual data, it is a clustered index; if the leaf nodes hold pointers, it is a non-clustered index.

If a non-clustered index is created on a table that has a clustered index, the leaf nodes of the non-clustered index point to the clustered index. If the table has no clustered index, they point to the row ID (RID) of the row on its data page; such a table is unordered and is called a heap table.

An example illustrating the relationship between clustered and non-clustered indexes (ps: 2012-7-17)
As shown in the figure below, we need to find the Last Name of the person whose First Name is Anson. A non-clustered index has been created on the First Name column, and a clustered index on the employeeID column. The query first searches the non-clustered index for Anson and finds, in its leaf node, the clustered index key value 7. It then uses this key value 7 to search the clustered index. Because the actual data is stored in the leaf nodes of the clustered index, the leaf node reached there shows that Anson's Last Name is Kim.
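The two-step lookup in this example can be sketched in Python. Only the Anson / 7 / Kim row comes from the example; the other two rows are made-up filler, the column names FirstName and LastName are assumptions, and both index levels are reduced to sorted lists searched with `bisect` rather than real B-trees.

```python
from bisect import bisect_left
from typing import Optional

# Leaf level of the clustered index on employeeID: full rows in employeeID order.
# Only the Anson / 7 / Kim row comes from the example; the other rows are made up.
CLUSTERED_LEAF = [
    (3, {"employeeID": 3, "FirstName": "Dana", "LastName": "Ortiz"}),
    (7, {"employeeID": 7, "FirstName": "Anson", "LastName": "Kim"}),
    (9, {"employeeID": 9, "FirstName": "Bea", "LastName": "Ruiz"}),
]

# Leaf level of the non-clustered index on FirstName: (FirstName, clustered key) pairs.
NONCLUSTERED_LEAF = sorted((row["FirstName"], key) for key, row in CLUSTERED_LEAF)


def last_name_for(first_name: str) -> Optional[str]:
    # Step 1: seek the non-clustered index for the first name.
    i = bisect_left(NONCLUSTERED_LEAF, (first_name,))
    if i == len(NONCLUSTERED_LEAF) or NONCLUSTERED_LEAF[i][0] != first_name:
        return None
    clustered_key = NONCLUSTERED_LEAF[i][1]  # 7 for Anson
    # Step 2: seek the clustered index with that key; its leaf holds the real row.
    j = bisect_left(CLUSTERED_LEAF, (clustered_key,))
    return CLUSTERED_LEAF[j][1]["LastName"]


print(last_name_for("Anson"))  # Kim
```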

This is what we mentioned earlier: the leaf nodes of a clustered index hold the actual data, while the leaf nodes of a non-clustered index hold a bookmark. The bookmark takes one of two forms. If the table has a clustered index, the bookmark is the clustered index key value. (We often say it "points to the clustered index"; more precisely, it is the clustered index key value, which is then used to find the required data row directly in the clustered index.) If the table has no clustered index, the bookmark is a row identifier (RID), in the format "File#:Page#:Slot#".
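The two kinds of bookmark can be modeled as follows. This is a sketch under stated assumptions: a dict keyed by (file, page) stands in for heap data pages, a dict keyed by clustered key stands in for a seek into the clustered index, and RID and bookmark_lookup are hypothetical names.

```python
from typing import Any, Dict, List, NamedTuple, Optional, Tuple


class RID(NamedTuple):
    """Row identifier used as the bookmark when the table is a heap."""
    file: int
    page: int
    slot: int

    def __str__(self) -> str:
        return f"{self.file}:{self.page}:{self.slot}"  # File#:Page#:Slot#


def bookmark_lookup(bookmark: Any,
                    heap_pages: Optional[Dict[Tuple[int, int], List[dict]]] = None,
                    clustered: Optional[Dict[Any, dict]] = None) -> dict:
    """Resolve a non-clustered index bookmark to the data row it refers to."""
    if isinstance(bookmark, RID):
        # Heap: the RID names the exact file, page and slot to read.
        return heap_pages[(bookmark.file, bookmark.page)][bookmark.slot]
    # Table with a clustered index: the bookmark is a clustered key value,
    # so the row is found by seeking the clustered index (a dict stands in here).
    return clustered[bookmark]


heap = {(1, 230): [{"id": 1}, {"id": 2}]}
print(str(RID(1, 230, 1)), bookmark_lookup(RID(1, 230, 1), heap_pages=heap))
# 1:230:1 {'id': 2}
print(bookmark_lookup(7, clustered={7: {"LastName": "Kim"}}))
# {'LastName': 'Kim'}
```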


Source: blog.csdn.net/kalvin_y_liu/article/details/118601729