SQL Server Index

SQL Server Index Analysis

https://www.cnblogs.com/michaeldonghan/p/index001.html

Contents:

1. Clustered index and non-clustered index
2. Structure of the index
3. Included columns and bookmark lookup
1. Clustered index and non-clustered index
Indexes are divided into clustered indexes and non-clustered indexes.
1) Clustered index: A table's data is stored in data pages (pages whose PageType is 1). A SQL Server page is 8 KB; when a page fills up, a new page is allocated for the following rows. If the table has a clustered index, rows are stored in ascending or descending order of the clustered index key. Updating a clustered key column, or inserting or deleting rows in the middle of the key range, forces existing rows to move so that this order is maintained.
Note that a primary key is created as a clustered index only by default: it can also be declared non-clustered, and a clustered index can be created on a column that is not the primary key. A table can have only one clustered index.
A good clustered index key generally has the following 4 characteristics:
(A). Ever-increasing
Rows are always added at the end, which reduces page splits and index fragmentation.
(B). Unchanging
This reduces data movement.
(C). Uniqueness
Uniqueness is the most desirable characteristic of any index: it pins down exactly where each key value sits in the sort order.
More importantly, if the index key is unique, each index record can point unambiguously to the RID of its source row. If the clustered key is not unique, SQL Server internally adds a uniquifier column and combines it with the key to form the clustering key, which guarantees that the "key value" is unique. If a non-clustered index key is not unique, SQL Server appends the RID column (the clustered index key, or the row pointer for a heap) to guarantee that the "key value" is unique.
A thought: the index "key value" is kept unique in the non-leaf levels as well. The reason is presumably to pin down where each index record belongs in those levels. For example, suppose there is a non-clustered index on Name2 and the table has many rows with Name2='a', so the non-leaf levels contain several index records with Name2='a'. When another row with Name2='a' is inserted, SQL Server can quickly decide which index record (node) to follow by comparing the RIDs stored in the non-leaf level with the RID of the new row. Without RIDs in the non-leaf levels, it would have to walk all the leaf entries with Name2='a' to find the position. Similarly, select * from Table1 where Name2<='a' returns rows sorted by Name2 and then by RID. This is easy to understand: when the query uses the Name2 index, rows come back in the order in which the index stores them. If the optimizer instead chooses to scan the table directly because of the "tipping point" problem, the rows come back in the table's storage order. (Without an ORDER BY clause SQL Server guarantees no particular order; this is only the typical behavior.)
As for "key value" uniqueness: in a clustered index, a uniquifier is added only to rows whose key value is a duplicate. In a non-clustered index that is not declared UNIQUE, the RID is added to every index record at every level, even if the indexed values happen to be unique. If the index is declared UNIQUE, the RID is stored only at the leaf level, where it is used to find the source row, that is, for the bookmark lookup.
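A toy Python model can make these rules concrete. It is an illustration only: it assumes row locators are plain integers and ignores the real on-page format, and the helper names (`add_uniquifiers`, `nonunique_leaf_keys`, `insert_position`) are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def add_uniquifiers(clustered_keys):
    """Pair each clustered key with a uniquifier.

    The first occurrence of a value gets 0 (standing in for "no uniquifier
    stored"); each later duplicate gets 1, 2, 3, ...
    """
    seen = defaultdict(int)
    result = []
    for key in clustered_keys:
        result.append((key, seen[key]))
        seen[key] += 1
    return result


def nonunique_leaf_keys(rows):
    """rows are (row_locator, name2) pairs; a non-unique index stores (name2, row_locator)."""
    return sorted((name2, rid) for rid, name2 in rows)


def insert_position(entries, name2, rid):
    """(name2, rid) is unique, so a new entry has exactly one slot, found by binary search
    instead of scanning every entry with the same name2."""
    return bisect_left(entries, (name2, rid))


print(add_uniquifiers(["a", "b", "a", "a"]))  # [('a', 0), ('b', 0), ('a', 1), ('a', 2)]
entries = nonunique_leaf_keys([(3, "a"), (1, "b"), (2, "a")])
print(entries)                                # [('a', 2), ('a', 3), ('b', 1)]
print(insert_position(entries, "a", 5))       # 2
```

The sorted output also mirrors the ordering described above: entries with the same Name2 come out ordered by their row locator.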
(D). Short key length
The shorter the clustered index key, the more index records fit in one index page, which reduces the depth of the index B-tree. For example, a table with a million rows and an int clustered key may need only a 3-level B-tree. If the clustered index is defined on a wider column (a uniqueidentifier takes 16 bytes), the index could grow to 4 levels, so every clustered index seek needs 4 I/O operations (4 logical reads, to be exact) instead of 3.
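The depth arithmetic can be sketched in Python. This is a rough estimate under stated assumptions: 8,096 usable bytes per 8 KB page, and an assumed 9 bytes of per-record overhead in non-leaf pages (child page pointer, row header, slot-array entry). Real overheads, fill factor, and row sizes will shift the numbers, and the leaf page count of 200,000 below is hypothetical.

```python
import math

PAGE_BYTES = 8096  # 8 KB page minus the 96-byte page header


def btree_levels(leaf_pages: int, key_bytes: int, overhead_bytes: int = 9) -> int:
    """Estimate the number of B-tree levels, counting the leaf level.

    overhead_bytes is an assumed per-record cost for a non-leaf row;
    actual values depend on the row format.
    """
    fanout = PAGE_BYTES // (key_bytes + overhead_bytes)  # child pointers per index page
    levels, pages = 1, leaf_pages
    while pages > 1:  # keep adding levels until a single root page remains
        pages = math.ceil(pages / fanout)
        levels += 1
    return levels


print(btree_levels(200_000, key_bytes=4))   # 3  (int key)
print(btree_levels(200_000, key_bytes=16))  # 4  (uniqueidentifier key)
```

Under these assumptions, widening the key from 4 to 16 bytes roughly halves the fanout (622 versus 323 child pointers per page). That is enough to push this tree from 3 to 4 levels.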
Similarly, every non-clustered index record contains the clustered index key. The shorter the clustered key, the smaller the non-clustered index records, and the more of them fit on one index page.
2) Non-clustered index: It is also stored in pages (pages with PageType 2 are index pages). For example, if table T has a non-clustered index Index_A and T holds 100 rows, then Index_A also holds 100 records. To be precise, these are 100 leaf-level records: because the index is a B-tree, once its height is greater than 0 there are also root or intermediate pages, so the index has more than 100 records in total. If T also has a non-clustered index Index_B, Index_B likewise holds at least 100 records. So the more indexes you build, the higher the cost.
Updating an indexed column, inserting a row, or deleting a row all require index maintenance, which costs some performance. How much depends on the situation. For example, with a clustered index, if new rows always go at the end, there is almost no data movement and the impact is small. If rows are inserted in the middle, data generally has to move, which may cause page splits and page fragmentation, so the impact is somewhat larger. (If the target page has enough free space for the new row and the row belongs at the end of that page, no data movement occurs.)
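The difference between appending and inserting in the middle can be sketched with a small Python simulation. The model is a simplification and an assumption, not SQL Server's actual page-split algorithm: pages hold a fixed number of keys, a full page splits 50/50, and a larger key arriving at a full last page simply starts a new page.

```python
from bisect import insort


def count_page_splits(keys, rows_per_page=4):
    """Insert keys into ordered, fixed-capacity pages and count page splits.

    Simplified model: a full last page receiving a larger key just gets a
    new page after it, while any other insert into a full page splits it.
    """
    pages = [[]]
    splits = 0
    for key in keys:
        # Target page: the first page whose largest key is >= key, else the last page.
        idx = next((i for i, p in enumerate(pages) if p and p[-1] >= key), len(pages) - 1)
        page = pages[idx]
        if len(page) < rows_per_page:
            insort(page, key)  # room left on the page: no split
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])  # ever-increasing key: start a fresh page at the end
        else:
            half = rows_per_page // 2
            left, right = page[:half], page[half:]
            pages[idx:idx + 1] = [left, right]  # split the full page into two
            insort(left if key < right[0] else right, key)
            splits += 1
    return splits


print(count_page_splits(range(1, 101)))      # 0: every insert lands at the end
print(count_page_splits(range(100, 0, -1)))  # 48: every insert lands in the first page
```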
2. Structure of the index
SQL Server indexes are said to be B-tree structures (this assumes you already have some familiarity with B-trees), but what do they actually look like? You can view the logical structure with the undocumented DBCC IND command:
DBCC IND(Test,OrderBo,-1) -- the OrderBo table in the Test database has 10,000 rows and a clustered primary key on Id
Results:

As the figure above shows, there is one index page, 2112, with IndexLevel=2. It is the root of the B-tree (there is exactly one root page, which is the entry point to the structure), so there must also be pages with IndexLevel=1 and IndexLevel=0. Because this is a clustered index, its leaf pages (IndexLevel=0) are the data pages and hold the actual rows: in the figure, the rows with IndexLevel=0 have PageType 1, which denotes a data page. (For a non-clustered index, the leaf pages at IndexLevel=0 have PageType 2, that is, they are still index pages.)

Similarly, we can take a look with the DBCC PAGE command:
DBCC TRACEON(3604,-1) -- send DBCC output to the client session
DBCC PAGE(Test,1,2112,3) -- database, file id, page id, print option; root page 2112 shows two child pages, 2280 and 2448
DBCC PAGE(Test,1,2280,3)
DBCC PAGE(Test,1,2448,3)

As the figure above shows, page 2112 (IndexLevel=2) has two children at IndexLevel=1, pages 2280 and 2448, and each of those has children of its own. Each page covers a different range of key values (see the "Id(key)" column; the value in the first row is NULL, which stands for the minimum value, or the maximum value for a descending index). So IndexLevel is really the level of a page in the B-tree, counted upward from the leaves.
When SQL Server searches an index for a record, it always goes from the root down to a leaf, because the addresses of all the data live at the leaf level. This is one of the characteristics of a B+ tree. (In a plain B-tree, a search can return as soon as the value is found in a non-leaf node; SQL Server clearly does not do this. To verify, run SET STATISTICS IO ON, run a SELECT, and check the number of logical reads.) Because every search must reach the leaf level, included columns only need to be stored there; the non-leaf levels do not store included columns.
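The root-to-leaf descent can be sketched with a toy in-memory B+ tree in Python. The page layout here is an assumption for illustration: a dict of pages, with each non-leaf row holding the lowest key of its child. "Pages read" simply counts pages visited, and none of this reflects how SQL Server stores pages. With 10,000 keys and a hypothetical 50 rows per page, the tree has three levels, like the OrderBo example. Every seek reads exactly three pages, even when the key also appears as a separator in a non-leaf page.

```python
from bisect import bisect_right


def build_index(keys, per_page):
    """Build a toy B+ tree bottom-up from sorted keys; returns (root page id, pages)."""
    pages = {}
    level = []  # (lowest key, page id) for every page on the level just built
    for start in range(0, len(keys), per_page):
        pid = len(pages)
        pages[pid] = {"level": 0, "keys": keys[start:start + per_page]}
        level.append((keys[start], pid))
    height = 0
    while len(level) > 1:  # add non-leaf levels until one root page remains
        height += 1
        parents = []
        for start in range(0, len(level), per_page):
            chunk = level[start:start + per_page]
            pid = len(pages)
            pages[pid] = {"level": height, "children": chunk}
            parents.append((chunk[0][0], pid))
        level = parents
    return level[0][1], pages


def seek(root, pages, key):
    """Descend from the root to a leaf; returns (found, pages read)."""
    pid, reads = root, 0
    while True:
        page = pages[pid]
        reads += 1
        if page["level"] == 0:  # only the leaf level answers the question
            return key in page["keys"], reads
        lows = [low for low, _ in page["children"]]
        # Rightmost child whose lowest key <= key; the first child also takes smaller keys.
        i = max(bisect_right(lows, key) - 1, 0)
        pid = page["children"][i][1]


root, pages = build_index(list(range(1, 10_001)), per_page=50)
print(pages[root]["level"])     # 2, like IndexLevel=2 on root page 2112
print(seek(root, pages, 5001))  # (True, 3)
print(seek(root, pages, 2501))  # (True, 3): 2501 is a separator in the root, yet the leaf is still read
print(seek(root, pages, 0))     # (False, 3)
```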
This B+ tree property (all data addresses are at the leaf level) also helps range queries such as BETWEEN value1 AND value2. Find value1 at the leaf level, then follow the links between leaf pages up to value2; the rows in between are the result.
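A range query over linked leaf pages can be sketched like this. For brevity, a binary search over each leaf's lowest key replaces the descent to the first leaf (a simplification). The `next` field stands in for the next-page pointer that links leaf pages, and the page size of 10 keys is hypothetical.

```python
from bisect import bisect_right


def build_leaf_chain(keys, per_page):
    """Split sorted keys into leaf pages, each linked to the next one."""
    leaves = [{"keys": keys[s:s + per_page], "next": None} for s in range(0, len(keys), per_page)]
    for i in range(len(leaves) - 1):
        leaves[i]["next"] = i + 1
    return leaves


def range_scan(leaves, low, high):
    """Return (keys in [low, high], leaf pages read): seek once, then follow links."""
    lows = [leaf["keys"][0] for leaf in leaves]
    pid = max(bisect_right(lows, low) - 1, 0)  # stand-in for the root-to-leaf seek to `low`
    result, pages_read = [], 0
    while pid is not None:
        leaf = leaves[pid]
        pages_read += 1
        for k in leaf["keys"]:
            if k > high:  # past the end of the range: stop without reading more pages
                return result, pages_read
            if k >= low:
                result.append(k)
        pid = leaf["next"]
    return result, pages_read


leaves = build_leaf_chain(list(range(1, 101)), per_page=10)
keys, reads = range_scan(leaves, 15, 32)
print(keys[0], keys[-1], len(keys), reads)  # 15 32 18 3
```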
The SQL Server index structure is closer to a B+ tree; in the end it is a hybrid of a B-tree and a B+ tree. Data structures are designed by people, so there is no need for them to be a pure B-tree or a pure B+ tree.
3. Included columns and bookmark lookup
Speaking of indexes, SQL Server 2005 added another very practical feature: included columns.
For example, when querying data for a large report, the WHERE clause filters on the indexed column Name2, but the column to be selected is Name1. In that case you can include Name1 in the index on Name2.
Syntax: CREATE [UNIQUE] NONCLUSTERED INDEX IndexName ON dbo.Table1(Name2) INCLUDE(Name1); (INCLUDE is allowed only on non-clustered indexes.)
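Whether a query can skip bookmark lookups comes down to set containment: every column the query needs must be available in the index rows. A non-clustered index row carries its key columns, its included columns, and the clustered key (the row locator). The helper below is a hypothetical sketch of that check, not a SQL Server API. For a heap, pass an empty clustered key.

```python
def is_covering(key_cols, include_cols, clustered_key_cols, needed_cols):
    """True if every column the query needs is present in the non-clustered index rows."""
    available = set(key_cols) | set(include_cols) | set(clustered_key_cols)
    return set(needed_cols) <= available


# Table1 with clustered key Id; the query filters on Name2 and selects Name1.
print(is_covering(["Name2"], [], ["Id"], ["Name2", "Name1"]))         # False: needs a bookmark lookup
print(is_covering(["Name2"], ["Name1"], ["Id"], ["Name2", "Name1"]))  # True: INCLUDE(Name1) covers it
print(is_covering(["Name2"], [], ["Id"], ["Name2", "Id"]))            # True: the clustered key rides along
```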
Again, use the DBCC PAGE command to view a non-clustered index whose index rows contain an included column:

As the figure above shows, the included column Name1 is also stored in the index rows. So when the database uses the index on Name2 to locate matching rows, it can return Name1 directly instead of following the RID to the data page, which avoids bookmark lookups. When a query returns only one row, there is only one bookmark lookup and the cost is negligible. When a query returns many rows, every row needs a trip to the data page to fetch its values: 1,000 rows mean 1,000 bookmark lookups, and performance drops sharply. This is where included columns really pay off.
A bookmark lookup works as follows. If the table has a clustered index (say on Id), the lookup is similar to executing select Name1 from Table1 where Id=1: it seeks through the B-tree of the clustered index on Id. If the table has no clustered index (a heap), the lookup uses the row pointer, which is made up of file number (2 bytes) : page number (4 bytes) : slot number (2 bytes). Clustered index keys and heap row pointers are often both loosely called RID (Row ID) pointers; Microsoft's documentation calls them row locators and reserves "RID" for the heap row pointer.
It follows that if your table has no good candidate for a clustered key, it is worth adding an auto-increment Id column as the clustered primary key (even a redundant one). Such a column is ever-increasing, unchanging, unique, and short, which makes it a good clustered key. An auto-increment Id works in most cases; special cases depend on the specific requirements. It does have one drawback to consider: when many sessions insert into the table concurrently, every one of them wants to write to the last page, which causes contention and waits.
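The cost argument can be put into rough numbers. The Python function below is a back-of-the-envelope model under assumptions: one descent of the non-clustered index, a read of its leaf pages, and, when the index does not cover the query, one full clustered-index seek per returned row. It ignores caching, read-ahead, and the optimizer's option to switch to a table scan at the tipping point, and the parameter values are hypothetical.

```python
import math


def lookup_reads(rows, nc_depth, nc_rows_per_leaf, clustered_depth, covering):
    """Rough logical-read estimate for a range seek on a non-clustered index."""
    leaf_pages = max(1, math.ceil(rows / nc_rows_per_leaf))
    reads = (nc_depth - 1) + leaf_pages  # non-leaf pages on the way down, then the leaf pages
    if not covering:
        reads += rows * clustered_depth  # one bookmark lookup (root-to-leaf walk) per row
    return reads


print(lookup_reads(1000, nc_depth=3, nc_rows_per_leaf=200, clustered_depth=3, covering=True))   # 7
print(lookup_reads(1000, nc_depth=3, nc_rows_per_leaf=200, clustered_depth=3, covering=False))  # 3007
```

In this model the lookups, not the index seek itself, dominate the cost once a query returns many rows.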
To address this, you can use a uniqueidentifier key (16 bytes, which I do not recommend) or hash partitioning, which splits one table into several tables (splitting databases and tables is common in big-data processing). But I recommend first optimizing your insert path (inserts are inherently very fast) and testing whether the number of concurrent inserts per second meets production needs. That way you can keep the simple, stable, and efficient auto-increment Id.
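Hash partitioning can be sketched as a routing function that picks one of N tables per row. The modulo "hash" and the table numbering are hypothetical choices for illustration. The point is that consecutive Ids, which would all hit one last page, are spread across N tables, each with its own last page.

```python
from collections import Counter


def target_table(order_id: int, partitions: int) -> int:
    """Pick which of the hypothetical tables (0 .. partitions-1) receives this row."""
    return order_id % partitions  # simplest possible hash; a real scheme may mix bits more thoroughly


def inserts_per_table(ids, partitions):
    """How many rows of an insert burst land on each table's last page."""
    return Counter(target_table(i, partitions) for i in ids)


print(inserts_per_table(range(1, 1001), 1))                  # Counter({0: 1000}): one hot last page
print(sorted(inserts_per_table(range(1, 1001), 4).items()))  # [(0, 250), (1, 250), (2, 250), (3, 250)]
```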
The ever-increasing Id does not have to come from the database's own auto-increment (IDENTITY). You can also generate unique Ids with your own algorithm (usually as bigint, an 8-byte integer), which suits distributed scenarios. With master-slave replication, the Id field must not go wrong. In the common setup, the master generates its own Ids and the slave also increments its own, so when replication falls out of sync because of a deadlock or a similar problem, the auto-increment Ids on the slave no longer match those on the master. If the auto-increment Id is only a redundant primary key, a mismatch between master and slave Ids does no harm.
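One common way to generate such Ids in application code is a Snowflake-style layout: a millisecond timestamp in the high bits, a node id, and a per-millisecond sequence number. This is a swapped-in technique, not the author's algorithm, and the bit widths and epoch below are assumptions. The result fits in a bigint and still increases over time (strictly per node, roughly across nodes), so it keeps the ever-increasing property of a clustered key.

```python
import threading
import time


class SnowflakeStyleIds:
    """Time-ordered 64-bit ids: 41 bits of milliseconds, 10 bits of node id, 12 bits of sequence.

    The layout is an illustrative assumption, not something SQL Server provides.
    """

    EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch (September 2020)

    def __init__(self, node_id, clock=lambda: time.time_ns() // 1_000_000):
        if not 0 <= node_id < 1024:
            raise ValueError("node_id must fit in 10 bits")
        self.node_id = node_id
        self.clock = clock  # injectable so the sketch can be tested deterministically
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to risk duplicate ids")
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF
                if self.seq == 0:  # 4096 ids already issued this millisecond: spin until the next one
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self.node_id << 12) | self.seq


gen = SnowflakeStyleIds(node_id=3, clock=lambda: 1_600_000_000_005)  # frozen clock for the demo
a, b = gen.next_id(), gen.next_id()
print(b - a)                         # 1
print(a >> 22, (a >> 12) & 0x3FF)    # 5 3
```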
The last column in the figure above, "Row Size", also tells us that index key columns and included columns should not be too wide. Otherwise each page holds only a few records, the number of index pages grows sharply, and the index takes up much more space.
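The effect of row size on page count is simple arithmetic. The sketch assumes 8,096 bytes available for rows on an 8 KB page, plus a 2-byte slot-array entry per row, and ignores fill factor and variable-length overhead; the row sizes used are hypothetical.

```python
import math

PAGE_BYTES = 8096  # bytes available for rows on an 8 KB page (8192 minus the 96-byte header)
SLOT_BYTES = 2     # each row also costs a 2-byte entry in the page's slot array


def leaf_pages_needed(row_count, row_size):
    """Leaf pages needed to hold row_count index rows of row_size bytes each."""
    rows_per_page = PAGE_BYTES // (row_size + SLOT_BYTES)
    return math.ceil(row_count / rows_per_page)


print(leaf_pages_needed(1_000_000, 16))   # 2228
print(leaf_pages_needed(1_000_000, 400))  # 50000
```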