9.1 Index Overview (Part 1)

SQL Database Development (T-SQL), Chapter 9: Indexes

9.1 Index Overview

1 An index is an on-disk or in-memory structure associated with a table or view that speeds up the retrieval of rows from that table or view. An index contains keys built from one or more columns of the table or view. For on-disk indexes, SQL Server stores these keys in a structure (a B-tree) that lets it find the rows associated with given key values quickly and efficiently.
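To make the definition concrete, here is a minimal sketch, assuming SQL Server 2016 or later (for DROP TABLE IF EXISTS). The table dbo.Customers and the index IX_Customers_LastName are hypothetical names, not from the original text. It builds a nonclustered B-tree index whose keys come from a single column.

```sql
-- Hypothetical demo table (not from the original text).
DROP TABLE IF EXISTS dbo.Customers;
CREATE TABLE dbo.Customers
(
    CustomerID int          NOT NULL,
    LastName   nvarchar(50) NOT NULL,
    FirstName  nvarchar(50) NOT NULL,
    City       nvarchar(50) NULL
);

-- The index key is built from LastName. SQL Server keeps these keys in a B-tree,
-- so rows with a given LastName can be located without reading the whole table.
CREATE NONCLUSTERED INDEX IX_Customers_LastName
    ON dbo.Customers (LastName);
```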


2 An index is logically organized like a table, with rows and columns. Physically, it is stored either in a row-wise data format (called rowstore) or in a column-wise data format (called columnstore).

 

3 What an index is: a database index is similar to the index of a book. With a book's index you can quickly find the information you want without reading the whole book; in the same way, a program can use a database index to find data in a table without scanning the entire table. A book's index is a list of words together with the pages on which each word appears; a database index is a list of the values in a table together with the storage locations of the rows that contain them.

4 Pros and cons of indexes: most of the cost of executing a query is I/O. A major goal of using indexes to improve performance is to avoid full table scans, because a full table scan must read every data page of the table from disk. If an index points to the data values, the query needs only a few disk reads. Used appropriately, indexes therefore speed up queries.
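The I/O argument above can be observed with SET STATISTICS IO, which reports the logical reads of each statement in the Messages output. The sketch below assumes SQL Server 2016 or later; dbo.OrdersDemo, its columns, and the row counts are hypothetical. Run it and compare the logical reads of the two identical queries, one before and one after the index exists.

```sql
SET NOCOUNT ON;
-- Hypothetical demo table (not from the original text); it is a heap (no clustered index).
DROP TABLE IF EXISTS dbo.OrdersDemo;
CREATE TABLE dbo.OrdersDemo
(
    OrderID    int IDENTITY(1,1) NOT NULL,
    CustomerID int       NOT NULL,
    Filler     char(200) NOT NULL DEFAULT ('x')  -- widens rows so the table spans many pages
);

-- Load 100,000 rows with 1,000 distinct CustomerID values (row numbers 1..100000 modulo 1000).
INSERT INTO dbo.OrdersDemo (CustomerID)
SELECT rn % 1000
FROM (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
      FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b) AS nums
WHERE rn <= 100000;

SET STATISTICS IO ON;

-- No index yet: every page of the heap is read (table scan).
SELECT COUNT(*) FROM dbo.OrdersDemo WHERE CustomerID = 42;

CREATE NONCLUSTERED INDEX IX_OrdersDemo_CustomerID ON dbo.OrdersDemo (CustomerID);

-- With the index: a seek reads only a few index pages.
SELECT COUNT(*) FROM dbo.OrdersDemo WHERE CustomerID = 42;

SET STATISTICS IO OFF;
```

The exact read counts depend on row size and data; what matters is the gap between reading every page of the table and seeking to a handful of index pages.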

5 Indexes do not always improve system performance. Indexed tables need more storage space in the database, and statements that add, change, or delete data take longer to run and require more maintenance time, because every index must be updated as well. Indexes should therefore be used judiciously, and suboptimal indexes should be revised or dropped.
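One way to see the storage cost of indexes is to add up the pages each one uses. The query below is a sketch that assumes the VIEW DATABASE STATE permission; sys.dm_db_partition_stats counts pages of 8 KB each, so multiplying by 8 gives kilobytes.

```sql
-- Space used by each index (and each heap, index_id 0) on user tables in the current database.
SELECT  OBJECT_SCHEMA_NAME(ps.object_id) AS schema_name,
        OBJECT_NAME(ps.object_id)        AS table_name,
        i.name                           AS index_name,   -- NULL for a heap
        i.type_desc,
        SUM(ps.used_page_count) * 8      AS used_kb       -- pages are 8 KB
FROM    sys.dm_db_partition_stats AS ps
JOIN    sys.indexes AS i
        ON  i.object_id = ps.object_id
        AND i.index_id  = ps.index_id
WHERE   OBJECTPROPERTY(ps.object_id, 'IsUserTable') = 1
GROUP BY ps.object_id, ps.index_id, i.name, i.type_desc
ORDER BY used_kb DESC;
```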

9.1.1 Glossary

1 Basic data storage structure

When a new table is created, the system allocates space on disk in contiguous 8 KB units. When field values are written from memory to disk, they are stored within this allocated space, and when one 8 KB unit is used up, the database automatically allocates another. Each 8 KB unit is called a data page (Page), or simply a page. The pages of each data file are numbered sequentially starting from 0, and page 0 of every file holds the file's header information (File header). Every eight contiguous pages (64 KB) form an extent (Extent). All the data pages of a table without a clustered index together form a heap (Heap).
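The sizes above imply simple arithmetic: 8 pages × 8 KB = 64 KB per extent. The sketch below assumes SQL Server 2016 or later; dbo.PageDemo is a hypothetical table. Each row is just over 5,000 bytes, so only one row fits on an 8 KB page, and sys.dm_db_index_physical_stats then reports roughly one page per row.

```sql
-- Hypothetical demo table (not from the original text).
-- char(5000) makes each row larger than half a page, so two rows cannot share a page.
DROP TABLE IF EXISTS dbo.PageDemo;
CREATE TABLE dbo.PageDemo (ID int NOT NULL, Payload char(5000) NOT NULL);

INSERT INTO dbo.PageDemo (ID, Payload)
SELECT TOP (20) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'x'
FROM sys.all_objects;

-- page_count is in 8 KB pages; size_kb converts it to kilobytes.
SELECT index_type_desc, page_count, page_count * 8 AS size_kb, record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.PageDemo'), NULL, NULL, 'DETAILED');
```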

SQL Server does not allow a row to span data pages, so the in-row data of a single row is limited to about 8 KB (8,060 bytes). This is why the char and varchar types are limited to 8,000 bytes. Data larger than 8 KB must be stored in a large-value type. With the legacy text type, the value is not stored in the row itself: the row holds only a pointer to a set of 8 KB text data pages, and the actual data lives on those pages. (text, ntext, and image are deprecated; varchar(max), nvarchar(max), and varbinary(max) replace them.)
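As a sketch of the modern replacement for text, assuming SQL Server 2016 or later and a hypothetical table dbo.DocDemo, a varchar(max) column can hold a value larger than 8,000 bytes. REPLICATE truncates its result at 8,000 bytes unless its input is already a max type, which is why the literal is cast first.

```sql
-- Hypothetical demo table (not from the original text); varchar(max) instead of text.
DROP TABLE IF EXISTS dbo.DocDemo;
CREATE TABLE dbo.DocDemo (ID int NOT NULL, Body varchar(max) NULL);

-- CAST to varchar(max) so REPLICATE can return more than 8,000 bytes.
INSERT INTO dbo.DocDemo (ID, Body)
VALUES (1, REPLICATE(CAST('x' AS varchar(max)), 10000));

SELECT ID, DATALENGTH(Body) AS body_bytes FROM dbo.DocDemo;
```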

Pages come in different types: besides the pages that hold data or index rows, there are pages that track space allocation.

When the eight pages of an extent are shared by more than one object (for example, some pages belong to one table and others to a different table or index), the extent is called a mixed extent (Mixed Extent); traditionally, every table starts out in mixed extents. When all eight pages belong to a single object and hold only its data or index information, the extent is called a uniform extent (Uniform Extent).

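When a new table receives its first rows, SQL Server allocates its first data page from a mixed extent. As the data grows, SQL Server can allocate up to seven more pages from mixed extents; once the table needs more than eight pages, further pages are allocated from uniform extents. Starting with SQL Server 2016, user databases allocate from uniform extents by default (the MIXED_PAGE_ALLOCATION database option is OFF).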
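On SQL Server 2016 and later, whether a database still takes a new object's first pages from mixed extents is visible in sys.databases. This is a read-only sketch; the ALTER DATABASE line is commented out and uses a hypothetical database name, SalesDemo.

```sql
-- 1 = first pages of new objects may come from mixed extents; 0 = uniform extents only.
SELECT name, is_mixed_page_allocation_on
FROM sys.databases
ORDER BY name;

-- Hypothetical database name; uncomment to re-enable mixed-extent allocation.
-- ALTER DATABASE SalesDemo SET MIXED_PAGE_ALLOCATION ON;
```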

2 Page splitting

When a row must be inserted into (or grow within) an index page that is already full, SQL Server splits the page: about half of the rows stay on the old page, the other half move to a new page, and the new page can be allocated wherever free space is available. Frequent page splits therefore have serious consequences: the table becomes heavily fragmented physically, I/O efficiency drops sharply, and in the end rebuilding the index becomes the only remedy.
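Fragmentation caused by page splits can be measured and then repaired. The sketch below assumes SQL Server 2016 or later; dbo.FragDemo and PK_FragDemo are hypothetical. It deliberately uses random NEWID() values as the clustered key and inserts rows one at a time, which tends to cause many page splits, then measures fragmentation before and after a rebuild.

```sql
SET NOCOUNT ON;
-- Hypothetical demo table (not from the original text).
DROP TABLE IF EXISTS dbo.FragDemo;
CREATE TABLE dbo.FragDemo
(
    ID     uniqueidentifier NOT NULL DEFAULT NEWID(),  -- random keys land on random pages
    Filler char(500)        NOT NULL DEFAULT ('x'),
    CONSTRAINT PK_FragDemo PRIMARY KEY CLUSTERED (ID)
);

-- Row-by-row inserts (in one transaction for speed) so that full pages keep splitting.
BEGIN TRAN;
DECLARE @i int = 0;
WHILE @i < 10000
BEGIN
    INSERT INTO dbo.FragDemo DEFAULT VALUES;
    SET @i += 1;
END;
COMMIT;

-- Measure logical fragmentation of the clustered index (index_id 1).
SELECT avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.FragDemo'), 1, NULL, 'LIMITED');

-- Rebuild rewrites the index in key order. For light fragmentation, the lighter-weight
-- alternative is: ALTER INDEX PK_FragDemo ON dbo.FragDemo REORGANIZE;
ALTER INDEX PK_FragDemo ON dbo.FragDemo REBUILD;

SELECT avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.FragDemo'), 1, NULL, 'LIMITED');
```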

3 Fill factor

A property of an index that defines how full each leaf-level page is made when the index is created or rebuilt, leaving the rest of the page as free space. FILLFACTOR (fill factor) leaves room for the table data to grow later and reduces the likelihood of page splits. The fill factor is a percentage from 0 to 100; 100 (equivalent to 0) means pages are filled completely, which is appropriate only when the data never changes (for example, a read-only table). The smaller the value, the more free space each page has, so fewer pages need to be split as the index grows, but the index takes up more disk space. An incorrectly chosen fill factor reduces read performance, because the same rows are spread over more pages; the lower the fill factor, the larger this loss.
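As a minimal sketch (SQL Server 2016 or later assumed; dbo.FillDemo and CIX_FillDemo_ID are hypothetical), the fill factor is set with the FILLFACTOR option when an index is created and can be changed later by rebuilding the index. sys.indexes shows the value in effect.

```sql
-- Hypothetical demo table (not from the original text).
DROP TABLE IF EXISTS dbo.FillDemo;
CREATE TABLE dbo.FillDemo (ID int NOT NULL, Name nvarchar(100) NOT NULL);

-- Fill leaf pages to about 80% when the index is built, leaving about 20% free.
CREATE CLUSTERED INDEX CIX_FillDemo_ID
    ON dbo.FillDemo (ID)
    WITH (FILLFACTOR = 80);

-- The fill factor is applied only at create or rebuild time; change it with a rebuild.
ALTER INDEX CIX_FillDemo_ID ON dbo.FillDemo REBUILD WITH (FILLFACTOR = 90);

SELECT name, fill_factor
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.FillDemo');
```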

9.1.2 Heaps (tables without a clustered index)

A heap is a table without a clustered index. One or more nonclustered indexes can be created on a table stored as a heap. Data is stored in a heap without any specified order. Usually rows are initially stored in the order in which they are inserted, but the Database Engine can move rows around within the heap to store them efficiently, so the row order cannot be predicted. To guarantee the order of rows returned from a heap, you must use an ORDER BY clause.
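The sketch below (SQL Server 2016 or later assumed; dbo.HeapDemo is hypothetical) shows how a heap appears in the catalog: sys.indexes lists it with index_id 0 and type_desc HEAP. Only the ORDER BY in the final query guarantees the order of the rows returned.

```sql
-- Hypothetical demo table (not from the original text): no clustered index, so it is a heap.
DROP TABLE IF EXISTS dbo.HeapDemo;
CREATE TABLE dbo.HeapDemo (ID int NOT NULL, Name nvarchar(50) NULL);

-- A heap is listed in sys.indexes with index_id = 0 and type_desc = 'HEAP'.
SELECT index_id, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.HeapDemo');

INSERT INTO dbo.HeapDemo (ID, Name) VALUES (3, N'c'), (1, N'a'), (2, N'b');

-- Without ORDER BY, the order of rows read from a heap is not guaranteed.
SELECT ID, Name FROM dbo.HeapDemo ORDER BY ID;
```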

1 When to use a heap

If a table is a heap and has no nonclustered indexes, the entire table must be examined (a table scan) to find any row. This can be acceptable when the table is small, for example a list of a company's 12 regional offices.

2 When not to use a heap

Do not use a heap when the data is frequently returned in sorted order. A clustered index on the sorting column avoids the sort operation.

Do not use a heap when the data is frequently grouped. Data must be sorted before it can be grouped, and a clustered index on the grouping column avoids the sort operation.

Do not use a heap when ranges of data are frequently queried from the table. A clustered index on the range column avoids sorting the entire heap.

Do not use a heap when the table is large and has no nonclustered indexes. In such a heap, all rows must be read to find any row.
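As a sketch of the range and sort guidelines above (SQL Server 2016 or later assumed; dbo.SalesDemo and its columns are hypothetical), a clustered index on the date column stores rows in date order, so a date-range query can read a contiguous slice of the index already sorted by key, and the optimizer can usually avoid a separate Sort operator.

```sql
-- Hypothetical demo table (not from the original text).
DROP TABLE IF EXISTS dbo.SalesDemo;
CREATE TABLE dbo.SalesDemo
(
    SaleID   int IDENTITY(1,1) NOT NULL,
    SaleDate date  NOT NULL,
    Amount   money NOT NULL
);

-- Rows are stored in SaleDate order, which serves range filters, ORDER BY, and GROUP BY on SaleDate.
CREATE CLUSTERED INDEX CIX_SalesDemo_SaleDate ON dbo.SalesDemo (SaleDate);

SELECT SaleDate, SUM(Amount) AS total
FROM dbo.SalesDemo
WHERE SaleDate >= '20240101' AND SaleDate < '20240201'
GROUP BY SaleDate
ORDER BY SaleDate;
```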

3 Managing heaps

To create a heap, create a table without a clustered index. If a table already has a clustered index, drop the clustered index to return the table to a heap.

To remove a heap, create a clustered index on it.

To rebuild a heap and reclaim wasted space, create a clustered index on the heap and then drop that clustered index.
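The management steps above can be sketched as follows, assuming SQL Server 2016 or later; dbo.HeapMgmtDemo and CIX_HeapMgmtDemo_ID are hypothetical. Besides creating and dropping a clustered index, SQL Server 2008 and later can also rebuild a heap directly with ALTER TABLE ... REBUILD.

```sql
-- Hypothetical demo table (not from the original text): created without a clustered index, so a heap.
DROP TABLE IF EXISTS dbo.HeapMgmtDemo;
CREATE TABLE dbo.HeapMgmtDemo (ID int NOT NULL, Name nvarchar(50) NULL);

-- Reclaim space the classic way: build a clustered index (the table is no longer a heap) ...
CREATE CLUSTERED INDEX CIX_HeapMgmtDemo_ID ON dbo.HeapMgmtDemo (ID);

-- ... then drop it (the table is a heap again, with compacted pages).
DROP INDEX CIX_HeapMgmtDemo_ID ON dbo.HeapMgmtDemo;

-- Alternative on SQL Server 2008 and later: rebuild the heap in place.
ALTER TABLE dbo.HeapMgmtDemo REBUILD;
```

If the heap has nonclustered indexes, both creating and dropping the clustered index also rebuild those nonclustered indexes, so ALTER TABLE ... REBUILD is usually the cheaper choice.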

 

 

9.2 Principles of Index Design

Most people know the benefits of indexes, but few truly appreciate their drawbacks. I have always believed that for indexes in SQL Server (and in every database management system, and indeed in most things in the world), there is no absolute good or absolute bad. Suppose you were given an axe and a razor blade and asked to cut down a tree and cut a sheet of paper: if you used the blade on the tree and the axe on the paper and then declared both tools bad, that would obviously be unreasonable. (The analogy is not meant to be taken too literally.)

For an OLTP database, a single table should not carry too many indexes. As a rule of thumb from practical experience:

For core (heavily used) tables: no more than six indexes in total.

For ordinary tables: no more than four indexes in total.

For small tables: no more than two indexes in total.
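To check tables against these rules of thumb, a sketch like the following counts the indexes on each user table in the current database. Heaps (index_id 0) and hypothetical indexes created by tuning tools are excluded; the thresholds themselves are the author's guidelines, not limits enforced by SQL Server.

```sql
-- Number of indexes (clustered, nonclustered, columnstore, ...) per user table.
SELECT  OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
        OBJECT_NAME(i.object_id)        AS table_name,
        COUNT(*)                        AS index_count
FROM    sys.indexes AS i
WHERE   OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND   i.index_id > 0            -- index_id 0 is the heap itself, not an index
  AND   i.is_hypothetical = 0     -- skip hypothetical indexes left by tuning tools
GROUP BY i.object_id
ORDER BY index_count DESC;
```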

 

When designing indexes, consider the following guidelines.

9.2.1 Database guidelines

1 Large numbers of indexes on a table affect the performance of INSERT, UPDATE, DELETE, and MERGE statements, because all indexes must be adjusted appropriately whenever the data in the table changes. For example, if a column is used in several indexes and you run an UPDATE statement that modifies that column's data, each index that contains the column must be updated, as well as the column in the underlying base table (heap or clustered index).

2 Avoid over-indexing heavily updated tables, and keep indexes narrow, that is, with as few columns as possible.

3 Use many indexes to improve query performance on tables with low update requirements but large volumes of data. Large numbers of indexes can help the performance of queries that do not modify data, such as SELECT statements, because the query optimizer has more indexes to choose from when determining the fastest access method.

4 Indexing small tables may not be worthwhile, because it can take the query optimizer longer to traverse the index searching for data than to perform a simple table scan. Indexes on small tables might therefore never be used, yet they must still be maintained whenever the data in the table changes.
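To spot indexes whose maintenance cost may outweigh their benefit, a sketch like the one below compares how often each index has been read with how often it has been updated. sys.dm_db_index_usage_stats is reset when the instance restarts, and an index that has not been touched since then does not appear at all, so treat the numbers as a hint rather than proof.

```sql
-- Reads versus writes per index since the last instance restart (current database only).
SELECT  OBJECT_NAME(s.object_id)                      AS table_name,
        i.name                                        AS index_name,
        s.user_seeks + s.user_scans + s.user_lookups  AS read_count,
        s.user_updates                                AS write_count
FROM    sys.dm_db_index_usage_stats AS s
JOIN    sys.indexes AS i
        ON  i.object_id = s.object_id
        AND i.index_id  = s.index_id
WHERE   s.database_id = DB_ID()
  AND   OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY write_count DESC;   -- many writes with few reads: candidates for review
```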

9.2.2 Query guidelines

When designing the index, consider the following query guidelines:

1 Create nonclustered indexes on the columns that are frequently used in predicates and join conditions in queries. These are your SARGable columns [1]. However, avoid adding unnecessary columns: too many index columns can adversely affect disk space and index maintenance performance.

2 Covering indexes can improve query performance because all the data needed to meet the requirements of the query exists within the index itself. That is, only the index pages, and not the data pages of the table or clustered index, are needed to retrieve the requested data, which reduces overall disk I/O. For example, a query on columns a and b of a table that has a composite index on columns a, b, and c can retrieve the specified data from the index alone.

Important

A covering index is a nonclustered index that resolves one or several similar query results directly, without accessing its base table and without incurring lookups. Such an index has all the necessary non-SARGable columns in its leaf level. In other words, the columns returned by the SELECT clause and all the WHERE and JOIN arguments are covered by the index. The query may need much less I/O if the index is narrow enough compared with the rows and columns of the table itself, that is, if it is a true subset of the table's columns. Consider covering indexes when selecting a small portion of a large table, where that small portion is defined by a fixed predicate, for example sparse columns that contain only a few non-NULL values.

 

3 Write queries that insert or modify as many rows as possible in a single statement, instead of using multiple queries to update the same rows. With a single statement, optimized index maintenance can be used.

4 Evaluate the query type and how columns are used in the query. For example, a column used in an exact-match query is a good candidate for a nonclustered or clustered index.

[1] In relational databases, "SARGable" (from Search ARGument-able) describes a search predicate that can use an index to speed up the execution of a query.

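The query guidelines can be sketched together in one example, assuming SQL Server 2016 or later; dbo.OrderLines, its columns, and the index name are hypothetical. The predicate columns become the index key, INCLUDE adds a nonkey column at the leaf level so the query is covered, and the date filter is written in SARGable form.

```sql
-- Hypothetical demo table (not from the original text).
DROP TABLE IF EXISTS dbo.OrderLines;
CREATE TABLE dbo.OrderLines
(
    OrderLineID int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    OrderID     int           NOT NULL,
    ProductID   int           NOT NULL,
    OrderDate   datetime      NOT NULL,
    Quantity    int           NOT NULL,
    Notes       nvarchar(400) NULL
);

-- Key columns: the columns used in the WHERE predicates.
-- INCLUDE: Quantity is stored only at the leaf level, so the query below is covered.
CREATE NONCLUSTERED INDEX IX_OrderLines_ProductID_OrderDate
    ON dbo.OrderLines (ProductID, OrderDate)
    INCLUDE (Quantity);

-- SARGable and covered: both key columns are compared directly (seek on ProductID and
-- the OrderDate range), and every column returned is in the index (no key lookup).
SELECT OrderDate, Quantity
FROM dbo.OrderLines
WHERE ProductID = 7
  AND OrderDate >= '20240101' AND OrderDate < '20250101';

-- Not SARGable on OrderDate: the function hides the column, so only ProductID can be
-- used for the seek and YEAR() is checked row by row. Selecting Notes would also
-- force a key lookup, because Notes is not in the index.
-- SELECT OrderDate, Quantity FROM dbo.OrderLines
-- WHERE ProductID = 7 AND YEAR(OrderDate) = 2024;
```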
9.2.3 Column guidelines

When designing indexes, consider the following column guidelines:

1 Keep the index key short for clustered indexes. In addition, clustered indexes benefit from being created on unique or non-null columns.

2 Columns of the ntext, text, image, varchar(max), nvarchar(max), and varbinary(max) data types cannot be specified as index key columns. However, columns of the varchar(max), nvarchar(max), varbinary(max), and xml data types can participate in a nonclustered index as nonkey (included) columns. For more information, see the section on indexes with included columns in this guide.

3 An xml column can be a key column only in an XML index. For more information, see XML Indexes (SQL Server). SQL Server 2012 SP1 introduced a new type of XML index called the selective XML index. It can improve query performance over data stored as XML in SQL Server, make indexing of large XML data workloads much faster, and improve scalability by reducing the storage cost of the index itself. For more information, see Selective XML Indexes (SXI).

4 Examine column uniqueness. A unique index, rather than a nonunique index on the same combination of columns, gives the query optimizer additional information that makes the index more useful.

5 Examine the data distribution in the column. Long-running queries are often caused by indexing a column with few unique values, or by performing a join on such a column. This is a fundamental problem with the data and the query, and it usually cannot be solved without first recognizing the situation. For example, a physical telephone directory sorted alphabetically by last name will not help you find a person quickly if everyone in the city is named Smith or Jones. For more information about data distribution, see Statistics.

6 Consider using filtered indexes on columns that have well-defined subsets, for example sparse columns, columns that are mostly NULL, columns with categories of values, and columns with distinct ranges of values. A well-designed filtered index can improve query performance and reduce index maintenance and storage costs.

7 If the index contains multiple columns, consider the column order. The column used in the WHERE clause in an equal to (=), greater than (>), less than (<), or BETWEEN search condition, or that participates in a join, should be placed first. Additional columns should be ordered by their level of distinctness, that is, from the most distinct to the least distinct.

For example, if the index is defined as LastName, FirstName, it is useful when the search criterion is WHERE LastName = 'Smith' or WHERE LastName = 'Smith' AND FirstName LIKE 'J%'. However, the query optimizer would not use the index to seek for a query that searches only on FirstName (WHERE FirstName = 'Jane').
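Several of the column guidelines can be sketched on one hypothetical table, dbo.People (SQL Server 2016 or later assumed; all names are illustrative). The clustered key is short, unique, and NOT NULL; the composite index puts the equality column first; and a filtered unique index enforces uniqueness only for rows that have an Email, storing only those rows.

```sql
-- Filtered indexes require these session settings (the default in most client tools).
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;

-- Hypothetical demo table (not from the original text).
DROP TABLE IF EXISTS dbo.People;
CREATE TABLE dbo.People
(
    PersonID  int           NOT NULL,
    LastName  nvarchar(50)  NOT NULL,
    FirstName nvarchar(50)  NOT NULL,
    Email     nvarchar(100) NULL,                         -- assumed mostly NULL
    CONSTRAINT PK_People PRIMARY KEY CLUSTERED (PersonID)  -- short, unique, non-null key
);

-- Composite index: LastName first, because queries filter on it with =.
CREATE NONCLUSTERED INDEX IX_People_LastName_FirstName
    ON dbo.People (LastName, FirstName);

-- Filtered unique index: only non-NULL emails are stored and must be unique.
CREATE UNIQUE NONCLUSTERED INDEX UX_People_Email
    ON dbo.People (Email)
    WHERE Email IS NOT NULL;

-- The leading key column is constrained, so IX_People_LastName_FirstName can be seeked.
SELECT PersonID FROM dbo.People WHERE LastName = N'Smith' AND FirstName LIKE N'J%';

-- The leading key column is not constrained, so that index cannot be used for a seek.
SELECT PersonID FROM dbo.People WHERE FirstName = N'Jane';
```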

9.3 Index Classification

1 SQL Server has many types of indexes.

By storage structure: clustered indexes and nonclustered indexes.

By uniqueness of the data: unique indexes and nonunique indexes.

By coverage of the table: a nonclustered index can cover the full table or be filtered; a filtered index is defined with a WHERE clause.

By included columns: a nonclustered index can carry nonkey columns specified with INCLUDE.

By the number of key columns: single-column indexes and multicolumn (composite) indexes.

There are also columnstore indexes, hash indexes, and memory-optimized nonclustered indexes.
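The classification above maps onto columns of the catalog views. As a sketch, the query below lists each index on the user tables of the current database with its storage type, uniqueness, filter, and number of key and included columns. The key/included split is most meaningful for rowstore indexes.

```sql
-- Classify existing indexes by the properties listed above.
SELECT  OBJECT_NAME(i.object_id) AS table_name,
        i.name                   AS index_name,
        i.type_desc,             -- e.g. CLUSTERED, NONCLUSTERED, CLUSTERED COLUMNSTORE, NONCLUSTERED HASH
        i.is_unique,             -- unique vs nonunique
        i.has_filter,            -- filtered (WHERE) vs full-table
        i.filter_definition,
        (SELECT COUNT(*) FROM sys.index_columns AS ic
          WHERE ic.object_id = i.object_id AND ic.index_id = i.index_id
            AND ic.is_included_column = 0) AS key_columns,       -- 1 = single-column
        (SELECT COUNT(*) FROM sys.index_columns AS ic
          WHERE ic.object_id = i.object_id AND ic.index_id = i.index_id
            AND ic.is_included_column = 1) AS included_columns   -- INCLUDE columns
FROM    sys.indexes AS i
WHERE   OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND   i.index_id > 0
ORDER BY table_name, index_name;
```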


Source: blog.csdn.net/syjhct/article/details/86652666