(Repost) Clustered Index and Non-Clustered Index

Official statement:

clustered index

  An index in which the logical order of key values determines the physical order of the corresponding rows in a table.
  A clustered index determines the physical order of data in a table. A clustered index is similar to a phone book, which arranges data by last name. Because the clustered index specifies the physical storage order of data in the table, a table can contain only one clustered index. However, the index can contain multiple columns (a composite index), just as a phone book is organized by last name and then first name.

  Clustered indexes are particularly effective for columns that are frequently searched for ranges of values. Once the row containing the first value is found through the clustered index, the rows containing subsequent index values are guaranteed to be physically adjacent. For example, if an application frequently executes a query that retrieves records within a certain date range, a clustered index can quickly find the row that contains the start date and then retrieve all adjacent rows in the table until the end date is reached. This helps improve the performance of such queries. Likewise, if a column is frequently used to sort data retrieved from a table, the table can be clustered (physically sorted) on that column, which saves the cost of sorting each time the column is queried.
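The date-range case described above can be sketched in T-SQL as follows. This is only an illustration: the table `orders`, its columns, and the index name are assumptions, not objects from the original text.

```sql
-- Hypothetical table; every name here is an assumption for illustration.
CREATE TABLE orders (
    order_id   INT         NOT NULL,
    order_date DATETIME    NOT NULL,
    customer   VARCHAR(50) NOT NULL
);

-- Cluster the table on the date column so rows are stored in date order.
CREATE CLUSTERED INDEX IX_orders_order_date ON orders (order_date);

-- A range query can seek to the first matching row and then read
-- physically adjacent rows until the end of the range is reached.
SELECT order_id, order_date, customer
FROM orders
WHERE order_date >= '20040101' AND order_date < '20041001'
ORDER BY order_date;
```

Because the rows are already stored in `order_date` order, the `ORDER BY` on that column should normally not need a separate sort step.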
    

     Using a clustered index to find specific rows is also efficient when the index value is unique. For example, the fastest way to find a specific employee using the unique employee ID column emp_id is to create a clustered index or PRIMARY KEY constraint on the emp_id column.
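A minimal sketch of the emp_id case, assuming a hypothetical `employee` table (only the column name emp_id comes from the text above):

```sql
-- Hypothetical employee table; names other than emp_id are assumptions.
CREATE TABLE employee (
    emp_id   INT          NOT NULL,
    emp_name NVARCHAR(50) NOT NULL,
    -- The PRIMARY KEY constraint creates a unique clustered index on emp_id.
    CONSTRAINT PK_employee PRIMARY KEY CLUSTERED (emp_id)
);

-- Point lookup on the unique clustered key.
SELECT emp_id, emp_name
FROM employee
WHERE emp_id = 1001;
```

The `CLUSTERED` keyword is written out here for clarity; SQL Server would choose it by default for a primary key when the table has no other clustered index.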

 

 

 

nonclustered index

  An index in which the logical order of the index differs from the physical storage order of the rows on disk.

Indexes are described by a tree data structure (in SQL Server, a B-tree rather than a binary tree). We can understand the clustered index this way: the leaf nodes of the index are the data nodes themselves. The leaf nodes of a non-clustered index are still index nodes, but each holds a pointer (row locator) to the corresponding data. As shown below:

[Figure: non-clustered index]

[Figure: clustered index]

    1. Understand the index structure in simple terms

      In fact, you can think of an index as a special kind of directory. Microsoft SQL Server provides two kinds of indexes: clustered indexes and non-clustered indexes. Below, we illustrate the difference between the two with an example.
      The body of a Chinese dictionary is itself a clustered index. For example, to look up the character "An", we naturally turn to the first few pages of the dictionary, because its pinyin is "an", and a dictionary sorted by pinyin starts with the letter "a" and ends with "z", so "An" falls near the front. If you go through the whole "a" section and still cannot find the character, it is not in your dictionary. Similarly, to look up "Zhang", you turn to the back of the dictionary, because its pinyin is "zhang". In other words, the body of the dictionary is itself a directory, and you do not need to consult any other directory to find what you are looking for. Content that is itself a directory arranged according to a certain rule is what we call a "clustered index".
      If you know how a character is pronounced, you can look it up quickly this way. But you may also come across a character you do not recognize and cannot pronounce. In that case the method above does not work; instead, you find the character through its radical and then turn directly to the page number listed next to it. However, the order of the characters in the "radical catalog" and the "character lookup table" is not the order of the dictionary body. For example, if you look up "Zhang" by radical, the lookup table gives page 672. In the lookup table, "Chi" appears just above "Zhang" but is on page 63, and "Nu" appears just below "Zhang" but is on page 390. Obviously, these characters are not actually located next to "Zhang" in the body. The consecutive entries "Chi, Zhang, Nu" that you see are their order in the non-clustered index, which is a mapping onto the characters in the dictionary body. You can find the character you need this way, but it takes two steps: first find the entry in the catalog, then turn to the page it gives. A directory that is purely a directory, with the body kept purely as body in its own order, is what we call a "non-clustered index".
      From the examples above, we can understand what a "clustered index" and a "non-clustered index" are. By extension, it is easy to see why each table can have only one clustered index: the body can be sorted according to only one rule.

    2. When to use a clustered index or a non-clustered index

The following table summarizes when to use a clustered or nonclustered index (important):

 

Action description                      | Use a clustered index | Use a nonclustered index
Columns are often grouped and sorted    | should                | should
Return data in a range                  | should                | should not
One or very few distinct values         | should not            | should not
Small number of distinct values         | should                | should not
Large number of distinct values         | should not            | should
Frequently updated columns              | should not            | should
Foreign key columns                     | should                | should
Primary key columns                     | should                | should
Index columns are frequently modified   | should not            | should



      In fact, we can understand the table above through the dictionary example used to define clustered and non-clustered indexes. Take returning data within a certain range. Suppose one of your tables has a time column and you have built the clustered index on it. When you query all the data between January 1, 2004 and October 1, 2004, the query will be very fast, because your "dictionary body" is sorted by date: the clustered index only needs to find the beginning and the end of the data to be retrieved. With a non-clustered index, by contrast, you must first look up the page number of each item in the catalog and then find the content by page number.

    3. Combining theory with practice: misunderstandings about the use of indexes

      The purpose of theory is application. Although we have just listed when to use a clustered or non-clustered index, in practice these rules are easily overlooked, or are not weighed properly against the actual situation. Next, we discuss misunderstandings about index use based on problems encountered in practice, so that you can master how to build indexes.

    1. The primary key is the clustered index

      In my opinion this idea is extremely wrong and wastes the clustered index, although SQL Server does create a clustered index on the primary key by default.
      Usually, we create an ID column in each table to distinguish each row, and this ID column increments automatically, generally with a step of 1. The Gid column in our office automation example is such a column. If we set this column as the primary key, SQL Server makes it the clustered index by default. The benefit is that your data is physically sorted by ID in the database, but I do not think that is worth much.
      The advantages of the clustered index are obvious, and the rule that each table can have only one makes the clustered index all the more precious.
      From the definition of the clustered index given earlier, we can see that its biggest advantage is that it quickly narrows the query range according to the query conditions and avoids a full table scan. In practice, because the ID is generated automatically, we do not know the ID of each record, so we rarely query by ID. That makes using the ID primary key as the clustered index a waste of resources. Secondly, clustering on a column whose values are all different also violates the rule that "a clustered index should not be built on a column with a large number of distinct values"; of course, this only has a negative effect when users frequently modify record contents, especially index columns, and it does not affect query speed.
      In the office automation system, whether it is the documents awaiting the user's signature shown on the home page, meetings, or a user's document search, the data queries always depend on two fields: the "date" and the user's own "username".
      Usually, the home page of an office automation system shows the documents or meetings that each user has not yet signed. Although our where clause can restrict the results to what the current user has not yet signed, if your system has been running for a long time and holds a lot of data, a full table scan every time any user opens the home page makes little sense. Most users have already read the documents from more than a month ago, so scanning them only adds database overhead. In fact, when a user opens the home page, the database can query only the documents the user has not read from the past 3 months, using the "date" field to limit the scan and improve query speed. If your office automation system has been running for 2 years, your home page will theoretically display 8 times faster than before, or even faster.
      The word "theoretically" is used here because if your clustered index is still blindly built on the ID primary key, your query will not be that fast, even if you have built a (non-clustered) index on the "date" field. Let's look at the speed of various queries with 10 million rows (250,000 rows in the last 3 months):

    (1) Only build a clustered index on the primary key, with no time-range restriction:

    select gid, fariqi, neibuyonghu, title from Tgongwen

    time: 128470 milliseconds (about 128 seconds)

    (2) Build a clustered index on the primary key and a non-clustered index on fariqi:

    select gid, fariqi, neibuyonghu, title from Tgongwen
    where fariqi > dateadd(day, -90, getdate())

    time: 53763 milliseconds (about 54 seconds)

    (3) Build the clustered index on the date column (fariqi):

    select gid, fariqi, neibuyonghu, title from Tgongwen
    where fariqi > dateadd(day, -90, getdate())

    time: 2423 milliseconds (about 2 seconds)

      Although each statement extracts 250,000 rows, the differences between the cases are huge, especially once the clustered index is built on the date column. In fact, if your database really holds 10 million rows and the primary key (and with it the clustered index) is on the ID column, as in the first and second cases above, the web page simply times out and cannot be displayed at all. This is one of the most important reasons I abandoned the ID column as the clustered index. The times above were obtained by adding the following before each select statement:

    declare @d datetime
    set @d = getdate()

    and adding the following after the select statement:

    select [statement execution time (milliseconds)] = datediff(ms, @d, getdate())
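As an alternative sketch, SQL Server can report timings itself through SET STATISTICS TIME. This is not the method used for the measurements in this article, and the figures it reports (parse and compile time, then CPU and elapsed execution time) will not match the datediff numbers exactly.

```sql
-- Report parse/compile and execution times for each statement
-- in the Messages output of the session.
SET STATISTICS TIME ON;

SELECT gid, fariqi, neibuyonghu, title
FROM Tgongwen
WHERE fariqi > DATEADD(day, -90, GETDATE());

-- Turn the reporting off again for the rest of the session.
SET STATISTICS TIME OFF;
```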

    2. As long as an index is built, query speed will improve significantly

      In fact, in the example above the second and third statements are exactly the same, and the indexed field is also the same; the only difference is that the former has a non-clustered index on the fariqi field and the latter has the clustered index on it, yet the query speeds are very different. So indexing just any field in just any way does not necessarily improve query speed.
      From the statement that created the table, we can see that the fariqi field in this 10-million-row table has 5003 distinct values. Building the clustered index on this field is a perfect fit. In reality, we issue several documents every day, and these documents share the same date, which matches the requirement for a clustered index exactly: "neither mostly identical values, nor only a very few identical values". From this point of view, building a "proper" clustered index is very important for improving query speed.
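The setup of case (3) above, with the clustered index on the date column and the primary key kept on gid, could be reached roughly as follows. This is a sketch under stated assumptions: the constraint name `PK_Tgongwen` and the index name are hypothetical, and the drop would fail if foreign keys reference the primary key.

```sql
-- Assumption: the existing primary key constraint is named PK_Tgongwen.
-- Dropping the clustered primary key frees the table's single clustered slot.
ALTER TABLE Tgongwen DROP CONSTRAINT PK_Tgongwen;

-- Cluster the table on the date column, which has many repeated values.
CREATE CLUSTERED INDEX IX_Tgongwen_fariqi ON Tgongwen (fariqi);

-- Re-create the primary key on gid as a non-clustered constraint,
-- so gid stays unique without occupying the clustered index.
ALTER TABLE Tgongwen
    ADD CONSTRAINT PK_Tgongwen PRIMARY KEY NONCLUSTERED (gid);
```

On a table of this size each of these statements rewrites or rebuilds a large structure, so they would normally be run in a maintenance window.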

    3. Adding every field that needs faster queries to the clustered index will improve query speed

      As mentioned above, the fields that data queries cannot do without are the "date" and the user's own "username". Since both fields are so important, we can combine them into a composite index.
      Many people think that adding any field to the clustered index improves query speed, and some wonder: if the fields of a composite clustered index are queried separately, will the query slow down? With this question in mind, let's look at the following query speeds (each result set is 250,000 rows; the date column fariqi is the leading column of the composite clustered index, and the user name neibuyonghu is the last column):

    (1) select gid, fariqi, neibuyonghu, title from Tgongwen where fariqi>'2004-5-5'

    query speed: 2513 milliseconds

    (2) select gid, fariqi, neibuyonghu, title from Tgongwen
                where fariqi>'2004-5-5' and neibuyonghu='office'

    query speed: 2516 milliseconds

    (3) select gid, fariqi, neibuyonghu, title from Tgongwen where neibuyonghu='office'

    query speed: 60280 milliseconds
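The composite clustered index that these three queries run against could be created roughly like this. The index name is hypothetical, and the original creation statement is not shown in the text.

```sql
-- Composite clustered index with the date column leading and the user name last.
-- A table can hold only one clustered index, so any existing clustered index
-- would have to be dropped before this statement can succeed.
CREATE CLUSTERED INDEX IX_Tgongwen_fariqi_neibuyonghu
    ON Tgongwen (fariqi, neibuyonghu);
```

With this column order, a condition on fariqi (queries 1 and 2) can seek into the index, while a condition on neibuyonghu alone (query 3) cannot use the key order and falls back to scanning.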

      From the experiments above, we can see that using only the leading column of the clustered index as the query condition is almost as fast as using all the columns of the composite clustered index, and even slightly faster (when the result sets are the same size); whereas if only a non-leading column of the composite clustered index is used as the query condition, the index has no effect at all. Of course, statements 1 and 2 run at the same speed because they return the same number of rows. If all the columns of the composite index are used and the result set is small, this forms "index coverage", so performance can be optimal. At the same time, keep in mind: whether or not you often use the other columns of the clustered index, its leading column must be the most frequently used column.

    4. Summary of index experience not found in other books

    1. Using the clustered index is faster than using a primary key that is not the clustered index

    select gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi='2004-9-16'

    time: 3326 milliseconds

    select gid,fariqi,neibuyonghu,reader,title from Tgongwen where gid<=250000

    time: 4470 milliseconds

    Here, using the clustered index is nearly 1/4 faster than using a primary key that is not the clustered index.

    2. For order by, using the clustered index is faster than using an ordinary primary key, especially with small amounts of data

    select gid, fariqi, neibuyonghu, reader, title from Tgongwen order by fariqi

    time: 12936 milliseconds

    select gid, fariqi, neibuyonghu, reader, title from Tgongwen order by gid

    time: 18843 milliseconds

      Here, ordering by the clustered index is about 3/10 faster than ordering by the ordinary primary key. In fact, if the amount of data is small, sorting on the clustered index column is significantly faster than sorting on a non-clustered index column; if the amount of data is large, such as more than 100,000 rows, the speed difference between the two is not obvious.

    3. When a time range is applied on the clustered index, the search time shrinks in proportion to the share of the table's data that is returned, regardless of how much of the clustered index is used:

    select gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi>'2004-1-1'

    time: 6343 milliseconds (1 million rows extracted)

    select gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi>'2004-6-6'

    time: 3170 milliseconds (500,000 rows extracted)

    select gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi='2004-9-16'

    time: 3326 milliseconds (essentially the same as the previous statement; if the number of rows returned is the same, the greater-than and equals operators perform the same)

    select gid,fariqi,neibuyonghu,reader,title from Tgongwen
                where fariqi>'2004-1-1' and fariqi<'2004-6-6'

    time: 3280 milliseconds

    4. A date column does not slow down queries because its values include minutes and seconds

      In this example there are 1 million rows in total. 500,000 rows are dated after January 1, 2004, but contain only two distinct dates, accurate to the day; the other 500,000 rows are dated earlier and contain 5,000 distinct dates, accurate to the second.

    select gid,fariqi,neibuyonghu,reader,title from Tgongwen
              where fariqi>'2004-1-1'

    select gid,fariqi,neibuyonghu,reader,title from Tgongwen
              where fariqi<'2004-1-1' order by fariqi

    time: 6453 milliseconds

    5. Other precautions

      "Water can carry a boat, and it can also capsize it", and the same is true of indexes. Indexes help improve retrieval performance, but too many indexes, or inappropriate ones, make the system inefficient, because every index added to a table means more work for the database. Too many indexes can even lead to index fragmentation.
      Therefore, we need to build a "proper" index system. The clustered index in particular deserves careful design, so that your database can achieve high performance.
      Of course, in practice, as a conscientious database administrator, you should test several options to find out which one is the most efficient and effective.

Origin http://43.154.161.224:23101/article/api/json?id=326253546&siteId=291194637