Database index types and their principles

Every programmer is familiar with the word "index". When the number of daily queries keeps climbing and we want to make our program more efficient, optimizing SQL statements is only one option; the other is to use indexes. Using an index is as simple as writing a create table statement: you just write a create index statement, and there is hardly a server-side programmer in the world who cannot create a table. However, knowing how to use an index is one thing, while understanding how indexes work deeply enough to use them just right is quite another; the two are worlds apart (I have not reached that level myself). For a large share of programmers, understanding of indexes stops at the idea that "an index makes queries faster".

Drawing on several online articles, this post summarizes how indexes work. First, a few questions:

1. Why should a data table have a primary key?

2. Why are queries faster once an index is used?

3. Does using an index slow down inserts, updates, and deletes?

4. When do you need an index on two fields?

Not everyone can answer these questions. What good is knowing the answers? If the tables your application uses hold only 10,000 rows, it makes little difference whether you understand indexes or not. But if the application holds hundreds of millions or even billions of rows, a program written without understanding how indexes work simply will not run, much like fitting a truck with a car engine: could that truck still pull its load? To understand indexes you must first understand one data structure, the "balanced tree" (not a binary one), that is, the B-tree or B+ tree. Important things are worth saying three times: "balanced tree, balanced tree, balanced tree." Some databases also implement indexes with hash buckets, but mainstream RDBMSs use balanced trees as the default index structure for tables.

Answer 1: We usually add a primary key when creating a table, and in some relational databases the create table statement is rejected if no primary key is specified. In fact, a table with a primary key can no longer really be called a "table". A table without a primary key has its data placed on disk unordered, row after row in neat lines, which is very close to what I think of as a "table". Once a primary key is added, the table's on-disk layout changes from that neat row-by-row arrangement into a tree, the "balanced tree" structure mentioned above. In other words, the entire table becomes an index. Yes, once more: the whole table turns into an index, the so-called "clustered index". This is why a table can have only one primary key and only one clustered index: the role of the primary key is to convert the data from the "table" layout into the "index (balanced tree)" layout.

 

 


[Figure: structure of a table with a primary key (clustered index)]

The figure shows the structure of a table with a primary key (a clustered index). The drawing is not great, so please bear with it. All nodes of the tree except the bottom level hold values of the primary key field, usually the id field we designate as the primary key. The bottom level holds the actual table rows.

Answer 2: Suppose we execute this SQL statement:
select * from table where id = 1256;

First, the index is used to locate the leaf node that holds the value 1256, and then the row whose id equals 1256 is read from that leaf node. The details of how a balanced tree operates are not explained here, but the figure shows that the tree has three levels, so getting from the root to a leaf takes only three lookups, as shown below.

[Figure: lookup path for id = 1256 from the root to a leaf of the three-level tree]
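The three-level lookup described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Node` class, the key values, and the `search` function are hypothetical and do not reflect how any particular DBMS lays out its pages.

```python
from bisect import bisect_right


class Node:
    """Toy tree node: internal nodes have children, leaf nodes have rows."""

    def __init__(self, keys, children=None, rows=None):
        self.keys = keys          # separator keys (internal) or stored keys (leaf)
        self.children = children  # list of child nodes, None for a leaf
        self.rows = rows          # dict of key -> row, None for an internal node


def search(root, key):
    """Walk from the root to a leaf; return (row or None, nodes visited)."""
    node, visited = root, 1
    while node.children is not None:
        # Keys smaller than keys[i] go to child i, the rest go further right.
        node = node.children[bisect_right(node.keys, key)]
        visited += 1
    return node.rows.get(key), visited


# A hand-built three-level tree (hypothetical data).
leaf_a = Node([10, 20], rows={10: "row 10", 20: "row 20"})
leaf_b = Node([500, 600], rows={500: "row 500", 600: "row 600"})
leaf_c = Node([1256, 1300], rows={1256: "row 1256", 1300: "row 1300"})
leaf_d = Node([2000, 2100], rows={2000: "row 2000", 2100: "row 2100"})
root = Node([1000], children=[
    Node([500], children=[leaf_a, leaf_b]),
    Node([2000], children=[leaf_c, leaf_d]),
])

result = search(root, 1256)
print(result)  # ('row 1256', 3)
```

Every lookup in this toy tree touches exactly three nodes, whether or not the key exists, which is the property the article relies on.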

 

 

 

Suppose a table has one hundred million rows and you need to find one of them. The conventional approach is to compare the rows one by one, and in the worst case that takes one hundred million comparisons; in big O notation the worst-case time complexity is O(n). This is unacceptable. One hundred million rows obviously cannot be read into memory in one go, so without cache optimization those comparisons mean one hundred million IO operations, and at the speed of current disks and CPUs that could take a very long time. If the table is converted into a balanced tree (a very bushy tree with a great many nodes) and the tree has, say, 10 levels, then only 10 IO operations are needed to find the required row. The speedup is exponential: in big O notation the cost is O(log n), where n is the total number of records, the base of the logarithm is the branching factor of the tree, and the result is the number of levels in the tree. In other words, the number of lookups is the logarithm of the total number of records, with the branching factor b as the base:

number of lookups = log_b(n)

 

 

 

In code this is Math.Log(100000000, 10), where 100000000 is the number of records and 10 is the branching factor of the tree (in real environments the branching factor is far larger than 10). The result is the number of lookups, which here drops from hundreds of millions to a single digit. This is why using an index improves database query performance so dramatically.
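As a small check of the arithmetic above, the sketch below counts tree levels with integer multiplication instead of a floating-point logarithm, so the result cannot be off by a rounding error. The function name `levels_needed` is made up for this example.

```python
def levels_needed(record_count, branching_factor):
    """Smallest number of levels whose capacity reaches record_count."""
    levels = 0
    capacity = 1
    while capacity < record_count:
        capacity *= branching_factor  # each extra level multiplies the capacity
        levels += 1
    return levels


print(levels_needed(100_000_000, 10))    # 8
print(levels_needed(100_000_000, 100))   # 4
print(levels_needed(100_000_000, 1000))  # 3
```

A larger branching factor gives a shallower tree, which is why real indexes, with far more than 10 branches per node, need only a handful of lookups.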

Answer 3: Everything has two sides. While an index speeds up queries, it slows down writes. The reason is simple: the balanced tree must always be kept in a correct state, and inserting, updating, or deleting data changes the index data in the nodes of the tree and disturbs its structure. So each time the data changes, the DBMS has to reorganize the tree (the index) to keep it correct, and this carries a significant performance cost. That is why an index has side effects on every operation other than queries.
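The extra work on writes can be illustrated with a toy model. In this sketch a sorted Python list stands in for each balanced tree, which is a deliberate simplification; the class `IndexedTable` and its `index_writes` counter are invented for the example and only show that every write must also touch every index.

```python
from bisect import bisect_left, insort


class IndexedTable:
    """Toy table: rows in a dict, plus one sorted list per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = {}
        self.indexes = {col: [] for col in indexed_columns}
        self.index_writes = 0  # how many index updates the writes have caused

    def insert(self, row_id, row):
        self.rows[row_id] = row
        for col, entries in self.indexes.items():
            insort(entries, (row[col], row_id))  # keep the index ordered
            self.index_writes += 1

    def delete(self, row_id):
        row = self.rows.pop(row_id)
        for col, entries in self.indexes.items():
            del entries[bisect_left(entries, (row[col], row_id))]
            self.index_writes += 1


table = IndexedTable(["user_name", "birthday"])
table.insert(1, {"user_name": "carol", "birthday": "1991-11-1"})
table.insert(2, {"user_name": "alice", "birthday": "1990-05-20"})
table.insert(3, {"user_name": "bob", "birthday": "1991-11-1"})
table.delete(2)

print(table.index_writes)          # 8
print(table.indexes["user_name"])  # [('bob', 3), ('carol', 1)]
```

Three inserts and one delete on a table with two indexes cause eight index updates here; with no indexes the same writes would cause none.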

Answer 4: Having covered the clustered index, let us turn to the non-clustered index, which is the ordinary index we usually talk about and use. This is where the earlier question about indexing two fields, the multi-field index query, comes in.

Like a clustered index, a non-clustered index uses a balanced tree as its data structure. The values in each node of the index tree come from the indexed field of the table: if you add an index on the user's name field, the index is made up of the values of the name field, and whenever the data changes the DBMS must keep the index structure correct. If we index several fields of a table, several separate index structures result, and these (non-clustered) indexes have no connection to one another, as shown below.

[Figure: separate non-clustered indexes built on different fields of the same table]

 

 

 

Each time an index is built on a field, the data in that field is copied to build the index. Adding an index to a table therefore increases the size of the table and takes up disk space.

The difference between the two is that a lookup in the clustered index finds the required data directly, whereas a lookup in a non-clustered index finds the primary key value of the matching record, which is then used to find the required data through the clustered index, as shown below.

[Figure: a non-clustered index lookup yields a primary key value, which is then looked up in the clustered index]

 

 

 

Whatever way the table is queried, the lookup ultimately uses the primary key to locate the data through the clustered index. The clustered index (primary key) is the only path to where the real data resides.

There is one exception, however, in which the required data can be retrieved without going through the clustered index. This less common approach is called a "covering index" query, and it is what the composite (multi-field) index mentioned earlier makes possible. As described above, when a field is indexed its contents are copied into the index; if you put two fields into one index, the contents of both fields are copied into that index.

Consider the following SQL statements:

-- create the index
create index index_birthday on user_info(birthday);

-- query the names of users born on November 1, 1991
select user_name from user_info where birthday = '1991-11-1';

The query is executed as follows:

First, the non-clustered index index_birthday is used to find the primary key ID values of all records whose birthday equals 1991-11-1.

Then, each primary key ID value is looked up in the clustered index to find where the real data (the row) is stored.

Finally, the user_name field is read from the row that was found and returned as the final result.
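The three steps above can be mimicked with plain Python dictionaries. This is a rough sketch, not a description of any real engine: dictionaries stand in for the two balanced trees, and the sample rows and the function `names_born_on` are invented for illustration.

```python
# Clustered index: primary key -> full row (stand-in for the table itself).
clustered_index = {
    1: {"id": 1, "user_name": "alice", "birthday": "1991-11-1"},
    2: {"id": 2, "user_name": "bob", "birthday": "1990-05-20"},
    3: {"id": 3, "user_name": "carol", "birthday": "1991-11-1"},
}

# Non-clustered index on birthday: indexed value -> primary key values.
index_birthday = {}
for pk, row in clustered_index.items():
    index_birthday.setdefault(row["birthday"], []).append(pk)


def names_born_on(birthday):
    ids = index_birthday.get(birthday, [])      # step 1: index -> primary keys
    rows = [clustered_index[pk] for pk in ids]  # step 2: primary keys -> rows
    return [row["user_name"] for row in rows]   # step 3: read user_name


print(names_born_on("1991-11-1"))  # ['alice', 'carol']
```

Step 2 is the part that costs an extra lookup per matching record, since each primary key value has to be resolved through the clustered index.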

Now we turn the index on the birthday field into a covering index on two fields:

create index index_birthday_and_user_name on user_info(birthday, user_name);

The query is now executed as follows:

The non-clustered index index_birthday_and_user_name is used to find the leaf nodes whose birthday equals 1991-11-1. Besides the primary key ID value, however, these leaf nodes also contain the value of the user_name field, so there is no need to use the primary key ID to find the real data row; the user_name value can be returned straight from the leaf node. Looking the value up directly in a covering index skips the last two steps of the non-covering lookup and greatly improves query performance, as shown below.

[Figure: a covering index lookup returns user_name directly from the leaf node of the non-clustered index]
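One way to watch this happen is to ask a database for its query plan. The sketch below uses SQLite through Python's built-in sqlite3 module, which is an assumption made for convenience; the article does not say which database it has in mind, and other systems report plans differently. The table definition and sample rows are also invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table user_info (id integer primary key, user_name text, birthday text)"
)
conn.executemany(
    "insert into user_info (user_name, birthday) values (?, ?)",
    [("alice", "1991-11-1"), ("bob", "1990-05-20"), ("carol", "1991-11-1")],
)

query = "select user_name from user_info where birthday = '1991-11-1'"


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("explain query plan " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Single-field index: the index yields row ids, then the table is consulted.
conn.execute("create index index_birthday on user_info(birthday)")
plan_single = plan(query)

# Replace it with the two-field index: the index alone answers the query.
conn.execute("drop index index_birthday")
conn.execute(
    "create index index_birthday_and_user_name on user_info(birthday, user_name)"
)
plan_covering = plan(query)

print(plan_single)
print(plan_covering)
print(conn.execute(query).fetchall())
```

With the two-field index in place, SQLite reports a COVERING INDEX in its plan, meaning the query is answered from the index without visiting the table rows.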

 

 

 

This is roughly how a database index works. Some details may be slightly off, but that does not affect the concepts explained here.

This article is reproduced from: https://blog.csdn.net/qq_35673617/article/details/80802623


Origin www.cnblogs.com/guangxiang/p/11547039.html