Database optimization: what are indexes for?

What exactly does a database index do?

Question 1. Why do databases need indexes?

 

A library holds 10 million books. If you want to find "The Architect's Road" by checking the books one at a time, when would you ever finish?

So the librarian designed a set of rules:

(1) history books go on the first floor, literature on the second floor, IT on the third floor ...

(2) within IT, books are divided into software, hardware ...

(3) within software, books are sorted alphabetically by title ...

This makes it possible to find any book quickly.

 

By analogy, a database stores 10 million records and you need the one with name = "shenjian". If you check the records one by one, when would you ever finish?

So an index is needed: the database uses it to speed up lookups.
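A minimal Python sketch of the difference, assuming a hypothetical in-memory table of 100,000 rows (fewer than the 10 million in the example, to keep it quick) and a plain dict standing in for an index on name. It only shows how many rows a full scan has to check; it is not how a real database stores an index.

```python
def build_rows(n):
    """Hypothetical table: n rows, with the row named "shenjian" stored last."""
    rows = [{"id": i, "name": f"user{i}"} for i in range(n - 1)]
    rows.append({"id": n - 1, "name": "shenjian"})
    return rows

def full_scan(rows, name):
    """Check rows one by one; returns (row, rows_checked)."""
    checked = 0
    for row in rows:
        checked += 1
        if row["name"] == name:
            return row, checked
    return None, checked

def build_name_index(rows):
    """A dict from name to row position, standing in for an index on name."""
    return {row["name"]: pos for pos, row in enumerate(rows)}

rows = build_rows(100_000)
row, checked = full_scan(rows, "shenjian")
name_index = build_name_index(rows)
indexed_row = rows[name_index["shenjian"]]
print(checked)             # 100000: the scan looked at every row
print(indexed_row is row)  # True: the index went straight to the same row
```

The index costs extra space and must be built up front, but after that the lookup no longer depends on where the row sits in the table.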

 

Question 2. A hash is faster than a tree, so why are indexes designed as trees?

 

Data structures that speed up lookups fall into two common categories:

(1) hash, e.g. HashMap: the average time complexity of query/insert/update/delete is O(1);

(2) tree, e.g. a balanced binary search tree: the average time complexity of query/insert/update/delete is O(lg(n));
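To make the two categories concrete, here is a sketch assuming a hypothetical set of 1,000,000 integer keys. A Python dict plays the hash index; a sorted list searched by binary search stands in for the balanced binary search tree (it has the same O(log(n)) lookup cost and the same ordering, though not the tree's O(log(n)) insert cost).

```python
# Hypothetical index keys: 1,000,000 even integers, already sorted.
keys = list(range(0, 2_000_000, 2))

# Hash-style index: expected O(1) lookup, but the keys carry no order.
hash_index = {k: f"row-{k}" for k in keys}

def tree_lookup(sorted_keys, key):
    """Binary search, standing in for a balanced BST lookup.

    Returns (found, steps); each step at least halves the search interval,
    so steps is at most floor(log2(n)) + 1.
    """
    lo, hi = 0, len(sorted_keys)
    steps = 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == key
    return found, steps

found_tree, steps = tree_lookup(keys, 123_456)
print(123_456 in hash_index, found_tree, steps)  # steps is about 20 for 10**6 keys
```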

 

As you can see, for both read and write requests a hash index should be faster than a tree index. So why are indexes designed as trees?

Voice-over: in interviews, 80% of candidates cannot fully answer this.

 

Indexes are designed as trees because of what SQL needs.

 

For a single-row query like this:

select * from t where name='shenjian';

a hash index is indeed faster, because each query fetches only one record.

Voice-over: so if the business only ever does single-row lookups, as in a passport service, a hash index really can be used.

 

But for SQL that depends on order:

  • Grouping: group by

  • Sorting: order by

  • Comparison: <, >

a hash index degrades to O(n), while a tree, because it keeps its keys ordered, still locates the starting point in O(log(n)) and then reads the matching keys in order.
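The difference shows up clearly on a range predicate. Here is a sketch assuming a hypothetical index on an integer column age with ten rows. The hash-style index (a dict) must inspect every key to answer 25 <= age <= 35, while the ordered index (a sorted list of (age, row id) pairs, standing in for a tree's ordered keys) uses bisect to jump to the first match and then reads only the matching entries.

```python
import bisect

# Hypothetical column values; the list position is the row id.
ages = [31, 18, 25, 40, 22, 35, 29, 18, 50, 27]

# Hash-style index: age -> list of row ids, no ordering between keys.
hash_index = {}
for row_id, age in enumerate(ages):
    hash_index.setdefault(age, []).append(row_id)

def hash_range(index, low, high):
    """A range predicate on a hash index has to look at every key: O(n)."""
    examined, result = 0, []
    for key, row_ids in index.items():
        examined += 1
        if low <= key <= high:
            result.extend(row_ids)
    return result, examined

# Ordered index: (age, row id) pairs kept sorted, like a tree's in-order keys.
ordered_index = sorted((age, row_id) for row_id, age in enumerate(ages))

def ordered_range(index, low, high):
    """Binary-search the first match (O(log n)), then read matches in order."""
    start = bisect.bisect_left(index, (low, float("-inf")))
    end = bisect.bisect_right(index, (high, float("inf")))
    return [row_id for _, row_id in index[start:end]], end - start

print(hash_range(hash_index, 25, 35))        # ([0, 2, 5, 6, 9], 9)
print(ordered_range(ordered_index, 25, 35))  # ([2, 9, 6, 0, 5], 5)
```

The ordered index also returns the rows already sorted by age, which is exactly what order by and group by can take advantage of.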

 

Design that ignores the requirements is just nonsense.

 

One more thing: InnoDB does not support user-created hash indexes (its only hash structure is the adaptive hash index it maintains internally).

 

Question 3. Why do database indexes use B+ trees?

To keep the picture complete, here is a brief introduction to several kinds of trees.

 

First: the binary search tree

The binary search tree, shown in the figure above, is the best-known of these data structures, so I won't describe it here. Why is it unsuitable as a database index?

(1) when there is a lot of data, the tree gets tall, so queries over large data sets become slow;

(2) each node stores only one record, so a single query may need many disk IOs;

Voice-over: this tree shows up in every college textbook, which is why it is the best known.
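Both problems come down to height. A sketch, assuming each tree node sits in its own disk block so that every level visited costs roughly one disk IO (a simplification): inserting keys in sorted order into a plain, unbalanced binary search tree turns it into a chain, and even a perfectly balanced binary tree over 10 million records needs 24 levels.

```python
import math

class Node:
    """Plain (unbalanced) BST node holding one record key."""
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Ordinary BST insert with no rebalancing; returns the root."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right

def height(root):
    """Count levels breadth-first (no recursion, so a 1000-deep chain is fine)."""
    levels, frontier = 0, ([root] if root is not None else [])
    while frontier:
        levels += 1
        frontier = [c for n in frontier for c in (n.left, n.right) if c is not None]
    return levels

def balanced_bst_height(n):
    """Smallest possible height of a binary tree with n nodes."""
    return math.ceil(math.log2(n + 1))

root = None
for key in range(1000):          # sorted input: worst case for a plain BST
    root = insert(root, key)

print(height(root))                      # 1000: the tree is a chain
print(balanced_bst_height(10_000_000))   # 24
```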

 

Second: the B-tree

The B-tree, shown in the figure above, has the following characteristics:

(1) it is no longer a binary search tree but an m-way search tree;

(2) both leaf and non-leaf nodes store data;

(3) an in-order traversal visits all nodes, in key order;

Voice-over: I'd rather not dwell on this property, but for completeness: the number of keys j in every non-root node satisfies ⌈m/2⌉ - 1 <= j <= m - 1, and any node split must preserve this condition.
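A minimal sketch of the m-way search described above, assuming a hypothetical hand-built B-tree with m = 3 (each node holds at most 2 keys). Inside a node the search binary-searches the node's sorted keys, then descends into one child; because a B-tree keeps data in non-leaf nodes too, a lookup can stop before reaching a leaf. key_count_ok checks the ⌈m/2⌉ - 1 <= j <= m - 1 rule from the voice-over (the root is allowed to hold as few as one key, as is usual for B-trees). Insertion and splitting are left out.

```python
import math
from bisect import bisect_left
from dataclasses import dataclass, field

@dataclass
class BTreeNode:
    keys: list                                    # sorted keys (each stands for a record)
    children: list = field(default_factory=list)  # empty list means leaf

def search(node, key):
    """m-way search; returns (found, nodes_visited)."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, key)           # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited                  # data lives in non-leaf nodes too
        if not node.children:
            return False, visited                 # reached a leaf without a match
        node = node.children[i]                   # descend into the i-th subtree
    return False, visited

def key_count_ok(node, m, is_root=False):
    """ceil(m/2) - 1 <= j <= m - 1 for non-root nodes; the root needs only 1 key."""
    j = len(node.keys)
    lower = 1 if is_root else math.ceil(m / 2) - 1
    return lower <= j <= m - 1

# Hypothetical B-tree with m = 3.
root = BTreeNode([20, 40], [BTreeNode([5, 10]), BTreeNode([25, 30]), BTreeNode([45, 50])])
print(search(root, 20))   # (True, 1): found in the root, no need to reach a leaf
print(search(root, 30))   # (True, 2)
print(search(root, 33))   # (False, 2)
```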

 

The B-tree is used as the data structure for indexes because it makes excellent use of the principle of locality.

 

What is the principle of locality?

The reasoning goes like this:

(1) memory reads and writes are fast, while disk reads and writes are much slower;

 

(2) disk read-ahead: the disk is not read strictly on demand but page by page. Each read loads a whole page, i.e. more data than was asked for; if data needed later is on the same page, a future disk IO is avoided, which improves efficiency;

Voice-over: a page is typically 4KB.

(3) principle of locality: software should be designed so that "the data it reads sits close together" and "when one piece of data is used, the data next to it is very likely to be used soon", so that disk read-ahead can fully improve disk IO efficiency;
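Read-ahead and locality can be illustrated with a small simulation. It assumes hypothetical 8-byte records packed into 4KB pages (512 per page), that every page miss costs one disk IO which loads the whole page, and that loaded pages stay cached. Reading 10,000 neighbouring records touches only 20 pages, while the same number of records spread one per page costs 10,000 IOs.

```python
PAGE_SIZE = 4096                             # bytes per page, the 4K from the text
RECORD_SIZE = 8                              # hypothetical fixed-size record
RECORDS_PER_PAGE = PAGE_SIZE // RECORD_SIZE  # 512

def count_page_reads(record_ids):
    """Disk IOs needed when each miss reads a whole page and pages stay cached."""
    cached_pages = set()
    reads = 0
    for rid in record_ids:
        page = rid // RECORDS_PER_PAGE
        if page not in cached_pages:
            cached_pages.add(page)   # read-ahead: the neighbours come along for free
            reads += 1
    return reads

sequential = range(10_000)                                 # neighbouring records
scattered = [i * RECORDS_PER_PAGE for i in range(10_000)]  # one record per page
print(count_page_reads(sequential))  # 20
print(count_page_reads(scattered))   # 10000
```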

 

Why is the B-tree suitable for indexes?

(1) because it is m-way, its height is greatly reduced;

(2) each node can store j records; if the node size is set to the page size, e.g. 4KB, read-ahead is fully exploited and disk IO is greatly reduced;
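The effect of point (1) can be estimated with a short sketch. It assumes every node is completely full, so a node with k keys has k + 1 children, and counts how many levels are needed to hold n keys: with one key per node (a binary tree) 10 million keys need 24 levels, with 500 keys per node they fit in 3. If each level costs about one disk IO, that is roughly 24 IOs versus 3.

```python
def min_height(n_keys, keys_per_node):
    """Levels needed to hold n_keys if every node is full.

    A node with keys_per_node keys has keys_per_node + 1 children,
    so each new level multiplies the node count by keys_per_node + 1.
    """
    height, capacity, nodes_on_level = 0, 0, 1
    while capacity < n_keys:
        height += 1
        capacity += nodes_on_level * keys_per_node
        nodes_on_level *= keys_per_node + 1
    return height

print(min_height(10_000_000, 1))    # 24: binary tree
print(min_height(10_000_000, 500))  # 3: 500 keys per 4K node
```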

 

Third: the B+ tree

The B+ tree, shown in the figure above, is still an m-way search tree, with some improvements on top of the B-tree:

(1) non-leaf nodes no longer store data; data is stored only in the leaf nodes, which are all on the same level;

Voice-over: in a B+ tree every lookup goes from the root all the way down to a leaf, so every record is reached by a path of the same length; in a B-tree a record can be found at a non-leaf node, so path lengths vary.

(2) the leaf nodes are linked into a list, so retrieving all nodes in order no longer requires an in-order traversal;
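The leaf list from point (2) is what makes range queries cheap. A minimal sketch, assuming hypothetical leaves that hold keys, their rows and a next pointer; find_leaf uses the first key of each leaf in place of a real descent through the non-leaf nodes. Once the starting leaf is found, range_scan only follows next pointers and never climbs back up the tree.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Optional

@dataclass
class Leaf:
    keys: list                       # sorted keys in this leaf
    rows: list                       # rows[i] belongs to keys[i]
    next: Optional["Leaf"] = None    # link to the leaf on the right

def link_leaves(pages):
    """Create leaves from (keys, rows) pairs and chain them left to right."""
    leaves = [Leaf(keys, rows) for keys, rows in pages]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves

def find_leaf(leaves, key):
    """Stand-in for the descent through non-leaf nodes: the leaf covering key."""
    first_keys = [leaf.keys[0] for leaf in leaves]
    return leaves[max(bisect_right(first_keys, key) - 1, 0)]

def range_scan(leaves, low, high):
    """Return rows with low <= key <= high by walking the leaf list forward."""
    leaf = find_leaf(leaves, low)
    i = bisect_left(leaf.keys, low)
    result = []
    while leaf is not None:
        while i < len(leaf.keys):
            if leaf.keys[i] > high:
                return result
            result.append(leaf.rows[i])
            i += 1
        leaf, i = leaf.next, 0
    return result

pages = [([1, 3, 5], ["row1", "row3", "row5"]),
         ([7, 9, 11], ["row7", "row9", "row11"]),
         ([13, 15, 17], ["row13", "row15", "row17"])]
leaves = link_leaves(pages)
print(range_scan(leaves, 4, 13))  # ['row5', 'row7', 'row9', 'row11', 'row13']
```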

 

These improvements give the B+ tree better properties than the B-tree:

(1) for a range query, once min and max have been located, the leaf nodes between them are the result set; they are read in sequence, with no backtracking;

Voice-over: range queries are extremely common in SQL, and this is the B+ tree's biggest advantage over the B-tree.

(2) the leaf nodes store the actual rows, packed relatively tightly, which suits storing large amounts of data on disk; the non-leaf nodes store only primary keys (PK) to speed up lookups, which suits keeping them in memory;

(3) because non-leaf nodes store only KEYs rather than the actual records, the same amount of memory can hold more of a B+ tree index;
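Point (3) can be put in numbers with a rough sketch. The page and KEY sizes follow the text (4KB, 8 bytes); the 6-byte child pointer and the 200-byte row are hypothetical values chosen only for illustration. If every non-leaf node had to carry the full row next to each KEY, as in a B-tree, a page would fit 19 entries; with KEYs only, as in a B+ tree's non-leaf nodes, it fits 292.

```python
PAGE_SIZE = 4096    # bytes per node, as in the text
KEY_SIZE = 8        # bytes per KEY, as in the text
POINTER_SIZE = 6    # bytes per child pointer (hypothetical)
ROW_SIZE = 200      # bytes per full row (hypothetical)

def entries_per_page(payload_size):
    """How many (payload, child pointer) entries fit into one page."""
    return PAGE_SIZE // (payload_size + POINTER_SIZE)

btree_node = entries_per_page(KEY_SIZE + ROW_SIZE)  # KEY plus row, B-tree style
bplus_node = entries_per_page(KEY_SIZE)             # KEY only, B+ tree style
print(btree_node, bplus_node)  # 19 292
```

A higher fanout per page means fewer levels for the same number of rows, which is the same effect as the height estimate below.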

 

Finally, let's quantify it: why is an m-way B+ tree so much shorter than a binary search tree?

A rough calculation:

(1) following the principle of locality, set the size of a node to one page, 4KB; assuming a KEY takes 8 bytes, a node can store about 500 KEYs (4096 / 8 = 512, rounded to 500), i.e. j = 500;

(2) for an m-way tree, roughly m/2 <= j <= m, so the fanout m can be taken as about 1000;

(3) then:

Level 1 (root): 1 node, 1 * 500 KEYs, size 4KB

Level 2: 1000 nodes, 1000 * 500 = 500,000 KEYs, size 1000 * 4KB = 4MB

Level 3: 1000 * 1000 nodes, 1000 * 1000 * 500 = 500,000,000 KEYs, size 1000 * 1000 * 4KB = 4GB

Voice-over: feel free to check whether I got the arithmetic wrong.
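The arithmetic above can be reproduced in a few lines. The constants are the text's own assumptions (4K nodes, j = 500 KEYs per node, fanout m of about 1000), and, as in the list above, each level is counted on its own.

```python
NODE_BYTES = 4 * 1024   # one 4K page per node
KEYS_PER_NODE = 500     # j = 500 (about 4K / 8-byte KEYs)
FANOUT = 1000           # m of about 1000, the text's assumption

def level_stats(level):
    """(nodes, KEYs, bytes) on one level of the tree; the root is level 1."""
    nodes = FANOUT ** (level - 1)
    return nodes, nodes * KEYS_PER_NODE, nodes * NODE_BYTES

for level in (1, 2, 3):
    print(level, *level_stats(level))
# 1 1 500 4096
# 2 1000 500000 4096000
# 3 1000000 500000000 4096000000
```

A full node with 500 KEYs has at most 501 children, so a fanout of 1000 is the generous end of the estimate; with FANOUT = 501 the third level holds about 125 million KEYs, which still makes the same point.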

 

As you can see, storing a large amount of data (500 million KEYs) does not need a deep tree (height 3), and the index does not take up much memory (about 4GB).

 

To sum up

  • Database indexes are used to speed up queries

  • Although a hash index is O(1) and a tree index is O(log(n)), SQL has many "ordered" needs, so databases use tree indexes

  • InnoDB does not support user-created hash indexes

  • The idea of disk read-ahead: the disk is not read on demand but page by page, loading a whole page each time, i.e. more data than requested, in order to reduce future disk IO

  • The principle of locality: software should be designed so that "the data it reads sits close together" and "when one piece of data is used, the data next to it is very likely to be used soon", so that disk read-ahead can fully improve disk IO efficiency

  • The index structure databases use most is the B+ tree, because:

(1) it suits disk storage and makes full use of the principle of locality and disk read-ahead;

(2) the tree is very short yet can store large amounts of data;

(3) the index itself takes up little memory;

(4) it supports single-point queries, range queries and ordered queries well;

 

The more you understand, the more you realize you don't know; knowledge is built up step by step.


Origin blog.csdn.net/pangzhaowen/article/details/105122025