High Performance MySQL (study notes) - Indexes, part 1

Creating high-performance indexes

Index concept

An index (also called a "key" in MySQL) is a data structure that the storage engine uses to find records quickly. Indexes are among the most effective means of performance optimization.

In MySQL, the storage engine first finds the matching value in the index, and then uses the matching index record to find the corresponding data row.

For example: select first_name from sakila.actor where actor_id = 5;

If there is an index on actor_id, MySQL will use it to find the rows with actor_id = 5, then return all data rows that contain that value.
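The two-step lookup described above (index first, then the row) can be sketched in Python. This is only an illustration under assumptions: the sample rows, the dictionary standing in for the index, and the function name find_by_actor_id are all hypothetical, and a real storage engine is implemented very differently.

```python
# Hypothetical rows standing in for sakila.actor (values are made up).
rows = [
    {"actor_id": 3, "first_name": "ED"},
    {"actor_id": 5, "first_name": "JOHNNY"},
    {"actor_id": 7, "first_name": "GRACE"},
]

# The "index": maps each actor_id value to the position of its row.
actor_id_index = {row["actor_id"]: pos for pos, row in enumerate(rows)}


def find_by_actor_id(actor_id):
    """Step 1: look the value up in the index. Step 2: fetch the row it points to."""
    pos = actor_id_index.get(actor_id)
    if pos is None:
        return None  # no matching index record, so there is no row to fetch
    return rows[pos]
```

Calling find_by_actor_id(5) never touches the other rows, which is the point of having the index.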

Index types

MySQL index types include B-Tree indexes, hash indexes, spatial data indexes (R-Tree), and full-text indexes.

When no type is specified, an index is most likely a B-Tree index, which uses the B-Tree data structure to store its data. For the concept of the B-Tree, see https://baike.baidu.com/item/B-tree/6606402?fr=aladdin ; I will not expand on it here, because I don't understand it in detail.

Storage engines use B-Tree indexes in different ways. For example, MyISAM uses prefix compression to make the index smaller, while InnoDB stores values in their original format. Likewise, a MyISAM index refers to the indexed row by the physical location of the data, while InnoDB refers to the indexed row by its primary key.

B-Tree Index

A B-Tree index usually means that all values are stored in order and that every leaf page is the same distance from the root. A B-Tree index speeds up data access because the storage engine no longer needs to scan the whole table to find the required data; instead, it starts the search at the root node of the index. The slots in the root node hold pointers to child nodes, and the storage engine follows these pointers downward. It finds the right pointer by comparing the values in the node page with the value being looked up; the values in the node page define the upper and lower bounds of the values in the child nodes. Eventually the storage engine either finds the corresponding value or determines that the record does not exist.

Because B-Trees store the indexed columns in order, they are well suited to range lookups. For example, walking an index tree built on a text field passes through the values in alphabetical order, so finding all names that start with AK is very efficient.
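Why ordered storage makes a prefix lookup cheap can be sketched with a sorted list and binary search. This is a rough model, not a B-Tree: the sorted list only stands in for the ordered leaf level, and the sample names and the function names_with_prefix are made up for illustration.

```python
import bisect

# A sorted list stands in for the ordered leaf level of a B-Tree index.
# The names are made-up sample data.
names = sorted(["AKROYD", "ALLEN", "AKINS", "BAILEY", "AKERS", "ADAMS"])


def names_with_prefix(prefix):
    """Return all names starting with prefix using two binary searches."""
    lo = bisect.bisect_left(names, prefix)
    # For ordinary text, every string that starts with prefix sorts
    # below prefix followed by the largest code point.
    hi = bisect.bisect_left(names, prefix + "\U0010ffff")
    return names[lo:hi]
```

The matching names form one contiguous slice, so the lookup only needs to find the two ends of the range instead of checking every value.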

The index sorts its values according to the order of the columns given in the index definition in the CREATE TABLE statement. If you define an index on the three fields (first, last, time), entries are sorted by first, then by last for equal first values, then by time.
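The multi-column ordering can be pictured with Python tuples, which compare element by element in the same way. The rows below are hypothetical (first, last, time) values chosen only to show the ordering.

```python
# Hypothetical rows: (first, last, time) tuples with made-up values.
rows = [
    ("Bob", "Stone", 3),
    ("Alice", "Young", 2),
    ("Alice", "Adams", 9),
    ("Alice", "Adams", 1),
]

# Python compares tuples element by element, which mirrors how a
# multi-column index orders its entries: first, then last, then time.
index_entries = sorted(rows)
```

Note that the time values are only in order within a single (first, last) pair, which is why the later columns are of little use on their own.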

Query types that can use B-Tree indexes

B-Tree indexes work for full key value, key range, and key prefix lookups (the last only applies to lookups on the leftmost prefix of the index). They are useful for the following query types:

Full-value match (matches all columns in the index)

Match a leftmost prefix (for example, only the first column of the index is used)

Match a column prefix (match the beginning of a column's value, e.g. the first letter is J)

Match a range of values (for example, a query for the range A~J that uses only the first column of the index)

Exactly match one column and range match another column (for example, find data whose last name is Jack and first name starts with T)

Index-only queries (the query reads only the index and does not need to access the data rows)

B-Tree restrictions

The index cannot be used if the lookup does not start from the leftmost column of the index

Columns in the index cannot be skipped. If the index is defined on (first, second, last), a query with conditions on first and last can use the index only for first (because second is skipped)

If the query has a range condition on some column, the columns to its right cannot use the index. For example, with where first=1 and second like 'J%' and last=1, only first and second can use the index; last cannot

Conclusion: the column order of the index is very important! The query conditions must line up with that order, starting from the leftmost column.
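The restrictions above can be condensed into a small Python function. This is a simplified model of the rules in these notes, not of what MySQL's optimizer really does (real optimizers handle more cases); the function name usable_index_columns and the "eq"/"range" labels are made up for the sketch.

```python
def usable_index_columns(index_columns, conditions):
    """Return the index columns a B-Tree index can use for the given conditions.

    conditions maps a column name to "eq" (equality) or "range"
    (a range or prefix match such as like 'J%').
    """
    usable = []
    for column in index_columns:
        kind = conditions.get(column)
        if kind is None:
            break  # column missing from the query: nothing to its right can be used
        usable.append(column)
        if kind == "range":
            break  # a range condition is the last column the index can use
    return usable
```

For an index on (first, second, last), conditions on first and last alone give ["first"], and adding a range condition on second gives ["first", "second"], matching the two examples above.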

Hash index

A hash index is built on a hash table and is useful only for queries that exactly match all columns of the index. For each row, the storage engine calculates a hash code of all the indexed columns. The hash code is a small value, and rows with different key values usually get different hash codes. The hash index stores all the hash codes in the index and keeps a pointer to each data row in the hash table.

Note: Only the Memory engine explicitly supports hash indexes. Memory also supports B-Tree indexes, and it supports non-unique hash indexes: if several values have the same hash code, the index stores their row pointers in a linked list under the same hash entry.

Example: select * from table where fname='first';

Query process: MySQL calculates the hash value of 'first' (say it is 2323) and uses it to look up the pointer in the index. Suppose the index maps 2323 to a pointer to row 45. MySQL then checks that the value in that row is really 'first' to make sure it has found the right row.
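The query process can be sketched with a dictionary that maps hash codes to row pointers. Everything here is an assumption made for illustration: the sample table, the helper names build_hash_index and lookup, and the use of Python's built-in hash() in place of whatever hash function the Memory engine uses.

```python
from collections import defaultdict

# Hypothetical table; the list position plays the role of the row pointer.
table = [
    {"fname": "arjen", "lname": "Lentz"},
    {"fname": "first", "lname": "Smith"},
    {"fname": "peter", "lname": "Zaitsev"},
]


def build_hash_index(rows, column):
    """Map hash code -> list of row pointers (several values may share a code)."""
    index = defaultdict(list)
    for pointer, row in enumerate(rows):
        index[hash(row[column])].append(pointer)
    return index


def lookup(rows, index, column, value):
    """Hash the value, follow the pointers, then re-check the stored value."""
    matches = []
    for pointer in index.get(hash(value), []):
        if rows[pointer][column] == value:  # guards against hash collisions
            matches.append(rows[pointer])
    return matches


fname_index = build_hash_index(table, "fname")
```

The final comparison in lookup corresponds to the last step of the query process: the hash only nominates candidate rows, and the stored value confirms the match.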

Hash index limitations

Hash indexes contain only hash values and row pointers and do not store field values, so the index alone cannot answer a query; the row values are read from memory through the pointers

Hash index data is not stored in the order of the index values and cannot be used for sorting

Hash indexes do not support partial index column matches, because the hash value is always calculated from the entire contents of the indexed columns. For example, if a hash index is built on (A, B) and the query refers only to column A, the index cannot be used

Hash indexes support only equality comparisons and do not support any range queries

Accessing data through a hash index is very fast, unless there are many hash collisions (different index values with the same hash value)

Some index maintenance operations are also expensive when there are many hash collisions. For example, suppose a hash index is built on a column with very low selectivity (many hash collisions). When a row is deleted from the table, the storage engine has to walk through each row in the linked list for the corresponding hash value to find and remove the reference to the deleted row. The more collisions there are, the more expensive this is.
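The cost of deleting from a low-selectivity hash index can be sketched by counting how many chain entries are inspected. This is a toy model under assumptions: a Python list stands in for the linked list, hash() stands in for the engine's hash function, and the sample data and the names build_index and delete_pointer are made up.

```python
from collections import defaultdict


def build_index(values):
    """Hash index as: hash code -> list of row pointers (the collision chain)."""
    index = defaultdict(list)
    for pointer, value in enumerate(values):
        index[hash(value)].append(pointer)
    return index


def delete_pointer(index, value, pointer):
    """Remove one row's reference; return how many chain entries were inspected."""
    chain = index[hash(value)]
    for inspected, candidate in enumerate(chain, start=1):
        if candidate == pointer:
            chain.remove(candidate)
            return inspected
    return len(chain)


# Low selectivity: 1000 rows but only two distinct values (made-up data).
low = ["M" if i % 2 == 0 else "F" for i in range(1000)]
# High selectivity: every value is distinct.
high = [f"user{i}" for i in range(1000)]

steps_low = delete_pointer(build_index(low), low[998], 998)
steps_high = delete_pointer(build_index(high), high[998], 998)
```

In this model the low-selectivity delete walks about half of the table's references, while the high-selectivity delete normally inspects a single entry.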

Create a custom hash index

For example, the query select id from table1 where url="https://www.csdn.net"; can be optimized like this:

1. Drop the index on the url column, add an indexed url_crc column that holds the CRC32 hash of the URL, and query like this: select id from table1 where url="https://www.csdn.net" and url_crc=CRC32("https://www.csdn.net");

2. Create triggers so that the hash code in url_crc is generated whenever a URL is inserted or updated:

create trigger table1hash_crc_ins before insert on table1 for each row begin set new.url_crc=crc32(new.url); end;

create trigger table1hash_crc_update before update on table1 for each row begin set new.url_crc=crc32(new.url); end;
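What the two triggers keep in sync can be modeled in Python. This is a sketch under assumptions: the class Table1 and its methods are hypothetical stand-ins for the table and its triggers, and zlib.crc32 over the UTF-8 bytes is assumed to give the same unsigned value as MySQL's CRC32() for the same string.

```python
import zlib


def crc32(text):
    """Unsigned CRC-32 of the UTF-8 bytes (assumed comparable to MySQL's CRC32())."""
    return zlib.crc32(text.encode("utf-8"))


class Table1:
    """Toy stand-in for table1 with an indexed url_crc column."""

    def __init__(self):
        self.rows = {}       # id -> {"url": ..., "url_crc": ...}
        self.crc_index = {}  # url_crc -> set of ids

    def insert(self, row_id, url):
        # Plays the role of the "before insert" trigger.
        self.rows[row_id] = {"url": url, "url_crc": crc32(url)}
        self.crc_index.setdefault(crc32(url), set()).add(row_id)

    def update_url(self, row_id, new_url):
        # Plays the role of the "before update" trigger.
        old_crc = self.rows[row_id]["url_crc"]
        self.crc_index[old_crc].discard(row_id)
        self.insert(row_id, new_url)

    def find_ids(self, url):
        # Mirrors: where url_crc = CRC32(url) and url = url
        candidates = self.crc_index.get(crc32(url), set())
        return sorted(i for i in candidates if self.rows[i]["url"] == url)
```

The lookup goes through the small integer key first, which is the reason for the extra column: comparing and indexing a 32-bit number is much cheaper than doing the same with a long URL string.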

Handling hash collisions

To deal with hash collisions, the where clause must include the literal value as well as the hash when you query through the hash column, for example: where url="https://www.csdn.net" and url_crc=CRC32("https://www.csdn.net");
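Why the literal comparison is needed can be shown by forcing collisions. The sketch below deliberately uses a very weak hash (CRC-32 reduced to 4 buckets) so that collisions are guaranteed with five rows; that weak hash, the sample URLs, and the function names are all assumptions for illustration, not part of the original technique.

```python
import zlib


def tiny_hash(url):
    """Deliberately weak hash (CRC-32 reduced to 4 buckets) to force collisions."""
    return zlib.crc32(url.encode("utf-8")) % 4


# Made-up sample data: id -> url.
urls = {i: f"https://example.com/page{i}" for i in range(1, 6)}

# "Index" on the hash column: hash value -> ids.
hash_index = {}
for row_id, url in urls.items():
    hash_index.setdefault(tiny_hash(url), []).append(row_id)


def find_by_hash_only(url):
    """Like: where url_crc = CRC32(url). May return rows for other URLs."""
    return hash_index.get(tiny_hash(url), [])


def find_by_hash_and_value(url):
    """Like: where url_crc = CRC32(url) and url = url."""
    # The hash narrows the candidates; the literal comparison removes collisions.
    return [i for i in find_by_hash_only(url) if urls[i] == url]
```

With five rows and only four possible hash values, at least one hash-only lookup returns more than one id, while the lookup that also compares the literal value always returns exactly the matching row.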

Other indexes

Spatial data indexes (R-Tree) and full-text indexes. A full-text index is a special type of index: it looks for keywords in the text instead of comparing values directly against the index. Full-text search works more like a search engine than like a simple where condition, and it is used through the MATCH AGAINST operation.

The advantages of indexing

1. The index greatly reduces the amount of data that the server needs to scan

2. Indexes help the server avoid sorting and temporary tables

3. Indexes can turn random I/O into sequential I/O

Three-star system: an index earns one star if it places related records together, a second star if the order of the data in the index matches the sort order the query needs, and a third star if the index contains all the columns the query needs.



