[MySQL] Storage engines, indexes and optimization

One, the storage engine

The storage engines supported by MySQL 5.0 include InnoDB, MyISAM, BDB, MEMORY, MERGE, EXAMPLE, NDB Cluster, ARCHIVE, CSV, BLACKHOLE and FEDERATED. Of these, InnoDB and BDB provide transaction-safe tables; the other storage engines provide non-transaction-safe tables.

Before MySQL 5.5 the default storage engine was MyISAM; from 5.5 onward it is InnoDB.
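To check which engines a particular server offers and which one is the default, the following statements can be used. This is a minimal sketch that assumes MySQL 5.5 or later, where the variable is named default_storage_engine.

```sql
-- List the storage engines available on this server. The Support column
-- shows YES, NO or DEFAULT, and the Transactions column shows whether
-- the engine is transactional.
SHOW ENGINES;

-- Show the engine that CREATE TABLE uses when no ENGINE clause is given.
SHOW VARIABLES LIKE 'default_storage_engine';
```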

Two, storage engine characteristics

Feature | InnoDB | MyISAM | MEMORY | MERGE
Storage limit | 64TB | Yes | Yes | No
Transaction safety | Supported | - | - | -
Lock mechanism | Row lock (suitable for high concurrency) | Table lock | Table lock | Table lock
B-tree index | Supported | Supported | Supported | Supported
Hash index | - | - | Supported | -
Full-text index | Supported (since version 5.6) | Supported | - | -
Clustered index | Supported | - | - | -
Data cache | Supported | - | Supported | -
Index cache | Supported | Supported | Supported | Supported
Compressible data | - | Supported | - | -
Space usage | High | Low | N/A | Low
Memory usage | High | Low | Medium | Low
Bulk insert speed | Low | High | High | High
Foreign keys | Supported | - | - | -

The following focuses on the two most commonly used engines, InnoDB and MyISAM.

1. InnoDB:

InnoDB has been the default storage engine since MySQL 5.5. It provides transaction safety with commit, rollback and crash-recovery capabilities. Compared with MyISAM, however, its write performance is somewhat lower, and it uses more disk space to hold data and indexes.

What distinguishes InnoDB from the other storage engines is transaction control and foreign key constraints.

InnoDB stores tables in one of two ways:

Ⅰ. Shared tablespace storage. The table structure is saved in a .frm file, and the data and indexes are saved in the tablespace defined by innodb_data_home_dir and innodb_data_file_path, which may consist of multiple files.

Ⅱ. Multi-tablespace storage. The table structure is still saved in a .frm file, but the data and indexes of each table are saved separately in that table's own .ibd file.
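The two InnoDB features named above, foreign keys and transaction control, can be sketched as follows. The tables demo_country and demo_city are hypothetical names used only for illustration.

```sql
-- Hypothetical parent table.
CREATE TABLE demo_country (
    country_id   INT NOT NULL AUTO_INCREMENT,
    country_name VARCHAR(100) NOT NULL,
    PRIMARY KEY (country_id)
) ENGINE=InnoDB;

-- Hypothetical child table with a foreign key to the parent.
CREATE TABLE demo_city (
    city_id    INT NOT NULL AUTO_INCREMENT,
    city_name  VARCHAR(50) NOT NULL,
    country_id INT NOT NULL,
    PRIMARY KEY (city_id),
    CONSTRAINT fk_city_country FOREIGN KEY (country_id)
        REFERENCES demo_country (country_id)
        ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB;

-- Transaction control: the insert is undone by ROLLBACK,
-- so the count that follows is 0.
START TRANSACTION;
INSERT INTO demo_country (country_name) VALUES ('China');
ROLLBACK;
SELECT COUNT(*) FROM demo_country;

-- ON means each table gets its own .ibd file (multi-tablespace storage);
-- OFF means tables go into the shared tablespace.
SHOW VARIABLES LIKE 'innodb_file_per_table';
```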

2. MyISAM:

MyISAM supports neither transactions nor foreign keys. Its advantage is fast access. Applications that have no transaction-integrity requirements, or that mostly run SELECT and INSERT statements, can generally use this engine for their tables.

MyISAM storage method:

Each MyISAM table is stored on disk as three files whose base name is the table name, with the following extensions:

.frm (stores the table definition);

.MYD (MYData, stores the data);

.MYI (MYIndex, stores the indexes);

Three, choosing a storage engine

Choose the storage engine according to the characteristics of the application. A complex application can also combine several storage engines as the situation requires. The typical use cases of the commonly used engines are as follows.

  • InnoDB: The default storage engine since MySQL 5.5. It is intended for transaction-processing applications and supports foreign keys. If the application has high requirements for transaction integrity, needs data consistency under concurrency, and performs many updates and deletes in addition to inserts and queries, InnoDB is the better choice. Besides reducing the locking caused by deletes and updates, InnoDB guarantees that transactions are fully committed or rolled back. For systems that demand high data accuracy, such as billing or financial systems, InnoDB is the most suitable choice.

  • MyISAM: If the application mainly reads and inserts, with only a few updates and deletes, and its requirements for transaction integrity and concurrency are not high, this engine is a good fit.
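When the requirements of an existing table change, its engine can be inspected and converted. The sketch below uses a hypothetical table named demo_log.

```sql
-- Hypothetical read-mostly table created with MyISAM.
CREATE TABLE demo_log (
    id      INT NOT NULL AUTO_INCREMENT,
    message VARCHAR(255),
    PRIMARY KEY (id)
) ENGINE=MyISAM;

-- Inspect which engine each table in the current schema uses.
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = DATABASE();

-- Convert the table to InnoDB. This rebuilds the table,
-- so it can take a long time on large tables.
ALTER TABLE demo_log ENGINE=InnoDB;
```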

Four, indexes

1. Index structure

Index structure | Description
B+Tree index | The most common index type; most storage engines support B+ tree indexes.
Hash index | Supported only by the MEMORY engine and implemented with a hash table. It is used only for queries that match all columns of the index exactly; range queries are not supported.
R-tree (spatial index) | A special index type of the MyISAM engine, mainly used for geospatial data types and rarely needed.
Full-text (full-text index) | Finds keywords in text instead of comparing values in the index, similar to Lucene, Solr and ES.
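The hash index restriction can be observed with a MEMORY table. The following is a minimal sketch; demo_session and its index are hypothetical names.

```sql
-- Hypothetical MEMORY table with an explicit hash index.
CREATE TABLE demo_session (
    session_id CHAR(32) NOT NULL,
    user_id    INT NOT NULL,
    PRIMARY KEY (session_id),
    KEY idx_session_user (user_id) USING HASH
) ENGINE=MEMORY;

-- Equality lookup: the hash index is a candidate.
EXPLAIN SELECT * FROM demo_session WHERE user_id = 42;

-- Range lookup: a hash index cannot serve this condition.
EXPLAIN SELECT * FROM demo_session WHERE user_id > 42;
```

Comparing the key column of the two EXPLAIN results shows whether the index was chosen.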

2. Index classification

(by attribute)

Classification | Meaning | Features | Keyword
Primary key index | Index created on the primary key of the table | Created automatically; only one per table | PRIMARY
Unique index | Prevents duplicate values in a column of the table | A table can have several | UNIQUE
Regular index | Quickly locates specific data | A table can have several | (none)
Full-text index | Finds keywords in text instead of comparing values in the index | A table can have several | FULLTEXT


(by data storage method)

Classification | Meaning | Features
Clustered index | Stores the data together with the index; the leaf nodes of the index structure hold the row data | A table must have one, and only one
Non-clustered index / secondary index | Stores the data separately from the index; the leaf nodes of the index structure hold the corresponding primary key value | A table can have several
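The index classes in the tables above map onto table definitions roughly as follows. This is a sketch with a hypothetical table, demo_article.

```sql
CREATE TABLE demo_article (
    id     INT NOT NULL AUTO_INCREMENT,
    slug   VARCHAR(100) NOT NULL,
    author VARCHAR(50)  NOT NULL,
    body   TEXT,
    PRIMARY KEY (id),                    -- primary key index (the clustered index in InnoDB)
    UNIQUE KEY uk_article_slug (slug),   -- unique index
    KEY idx_article_author (author),     -- regular (secondary) index
    FULLTEXT KEY ft_article_body (body)  -- full-text index (InnoDB requires 5.6 or later)
) ENGINE=InnoDB;

-- List the indexes that were created, one row per indexed column.
SHOW INDEX FROM demo_article;
```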

3. Avoid index failure

Ⅰ. Full match: specify a concrete value for every column in the index.

Ⅱ. For a multi-column index, follow the leftmost prefix rule: the query must start from the leftmost column of the index and must not skip columns.

Ⅲ. Columns to the right of a range condition cannot use the index.

Ⅳ. Do not apply operations or functions to an indexed column; doing so prevents the index from being used.

Ⅴ. Comparing a string column with a value that is not in single quotes forces a type conversion and prevents the index from being used.

Ⅵ. Prefer covering indexes and avoid select *. If the query selects columns that are not in the index, MySQL has to go back to the table and performance drops.

Ⅶ. For conditions joined by or, if the column before or has an index but the column after it does not, none of the indexes involved are used.

Ⅷ. A like pattern that starts with % cannot use the index; a pattern with only a trailing % still can.

Ⅸ. If MySQL estimates that using the index would be slower than a full table scan, it does not use the index.

Ⅹ. is null and is not null sometimes prevent index use, as do in and not in, depending on the data distribution.
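Several of the rules above can be checked with EXPLAIN. The sketch below assumes a hypothetical table demo_seller with a composite index on (name, status, address). On an empty or very small table the optimizer may choose differently, so treat the comments as what to look for, not as guaranteed results.

```sql
CREATE TABLE demo_seller (
    sellerid VARCHAR(100) NOT NULL,
    name     VARCHAR(100),
    status   VARCHAR(1),
    address  VARCHAR(100),
    PRIMARY KEY (sellerid),
    KEY idx_seller_name_sta_addr (name, status, address)
) ENGINE=InnoDB;

-- Rule I: full match on all three index columns.
EXPLAIN SELECT * FROM demo_seller
WHERE name = 'Xiaomi' AND status = '1' AND address = 'Beijing';

-- Rule II: the leftmost column (name) is missing.
EXPLAIN SELECT * FROM demo_seller
WHERE status = '1' AND address = 'Beijing';

-- Rule III: range on status; address to its right cannot use the index.
EXPLAIN SELECT * FROM demo_seller
WHERE name = 'Xiaomi' AND status > '1' AND address = 'Beijing';

-- Rule IV: a function applied to the indexed column.
EXPLAIN SELECT * FROM demo_seller WHERE SUBSTRING(name, 3, 2) = 'mi';

-- Rule V: status is a string column compared with an unquoted number.
EXPLAIN SELECT * FROM demo_seller WHERE name = 'Xiaomi' AND status = 1;

-- Rule VIII: leading wildcard versus trailing wildcard.
EXPLAIN SELECT * FROM demo_seller WHERE name LIKE '%mi';
EXPLAIN SELECT * FROM demo_seller WHERE name LIKE 'Xiao%';
```

In each result, the key column shows which index was chosen and key_len shows how many bytes of it, and therefore how many of its columns, were used.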

4. Index Design Principles

  • Create indexes on tables that are queried frequently and hold a large volume of data.

  • Choose index columns from the conditions of the where clause; the best candidates are the columns that appear there.

  • Use unique indexes where possible: the higher the selectivity, the more efficient the index.

  • Indexes improve query efficiency, but more indexes are not always better. The more indexes there are, the higher the cost of maintaining them on every insert, update and delete.

  • Use short indexes. An index is stored on disk once created, so a shorter key lets each index page hold more entries, which improves the I/O efficiency of index access and the overall access efficiency.

  • Make use of the leftmost prefix. A composite index on N columns is equivalent to creating N indexes: if the where clause uses the first few columns of the index, the query can use the composite index to improve efficiency.

Prefer composite indexes and use single-column indexes sparingly.
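The principles on selectivity, short indexes and composite indexes can be sketched as follows. The table demo_customer and the prefix length of 10 are illustrative assumptions; a suitable prefix length depends on the actual data.

```sql
CREATE TABLE demo_customer (
    id    INT NOT NULL AUTO_INCREMENT,
    name  VARCHAR(100),
    email VARCHAR(255),
    city  VARCHAR(50),
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Selectivity = distinct values / total rows. The closer to 1, the more
-- discriminating the column. Comparing the full column with a prefix
-- shows how much selectivity a short index would give up.
SELECT COUNT(DISTINCT email) / COUNT(*)           AS full_selectivity,
       COUNT(DISTINCT LEFT(email, 10)) / COUNT(*) AS prefix10_selectivity
FROM demo_customer;

-- Short (prefix) index: only the first 10 characters are indexed.
CREATE INDEX idx_customer_email ON demo_customer (email(10));

-- Composite index: serves conditions on (city) and on (city, name).
CREATE INDEX idx_customer_city_name ON demo_customer (city, name);
```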

Five, SQL optimization

First, look at the order in which a SQL statement is written and the order in which it is executed.

Writing order
SELECT DISTINCT
	<select list>
FROM
	<left_table> <join_type>
JOIN
	<right_table> ON <join_condition>
WHERE
	<where_condition>
GROUP BY
	<group_by_list>
HAVING
	<having_condition>
ORDER BY
	<order_by_condition>
LIMIT
	<limit_params>

-----------------------------------------------------------------------------------------

Execution order
FROM	<left_table>

ON 		<join_condition>

<join_type>		JOIN	<right_table>

WHERE		<where_condition>

GROUP BY 	<group_by_list>

HAVING		<having_condition>

SELECT DISTINCT		<select list>

ORDER BY	<order_by_condition>

LIMIT		<limit_params>
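One visible consequence of the execution order is where a column alias can be referenced. The sketch below uses a hypothetical table, demo_order.

```sql
CREATE TABLE demo_order (
    id     INT NOT NULL AUTO_INCREMENT,
    amount DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- WHERE runs before SELECT, so the alias does not exist yet. This statement
-- fails with: Unknown column 'doubled' in 'where clause'.
-- SELECT amount * 2 AS doubled FROM demo_order WHERE doubled > 100;

-- ORDER BY runs after SELECT, so the alias can be used there.
SELECT amount * 2 AS doubled
FROM demo_order
WHERE amount * 2 > 100
ORDER BY doubled
LIMIT 10;
```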

1. Optimize the insert statement

If you need to insert many rows into a table at once, prefer a single insert statement with multiple value lists. This greatly reduces the overhead of opening and closing connections between the client and the database. Also insert the rows in primary key order where possible.
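The difference can be sketched with a hypothetical table, demo_item. Wrapping a batch in an explicit transaction is a commonly used complement and is an addition here, not something the text above prescribes.

```sql
CREATE TABLE demo_item (
    id   INT NOT NULL,
    name VARCHAR(50) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Slower: one statement, and one round trip, per row.
INSERT INTO demo_item VALUES (1, 'Tom');
INSERT INTO demo_item VALUES (2, 'Cat');
INSERT INTO demo_item VALUES (3, 'Jerry');

-- Faster: one statement with several value lists, in primary key order.
INSERT INTO demo_item VALUES (4, 'Tim'), (5, 'Rose'), (6, 'Jack');

-- Assumption for illustration: grouping a batch into one transaction
-- avoids committing after every statement.
START TRANSACTION;
INSERT INTO demo_item VALUES (7, 'Ann'), (8, 'Bob');
INSERT INTO demo_item VALUES (9, 'Eve');
COMMIT;
```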

2. Optimize the order by statement

First, the two sorting methods:

Ⅰ: Sorting the returned rows, that is, filesort. Any sort that cannot return ordered results directly through an index is a filesort.

Ⅱ: Scanning an ordered index sequentially, which returns data that is already ordered. EXPLAIN reports this as Using index. No extra sort is needed, so it is efficient.

With the two methods understood, the optimization goal is clear: minimize extra sorting and return ordered data directly through an index. To achieve this, the where condition and order by should use the same index, the column order in order by should match the index order, and the order by columns should be all ascending or all descending. Otherwise an extra sort is required and filesort appears.

For filesort, MySQL has two sorting algorithms.

Two-pass algorithm: first fetch the sort columns and row pointers for the rows that match the conditions, then sort them in the sort buffer. If the sort buffer is too small, the intermediate results are stored in a temporary table. After sorting, the full rows are read back from the table by row pointer, which can cause a large amount of random I/O.

Single-pass algorithm: fetch all required columns of the matching rows at once, sort them in the sort buffer, and output the result set directly. This uses more memory for sorting but is more efficient than the two-pass algorithm.

MySQL chooses between the algorithms by comparing the system variable max_length_for_sort_data with the total size of the columns the query retrieves. If max_length_for_sort_data is larger, it uses the second (single-pass) algorithm; otherwise it uses the first.

You can appropriately increase the sort_buffer_size and max_length_for_sort_data system variables to increase the size of the sorting area and improve the efficiency of sorting.
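The two sorting methods can be told apart in the Extra column of EXPLAIN. The sketch below assumes a hypothetical table demo_emp with an index on (age, salary). The comments describe what is typically reported; the optimizer may decide otherwise for a given data set or server version.

```sql
CREATE TABLE demo_emp (
    id     INT NOT NULL AUTO_INCREMENT,
    name   VARCHAR(100) NOT NULL,
    age    INT NOT NULL,
    salary INT NOT NULL,
    PRIMARY KEY (id),
    KEY idx_emp_age_salary (age, salary)
) ENGINE=InnoDB;

-- Selects columns outside the index: typically "Using filesort".
EXPLAIN SELECT * FROM demo_emp ORDER BY age DESC;

-- Covered by the index and sorted in index order: typically "Using index".
EXPLAIN SELECT id, age, salary FROM demo_emp ORDER BY age, salary;

-- Mixed ascending and descending order: typically adds "Using filesort".
EXPLAIN SELECT id, age, salary FROM demo_emp ORDER BY age ASC, salary DESC;

-- Inspect the current values, then raise the sort buffer for this session only.
-- The value 1048576 (1 MB) is illustrative, not a recommendation.
SHOW VARIABLES LIKE 'sort_buffer_size';
SHOW VARIABLES LIKE 'max_length_for_sort_data';
SET SESSION sort_buffer_size = 1048576;
```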

3. Optimize the group by statement

group by also performs a sort. Compared with order by, it adds a grouping step after the sort, plus the evaluation of any aggregate functions used. group by can therefore also take advantage of an index.

If a query contains group by but you want to avoid the cost of sorting the result, add order by null to disable the sort.
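The effect of order by null and of an index can be compared with EXPLAIN. The sketch below uses a hypothetical table, demo_staff. It assumes a server version in which group by sorts implicitly (MySQL 5.7 and earlier); MySQL 8.0 no longer sorts group by results implicitly, so the difference may not appear there.

```sql
CREATE TABLE demo_staff (
    id     INT NOT NULL AUTO_INCREMENT,
    age    INT NOT NULL,
    salary INT NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Without an index: look for "Using temporary; Using filesort" in Extra.
EXPLAIN SELECT age, COUNT(*) FROM demo_staff GROUP BY age;

-- ORDER BY NULL asks MySQL to skip sorting the grouped result.
EXPLAIN SELECT age, COUNT(*) FROM demo_staff GROUP BY age ORDER BY NULL;

-- With an index on the grouping column, the grouping can read rows
-- in index order.
CREATE INDEX idx_staff_age ON demo_staff (age);
EXPLAIN SELECT age, COUNT(*) FROM demo_staff GROUP BY age;
```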

4. Optimize nested queries

MySQL has supported SQL subqueries since version 4.1. The technique uses a select statement to produce a single-column result, which is then used as a filter condition in another query. Subqueries can complete in one statement many operations that logically take several steps, can avoid transaction or table locks, and are easy to write. In some cases, however, a subquery can be replaced by a more efficient join (JOIN).
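A subquery and an equivalent join can be compared as follows. The tables demo_user and demo_user_role are hypothetical. Whether the join is actually faster depends on the server version and the data, so compare the two plans with EXPLAIN.

```sql
CREATE TABLE demo_user (
    id   INT NOT NULL,
    name VARCHAR(50) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE demo_user_role (
    user_id INT NOT NULL,
    role_id INT NOT NULL,
    KEY idx_user_role_user (user_id)
) ENGINE=InnoDB;

-- Subquery form: users that have at least one role.
EXPLAIN SELECT * FROM demo_user
WHERE id IN (SELECT user_id FROM demo_user_role);

-- Join form. DISTINCT keeps one row per user even when a user
-- has several roles, so the result matches the subquery form.
EXPLAIN SELECT DISTINCT u.*
FROM demo_user u
JOIN demo_user_role ur ON u.id = ur.user_id;
```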

5. Optimize or condition

For a query containing OR to use indexes, every column in the OR conditions must have its own index; a composite index cannot serve this purpose. If a column has no index, consider adding one. Replacing or with union is recommended.
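The rewrite from or to union can be sketched with a hypothetical table, demo_member, in which both condition columns are indexed.

```sql
CREATE TABLE demo_member (
    id   INT NOT NULL,
    age  INT NOT NULL,
    name VARCHAR(50) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_member_age (age)
) ENGINE=InnoDB;

-- OR form: both columns are indexed, so MySQL can combine the indexes.
EXPLAIN SELECT * FROM demo_member WHERE id = 1 OR age = 30;

-- UNION form: each branch uses its own index, and UNION removes
-- rows that match both branches.
EXPLAIN SELECT * FROM demo_member WHERE id = 1
UNION
SELECT * FROM demo_member WHERE age = 30;
```

Comparing the type column of the two plans shows how each branch accesses the table.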

6. Using SQL Hints

SQL hints are an important means of optimizing database queries. Put simply, they are manual hints added to a SQL statement to steer how it is executed.

1. After the table name in the query, add use index with a list of the indexes you want MySQL to consider; MySQL then stops considering the other available indexes.

select * from tb_user use index(idx_seller_name) where name = 'zhangsan';

2. If you only want MySQL to ignore one or more indexes, use ignore index as the hint.

select * from tb_user ignore index(idx_seller_name) where name = 'zhangsan';

3. To force MySQL to use a specific index, use force index as a hint in the query.

select * from tb_user force index(idx_seller_name) where name = 'zhangsan';


Origin blog.csdn.net/qq_42990433/article/details/121767859