Database specification and SQL tuning

This database design specification chapter is adapted from, and continues to be revised against, the "Alibaba Java Development Manual".
MySQL protocol
(1) Table creation protocol
(2) Index Protocol
(3) SQL protocol
(4) ORM protocol
(1) Table creation protocol
1. [Mandatory] Fields expressing a yes/no concept must be named is_xxx, with data type unsigned tinyint (1 means yes, 0 means no). This rule also applies to ODPS table creation.
Note: Any field that is non-negative must be unsigned.
2. [Mandatory] Table and field names must use only lowercase letters or digits; names must not start with a digit, and a digit must not appear directly between two underscores. Renaming a database field is very expensive because the change cannot be pre-released, so field names require careful consideration.
Positive examples: getter_admin, task_config, level3_name
Counterexamples: GetterAdmin, taskConfig, level_3_name
3. [Mandatory] Do not use plural nouns in table names.
Note: A table name should describe the entity stored in the table, not the number of entities. The corresponding DO class name is also singular, which matches conventional usage.
4. [Mandatory] Do not use reserved words such as desc, range, match, delayed, etc.; refer to the official MySQL reserved-word list.
5. [Mandatory] A unique index is named uk_fieldname; an ordinary index is named idx_fieldname.
Note: uk stands for unique key; idx is the abbreviation of index.
6. [Mandatory] Use decimal for decimal values; float and double are prohibited.
Note: float and double lose precision when stored, which is likely to produce incorrect results when comparing values. If the stored data exceeds the range of decimal, it is recommended to split it into integer and fractional parts and store them separately.
7. [Mandatory] If the stored strings are almost equal in length, use the char fixed-length string type.
8. [Mandatory] varchar is a variable-length string that does not pre-allocate storage; its length should not exceed 5000. If the stored length exceeds this value, define the field as text, put it in a separate table, and link it to the main table by primary key, to avoid hurting the index efficiency of other fields.
9. [Mandatory] Required fields for every table: id, creation_time, creator, modified_time, modifier, valid.
Note: id must be the primary key, of type unsigned bigint, auto-incremented per table with a step of 1. creation_time and modified_time are both of type datetime.
10. [Mandatory⭐] Every table must have a primary key, and primary-key values must be ordered. MySQL locks the primary-key index tree; without a primary key, every operation locks the whole table. An unordered primary key causes frequent page splits and greatly reduces update speed.
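The table-creation rules above can be condensed into one DDL sketch. This is a minimal illustration with hypothetical table and column names; the business fields are placeholders:

```sql
-- Hypothetical example following the naming and mandatory-field rules above.
CREATE TABLE task_config (
    id            BIGINT UNSIGNED  NOT NULL AUTO_INCREMENT COMMENT 'primary key, step 1',
    task_name     VARCHAR(64)      NOT NULL COMMENT 'business field (placeholder)',
    price         DECIMAL(10, 2)   NOT NULL DEFAULT 0.00 COMMENT 'decimal, never float/double',
    is_deleted    TINYINT UNSIGNED NOT NULL DEFAULT 0 COMMENT '1 = yes, 0 = no',
    creation_time DATETIME         NOT NULL COMMENT 'mandatory audit field',
    creator       VARCHAR(32)      NOT NULL COMMENT 'mandatory audit field',
    modified_time DATETIME         NOT NULL COMMENT 'mandatory audit field',
    modifier      VARCHAR(32)      NOT NULL COMMENT 'mandatory audit field',
    valid         TINYINT UNSIGNED NOT NULL DEFAULT 1 COMMENT 'mandatory audit field',
    PRIMARY KEY (id)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
```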
11. [Recommendation] Table names are best formed as "business name_table function".
Positive examples: tiger_task / tiger_reader / mpp_config
12. [Recommendation] The library name should be as consistent as possible with the application name.
13. [Recommendation] If you modify the meaning of the field or add the state represented by the field, you need to update the field comment in time.
14. [Recommendation] Fields may be appropriately redundant to improve performance, but data synchronization must be considered. A redundant field should be:
Not a frequently modified field.
Not a very long varchar field, let alone a text field.
Positive example: a product category name is used frequently, is short, and rarely changes, so it can be stored redundantly in associated tables to avoid join queries.
15. [Recommendation] In general, splitting into multiple databases or tables is recommended only when a single table exceeds 5 million rows or 2 GB.
If a table has too many fields, or the fields are large, the split must be done in advance. Based on past experience, a table with 110 fields will seriously slow down overall performance once the data volume exceeds 1 million rows.
Note: If the estimated data volume three years out will not reach this level at all, do not split databases and tables at table-creation time.
16. [Reference] Choosing an appropriately small storage length not only saves table space and index storage, but more importantly improves retrieval speed.
Positive example: use unsigned tinyint for a person's age (range 0-255; a human lifespan will not exceed 255 years); a turtle's age needs smallint; the age of the sun needs int; summing the ages of all stars needs bigint.
(2) Index Protocol
1. [Mandatory] Any field with business-level uniqueness, even a combination of fields, must have a unique index.
Note: Do not assume a unique index hurts insert speed; that loss is negligible, while the gain in lookup speed is obvious. Moreover, even with thorough validation at the application layer, as long as there is no unique index, Murphy's Law says dirty data will appear.
2. [Mandatory] Joins over more than three tables are prohibited. The data types of the joined fields must be exactly consistent, and in multi-table queries the join fields must be indexed.
Note: Even when joining only two tables, pay attention to indexes and SQL performance.
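As a sketch of the join rule, assuming two hypothetical tables t_order and t_order_item whose join columns share the same type:

```sql
-- Join columns must have identical types and be indexed (hypothetical tables).
ALTER TABLE t_order_item ADD INDEX idx_order_id (order_id);

SELECT o.id, o.order_no, i.sku_code
FROM t_order o
JOIN t_order_item i ON o.id = i.order_id   -- both BIGINT UNSIGNED
WHERE o.id = 1001;
```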
3. [Mandatory] When creating an index on a varchar field, the index length must be specified; indexing the full field is unnecessary. Determine the index length from the actual selectivity (degree of discrimination) of the text.
Note: Index length and selectivity are a pair of trade-offs. Generally, for string data, an index of length 20 gives selectivity above 90%. You can measure selectivity with count(distinct left(column, index_length)) / count(*).
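The selectivity check described above can be run directly; this sketch assumes a hypothetical table t_user with a varchar column address:

```sql
-- Measure how selective a 20-character prefix is (closer to 1.0 is better).
SELECT COUNT(DISTINCT LEFT(address, 20)) / COUNT(*) AS selectivity
FROM t_user;

-- If selectivity is high enough (e.g. above 0.9), index only the prefix:
ALTER TABLE t_user ADD INDEX idx_address (address(20));
```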
4. [Mandatory] Left-wildcard or full-wildcard fuzzy search on pages is strictly forbidden. If it is really needed, solve it with a search engine.
Note: Index files follow the leftmost-prefix matching rule of the B-Tree. If the leftmost value is not fixed, the index cannot be used.
5. [Mandatory] If the leftmost column of a composite index already covers a query, do not add a separate index on that column.
6. [Mandatory] Build indexes on columns without NULL values; queries on NULL values cannot use the index.
7. [Recommendation] If there is an order by scenario, pay attention to the orderliness of the index: make the order by fields part of the composite index and place them at the end of the index column order, to avoid file_sort and preserve query performance.
Positive example: where a=? and b=? order by c; index: a_b_c.
Counterexample: if the index contains a range search, its ordering cannot be used; e.g. for WHERE a>10 ORDER BY b, the index a_b cannot be used for sorting.
8. [Recommendation] Use a covering index for query operations to avoid going back to the table.
Note: If you want to know the title of chapter 11 of a book, do you turn to the page of chapter 11? No, you just look at the table of contents; the table of contents plays the role of a covering index.
Positive example: the index types that can be built are primary key index, unique index, and ordinary index; a covering index is an effect of a query. In the explain result, the extra column shows: using index.
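A sketch of a covering index, using a hypothetical table: when the query touches only indexed columns, no table lookup is needed and explain reports "Using index":

```sql
-- Hypothetical table: the composite index covers both queried columns.
ALTER TABLE t_task ADD INDEX idx_status_name (status, task_name);

-- Served entirely from the index; EXPLAIN's Extra column shows "Using index".
EXPLAIN SELECT status, task_name FROM t_task WHERE status = 2;
```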
9. [Recommendation] Use deferred joins or subqueries to optimize deep pagination.
Note: MySQL does not skip offset rows; it fetches offset+N rows, discards the first offset rows, and then returns N rows. When the offset is very large this is very inefficient. Either limit the total number of pages returned, or rewrite the SQL once the page number exceeds a threshold.
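The deferred-join rewrite can look like the following sketch (hypothetical table t_article; the subquery pages over the slim primary-key index first, then joins back for the full rows):

```sql
-- Naive deep pagination: reads and discards 100000 full rows.
SELECT * FROM t_article ORDER BY id LIMIT 100000, 20;

-- Deferred join: page over the primary key only, then fetch 20 full rows.
SELECT a.*
FROM t_article a
JOIN (SELECT id FROM t_article ORDER BY id LIMIT 100000, 20) b
  ON a.id = b.id;
```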
10. [Recommendation] SQL performance-optimization goal: reach at least the range level; the ref level is required; consts is best if achievable.
Note:
1) consts: at most one matching row in a single table (primary key or unique index); the data can be read during the optimization phase.
2) ref: an ordinary (normal) index is used.
3) range: a range retrieval on the index.
11. [Recommendation] When building a combined index, the one with the highest degree of discrimination is on the far left (the leftmost matching principle).
12. [Reference] Avoid the following extreme misunderstandings when creating indexes:
Believing every query needs its own index.
Believing indexes waste space and seriously slow down updates and inserts.
Believing uniqueness should be handled at the application layer by "search first, then insert" instead of by a unique index.
Believing that once an index is built, it will definitely be used.
Believing indexes always improve query efficiency.
(3) SQL protocol
1. [Mandatory] Do not use count(column) or count(constant) instead of count(*). count(*) is the SQL-92 standard syntax for counting rows; it is independent of the database and of NULL vs non-NULL values.
Note: count(*) counts rows whose values are NULL; count(column) does not.
2. [Mandatory] count(distinct column) counts the number of distinct non-NULL values of that column. Note that count(distinct column1, column2) returns 0 if either column is entirely NULL, even if the other column has distinct values.
Positive example (deferred join, see the pagination rule in the index protocol): first quickly locate the id range, then join:
SELECT a.* FROM t1 a, (SELECT id FROM t1 WHERE <condition> LIMIT 100000, 20) b WHERE a.id = b.id
Counterexample: in the explain result, type=index means a full scan of the index's physical file, which is very slow; this level is even lower than range and barely better than a full table scan.
Positive example (composite-index ordering): for where a=? and b=?, if column a is nearly unique, only the idx_a index is needed.
Note: when the condition mixes non-equality and equality comparisons, put the equality column first when building the index. E.g. for where a>? and b=?, column b must be the leading index column even if a has higher selectivity.
3. [Mandatory] When all values of a column are NULL, count(column) returns 0 but sum(column) returns NULL, so beware of NPE problems when using sum(column).
Positive example: avoid the NPE problem of sum like this: SELECT IF(ISNULL(SUM(g)), 0, SUM(g)) FROM table;
4. [Mandatory] Use ISNULL() to test for NULL. Any direct comparison of NULL with any value is NULL.
Note:
1) NULL<>NULL returns NULL, not false.
2) NULL=NULL returns NULL, not true.
3) NULL<>1 returns NULL, not true.
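The NULL rules above can be verified interactively. MySQL also offers the null-safe equality operator <=> as an alternative to ISNULL():

```sql
SELECT NULL = NULL;    -- NULL, not true
SELECT NULL <> NULL;   -- NULL, not false
SELECT NULL <> 1;      -- NULL, not true
SELECT ISNULL(NULL);   -- 1
SELECT NULL <=> NULL;  -- 1 (null-safe comparison)
```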
5. [Mandatory] When writing paging logic in code, if the count is 0, return immediately to avoid executing the subsequent paging statement.
6. [Mandatory⭐] Foreign keys and cascades are not allowed; all foreign-key concepts must be handled at the application layer.
Note (concept): if student_id is the primary key of the student table, then student_id in the grade table is a foreign key. If updating student_id in the student table also triggers an update of student_id in the grade table, that is a cascading update. Foreign keys and cascading updates suit low-concurrency single machines, not distributed, high-concurrency clusters; cascading updates block strongly and risk a database update storm; foreign keys slow down inserts.
7. [Mandatory] The use of stored procedures is prohibited. Stored procedures are difficult to debug and expand, and they are not portable.
8. [Mandatory] When correcting data, or deleting or modifying records, first run a select to avoid accidental deletion, and execute the update statement only after confirming the result is correct.
9. [Mandatory] When deleting data, never run "delete from Table where Condition" directly. First query the data to be deleted, then delete by its IDs. If the program fails to pass in the Condition, the result is not that nothing is deleted, but that ALL data is deleted; this must be avoided.
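A sketch of the select-first workflow for data correction, with a hypothetical table and condition:

```sql
-- Step 1: inspect exactly which rows would be affected.
SELECT id, task_name
FROM t_task
WHERE status = 9 AND creation_time < '2023-01-01';

-- Step 2: after confirming the result set, delete by the confirmed ids.
DELETE FROM t_task WHERE id IN (101, 102, 103);
```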
10. [Recommendation] Avoid the in operator where possible. If it is truly unavoidable, carefully evaluate the number of elements in the in set and keep it within 1000.
11. [Reference] If globalization is needed, store and represent all characters in utf-8 encoding, and then note the character-counting methods:
SELECT LENGTH("轻松工作"); returns 12 (bytes)
SELECT CHARACTER_LENGTH("轻松工作"); returns 4 (characters)
If you need emoji, use utf8mb4 for storage, and note its differences from utf-8 encoding.
12. [Reference] TRUNCATE TABLE is faster than DELETE and uses fewer system and transaction-log resources. However, TRUNCATE is non-transactional and does not fire triggers, which can cause accidents, so using it in development code is not recommended.
Note: TRUNCATE TABLE is functionally identical to a DELETE statement without a WHERE clause.
(4) ORM protocol
1. [Mandatory] In table queries, never use * as the column list; specify the required fields explicitly.
Note:
1) It increases the query analyzer's parsing cost.
2) Adding or removing fields easily gets out of sync with the resultMap configuration.
2. [Mandatory] Boolean properties of POJO classes must not be prefixed with is, while database fields must be prefixed with is_; the mapping between fields and properties is therefore required in resultMap.
Note: See the rules for POJO class definitions and database field definitions; a mapping must be added in sql.xml.
3. [Mandatory] Do not use resultClass as the return parameter; even if all class property names correspond one-to-one with database fields, a mapping must still be defined. Conversely, every table must have a corresponding mapping.
Note: Configuring the mapping decouples fields from DO classes, which eases maintenance.
4. [Mandatory] Mind parameter usage in xml configuration: use #{} or #param#; do not use ${}, which is prone to SQL injection.
5. [Mandatory] The queryForList(String statementName, int start, int size) built into iBATIS is not recommended (it pages in memory and suffers from the deep-paging problem).
Note: Its implementation fetches all records of the SQL statement corresponding to statementName from the database, then extracts the start/size subset via subList; this has caused OOM in production.
Positive example: introduce #start# and #size# in sqlmap.xml:
Map<String, Object> map = new HashMap<String, Object>();
map.put("start", start);
map.put("size", size);
6. [Mandatory] It is not allowed to directly use HashMap and Hashtable as the output of the query result set.
7. [Mandatory] When updating a record, the record's modifier and modification time must be updated at the same time. In scenarios where no modifier is available, use the reserved word "system" uniformly; do not leave it blank.
8. [Recommendation] Do not write one big catch-all update interface. Passing in a POJO and blindly executing update table set c1=value1, c2=value2, c3=value3 regardless of intent is wrong. When executing SQL, do not update fields that have not changed: first, it is error-prone; second, it is inefficient; third, it bloats the binlog.
9. [Recommendation] When writing the query statement "select column1, column2, ... from table where XXX", if the number of fields is relatively small, some redundancy is acceptable.
Imagine a scenario: the code has two query methods for the same entity whose parameters and return types are identical and only whose conditions differ; they are easy to confuse. The return values of the two methods contain the same properties, but if the selected column lists differ in the underlying implementations, a null pointer can easily result.
Trade-off: querying only the required fields means the database returns only what is used, but the similar methods may be misused; querying the full field list makes the database return more data, but the methods have clear semantics and will not fail due to missing field values.
10. [Reference] Do not abuse @Transactional. Transactions affect the database's QPS, and when using them you must also consider all rollback paths: cache rollback, search-engine rollback, message compensation, statistics correction, and so on.
11. [Reference] In iBATIS dynamic SQL, compareValue in <isEqual> is a constant compared with the property value, usually a number, meaning the condition is included when they are equal; <isNotEmpty> means not empty and not null; <isNotNull> means not a null value.
12. [Reference] As a supplement to the update scenario: to avoid full-field updates, code usually updates a field only when the incoming value is non-empty. In that case you cannot set a field to empty. It is recommended to write a dedicated method for setting empty values rather than going through the full-update method.
Tuning of the database layer
This part covers database tuning topics, including the database server, SQL analysis, and so on.
Performance of the database server
Server-side analysis focuses on these directions: hardware resources, database architecture, and database-instance parameter optimization.
Hardware resource analysis
1. Use the top command to view the CPU load.
2. Use the free command to check the memory usage.
3. Use the iostat tool to view disk I/O usage.
4. Use the vmstat command to view the load of the system.
5. Use the perf top command to view system hotspots.
6. Use the nmon tool to monitor the overall situation of the system for a period of time.
When using Tencent Cloud databases, you can also use the related tools Tencent Cloud provides. If the CPU, I/O, or memory usage of the database host is high, there are two possible causes:
1. The database instance has a performance bottleneck.
2. The hardware of the machine hosting the instance has a problem (relatively unlikely and easy to rule out).
Database architecture analysis
At present the company mainly uses cloud database architectures. Relying on Tencent Cloud's own capabilities, this part of the architecture analysis can be skipped, or Tencent Cloud's after-sales support can be consulted.
Server tuning
Database parameters can be optimized in combination with the business scenario. The following is an example.
For example, a transaction-intensive database server is configured as follows:
CPU: 4 sockets, 8 cores each.
Memory: 256 GB.
Disk array: 1 TB.
Then the recommended parameter settings are as follows:
Identify slow SQL
Since we currently use Tencent Cloud databases, which provide database monitoring and slow-log monitoring, slow SQL can be obtained directly. Next, classify these SQL statements by concurrency:
1. Very high concurrency. SQL characteristics: very few statements (about 5%), but executed very frequently, even hundreds of times per second; as soon as one becomes slow, the system is likely to be paralyzed.
Optimization level: highest priority.
Optimization directions:
Optimize the SQL itself until it is optimal.
Optimize the application that issues the SQL to reduce the number of executions, including using caches and merging multiple requests into one.
2. Average concurrency. SQL characteristics: the majority (about 80%); if some are slow, the impact on overall system stability is small, but some local operations become slow.
Optimization level: second priority.
Optimization direction: optimize the SQL itself, considering indexing, SQL rewriting, and other methods together.
3. Very low concurrency but particularly slow. SQL characteristics: few statements (about 15%), often very complex queries, perhaps executed only a few times a day; little impact on the overall system, but very difficult to optimize.
Optimization level: handle last.
Optimization direction: optimize the SQL itself, while keeping a certain tolerance for this class of SQL.
Blocking and deadlock analysis
Deadlock and blocking are the two most common performance killers. You can use some SQL commands to observe SQL execution:
[show processlist]: views server thread status. With this information you can find long-running connections and optimize them, and you can also kill blocked threads. [show processlist] lists a series of fields, with the following meanings:
1. id. The unique identifier of the thread; [kill {id}] terminates the thread.
2. user. The user who started the thread.
3. host. The host that initiated the connection.
4. db. The database being operated on.
5. command. The command of the operation; SQL statements are also commands.
6. time. How long the operation has lasted.
7. state. The thread's current state.
8. info. The first 100 characters of the SQL statement. To see the complete SQL, use [show full processlist].
[show status]: views the running status of the MySQL server; the status information is cleared after a restart. It can also take a scope, e.g. [show global status] views the global status.
[show engine]: views storage-engine status, including table locks and row locks held by transactions, lock-wait status of transactions, thread semaphore waits, file I/O requests, buffer-pool statistics, and so on.
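The observation commands described above, gathered in one place (the thread id passed to kill is hypothetical):

```sql
SHOW FULL PROCESSLIST;                -- complete SQL text per thread
KILL 12345;                           -- terminate a blocked thread by id (hypothetical id)
SHOW GLOBAL STATUS LIKE 'Threads%';   -- scoped, global server status
SHOW ENGINE INNODB STATUS;            -- locks, semaphores, buffer pool, I/O
```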
Characteristics of a deadlock:
1. A deadlock is mutual blocking.
2. The database automatically detects the deadlock and releases it.
3. Deadlock information is recorded in the SQL log.
Avoiding database deadlocks requires the following measures:
1. Reduce the number of transactions: reducing concurrent transactions is the first step in avoiding deadlocks.
2. Unify transaction handling: try to put update operations on the same table into the same transaction to reduce deadlock opportunities.
3. Access resources in the same order: all transactions access resources in the same order, avoiding crossed waits.
4. Use fine-grained locks: prefer low-level locks such as row locks over table locks to reduce the chance of deadlocks.
5. Shorten lock time: hold locks as briefly as possible and release them immediately when done, so other transactions can access the resources promptly.
6. Regularly clean up useless locks: periodically cleaning up locks no longer in use helps avoid deadlocks.
7. Use the database's deadlock-detection tools: most database systems provide deadlock-detection tools or APIs; use them to monitor deadlocks and act promptly.
8. Optimize database design and queries: optimizing table design and query statements, including index creation and SQL tuning, reduces the risk of deadlocks.
Characteristics of blocking:
1. Blocked SQL statements do not affect each other; blocking is one-way.
2. The database cannot automatically detect and handle blocking.
3. Blocking is not recorded in the SQL log; the only symptom is that some SQL takes too long to execute.
Handling blocking requires manual intervention. There are two general directions: reduce concurrent operations and reduce slow SQL. Specific measures:
1. Use fine-grained locks: prefer row-level locks over table-level locks to reduce blocking on resources.
2. Use indexes: using indexed columns in the WHERE clause greatly speeds up queries, reducing the time resources are held.
3. Process data in batches: when a large amount of data must be processed, do it in batches to avoid occupying too many resources at once.
4. Avoid long transactions and long queries: long-running transactions and queries keep holding resources and block other operations; avoid them and clean up useless transactions and queries regularly.
5. Adjust database parameters: SQL execution performance can be improved by tuning parameters such as cache size and thread-pool size.
6. Regularly optimize the database: periodically review and optimize table structures, indexes, storage engines, etc., to improve performance and responsiveness.
7. Design SQL queries sensibly: well-designed SQL effectively reduces system load and the risk of blocking.
SQL execution process analysis
Optimizing every link in the SQL execution process yields the best final result. With that idea, let's walk through the SQL execution process.
Optimization at the client layer:
1. Reduce connection overhead: merge multiple requests into one, and set the connection-pool size and connection release time sensibly. Note that more connections are not always better: besides consuming resources, too many connections introduce more thread-switching cost. An empirical formula for estimating the connection count: [connections = CPU cores * 2 + 1].
2. Reduce query frequency: use caches sensibly and merge multiple requests into one.
Optimization at the MySQL connection layer:
1. Increase the number of connections.
2. Release unused connections promptly.
3. Tune the relevant MySQL parameters.
Optimization at the query cache layer: the query cache is an awkward design and has long been disabled by default; it only helps when reads vastly outnumber writes. This layer's optimization can be ignored.
SQL parser and preprocessor: these two components produce a correct syntax tree. They are a black box to users and cannot be intervened in.
Query optimizer: this layer generates multiple execution plans from the syntax tree and optimizes them. Most SQL optimization concentrates here.
Execution engine: this layer schedules work according to the execution plan. Users can only influence it indirectly, by influencing the execution plan.
Optimization at the storage-engine layer: choose the appropriate storage engine for the business scenario. For example, archive tables often have weak transaction requirements and are mostly read, so they can use the MyISAM engine; frequently used small tables can use the Memory engine to stay resident in memory.
Analyze the execution plan
Prefix a SQL statement with explain and execute it to list the statement's execution plan. The syntax is: [explain {sql}]. Before MySQL 5.6.3, only SELECT could be analyzed; from MySQL 5.6.3 on, update, delete, and insert can be analyzed as well.
A typical execution plan contains multiple rows; each row is one step of the plan, describing how the SQL executes. Let's look at the meaning of each field.
id field:
id indicates the order in which parts of the SQL statement execute. Plans with larger id values execute first; equal ids execute top to bottom. For example, a subquery executes before its outer query.
For same-level queries with no parent-child relationship, the query optimizer optimizes automatically: it executes first whatever produces the smaller intermediate result of the Cartesian product. MySQL's optimizer is cost-based, and smaller intermediate results consume fewer resources. This also means that when optimizing we should choose the smaller table as the driving table.
select_type field:
This field indicates the query type. There are many; here are the common ones:
1. simple. An ordinary query containing no subquery or union.
2. primary. If a statement contains subqueries, the outermost query's type is primary.
3. subquery. If a statement contains subqueries, the innermost subquery's type is subquery.
4. derived. A derived query: a query that uses a temporary table before producing the final result. For example, in a union query MySQL first executes the statement on the right of the union, puts the temporary result into a temporary table, then executes the statement on the left and joins it with that temporary table; its type is derived.
5. union. In a union query, the query to the right of the union is marked union.
6. union result. The intermediate result produced after all union statements execute. Even with multiple unions there is only one union result.
table field:
The table field indicates the tables used during the query, including ordinary tables and intermediate result tables. For ordinary tables there is little to explain: the displayed name is simply the table being queried. In union queries, the table field sometimes shows a value like [<union1,3,4>], which means the current query uses, as its table, the result set produced by the queries whose ids are 1, 3, and 4.
type field:
This field is critical: it describes the join (access) type, and different access types vary greatly in execution efficiency. The values below are ordered from best to worst performance, with the meaning of each explained:
1. system: the table is a system table containing exactly one row. This is the fastest case, but it is a special case of no practical interest and almost never appears.
2. const: the row is found with a single index lookup. There is only one situation where one lookup suffices: the query uses a unique index (or primary key) compared with a constant, so at most one row matches, e.g. select * from t_table where id = 1.
3. eq_ref: usually appears in join statements. For each index value from the driving table, the lookup on the joined table can match at most one row; the joined table's access type is then eq_ref. Since only a unique result can match, the index involved is almost certainly a unique index (or primary key). In general, eq_ref appears when the on condition of a join uses a unique index.
4. ref: also common in join statements. If the index is not unique, one index value from the driving table may match multiple rows in the joined table; the joined table's access type is then ref. In general, ref appears when the on condition of a join uses a non-unique index.
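To make the eq_ref/ref distinction concrete, here is a sketch with hypothetical tables; which type appears depends on whether the joined column is backed by a unique index:

```sql
-- Hypothetical schema: t_user(id PRIMARY KEY, ...),
-- t_order(user_id with a non-unique index idx_user_id, ...).
EXPLAIN SELECT * FROM t_order o JOIN t_user u ON o.user_id = u.id;
-- t_user is joined through its primary key: at most one match per row -> type eq_ref.
EXPLAIN SELECT * FROM t_user u JOIN t_order o ON o.user_id = u.id;
-- t_order is joined through the non-unique idx_user_id: possibly many matches -> type ref.
```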
5. range: the index is used for a range scan. Statements such as in (...), between ... and ..., or where column_a > {number} typically match the range type.
6. index: a full scan of the index tree. All index entries are scanned to find the matching results, which is already very slow. If a query shows the index type, it usually needs to be optimized.
Common misunderstanding: many people assume that "index" simply means the query went through an index and is therefore fast. That understanding is wrong. The performance of the index type is poor, and it is not necessarily better than a sequential full table scan, which at least avoids random reads.
7. all: the worst case, a full table scan with no index at all. We should try to avoid all.
8. NULL: a special case. When the query does not need to access any table, its type is NULL. For example, fetching the system time requires no table access, so the execution plan shows NULL.
As a rule of thumb, SQL statements should be optimized to at least the range level.
possible_keys field:
The indexes that might be usable for the query; it does not usually need special attention.
key field:
The index actually used by the query; it also rarely needs special attention. But if this field is NULL, the query used no index. Typically, when a composite index does not satisfy the leftmost-prefix match, possible_keys has a value while key is empty.
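A sketch of the leftmost-prefix point, using a hypothetical table and composite index:

```sql
-- Hypothetical composite index on (a, b).
ALTER TABLE t_demo ADD INDEX idx_a_b (a, b);
EXPLAIN SELECT * FROM t_demo WHERE a = 1 AND b = 2;  -- key: idx_a_b (leftmost column a is present)
EXPLAIN SELECT * FROM t_demo WHERE b = 2;            -- key: NULL (leftmost column a is missing)
```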
Extra field:
The word extra means additional. This field records what additional work was done for the query, and it is also a fairly important field. Its common values include:
1. Using filesort: the result could not be sorted by the index, so an extra sort pass was needed; Extra then records Using filesort.
2. Using temporary: MySQL used a temporary table to hold intermediate results. Generally speaking, statements such as group by and distinct need a temporary table.
3. Using index: a covering index was used. Since the covering index was hit, no lookup back to the table is needed. In addition, as introduced earlier, when index condition pushdown takes effect the Extra field shows Using index condition.
4. Using where: after the storage engine returns the data, the server layer filters it with the where condition.
5. Select tables optimized away: the result can be returned without even reaching the query execution stage. In the official wording: in the absence of a GROUP BY clause, MIN()/MAX() operations can be resolved from an index, or COUNT(*) can be optimized for the MyISAM storage engine, so the calculation does not have to wait for the execution stage; the optimization is completed while the execution plan is generated.
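A sketch of how sorting and grouping interact with these Extra values, using a hypothetical table:

```sql
-- Hypothetical table t_log with an index on create_time; remark is unindexed.
EXPLAIN SELECT id FROM t_log ORDER BY create_time;   -- sorted via the index; no Using filesort
EXPLAIN SELECT id FROM t_log ORDER BY remark;        -- extra sort pass -> Extra: Using filesort
EXPLAIN SELECT remark FROM t_log GROUP BY remark;    -- typically Extra: Using temporary
```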
After the data is obtained, the final step is returning it to the client. MySQL uses an immediate-return strategy: it does not wait for the whole result set to be assembled, but starts streaming data to the client as soon as the first result is produced. Of course, if the query cache is configured, the result is also placed in the cache. The returned data travels over the network, so an obvious conclusion follows: the less data transmitted, the faster the response. Therefore, select only the fields you need instead of mindlessly writing "*".
Index structure and tuning
MySQL stores index data in an improved B+ tree. The B+ tree is an enhanced B-tree, and MySQL's B+ tree is not quite the same as the textbook B+ tree. It has the following characteristics:
1. The number of keys in a node equals the number of child pointers.
2. In InnoDB, internal B+ tree nodes do not store actual row data; all data lives in the bottom-level leaf nodes, and these leaf nodes form an ordered sequence.
3. In InnoDB, each leaf node keeps pointers to its neighboring leaf nodes, forming a doubly linked list. Since the leaf nodes are themselves ordered, this becomes an ordered doubly linked list, which is very friendly to range scans: there is no need to go back to the root node and repeat the IO.
4. The tree has a very high fan-out, which makes its structure very flat. Such a structure stores as much data as possible while keeping the number of IOs small. Assuming a three-level tree and about 1KB per row, roughly 1.37 million leaf pages can hold more than 20 million rows.
5. When data is written, child nodes may need to be split or merged to keep the tree balanced.
The overall structure is shown as a diagram in the original article.
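The capacity estimate in point 4 can be reproduced under common InnoDB assumptions (16KB pages, roughly 8-byte keys plus 6-byte page pointers in internal nodes); the figures are ballpark only:

```sql
-- Internal node fan-out: 16 * 1024 / (8 + 6) ≈ 1170 children per node.
-- Leaf page capacity: 16KB / 1KB per row = 16 rows.
-- Three levels: 1170 * 1170 ≈ 1.37 million leaf pages, times 16 rows each:
SELECT 1170 * 1170 * 16 AS approx_rows;  -- ≈ 21.9 million rows
```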
From the structure above, the following optimization experience can be deduced:
1. Create indexes only when needed; as long as the indexes meet the requirements, the fewer the better. On one hand, indexes themselves occupy considerable database resources (disk and memory); on the other hand, every insert, delete, or update must also maintain the index tree, which adds extra overhead.
2. Indexed fields should have reasonably high selectivity. If a field's selectivity is too low, scanning its B+ tree requires walking many paths; when selectivity drops low enough, it is actually better to scan the whole table at the leaf level, since that at least uses sequential disk reads and is faster than going through the index.
3. Composite indexes must follow the leftmost-prefix matching principle. For a composite index, the key values in the B+ tree are built and compared from left to right: if the first column matches, traversal proceeds by the first column; where it does not decide the order, comparison continues with the second column, and so on.
4. Use covering indexes to reduce back-to-table lookups. Per the B+ tree structure above, the leaf nodes of the clustered index store the actual row data. So if the columns a query returns happen to all be index values, there is no need to go down to the clustered index's leaf nodes at all; the values can be returned directly from the index layer.
5. Ordered IDs reduce page splits. Since the B+ tree must keep its leaf nodes ordered, inserting unordered IDs forces leaf pages to be rearranged. If the data itself arrives in order, it only needs to be appended in sequence, which is much more efficient.
6. Try to avoid function calls or subqueries in query conditions. The index tree lives at the storage engine layer, which can only perform simple logical checks. If the SQL keeps its conditions simple, that logic can be pushed down to the storage engine so that only matching data is returned. Otherwise the engine can only scan a broad range of data in the simplest possible way and leave the execution layer to filter out the rows that do not match, resulting in poor performance.
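Point 4 above (covering indexes) can be sketched as follows, again with hypothetical table and index names:

```sql
-- Hypothetical composite index covering both queried columns.
ALTER TABLE t_order ADD INDEX idx_user_status (user_id, status);
-- All selected columns live in the index, so no back-to-table lookup is needed:
EXPLAIN SELECT user_id, status FROM t_order WHERE user_id = 42;  -- Extra: Using index
-- Selecting * forces a lookup back into the clustered index for the remaining columns:
EXPLAIN SELECT * FROM t_order WHERE user_id = 42;
```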
SQL optimization experience
In fact, once you master how to analyze SQL, the optimization methods follow naturally. There is a lot of piecemeal optimization experience; here is a brief list:
1. Try to avoid applying functions to columns in the WHERE clause, because that prevents index use.
2. Prefer JOIN statements over subqueries, because JOINs are usually more efficient.
3. Make sure the table has appropriate indexes to speed up queries.
4. Avoid "*" in SELECT statements; select only the columns you need.
5. Try to avoid the OR operator; it may cause a full table scan instead of an index lookup.
6. For large data sets, consider partitioning tables or using pagination.
7. Use the EXPLAIN statement to inspect the query execution plan, find problems, and optimize them.
8. Use appropriate data types, e.g. INT instead of VARCHAR for numeric values.
9. Combine multiple single-row queries into one query to reduce round trips to the database server.
10. Regularly clean up unused indexes and tables to keep database performance stable.
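For tip 9, the idea is simply to batch lookups; a sketch with a hypothetical table:

```sql
-- Instead of N round trips:
--   SELECT name FROM t_user WHERE id = 1;
--   SELECT name FROM t_user WHERE id = 2;
--   ...
-- issue one query:
SELECT id, name FROM t_user WHERE id IN (1, 2, 3);
```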
Personal experience and scattered online tips are inevitably one-sided. To master SQL optimization systematically, it is best to read the official documentation; the official optimization guidance is here: https://dev.mysql.com/doc/refman/5.7/en/optimizing-innodb-logging.html
MySQL uses a cost-based optimizer, which works well most of the time. When the optimizer does not behave as expected, you can use FORCE INDEX, which tells MySQL to assume that a table scan is very expensive compared with using the given index:

SELECT * FROM t1, t2 FORCE INDEX (index_for_column)
WHERE t1.col_name = t2.col_name;


Origin blog.csdn.net/u010800804/article/details/130805003