Summary of MySQL data optimization methods

Database study notes

One, select the most applicable field attributes

1. The field width should be as small as possible

MySQL handles large volumes of data well, but generally speaking, the smaller a table is, the faster queries on it run. When creating a table, therefore, we can get better performance by making each field only as wide as it needs to be.
For example, defining a postal code field as CHAR(255) obviously wastes space, and even VARCHAR is unnecessary, because CHAR(6) does the job perfectly well. Similarly, where possible, we should use MEDIUMINT rather than BIGINT to define integer fields.

2. Try to set the field as non-NULL

Another way to improve efficiency is to declare fields NOT NULL whenever possible, so that the database does not have to handle NULL values when it compares data in later queries.

3. Set the classification data to ENUM type

Some text fields, such as "province" or "gender", can be defined as ENUM. MySQL stores ENUM values internally as numbers, and numeric data is processed much faster than text, so this can improve database performance.

Two, use join (JOIN) instead of sub-queries (Sub-Queries)

Subqueries let us complete in a single statement many SQL operations that logically require several steps. They also avoid transaction or table locks and are easy to write. In some cases, however, a subquery can be replaced by a more efficient join (JOIN). For example, suppose we want to fetch all customers who have no order records; we can use the following query:

SELECT * FROM customer_info
WHERE customer_id NOT IN (SELECT customer_id FROM sales_info)

If you use a JOIN to complete this query, it can be much faster, especially when there is an index on customer_id in the sales_info table. The query is as follows:

SELECT * FROM customer_info
LEFT JOIN sales_info ON customer_info.customer_id = sales_info.customer_id
WHERE sales_info.customer_id IS NULL

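The JOIN is more efficient because MySQL does not need to create a temporary table in memory to complete this logically two-step query. Newer MySQL versions optimize many subqueries automatically, so it is worth comparing both forms with EXPLAIN.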
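The equivalence of the two queries can be checked with a small script. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in so that it runs without a MySQL server; the sample rows and the index name idx_sales_customer are assumptions made for illustration, and the script shows only that the results match, not how MySQL executes either form.

```python
import sqlite3

# SQLite stands in for MySQL here; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_info (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales_info (sale_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
CREATE INDEX idx_sales_customer ON sales_info (customer_id);
INSERT INTO customer_info VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO sales_info VALUES (10, 1), (11, 1), (12, 3);
""")

# Subquery form: customers whose id never appears in sales_info.
subquery_ids = [row[0] for row in conn.execute(
    "SELECT customer_id FROM customer_info "
    "WHERE customer_id NOT IN (SELECT customer_id FROM sales_info) "
    "ORDER BY customer_id")]

# Join form: keep only the customers for which the LEFT JOIN found no match.
join_ids = [row[0] for row in conn.execute(
    "SELECT c.customer_id FROM customer_info AS c "
    "LEFT JOIN sales_info AS s ON c.customer_id = s.customer_id "
    "WHERE s.customer_id IS NULL "
    "ORDER BY c.customer_id")]

print(subquery_ids, join_ids)
```

Note that sales_info.customer_id is declared NOT NULL in this sketch: if the subquery can return NULL, NOT IN matches no rows at all, and the two forms are no longer interchangeable.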

Three, use union (UNION) instead of manually created temporary tables

The UNION operator is used to combine the result sets of two or more SELECT statements.
Note that each SELECT statement within UNION must have the same number of columns. The columns must also have similar data types. At the same time, the order of the columns in each SELECT statement must be the same.

MySQL has supported UNION queries since version 4.0. UNION can combine two or more SELECT queries that would otherwise require temporary tables into a single query. Any temporary table involved is deleted automatically when the client's query session ends, which keeps the database tidy and efficient. To build a UNION query, we simply connect multiple SELECT statements with the UNION keyword.
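As a minimal sketch of the rules above, the following script combines two SELECT statements with the same number and order of columns. It uses Python's sqlite3 module as a stand-in for MySQL so that it runs anywhere; the client and author tables and their rows are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; table names and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (name TEXT, phone TEXT);
CREATE TABLE author (name TEXT, phone TEXT);
INSERT INTO client VALUES ('Ann', '111'), ('Bob', '222');
INSERT INTO author VALUES ('Bob', '222'), ('Cid', '333');
""")

# UNION merges both result sets and removes duplicate rows.
union_rows = conn.execute(
    "SELECT name, phone FROM client "
    "UNION "
    "SELECT name, phone FROM author "
    "ORDER BY name").fetchall()

# UNION ALL keeps duplicates, so 'Bob' appears twice.
union_all_rows = conn.execute(
    "SELECT name, phone FROM client "
    "UNION ALL "
    "SELECT name, phone FROM author "
    "ORDER BY name").fetchall()

print(union_rows)
print(len(union_all_rows))
```

When duplicates are impossible or harmless, UNION ALL avoids the de-duplication step.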

Four, use transactions (Transaction) to ensure data consistency and improve speed

Although sub-queries (Sub-Queries), joins (JOIN) and unions (UNION) let us build a wide variety of queries, not every database operation can be completed with one or a few SQL statements. More often, a series of statements is needed to complete a piece of work. In that case, if one statement in the block fails, the outcome of the whole block becomes uncertain. Imagine inserting related data into two tables at the same time: after the first table is updated successfully, the database fails unexpectedly and the operation on the second table never completes. The data is left incomplete, and data already in the database may even be corrupted.
To avoid this situation, use a transaction. Its role is as follows:

Either every statement in the block succeeds, or all of them fail. In other words, the consistency and integrity of the data are preserved. A transaction starts with the BEGIN keyword and ends with the COMMIT keyword. If any SQL operation fails in between, the ROLLBACK command restores the database to the state it was in before BEGIN.
Another important role of transactions is that, when multiple users work with the same data source at the same time, locking gives each user a safe way to access the data, so that one user's operations are not interfered with by other users.
Statements written as a single transaction do not need to connect to the database multiple times, which can improve performance and speed up inserts, deletes, queries and updates.

BEGIN; -- or START TRANSACTION
INSERT INTO salesinfo SET CustomerID=14; -- statement 1
UPDATE inventory SET Quantity=11 WHERE item='book'; -- statement 2
COMMIT;
-- On failure, for example:
-- if (the update fails) { ROLLBACK }
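The all-or-nothing behaviour can be sketched in application code. The example below uses Python's sqlite3 module as a stand-in for MySQL; the record_sale helper and the CHECK constraint that makes the second statement fail are assumptions added for illustration, not part of the original schema.

```python
import sqlite3

# SQLite stands in for MySQL; the CHECK constraint is a hypothetical way to force a failure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salesinfo (CustomerID INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE inventory ("
    "item TEXT PRIMARY KEY, "
    "Quantity INTEGER NOT NULL CHECK (Quantity >= 0))")
conn.execute("INSERT INTO inventory VALUES ('book', 12)")
conn.commit()


def record_sale(conn, customer_id, new_quantity):
    """Run both statements as one transaction; return True if it committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "INSERT INTO salesinfo (CustomerID) VALUES (?)", (customer_id,))
            conn.execute(
                "UPDATE inventory SET Quantity = ? WHERE item = 'book'",
                (new_quantity,))
        return True
    except sqlite3.IntegrityError:
        return False


ok = record_sale(conn, 14, 11)      # both statements succeed
failed = record_sale(conn, 15, -1)  # the UPDATE violates the CHECK, so the INSERT is undone

sales = [row[0] for row in conn.execute(
    "SELECT CustomerID FROM salesinfo ORDER BY CustomerID")]
quantity = conn.execute(
    "SELECT Quantity FROM inventory WHERE item = 'book'").fetchone()[0]
print(ok, failed, sales, quantity)
```

After the failed call, customer 15 does not appear in salesinfo even though its INSERT ran successfully before the UPDATE failed.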

A common example is a bank transfer: account A transfers 100 million to account B (transaction T1). A transaction of this kind raises several questions worth thinking about:

How do we ensure that the balance of account A decreases by 100 million and the balance of account B increases by 100 million, together? (A)
If account A also has a transaction with account C at the same time (T2), how do we keep the two transactions independent of each other? (I)
If the database crashes right after the transaction completes, how do we ensure that the data of the successful transaction is stored in the database? (D)
How do we ensure the validity of the data (no money is created or disappears out of thin air) while supporting a large number of transactions? (C)
To carry out transactions normally and reliably, the database must solve these four problems. This is the background to the birth of the transaction, which addresses all four. Correspondingly, it has four major characteristics, known as ACID:

Atomicity: The transaction either completes entirely or is cancelled entirely. If it crashes midway, the state returns to what it was before the transaction (the transaction rolls back). Its operations are inseparable.
Isolation: If two transactions T1 and T2 run at the same time, the final results of T1 and T2 are the same regardless of which one finishes first. Isolation can be achieved by locking.
Durability: Once a transaction is committed, its data stays in the database no matter what happens (such as a database crash or error). When the database restarts after a crash, it makes sure that the data of successfully committed transactions is saved to disk and that the changes of uncommitted transactions are rolled back.
Consistency: Only valid data (according to relational and functional constraints) can be written to the database. Ensuring that money neither appears nor disappears out of thin air relies on atomicity and isolation.

The database achieves this through the transaction log.

If every update were written to disk immediately, the scattered data locations would cause a lot of random IO and performance would be very poor.
If updates were not written to disk immediately, data would be lost whenever the database crashed.

The compromise is:

Data changes are appended to the log buffer in chronological order as transaction log records and written to the transaction log by a specific algorithm. This is sequential IO, which performs better.
The transaction log is then parsed by the data manager and written to disk by a specific algorithm.
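The compromise can be illustrated with a toy model. The sketch below is not how MySQL or InnoDB implements its log; the TinyStore class and all of its methods are hypothetical, and Python lists and dicts stand in for the log file, the data files and the in-memory cache.

```python
class TinyStore:
    """Toy model of a transaction log: append first, update data files later."""

    def __init__(self):
        self.log = []    # stands in for the sequentially written log on disk
        self.disk = {}   # stands in for the data files on disk
        self.cache = {}  # in-memory changes, lost if the process crashes

    def commit(self, changes):
        # Appending to the log is sequential IO; the commit is durable from here on.
        self.log.append(dict(changes))
        # The data files are not touched yet; only the in-memory copy changes.
        self.cache.update(changes)

    def checkpoint(self):
        # Later, the changes are written to the data files in bulk.
        self.disk.update(self.cache)
        self.cache.clear()
        self.log.clear()

    def crash(self):
        # A crash loses everything that was only in memory.
        self.cache.clear()

    def recover(self):
        # Replaying the log in order restores every committed change.
        for entry in self.log:
            self.disk.update(entry)
        self.log.clear()

    def read(self, key):
        return self.cache.get(key, self.disk.get(key))


store = TinyStore()
store.commit({"A": 1000, "B": 1000})
store.checkpoint()                   # balances reach the data files
store.commit({"A": 900, "B": 1100})  # a transfer, logged but not yet flushed
store.crash()
balance_after_crash = store.read("A")  # still the old value from the data files
store.recover()
balance_after_recovery = store.read("A")
print(balance_after_crash, balance_after_recovery)
```

The committed transfer survives the crash because it was appended to the log before the crash, even though the data files had not been updated yet.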

Transactions also have the concept of isolation levels. Different isolation levels, which imply different amounts of locking, can be chosen for different workloads to improve performance.

Five, lock the table

Although transactions are a very good way to maintain database integrity, their exclusivity sometimes hurts performance, especially in large application systems. Because the data involved is locked while a transaction executes, other user requests have to wait until the transaction ends. If a database system is used by only a few users, the impact is not a big problem; but if thousands of users access it at the same time, for example on an e-commerce website, the response delays become much more serious.

In fact, in some cases we can obtain better performance by locking the table. The following example uses the method of locking the table to complete the transaction function in the previous example.

LOCK TABLE inventory WRITE;
SELECT Quantity FROM inventory WHERE Item='book';
-- some calculations go here
UPDATE inventory SET Quantity=11 WHERE Item='book';
UNLOCK TABLES;

Here, we use a SELECT statement to fetch the initial data and, after some calculations, write the new value back to the table with an UPDATE statement. The LOCK TABLE statement with the WRITE keyword ensures that no other session can insert, update or delete rows in the inventory table until the UNLOCK TABLES command is executed.

Six, use foreign keys

Locking tables maintains the integrity of the data, but it cannot guarantee that related data stays consistent across tables. For that we can use foreign keys.

For example, a foreign key can ensure that every sales record points to an existing customer. Here, the foreign key maps the CustomerID in the salesinfo table to the CustomerID in the customerinfo table. Any record without a valid CustomerID will not be inserted into or updated in salesinfo.

 
CREATE TABLE customerinfo (CustomerID INT NOT NULL, PRIMARY KEY (CustomerID)) ENGINE=InnoDB;
CREATE TABLE salesinfo (SalesID INT NOT NULL, CustomerID INT NOT NULL, PRIMARY KEY (CustomerID, SalesID),
FOREIGN KEY (CustomerID) REFERENCES customerinfo (CustomerID) ON DELETE CASCADE) ENGINE=InnoDB;

Note the "ON DELETE CASCADE" clause in the example. It guarantees that when a customer record in the customerinfo table is deleted, all records for that customer in the salesinfo table are deleted automatically as well.
To use foreign keys in MySQL, the tables must use the transaction-safe InnoDB storage engine. InnoDB was not the default engine in older MySQL versions (it has been the default since MySQL 5.5), so the safe approach is to add ENGINE=InnoDB to the CREATE TABLE statement, as in the example. Old versions spelled this option TYPE=INNODB, which current versions no longer accept.
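Both effects of the foreign key, rejecting orphan rows and cascading deletes, can be observed with a short script. This is a sketch that uses Python's sqlite3 module as a stand-in for MySQL with InnoDB; the PRAGMA line is specific to SQLite (which leaves foreign key enforcement off by default) and has no counterpart in the MySQL example.

```python
import sqlite3

# SQLite stands in for MySQL/InnoDB so the example runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-only: enable foreign key enforcement
conn.execute("CREATE TABLE customerinfo (CustomerID INTEGER NOT NULL PRIMARY KEY)")
conn.execute("""
CREATE TABLE salesinfo (
    SalesID INTEGER NOT NULL,
    CustomerID INTEGER NOT NULL,
    PRIMARY KEY (CustomerID, SalesID),
    FOREIGN KEY (CustomerID) REFERENCES customerinfo (CustomerID) ON DELETE CASCADE
)""")
conn.execute("INSERT INTO customerinfo VALUES (14)")
conn.execute("INSERT INTO salesinfo VALUES (1, 14)")
conn.commit()

# A sale that points to a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO salesinfo VALUES (2, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
conn.rollback()

# Deleting the customer removes that customer's sales rows as well.
conn.execute("DELETE FROM customerinfo WHERE CustomerID = 14")
conn.commit()
remaining_sales = conn.execute("SELECT COUNT(*) FROM salesinfo").fetchone()[0]
print(orphan_rejected, remaining_sales)
```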

Seven, use the index

An index works like the A, B, C... tabs of a dictionary: once data is indexed, we can jump to an entry by its "initial letter" instead of searching through every item.
Indexes improve database performance most noticeably when the query contains MAX(), MIN() or ORDER BY.

Which fields should be indexed?

Generally speaking, indexes should be built on fields used in JOIN conditions, WHERE conditions and ORDER BY sorting.
Note: try not to index a field that contains many repeated values. An ENUM field, for example, is very likely to contain a large number of duplicate values.
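Whether a query actually uses an index can be checked from the query plan. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in (MySQL offers EXPLAIN for the same purpose, with different output); the Amount column, the plan helper and the index name idx_salesinfo_customer are assumptions made for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; MySQL's EXPLAIN output looks different.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salesinfo ("
    "SalesID INTEGER PRIMARY KEY, "
    "CustomerID INTEGER NOT NULL, "
    "Amount INTEGER)")


def plan(sql):
    """Return the plan description text that SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT SalesID, Amount FROM salesinfo WHERE CustomerID = 14"

plan_before = plan(query)  # no index on CustomerID yet: the table is scanned
conn.execute("CREATE INDEX idx_salesinfo_customer ON salesinfo (CustomerID)")
plan_after = plan(query)   # the WHERE condition can now use the new index

print(plan_before)
print(plan_after)
```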

The detailed operation of indexes and the difference between single-column and composite indexes are covered in the following link (a frequent topic in programmer interviews):
https://blog.csdn.net/S_ZaiJiangHu/article/details/114420976

Eight, optimize query statements

In most cases, indexes speed up queries, but if the SQL statement is written carelessly, the index cannot do its job.

The following are some aspects that should be paid attention to.

First of all, it is best to compare fields of the same type.

Before MySQL 3.23, this was even a necessary condition. For example, an indexed INT field cannot be compared with a BIGINT field; however, as a special case, when the field size of a CHAR type field and a VARCHAR type field are the same, they can be compared.

Second, try not to use functions for operations on indexed fields.

For example, when the YEAR() function is applied to an indexed DATE field, the index cannot be used as intended. A condition such as YEAR(OrderDate) < 2001 and the equivalent range condition OrderDate < '2001-01-01' return the same results, but the range condition is much faster because it compares the indexed field directly.
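The difference between the two kinds of condition can be observed in a query plan. This is a sketch using Python's sqlite3 module as a stand-in for MySQL: SQLite has no YEAR() function, so strftime('%Y', ...) plays its role here, and the orders table, its rows and the index name idx_orders_date are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; strftime('%Y', ...) replaces MySQL's YEAR().
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (OrderID INTEGER PRIMARY KEY, OrderDate TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_orders_date ON orders (OrderDate)")
conn.executemany(
    "INSERT INTO orders (OrderDate) VALUES (?)",
    [("1999-05-01",), ("2000-12-31",), ("2001-01-01",), ("2003-07-15",)])

# Function applied to the indexed column: every row has to be evaluated.
by_function = (
    "SELECT OrderID FROM orders "
    "WHERE CAST(strftime('%Y', OrderDate) AS INTEGER) < 2001")
# Plain range on the indexed column: the index can narrow the search.
by_range = "SELECT OrderID FROM orders WHERE OrderDate < '2001-01-01'"


def ids(sql):
    return sorted(row[0] for row in conn.execute(sql))


def plan(sql):
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


function_ids, range_ids = ids(by_function), ids(by_range)
function_plan, range_plan = plan(by_function), plan(by_range)
print(function_ids, range_ids)
print(function_plan)
print(range_plan)
```

In SQLite's plan output, SCAN means that every row (or every index entry) is visited, while SEARCH means that the index is used to jump to the matching range.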

Third, when searching character fields, we sometimes use the LIKE keyword and wildcards. This approach is simple, but it can come at the expense of system performance.
For example, the following query may compare every record in the table if the optimizer cannot use an index for the pattern (a pattern that starts with a wildcard, such as "%MySQL", never can).

SELECT * FROM books
WHERE name LIKE "MySQL%"

If you switch to the following range query, the result is the same, and the comparison can use an index on name directly:

SELECT * FROM books
WHERE name >= "MySQL" AND name < "MySQM"
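That the prefix pattern and the range select the same rows can be checked on sample data. This is a sketch using Python's sqlite3 module as a stand-in for MySQL; the book titles are made up, and all of them use the same letter case because LIKE and range comparisons can treat case differently depending on the collation.

```python
import sqlite3

# SQLite stands in for MySQL; the titles are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO books VALUES (?)",
    [("MySQL Basics",), ("MySQL Tuning",), ("MySQM Oddity",),
     ("Learning SQL",), ("MySQ",)])

# Prefix match with LIKE.
like_rows = sorted(row[0] for row in conn.execute(
    "SELECT name FROM books WHERE name LIKE 'MySQL%'"))

# Same prefix expressed as a half-open range: from 'MySQL' up to, but not including, 'MySQM'.
range_rows = sorted(row[0] for row in conn.execute(
    "SELECT name FROM books WHERE name >= 'MySQL' AND name < 'MySQM'"))

print(like_rows)
print(range_rows)
```

The upper bound is built by incrementing the last character of the prefix ('L' becomes 'M'), so every string that starts with "MySQL" sorts below it.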

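Finally, take care not to let MySQL perform automatic type conversion in a query, because the conversion can also prevent the index from being used. For example, comparing a string column with a numeric literal forces MySQL to convert the column values before comparing them.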