How to optimize the database

What is a database?

Put simply, a database is software that stores data on disk in an organized way. The SQL statements we write are instructions, in a language the database software understands, to add, delete, modify, and query that data. The data does not really live "in a table"; it lives on disk, and a table is just the name given to a region of that storage.

How to optimize

The underlying logic of database optimization is to reduce the time the CPU spends reading data from and writing data to disk. There are two general ways to reduce that time: increase the speed, or shorten the distance.

1. Improve the reading and writing speed of data:

Change hardware configuration:

You can replace a mechanical hard drive with a faster solid-state drive, and upgrade to a higher-end CPU and higher-frequency memory.

A hardware upgrade alone is not enough. The database configuration file must also be changed so that the database knows about the new hardware and adopts a more aggressive I/O strategy. For MySQL, the following two InnoDB parameters can be used:

innodb_io_capacity and innodb_io_capacity_max control how aggressively InnoDB flushes dirty pages.

If the values are too small, MySQL cannot flush dirty pages fast enough, which hurts performance. If they are too large, MySQL overestimates the I/O capability of the disk, which causes I/O spikes.

The innodb_io_capacity parameter defines the number of I/O operations per second (IOPS) available to InnoDB background tasks, such as flushing dirty pages from the buffer pool and merging data from the change buffer. It therefore determines how many dirty pages are flushed and how many change buffer entries are merged per second. On fast disks, especially now that SSDs are common, this value can be increased as needed.

The innodb_io_capacity_max parameter caps the write I/O per second that MySQL performs when flushing dirty data under pressure. "Under pressure" (called an "emergency" in MySQL) means that flushing has fallen behind and MySQL must flush more aggressively so that new writes can come in. In that situation MySQL uses innodb_io_capacity_max.
So, how should innodb_io_capacity and innodb_io_capacity_max be set?
The best approach is to measure the random write throughput of your storage, then set innodb_io_capacity_max to the maximum IOPS the device can achieve and innodb_io_capacity to 50-75% of that value, especially if the workload is write-heavy.
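The sizing rule above can be sketched as a small calculation. This is only an illustration: the helper names suggest_io_capacity and as_my_cnf and the figure of 4000 IOPS are made up for the example, and the measured IOPS value has to come from your own benchmark.

```python
def suggest_io_capacity(measured_max_iops: int, ratio: float = 0.5) -> dict:
    """Derive both settings from a measured random-write IOPS figure.

    ratio is the fraction of the maximum assigned to innodb_io_capacity;
    the rule of thumb above puts it between 0.5 and 0.75.
    """
    if measured_max_iops <= 0:
        raise ValueError("measured_max_iops must be positive")
    if not 0.5 <= ratio <= 0.75:
        raise ValueError("ratio should stay within the 50-75% range")
    return {
        "innodb_io_capacity": int(measured_max_iops * ratio),
        "innodb_io_capacity_max": measured_max_iops,
    }


def as_my_cnf(settings: dict) -> str:
    """Format the settings as lines for the [mysqld] section of my.cnf."""
    lines = ["[mysqld]"]
    lines += [f"{name} = {value}" for name, value in settings.items()]
    return "\n".join(lines)


# Hypothetical measurement: the device sustains 4000 random-write IOPS.
settings = suggest_io_capacity(4000, ratio=0.5)
print(as_my_cnf(settings))
```

For a write-heavy system the ratio would move toward 0.75; either way the result is a starting point to test, not a final value.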

Another point is to increase the size of the MySQL buffer pool. In many cases MySQL does not read and write disk data directly. It caches disk data in memory through the buffer pool, according to its own algorithm, to make queries faster. It also uses an eviction algorithm to keep hot data in the cache and push cold data out, which keeps the cache hit rate high.
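The idea of a cache with eviction can be sketched with a plain least-recently-used (LRU) cache. This is a deliberately simplified stand-in, not InnoDB's implementation: InnoDB uses a variation of LRU with midpoint insertion, and the class name LRUPageCache and the fake read_from_disk function are assumptions made for the example.

```python
from collections import OrderedDict


class LRUPageCache:
    """Toy page cache: keeps recently used pages, evicts the least recently used."""

    def __init__(self, capacity, read_from_disk):
        self.capacity = capacity
        self.read_from_disk = read_from_disk  # callable: page_id -> data
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        self.misses += 1
        data = self.read_from_disk(page_id)  # slow path: go to disk
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return data


# Hypothetical access pattern against a cache that holds two pages.
cache = LRUPageCache(capacity=2, read_from_disk=lambda page_id: f"page-{page_id}")
for page_id in [1, 2, 1, 3, 1, 2]:
    cache.get(page_id)
print(cache.hits, cache.misses)
```

Raising capacity in this toy model turns more of the misses into hits, which is the effect a larger buffer pool aims for.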

One thing to note is that for these parameters, higher is not always better. Finding a suitable value usually takes repeated testing and adjustment. For example, raising innodb_io_capacity speeds up the flushing of dirty pages, but flushing a page blocks access to that page, and too many unnecessary flushes also add CPU load.

The following lists the common parameters for MySQL optimization configuration:

[Table: common MySQL optimization parameters]

2. Achieve the optimization effect by shortening the query distance:

1. Index

The key to shortening the distance is the index. Establishing a suitable index can significantly improve the query speed.

If you don't know much about indexes, you can read "MySQL index 15 consecutive questions, how many questions can you stick to?" on YuanlongWang's CSDN blog.

Of course, when writing SQL, try to avoid full table scans caused by an index not being used. The cases in which an index becomes ineffective are listed in a table for reference:

[Table: cases in which an index is not used]
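One well-known case, wrapping the indexed column in a function, can be sketched as follows. The sketch uses Python's built-in sqlite3 module as a stand-in so that it runs without a server. SQLite's planner is not MySQL's, and the table users and index idx_users_name are made up for the example. On MySQL, the equivalent check is to run EXPLAIN on the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_users_name ON users (name)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [(f"user{i}", 20 + i % 50) for i in range(1000)],
)


def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


# Plain comparison on the indexed column: the index can be used.
indexed_plan = plan("SELECT * FROM users WHERE name = 'user42'")
# Wrapping the column in a function hides it from the index: full table scan.
scan_plan = plan("SELECT * FROM users WHERE upper(name) = 'USER42'")

print(indexed_plan)
print(scan_plan)
```

The first plan mentions idx_users_name, while the second is a scan of the whole table.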

2. SQL statement optimization

1) Join as little as possible. MySQL's strength is its simplicity, which in some respects is also its weakness. The MySQL optimizer is efficient, but because it works with a limited amount of statistical information, its estimates are more likely to be off. For complex multi-table joins, partly because of these optimizer limitations and partly because less effort has gone into join processing, performance still lags behind established relational databases such as Oracle. For a simple single-table query, however, the gap is very small, and in some scenarios MySQL is even faster.

2) Sort as little as possible. Sorting consumes a lot of CPU, so in scenarios where the cache hit rate is high and I/O capacity is sufficient, reducing sorting can noticeably improve SQL response time.

3) Try to avoid SELECT *, and prefer a join to a subquery.

4) Use the OR keyword as little as possible. When several conditions in the WHERE clause are combined with OR, the MySQL optimizer does not optimize the execution plan well, and MySQL's layered architecture (an SQL layer on top of a storage layer) lowers performance further. In many cases, replacing OR with UNION ALL (or UNION when necessary) gives better results.

6) Try to use UNION ALL instead of UNION. UNION has to combine the two (or more) result sets and then filter out duplicates, which involves sorting, adds a lot of CPU work, and increases resource consumption and latency. So when duplicate rows are impossible, or duplicates do not matter, use UNION ALL instead of UNION.

7) Avoid type conversion

8) Where DISTINCT is sufficient, do not use GROUP BY.

9) Try not to use the SELECT INTO statement 

10) Optimize from a global perspective rather than making one-sided adjustments. A single SQL statement should not be optimized in isolation. Consider all the SQL in the system, especially when changing indexes to improve an execution plan.
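The relationship between OR, UNION, and UNION ALL from the list above can be checked on a small data set. The sketch below again uses Python's sqlite3 module as a stand-in for MySQL, with a made-up orders table. It shows only which rows each form returns, not MySQL's performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO orders (status, city) VALUES (?, ?)",
    [("paid", "Paris"), ("paid", "Rome"), ("new", "Paris"), ("new", "Oslo")],
)

# Original form: two conditions combined with OR.
with_or = conn.execute(
    "SELECT id FROM orders WHERE status = 'paid' OR city = 'Paris' ORDER BY id"
).fetchall()

# UNION removes the duplicate produced by the row that matches both branches.
with_union = conn.execute(
    "SELECT id FROM orders WHERE status = 'paid' "
    "UNION SELECT id FROM orders WHERE city = 'Paris' ORDER BY id"
).fetchall()

# UNION ALL skips the de-duplication step, so the overlapping row appears twice.
with_union_all = conn.execute(
    "SELECT id FROM orders WHERE status = 'paid' "
    "UNION ALL SELECT id FROM orders WHERE city = 'Paris' ORDER BY id"
).fetchall()

print(with_or, with_union, with_union_all)
```

UNION ALL is a drop-in replacement for OR only when the branches cannot match the same row. Here row 1 matches both, so UNION (or an extra condition that makes the branches mutually exclusive) is needed to get the same result as OR.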

3. Table structure optimization

MySQL is a row-oriented database, and it performs I/O in units of pages (blocks). If each record takes less space, each page can hold more rows, so each I/O operation reaches more rows. Conversely, processing the same number of rows touches fewer pages, which means fewer I/O operations and directly better performance.
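The arithmetic behind this can be sketched as follows, assuming InnoDB's default page size of 16 KB and ignoring page headers, per-row overhead, and fill factor. The row sizes of 400 and 200 bytes and the helper name pages_needed are made up for the example.

```python
import math

PAGE_SIZE = 16 * 1024  # InnoDB's default page size in bytes


def pages_needed(row_count: int, row_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Pages required to hold row_count rows, ignoring headers and fill factor."""
    rows_per_page = page_size // row_bytes
    if rows_per_page == 0:
        raise ValueError("row does not fit in one page")
    return math.ceil(row_count / rows_per_page)


# Hypothetical table of one million rows, before and after shrinking the row.
wide = pages_needed(1_000_000, row_bytes=400)
narrow = pages_needed(1_000_000, row_bytes=200)
print(wide, narrow)
```

Halving the row size roughly halves the number of pages a full scan has to read.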

Data type selection

The principle is: keep the length of a data row under roughly 8,000 bytes, which is slightly less than half of the default 16 KB InnoDB page. When a row exceeds this, InnoDB moves long variable-length columns to separate overflow pages, which fragments storage and reduces query efficiency. Provided a field is long enough for every value it may need to hold, it should be set as short as possible, which improves query efficiency and reduces the resources consumed by indexes.

1) Number type: do not use DOUBLE unless it is absolutely necessary. This is a matter not only of storage length but also of precision. Similarly, DECIMAL is not recommended for fixed-precision decimals. Instead, multiply by a fixed factor and store the value as an integer, which saves a lot of storage space without adding maintenance cost.

2) Character type: for fixed-length fields, use the CHAR type (CHAR is fast to query but uses more storage, and suits fields whose length varies little, such as user name and password). For variable-length fields, use VARCHAR (VARCHAR is somewhat slower to query but saves storage, and suits fields whose length varies widely, such as comments). Set an appropriate maximum length rather than an arbitrarily large one, because MySQL handles storage differently for different length ranges.

3) Time type: try to use the TIMESTAMP type, because it needs less storage than DATETIME (4 bytes versus 8 bytes, or 5 bytes for DATETIME since MySQL 5.6.4). For data that only needs to be accurate to the day, use the DATE type, which needs only 3 bytes, less than TIMESTAMP. Storing a Unix timestamp in an INT column is not recommended: it is unintuitive, makes maintenance harder, and brings no benefit.

4) ENUM & SET: for status fields, consider ENUM, because it greatly reduces storage space. Even when a new value is needed, adding it at the end of the list changes the structure without rebuilding the table data.
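The "multiply by a fixed factor and store as an integer" advice for number types can be sketched as follows. The factor of 100 (amounts stored in cents) and the helper names to_stored and from_stored are assumptions for the example. The right factor depends on how many decimal places the data needs.

```python
from decimal import Decimal

SCALE = 100  # hypothetical fixed factor: store amounts in cents


def to_stored(amount: str) -> int:
    """Convert a decimal string such as '19.99' to an integer number of cents."""
    scaled = Decimal(amount) * SCALE
    if scaled != scaled.to_integral_value():
        raise ValueError("amount has more than two decimal places")
    return int(scaled)


def from_stored(value: int) -> Decimal:
    """Convert a stored integer back to a decimal amount for display."""
    return Decimal(value) / SCALE


# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2
# Integer arithmetic on the scaled values stays exact.
int_sum = to_stored("0.10") + to_stored("0.20")
print(float_sum == 0.3, int_sum, from_stored(int_sum))
```

The integer column holds 30, and the application converts it back to 0.3 only when it displays the value.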

Character Encoding

The character set directly determines how data is encoded and stored in MySQL. The same content can occupy very different amounts of space under different character sets, so choosing an appropriate character set reduces the amount of data as much as possible and thereby reduces the number of I/O operations.
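The size difference is easy to see by encoding the same text in several ways. The sketch below uses Python codecs as stand-ins for database character sets, and the sample strings are made up. MySQL's own character sets (for example utf8mb4 or gbk) have their own per-character storage rules, so treat the numbers as illustrative.

```python
samples = ["database", "数据库"]
encodings = ["utf-8", "gbk", "utf-16-le"]

# Byte length of each sample under each encoding.
sizes = {
    (text, encoding): len(text.encode(encoding))
    for text in samples
    for encoding in encodings
}

for (text, encoding), size in sizes.items():
    print(repr(text), encoding, size, "bytes")
```

ASCII text is smallest in UTF-8 or GBK, while the Chinese sample takes 9 bytes in UTF-8 but 6 in GBK, so the best choice depends on what the column actually stores.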

Try to use NOT NULL

NULL is special and hard for SQL to optimize. Unlike Oracle, MySQL does store NULL values in indexes, but in a composite index a nullable field greatly reduces the efficiency of the whole index. NULL may save some space, but it brings many other optimization problems: instead of reducing I/O, it increases the I/O of SQL statements. So try to make sure the DEFAULT value is not NULL, which is also a good table design habit.

4. Database architecture optimization

Distributed and clustered

1) Load balancing. A load-balancing cluster consists of a group of independent computer systems connected through a conventional or dedicated network and a router. The nodes cooperate, share the load, and balance the pressure. To the client, the entire cluster looks like a single standalone server with very high performance. MySQL is generally deployed as a high-availability load-balancing cluster with read-write separation, and usually only reads are load-balanced.

2) Read and write separation. Read-write separation simply means sending read operations and write operations to different database servers, which effectively reduces the pressure on each database and on its I/O. The master database handles writes, and the slave databases handle reads. In many systems, most operations are reads. When the master performs a write, the data must be synchronized to the slaves so that the data stays consistent across the databases.

3) Data segmentation. Based on specific conditions, data that would be stored in one database is spread across multiple databases to achieve distributed storage. Routing rules direct each access to the right database, so requests are served by N servers instead of a single server, which reduces the load on any one machine.
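The routing ideas in the three points above can be sketched in a few lines. This is a toy model under stated assumptions, not a production proxy: the server names, the ReadWriteRouter class, and the shard_for function are made up for the example, and real deployments also have to deal with replication lag, transactions, and failover.

```python
import itertools
import zlib


def shard_for(key: str, shard_count: int) -> int:
    """Data segmentation: map a routing key to a shard index with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


class ReadWriteRouter:
    """Send writes to the master and spread reads across the slaves."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # round-robin load balancing

    def route(self, sql: str) -> str:
        # Classify the statement by its first keyword.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._slaves)


# Hypothetical server names.
router = ReadWriteRouter("master-db", ["slave-1", "slave-2"])
print(router.route("INSERT INTO t VALUES (1)"))
print(router.route("SELECT * FROM t"))
print(router.route("select * from t"))
print(shard_for("user:42", 4))
```

A stable hash such as CRC32 is used for sharding so that the same key always lands on the same shard across processes and restarts.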

Origin blog.csdn.net/lwpoor123/article/details/130220148