MySQL Basics (31) Other Database Tuning Strategies

1 Measures for database tuning

1.1 Tuning goals

  • Save system resources as much as possible so that the system can serve a larger load (greater throughput).
  • Use sound structural design and parameter tuning to improve the response speed of user operations (faster response).
  • Reduce system bottlenecks and improve the overall performance of the MySQL database.

1.2 How to locate tuning problems

How do we determine that the database needs tuning? Generally, there are several ways:

  • User feedback (main)
  • Log analysis (main)
  • Server resource usage monitoring
  • Database internal status monitoring
  • Other

In addition to active session monitoring, we can also monitor transactions, lock waits, and so on, which gives us a more comprehensive picture of the running state of the database.
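
As a rough sketch of what monitoring sessions, transactions, and lock waits can look like, the statements below read from standard MySQL metadata. They assume MySQL 5.7 or 8.0 with the sys schema installed and an account allowed to read these views.

```sql
-- Active sessions and the statement each one is running
SHOW FULL PROCESSLIST;

-- Open InnoDB transactions, oldest first
SELECT trx_id, trx_state, trx_started, trx_mysql_thread_id, trx_query
FROM information_schema.INNODB_TRX
ORDER BY trx_started;

-- Sessions waiting for a row lock and the sessions blocking them
SELECT waiting_pid, waiting_query, blocking_pid, blocking_query, wait_age
FROM sys.innodb_lock_waits;
```

A transaction that stays at the top of the second result for a long time, or a blocking_pid that shows up repeatedly in the third, is usually the first thing worth investigating.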

1.3 Dimensions and steps of tuning

The object we need to tune is the entire database management system, which includes not only SQL queries, but also the deployment configuration, architecture, etc. of the database. From this perspective, the dimension of our thinking is not limited to SQL optimization. We sort it out through the following steps:

Step 1: Choose the right DBMS

Step 2: Optimize table design

Step 3: Optimize Logical Query

Step 4: Optimize physical query. Physical query optimization applies physical optimization techniques (such as indexes): once the logical query optimization is settled, the cost model is used to estimate the possible access paths, and the lowest-cost one becomes the execution plan. The key point to master in this part is how to create and use indexes.
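
To make the idea of access paths and cost concrete, here is a minimal sketch using EXPLAIN. The table demo_student and the index idx_demo_classid are hypothetical names introduced only for this illustration.

```sql
-- Hypothetical table used only for this illustration
CREATE TABLE demo_student (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(20) DEFAULT NULL,
  classId INT DEFAULT NULL,
  PRIMARY KEY (id)
);

-- Index on the column used in the WHERE clause
CREATE INDEX idx_demo_classid ON demo_student (classId);

-- Show the access path the optimizer picked (see the type, key and rows columns)
EXPLAIN SELECT id, name FROM demo_student WHERE classId = 3;

-- The same plan in JSON form, which also includes the optimizer's cost estimates
EXPLAIN FORMAT=JSON SELECT id, name FROM demo_student WHERE classId = 3;
```

Comparing the output before and after creating the index shows whether the optimizer considers the indexed access path cheaper than a full table scan.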

Step 5: Use Redis or Memcached as cache.
In addition to optimizing SQL itself, we can also ask for external help to improve query efficiency.

Because the data is stored in the database, we need to retrieve the data from the database layer and put it into the memory for business logic operations. When the number of users increases, frequent data queries will consume a lot of database resources. If we put commonly used data directly into memory, the query efficiency will be greatly improved.

A key-value store can help us solve this problem.

Commonly used key-value storage databases include Redis and Memcached, both of which can store data in memory.

Step 6: Database-level optimization

1. Read/write separation
2. Data sharding

However, it should be noted that while splitting improves database performance, it also increases maintenance and usage costs.

2 Optimize MySQL server

2.1 Optimize server hardware

The hardware performance of the server directly determines the performance of the MySQL database. Hardware bottlenecks directly limit how fast and how efficiently MySQL runs, so upgrading the hardware that is the bottleneck speeds up queries and updates: (1) configure more memory; (2) configure a high-speed disk system; (3) distribute disk I/O sensibly; (4) configure multiple processors.

2.2 Optimize MySQL parameters

  • innodb_buffer_pool_size: This is one of the most important parameters of the MySQL database. It sets the maximum cache size for InnoDB tables and indexes. It caches not only index data but also table data. The larger this value is, the faster queries will be. But if this value is too large, it will affect the performance of the operating system.

  • key_buffer_size: The size of the index buffer (used for MyISAM indexes). The index buffer is shared by all threads. Increasing it gives better index handling (for all reads and multiple writes). Bigger is not always better, though: the right size depends on the amount of memory. If this value is too large, the operating system will page frequently and system performance will drop. For servers with about 4GB of memory, this parameter can be set to 256M or 384M.

  • table_cache: The number of tables that can be open at the same time (named table_open_cache since MySQL 5.1.3). The larger this value is, the more tables can be open at once. The larger the physical memory, the larger the setting can be. The default is 2402, and it is best to adjust it to 512-1024. Bigger is not always better, because too many tables open at the same time will affect the performance of the operating system.

  • query_cache_size: The size of the query cache. The following status values can be observed in the MySQL console. If Qcache_lowmem_prunes is very large, the cache is often too small and query_cache_size should be increased. If Qcache_hits is very large, the query cache is used very frequently; if it is small, the cache hurts efficiency instead and you can consider not using it. If Qcache_free_blocks is very large, the cache is heavily fragmented. The query cache no longer exists as of MySQL 8.0. This parameter needs to be used together with query_cache_type.

  • query_cache_type: When the value is 0, no query uses the query cache. However, query_cache_type=0 does not make MySQL release the cache memory configured by query_cache_size.

    • When query_cache_type=1, all queries use the query cache unless SQL_NO_CACHE is specified in the statement, such as SELECT SQL_NO_CACHE * FROM tbl_name.
    • When query_cache_type=2, a query uses the query cache only if the SQL_CACHE keyword is used in the statement. Using the query cache can speed up queries, but it only suits workloads with few modifications and frequent identical queries.

  • sort_buffer_size: The size of the buffer allocated to each thread that needs to sort. Increasing this value can speed up ORDER BY and GROUP BY operations. The default value is 2097144 bytes (approximately 2MB). For a server with about 4GB of memory, the recommended setting is 6-8M. If there are 100 connections, the total sort buffer actually allocated is 100 × 6 = 600MB.

  • join_buffer_size = 8M: The size of the buffer that join queries can use. As with sort_buffer_size, the memory for this parameter is allocated separately for each connection.

  • read_buffer_size: The size (in bytes) of the buffer allocated for each table scanned when a thread performs a sequential scan. This buffer is needed when the thread reads records sequentially from the table. SET SESSION read_buffer_size=n can temporarily set the value of this parameter. The default is 64K and it can be set to 4M.

  • innodb_flush_log_at_trx_commit: Controls when data in the log buffer is written to the log file and when the log file is flushed to disk. This parameter is very important for the InnoDB engine. It has 3 possible values, 0, 1 and 2, and the default is 1.

    • A value of 0 means that once per second the data is written to the log file and the log file is flushed to disk. A transaction commit does not trigger either of these operations. This mode is the fastest but the least safe: a crash of the mysqld process loses all transaction data from the previous second.
    • A value of 1 means that on every transaction commit the data is written to the log file and the log file is flushed to disk. This mode is the safest but also the slowest, because every transaction commit, or every statement outside a transaction, requires the log to be flushed to disk.
    • A value of 2 means that the data is written to the log file on every transaction commit and the log file is flushed to disk once per second. This mode is fairly fast and safer than 0. Only an operating system crash or a power failure can lose the transaction data from the previous second.

  • innodb_log_buffer_size: The buffer used by the InnoDB storage engine for its transaction log. To improve performance, log information is first written to the InnoDB log buffer. When the condition set by innodb_flush_log_at_trx_commit is met (or the log buffer is full), the log is written to the file (or synchronized to disk).

  • max_connections: The maximum number of connections allowed to the MySQL database. The default value is 151. If the status variable Connection_errors_max_connections is non-zero and keeps growing, connection requests are failing because the number of connections has reached the allowed maximum. In this case, consider increasing max_connections. On Linux, a well-provisioned server can support 500-1000 connections without difficulty, but the value should be evaluated against the server's actual performance. A larger number is not always better, because every connection consumes memory. Too many connections may cause the MySQL server to hang.

  • back_log: Controls the size of the backlog queue that MySQL sets when listening on the TCP port. If the number of MySQL connections reaches max_connections, new requests are placed in this queue to wait for a connection to release its resources. The queue holds back_log requests. If the number of waiting connections exceeds back_log, no connection is granted and an error is reported. The default value before version 5.6.6 is 50, and in later versions it is 50 + (max_connections / 5). For Linux systems, it is recommended to set it to an integer less than 512, and in no case more than 900. If the database must handle a large number of connection requests in a short period of time, consider increasing back_log appropriately.

  • thread_cache_size: The number of threads kept in the thread cache. When a client disconnects, its thread is cached, and a new connection request is served from the cache instead of creating a new thread. This can greatly improve the efficiency of creating connections, especially for applications that use short-lived connections, so increasing this parameter can improve performance. The default is 60 and it can be set to 120.
    The size of the thread cache can be adjusted based on the following MySQL status values:

    mysql> show global status like 'Thread%';
    +-------------------+-------+
    | Variable_name     | Value |
    +-------------------+-------+
    | Threads_cached    | 2     |
    | Threads_connected | 1     |
    | Threads_created   | 3     |
    | Threads_running   | 2     |
    +-------------------+-------+
    4 rows in set (0.01 sec)
    

    When Threads_cached keeps shrinking while Threads_connected never decreases and Threads_created keeps growing, thread_cache_size can be increased appropriately.

  • wait_timeout: Specifies the maximum connection time of a request. For a server with about 4GB of memory, it can be set to 5-10.

  • interactive_timeout: Indicates the number of seconds the server waits for action before closing the connection.
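
The parameters above can be inspected and, in many cases, changed at runtime. The following is a small sketch using standard system and status variables; the value 500 is only an illustrative assumption, not a recommendation.

```sql
-- Current values of two of the parameters discussed above
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'max_connections';

-- Rough worst case for the per-connection buffers, in MB:
-- every allowed connection allocating its sort, join and read buffer at once
SELECT (@@sort_buffer_size + @@join_buffer_size + @@read_buffer_size)
       * @@max_connections / 1024 / 1024 AS per_connection_buffers_mb;

-- Connection attempts refused because max_connections was reached
SHOW GLOBAL STATUS LIKE 'Connection_errors_max_connections';

-- Raise the limit at runtime; the change is lost on restart
-- unless it is also written to my.cnf
SET GLOBAL max_connections = 500;
```

The second query is the same arithmetic as the 100 × 6 = 600MB example for sort_buffer_size, extended to the other per-connection buffers.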

Here is a reference configuration of my.cnf:

[mysqld]
# Written for older MySQL 5.x releases; options such as query_cache_size, table_cache and thread_concurrency are not accepted by MySQL 8.0.
port = 3306
server-id = 1
socket = /tmp/mysql.sock
skip-locking # Avoid MySQL's external locking, reduce the chance of errors and enhance stability.
skip-name-resolve # Prohibit MySQL from performing DNS resolution on external connections. This saves the time MySQL spends on DNS lookups. Note that with this option enabled, all remote host grants must use IP addresses, otherwise MySQL cannot process connection requests normally!
back_log = 384
key_buffer_size = 256M
max_allowed_packet = 4M
thread_stack = 256K
sort_buffer_size = 6M
read_buffer_size = 4M
read_rnd_buffer_size = 16M
join_buffer_size = 8M
myisam_sort_buffer_size = 64M
table_cache = 1024
thread_cache_size = 120
query_cache_size = 32M
tmp_table_size = 64M # The default is 16M; adjust to 64-256M
max_connections = 768
max_connect_errors = 10000000
wait_timeout = 10
thread_concurrency = 8 # The value is the number of logical CPUs in the server × 2. In this example, the server has 2 physical CPUs, each with hyper-threading, so the actual value is 4 × 2 = 8
# skip-networking # Enabling this option completely turns off MySQL's TCP/IP connections. If the web server reaches the MySQL server over a remote connection, do not enable it, otherwise it cannot connect!
innodb_additional_mem_pool_size = 4M # The default is 2M
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size = 2M # The default is 1M
innodb_thread_concurrency = 8 # Set to the number of CPUs in your server. The default value of 8 is recommended

Many of these settings still need to be analyzed case by case!

3 Optimize database structure

3.1 Split table: separation of hot and cold data

Example 1: The members table stores member login authentication information. It has many fields, such as id, name, password, address, phone number, and personal description. Fields such as address, phone number, and personal description are rarely used, so they can be moved into another table named members_detail, which has the fields member_id, address, telephone, and description. The membership table is thus split into two tables, members and members_detail.
The SQL statements to create these two tables are as follows:

CREATE TABLE members (
id int(11) NOT NULL AUTO_INCREMENT,
username varchar(50) DEFAULT NULL,
password varchar(50) DEFAULT NULL,
last_login_time datetime DEFAULT NULL,
last_login_ip varchar(100) DEFAULT NULL,
PRIMARY KEY (id)
);
CREATE TABLE members_detail (
member_id int(11) NOT NULL DEFAULT 0,
address varchar(255) DEFAULT NULL,
telephone varchar(255) DEFAULT NULL,
description text
);

If you need to query a member's basic information or detailed information, you can use the member's ID to query. If you need to display the basic information and detailed information of members at the same time, you can jointly query the members table and members_detail table. The query statement is as follows:

SELECT * FROM members LEFT JOIN members_detail ON members.id = members_detail.member_id;

This decomposition can improve the query efficiency of the table. For tables with many fields and some fields that are used infrequently, database performance can be optimized through this decomposition.

3.2 Add intermediate table

Example 1: The SQL statements that create the student table and the class table are as follows:

CREATE TABLE `class` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`className` VARCHAR(30) DEFAULT NULL,
`address` VARCHAR(40) DEFAULT NULL,
`monitor` INT NULL ,
PRIMARY KEY (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
CREATE TABLE `student` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`stuno` INT NOT NULL ,
`name` VARCHAR(20) DEFAULT NULL,
`age` INT(3) DEFAULT NULL,
`classId` INT(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

Now suppose a module frequently needs to query the student's name (name), the name of the student's class (className), and the class monitor (monitor). For this case we can create a temp_student table that stores the student name (stu_name), class name (className), and class monitor (monitor). The statement to create the table is as follows:

CREATE TABLE `temp_student` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`stu_name` VARCHAR(20) DEFAULT NULL,
`className` VARCHAR(30) DEFAULT NULL,
`monitor` INT(3) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

Next, query the relevant information from the student table and the class table and store it in the temp_student table:

INSERT INTO temp_student (stu_name, className, monitor)
SELECT s.name, c.className, c.monitor
FROM student AS s
JOIN class AS c ON s.classId = c.id;

In the future, the student name, class name and class monitor can be queried directly from the temp_student table without having to perform a joint query every time. This can improve the query speed of the database.

3.3 Add redundant fields

When designing database tables, you should try to follow the normal forms and reduce redundant fields as much as possible, so that the design stays clean and elegant. However, adding redundant fields where it is reasonable can improve query speed.

The higher the degree of normalization of a table, the more relationships there are between tables and the more situations that require join queries. Especially when the amount of data is large and frequent connections are required, in order to improve efficiency, we can also consider adding redundant fields to reduce connections.
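
A minimal sketch of a redundant field, assuming the student and class tables from section 3.2. The extra className column on student is an assumption made for this illustration, not part of the original schema.

```sql
-- Assumed redundant column: a copy of class.className stored on student
ALTER TABLE student ADD COLUMN className VARCHAR(30) DEFAULT NULL;

-- Backfill the copy from the class table
UPDATE student AS s
JOIN class AS c ON s.classId = c.id
SET s.className = c.className;

-- The frequent query no longer needs a join
SELECT name, className FROM student WHERE id = 1;
```

The cost is consistency: whenever class.className changes, the copy on student has to be updated as well, by the application or by a trigger.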

3.4 Optimize data types

Case 1: Optimize on integer type data.

When you encounter an integer field, you can simply use the INT type. The reason is that INT has a large enough value range that you do not need to worry about overflow. At the start of a project the first priority is system stability, so designing field types this way is fine. But when the amount of data is large, the choice of data types affects the overall execution efficiency of the system to a great extent.

For non-negative data (such as auto-increment IDs and integer IPs), prefer the UNSIGNED integer type. With the same number of bytes, an unsigned type can store larger values than a signed one. For example, a signed TINYINT ranges from -128 to 127 and an unsigned one from 0 to 255, which doubles the largest value that can be stored.

Case 2: When either a text type or an integer type would work, choose the integer type.

Compared with text type data, large integers tend to occupy less storage space, so they can occupy less memory space when accessing and comparing. Therefore, when both are available, try to use the integer type, which can improve query efficiency. For example: convert IP address into integer data.
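
As an illustration of storing an IP address as an integer, the sketch below uses MySQL's built-in INET_ATON and INET_NTOA functions. The login_log table is a hypothetical name used only here.

```sql
-- Hypothetical table storing IPv4 addresses as 4-byte unsigned integers
CREATE TABLE login_log (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ip INT UNSIGNED NOT NULL,
  PRIMARY KEY (id)
);

-- INET_ATON converts dotted-quad text to an integer on the way in
INSERT INTO login_log (ip) VALUES (INET_ATON('192.168.1.10'));

-- INET_NTOA converts it back for display
SELECT id, ip, INET_NTOA(ip) AS ip_text FROM login_log;
-- ip = 3232235786, ip_text = '192.168.1.10'
```

The column must be INT UNSIGNED: addresses from 128.0.0.0 upward do not fit in a signed INT.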

Case 3: Avoid using TEXT and BLOB data types

Case 4: Avoid using the ENUM type

Case 5: Using TIMESTAMP to store time

Case 6: Use DECIMAL instead of FLOAT and DOUBLE to store precise floating point numbers

In short, when encountering a project with a large amount of data, you must reasonably optimize the data type on the premise of fully understanding the business needs, so that you can fully utilize the efficiency of resources and achieve the optimal system.

3.5 Optimize the speed of inserting records

1. MyISAM engine table:

① Disable indexing

② Disable uniqueness check

③ Use batch insert

When inserting several records, you can use one INSERT statement per record:

insert into student values(1,'zhangsan',18,1);
insert into student values(2,'lisi',17,1);
insert into student values(3,'wangwu',17,1);
insert into student values(4,'zhaoliu',19,1);

Alternatively, a single INSERT statement can insert multiple records:

insert into student values
(1,'zhangsan',18,1),
(2,'lisi',17,1),
(3,'wangwu',17,1),
(4,'zhaoliu',19,1);

The insertion speed of the second case is faster than that of the first case.

④ Use LOAD DATA INFILE to import in batches
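
A possible shape for a MyISAM bulk load that combines steps ① and ④. This is a sketch under assumptions: student is taken to be a MyISAM table, /tmp/student.csv is a hypothetical comma-separated file, and the account has the FILE privilege with secure_file_priv permitting that path.

```sql
-- Stop maintaining non-unique indexes while the data is loaded
ALTER TABLE student DISABLE KEYS;

-- Bulk-load the file in one statement
LOAD DATA INFILE '/tmp/student.csv'
INTO TABLE student
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';

-- Rebuild the disabled indexes in one pass
ALTER TABLE student ENABLE KEYS;
```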

2. InnoDB engine tables:
① Disable uniqueness check
② Disable foreign key check
③ Disable autocommit
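
For InnoDB, the three steps can be sketched as session settings wrapped around the bulk insert. The inserted rows are made-up sample data and the column list assumes the student table from section 3.2.

```sql
-- Relax checks for this session only
SET unique_checks = 0;
SET foreign_key_checks = 0;
SET autocommit = 0;

-- Sample rows; all of them are committed together
INSERT INTO student (stuno, name, age, classId) VALUES
  (1005, 'sunqi', 18, 2),
  (1006, 'zhouba', 19, 2);
COMMIT;

-- Restore the defaults afterwards
SET autocommit = 1;
SET foreign_key_checks = 1;
SET unique_checks = 1;
```

With the checks off, MySQL no longer protects against duplicate or orphaned rows, so this is only safe when the incoming data is known to be clean.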

3.6 Use non-null constraints

When designing fields, if the business allows, it is recommended to use NOT NULL constraints wherever possible.

3.7 Analyze, check, and optimize tables

1. Analyze tables

MySQL provides the ANALYZE TABLE statement to analyze tables. Its basic syntax is as follows:

ANALYZE [LOCAL | NO_WRITE_TO_BINLOG] TABLE tbl_name [, tbl_name] ...

By default, the MySQL service will write the ANALYZE TABLE statement to the binlog so that the slave service can synchronize data in the master-slave architecture. You can add the parameter LOCAL or NO_WRITE_TO_BINLOG to cancel writing statements to the binlog.

While ANALYZE TABLE is analyzing a table, the database system automatically places a read-only lock on it. During the analysis, records in the table can only be read, not updated or inserted. The ANALYZE TABLE statement can analyze InnoDB and MyISAM tables, but it cannot be applied to views.

The statistics produced by ANALYZE TABLE are reflected in the cardinality value, which counts the number of distinct values in the column of a given key. The closer this value is to the total number of rows in the table, the more likely the optimizer is to choose the index in join queries or index lookups. In other words, the larger the gap between the cardinality of the indexed column and the total row count, the less likely the index is to be used, even when the indexed column appears in the query condition. Cardinality can be viewed with SHOW INDEX FROM table_name.
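
A short sketch of checking cardinality, assuming the student table from section 3.2 exists in the current database:

```sql
-- Refresh the index statistics
ANALYZE TABLE student;

-- The Cardinality column shows the estimated number of distinct values per index
SHOW INDEX FROM student;

-- Total row count to compare the cardinality against
SELECT COUNT(*) AS total_rows FROM student;
```

For InnoDB the cardinality is an estimate based on sampling, so small differences from the exact distinct count are expected.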

2. Check tables

The CHECK TABLE statement can be used in MySQL to check tables. It can check InnoDB and MyISAM tables for errors, and it also places a read-only lock on the table while it runs.

For MyISAM tables, the CHECK TABLE statement also updates key statistics. In addition, CHECK TABLE can check views for errors, such as a table referenced in the view definition that no longer exists. The basic syntax of this statement is as follows:

CHECK TABLE tbl_name [, tbl_name] ... [option] ...
option = {QUICK | FAST | MEDIUM | EXTENDED | CHANGED}

Here tbl_name is the table name, and option takes one of five values: QUICK, FAST, MEDIUM, EXTENDED, and CHANGED. Their meanings are:

QUICK: Do not scan rows and do not check for incorrect links.
FAST: Check only tables that were not closed properly.
CHANGED: Check only tables that have changed since the last check or that were not closed properly.
MEDIUM: Scan rows to verify that deleted links are valid. It also calculates a key checksum for each row and verifies it against the calculated checksum for the keys.
EXTENDED: Do a full key lookup for all keys in each row. This ensures that the table is 100% consistent, but takes a long time.

These options are only effective for MyISAM tables, not for InnoDB tables.

This statement may return several rows of information for each table checked. The last row has a Msg_type value of status, and its Msg_text is normally OK. If it is not OK, the table usually needs to be repaired. A Msg_text of "Table is already up to date" means the storage engine has determined that there is no need to check the table.
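
A minimal usage sketch, assuming the student and class tables from section 3.2:

```sql
-- Check a single table
CHECK TABLE student;

-- Check several tables in one statement
CHECK TABLE student, class;
```

Each checked table contributes its own rows to the result, and the row whose Msg_type is status carries the verdict described above.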

3. Optimize tables

Method 1: OPTIMIZE TABLE

MySQL uses the OPTIMIZE TABLE statement to optimize tables. However, OPTIMIZE TABLE can only optimize VARCHAR, BLOB, or TEXT fields in a table. If a table uses these data types, you should run OPTIMIZE TABLE after deleting a large part of the table, or after making many updates to a table with variable-length rows (a table with VARCHAR, BLOB, or TEXT columns), in order to reclaim unused space and defragment the data file.

The OPTIMIZE TABLE statement works on both InnoDB and MyISAM tables. It also places a read-only lock on the table while it runs.
The basic syntax of the OPTIMIZE TABLE statement is as follows:

OPTIMIZE [LOCAL | NO_WRITE_TO_BINLOG] TABLE tbl_name [, tbl_name] ...

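The LOCAL | NO_WRITE_TO_BINLOG keywords mean the same as for ANALYZE TABLE: the statement is not written to the binary log.
After execution on my server, Msg_text shows:

'numysql.SYS_APP_USER', 'optimize', 'note', 'Table does not support optimize, doing recreate + analyze instead'

The reason is that MySQL on my server uses the InnoDB storage engine.

So was the table actually optimized or not? Check the official documentation: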

https://dev.mysql.com/doc/refman/8.0/en/optimize-table.html

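In MyISAM, the table is analyzed first, then the related MySQL data files are reorganized, and finally the unused space is reclaimed. In InnoDB, space is reclaimed simply by rebuilding the table through ALTER TABLE. During optimization, MySQL creates a temporary table; when optimization finishes, it drops the original table and renames the temporary table to the original name.

Note: In most setups there is no need to run OPTIMIZE TABLE at all. Even after many updates to variable-length rows, it does not need to run often: once a week or once a month is enough, and only on specific tables.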
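
To see whether a table is worth optimizing, one option is to compare its reported free space before and after. This is a sketch assuming the student table lives in the current database.

```sql
-- DATA_FREE is the allocated but unused space, in bytes
SELECT TABLE_NAME, DATA_LENGTH, INDEX_LENGTH, DATA_FREE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'student';

-- Rebuild the table and reclaim the space
OPTIMIZE TABLE student;
```

Running the first query again afterwards shows how much space the rebuild actually recovered. The figures in information_schema are cached estimates, so they may lag slightly behind the real state.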

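3.8 Summary

All of the methods above have advantages and disadvantages. For example:

  • When you change data types to save storage space, you must make sure the data cannot exceed the value range;
  • When you add redundant fields, do not forget to keep the data consistent;
  • Splitting a large table means your queries need additional joins, which adds overhead and maintenance cost.

Therefore, you must weigh these trade-offs against the actual business requirements.

4 Large table optimization

4.1 Limit the range of queries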

Queries without any condition that limits the data range must be prohibited. For example, when users query their order history, we can limit the query to the last month.
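
A sketch of such a range-limited query. The orders table, its columns, and the user ID 1001 are hypothetical and introduced only for this example.

```sql
-- Hypothetical orders table with an index that supports the range query
CREATE TABLE orders (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id INT UNSIGNED NOT NULL,
  total_amount DECIMAL(10,2) NOT NULL,
  created_at DATETIME NOT NULL,
  PRIMARY KEY (id),
  KEY idx_user_created (user_id, created_at)
);

-- Order history for one user, limited to the last month and to one page
SELECT id, user_id, total_amount, created_at
FROM orders
WHERE user_id = 1001
  AND created_at >= NOW() - INTERVAL 1 MONTH
ORDER BY created_at DESC
LIMIT 50;
```

The composite index lets MySQL read only the rows of that user inside the time window instead of scanning the whole table.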

4.2 Read/write separation

In the classic database splitting scheme, the master database is responsible for writing and the slave database is responsible for reading.

  • One master, one slave mode
  • Dual master, dual slave mode

4.3 Vertical split

When the data volume reaches tens of millions of rows or more, we sometimes need to cut a database into several parts and place them on different database servers to reduce the access pressure on any single server.
Advantages of vertical splitting: It makes the column data smaller, reduces the number of blocks read during a query, and reduces the number of I/O operations. In addition, vertical partitioning simplifies the table structure and makes it easier to maintain.
Disadvantages of vertical splitting: The primary key becomes redundant, redundant columns must be managed, and JOIN operations are introduced. In addition, vertical splitting makes transactions more complex.

4.4 Horizontal split

Below are two common solutions for database sharding:
Client-side proxy: The sharding logic lives on the application side, packaged in a jar, and is implemented by modifying or wrapping the JDBC layer. Dangdang's Sharding-JDBC and Alibaba's TDDL are two commonly used implementations.
Middleware proxy: A proxy layer is added between the application and the data, and the sharding logic is maintained centrally in the middleware service. Mycat, 360's Atlas, and NetEase's DDB are all implementations of this architecture.
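
As a rough illustration of what a horizontal split looks like at the table level, the sketch below splits rows across two tables by user_id. The table names and the modulo-2 routing rule are assumptions for this example; in practice the client library or middleware named above applies the routing rule.

```sql
-- Two shards with identical structure
CREATE TABLE orders_0 (
  id BIGINT UNSIGNED NOT NULL,
  user_id INT UNSIGNED NOT NULL,
  total_amount DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (id)
);
CREATE TABLE orders_1 LIKE orders_0;

-- Routing rule assumed here: shard number = user_id MOD 2
SELECT 1001 MOD 2 AS shard_no;   -- 1, so rows of user 1001 belong in orders_1

INSERT INTO orders_1 (id, user_id, total_amount) VALUES (1, 1001, 99.50);
```

Because AUTO_INCREMENT would generate overlapping IDs across shards, the sketch leaves id generation to the application.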

5 Other tuning strategies

5.1 Server statement timeout processing

In MySQL 8.0 you can set a limit on server statement execution time, with millisecond granularity. When a statement runs longer than the configured number of milliseconds, the server terminates it, with little effect on the transaction or the connection, and reports the error to the client.
Setting the server statement timeout limit can be achieved by setting the system variable MAX_EXECUTION_TIME. By default, the value of MAX_EXECUTION_TIME is 0, which means there is no time limit. For example:

SET GLOBAL MAX_EXECUTION_TIME=2000;
SET SESSION MAX_EXECUTION_TIME=2000; # Sets the timeout for SELECT statements in this session
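
The limit can also be applied to a single statement with an optimizer hint instead of a variable. The sketch assumes the student table from section 3.2; the 1000 ms value is illustrative.

```sql
-- Limit only this SELECT to 1000 milliseconds
SELECT /*+ MAX_EXECUTION_TIME(1000) */ id, name
FROM student
WHERE age > 18;

-- Inspect the limits currently in effect (0 means no limit)
SELECT @@GLOBAL.max_execution_time, @@SESSION.max_execution_time;
```

The hint is convenient when only a few known-expensive queries need a cap and the rest of the session should stay unlimited.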

5.2 Create a general tablespace
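
A minimal sketch of creating and using a general tablespace in InnoDB. The names ts_demo and t_demo and the data file name are hypothetical.

```sql
-- Create a general tablespace backed by one data file in the data directory
CREATE TABLESPACE ts_demo ADD DATAFILE 'ts_demo.ibd' ENGINE = InnoDB;

-- Place a new table in that tablespace
CREATE TABLE t_demo (
  id INT NOT NULL,
  PRIMARY KEY (id)
) TABLESPACE ts_demo;
```

Several tables can share one general tablespace, which is the main difference from the default file-per-table layout.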

5.3 New features of MySQL 8.0: Hidden index helps tuning
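
Invisible indexes let you test the effect of dropping an index without actually dropping it. The sketch below assumes MySQL 8.0 and the student table from section 3.2; idx_age is a hypothetical index name.

```sql
-- Hypothetical index whose usefulness is in doubt
ALTER TABLE student ADD INDEX idx_age (age);

-- Hide it from the optimizer; the index is still maintained on writes
ALTER TABLE student ALTER INDEX idx_age INVISIBLE;

-- The plan should no longer use idx_age
EXPLAIN SELECT id, name FROM student WHERE age = 18;

-- Make it visible again if performance suffers
ALTER TABLE student ALTER INDEX idx_age VISIBLE;
```

If queries stay fast while the index is invisible, it can be dropped with much less risk.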

Origin blog.csdn.net/zhufei463738313/article/details/130645797