On MySQL Database Optimization

A mature database schema is not born highly available and highly scalable; those traits emerge gradually as the user base grows and the infrastructure improves. This post covers the problems a MySQL database faces over the development cycle and how to solve them (front-end applications are set aside for now), roughly divided into the following five stages:

 

1. Database table design

Once a project is approved, the development team builds it against the product requirements, and part of the engineers' work is table structure design. For the database this is critical: a poor design directly hurts access speed and user experience, showing up as slow queries, inefficient queries missing proper indexes, deadlocks, and so on. Teams with test engineers will run stress tests and hunt for bugs; on teams without them, most engineers give little thought early on to whether the schema is reasonable, focusing instead on finishing and delivering. Once the project picks up a certain amount of traffic, the hidden problems surface quickly, and by then they are not so easy to fix.

 

2. Database deployment

  Now the operations engineers step in. Early in a project there will not be much traffic, so a single deployment is sufficient, handling roughly 1500 QPS (queries per second). For high availability, you can set up MySQL master-slave replication plus a Keepalived dual-node hot standby; common cluster software includes Keepalived and Heartbeat.
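A minimal sketch of the Keepalived side of such a hot-standby pair is shown below. The interface name eth0, the router id, and the VIP 192.168.1.100 are placeholder assumptions; the standby node would use `state BACKUP` and a lower priority:

```
vrrp_instance VI_1 {
    state MASTER              # the standby node uses BACKUP
    interface eth0            # assumed NIC name
    virtual_router_id 51
    priority 100              # standby uses a lower value, e.g. 90
    advert_int 1
    virtual_ipaddress {
        192.168.1.100         # VIP the application connects to
    }
}
```

Applications connect to the VIP; when the master fails, Keepalived moves the VIP to the standby node.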

 

3. Database performance optimization

  If MySQL is deployed on an ordinary x86 server, without any tuning it can theoretically handle about 2000 QPS; tuned, that may rise to about 2500 QPS. Otherwise, once traffic reaches about 1500 concurrent connections, database processing slows down even though hardware resources are still plentiful, and it is time to look at the software. So how do you get the most performance out of the database? On one hand, you can run multiple MySQL instances on a single server; on the other, you can tune the database itself: the default configurations of the operating system and the database are usually conservative and cap what the database can do, so adjust them appropriately to handle as many connections as possible.

Specific optimization happens at three levels:

  3.1 Database configuration optimization

  MySQL has two commonly used storage engines. MyISAM does not support transactions, has fast read performance, and uses table-level locking. InnoDB supports transactions (ACID), is designed to get maximum performance out of large data volumes, and uses row-level locking.

  Table lock: low overhead, coarse-grained locking, high probability of lock conflicts, relatively low concurrency.

  Row lock: higher overhead, fine-grained locking, low probability of lock conflicts, relatively high concurrency.

  Why do table and row locks exist? Mainly to guarantee data integrity. For example, if one user is writing to a table and other users want to write to it too, they must queue up until the first operation completes; that waiting is exactly what table and row locks enforce. Otherwise, with multiple users writing to the same table at once, the data would certainly end up conflicting or corrupted.
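As an illustration of row locking (the `accounts` table and its columns are hypothetical), an InnoDB row lock blocks only writers to the locked row, not the whole table:

```sql
-- Session 1: take a row lock on id = 1
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;

-- Session 2: this statement blocks until session 1 commits or rolls back...
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
-- ...but a write to a different row, say id = 2, proceeds immediately.

-- Session 1: release the lock
COMMIT;
```

Under MyISAM's table lock, session 2 would have to wait regardless of which row it touched.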

  Given the above, the InnoDB storage engine is the best choice, and it has been MySQL's default engine since version 5.5. Each storage engine has quite a few related parameters; those with the biggest impact on database performance are listed below.

Common parameters (defaults):
max_connections = 151     # maximum number of simultaneous connections; it is recommended to keep actual connections at about 80% of the configured maximum
sort_buffer_size = 2M     # buffer size for query sorting; only affects ORDER BY and GROUP BY; can be raised to 16M
open_files_limit = 1024   # open-file limit; if the value shown by show global status like 'open_files' reaches or exceeds open_files_limit, the program will fail to connect to the database or hang

  MyISAM parameters (defaults):
key_buffer_size = 16M     # index buffer size; typically set to 30-40% of physical memory
read_buffer_size = 128K   # read buffer size; 16M or 32M is a commonly recommended setting
query_cache_type = ON     # enables the query cache
query_cache_limit = 1M    # query cache limit; only results under 1M are cached, so a single large result set cannot crowd out the whole cache pool
query_cache_size = 16M    # size of the buffer that caches SELECT results; an identical later SELECT is returned straight from the cache pool; this value can be raised appropriately

InnoDB parameters (defaults):
innodb_buffer_pool_size = 128M       # index and data buffer size; usually set to 60-70% of physical memory
innodb_buffer_pool_instances = 1     # number of buffer pool instances; 4 or 8 is recommended
innodb_flush_log_at_trx_commit = 1   # key parameter. 0: write to the log and sync to disk roughly once per second; a crash loses about one second of transaction data. 1: write to the log and sync to disk after every SQL statement; heavy I/O overhead, since each statement waits on the log write, so efficiency is low. 2: write the log only to the system cache and sync to disk once per second; very efficient, and transaction data is lost only if the server itself fails. If data-safety requirements are not strict, 2 is recommended: high performance, and the effect of the change is obvious.
innodb_file_per_table = OFF          # the default is a shared tablespace, whose ibdata file keeps growing and costs some I/O performance. Enabling independent tablespaces is recommended: each table's index and data live in their own tablespace, and a single table can then be moved between databases.
innodb_log_buffer_size = 8M          # log buffer size; since the log is flushed at least once per second, it generally does not need to exceed 16M

3.2 Kernel optimization

  Most MySQL deployments run on Linux, so some operating-system parameters also affect MySQL performance. Below is a reasonable tuning of the Linux kernel (typically placed in /etc/sysctl.conf and applied with sysctl -p).

net.ipv4.tcp_fin_timeout = 30        # TIME_WAIT timeout; the default is 60s
net.ipv4.tcp_tw_reuse = 1            # 1 enables reuse, allowing TIME_WAIT sockets to be reused for new TCP connections; 0 disables it
net.ipv4.tcp_tw_recycle = 1          # 1 enables fast recycling of TIME_WAIT sockets; 0 disables it
net.ipv4.tcp_max_tw_buckets = 4096   # maximum number of TIME_WAIT sockets the system keeps; beyond this, the system randomly clears some and logs a warning
net.ipv4.tcp_max_syn_backlog = 4096  # maximum length of the SYN queue; a longer queue can hold more pending connections

On Linux, if a process opens more file handles than the default limit of 1024, the system reports "too many open files", so the open-file limit should be raised.
# vi /etc/security/limits.conf   # add the configuration below; * means all users (a specific user can be named instead); a reboot is required for it to take effect
* soft nofile 65535
* hard nofile 65535
# ulimit -SHn 65535   # takes effect immediately

3.3 Hardware configuration

Add physical memory to improve file-system performance. The Linux kernel allocates buffer areas (system and data caches) from memory and, through a delayed-write mechanism, holds written data there until a condition is met (for example, the buffers reach a certain percentage, or the sync command is executed) before syncing it to disk. In other words, the more physical memory, the larger the allocated buffers and the more data can be cached. Of course, a server failure loses whatever cached data has not yet been flushed.

Replace SAS disks with SSDs, and adjust the RAID level to RAID 1+0, which has better read/write performance (IOPS) than RAID 1 or RAID 5; after all, the pressure a database generates falls mainly on disk I/O.

 

4. Database schema extension

  As business volume grows, the performance of a single database server can no longer meet business needs; it is time to add machines and build a cluster. The main ideas are to spread the load of the single database, break through the disk I/O bottleneck, keep hot data in a cache, and reduce how often the disk is accessed.

  4.1 Master-slave replication with read/write splitting

Because most database traffic in a production environment is reads, deploy a one-master-multiple-slaves architecture: the master database handles writes (with a dual-node hot standby for availability), and multiple slave databases behind a load balancer handle reads. Mainstream load balancers include LVS, HAProxy, and Nginx.

  How is read/write splitting done? Most companies do it at the code level, which is relatively efficient. The other way is through a proxy; proxies are less common in enterprise applications, but examples include MySQL Proxy and Amoeba. With this kind of database cluster architecture, database capacity under high concurrency rises greatly and the single-server performance bottleneck disappears: if one slave database handles 2000 QPS, then five can handle 10,000 QPS, and slaves are easy to scale out.
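A code-level read/write split can be as small as the routing sketch below. This is a minimal illustration with made-up endpoint strings; a real application would hold live driver connections (e.g. via PyMySQL) instead:

```python
import random

# Hypothetical endpoints; in practice these would be database connections.
MASTER = "master:3306"
SLAVES = ["slave1:3306", "slave2:3306", "slave3:3306"]

def route(sql: str) -> str:
    """Send SELECT statements to a random slave, everything else to the master."""
    if sql.lstrip().upper().startswith("SELECT"):
        return random.choice(SLAVES)
    return MASTER

print(route("SELECT * FROM user WHERE id = 1"))   # one of the slaves
print(route("UPDATE user SET name = 'x' WHERE id = 1"))  # master:3306
```

A production router also has to handle transactions (all statements inside a transaction go to the master) and replication lag, which is why many teams use a library or middleware rather than hand-rolling this.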

  Sometimes a business generates a large number of writes and a single master's write performance cannot meet the need. A naive dual-master setup runs into data inconsistency: two users may update the two masters at the same time, and the conflicting updates leave the databases inconsistent. Within a single instance, MySQL's table and row lock mechanisms keep data intact; how do you get the same guarantee across multiple masters? MySQL-MMM (Master-Master replication manager for MySQL), a master-slave replication management tool written in Perl, answers this: its biggest advantage is that it permits writes on only one master at any given time, which effectively preserves data consistency.

  4.2 Add a cache

  Put a cache system in front of the database and keep hot data in memory; if the requested data is in the cache, the result is returned without querying the database at all, improving read performance. Caches come in local and distributed forms. A local cache keeps data in the application server's memory or in a local file. A distributed cache can hold massive amounts of data and scales well; the mainstream systems are memcached and redis. memcached's performance is stable, its data lives purely in memory, and it is fast, with QPS reaching roughly 80k. If you also want data persistence, choose redis; its performance is not far below memcached's.
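The cache-aside pattern this paragraph describes can be sketched as follows. A plain dict stands in for memcached/redis, and `db_query` is a stand-in for a real database read:

```python
cache = {}  # stands in for memcached/redis

def db_query(user_id):
    """Stand-in for the real (slow) database read."""
    return {"id": user_id, "name": f"user{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:            # cache hit: the database is never touched
        return cache[key]
    row = db_query(user_id)     # cache miss: read from the database...
    cache[key] = row            # ...and populate the cache for next time
    return row

get_user(42)   # miss: goes to the database
get_user(42)   # hit: served from memory
```

A real deployment would also set a TTL on each key and invalidate or update the cache entry when the underlying row is written.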

  Work process: (diagram from the original post is not reproduced here)

  4.3 Splitting into separate databases

  Splitting by database means cutting the related tables of different business lines into different databases, e.g. web, bbs, and blog databases. If traffic is heavy, each split-off database can in turn get its own master-slave architecture, further relieving the pressure on any single database.

  4.4 Splitting into separate tables

  As the amount of data grows day by day, a single table in the database accumulates millions of rows, and inserts and queries take too long. How do you relieve the pressure on a single table? Consider splitting it into multiple small tables to reduce the load on each and improve processing efficiency; this is called table sharding.

  Sharding at the application level is fairly troublesome: the SQL statements in the program must be modified and the extra tables created by hand. Alternatively, the MERGE storage engine can implement sharding and is much simpler. After sharding, the program operates on an umbrella table; that table stores no data itself, only the relationships to the underlying tables and the insert mode, and it routes each query to the right small table according to its conditions, spreading the pressure across the small tables and improving concurrency and disk I/O performance.
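A minimal sketch of the MERGE approach (the table and column names are made up): the underlying MyISAM tables must have identical structure, and the MERGE table unions them:

```sql
CREATE TABLE t1 (id INT, msg VARCHAR(64)) ENGINE=MyISAM;
CREATE TABLE t2 (id INT, msg VARCHAR(64)) ENGINE=MyISAM;

-- The umbrella table stores no rows itself; INSERT_METHOD=LAST sends
-- new rows to the last table in the UNION list.
CREATE TABLE t_all (id INT, msg VARCHAR(64))
    ENGINE=MERGE UNION=(t1, t2) INSERT_METHOD=LAST;
```

Queries against t_all read from both underlying tables; note that MERGE works only with MyISAM tables, not InnoDB.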

  Sharding splits a table either vertically or horizontally:

  Vertical split: split a table with many columns into several tables, solving the table-width problem. Infrequently used columns can go into their own table, large columns can get a table of their own, and closely related columns can be grouped into one table.

  Horizontal split: split the table into multiple tables with identical structure, solving the problem of a single table holding too much data.
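A horizontal split needs a routing rule that maps each row to its sub-table; one common choice is hashing the primary key modulo the number of sub-tables. The table name `user` and the count of 4 below are illustrative assumptions:

```python
N_SHARDS = 4  # number of sub-tables: user_0 .. user_3

def shard_table(user_id: int) -> str:
    """Map a user id to its sub-table by modulo hashing."""
    return f"user_{user_id % N_SHARDS}"

print(shard_table(7))    # user_3
print(shard_table(12))   # user_0
```

The application then substitutes the returned table name into its SQL; the trade-off of modulo hashing is that changing N_SHARDS later requires re-distributing existing rows.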

  4.5 Partitioning

  Partitioning splits one table's data into multiple blocks according to fields in the table structure (by range, list, hash, etc.). These blocks can live in one file on one disk or be spread across files on different disks; on the surface it is still one table, but its data is distributed across multiple locations. This way multiple disks handle different requests simultaneously, improving disk I/O read/write performance, and it is relatively simple to implement.
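A range-partitioning sketch in MySQL's native syntax (the `logs` table and year boundaries are made-up examples): the table still looks like a single table to the application, while each partition is stored and scanned separately:

```sql
CREATE TABLE logs (
    id      INT  NOT NULL,
    created YEAR NOT NULL
)
PARTITION BY RANGE (created) (
    PARTITION p2017 VALUES LESS THAN (2018),
    PARTITION p2018 VALUES LESS THAN (2019),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```

A query filtered on `created` then touches only the matching partition (partition pruning) instead of the whole table.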

 

Note: adding a cache, splitting databases, splitting tables, and partitioning are implemented mainly by the programmers.

 

5. Database maintenance

  Database maintenance is the main work of operations engineers or DBAs, covering performance monitoring, performance analysis, performance tuning, database backup and recovery, and so on.

  5.1 Key performance status indicators

QPS, Queries Per Second: the number of queries the database can handle per second.
TPS, Transactions Per Second: the number of transactions processed per second.
SHOW STATUS reports more than 300 status records; a few of the values let us compute QPS and TPS, as follows:

Uptime: how long the server has been running, in seconds
Questions: number of queries sent to the database
Com_select: number of SELECTs actually executed against the database
Com_insert: number of INSERTs
Com_delete: number of DELETEs
Com_update: number of UPDATEs
Com_commit: number of commits
Com_rollback: number of rollbacks

Now for the calculations. QPS based on Questions:
mysql> show global status like 'Questions';
mysql> show global status like 'Uptime';
QPS = Questions / Uptime

TPS based on Com_commit and Com_rollback:
mysql> show global status like 'Com_commit';
mysql> show global status like 'Com_rollback';
mysql> show global status like 'Uptime';

TPS = (Com_commit + Com_rollback) / Uptime
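Plugging sample numbers into the two formulas above (the counter values here are made up; in practice they come from SHOW GLOBAL STATUS):

```python
# Hypothetical values read from SHOW GLOBAL STATUS
questions    = 8_640_000
com_commit   = 120_000
com_rollback = 9_600
uptime       = 86_400   # seconds, i.e. one day of uptime

qps = questions / uptime
tps = (com_commit + com_rollback) / uptime

print(f"QPS = {qps:.1f}")   # QPS = 100.0
print(f"TPS = {tps:.1f}")   # TPS = 1.5
```

Since Questions and Uptime count from server start, this gives a lifetime average; the interval-based method below gives the current rate.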

Another approach: compute QPS from Com_select, Com_insert, Com_delete, and Com_update:
mysql> show global status where Variable_name in('com_select','com_insert','com_delete','com_update');
Wait one second and run it again; subtract each first value from the corresponding second value to get the interval difference, and the sum is the QPS.

TPS is computed the same way:
mysql> show global status where Variable_name in('com_insert','com_delete','com_update');
For TPS, query operations are excluded; only the insert, delete, and update counts are summed.

Tests of the two methods by readers suggest that when the database holds mostly MyISAM tables, the Questions-based figure is more accurate, while with mostly InnoDB tables, the Com_*-based figure is.

  5.2 Turn on the slow query log

Turn on MySQL's slow query log to find out which SQL statements are slow. Variables set with SET are lost when the service restarts; add the parameters to my.cnf to make them permanent.
mysql> set global slow_query_log=on;   # enable the slow query log
mysql> set global slow_query_log_file='/var/log/mysql/mysql-slow.log';   # slow query log file location
mysql> set global log_queries_not_using_indexes=on;   # also log queries that use no index
mysql> set global long_query_time=1;   # only log queries that take longer than 1s
To analyze the slow query log, MySQL's bundled mysqldumpslow tool produces a fairly simple analysis.
# mysqldumpslow -t 3 /var/log/mysql/mysql-slow.log   # show the three slowest queries

Percona's pt-query-digest tool can also be used; its log analysis is more complete and it can analyze the slow log, binlog, and general log.
Analyze the slow query log: pt-query-digest /var/log/mysql/mysql-slow.log
Analyze the binlog: mysqlbinlog mysql-bin.000001 > mysql-bin.000001.sql
pt-query-digest --type=binlog mysql-bin.000001.sql
Analyze the general log: pt-query-digest --type=genlog localhost.log

  5.3 Database backup

  Backing up the database is the most basic and also the most important task; otherwise the consequences are serious, as you know! Since a database can run to hundreds of gigabytes, backups are often very time-consuming, so an efficient backup strategy must be chosen; for large data volumes, incremental backups are the usual choice. Popular backup tools include mysqldump, mysqlhotcopy, and xtrabackup. mysqldump suits smaller databases: it is a logical backup, so backup and recovery take longer. mysqlhotcopy and xtrabackup are physical backups with fast backup and recovery, taken hot without affecting database service; xtrabackup is recommended, and it supports incremental backups.

  5.4 Database repair

  Sometimes a sudden power loss or abnormal shutdown of the MySQL server corrupts tables and makes their data unreadable. MySQL ships two tools that can repair them: myisamchk and mysqlcheck.

myisamchk: repairs MyISAM tables only; the database must be stopped first.
Common options:
    -f --force    force the repair, overwriting old temporary files; rarely used
    -r --recover  recovery mode
    -q --quick    quick recovery
    -a --analyze  analyze the table
    -o --safe-recover  old recovery mode; worth trying if -r cannot repair the table
    -F --fast     check only tables that were not closed properly

Quick repair of the weibo database:
# cd /var/lib/mysql/weibo
# myisamchk -r -q *.MYI

mysqlcheck: works on both MyISAM and InnoDB tables and does not require stopping the database; to repair a single table, append the table names after the database name, separated by spaces.
Common options:
    -A  --all-databases  check all databases
    -r  --repair   repair tables
    -c  --check    check tables (the default)
    -a  --analyze  analyze tables
    -o  --optimize optimize tables
    -q  --quick    fastest check or repair
    -F  --fast     check only tables that were not closed properly

Quick repair of the weibo database:
    mysqlcheck -r -q -uroot -p123 weibo

 
These are the main MySQL optimization approaches I have summed up from the past few years of use. My abilities are limited, so some points are not fully comprehensive, but they can basically meet the needs of small and medium-size business databases.

Because of constraints baked into the original design of relational databases, some large companies have found that huge data volumes pushed into a relational database can no longer deliver good query and analysis performance. Hence the rise of NoSQL: non-relational databases that handle large data volumes at high performance and make up for certain shortcomings of relational databases. Most companies have gradually moved parts of their business into NoSQL databases such as MongoDB and HBase, store data on distributed file systems such as HDFS and GFS, and analyze massive data with computing frameworks such as Hadoop, Spark, and Storm. These are cutting-edge technologies in operations and are also major learning targets in the storage field; let's work on them together! If readers have better optimizations, you are welcome to share them.

Origin: blog.51cto.com/14013608/2440704