MySQL replication delay issues and disk flush strategies

I. MySQL replication process
The official documentation illustrates the process with a flow chart:
[Figure: MySQL replication flow chart]

1. Delay is absolute; synchronization is only relative.

2. For a pure-write workload under a standard production configuration, the slave is under more pressure than the master, because at the very least the slave also has to write the relay log.

II. Analysis of MySQL replication delay

1. Frequent DML requests on the master

Cause: the master writes data concurrently, while the slave applies the log with a single thread, so the relay log easily piles up and causes delay.

Solution: shard the data to spread out the write requests. Consider upgrading to MySQL 5.7+ and enabling parallel replication based on the logical clock.
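As a rough sketch of the second suggestion, the helper below builds the statements that switch a MySQL 5.7 slave to logical-clock parallel replication. slave_parallel_type and slave_parallel_workers are MySQL 5.7 system variables; the helper name parallel_replication_statements and the default of 8 workers are assumptions made for illustration only.

```python
def parallel_replication_statements(workers: int = 8) -> list[str]:
    """Return SQL to run on a MySQL 5.7 slave (hypothetical helper).

    The SQL thread is stopped first because the parallel type cannot be
    changed while the applier is running.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return [
        "STOP SLAVE SQL_THREAD;",
        "SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';",
        f"SET GLOBAL slave_parallel_workers = {workers};",
        "START SLAVE SQL_THREAD;",
    ]


if __name__ == "__main__":
    print("\n".join(parallel_replication_statements()))
```

The right worker count depends on the workload and the hardware, so treat the number as a starting point to tune rather than a recommendation.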

2. Large transactions executed on the master

Cause: suppose the master spends a long time updating a large table. When the slave is configured similarly to the master, it needs almost the same amount of time to apply that update. During that time the delay on the slave starts to pile up, and subsequent events cannot be applied.

Solution: split large transactions and commit promptly.
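Splitting a large transaction usually means doing the work in bounded batches and committing after each one. The sketch below shows the pattern using Python's built-in sqlite3 module as a stand-in for MySQL, so it can run anywhere; the orders table, the cutoff condition, and the batch size are all hypothetical. In MySQL the same loop could use DELETE ... LIMIT directly.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows with id < cutoff in small transactions.

    Returns the number of non-empty batches that were committed.
    """
    batches = 0
    while True:
        cur = conn.execute(
            "DELETE FROM orders WHERE id IN "
            "(SELECT id FROM orders WHERE id < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # each batch is its own short transaction
        if cur.rowcount == 0:
            break
        batches += 1
    return batches


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(5000)])
    conn.commit()
    print(delete_in_batches(conn, cutoff=3500))
```

Each batch becomes a separate, short event group in the binary log, so the slave can apply it quickly and other events are not stuck behind one long-running change.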

3. DDL statements executed on large tables on the master

Cause: either the DDL has not started on the slave because it is blocked, in which case the replication position is seen not to move; or the DDL is executing, in which case the single-threaded apply makes the delay grow while the position still does not move.

Solution: find the queries or writes that are blocking the DDL and kill them so that the DDL can execute normally on the slave. Run DDL during off-peak business hours, and where possible use a newer MySQL version that supports Online DDL.
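The "position does not move" symptom can be checked by comparing two samples of SHOW SLAVE STATUS taken some seconds apart. The sketch below assumes each sample has already been fetched into a dict keyed by the column names of that statement; the function name sql_thread_stalled is hypothetical, and the check only shows that the applier is stuck, not why.

```python
def sql_thread_stalled(prev: dict, curr: dict) -> bool:
    """True if the applied position did not move between two samples
    while the slave has received more than it has applied."""
    applied_prev = (prev["Relay_Master_Log_File"], prev["Exec_Master_Log_Pos"])
    applied_curr = (curr["Relay_Master_Log_File"], curr["Exec_Master_Log_Pos"])
    received_curr = (curr["Master_Log_File"], curr["Read_Master_Log_Pos"])
    return applied_prev == applied_curr and applied_curr != received_curr


if __name__ == "__main__":
    first = {
        "Master_Log_File": "binlog.000012", "Read_Master_Log_Pos": 9000,
        "Relay_Master_Log_File": "binlog.000012", "Exec_Master_Log_Pos": 4200,
    }
    second = dict(first, Read_Master_Log_Pos=15000)
    print(sql_thread_stalled(first, second))
```

When this returns True, the next step is to look at the process list on the slave for the statement the SQL thread is waiting on or executing.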

4. Inconsistent master and slave instance configuration

Cause: on the hardware side, the master instance's server uses SSDs while the slave instance's server uses ordinary SAS disks, or the CPU clock speeds differ. On the configuration side, the RAID card write policies differ, the OS kernel parameters do not match, or the MySQL flush policies (innodb_flush_log_at_trx_commit, sync_binlog, and so on) differ.

Solution: unify the configuration of the DB machines as far as possible, including both hardware and parameters. For OLAP workloads, some setups even give the slave instance better hardware than the master.
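A quick way to spot parameter drift is to compare the relevant variables from both instances. The sketch below assumes the output of SHOW GLOBAL VARIABLES from each server has already been loaded into a dict; the function name config_differences and the short default list of variables are illustrative choices, not a complete checklist.

```python
FLUSH_VARIABLES = (
    "innodb_flush_log_at_trx_commit",
    "sync_binlog",
    "innodb_flush_method",
)


def config_differences(master: dict, slave: dict, names=FLUSH_VARIABLES) -> dict:
    """Map each variable whose value differs to (master_value, slave_value)."""
    return {
        name: (master.get(name), slave.get(name))
        for name in names
        if master.get(name) != slave.get(name)
    }


if __name__ == "__main__":
    master = {"innodb_flush_log_at_trx_commit": "1", "sync_binlog": "1"}
    slave = {"innodb_flush_log_at_trx_commit": "2", "sync_binlog": "1"}
    print(config_differences(master, slave))
```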

5. Excessive load on the slave itself

Cause: a large number of SELECT requests run on the slave, or most of the business's SELECT requests are routed to the slave instance, possibly including heavy OLAP queries, or a backup is running on the slave. Any of these can push CPU load too high and I/O utilization too high, so the SQL thread applies events too slowly.

Solution: build more slaves to spread out the read requests and reduce the load on the existing slave instance.

You can also relieve I/O pressure, and so reduce replication delay, by setting the flush parameters innodb_flush_log_at_trx_commit=0 and sync_binlog=0. This comes at the cost of durability, since a crash can lose the most recent transactions.

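III. High CPU usage during major sales promotions

Symptoms:

High concurrency pushes CPU load too high, request processing time grows, requests gradually back up, and the service eventually becomes unavailable. A large number of slow SQL statements also drives CPU load too high.

Approach:

As a rule, master-slave switchover is forbidden or should be considered very carefully, because it does not solve the root problem. Developers need to cooperate to fix the SQL at its root. Services can also be degraded, and for containers the CPU can be scaled up dynamically. In agreement with the business side, start pt-kill to kill slow read-only SQL. Check whether the slow SQL can be fixed by adding an ordinary index or a composite index, but keep in mind the impact that DDL has on the database at that moment.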
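For the pt-kill step, the sketch below assembles one possible command line. The options used (--match-command, --match-info, --busy-time, --victims, --interval, --print, --kill-query) are pt-kill options from Percona Toolkit; the host name, the 10-second threshold, and the helper name pt_kill_command are assumptions. The regular expression is only a rough filter for SELECT statements and would also match SELECT ... FOR UPDATE, so it is worth reviewing the printed matches before enabling the kill.

```python
import shlex


def pt_kill_command(host, busy_seconds=10, interval=5, dry_run=True):
    """Build a pt-kill argv that targets long-running SELECT queries.

    With dry_run=True the command only prints what it would kill.
    """
    argv = [
        "pt-kill",
        "--host", host,
        "--match-command", "Query",
        "--match-info", r"(?i)^\s*select",
        "--busy-time", str(busy_seconds),
        "--victims", "all",
        "--interval", str(interval),
        "--print",
    ]
    if not dry_run:
        argv.append("--kill-query")  # kill the query, keep the connection
    return argv


if __name__ == "__main__":
    print(shlex.join(pt_kill_command("db.example.internal")))
```

Starting in print-only mode gives the business side a list of statements to sign off on before anything is actually killed.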

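IV. InnoDB flush strategy

The MySQL parameter innodb_flush_method controls how InnoDB opens and flushes its data files and redo log. The documentation describes the parameter as follows:
It has three values: fdatasync (the default), O_DSYNC, and O_DIRECT.
fdatasync, the default, calls fsync() to flush the buffers of both the data files and the redo log.
With O_DSYNC, InnoDB uses O_SYNC to open and flush the redo log, and uses fsync() to flush the data files.
With O_DIRECT, InnoDB uses O_DIRECT to open the data files, and uses fsync() to flush both the data files and the redo log.
First of all, writing a file involves three steps: open, write, and flush.
The fsync(int fd) function mentioned most often above flushes the buffers associated with the file referred to by descriptor fd to disk, and the flush only counts as successful once the metadata (such as the modification time and creation time) has been flushed as well.
Opening the redo file with O_DSYNC means that a log write returns success only after the data has been written to disk and the metadata has been updated.
O_DIRECT means that our writes go from the MySQL InnoDB buffer directly to disk.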
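The three steps map directly onto system calls. Here is a minimal sketch using Python's os module, which wraps the same calls; the function name write_then_fsync and the file name are hypothetical.

```python
import os
import tempfile


def write_then_fsync(path: str, payload: bytes) -> int:
    """Write payload to path and return the number of bytes written."""
    # Step 1, open: get a file descriptor.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Step 2, write: may return once the data is in the OS buffer.
        written = os.write(fd, payload)
        # Step 3, flush: block until data and metadata are on the device.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_then_fsync(os.path.join(tmp, "demo.dat"), b"hello"))
```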

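In detail, the three modes write data as follows:

fdatasync mode: when writing data, the write step does not need to reach the disk to count as complete (it may return as soon as the data is in the operating system buffer). The real completion is the flush step, where the buffer is handed to the operating system to flush, and the file's metadata must also be updated on disk.
O_DSYNC mode: a log write completes at the write step, while the data files are written at the flush step through fsync.
O_DIRECT mode: data file writes go directly from the MySQL InnoDB buffer to disk without passing through the operating system buffer, and the real completion is still at the flush step. The log still goes through the OS buffer.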
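The descriptions above can be condensed into a small lookup table of which open flag each mode adds and whether an explicit fsync follows the write. The sketch below is a simplified model of the text, not InnoDB's actual code; the table FLUSH_METHODS and the function append_log are hypothetical names. Only the redo log path is exercised, because real O_DIRECT I/O needs aligned buffers and is not supported on every filesystem.

```python
import os
import tempfile

O_SYNC = getattr(os, "O_SYNC", 0)      # absent on some platforms
O_DIRECT = getattr(os, "O_DIRECT", 0)  # Linux-specific

# mode -> file type -> (extra open flags, call fsync after write?)
FLUSH_METHODS = {
    "fdatasync": {"data": (0, True), "log": (0, True)},
    "O_DSYNC": {"data": (0, True), "log": (O_SYNC, False)},
    "O_DIRECT": {"data": (O_DIRECT, True), "log": (0, True)},
}


def append_log(path, payload, method):
    """Append payload the way the given mode treats the redo log."""
    extra_flags, needs_fsync = FLUSH_METHODS[method]["log"]
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | extra_flags
    fd = os.open(path, flags, 0o644)
    try:
        os.write(fd, payload)  # with O_SYNC, returns only once data is on disk
        if needs_fsync:
            os.fsync(fd)       # otherwise durability comes from this flush
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for method in FLUSH_METHODS:
            append_log(os.path.join(tmp, "redo.log"), b"record\n", method)
```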

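1. On Unix-like operating systems, opening a file with O_DIRECT minimizes the effect of caching on I/O. I/O on the file operates directly on user-space buffers and is synchronous, so for both the read() and the write() system call the data is guaranteed to be transferred directly to or from the disk. This mode therefore puts the least pressure on I/O, the least load on the CPU, and uses the least physical memory. However, without the operating system buffer, writing data to disk becomes noticeably slower (seen as longer write response times), although the overall SQL request volume does not drop noticeably, which depends on a sufficiently large innodb_buffer_pool_size.

2. O_DSYNC opens the file for synchronous I/O: every write blocks until the data has been written to the physical disk. This lengthens CPU waits, lowers SQL request throughput, and lengthens insert times.

3. The fsync(int filedes) function acts only on the single file specified by the descriptor filedes and waits for the disk write to finish before returning. The fdatasync(int filedes) function is similar to fsync, but it affects only the data portion of the file, whereas fsync also synchronizes the file's metadata to disk.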
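The difference between the two calls in point 3 can be shown with a small wrapper. os.fsync and os.fdatasync are real functions in Python's os module, but os.fdatasync is not available on every platform, so this sketch falls back to fsync when it is missing; the function name flush is hypothetical.

```python
import os
import tempfile


def flush(fd: int, metadata: bool = True) -> str:
    """Flush fd to disk and return the name of the call that was used."""
    if not metadata and hasattr(os, "fdatasync"):
        os.fdatasync(fd)  # file data only
        return "fdatasync"
    os.fsync(fd)          # file data plus metadata
    return "fsync"


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        f.write(b"payload")
        f.flush()  # move Python's own buffer into the OS buffer first
        print(flush(f.fileno(), metadata=False))
```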

O_DSYNC puts the most pressure on the CPU, fdatasync comes next, and O_DIRECT puts the least. In terms of overall SQL processing performance and response time, O_DSYNC is the worst. O_DIRECT does well on SQL throughput (second only to fdatasync mode), but its response time is the longest.

The default fdatasync mode gives the best overall performance because it makes full use of both the operating system buffer and the InnoDB buffer pool. The side effect is that free memory shrinks too quickly, which leads to frequent page swapping and heavy disk I/O pressure, and this can seriously affect stability under a large volume of concurrent writes.


Origin blog.51cto.com/wangwei007/2416148