my16: A case of high replication lag caused by a slow sql_thread


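Symptom: the slave had high replication lag. slave status showed the sql_thread applying statements more slowly than the master produces them, so the lag would never come down on its own. The troubleshooting went as follows:
1. Checked the slave's configuration. Its disk write speed is indeed lower than the master's.
2. Ran iostat -m 1 10 to look at disk writes: about 2 MB/s on the slave and only about 3 MB/s on the master. The slave's disk is slower than the master's, but write rates at this level should not be the bottleneck.
3. Changed sync_binlog from 1 to 0, 3, 10, and 100 in turn, with no effect. innodb_flush_log_at_trx_commit was already 2.
4. Increased slave_parallel_workers, with no effect.
5. Reviewed the memory-related parameters and found nothing obviously unreasonable.
6. The last step was going to be parsing the SQL out of the binlog to see which statements and which tables were being applied. Before extracting anything from the binlog, I ran show full processlist and saw a thread in the System lock state.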
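
Steps 3 and 4 above can be sketched roughly as the statements below. This is only an illustration, assuming a MySQL 5.7-style server and an account with sufficient privileges; the worker count of 8 is an arbitrary example value, not the one used in this case.

```sql
-- Step 3: inspect the current durability settings on the slave.
SHOW GLOBAL VARIABLES LIKE 'sync_binlog';
SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

-- sync_binlog is a dynamic global variable, so it can be changed online.
-- 0 leaves binlog flushing to the operating system.
SET GLOBAL sync_binlog = 0;

-- Step 4: a new slave_parallel_workers value only takes effect
-- after the SQL thread has been restarted.
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_workers = 8;  -- assumed example value
START SLAVE SQL_THREAD;
```

Values set with SET GLOBAL are lost on a server restart unless they are also written to the configuration file.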

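I then restarted replication on the slave. This slave is only used for backups and serves no business traffic, so I did not think about locking at the time. In hindsight I should have checked whether any other locks were involved.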
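
The lock check that was skipped might look something like the queries below. They were not run in this case; this is a sketch that assumes MySQL 5.7 or later with InnoDB tables and the sys schema installed.

```sql
-- Open InnoDB transactions, oldest first. A long-running transaction
-- here could be holding locks that the SQL thread is waiting on.
SELECT trx_id, trx_state, trx_started, trx_mysql_thread_id,
       trx_rows_locked, trx_rows_modified, trx_query
FROM information_schema.INNODB_TRX
ORDER BY trx_started;

-- Current InnoDB lock waits: which session is blocked, and by whom.
SELECT wait_started, wait_age, locked_table,
       waiting_pid, waiting_query,
       blocking_pid, blocking_query
FROM sys.innodb_lock_waits;
```

An empty result from both queries would suggest that InnoDB row locks are not what is holding the thread in System lock, although it would not rule out other kinds of waits.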

The fix was to restart replication on the slave.

>show full processlist;
+---------+-------------+-----------+-------------+---------+---------+---------------------------------------------+--------------------------------------------------------------------------------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+---------+-------------+-----------+-------------+---------+---------+---------------------------------------------+--------------------------------------------------------------------------------------------+-----------+---------------+
| 1 | system user | | NULL | Connect | 2698858 | Waiting for master to send event | NULL | 0 | 0 |
| 2 | system user | | NULL | Connect | 0 | Waiting for dependent transaction to commit | NULL | 0 | 0 |
| 3 | system user | | NULL | Connect | 67873 | System lock | UPDATE......... |


stop slave;
start slave;

>show full processlist;
+---------+-------------+-----------+-------------+---------+-------+---------------------------------------------+-----------------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+---------+-------------+-----------+-------------+---------+-------+---------------------------------------------+-----------------------+-----------+---------------+
| 3506710 | root | localhost | ad_dianjing | Query | 0 | starting | show full processlist | 0 | 0 |
| 3508115 | system user | | NULL | Connect | 53 | Waiting for master to send event | NULL | 0 | 0 |
| 3508116 | system user | | NULL | Connect | 0 | Waiting for dependent transaction to commit | NULL | 0 | 0 |
| 3508117 | system user | | NULL | Connect | 56827 | System lock | NULL | 0 | 0 |
| 3508118 | system user | | NULL | Connect | 56828 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 3508119 | system user | | NULL | Connect | 56828 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 3508120 | system user | | NULL | Connect | 56977 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 3508121 | system user | | NULL | Connect | 56981 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 3508122 | system user | | NULL | Connect | 57852 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 3508123 | system user | | NULL | Connect | 57852 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 3508124 | system user | | NULL | Connect | 57852 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 3508125 | system user | | NULL | Connect | 57852 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 3508126 | system user | | NULL | Connect | 57852 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 3508127 | system user | | NULL | Connect | 57852 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 3508128 | system user | | NULL | Connect | 57852 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 3508129 | system user | | NULL | Connect | 57852 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 3508130 | system user | | NULL | Connect | 57852 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 3508131 | system user | | NULL | Connect | 57852 | Waiting for an event from Coordinator | NULL | 0 | 0 |
| 3508132 | system user | | NULL | Connect | 57852 | Waiting for an event from Coordinator | NULL | 0 | 0 |
+---------+-------------+-----------+-------------+---------+-------+---------------------------------------------+-----------------------+-----------+---------------+
19 rows in set (0.00 sec)

Seconds_Behind_Master: 54622
After the restart, Seconds_Behind_Master could be seen dropping by about 100 per second.
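
One way to follow the recovery is to re-run the status statements below every few seconds. This is a sketch assuming the mysql command-line client (for the \G terminator) and MySQL 5.7 or later (for the performance_schema table).

```sql
-- Overall replication state. Watch Seconds_Behind_Master,
-- Slave_SQL_Running and Exec_Master_Log_Pos between runs.
SHOW SLAVE STATUS\G

-- State of each parallel applier worker when slave_parallel_workers > 0.
SELECT WORKER_ID, THREAD_ID, SERVICE_STATE
FROM performance_schema.replication_applier_status_by_worker;
```

If Exec_Master_Log_Pos keeps advancing between runs, the SQL thread is making progress even while Seconds_Behind_Master is still large.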

For an analysis of what causes the System lock state, see the following article:
http://blog.itpub.net/7728585/viewspace-2149659

Reposted from www.cnblogs.com/perfei/p/9647982.html