Failed on my_net_write()

问题简介

日志报错

2018-07-16T16:48:58.391994+08:00 77 [Note] Aborted connection 77 to db: 'unconnected' user: 'ashe' host: '127.0.0.1' (Failed on my_net_write())

我们知道,mysqld是一个多线程的C/S架构的网络应用,因此少不了通过网络来读写数据,所以可能会出现写数据失败的情况。如果mysql的错误日志中出现此类错误,就说明是mysqld在向客户端发送网络包时失败导致的,当然,引申到复制场景,则说明是复制过程中,master向slave推送binlog时,写网络数据包失败。

演示tcp拥塞的情况

下面来演示一把,主从复制过程中,从机暂停读取网络包导致tcp拥塞的情况

  • 利用gdb断点到从机的read_event函数,此时从机读取网络包将会暂停
    image_1cih49t83eql1o3gpn01cfukeu9.png-368.7kB

  • 主库不停的操作,同时观察tcp链接情况
    image_1cih4efk21kufacm75qfqt1busp.png-839.9kB

  • 查看主库链接,日志
    image_1cih4k5pm10v1hl91dn513i41qs816.png-53.9kB
    image_1cih4kn821gs51k7415hg1ogn1nt01j.png-47.9kB

查看master dump thread逻辑

需要确定master在send binlog失败的情况下退出dump thread的逻辑,根据错误日志提示,进入到相关的代码查看。
错误代码在如下位置

inline int Binlog_sender::send_packet()
{
  DBUG_ENTER("Binlog_sender::send_packet");
  DBUG_PRINT("info",
             ("Sending event of type %s", Log_event::get_type_str(
                (Log_event_type)m_packet.ptr()[1 + EVENT_TYPE_OFFSET])));
  // We should always use the same buffer to guarantee that the reallocation
  // logic is not broken.
  if (DBUG_EVALUATE_IF("simulate_send_error", true,
                       my_net_write(
                         m_thd->get_protocol_classic()->get_net(),
                         (uchar*) m_packet.ptr(), m_packet.length())))
  {
    set_unknow_error("Failed on my_net_write()");
    DBUG_RETURN(1);
  }

调用关系为

(gdb) bt
#0  Binlog_sender::send_packet (this=0x7fea741655d0) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:1158
#1  0x000000000190f74e in Binlog_sender::send_packet_and_flush (this=0x7fea741655d0) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:1182
#2  0x000000000190e181 in Binlog_sender::send_heartbeat_event (this=0x7fea741655d0, log_pos=504) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:1143
#3  0x000000000190ee01 in Binlog_sender::wait_with_heartbeat (this=0x7fea741655d0, log_pos=504) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:633
#4  0x000000000190ecd7 in Binlog_sender::wait_new_events (this=0x7fea741655d0, log_pos=504) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:599
#5  0x000000000190e938 in Binlog_sender::get_binlog_end_pos (this=0x7fea741655d0, log_cache=0x7fea74165020) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:365
#6  0x000000000190c5e0 in Binlog_sender::send_binlog (this=0x7fea741655d0, log_cache=0x7fea74165020, start_pos=123) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:313
#7  0x000000000190c1b4 in Binlog_sender::run (this=0x7fea741655d0) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:225

结果层层返回到Binlog_sender::run
大致看下Binlog_sender::run的逻辑

void Binlog_sender::run()
{
    while (!has_error() && !m_thd->killed)
    {
     if (send_binlog(&log_cache, start_pos))
      break;
    }

}

解释到这里,大概就清楚了吧。

猜你喜欢

转载自blog.csdn.net/sun_ashe/article/details/81068528
今日推荐