binlog - the basis for logical replication

Ⅰ. Definition and role of binlog

1.1 Definitions

The binlog records every logical operation on the database (both table structure changes and table data modifications).

It consists of binlog files plus an index file.

1.2 Function

  • Replication: the slave reads the master's binlog and replays it locally to replicate the changes
  • Backup and recovery: the most recent logical backup plus the binlog written since then allows recovery to a point as close to the failure as possible
  • InnoDB recovery: with binlog enabled, an InnoDB transaction commit becomes a two-phase commit. After a crash, InnoDB transactions are found in one of two states, committed or prepared; a prepared transaction is committed or rolled back according to whether it is present in the binlog, which keeps master and slave data consistent

Ⅱ. Comparison of different types of binlog

                statement                                      row                                            mixed
Description     logs the SQL statement of each operation       logs the change made to each row of data       a blend of the two modes
Advantages      easy to read                                   strong data consistency; enables flashback     combines the two modes above
Disadvantages   non-deterministic SQL statements are unsafe    every table must have a primary key            earlier versions had more bugs
Online use      not recommended                                recommended                                    not recommended
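A quick, hedged way to compare the formats is to check the current setting and switch only your own session (a sketch; changing the session value needs the SUPER privilege in 5.7):

show variables like 'binlog_format';        -- current format
set session binlog_format = 'STATEMENT';    -- affects only this session
set session binlog_format = 'ROW';          -- switch back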

More on the row format:

  • Advantages: it records the change to every affected row, guaranteeing strict master-slave data consistency
  • Disadvantages: full-table updates and deletes produce very large binlog files, so such operations are not recommended on MySQL

If you switch the format to statement you will see the SQL statement itself being logged; there is not much more to say about it, since it is basically not used in production any more.

When a large amount of data is written, commit takes longer under ROW format, because the binlog also has to be written at that point (the binlog is only written out at commit time).

Suppose a table with several million rows is updated: the generated binlog may be hundreds of megabytes, and all of it is written at commit, so the commit appears to "hang". In reality it is simply writing the binlog to disk.
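To see this effect yourself, a rough sketch is to compare binlog sizes around a big statement (big_table is a hypothetical table; sizes will vary):

show binary logs;                    -- note File_size of the current binlog
update big_table set c = c + 1;      -- hypothetical full-table update under ROW format
show binary logs;                    -- File_size grows by roughly the size of all the row images written at commit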

Ⅲ. Relevant parameters and usage commands

log_bin=bin              off by default; with the binlog disabled, commit speed is independent of transaction size, as in Oracle
log_bin_basename         sets the binlog base name; if unset it defaults to the host name. log_bin=bin above also makes the binary files start with "bin"
binlog_format            used to default to statement; a few 5.6 point releases defaulted to mixed; since 5.7 the default is row
max_binlog_size          limits the size of a single binlog file, 1G by default
binlog_do_db
binlog_ignore_db         binlog filtering
sync_binlog              0 by default on older versions (1 since 5.7.7); with 0, writes to the binlog file are not persisted to disk immediately but left to the operating system, so a system crash can lose or corrupt the binlog. Setting it to 1 is recommended: the transaction is fsynced to disk right after being written to the binlog
flush binary logs;       generates a new binlog file
show master status;      shows the current binlog
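Putting the parameters above together, a minimal my.cnf sketch might look like this (illustrative values, not a one-size-fits-all recommendation):

[mysqld]
log_bin          = bin        # files will be named bin.000001, bin.000002, ... plus bin.index
binlog_format    = row
max_binlog_size  = 1G
sync_binlog      = 1          # fsync the binlog at every transaction commit
expire_logs_days = 7          # automatic cleanup, see the purge section later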

tips:

① What happens when bin.999999 is used up? The sequence number simply grows an extra digit in front (bin.1000000).

② A binlog file can end up larger than max_binlog_size, because all events of a single transaction must be written to the same binlog file.

Ⅳ. Content of binlog

4.1 index file

Records, in order, all the binlog files currently used by the MySQL server.

Do not modify the index file while MySQL is running, to avoid problems.
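The index file is plain text, so you can simply look at it (assuming log_bin=bin as above it is usually bin.index in the datadir; the content below is illustrative):

[root@VM_0_5_centos data]# cat bin.index
./bin.000001
./bin.000002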

4.2 binlog file

Run show binlog events in 'xxx'; to view the contents of a binlog file.

If no file is specified, the first binlog file is shown by default.

(root@localhost) [test]> show binlog events;
+------------+------+----------------+-----------+-------------+--------------------------------------------------+
| Log_name   | Pos  | Event_type     | Server_id | End_log_pos | Info                                             |
+------------+------+----------------+-----------+-------------+--------------------------------------------------+
| bin.000001 |    4 | Format_desc    |         3 |         123 | Server ver: 5.7.18-log, Binlog ver: 4            |
| bin.000001 |  123 | Previous_gtids |         3 |         154 |                                                  |
| bin.000001 |  154 | Anonymous_Gtid |         3 |         219 | SET @@SESSION.GTID_NEXT= 'ANONYMOUS'             |
| bin.000001 |  219 | Query          |         3 |         313 | create database test                             |
| bin.000001 |  313 | Anonymous_Gtid |         3 |         378 | SET @@SESSION.GTID_NEXT= 'ANONYMOUS'             |
| bin.000001 |  378 | Query          |         3 |         474 | use `test`; create table a (a int)               |
| bin.000001 |  474 | Anonymous_Gtid |         3 |         539 | SET @@SESSION.GTID_NEXT= 'ANONYMOUS'             |
| bin.000001 |  539 | Query          |         3 |         649 | use `test`; create table b (b int) engine=myisam |
| bin.000001 |  649 | Anonymous_Gtid |         3 |         714 | SET @@SESSION.GTID_NEXT= 'ANONYMOUS'             |
| bin.000001 |  714 | Query          |         3 |         786 | BEGIN                                            |
| bin.000001 |  786 | Table_map      |         3 |         830 | table_id: 219 (test.a)                           |
| bin.000001 |  830 | Write_rows     |         3 |         870 | table_id: 219 flags: STMT_END_F                  |
| bin.000001 |  870 | Xid            |         3 |         901 | COMMIT /* xid=18 */                              |
| bin.000001 |  901 | Anonymous_Gtid |         3 |         966 | SET @@SESSION.GTID_NEXT= 'ANONYMOUS'             |
| bin.000001 |  966 | Query          |         3 |        1038 | BEGIN                                            |
| bin.000001 | 1038 | Table_map      |         3 |        1082 | table_id: 219 (test.a)                           |
| bin.000001 | 1082 | Update_rows    |         3 |        1128 | table_id: 219 flags: STMT_END_F                  |
| bin.000001 | 1128 | Xid            |         3 |        1159 | COMMIT /* xid=21 */                              |
| bin.000001 | 1159 | Anonymous_Gtid |         3 |        1224 | SET @@SESSION.GTID_NEXT= 'ANONYMOUS'             |
| bin.000001 | 1224 | Query          |         3 |        1296 | BEGIN                                            |
| bin.000001 | 1296 | Table_map      |         3 |        1340 | table_id: 219 (test.a)                           |
| bin.000001 | 1340 | Delete_rows    |         3 |        1380 | table_id: 219 flags: STMT_END_F                  |
| bin.000001 | 1380 | Xid            |         3 |        1411 | COMMIT /* xid=22 */                              |
| bin.000001 | 1411 | Rotate         |         3 |        1452 | bin.000002;pos=4                                 |
+------------+------+----------------+-----------+-------------+--------------------------------------------------+
24 rows in set (0.00 sec)

As you can see, a binlog is made up of events. The following sections analyse the event-related fields.

field              meaning
(Log_name, Pos)    where an event starts
End_log_pos        where an event ends
Event_type         the type of the event

① End_log_pos − Pos = the number of bytes occupied by the event

② In show master status;, Position is the offset up to which the binlog has been written, i.e. how many bytes have been written, i.e. the current size of the binlog file

③ The first four bytes of every binlog file are reserved for the file header (the magic number); no event data is written there
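Point ② is easy to verify; a small sketch (file name and size are illustrative):

show master status;                 -- e.g. File: bin.000002, Position: 154
-- on the OS side, ls -l bin.000002 should then show a file of about 154 bytes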

4.3 Event Type Analysis

Event_type                        meaning
Format_desc                       the first event of every binlog file; records the server version and binlog format version (a fixed 119 bytes in 5.7)
Previous_gtids / Anonymous_Gtid   GTID-related events (Anonymous_Gtid appears as of 5.7)
Query                             a statement event, e.g. BEGIN or a DDL statement
Table_map                         which database/table the following row events apply to
Write_rows                        insert of a row (the row content itself is not directly readable here)
Delete_rows                       delete of a row
Update_rows                       update of a row
Xid                               transaction commit; the transaction number is visible
Rotate                            end of a binlog file, pointing to the start position of the next file (bin.xxx;pos=4)

Again, the row format records what happened to each affected row (every row of every operation), not the SQL statement.

Ⅴ. Use of the mysqlbinlog tool

5.1 Parsing binlog

1. [root@VM_0_5_centos src]# mysqlbinlog bin.000001
An excerpt:
# at 1224
#171107 10:17:31 server id 3  end_log_pos 1296 CRC32 0xd4d80fa6     Query   thread_id=3 exec_time=0 error_code=0
SET TIMESTAMP=1510021051/*!*/;
BEGIN
/*!*/;
# at 1296
#171107 10:17:31 server id 3  end_log_pos 1340 CRC32 0x73b187fa     Table_map: `test`.`a` mapped to number 219
# at 1340
#171107 10:17:31 server id 3  end_log_pos 1380 CRC32 0x2e637fcd     Delete_rows: table id 219 flags: STMT_END_F

BINLOG '
uxcBWhMDAAAALAAAADwFAAAAANsAAAAAAAEABHRlc3QAAWEAAQMAAfqHsXM=
uxcBWiADAAAAKAAAAGQFAAAAANsAAAAAAAEAAgAB//4CAAAAzX9jLg==
'/*!*/;
# at 1380
#171107 10:17:31 server id 3  end_log_pos 1411 CRC32 0x2a6353fd     Xid = 22
COMMIT/*!*/;

The "# at xxx" markers in this output match what show binlog events showed earlier, but the DML content is a bit hard to read: to make it easy to transmit, the content of each decoded row is base64-encoded.

tips:
mysqlbinlog --base64-output=never xxx    with non-row formats this shows only the DDL; the base64-encoded DML is not printed

2. [root@VM_0_5_centos src]# mysqlbinlog --base64-output=decode-rows -v bin.000001
Under row format this converts the encoded content into pseudo-SQL.
Again, an excerpt:
# at 966
#171107 10:17:23 server id 3  end_log_pos 1038 CRC32 0x00be64e0     Query   thread_id=3 exec_time=0 error_code=0
SET TIMESTAMP=1510021043/*!*/;
BEGIN
/*!*/;
# at 1038
#171107 10:17:23 server id 3  end_log_pos 1082 CRC32 0x5286fd55     Table_map: `test`.`a` mapped to number 219
# at 1082
#171107 10:17:23 server id 3  end_log_pos 1128 CRC32 0x1ed2714c     Update_rows: table id 219 flags: STMT_END_F
### UPDATE `test`.`a`
### WHERE
###   @1=1
### SET
###   @1=2
# at 1128
#171107 10:17:23 server id 3  end_log_pos 1159 CRC32 0xa254d40a     Xid = 21
COMMIT/*!*/;
# at 1159
#171107 10:17:31 server id 3  end_log_pos 1224 CRC32 0x76a7413c     Anonymous_GTID  last_committed=5    sequence_number=6
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;

What you see is the content of each affected row; @n stands for the n-th column.
Keep in mind that this output is absolutely not the original SQL statement: it only describes the content of each row, not the SQL that produced it.

tips:
-vv shows even more detail, such as each column's type and attributes; usually a single -v is enough
insert and delete log the whole row
update logs both the before image and the after image, which is why a full-table update makes the binlog especially large

question:

With binlog_format=row you only see what changed, not the SQL statement that changed it. What can you do?

solution:

Enable the parameter binlog_rows_query_log_events=1 (recommended).

Look at the binlog events again: there will now be a Rows_query event recording the SQL statement that produced the row changes.
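A hedged sketch of enabling it and checking the result (table and file names reuse the earlier examples):

set session binlog_rows_query_log_events = ON;   -- also settable globally or in my.cnf
update test.a set a = 3 where a = 2;             -- any row change made after enabling it
show binlog events in 'bin.000002';              -- now includes a Rows_query event carrying the SQL text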

5.2 Common parameters

  • Parse by point in time

    --start-datetime='xxx-xx-xx xx:xx:xx'
    --stop-datetime='xxx-xx-xx xx:xx:xx'
  • Parse based on binary offset

--start-position=xxx
tips:
This parses starting from offset xxx. What if you start from xxx+1? You get an error, because what is read from there is not a complete event; starting from xxx-1 fails too. Every event has a header, and reading from anything other than an exact event boundary raises an error.

ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 16640, event_type: 90
ERROR: Could not read entry at offset 1158: Error in log format or read error.

--stop-position=xxx
    stops at xxx, not including the event at xxx
    special case: if it points at a Table_map event, a warning is thrown:

    WARNING: The range of printed events ends with a row event or a table map event that does not have the STMT_END_F flag set. This might be because the last statement was not fully written to the log, or because you are using a --stop-position or --stop-datetime that refers to an event in the middle of a statement. The event(s) from the partial statement have not been written to output.

In practice you usually narrow down the position with datetime first, and then restore using the exact positions.
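A typical hedged workflow combining the two (times and positions are illustrative):

# step 1: narrow down by time and read the "# at xxx" markers to find exact positions
mysqlbinlog --base64-output=decode-rows -v --start-datetime='2017-11-07 10:00:00' --stop-datetime='2017-11-07 10:20:00' bin.000001 | less
# step 2: replay precisely up to the position found above
mysqlbinlog --stop-position=1159 bin.000001 | mysql -S /tmp/mysql.sock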

Ⅵ. Restoring data with mysqlbinlog

mysqlbinlog binlog.000003 |mysql -S /tmp/mysql.sock -f
-f forces errors to be skipped
To restore only a portion, add --start-position, --start-datetime, etc.

Official documentation:

If there are multiple binary logs, do not restore them one at a time; use the following method instead:

mysqlbinlog binlog.[0-9]* |mysql -u root -p

Restoring them one by one is risky.

Explanation:

If the restore is split into two mysql invocations, it is treated as two separate sessions. If a temporary table happens to be involved, it disappears when the first session exits, and the second session then fails.

Another way:

mysqlbinlog binlog.000001 > /tmp/statements.sql
mysqlbinlog binlog.000002 >> /tmp/statements.sql
mysql -u root -p -e "source /tmp/statements.sql"

Ⅶ. Cleaning up binlog

Here are three ways to clean up binlog:

Method 1: purge
purge binary logs to 'xxx';
removes all binlog files before file xxx
purge binary logs before 'xxx';
removes all binlogs generated before date xxx
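Hedged examples with concrete arguments (file name and date are illustrative):

purge binary logs to 'bin.000005';                -- removes bin.000001 through bin.000004
purge binary logs before '2017-11-01 00:00:00';   -- removes binlogs older than this date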

Method 2: rm
step 1: stop the MySQL service
step 2: rm the binlog files in order
step 3: edit the index file and remove the deleted binlog files from it

Method 3: configure automatic cleanup
[mysqld]
expire_logs_days=N
keeps only N days of binlog; the default value 0 means never delete

How it works:
when the binlog rotates or the MySQL service starts, the index file is scanned to find the first file whose last modification time is within the last N days, and every binlog before that file is deleted

Ⅷ. Other related issues

8.1 How to do incremental backup

MySQL usually does not need incremental backups, except on a stand-alone instance, because replication already applies increments in real time: enable binlog on the slave and back up the slave's binlog (flush binary logs; generates a new file, then archive the previous ones).

Incremental backups are still useful in Oracle; without them, a crash would require redoing all the logs.

8.2 binlog playback in row format

An SQL statement that inserts 3 rows is, in effect, 3 separate inserts: it corresponds to 3 write_rows row images, and replaying them amounts to executing 3 single-row statements.

An SQL statement that deletes 3 rows likewise corresponds to individual delete_rows row images, replayed one by one. On replay, rows are located by primary key; if there is no primary key some other index is used, and if there is no index at all the whole table is scanned.

Suppose the table has 100,000 (10w) rows and no index. Deleting the whole table means that, on replay, every delete_rows row has to be found by scanning the table, roughly O(10w²) work. Because the table keeps shrinking, the total is about 10w + (10w−1) + … + 1 = 10w × (10w + 1) / 2 scans. This is yet another reason why every table should have a primary key: replay with a primary key is much faster, especially for delete and update.

Note: without a primary key, do not count on the hidden row_id: the binlog belongs to the server layer and knows nothing about row_id.

tips:

① MySQL 5.6 introduced the following parameter to choose the scan algorithm, which can partially alleviate the replication delay caused by tables without a primary key. The basic idea is to collect all the before images of a ROWS_EVENT into a hash table and then, during a single full-table scan, look each row up in the hash and update every matching record (see the sketch after these tips).

slave_rows_search_algorithms defaults to 'TABLE_SCAN,INDEX_SCAN'; HASH_SCAN can additionally be configured, but it is not enabled by default and is generally not recommended, because building the hash table is expensive.

② With InnoDB tables, even if another binlog format is configured in my.cnf, the format is forced to row when the READ COMMITTED (rc) isolation level is used.
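A hedged sketch of inspecting and changing the scan algorithm from tip ① (illustrative only; weigh the hash-table cost mentioned above before enabling it):

show global variables like 'slave_rows_search_algorithms';          -- 'TABLE_SCAN,INDEX_SCAN' by default
set global slave_rows_search_algorithms = 'INDEX_SCAN,HASH_SCAN';   -- enables hash_scan on the slave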

8.3 Flashback

The binary log enables a very useful capability: preserving the data needed to flash back changes; in Oracle, undo serves this purpose.

To flash back, an insert event is replayed as a delete, a delete as an insert, and for an update the before and after images are swapped.
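A hedged illustration of that inversion, written as pseudo-SQL against the decoded row events of test.a from earlier (not the output of any real tool):

-- original event:  UPDATE test.a WHERE @1=1 SET @1=2
-- flashback event: UPDATE test.a WHERE @1=2 SET @1=1
-- original event:  DELETE FROM test.a WHERE @1=2
-- flashback event: INSERT INTO test.a VALUES (2)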

Reportedly 8.0 will ship such a tool; for now, many Internet companies have open-sourced their own flashback tools, all of which require binlog_format=row.

8.4 binlog_cache

By default, binlog entries are first written to the binlog_cache.

How the binlog is generated:

step    operation
1       the binlog is written into each session's own file-handle cache, the standard IO cache
2       the binlog is flushed from each session's private cache into the public cache, i.e. the operating system cache
3       the binlog is synced from memory to the file system and persisted

At step 1 the sessions cannot see each other's binlog content.

At step 2 the content becomes visible across sessions.

Until step 3 completes, a machine crash loses the corresponding log.

Special case: a large transaction produces a large binlog, and if it does not fit in the cache it spills to a temporary file on disk.

([email protected]) [(none)]> show global status like 'binlog_cache%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Binlog_cache_disk_use | 0     |   -- number of times a temporary file was needed to hold the binlog (worth monitoring)
| Binlog_cache_use      | 1     |   -- number of times the binlog was written through the cache
+-----------------------+-------+
2 rows in set (0.01 sec)

([email protected]) [(none)]> show variables like 'binlog_cache_size';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| binlog_cache_size | 32768 |
+-------------------+-------+
1 row in set (0.00 sec)

The default is 32K; it is a session-level memory variable, so do not set it too large.
  • In production we generally set sync_binlog to 1, so that at commit the binlog does not linger in the cache but is synced straight to disk, guaranteeing data integrity; read the binlog_cache description above with that in mind.
  • When the cache overflows, the content is first spilled to disk and then written to the binlog again, i.e. two disk writes, which is slower. If Binlog_cache_disk_use keeps growing, consider increasing binlog_cache_size, or check whether the workload contains large transactions (in an OLTP scenario, try to split large transactions into small ones).
