Foreword
MySQL logs mainly include the error log, the query log, the slow query log, transaction logs, and the binary log. The most important of these are the binary log, binlog (archive log), and the transaction logs: redo log and undo log (rollback log).
1、redo log?
The redo log is unique to the InnoDB storage engine, and it gives MySQL its crash recovery capability.
For example, if the MySQL instance hangs or goes down, when restarting, the InnoDB storage engine will use the redo log to restore the data to ensure the persistence and integrity of the data.
MySQL stores data in units of pages. When you query a record, a whole page of data is loaded from disk. The loaded page is called a data page, and it is placed in the Buffer Pool.
Subsequent queries look in the Buffer Pool first and go to disk only on a miss, which reduces disk IO overhead and improves performance.
The same applies when updating table data. If the data to be updated is already in the Buffer Pool, it is updated directly there.
Then "what modification was made on which data page" is recorded in the redo log cache (redo log buffer), and later flushed to the redo log file on disk.
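The read and update path described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the BufferPool class, its method names, and the dict standing in for the disk are all hypothetical and are not InnoDB's data structures.

```python
class BufferPool:
    """Toy model: pages are cached in memory and loaded from 'disk' on a miss."""

    def __init__(self, disk):
        self.disk = disk            # page_no -> rows; stands in for the data files
        self.pages = {}             # data pages currently cached in memory
        self.disk_reads = 0         # how many times a page was loaded from disk
        self.redo_log_buffer = []   # redo records waiting to be flushed

    def get_page(self, page_no):
        if page_no not in self.pages:
            # Miss: load the whole page from disk into the pool.
            self.pages[page_no] = dict(self.disk[page_no])
            self.disk_reads += 1
        return self.pages[page_no]

    def update(self, page_no, key, value):
        page = self.get_page(page_no)
        page[key] = value  # change only the cached copy of the page
        # Record "what was modified on which page" in the redo log buffer.
        self.redo_log_buffer.append((page_no, key, value))


disk = {7: {"id=1": "a", "id=2": "b"}}
pool = BufferPool(disk)
pool.get_page(7)             # first access loads the page from disk
pool.get_page(7)             # second access is served from the pool
pool.update(7, "id=2", "B")  # updates the cached page and logs the change

print(pool.disk_reads)        # 1
print(pool.redo_log_buffer)   # [(7, 'id=2', 'B')]
print(disk[7]["id=2"])        # b (the on-disk page has not been flushed yet)
```

Note that after the update the on-disk page is stale; the redo record is what makes the change recoverable before the page itself is flushed.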
Ideally, the flush would happen as soon as the transaction commits, but in practice the flush timing is determined by a configurable strategy.
Tips: Each redo record consists of "table space number + data page number + offset + modified data length + specific modified data"
2. Flush timing
For the redo log, the InnoDB storage engine provides the innodb_flush_log_at_trx_commit parameter to control the flush strategy. It supports three settings:
- 0: no flush to disk is performed when a transaction commits
- 1: a flush to disk is performed every time a transaction commits (default value)
- 2: every time a transaction commits, the contents of the redo log buffer are only written to the page cache
The innodb_flush_log_at_trx_commit parameter defaults to 1, which means that when a transaction commits, fsync is called to flush the redo log to disk.
In addition, the InnoDB storage engine has a background thread that, every 1 second, writes the contents of the redo log buffer to the file system cache (page cache) and then calls fsync to flush them to disk.
In other words, the redo log records of a transaction that has not yet committed may also be flushed to disk.
Why?
Because redo log records are written to the redo log buffer while the transaction is executing, and these records are flushed by the background thread.
Besides the background thread polling once per second, there is one more case: when the space used in the redo log buffer is about to reach half of innodb_log_buffer_size, the background thread proactively flushes to disk.
The following are flow charts of the different flush strategies.
innodb_flush_log_at_trx_commit=0
When set to 0, if MySQL hangs or the machine goes down, up to 1 second of data may be lost.
innodb_flush_log_at_trx_commit=1
When set to 1, as long as the transaction commits successfully, the redo log record is guaranteed to be on disk, and no data is lost.
If MySQL hangs or goes down during transaction execution, that part of the log is lost, but the transaction was never committed, so losing the log loses no data.
innodb_flush_log_at_trx_commit=2
When set to 2, as long as the transaction commits successfully, the contents of the redo log buffer are only written to the file system cache (page cache).
If only MySQL hangs, no data is lost, but if the machine goes down, up to 1 second of data may be lost.
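The three settings can be compared with a small simulation. This is a simplified sketch, not InnoDB's implementation: RedoLogSim and its method names are hypothetical, a MySQL crash is assumed to lose only the redo log buffer, and a machine crash is assumed to lose the page cache as well.

```python
class RedoLogSim:
    """Toy model of three layers: redo log buffer -> page cache -> disk."""

    def __init__(self, flush_at_commit):
        self.mode = flush_at_commit  # 0, 1 or 2, like innodb_flush_log_at_trx_commit
        self.buffer = []             # redo log buffer (MySQL process memory)
        self.page_cache = []         # file system cache (operating system memory)
        self.disk = []               # durable storage

    def write(self):
        # Move buffered records into the file system cache.
        self.page_cache += self.buffer
        self.buffer = []

    def fsync(self):
        # Persist everything in the file system cache to disk.
        self.disk += self.page_cache
        self.page_cache = []

    def commit(self, record):
        self.buffer.append(record)
        if self.mode == 1:
            self.write()
            self.fsync()
        elif self.mode == 2:
            self.write()
        # mode 0: nothing is done at commit time

    def background_tick(self):
        # The once-per-second background thread: write, then fsync.
        self.write()
        self.fsync()

    def survivors(self, crash):
        """Records still recoverable after a 'mysql' or an 'os' crash."""
        if crash == "mysql":
            return self.disk + self.page_cache
        return list(self.disk)


def survivors_after_commit(mode, crash):
    sim = RedoLogSim(mode)
    sim.commit("trx1")  # crash right after commit, before any background tick
    return sim.survivors(crash)


for mode in (0, 1, 2):
    print(mode, survivors_after_commit(mode, "mysql"), survivors_after_commit(mode, "os"))
# 0 [] []
# 1 ['trx1'] ['trx1']
# 2 ['trx1'] []
```

In this model the window of possible loss for settings 0 and 2 closes as soon as background_tick runs, which corresponds to the "up to 1 second" figure in the text.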
3. Log file group?
There is not just one redo log file on disk. The redo log is stored as a log file group, and every redo log file in the group has the same size.
For example, a group can be configured with 4 files of 1GB each, so the whole redo log file group can record 4GB of content.
It works like a circular array: writing starts at the beginning, and on reaching the end it wraps around and continues from the beginning, as shown in the figure below.
There are two important attributes in a log file group: write pos and checkpoint.
- write pos is the current write position; it moves forward as records are written
- checkpoint is the current position to be erased; it also moves forward
Every time redo log records are flushed to the log file group, write pos moves forward.
Every time MySQL loads the log file group to restore data, it clears the redo log records it has loaded and moves checkpoint forward.
The empty part between write pos and checkpoint can be used to write new redo log records.
If write pos catches up with checkpoint, the log file group is full, and no new redo log records can be written. MySQL has to pause, clear some records, and advance checkpoint.
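The circular behavior of write pos and checkpoint can be sketched as a ring with a fixed capacity. The class and method names here are hypothetical, and sizes are abstract units (think of them as GB in the 4 x 1GB example); this is an illustration of the idea, not InnoDB code.

```python
class LogFileGroup:
    """Toy circular log: write_pos chases checkpoint around a fixed-size ring."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.write_pos = 0    # where the next record is written
        self.checkpoint = 0   # where erasing continues from
        self.used = 0         # amount written but not yet erased

    def free_space(self):
        return self.capacity - self.used

    def write(self, size):
        if size > self.free_space():
            # write_pos would catch up with checkpoint: the group is full.
            return False
        self.write_pos = (self.write_pos + size) % self.capacity
        self.used += size
        return True

    def advance_checkpoint(self, size):
        # Erase up to `size` units of old records and move checkpoint forward.
        size = min(size, self.used)
        self.checkpoint = (self.checkpoint + size) % self.capacity
        self.used -= size


group = LogFileGroup(capacity=4)
print(group.write(3))        # True
print(group.write(2))        # False: only 1 unit is free
group.advance_checkpoint(2)  # clear some records first
print(group.write(2))        # True: the write wraps around the end
print(group.write_pos, group.checkpoint)  # 1 2
```

The rejected write corresponds to the case in the text where MySQL must pause and advance checkpoint before new redo log records can be written.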
4. Redo log summary?
Now consider a question: why not simply flush the modified data page to disk every time? What is the redo log for?
Both are flushes to disk, so where is the difference?
1 Byte = 8bit
1 KB = 1024 Byte
1 MB = 1024 KB
1 GB = 1024 MB
1 TB = 1024 GB
In fact, a data page is 16KB in size, and flushing it to disk is time-consuming. Perhaps only a few bytes in the data page were modified; is it really necessary to flush the complete page?
Moreover, flushing data pages is random writing, because the location of a data page may be anywhere in the data file on disk, so performance is very poor.
Writing the redo log is different. One record may take only a few dozen bytes, since it contains just the table space number, data page number, disk file offset, and updated value. Combined with sequential writing, this makes the flush very fast.
Therefore, recording modifications in the redo log performs far better than flushing data pages, which also lets the database handle more concurrency.
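To make the size difference concrete, here is a minimal sketch that packs one redo record and compares it with a 16KB page. The field widths (4-byte table space number, 4-byte page number, 2-byte offset, 2-byte length) are assumptions chosen for illustration; the real InnoDB record encoding is different and more compact.

```python
import struct

PAGE_SIZE = 16 * 1024  # 16KB data page, as stated in the text


def pack_redo_record(space_id, page_no, offset, data):
    # Hypothetical layout: space id, page number, offset, data length, then the data.
    header = struct.pack(">IIHH", space_id, page_no, offset, len(data))
    return header + data


# A change of 4 bytes on page 1024 of table space 5.
record = pack_redo_record(space_id=5, page_no=1024, offset=128, data=b"\x00\x00\x00\x01")

print(len(record))               # 16 bytes for the redo record
print(PAGE_SIZE // len(record))  # 1024 such records fit in the space of one page
```

Under these assumed widths, logging the change costs 16 bytes, while flushing the page would cost 16384 bytes for the same 4-byte modification.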
In fact, the data pages in memory are also flushed to disk at certain times. We call this page merging, and we will cover it in detail when we talk about the Buffer Pool.
5、binlog?
The redo log is a physical log. It records "what modifications were made on which data page" and belongs to the InnoDB storage engine.
The binlog is a logical log. It records the original logic of the statement, something like "add 1 to field c of the row with ID=2", and belongs to the MySQL Server layer.
Regardless of the storage engine used, as long as table data is updated, binlog entries are generated.
So what is the binlog used for?
It is fair to say that data backup, master-standby, master-master, and master-slave setups of a MySQL database all depend on the binlog; they rely on it to synchronize data and ensure data consistency.
The binlog records all logical operations that update data, and it is written sequentially.
6. Record format
The binlog has three formats, which can be specified with the binlog_format parameter.
- statement
- row
- mixed
With statement, the recorded content is the original text of the SQL statement. For example, if update T set update_time=now() where id=1 is executed, the content of the record is as follows.
When synchronizing data, the recorded SQL statement is executed again. The problem is that update_time=now() reads the current system time at that moment, so replaying it directly gives a result inconsistent with the data in the original database.
To solve this problem, we specify row. The recorded content is then no longer just the SQL statement; it also contains the specific data of the operation. The content of the record is as follows.
Content recorded in the row format cannot be read directly; it needs to be parsed with the mysqlbinlog tool.
update_time=now() becomes a concrete time, update_time=1627112756247. The @1, @2, and @3 after the condition are the original values of the first to third fields of that row (assuming the table has only 3 fields).
In this way the consistency of the synchronized data is guaranteed. The format is usually set to row, which makes database recovery and synchronization more reliable.
However, this format needs more capacity to record, so it takes up more space and consumes more IO resources during recovery and synchronization, which affects execution speed.
So there is a compromise, mixed, in which the recorded content is a mixture of the other two.
MySQL judges whether the SQL statement may cause data inconsistency: if so, it uses the row format; otherwise, it uses the statement format.
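The difference between replaying a statement and replaying a row image can be illustrated with a toy model. It follows the simplified description in this section and does not model how MySQL actually replays binlog events; the function names, the tuple shapes, and the fake clock are all hypothetical.

```python
import itertools


def make_clock(start):
    """Fake clock: each call returns the next tick, standing in for now()."""
    counter = itertools.count(start)
    return lambda: next(counter)


def run_update(table, row_id, now):
    """Run 'update T set update_time=now() where id=row_id'; return both log styles."""
    value = now()
    table[row_id] = value
    statement_event = ("update_time=now()", row_id)  # statement: only the SQL logic
    row_event = (row_id, value)                      # row: the concrete value written
    return statement_event, row_event


def replay_statement(table, event, now):
    _, row_id = event
    table[row_id] = now()  # now() is evaluated again at replay time


def replay_row(table, event):
    row_id, value = event
    table[row_id] = value  # the recorded value is applied as-is


primary = {1: 0}
stmt_event, row_event = run_update(primary, 1, make_clock(100))

replica_stmt = {1: 0}
replay_statement(replica_stmt, stmt_event, make_clock(200))  # replayed later

replica_row = {1: 0}
replay_row(replica_row, row_event)

print(primary[1], replica_stmt[1], replica_row[1])  # 100 200 100
```

In this model only the row-style replay reproduces the value stored on the primary, which is the reliability argument made above for the row format.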
7. Writing mechanism
The timing of binlog writes is also simple. During transaction execution, the log is first written to the binlog cache; when the transaction commits, the binlog cache is written to the binlog file.
Because the binlog of a single transaction cannot be split, it must be written in one go no matter how large the transaction is, so the system allocates a block of memory to each thread as its binlog cache.
We can control the size of a single thread's binlog cache through the binlog_cache_size parameter. If the stored content exceeds this size, it is temporarily stored on disk (Swap).
The binlog flushing process is as follows.
- The write in the above figure means writing the log to the page cache of the file system, without persisting the data to disk, so it is relatively fast
- The fsync in the above figure is the operation that persists the data to disk
The timing of write and fsync is controlled by the sync_binlog parameter. The default is 0 in older versions; from MySQL 5.7.7 onward the default is 1.
When it is 0, each transaction commit only performs write, and the system decides when to execute fsync.
This improves performance, but if the machine goes down, the binlog in the page cache is lost.
For safety, it can be set to 1, which means fsync is executed on every transaction commit, just like the redo log flushing process.
Finally, there is a compromise: it can be set to N (N>1), which means every transaction commit performs write, but fsync is executed only after N transactions have accumulated.
In scenarios where IO is the bottleneck, setting sync_binlog to a relatively large value can improve performance.
Similarly, if the machine goes down, the binlog of the most recent N transactions may be lost.
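The effect of different sync_binlog values can be sketched with a counter-based simulation. The BinlogSim class is hypothetical, and for sync_binlog=0 the model assumes the operating system has not yet run fsync on its own at the moment of the crash.

```python
class BinlogSim:
    """Toy model: commit always does write; fsync depends on sync_binlog."""

    def __init__(self, sync_binlog):
        self.sync_binlog = sync_binlog
        self.page_cache = []        # written but not yet persisted
        self.disk = []              # persisted by fsync
        self.unsynced_commits = 0   # commits since the last fsync

    def commit(self, trx):
        self.page_cache.append(trx)  # write: goes to the page cache only
        self.unsynced_commits += 1
        if self.sync_binlog > 0 and self.unsynced_commits >= self.sync_binlog:
            self.fsync()

    def fsync(self):
        self.disk += self.page_cache
        self.page_cache = []
        self.unsynced_commits = 0


def lost_on_machine_crash(sync_binlog, commits):
    """Number of committed transactions whose binlog is lost if the machine goes down."""
    sim = BinlogSim(sync_binlog)
    for i in range(commits):
        sim.commit(f"trx{i}")
    return len(sim.page_cache)


print(lost_on_machine_crash(0, 5))  # 5: nothing was fsynced by MySQL
print(lost_on_machine_crash(1, 5))  # 0: fsync on every commit
print(lost_on_machine_crash(3, 5))  # 2: fsync ran after the third commit
```

With sync_binlog=N the loss in this model is always fewer than N transactions, which is the trade-off between IO cost and safety described above.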
8. Two-phase commit
The redo log gives the InnoDB storage engine its crash recovery capability.
The binlog (archive log) guarantees data consistency in a MySQL cluster architecture.
Although both serve to guarantee persistence, they have different emphases.
While an update statement executes, both the redo log and the binlog are recorded. Taking the transaction as the basic unit, the redo log can be written continuously during transaction execution, while the binlog is written only when the transaction commits, so the two logs are written at different times.
Back to the topic: what happens if the redo log and the binlog are logically inconsistent?
Take an update statement as an example. Assume the value of field c in the record with id=2 is 0, and we update it to 1 with the SQL statement update T set c=1 where id=2.
Suppose the redo log has been written during execution, and then an exception occurs while the binlog is being written. What happens?
Because the binlog write failed before it finished, the binlog contains no record of this modification. When the binlog is later used to restore data, this update is missing, so the restored row has c=0. The original database, however, is recovered from the redo log, so its row has c=1. The final data is inconsistent.
To solve the problem of logical consistency between the two logs, the InnoDB storage engine uses a two-phase commit scheme.
The principle is simple: the redo log write is split into two steps, prepare and commit. This is the two-phase commit.
With two-phase commit, an exception while writing the binlog does no harm: when MySQL restores data from the redo log, it finds that the redo log is still in the prepare stage and there is no corresponding binlog, so it rolls the transaction back.
Now consider another scenario. If an exception occurs while the redo log is being set to commit, will the transaction be rolled back?
No. MySQL executes the logic framed in the figure above: although the redo log is still in the prepare stage, the corresponding binlog can be found through the transaction id, so MySQL considers the transaction complete and commits it to restore the data.
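The recovery decision described above can be written down as a small function. This is a sketch of the rule as stated in this section, with hypothetical function names and step labels; the case where nothing was logged at all is handled as an assumption, since the text does not discuss it.

```python
def run_transaction(crash_at=None):
    """Steps: redo log prepare -> write binlog -> redo log commit.

    Returns (redo_state, binlog_written) as found on disk after an optional crash.
    """
    redo_state, binlog_written = None, False
    for step in ("redo_prepare", "binlog_write", "redo_commit"):
        if step == crash_at:
            break  # the crash happens before this step completes
        if step == "redo_prepare":
            redo_state = "prepare"
        elif step == "binlog_write":
            binlog_written = True
        else:
            redo_state = "commit"
    return redo_state, binlog_written


def recover(redo_state, binlog_written):
    """Decide what crash recovery does with the transaction."""
    if redo_state == "commit":
        return "commit"
    if redo_state == "prepare":
        # Prepared only: the binlog decides the outcome.
        return "commit" if binlog_written else "rollback"
    # Assumption: with no redo record there is nothing to redo, so the change is discarded.
    return "rollback"


print(recover(*run_transaction(crash_at="binlog_write")))  # rollback
print(recover(*run_transaction(crash_at="redo_commit")))   # commit
print(recover(*run_transaction()))                         # commit
```

In both crash cases the redo log and the binlog end up agreeing: either both reflect the update or neither does.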
9、undo log
We know that if we want to ensure the atomicity of transactions, we need to roll back the operations that have been performed when an exception occurs .
In MySQL, this recovery mechanism is implemented through the rollback log (undo log). All modifications made by a transaction are first recorded in the rollback log, and only then are the related operations performed. If an exception occurs during execution, we can use the information in the rollback log to roll the data back to its state before the modification.
Moreover, the rollback log will be persisted to disk before the data . This ensures that even if the database suddenly goes down, when the user starts the database again, the database can still roll back unfinished transactions by querying the rollback log.
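The idea of recording the old value before changing the data can be sketched as follows. The Transaction class and the dict used as a table are hypothetical, and the sketch keeps the undo log in memory only, so it does not model persistence to disk.

```python
_MISSING = object()  # marks "the key did not exist before the change"


class Transaction:
    """Toy transaction: log the old value first, then apply the change."""

    def __init__(self, table):
        self.table = table
        self.undo_log = []

    def update(self, key, new_value):
        # Write the undo record before modifying the data.
        self.undo_log.append((key, self.table.get(key, _MISSING)))
        self.table[key] = new_value

    def rollback(self):
        # Undo the changes in reverse order.
        while self.undo_log:
            key, old_value = self.undo_log.pop()
            if old_value is _MISSING:
                self.table.pop(key, None)
            else:
                self.table[key] = old_value


table = {"id=2": 0}
trx = Transaction(table)
trx.update("id=2", 1)
trx.update("id=2", 5)
trx.update("id=3", 9)
print(table)    # {'id=2': 5, 'id=3': 9}
trx.rollback()
print(table)    # {'id=2': 0}
```

Undoing in reverse order matters here: the row with id=2 was changed twice, and only replaying the undo records from newest to oldest restores the original value 0.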
In addition, the implementation of MVCC depends on hidden fields, the Read View, and the undo log. Internally, InnoDB judges the visibility of data using the DB_TRX_ID of the data row and the Read View. If the data is not visible, InnoDB finds the historical version in the undo log through the DB_ROLL_PTR of the data row.
The data version read by each transaction may differ. Within one transaction, the user can see only the modifications committed before the transaction created its Read View, plus the modifications made by the transaction itself.
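The visibility check and the walk along the version chain can be sketched like this. The text does not say what a Read View contains, so the fields used here (the creator's transaction id, the set of transactions active when the view was created, and the next transaction id to be assigned) are assumptions for illustration, as are the class and function names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    value: int
    trx_id: int                       # plays the role of DB_TRX_ID
    roll_ptr: Optional["RowVersion"]  # plays the role of DB_ROLL_PTR (older version)


@dataclass
class ReadView:
    creator_trx_id: int
    active_trx_ids: frozenset  # transactions not yet committed when the view was created
    next_trx_id: int           # ids at or above this started after the view was created

    def is_visible(self, trx_id):
        if trx_id == self.creator_trx_id:
            return True   # the transaction's own modification
        if trx_id >= self.next_trx_id:
            return False  # made after the view was created
        return trx_id not in self.active_trx_ids  # committed before the view


def read(row, view):
    """Follow the roll pointers until a visible version is found."""
    version = row
    while version is not None and not view.is_visible(version.trx_id):
        version = version.roll_ptr
    return None if version is None else version.value


# Version chain for one row: trx 10 wrote 0, trx 20 wrote 1, trx 30 wrote 2.
v1 = RowVersion(value=0, trx_id=10, roll_ptr=None)
v2 = RowVersion(value=1, trx_id=20, roll_ptr=v1)
v3 = RowVersion(value=2, trx_id=30, roll_ptr=v2)

view_of_trx_25 = ReadView(creator_trx_id=25, active_trx_ids=frozenset({20, 25}), next_trx_id=30)
view_of_trx_20 = ReadView(creator_trx_id=20, active_trx_ids=frozenset({20}), next_trx_id=21)

print(read(v3, view_of_trx_25))  # 0: trx 30 is too new, trx 20 was still active
print(read(v3, view_of_trx_20))  # 1: trx 20 sees its own modification
```

The two reads return different values for the same row, which is the sense in which each transaction may read a different data version.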
10. Summary
The MySQL InnoDB engine uses the redo log to guarantee the durability of transactions, and uses the undo log (rollback log) to guarantee their atomicity.
Data backup, master-standby, master-master, and master-slave setups of a MySQL database all depend on the binlog; they rely on it to synchronize data and ensure data consistency.