PostgreSQL database synchronous streaming replication source code analysis

Before writing this I searched for posts on the topic, and most authors only touch on it briefly. So I decided to write up the code I have been reading, in as much detail as I can.

Synchronous streaming replication offers different levels of transaction durability, configured through the synchronous_commit parameter:

off: a commit does not wait for the local WAL to be flushed to disk.

local: a commit waits for the local WAL to be flushed to disk.

remote_write: a commit waits for the local WAL flush, and for the synchronous standby to have written the received WAL out (to the OS, without requiring an fsync).

on: a commit waits for the local WAL flush, and for the synchronous standby to have flushed the received WAL to disk.

remote_apply: a commit waits for the local WAL flush, and for the synchronous standby to have replayed the WAL.

As you can see, in synchronous streaming replication mode the effective durability level is at least remote_write.
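For reference, the relevant settings might look like this in postgresql.conf. This is a minimal sketch: the standby names s1, s2, s3 and the FIRST 1 quorum are made-up examples, not values from this article.

```ini
# Commit waits until the synchronous standby has flushed the WAL (the default).
synchronous_commit = on

# Which standbys count as synchronous; without this setting, replication is
# asynchronous and synchronous_commit only controls the local WAL flush.
synchronous_standby_names = 'FIRST 1 (s1, s2, s3)'
```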

Let's now walk through the synchronous streaming replication process in detail:

The figure shows a synchronous streaming replication cluster with one primary and three standbys, and three client connections. For each standby's walreceiver process there is a dedicated walsender process on the primary.

1. After the server receives a DML statement from a client, the parser, planner and executor produce an execution plan and a transaction begins executing: it modifies data in memory, producing dirty pages, and at the same time generates WAL records in the in-memory WAL buffer. The walwriter process continuously flushes these WAL records to the log files once the transaction commits.

2. The walsender process reads the WAL that has been flushed to disk, packages it into a buffer, and sends it to the standby. The walreceiver process continuously writes the received WAL into the standby's WAL buffer; the standby's walwriter then flushes it to disk, and the startup process applies the flushed WAL on the standby. After the primary commits a transaction, it must wait for an acknowledgment from the standby before it can return to the client and continue.

(This part will be explained in detail in a later article)

So what happens if the primary never gets a reply from the standby?

The committing session on the primary will hang until the standby responds. This article focuses on how the primary manages to return success to the client after a transaction commits, which is the core of synchronous streaming replication.

1. Taking the figure above as an example: three backend processes execute concurrently and generate three WAL records, i.e. three LSNs, and each walsender sends the WAL to its standby. When a transaction commits, the corresponding backend process is added to the wait queue.

2. A standby replies to the primary with its write and flush positions, which update the write and flush fields of the corresponding walsender. When releasing waiting backend processes, the walsender traverses the write and flush positions of every walsender in the sender array, sorts them, and takes the appropriate LSN (the Nth latest; for a single synchronous standby this is the smallest), then compares it with the global LSN. The global LSN records how far the standbys have written or flushed; if the new value is larger, the global LSN advances and waiting backend processes can be removed from the wait queue. Because all the walsenders do this work independently, any one of them can release waiting backends; the three senders never have to coordinate to release a backend together. (I still find this design quite elegant.)

So how can a single walsender release several backend processes at once?

Each backend process records the waitLSN of the transaction it is executing, so while releasing, the walsender traverses the wait queue and compares each backend's waitLSN with the global LSN; any backend whose waitLSN is less than or equal to the global LSN is removed from the wait queue and returns to its client.

Some important structures and functions to note:

SyncRepWaitForLSN: after a transaction commits, adds the backend process to the wait queue and blocks it (waiting on its latch) until it is released.

SyncRepGetNthLatestSyncRecPtr: sorts the LSNs reported by the standbys and picks the Nth latest.

SyncRepWakeQueue: releases backends from the wait queue.

struct PGPROC: the per-backend process structure.

struct WalSndCtlData: a global shared structure that maintains the synchronization state.

struct WalSnd: the per-walsender process structure.

 

Note: an LSN (Log Sequence Number) identifies a position in the WAL; the WAL file name and the offset within that file can be derived from it. (If you are interested, compare it with MySQL/InnoDB's LSN.)

