PostgreSQL 9.6 source code analysis: XLOG generation

One, WAL log overview

The WAL subsystem in PostgreSQL exists for crash recovery; it is also the basis for point-in-time recovery and for hot-standby replication based on log shipping. The design idea behind WAL is briefly described below.
The basic rule of WAL is that a log record must be persisted to stable storage (such as a hard disk) before the data page change it describes. This guarantees that replaying the log to its end brings the database back to a consistent state, with no partially executed transactions. To support this, every data page (heap or index page) is stamped with the LSN of the latest xlog record that affected the page.
LSN stands for log sequence number; it is a byte position in the WAL stream.
During WAL replay, the LSN of a page is compared with the LSN of a log record to decide whether that record has already been applied to the page: if the page LSN >= the LSN of the log record, the record has already been replayed and is skipped.

Two, transaction log and WAL segment file

The xlog records in detail the operations that server processes perform on the database. In memory, the xlog is buffered in pages. Each page is 8 kB in size by default and starts with a page header; the xlog records follow the header.

Each logical xlog file has an ID, but on disk it is stored as fixed-size XLOG segment files (16 MB by default; in 9.6 the size is chosen at build time with configure --with-wal-segsize, while the initdb option --wal-segsize only exists from PostgreSQL 11 on). The xlog file number together with the segment number uniquely identifies a segment file. To address a log record within a logical log file, only the xlog file number and the offset of the record within that file are needed.
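As a rough sketch of this addressing scheme, the following C code splits a 64-bit WAL position into a segment number, an xlog file number, and offsets. The names `DemoWalLocation` and `demo_locate` are hypothetical, and the default 16 MB segment size is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SEG_SIZE        (UINT64_C(16) * 1024 * 1024)           /* assumed default, 16 MB */
#define DEMO_SEGS_PER_XLOGID (UINT64_C(0x100000000) / DEMO_SEG_SIZE) /* 256 */

typedef struct DemoWalLocation
{
    uint64_t seg_no;        /* global segment number */
    uint32_t xlog_id;       /* logical xlog file number */
    uint32_t seg_in_xlog;   /* segment index inside that logical file */
    uint32_t seg_offset;    /* byte offset inside the segment file */
} DemoWalLocation;

/* Split a WAL byte position into the pieces used to find a record on disk. */
DemoWalLocation
demo_locate(uint64_t lsn)
{
    DemoWalLocation loc;

    loc.seg_no = lsn / DEMO_SEG_SIZE;
    loc.xlog_id = (uint32_t) (loc.seg_no / DEMO_SEGS_PER_XLOGID);
    loc.seg_in_xlog = (uint32_t) (loc.seg_no % DEMO_SEGS_PER_XLOGID);
    loc.seg_offset = (uint32_t) (lsn % DEMO_SEG_SIZE);

    /* 256 segments of 16 MB are 4 GB, so the file number is the high 32 bits. */
    assert(loc.xlog_id == (uint32_t) (lsn >> 32));
    return loc;
}
```

For example, the position 0x01000028 falls into segment 1 at offset 40 under these assumptions.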

Three, xlog file initialization

When initdb initializes the data directory and creates the template1 database in the function bootstrap_template1, it runs the postgres program through popen. AuxiliaryProcessMain is called there, and it in turn calls the BootStrapXLOG function to initialize the first XLOG segment file.

Four, xlog file name

When initdb generates the xlog file, the following macro is used to build the file name:

#define XLogFilePath(path, tli, logSegNo)	\
	snprintf(path, MAXPGPATH, XLOGDIR "/%08X%08X%08X", tli,				\
			 (uint32) ((logSegNo) / XLogSegmentsPerXLogId),				\
			 (uint32) ((logSegNo) % XLogSegmentsPerXLogId))
#define XLOG_SEG_SIZE (16 * 1024 * 1024)
#define XLogSegmentsPerXLogId	(UINT64CONST(0x100000000) / XLOG_SEG_SIZE)


Note: XLogSegmentsPerXLogId evaluates to 256 (4 GB / 16 MB), and XLOG_SEG_SIZE is the segment size, 16 MB.
Variables in the macro:
path: the output buffer that receives the file path
tli: the timeline; the initial timeline is 1
logSegNo: the segment number; the initial value is 1

Substituting these values into the macro, snprintf formats path as:
pg_xlog path + timeline + (uint32) (segment number / 256) + (segment number % 256), where each %08X prints its value as an 8-digit hexadecimal number.
The result is:
pg_xlog/000000010000000000000001
Note: the last 8-digit group is the segment number modulo 256, so its value ranges from 0 to 255. In hexadecimal, the largest value of the last 8 digits of a segment file name is therefore 000000FF.
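The naming rule can be tried out with a small standalone sketch. It mirrors the format string of the macro but leaves out the directory part; the function names `demo_wal_file_name` and `demo_is_wal_file_name` are hypothetical, and the 16 MB default segment size is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_SEG_SIZE        (UINT64_C(16) * 1024 * 1024)           /* assumed default, 16 MB */
#define DEMO_SEGS_PER_XLOGID (UINT64_C(0x100000000) / DEMO_SEG_SIZE) /* 256 */
#define DEMO_FNAME_LEN       24                                      /* three groups of 8 hex digits */

/* Format the segment file name (without directory) into buf; returns its length. */
int
demo_wal_file_name(char *buf, size_t buflen, uint32_t tli, uint64_t seg_no)
{
    assert(buf != NULL && buflen > DEMO_FNAME_LEN);
    return snprintf(buf, buflen, "%08X%08X%08X",
                    (unsigned int) tli,
                    (unsigned int) (seg_no / DEMO_SEGS_PER_XLOGID),
                    (unsigned int) (seg_no % DEMO_SEGS_PER_XLOGID));
}

/* A name is plausible if it consists of exactly 24 upper-case hex digits. */
bool
demo_is_wal_file_name(const char *name)
{
    return strlen(name) == DEMO_FNAME_LEN &&
           strspn(name, "0123456789ABCDEF") == DEMO_FNAME_LEN;
}
```

With timeline 1, segment 1 gives 000000010000000000000001, segment 255 gives 0000000100000000000000FF, and segment 256 rolls over into the middle group: 000000010000000100000000.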

Five, xlog storage structure

Each XLOG file has an ID. A logical xlog file is physically divided into fixed-size segments (16 MB by default) for storage. The xlog file number and the segment number together uniquely identify a segment file. To address a log record within a logical log file, only the xlog file number and the offset of the record within that file are needed.

The first page of every xlog segment file carries a long header. Whether a page has a long header is indicated by the XLP_LONG_HEADER bit in the xlp_info flags of the XLogPageHeaderData at the start of the page. For a long header the flag is set like this:
XLogPageHeader page;
page->xlp_info = XLP_LONG_HEADER;
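A minimal sketch of how the flag could be set and tested is shown below. The `DEMO_` constants and both function names are hypothetical; the bit value 0x0002 and the default sizes (8 kB pages, 16 MB segments) are assumptions of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_XLP_LONG_HEADER 0x0002                        /* assumed bit value of the flag */
#define DEMO_XLOG_BLCKSZ     UINT64_C(8192)                /* assumed page size */
#define DEMO_SEG_SIZE        (UINT64_C(16) * 1024 * 1024)  /* assumed segment size */

/* Compute the xlp_info flags for a new page that starts at page_addr. */
uint16_t
demo_page_info_flags(uint64_t page_addr)
{
    uint16_t info = 0;

    assert(page_addr % DEMO_XLOG_BLCKSZ == 0);  /* pages start on block boundaries */
    if (page_addr % DEMO_SEG_SIZE == 0)
        info |= DEMO_XLP_LONG_HEADER;           /* first page of a segment file */
    return info;
}

/* Test the flag with a bit mask, because other bits may be set as well. */
bool
demo_has_long_header(uint16_t xlp_info)
{
    return (xlp_info & DEMO_XLP_LONG_HEADER) != 0;
}
```

Testing with a mask instead of comparing for equality keeps the check correct when further flag bits are set on the same page.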

1) Header information of an xlog page
The xlog is divided into many segment files, each segment file is divided into many pages, and each page has the size of one block. Every log page begins with a header, XLogPageHeaderData, whose structure is as follows:

/*
 * Each page of XLOG file has a header like this:
 */
#define XLOG_PAGE_MAGIC 0xD093	/* can be used as WAL version indicator */

typedef struct XLogPageHeaderData
{
	uint16		xlp_magic;		/* magic value for correctness checks */
	uint16		xlp_info;		/* flag bits, see below */
	TimeLineID	xlp_tli;		/* TimeLineID of first record on page */
	XLogRecPtr	xlp_pageaddr;	/* XLOG address of this page */

	/*
	 * When there is not enough space on current page for whole record, we
	 * continue on the next page.  xlp_rem_len is the number of bytes
	 * remaining from a previous page.
	 *
	 * Note that xl_rem_len includes backup-block data; that is, it tracks
	 * xl_tot_len not xl_len in the initial header.  Also note that the
	 * continuation data isn't necessarily aligned.
	 */
	uint32		xlp_rem_len;	/* total len of remaining data for record */
} XLogPageHeaderData;

#define SizeOfXLogShortPHD	MAXALIGN(sizeof(XLogPageHeaderData))

typedef XLogPageHeaderData *XLogPageHeader;

If a page is the first page of a segment file, the XLP_LONG_HEADER flag is set in its page header flags, and the long XLOG page header XLogLongPageHeaderData is used. It extends the standard page header with additional fields, and its structure is as follows:

/*
 * When the XLP_LONG_HEADER flag is set, we store additional fields in the
 * page header.  (This is ordinarily done just in the first page of an
 * XLOG file.)	The additional fields serve to identify the file accurately.
 */
typedef struct XLogLongPageHeaderData
{
	XLogPageHeaderData std;		/* standard header fields */
	uint64		xlp_sysid;		/* system identifier from pg_control */
	uint32		xlp_seg_size;	/* just as a cross-check */
	uint32		xlp_xlog_blcksz;	/* just as a cross-check */
} XLogLongPageHeaderData;

#define SizeOfXLogLongPHD	MAXALIGN(sizeof(XLogLongPageHeaderData))

typedef XLogLongPageHeaderData *XLogLongPageHeader;
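To see what the two header sizes mean for the space left on a page, here is a small sketch with mirror structs built from standard integer types. The `Demo` names are hypothetical, and the sketch assumes an 8-byte maximum alignment, an 8 kB page, and 0x0002 as the value of the long-header flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_XLP_LONG_HEADER 0x0002   /* assumed bit value of the flag */
#define DEMO_XLOG_BLCKSZ     8192     /* assumed page size */
/* Round up to a multiple of 8; assumes a maximum alignment of 8 bytes. */
#define DEMO_MAXALIGN(len)   (((size_t) (len) + 7) & ~(size_t) 7)

typedef struct DemoPageHeader
{
    uint16_t xlp_magic;
    uint16_t xlp_info;
    uint32_t xlp_tli;
    uint64_t xlp_pageaddr;
    uint32_t xlp_rem_len;
} DemoPageHeader;

typedef struct DemoLongPageHeader
{
    DemoPageHeader std;
    uint64_t xlp_sysid;
    uint32_t xlp_seg_size;
    uint32_t xlp_xlog_blcksz;
} DemoLongPageHeader;

/* Size of the header that precedes the records on a page. */
size_t
demo_page_header_size(uint16_t xlp_info)
{
    if (xlp_info & DEMO_XLP_LONG_HEADER)
        return DEMO_MAXALIGN(sizeof(DemoLongPageHeader));
    return DEMO_MAXALIGN(sizeof(DemoPageHeader));
}

/* Bytes left for record data on a page with the given flags. */
size_t
demo_page_capacity(uint16_t xlp_info)
{
    size_t hdr = demo_page_header_size(xlp_info);

    assert(hdr < DEMO_XLOG_BLCKSZ);
    return DEMO_XLOG_BLCKSZ - hdr;
}
```

On a typical 64-bit platform this should give 24 bytes for the short header and 40 bytes for the long one, so the first page of a segment holds slightly less record data than the others.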

2) xlog record structure
An XLOG record consists of two parts. The first part is the record header, which has a fixed size (24 bytes) and corresponds to the XLogRecord struct; the second part is the record data.
Storage format of an xlog record:
Fixed-size header (XLogRecord struct)
XLogRecordBlockHeader struct
XLogRecordBlockHeader struct
...
XLogRecordDataHeader[Short|Long] struct
block data
block data
...
main data

By the content they store, XLOG records can be roughly divided into three categories:

Record for backup block: stores a full-page image (full-page write). This type of record addresses the problem of partially written pages. When a data page is modified for the first time after a checkpoint, the entire page is included in the record that is written to the transaction log (this requires the parameter full_page_writes to be set; the default is on);

Record for tuple data block: stores the tuple changes made within a page;

Record for checkpoint: when a checkpoint occurs, the checkpoint information is recorded in the transaction log.
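The "first modification after a checkpoint" rule for backup blocks can be sketched as a comparison between the page LSN and the redo position of the latest checkpoint. This is a simplified illustration with hypothetical names (`DemoDataPage`, `demo_needs_full_page_image`); it ignores other conditions that can force full-page images in the real server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct DemoDataPage
{
    uint64_t page_lsn;      /* LSN of the last record that modified the page */
} DemoDataPage;

/*
 * Decide whether a change to the page must carry a full-page image.
 * redo_ptr is assumed to be the redo position of the latest checkpoint.
 */
bool
demo_needs_full_page_image(const DemoDataPage *page, uint64_t redo_ptr,
                           bool full_page_writes)
{
    assert(page != NULL);
    if (!full_page_writes)
        return false;       /* feature switched off */

    /* Not modified since the checkpoint: this is the first change after it. */
    return page->page_lsn <= redo_ptr;
}
```

Once the page has been logged with a full image, its LSN moves past the redo position, so later changes before the next checkpoint only need the smaller tuple-level records.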

XLogRecord holds the control information of an xlog record:

typedef struct XLogRecord
{
	uint32		xl_tot_len;		/* total len of entire record */
	TransactionId xl_xid;		/* xact id */
	XLogRecPtr	xl_prev;		/* ptr to previous record in log */
	uint8		xl_info;		/* flag bits, see below */
	RmgrId		xl_rmid;		/* resource manager for this record */
	/* 2 bytes of padding here, initialize to zero */
	pg_crc32c	xl_crc;			/* CRC for this record */

	/* XLogRecordBlockHeaders and XLogRecordDataHeader follow, no padding */

} XLogRecord;

xl_tot_len //Total length of the entire record
xl_xid //Transaction ID
xl_prev //Pointer to the previous record in the log
xl_info //Information flag bits
xl_rmid //Resource manager ID
xl_crc //CRC of this record
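The fixed header size of 24 bytes can be checked with a mirror struct built from standard integer types. This is only a sketch: `DemoXLogRecord`, `DEMO_SIZE_OF_XLOG_RECORD` and `demo_record_payload_len` are hypothetical names, and 32-bit types are assumed for the transaction ID and the CRC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct DemoXLogRecord
{
    uint32_t xl_tot_len;    /* total length of the entire record */
    uint32_t xl_xid;        /* transaction id */
    uint64_t xl_prev;       /* position of the previous record */
    uint8_t  xl_info;       /* flag bits */
    uint8_t  xl_rmid;       /* resource manager id */
    /* 2 bytes of padding here */
    uint32_t xl_crc;        /* CRC of the record */
} DemoXLogRecord;

/* Header size measured up to and including the CRC field. */
#define DEMO_SIZE_OF_XLOG_RECORD \
    (offsetof(DemoXLogRecord, xl_crc) + sizeof(uint32_t))

/* Number of bytes that follow the fixed header inside one record. */
uint32_t
demo_record_payload_len(const DemoXLogRecord *rec)
{
    assert(rec != NULL);
    assert(rec->xl_tot_len >= DEMO_SIZE_OF_XLOG_RECORD);
    return rec->xl_tot_len - (uint32_t) DEMO_SIZE_OF_XLOG_RECORD;
}
```

The fields add up to 4 + 4 + 8 + 1 + 1 + 2 (padding) + 4 = 24 bytes, so a record with xl_tot_len = 100 carries 76 bytes of block headers, data header and data.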

Among them, the high 4 bits of xl_info are used by the resource manager to indicate the type of the log record. For the XLOG resource manager the values are as follows:

/* XLOG info values for XLOG rmgr */
#define XLOG_CHECKPOINT_SHUTDOWN		0x00
#define XLOG_CHECKPOINT_ONLINE			0x10
#define XLOG_NOOP						0x20
#define XLOG_NEXTOID					0x30
#define XLOG_SWITCH						0x40
#define XLOG_BACKUP_END					0x50
#define XLOG_PARAMETER_CHANGE			0x60
#define XLOG_RESTORE_POINT				0x70
#define XLOG_FPW_CHANGE					0x80
#define XLOG_END_OF_RECOVERY			0x90
#define XLOG_FPI_FOR_HINT				0xA0
#define XLOG_FPI						0xB0
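As an illustration of how these values might be decoded, the sketch below masks out the high 4 bits of xl_info and maps them back to the macro names listed above. The struct and function names are hypothetical; the mask 0xF0 and the assumption that the XLOG resource manager has ID 0 are part of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_XLR_RMGR_INFO_MASK 0xF0   /* high 4 bits belong to the resource manager */
#define DEMO_RM_XLOG_ID         0      /* assumed ID of the XLOG resource manager */

typedef struct DemoRecordHeader
{
    uint8_t xl_info;
    uint8_t xl_rmid;
} DemoRecordHeader;

/* Map the rmgr-specific bits of an XLOG rmgr record to the macro name. */
const char *
demo_xlog_record_type(const DemoRecordHeader *rec)
{
    assert(rec != NULL);
    if (rec->xl_rmid != DEMO_RM_XLOG_ID)
        return "NOT_XLOG_RMGR";

    switch (rec->xl_info & DEMO_XLR_RMGR_INFO_MASK)
    {
        case 0x00: return "XLOG_CHECKPOINT_SHUTDOWN";
        case 0x10: return "XLOG_CHECKPOINT_ONLINE";
        case 0x20: return "XLOG_NOOP";
        case 0x30: return "XLOG_NEXTOID";
        case 0x40: return "XLOG_SWITCH";
        case 0x50: return "XLOG_BACKUP_END";
        case 0x60: return "XLOG_PARAMETER_CHANGE";
        case 0x70: return "XLOG_RESTORE_POINT";
        case 0x80: return "XLOG_FPW_CHANGE";
        case 0x90: return "XLOG_END_OF_RECOVERY";
        case 0xA0: return "XLOG_FPI_FOR_HINT";
        case 0xB0: return "XLOG_FPI";
        default:   return "UNKNOWN";
    }
}

/* Both checkpoint flavours count as checkpoint records. */
bool
demo_is_checkpoint(const DemoRecordHeader *rec)
{
    uint8_t type;

    assert(rec != NULL);
    type = rec->xl_info & DEMO_XLR_RMGR_INFO_MASK;
    return rec->xl_rmid == DEMO_RM_XLOG_ID && (type == 0x00 || type == 0x10);
}
```

The same xl_info value means something different for another resource manager, which is why the sketch checks xl_rmid first.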

xl_rmid identifies the resource manager. The resource managers are defined as follows:

/* symbol name, textual name, redo, desc, identify, startup, cleanup */
PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, xlog_identify, NULL, NULL)
PG_RMGR(RM_XACT_ID, "Transaction", xact_redo, xact_desc, xact_identify, NULL, NULL)
PG_RMGR(RM_SMGR_ID, "Storage", smgr_redo, smgr_desc, smgr_identify, NULL, NULL)
PG_RMGR(RM_CLOG_ID, "CLOG", clog_redo, clog_desc, clog_identify, NULL, NULL)
PG_RMGR(RM_DBASE_ID, "Database", dbase_redo, dbase_desc, dbase_identify, NULL, NULL)
PG_RMGR(RM_TBLSPC_ID, "Tablespace", tblspc_redo, tblspc_desc, tblspc_identify, NULL, NULL)
PG_RMGR(RM_MULTIXACT_ID, "MultiXact", multixact_redo, multixact_desc, multixact_identify, NULL, NULL)
PG_RMGR(RM_RELMAP_ID, "RelMap", relmap_redo, relmap_desc, relmap_identify, NULL, NULL)
PG_RMGR(RM_STANDBY_ID, "Standby", standby_redo, standby_desc, standby_identify, NULL, NULL)
PG_RMGR(RM_HEAP2_ID, "Heap2", heap2_redo, heap2_desc, heap2_identify, NULL, NULL)
PG_RMGR(RM_HEAP_ID, "Heap", heap_redo, heap_desc, heap_identify, NULL, NULL)
PG_RMGR(RM_BTREE_ID, "Btree", btree_redo, btree_desc, btree_identify, NULL, NULL)
PG_RMGR(RM_HASH_ID, "Hash", hash_redo, hash_desc, hash_identify, NULL, NULL)
PG_RMGR(RM_GIN_ID, "Gin", gin_redo, gin_desc, gin_identify, gin_xlog_startup, gin_xlog_cleanup)
PG_RMGR(RM_GIST_ID, "Gist", gist_redo, gist_desc, gist_identify, gist_xlog_startup, gist_xlog_cleanup)
PG_RMGR(RM_SEQ_ID, "Sequence", seq_redo, seq_desc, seq_identify, NULL, NULL)
PG_RMGR(RM_SPGIST_ID, "SPGist", spg_redo, spg_desc, spg_identify, spg_xlog_startup, spg_xlog_cleanup)
PG_RMGR(RM_BRIN_ID, "BRIN", brin_redo, brin_desc, brin_identify, NULL, NULL)
PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_identify, NULL, NULL)
PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL)
PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL)
PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL)


Listed below are the meanings of the more important resource manager IDs:
RM_XLOG_ID: the record holds checkpoint information
RM_XACT_ID: the record holds the commit or abort information of a transaction
RM_CLOG_ID: the record holds the initialization of a CLOG page
RM_HEAP_ID: the record holds a modification of tuples in the heap
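The PG_RMGR list is an example of the "X-macro" technique: one list is expanded several times with different macro definitions, for instance once into an enum of IDs and once into a table. The sketch below shows the idea on a shortened list. It uses a list macro instead of re-including a header file, and all `DEMO_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One list, written once; X is applied to every entry. */
#define DEMO_RMGR_LIST(X) \
    X(DEMO_RM_XLOG_ID, "XLOG") \
    X(DEMO_RM_XACT_ID, "Transaction") \
    X(DEMO_RM_SMGR_ID, "Storage") \
    X(DEMO_RM_CLOG_ID, "CLOG")

/* First expansion: the enum of resource manager IDs. */
#define DEMO_AS_ENUM(sym, name) sym,
typedef enum DemoRmgrId
{
    DEMO_RMGR_LIST(DEMO_AS_ENUM)
    DEMO_RM_COUNT           /* number of entries in the list */
} DemoRmgrId;
#undef DEMO_AS_ENUM

/* Second expansion: a table of textual names indexed by ID. */
#define DEMO_AS_NAME(sym, name) name,
static const char *const demo_rmgr_names[] = {
    DEMO_RMGR_LIST(DEMO_AS_NAME)
};
#undef DEMO_AS_NAME

/* Look up the textual name that belongs to a resource manager ID. */
const char *
demo_rmgr_name(int rmid)
{
    assert(rmid >= 0);
    return rmid < DEMO_RM_COUNT ? demo_rmgr_names[rmid] : "UNKNOWN";
}
```

Because both the enum and the table come from the same list, the IDs and the table entries cannot get out of step when an entry is added.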

XLOG Record data

XLOG Record data is the place where the actual data is stored and consists of the following parts:

0...N XLogRecordBlockHeader structs, each corresponding to one block data entry;

XLogRecordDataHeader[Short|Long]: if the main data is smaller than 256 bytes, the Short format is used, otherwise the Long format;

block data: full-page images and tuple data. For a full-page image, if compression is enabled, the image is stored compressed, and the related metadata is stored in XLogRecordBlockCompressHeader;

main data: record-specific data, such as the checkpoint information.
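The choice between the Short and the Long data header can be sketched as follows. The function name and the `DEMO_` constants are hypothetical; the marker values 255 and 254 and the byte layout (one marker byte followed by a 1-byte or an unaligned 4-byte length) are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLOCK_ID_DATA_SHORT 255   /* assumed marker for the short header */
#define DEMO_BLOCK_ID_DATA_LONG  254   /* assumed marker for the long header */

/*
 * Write the main-data header into out (at least 5 bytes) and return the
 * number of bytes written. Nothing is written when there is no main data.
 */
size_t
demo_write_data_header(uint8_t *out, uint32_t main_data_len)
{
    assert(out != NULL);
    if (main_data_len == 0)
        return 0;

    if (main_data_len > 255)
    {
        /* Long format: marker byte plus a 4-byte length, copied unaligned. */
        out[0] = DEMO_BLOCK_ID_DATA_LONG;
        memcpy(out + 1, &main_data_len, sizeof(uint32_t));
        return 1 + sizeof(uint32_t);
    }

    /* Short format: marker byte plus a 1-byte length. */
    out[0] = DEMO_BLOCK_ID_DATA_SHORT;
    out[1] = (uint8_t) main_data_len;
    return 2;
}
```

The short format saves 3 bytes per record, which adds up because most records carry only a small amount of main data.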

. . .
There are many more details, so I will end this article here for now.

Origin blog.csdn.net/sxqinjh/article/details/105311115