PostgreSQL 15 kernel source code analysis - buffer lookup interfaces

 


Table of contents

Foreword

Overview

Interface introduction

Call scenario analysis

Detailed breakdown

End


Foreword

This article analyzes and interprets the PostgreSQL 15 source code; the demonstrations were carried out on CentOS 8.


Overview

In PostgreSQL, when the SQL engine uses data from a table file, it first reads the page into the shared buffer pool. Modifications are likewise made in the buffer first, and the data is written to disk later through mechanisms such as buffer replacement and the periodic flushing of dirty pages. The shared buffer pool is an array of buffers; each buffer holds one page of a data file, and the default page size is 8 kB.
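As a rough illustration of the layout described above, the following C sketch models the pool as one contiguous array of 8 kB pages indexed by buffer id. The names ToyPool, toy_buffer_page and toy_file_offset are made up for this example and are not PostgreSQL identifiers; the real pool also keeps a separate array of buffer descriptors, and relation files are split into segments, both of which the sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BLCKSZ 8192            /* default page size mentioned above */

/* Hypothetical, simplified pool: nbuffers pages stored back to back. */
typedef struct ToyPool
{
    char *blocks;                  /* nbuffers * TOY_BLCKSZ bytes */
    int   nbuffers;
} ToyPool;

/* Return the start of the page held by buffer buf_id (0-based). */
char *
toy_buffer_page(const ToyPool *pool, int buf_id)
{
    assert(buf_id >= 0 && buf_id < pool->nbuffers);
    return pool->blocks + (size_t) buf_id * TOY_BLCKSZ;
}

/* Byte offset of block block_num in its data file (segments ignored). */
unsigned long long
toy_file_offset(unsigned block_num)
{
    return (unsigned long long) block_num * TOY_BLCKSZ;
}
```

The point of the sketch is only that a buffer id and a block number both translate to page-sized offsets, one into shared memory and one into the data file.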

The buffer replacement algorithm and the lookup process were covered in earlier articles; this article introduces the lookup interfaces.

Interface introduction

In PostgreSQL there are six main interfaces for looking up buffers; each is used in different scenarios to improve performance.

extern PrefetchBufferResult PrefetchSharedBuffer(struct SMgrRelationData *smgr_reln,
                                                 ForkNumber forkNum,
                                                 BlockNumber blockNum);
extern PrefetchBufferResult PrefetchBuffer(Relation reln, ForkNumber forkNum,
                                           BlockNumber blockNum);
extern bool ReadRecentBuffer(RelFileLocator rlocator, ForkNumber forkNum,
                             BlockNumber blockNum, Buffer recent_buffer);

extern Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
extern Buffer ReadBufferExtended(Relation reln, ForkNumber forkNum,
                                 BlockNumber blockNum, ReadBufferMode mode,
                                 BufferAccessStrategy strategy);
extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator,
                                        ForkNumber forkNum, BlockNumber blockNum,
                                        ReadBufferMode mode, BufferAccessStrategy strategy,
                                        bool permanent);

Call scenario analysis

ReadBuffer

The simplest call, commonly used for ordinary tables: it always reads MAIN_FORKNUM in the normal read mode, with the default access strategy (strategy is NULL).

ReadBufferExtended

More flexible: the caller can specify the fork number, the read mode and the access strategy. ReadBuffer calls this interface internally.
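The relationship between the two calls can be sketched as a thin wrapper. This is a toy model, not the PostgreSQL code: the Toy* types and functions are invented for the example, and the "extended" function here only records its arguments instead of reading a page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for ForkNumber and ReadBufferMode. */
typedef enum { TOY_MAIN_FORKNUM = 0, TOY_FSM_FORKNUM = 1 } ToyFork;
typedef enum { TOY_RBM_NORMAL = 0, TOY_RBM_ZERO_ON_ERROR = 1 } ToyMode;

/* What a read request looks like once all options are spelled out. */
typedef struct ToyRequest
{
    ToyFork     fork;
    unsigned    block;
    ToyMode     mode;
    const void *strategy;          /* NULL means "default strategy" */
} ToyRequest;

/* The flexible entry point: every option is chosen by the caller. */
ToyRequest
toy_read_buffer_extended(ToyFork fork, unsigned block,
                         ToyMode mode, const void *strategy)
{
    ToyRequest req = { fork, block, mode, strategy };

    assert(fork == TOY_MAIN_FORKNUM || fork == TOY_FSM_FORKNUM);
    return req;
}

/* The simple entry point: main fork, normal mode, default strategy. */
ToyRequest
toy_read_buffer(unsigned block)
{
    return toy_read_buffer_extended(TOY_MAIN_FORKNUM, block,
                                    TOY_RBM_NORMAL, NULL);
}
```

Keeping the defaults in one wrapper means ordinary table access does not have to repeat them at every call site.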

ReadBufferWithoutRelcache

Generally called when copying a database or table and when replaying WAL, that is, in places where no relcache entry for the relation is available.

ReadRecentBuffer

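Mainly called when replaying WAL. Compared with ReadBuffer, this interface does not look up the buffer mapping hash table: it only checks whether the buffer the caller remembers still holds the requested block, and returns false if it does not.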
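The idea of revalidating a remembered buffer instead of searching the hash table can be sketched as follows. This is a simplified model under stated assumptions: ToyTag, ToyDesc and toy_read_recent_buffer are hypothetical names, buffer ids are 0-based array indexes, and the locking the real function needs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag: identifies one block of one fork of one relation. */
typedef struct ToyTag { unsigned rel; int fork; unsigned block; } ToyTag;

/* Hypothetical descriptor: which block the buffer holds and its state. */
typedef struct ToyDesc { ToyTag tag; bool valid; int refcount; } ToyDesc;

static bool
toy_tag_equal(ToyTag a, ToyTag b)
{
    return a.rel == b.rel && a.fork == b.fork && a.block == b.block;
}

/*
 * Try to pin the buffer the caller remembers.  No hash lookup happens:
 * the only check is whether that one buffer is valid and still carries
 * the wanted tag.  Returns true and pins the buffer on success.
 */
bool
toy_read_recent_buffer(ToyDesc *pool, int nbuffers, ToyTag want, int recent)
{
    assert(pool != NULL);
    if (recent < 0 || recent >= nbuffers)
        return false;
    if (!pool[recent].valid || !toy_tag_equal(pool[recent].tag, want))
        return false;              /* buffer was recycled in the meantime */
    pool[recent].refcount++;       /* pin */
    return true;
}
```

When the function returns false, a caller would fall back to one of the normal read interfaces, which do go through the hash table.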

PrefetchBuffer

Called during lazy vacuum to prefetch the pages at the end of the table before they are scanned.

PrefetchSharedBuffer

Called inside PrefetchBuffer. It is also called when replaying WAL, to load the buffers referenced by WAL records in advance and speed up replay.

Detailed breakdown

A buffer lookup falls into one of the following cases:

(1) The tag is found in the BufferMapping hash table, which means a buffer already exists for the tag being looked up. Use the buffer id from the hash entry to locate the buffer and check its validity. Once the buffer is found, pin it immediately, and only then release the lock on the buffer mapping hash partition;

The buffer found this way can be in one of three states:

The first is that the page has been loaded successfully, that is, BM_VALID is set; the pin is all that is needed to use it;

The second is that only the tag has been set, the data has not been loaded yet, and another backend is loading it; in this case wait for that I/O to complete;

The third is that the tag has been set and the data has not been loaded, but no other backend has set IO_IN_PROGRESS; the current process is then responsible for loading the data;
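The three states of a buffer found in the hash table can be sketched as a small decision function. The flag and function names below are invented for this example (they only echo BM_VALID and IO_IN_PROGRESS from the text), and the sketch ignores all locking and the actual waiting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state bits, modelled on the flags named in the text. */
#define TOY_BM_VALID          (1u << 0)
#define TOY_BM_IO_IN_PROGRESS (1u << 1)

typedef enum
{
    TOY_USE_BUFFER,                /* page is loaded, just use it */
    TOY_WAIT_FOR_IO,               /* another backend is loading it */
    TOY_LOAD_PAGE                  /* we must load it ourselves */
} ToyAction;

/*
 * Decide what to do with a buffer whose tag matched.  The pin is taken
 * first in every case; if nobody is loading the page yet, this caller
 * claims the I/O by setting the in-progress bit.
 */
ToyAction
toy_action_for_found_buffer(unsigned *state, int *refcount)
{
    assert(state != NULL && refcount != NULL);

    (*refcount)++;                 /* pin before anything else */

    if (*state & TOY_BM_VALID)
        return TOY_USE_BUFFER;
    if (*state & TOY_BM_IO_IN_PROGRESS)
        return TOY_WAIT_FOR_IO;

    *state |= TOY_BM_IO_IN_PROGRESS;
    return TOY_LOAD_PAGE;
}
```

In all three branches the pin comes first, which is what keeps the buffer from being replaced while the caller decides how to proceed.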

(2) The tag is not found in the hash table, which means no buffer exists for it yet. Use the replacement algorithm to find a victim buffer and pin it immediately, which increments its reference count; then check whether the buffer needs to be flushed or is unsuitable for replacement. When a candidate victim is found, the descriptor lock is held and is released only after the pin has been added, so that other backends cannot grab the buffer in between;

The buffer found by the replacement algorithm can be in one of several situations:

The first is that the buffer is dirty and must be flushed. Take the content lock first and re-check once it is held. If the WAL for the buffer has not been flushed to disk yet, a different victim has to be found; if the buffer does not suit the current replacement strategy, it is removed from the strategy ring and the search starts again; if the content lock cannot be acquired, another backend is modifying the page, so this buffer is given up and the search starts again;

The second is that the buffer already carries a tag (it may be a clean block), so the buffer mapping partition of the old tag must be locked as well. Both the old and the new tag are locked here; pay attention to the locking order to avoid deadlocks;

The third is that the buffer is currently invalid, for example because it has never been used, and can be replaced directly;

The fourth is that, during the replacement, another backend pins the buffer or even dirties the page. In that case a different victim has to be searched for again;
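The four situations above, together with the lock ordering for the old and new tags, can be sketched as two small functions. This is a simplified model, not the PostgreSQL implementation: the ToyVictim fields, the verdict names and both functions are hypothetical, and the WAL rule is reduced to "reject an unflushed dirty buffer when a strategy ring is in use". Taking the lower-numbered partition first is one consistent order that avoids deadlocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of a candidate victim buffer. */
typedef struct ToyVictim
{
    bool has_tag;            /* currently maps some old block */
    bool dirty;              /* must be written before reuse */
    bool content_lock_free;  /* conditional content lock would succeed */
    bool wal_flushed;        /* WAL for the page is already on disk */
    bool repinned;           /* another backend pinned or dirtied it */
} ToyVictim;

typedef enum
{
    TOY_REUSE_DIRECTLY,      /* unused buffer, nothing to clean up */
    TOY_REUSE_CLEAN,         /* old tag must be removed, no flush */
    TOY_REUSE_AFTER_FLUSH,   /* flush first, then remove old tag */
    TOY_RETRY                /* give up and pick another victim */
} ToyVerdict;

ToyVerdict
toy_judge_victim(const ToyVictim *v, bool using_ring)
{
    assert(v != NULL);
    if (v->dirty)
    {
        if (!v->content_lock_free)
            return TOY_RETRY;    /* someone is modifying the page */
        if (using_ring && !v->wal_flushed)
            return TOY_RETRY;    /* strategy rejects this buffer */
    }
    if (v->repinned)
        return TOY_RETRY;        /* lost the race during replacement */
    if (!v->has_tag)
        return TOY_REUSE_DIRECTLY;
    return v->dirty ? TOY_REUSE_AFTER_FLUSH : TOY_REUSE_CLEAN;
}

/*
 * Fill order[] with the partitions to lock, lowest number first, and
 * return how many locks are needed (1 when both tags share a partition).
 */
int
toy_partition_lock_order(int old_part, int new_part, int order[2])
{
    if (old_part == new_part)
    {
        order[0] = new_part;
        return 1;
    }
    order[0] = (old_part < new_part) ? old_part : new_part;
    order[1] = (old_part < new_part) ? new_part : old_part;
    return 2;
}
```

Because every backend takes the two partition locks in the same order, two backends swapping tags across the same pair of partitions cannot each hold one lock while waiting for the other.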

(3) When swapping the old tag for the new one, first insert the new tag into the buffer mapping table. Another backend may already have inserted it; in that case pin that backend's buffer and wait for its page to be loaded from disk. If the new entry is inserted successfully, change the buffer tag, update the state in the buffer descriptor, and then start the I/O that loads the page from disk;
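The insert-or-find-existing step in (3) can be sketched with a tiny open-addressing table. Everything here is hypothetical (ToyEntry, ToyBuf, toy_mapping_insert, toy_claim_buffer), tags are reduced to a block number, and no locks are modelled. The insert returns -1 when the new entry was added and the existing buffer id when another backend got there first.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TABLE_SIZE 8

/* Hypothetical mapping entry: block number -> buffer id. */
typedef struct ToyEntry { int used; unsigned block; int buf_id; } ToyEntry;

/* Hypothetical buffer header. */
typedef struct ToyBuf { unsigned block; int valid; int refcount; } ToyBuf;

/* Insert block -> buf_id; return -1 if inserted, else the existing id. */
int
toy_mapping_insert(ToyEntry *table, unsigned block, int buf_id)
{
    for (unsigned i = 0; i < TOY_TABLE_SIZE; i++)
    {
        ToyEntry *e = &table[(block + i) % TOY_TABLE_SIZE];

        if (e->used && e->block == block)
            return e->buf_id;      /* somebody else inserted it first */
        if (!e->used)
        {
            e->used = 1;
            e->block = block;
            e->buf_id = buf_id;
            return -1;
        }
    }
    assert(0 && "toy table is full");
    return -1;
}

/*
 * Try to make the (already pinned) victim hold `block`.  Returns the id
 * of the buffer that ends up holding it; *must_load is 1 when the caller
 * has to start the read and 0 when it should wait for the other backend.
 */
int
toy_claim_buffer(ToyEntry *table, ToyBuf *bufs, unsigned block,
                 int victim, int *must_load)
{
    int existing = toy_mapping_insert(table, block, victim);

    if (existing >= 0)
    {
        bufs[victim].refcount--;   /* give the victim back */
        bufs[existing].refcount++; /* pin the buffer that won */
        *must_load = 0;
        return existing;
    }
    bufs[victim].block = block;    /* new tag */
    bufs[victim].valid = 0;        /* page not loaded yet */
    *must_load = 1;
    return victim;
}
```

The table insert is the point at which a race between two backends wanting the same block is decided: exactly one insert succeeds, and the loser reuses the winner's buffer.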


End

Author email: [email protected]
If there are any mistakes or omissions, please point them out so that we can learn from each other.

Note: Do not reprint without consent!

Origin blog.csdn.net/senllang/article/details/130003484