PostgreSQL kernel source code analysis: what is table bloat


 

 


Table of contents

Foreword

The origin of table bloat

When does bloat occur

  • UPDATE

  • DELETE

How to Get Rid of Bloat

End


Foreword

This article is based on an analysis of the PostgreSQL 15 source code; the demonstrations were run on CentOS 8.

When we use a PostgreSQL database, tables tend to bloat over time, which slows down queries and makes indexes less effective. Why does bloat occur, and what can we do to bring the database back to normal once it has?


 

The origin of table bloat

Of the four ACID properties of a database, PostgreSQL relies on MVCC (Multi-Version Concurrency Control) to provide the atomicity and isolation of transactions.

 

So what is MVCC? Simply put, it uses monotonically increasing transaction IDs to tell the old and new versions of a tuple apart, so that each transaction sees only the versions visible to it. Let's take a look at an example:

 

First, view the current contents of a table:

postgres=# select ctid,xmin,xmax,id from t1;

 ctid  | xmin | xmax | id

-------+------+------+----

 (0,1) | 1699 |    0 |  1

 (0,2) | 1700 |    0 |  2

(2 rows)

 

Next, we run an update inside a transaction and then roll it back:

postgres=# begin;

BEGIN

postgres=*# select txid_current();

 txid_current

--------------

         1702

(1 row)

postgres=*# update t1 SET name='a' where id=1;

UPDATE 1

postgres=*# rollback;

ROLLBACK

 

Then we insert some more rows:

insert into t1(id) values(3);

insert into t1(id) values(4);

 

postgres=# select ctid,xmin,xmax,id from t1;

 ctid  | xmin | xmax | id

-------+------+------+----

 (0,1) | 1699 | 1702 |  1

 (0,2) | 1700 |    0 |  2

 (0,4) | 1703 |    0 |  3

 (0,5) | 1704 |    0 |  4

(4 rows)

Notice that ctid (0,3) was skipped. That slot holds the version of the id=1 tuple written by the rolled-back update. The visibility check judges it invisible, so the query does not return it, but it still occupies a position in the page.
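
The hidden tuple can be inspected directly. The following is a minimal sketch, assuming the pageinspect contrib extension is installed and the session has superuser rights (neither is stated in the original demo). It dumps the line pointers of block 0 of t1, including versions that an ordinary SELECT filters out.

```sql
-- Assumption: the pageinspect contrib module is available on this server.
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Dump every line pointer in block 0 of t1, visible or not.
-- lp is the slot number, so lp = 3 corresponds to ctid (0,3).
SELECT lp, lp_flags, t_xmin, t_xmax, t_ctid
FROM heap_page_items(get_raw_page('t1', 0))
ORDER BY lp;
```

If the rolled-back version has not been cleaned up yet, slot 3 should appear here with t_xmin set to the aborted transaction ID (1702 in the session above), which is why the visibility check rejects it.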


When does bloat occur

 

Which common operations produce multiple tuple versions?

 

  • UPDATE

Let's take a look at an update demo:

After a row is updated, its ctid moves to the end of the page: the update wrote a new version of the tuple and left the old version behind at the original position. A detailed analysis of the table update code is available in my column.

postgres=# update t1 SET name='a' where id=1;

UPDATE 1

postgres=# select ctid,xmin,xmax,id from t1;

 ctid  | xmin | xmax | id

-------+------+------+----

 (0,2) | 1700 |    0 |  2

 (0,4) | 1703 |    0 |  3

 (0,5) | 1704 |    0 |  4

 (0,6) | 1705 |    0 |  1

(4 rows)
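
The old version left behind by the update can be looked at with pageinspect. This is a minimal sketch, assuming the pageinspect contrib extension is installed (an assumption; the original demo does not use it). It lists the tuples in block 0 whose t_xmax is set, together with the t_ctid link stored in each one.

```sql
-- Assumption: the pageinspect contrib module is available on this server.
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- A non-zero t_xmax means some transaction updated, deleted, or locked
-- the tuple; for an updated tuple, t_ctid points at the newer version.
SELECT lp, t_xmin, t_xmax, t_ctid
FROM heap_page_items(get_raw_page('t1', 0))
WHERE t_xmax::text <> '0'
ORDER BY lp;
```

A t_xmax can also come from a transaction that later rolled back, so a row in this output is not necessarily dead; whether it is visible depends on the commit status of that transaction.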

 

  • DELETE

A delete does not create a new version, but the tuple is not physically removed from the table either; it is only marked as deleted. The reason is the same as for multiple versions: other transactions may still need to see it.

Let's take a look at an example:

postgres=# delete from t1 where id = 1;

DELETE 1

postgres=# select ctid,xmin,xmax,id from t1;

 ctid  | xmin | xmax | id

-------+------+------+----

 (0,2) | 1700 |    0 |  2

 (0,4) | 1703 |    0 |  3

 (0,5) | 1704 |    0 |  4

(3 rows)

postgres=# insert into t1(id) values(5);

INSERT 0 1

postgres=# select ctid,xmin,xmax,id from t1;

 ctid  | xmin | xmax | id

-------+------+------+----

 (0,2) | 1700 |    0 |  2

 (0,4) | 1703 |    0 |  3

 (0,5) | 1704 |    0 |  4

 (0,7) | 1707 |    0 |  5

(4 rows)

After the delete, the insert did not reuse the freed slot (0,6); the new row was placed at (0,7) instead.
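
The dead tuples left behind by these statements are also tracked by PostgreSQL's cumulative statistics. A minimal sketch using the standard pg_stat_user_tables view follows. The counters are estimates that are reported asynchronously, so they may lag behind the statements just executed.

```sql
-- Live vs. dead tuple estimates for t1, plus when it was last vacuumed.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 't1';
```

A growing n_dead_tup with no recent last_vacuum or last_autovacuum timestamp is a common sign that a table is accumulating bloat.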

 

How to Get Rid of Bloat

PostgreSQL reclaims this space in two ways while it is running:

  • page pruning

  • autovacuum

So how do they work? Please see the other articles in this column.
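
Besides these automatic mechanisms, space can also be reclaimed by hand. A minimal sketch, assuming the demo table t1 used earlier: plain VACUUM makes the space held by dead tuples reusable but normally does not shrink the file, while VACUUM FULL rewrites the table under an ACCESS EXCLUSIVE lock, so it is left commented out here.

```sql
-- Reclaim dead tuples in t1 and print a report of what was removed.
VACUUM (VERBOSE) t1;

-- Rewrites the whole table to return space to the operating system.
-- Blocks reads and writes on t1 while it runs.
-- VACUUM FULL t1;

-- Check whether autovacuum is enabled and when it triggers.
SELECT name, setting
FROM pg_settings
WHERE name IN ('autovacuum',
               'autovacuum_vacuum_threshold',
               'autovacuum_vacuum_scale_factor');
```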

 


End

Author email: [email protected]
If there are any mistakes or omissions, please point them out and learn from each other.

Note: Do not reprint without consent!

 


Origin blog.csdn.net/senllang/article/details/129193438