- Column: PostgreSQL kernel source code analysis
- Motto: As heaven moves with vigor, the gentleman strives ceaselessly for self-improvement.
Foreword
This article is based on an analysis of the PostgreSQL 15 source code; the demonstrations were run on CentOS 8.
When we use a PostgreSQL database, tables tend to accumulate dead data over time. This data expansion, usually called bloat, slows down queries and makes indexes less efficient. Why does bloat occur, and what can be done to bring the database back to normal once it has appeared?
The origin of table bloat
Of the four ACID properties of a database, PostgreSQL relies on MVCC (Multi-Version Concurrency Control) to help guarantee the atomicity and isolation of transactions.
So what is MVCC? Simply put, it uses monotonically increasing transaction IDs to tell old and new versions of a tuple apart, so that each transaction sees only the tuple versions it is supposed to see. Let's take a look at an example.
First, view the current contents of a table:
postgres=# select ctid,xmin,xmax,id from t1;
 ctid  | xmin | xmax | id
-------+------+------+----
 (0,1) | 1699 |    0 |  1
 (0,2) | 1700 |    0 |  2
(2 rows)
Next, we run an update inside a transaction and then roll it back:
postgres=# begin;
BEGIN
postgres=*# select txid_current();
 txid_current
--------------
         1702
(1 row)
postgres=*# update t1 SET name='a' where id=1;
UPDATE 1
postgres=*# rollback;
ROLLBACK
Then we insert two more rows:
postgres=# insert into t1(id) values(3);
postgres=# insert into t1(id) values(4);
postgres=# select ctid,xmin,xmax,id from t1;
 ctid  | xmin | xmax | id
-------+------+------+----
 (0,1) | 1699 | 1702 |  1
 (0,2) | 1700 |    0 |  2
 (0,4) | 1703 |    0 |  3
 (0,5) | 1704 |    0 |  4
(4 rows)
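Notice that ctid (0,3) was skipped. That slot holds the new version of the id=1 tuple written by the rolled-back update (transaction 1702). The visibility check judges it invisible, so the query does not return it, yet it still occupies space in the page. Note also that the old version at (0,1) still carries xmax = 1702 even though that transaction was rolled back.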
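The visibility decision described above can be sketched with a toy model. This is a deliberately simplified illustration, not PostgreSQL's actual visibility code: the names `HeapTuple`, `is_visible`, and the `status` dictionary are hypothetical, and the model ignores snapshots, in-progress transactions, and hint bits. It assumes only that a tuple is visible when its creating transaction committed and no committed transaction has deleted or replaced it.

```python
from dataclasses import dataclass

COMMITTED, ABORTED = "committed", "aborted"


@dataclass
class HeapTuple:
    ctid: tuple   # (page, slot), as shown in the ctid column
    xmin: int     # transaction that created this version
    xmax: int     # transaction that deleted/replaced it, 0 if none
    row_id: int   # value of the id column


def is_visible(tup, status):
    """Toy rule: creator committed, and no committed deleter."""
    if status.get(tup.xmin) != COMMITTED:
        return False
    if tup.xmax != 0 and status.get(tup.xmax) == COMMITTED:
        return False
    return True


# Transaction outcomes taken from the session above (1702 was rolled back).
status = {1699: COMMITTED, 1700: COMMITTED, 1702: ABORTED,
          1703: COMMITTED, 1704: COMMITTED}

# Page contents, including the hidden version at (0,3).
page = [
    HeapTuple((0, 1), 1699, 1702, 1),  # old version, xmax set by aborted update
    HeapTuple((0, 2), 1700, 0, 2),
    HeapTuple((0, 3), 1702, 0, 1),     # new version written by aborted update
    HeapTuple((0, 4), 1703, 0, 3),
    HeapTuple((0, 5), 1704, 0, 4),
]

visible = [t.ctid for t in page if is_visible(t, status)]
print(visible)  # prints [(0, 1), (0, 2), (0, 4), (0, 5)]
```

Under these assumptions the tuple at (0,3) is filtered out because its creator aborted, while (0,1) stays visible because its xmax belongs to an aborted transaction. The slot at (0,3) is still part of the page, which is exactly the wasted space the query output hints at.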
When does bloat occur
Which common operations leave extra tuple versions behind?
- The first is update
Let's take a look at a demo of update:
After we update a row, its ctid moves to the end of the page: the update wrote a new tuple version there and left the previous one behind as an old version. A detailed analysis of the update code path can be found in my column.
postgres=# update t1 SET name='a' where id=1;
UPDATE 1
postgres=# select ctid,xmin,xmax,id from t1;
 ctid  | xmin | xmax | id
-------+------+------+----
 (0,2) | 1700 |    0 |  2
 (0,4) | 1703 |    0 |  3
 (0,5) | 1704 |    0 |  4
 (0,6) | 1705 |    0 |  1
(4 rows)
- The second is delete
A delete does not create a new version, but the tuple is not physically removed from the table either; it is only marked as deleted by setting its xmax. The reason is the same as for keeping multiple versions: other transactions may still need to see it.
Let's take a look at an example:
postgres=# delete from t1 where id = 1;
DELETE 1
postgres=# select ctid,xmin,xmax,id from t1;
 ctid  | xmin | xmax | id
-------+------+------+----
 (0,2) | 1700 |    0 |  2
 (0,4) | 1703 |    0 |  3
 (0,5) | 1704 |    0 |  4
(3 rows)
postgres=# insert into t1(id) values(5);
INSERT 0 1
postgres=# select ctid,xmin,xmax,id from t1;
 ctid  | xmin | xmax | id
-------+------+------+----
 (0,2) | 1700 |    0 |  2
 (0,4) | 1703 |    0 |  3
 (0,5) | 1704 |    0 |  4
 (0,7) | 1707 |    0 |  5
(4 rows)
After the delete, the next insert did not reuse slot (0,6), which is still occupied by the deleted tuple until it is cleaned up; the new row was placed at (0,7) instead.
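The way update and delete leave dead versions behind can be mimicked with a small toy model. This is only a sketch under stated assumptions: `ToyPage` and its methods are hypothetical names, every operation is treated as an immediately committed transaction, and the model never reclaims space, whereas real PostgreSQL can reuse it after cleanup.

```python
class ToyPage:
    """Append-only slot array: dead versions keep their slot."""

    def __init__(self):
        self.slots = []      # each entry: {"xmin", "xmax", "row_id"}
        self.next_xid = 1    # toy transaction counter

    def _new_xid(self):
        xid = self.next_xid
        self.next_xid += 1
        return xid

    def insert(self, row_id, xid=None):
        """Append a new version; return its 1-based slot number."""
        if xid is None:
            xid = self._new_xid()
        self.slots.append({"xmin": xid, "xmax": 0, "row_id": row_id})
        return len(self.slots)

    def _live_version(self, row_id):
        for slot in self.slots:
            if slot["row_id"] == row_id and slot["xmax"] == 0:
                return slot
        raise KeyError(row_id)

    def delete(self, row_id):
        """Mark the live version as deleted; the slot stays occupied."""
        self._live_version(row_id)["xmax"] = self._new_xid()

    def update(self, row_id):
        """Mark the old version dead and append a new one."""
        xid = self._new_xid()
        self._live_version(row_id)["xmax"] = xid
        return self.insert(row_id, xid)

    def dead_count(self):
        return sum(1 for slot in self.slots if slot["xmax"] != 0)


page = ToyPage()
page.insert(1)
page.insert(2)
updated_slot = page.update(1)   # new version goes to slot 3
page.delete(1)                  # slot 3 is now dead too
inserted_slot = page.insert(5)  # lands in slot 4, not in a dead slot
print(len(page.slots), page.dead_count())  # prints 4 2
```

In this run the page holds four slots but only two live rows; the other two slots are dead versions of the row with id 1. Repeating updates and deletes grows the dead share, which is the bloat the demos above show.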
How to get rid of bloat
PostgreSQL uses two mechanisms while it is running:
The first is page pruning;
The second is autovacuum.
So how do they work? Please see the other articles in this column.
End
Author email: [email protected]
If you find any mistakes or omissions, please point them out so that we can learn from each other.
Note: Do not reprint without consent!