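Preface

This article analyzes the PostgreSQL 15 source code; the demonstrations were run on CentOS 8.

When using PostgreSQL, some data bloat inevitably builds up, which slows queries down and makes indexes less effective. Why does bloat occur, and what can we do to bring the database back to normal?

Where table bloat comes from

Of the four ACID properties of a database, PostgreSQL relies on MVCC (Multi-Version Concurrency Control) to provide transaction atomicity and isolation.

So what is MVCC? Put simply, it uses monotonically increasing transaction IDs to tag the old and new versions of a tuple, so that each transaction sees only the versions it is supposed to see. Let's walk through an example.

First, look at the current contents of a table: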

postgres=# select ctid,xmin,xmax,id from t1;
 ctid  | xmin | xmax | id
-------+------+------+----
 (0,1) | 1699 |    0 |  1
 (0,2) | 1700 |    0 |  2
(2 rows)
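The article does not show how t1 was created. To follow along, a setup like the one below should give a comparable starting state. This is only a sketch: the column types are an assumption (the article only reveals that columns id and name exist), and the transaction IDs on your system will differ from 1699 and 1700.

```sql
-- Hypothetical definition: only the column names id and name
-- appear in the article, the types are assumed.
CREATE TABLE t1 (id int, name text);

-- Two separate autocommit statements, so each row is created by
-- its own transaction and therefore gets its own xmin.
INSERT INTO t1 (id) VALUES (1);
INSERT INTO t1 (id) VALUES (2);

-- ctid, xmin and xmax are system columns; they must be named
-- explicitly because SELECT * does not return them.
SELECT ctid, xmin, xmax, id FROM t1;
```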

Next, we run an update inside a transaction and roll it back:

postgres=# begin;
BEGIN
postgres=*# select txid_current();
 txid_current
--------------
         1702
(1 row)

postgres=*# update t1 SET name='a' where id=1;
UPDATE 1
postgres=*# rollback;
ROLLBACK

Then we insert two more rows:

postgres=# insert into t1(id) values(3);
postgres=# insert into t1(id) values(4);

postgres=# select ctid,xmin,xmax,id from t1;
 ctid  | xmin | xmax | id
-------+------+------+----
 (0,1) | 1699 | 1702 |  1
 (0,2) | 1700 |    0 |  2
 (0,4) | 1703 |    0 |  3
 (0,5) | 1704 |    0 |  4
(4 rows)

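Notice that ctid (0,3) was skipped. That slot is taken by the version of the id=1 tuple written by the rolled-back update. The visibility check judges that version invisible, so it never shows up in query results, but it still occupies a slot in the page. Note also that xmax of (0,1) still holds 1702, the ID of the aborted transaction.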
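The hidden version can be inspected directly with the pageinspect contrib extension. The query below is a minimal sketch, assuming pageinspect is installed, the session has superuser rights, and all rows of t1 still live in block 0. It lists every line pointer in that block, including tuples that an ordinary SELECT filters out.

```sql
-- Assumption: the contrib package providing pageinspect is installed.
CREATE EXTENSION IF NOT EXISTS pageinspect;

SELECT lp,        -- line pointer number, the second half of ctid
       lp_flags,  -- 0 = unused, 1 = normal, 2 = redirect, 3 = dead
       t_xmin,    -- transaction that created this tuple version
       t_xmax,    -- transaction that deleted or updated it, 0 if none
       t_ctid     -- location of the next version, or of itself
FROM heap_page_items(get_raw_page('t1', 0))
ORDER BY lp;
```

For the session above, one would expect line pointer 3 to appear with t_xmin equal to 1702, the rolled-back transaction, provided the page has not been pruned or vacuumed in the meantime.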

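When bloat occurs

Which common operations produce multiple versions of the data?

First, UPDATE

Let's look at an update in action:

After we update a row, it moves to the end of the table, which means one more old version has been left behind. For a detailed walkthrough of the table update code, see the other articles in my column.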

postgres=# update t1 SET name='a' where id=1;
UPDATE 1
postgres=# select ctid,xmin,xmax,id from t1;
 ctid  | xmin | xmax | id
-------+------+------+----
 (0,2) | 1700 |    0 |  2
 (0,4) | 1703 |    0 |  3
 (0,5) | 1704 |    0 |  4
 (0,6) | 1705 |    0 |  1
(4 rows)
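One way to confirm that the update left an old version behind is the cumulative statistics view pg_stat_user_tables. This is a minimal sketch; the counters are estimates and can lag slightly behind the session's activity, so the exact numbers are not guaranteed.

```sql
SELECT relname,
       n_live_tup,     -- estimated number of live tuples
       n_dead_tup,     -- estimated number of dead tuples awaiting cleanup
       n_tup_upd,      -- rows updated since the statistics were last reset
       n_tup_hot_upd   -- updates that needed no new index entries (HOT)
FROM pg_stat_user_tables
WHERE relname = 't1';
```

A growing n_dead_tup after repeated updates is the usual first sign that a table is accumulating old versions.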

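Then DELETE

A delete does not create a new version, but the tuple is not physically removed from the table either. It is only marked as deleted by setting its xmax. The reason is the same as for multi-versioning: other transactions may still need to see it.

Here is an example: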

postgres=# delete from t1 where id = 1;
DELETE 1
postgres=# select ctid,xmin,xmax,id from t1;
 ctid  | xmin | xmax | id
-------+------+------+----
 (0,2) | 1700 |    0 |  2
 (0,4) | 1703 |    0 |  3
 (0,5) | 1704 |    0 |  4
(3 rows)

postgres=# insert into t1(id) values(5);
INSERT 0 1
postgres=# select ctid,xmin,xmax,id from t1;
 ctid  | xmin | xmax | id
-------+------+------+----
 (0,2) | 1700 |    0 |  2
 (0,4) | 1703 |    0 |  3
 (0,5) | 1704 |    0 |  4
 (0,7) | 1707 |    0 |  5
(4 rows)

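After the delete followed by an insert, slot (0,6) was not reused even though its row had been deleted. The deleted tuple still occupies it, so the new row went to (0,7) instead.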
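The space held by such tuples can be measured with the pgstattuple contrib extension. The query below is a sketch, assuming pgstattuple is installed and the session has the privileges to run it. Note that it scans the whole table, which is harmless on this demo table but can be expensive on large ones.

```sql
-- Assumption: the contrib package providing pgstattuple is installed.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT tuple_count,         -- number of live tuples
       dead_tuple_count,    -- number of dead tuples
       dead_tuple_percent,  -- share of the table occupied by dead tuples
       free_space           -- reusable free space in bytes
FROM pgstattuple('t1');
```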

How to eliminate bloat

While running, PostgreSQL relies on two mechanisms:

First, page pruning;

Second, autovacuum;

How do they work? See the other articles in this column.
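To watch cleanup happen on the demo table without waiting for autovacuum, a manual VACUUM can be run. Manual VACUUM is not one of the two runtime mechanisms listed above, but it performs the same kind of cleanup that autovacuum schedules automatically. The statements below are a sketch for experimenting, not a tuning recommendation.

```sql
-- Reclaim dead tuples in t1 and report what was done.
VACUUM (VERBOSE) t1;

-- Settings that control whether and when autovacuum processes a table.
SELECT name, setting
FROM pg_settings
WHERE name IN ('autovacuum',
               'autovacuum_naptime',
               'autovacuum_vacuum_threshold',
               'autovacuum_vacuum_scale_factor');

-- When the table was last vacuumed, manually and by autovacuum.
SELECT relname, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 't1';
```

After the vacuum, rows inserted later can reuse the slots that the dead tuples used to occupy.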

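Closing

Author email: [email protected]
If you spot any errors or omissions, please point them out so we can learn from each other.

Note: do not repost without permission!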

PostgreSQL Kernel Source Code Analysis: What Is Table Bloat?
