PostgreSQL Shared Buffers Explained

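Shared buffers are the database server's shared memory buffer pool. The usual recommendation is to set the shared_buffers parameter to about 25% of the machine's memory. Why that much, and is bigger always better? To answer these questions we need to understand how shared buffers actually work in PostgreSQL.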

Before looking at shared buffers, we first need to introduce PostgreSQL's bgwriter process.

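The bgwriter periodically writes dirty pages out of the shared buffers. Note that bgwriter uses buffered I/O, so its writes land in the operating system's page cache rather than directly on disk; when the data actually reaches the storage device depends on the OS's writeback scheduling.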
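
To get a feel for how much work the bgwriter is doing on a particular server, the pg_stat_bgwriter statistics view can be queried. The queries below are an illustrative sketch rather than part of the original walkthrough; they use only columns of pg_stat_bgwriter and pg_settings, and the numbers returned will differ from system to system.

```sql
-- Background writer activity since the statistics were last reset.
SELECT buffers_clean,      -- buffers written out by the background writer
       maxwritten_clean,   -- times a cleaning scan stopped after writing too many buffers
       buffers_alloc       -- buffers allocated
FROM pg_stat_bgwriter;

-- Current background writer tuning parameters.
SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE 'bgwriter%';
```

A maxwritten_clean value that keeps growing suggests the bgwriter is regularly hitting its per-round write limit (bgwriter_lru_maxpages).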

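Because of this, PostgreSQL depends heavily on the operating system cache: it relies on the OS to know about the file system and disk layout and to decide how data files are read and written.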

[Figure: how data moves between shared buffers and disk]
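When we run a command in the database, the data is first loaded into the OS cache and then into shared buffers. Likewise, dirty data is first written out to the OS cache and only later reaches disk, either through the kernel's own writeback or when PostgreSQL calls fsync() (for example during a checkpoint).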

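Shared buffers and the OS cache seem to do much the same job and can hold the same content, so isn't that a waste of space? Not really. The OS cache typically uses a simple LRU policy rather than the clock-sweep algorithm that PostgreSQL has tuned for database workloads. Once a page is hit in shared_buffers, the read never reaches the OS cache, so any copy of that page held there goes unused and is easily evicted.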
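
One way to observe whether a query was served from shared buffers is the BUFFERS option of EXPLAIN. This is a minimal sketch; it assumes a table named t1 exists (the same name that shows up in the example output later in this article), so substitute any table of your own.

```sql
-- Run the query and report buffer usage alongside the plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM t1;
```

In the plan output, "shared hit" counts pages found in shared buffers, while "read" counts pages PostgreSQL had to request from the operating system, which may in turn have served them from its own cache or from disk. Running the same query a second time will usually move pages from "read" to "hit".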

As mentioned above, when data is flushed to disk is decided by the OS's scheduling policy. Can we control when that happens?

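Yes. To avoid waiting indefinitely on the OS, bgwriter supports the bgwriter_flush_after parameter: once bgwriter has written more than bgwriter_flush_after worth of pages out of shared buffers, it asks the OS to flush those dirty pages from the page cache to the storage device.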

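This applies not only to bgwriter: in PostgreSQL the checkpointer process and ordinary user backend processes can also write dirty buffers from shared buffers to the OS cache.
For those, the checkpoint_flush_after and backend_flush_after parameters similarly influence when the OS flushes that data to storage.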
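
The three flush parameters can be inspected and changed from SQL. The following is a sketch only: the value 512kB is an illustrative assumption, not a recommendation, and ALTER SYSTEM requires superuser privileges.

```sql
-- Show the current values and whether a change needs a reload or a restart.
SELECT name, setting, unit, context
FROM pg_settings
WHERE name IN ('bgwriter_flush_after',
               'checkpoint_flush_after',
               'backend_flush_after');

-- Illustrative value; 0 disables forced writeback for the bgwriter.
ALTER SYSTEM SET bgwriter_flush_after = '512kB';

-- Ask the server to re-read its configuration.
SELECT pg_reload_conf();
```

These parameters may have no effect on some platforms, so it is worth checking the documentation for the operating system in use.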

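In many other database systems, most of the server's memory is given to the database, because operations run in the database's own buffer cache and the OS cache is bypassed with synchronous or direct I/O. PostgreSQL, as noted above, uses buffered I/O by default and so still goes through the OS cache.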

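So why not give all of the memory to the operating system cache instead?
The main reason PostgreSQL's shared buffer cache can do better than the OS cache is the way it keeps a usage count for each buffer.
This gives each buffer a "popularity" score from 0 to 5; the higher the score, the less likely the buffer is to leave the cache.
Whenever the database looks for something to evict to make room for data it needs, it decrements the usage counts of the buffers it passes over. Each access increases a buffer's usage count (up to the maximum), making that block harder to evict. This implementation is called the clock-sweep algorithm.
A typical operating system cache gives a buffer only one or two chances before it is evicted.
Usually the OS uses some form of LRU algorithm. If the database has data that is used frequently, keeping it in the database's shared memory is likely to work better than keeping it in the OS cache.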
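
The usage counts described above can be observed directly. This sketch assumes the pg_buffercache extension (the same one used in the examples below) is already installed in the current database.

```sql
-- Distribution of buffers by usage count (0 to 5).
-- A row with a NULL usagecount represents buffers that are currently unused.
SELECT usagecount,
       count(*) AS buffers,
       count(*) FILTER (WHERE isdirty) AS dirty_buffers
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount;
```

A cache dominated by buffers with high usage counts indicates a hot working set that the clock sweep will be slow to evict; many buffers at 0 or 1 are the ones most likely to be replaced next.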

Examples:
1. Viewing the contents of shared buffers
We can look at what is in shared buffers with the pg_buffercache extension:

bill@bill=>create extension pg_buffercache;
CREATE EXTENSION

bill@bill=>SELECT c.relname
bill-#   , pg_size_pretty(count(*) * 8192) as buffered
bill-#   , round(100.0 * count(*) / ( SELECT setting FROM pg_settings WHERE name='shared_buffers')::integer,1) AS buffers_percent
bill-#   , round(100.0 * count(*) * 8192 / pg_relation_size(c.oid),1) AS percent_of_relation
bill-#  FROM pg_class c
bill-#  INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
bill-#  INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
bill-#  WHERE pg_relation_size(c.oid) > 0
bill-#  GROUP BY c.oid, c.relname
bill-#  ORDER BY 3 DESC
bill-#  LIMIT 10;
              relname              |  buffered  | buffers_percent | percent_of_relation
-----------------------------------+------------+-----------------+---------------------
 pg_depend_reference_index         | 496 kB     |             0.1 |               100.0
 pg_depend                         | 728 kB     |             0.1 |               104.6
 pg_depend_depender_index          | 368 kB     |             0.1 |                79.3
 pg_toast_2618                     | 520 kB     |             0.1 |               106.6
 pg_statistic                      | 272 kB     |             0.1 |               113.3
 pg_sequence                       | 40 kB      |             0.0 |               500.0
 pg_inherits_parent_index          | 8192 bytes |             0.0 |               100.0
 pg_user_mapping_user_server_index | 8192 bytes |             0.0 |               100.0
 pg_am                             | 16 kB      |             0.0 |               200.0
 pg_default_acl_role_nsp_obj_index | 8192 bytes |             0.0 |               100.0
(10 rows)

Note: percent_of_relation can exceed 100% here because the buffer count includes every fork of a relation (main data, free space map, visibility map), whereas pg_relation_size(c.oid) measures only the main fork.

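2. Viewing the contents of the OS cache
To check what is cached at the operating system level, we need to install the pgfincore extension.
Download: https://github.com/klando/pgfincore

Installation: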
make clean
make
make install

bill@bill=>create extension pgfincore;
CREATE EXTENSION

Query:

bill@bill=>select c.relname,pg_size_pretty(count(*) * 8192) as pg_buffered,
bill-#  round(100.0 * count(*) /
bill(#            (select setting
bill(#             from pg_settings
bill(#             where name='shared_buffers')::integer,1)
bill-#        as pgbuffer_percent,
bill-#        round(100.0*count(*)*8192 / pg_table_size(c.oid),1) as percent_of_relation,
bill-#        ( select round( sum(pages_mem) * 4 /1024,0 )
bill(#          from pgfincore(c.relname::text) )
bill-#          as os_cache_MB ,
bill-#          round(100 * (
bill(#                select sum(pages_mem)*4096
bill(#                from pgfincore(c.relname::text) )/ pg_table_size(c.oid),1)
bill-#          as os_cache_percent_of_relation,
bill-#          pg_size_pretty(pg_table_size(c.oid)) as rel_size
bill-#  from pg_class c
bill-#  inner join pg_buffercache b on b.relfilenode=c.relfilenode
bill-#  inner join pg_database d on (b.reldatabase=d.oid and d.datname=current_database()
bill(#             and c.relnamespace=(select oid from pg_namespace where nspname='public'))
bill-#  group by c.oid,c.relname
bill-#  order by 3 desc limit 30;
 relname | pg_buffered | pgbuffer_percent | percent_of_relation | os_cache_mb | os_cache_percent_of_relation |  rel_size
---------+-------------+------------------+---------------------+-------------+------------------------------+------------
 t1      | 8192 bytes  |              0.0 |                16.7 |           0 |                          0.0 | 48 kB
 my_seq  | 8192 bytes  |              0.0 |               100.0 |           0 |                        100.0 | 8192 bytes
(2 rows)

pg_buffered: how much of the relation's data is held in the PostgreSQL buffer cache
pgbuffer_percent: pg_buffered / total_buffer_size * 100
percent_of_relation: pg_buffered / total_relation_size * 100
os_cache_mb: how much of the relation is cached by the OS, in MB

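Summary:
In most cases, set shared_buffers to 25% of memory; for the best results, adjust it based on the cache hit ratio and on how much of your data is cached.
Also take care that all memory settings added together do not exceed the total memory of the machine.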
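
As a starting point for that tuning, the buffer hit ratio can be read from pg_stat_database. The sketch below is an illustration only; the 4GB value assumes a hypothetical server with 16 GB of RAM and is not a recommendation for any particular system.

```sql
-- Share of block requests for the current database that were satisfied
-- from shared buffers since the statistics were last reset.
SELECT datname,
       blks_hit,
       blks_read,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS hit_percent
FROM pg_stat_database
WHERE datname = current_database();

-- Current setting.
SHOW shared_buffers;

-- Hypothetical: 25% of a 16 GB machine. Requires a server restart to take effect.
ALTER SYSTEM SET shared_buffers = '4GB';
```

Keep in mind that blks_read only means the page was not in shared buffers; it may still have come from the OS cache rather than from disk.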

Reference:
https://postgreshelp.com/postgresql_shared_buffers/
