Using the pg_buffercache Extension with AntDB

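While running performance tests on AntDB recently, I found that memory utilization on every node was very high, staying above 90%. In extreme cases the Linux OOM killer was triggered and terminated otherwise healthy processes.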

07:50:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
08:00:01 AM   7637488 107762712     93.38    210720  82850128  68736012     31.21  57126872  34469436       872
08:10:01 AM   7667152 107733048     93.36    210720  82824796  68697808     31.19  57133296  34433712       960
08:20:01 AM   7589444 107810756     93.42    210732  82894252  68770136     31.22  57155768  34498432       644
08:30:01 AM   7651812 107748388     93.37    210732  82869168  68745724     31.21  57160156  34466116      1372
08:40:01 AM   7551960 107848240     93.46    210748  82939800  68812128     31.24  57158176  34530912      1216
08:50:01 AM   7516872 107883328     93.49    210768  82977256  68781520     31.23  57170696  34556540       580
09:00:01 AM   7548676 107851524     93.46    210804  82940580  68744960     31.21  57180280  34515508       652
09:10:01 AM   7463740 107936460     93.53    210804  83010680  68850000     31.26  57190260  34578704       840
09:20:02 AM   7491808 107908392     93.51    210812  82974756  68794780     31.23  57214004  34535412      1340
09:30:01 AM   7516384 107883816     93.49    210812  82959668  68756636     31.22  57207748  34511324      1020

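The nodes run Linux, and the records above were collected with sar at 10-minute intervals. Most of us, though, are more used to checking memory usage with free: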

[zgy@INTEL175 ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:           110G         20G        5.9G        5.4G         83G         82G

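People usually read the free output above in one of three ways:
- First: only 5.9G is free, so programs must have used more than 100G of memory. Why? The programs must be badly written.
- Second: only about 20G is actually in use, and plenty of memory is still available. buff/cache takes up a large share, which means some process has read or written large files, but that does not matter because this memory can be treated as free.
- Third: that is what free shows, fine, noted. Whether this memory is enough, I cannot say. How would I know how your programs are written?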
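The gap between the first and second readings can be checked with simple arithmetic on one sar sample. The sketch below uses the 08:00:01 AM row and assumes that kbmemused in this sar version includes kbbuffers and kbcached (the numbers suggest so, since kbmemfree + kbmemused adds up to the 110G total that free reports). The function name memory_usage is only illustrative.

```python
# Recompute memory utilization from one sar sample, with and without
# buffers/page cache. Figures are copied from the 08:00:01 AM row above.
# Assumption: kbmemused includes kbbuffers and kbcached.

def memory_usage(kbmemfree, kbmemused, kbbuffers, kbcached):
    """Return (pct_used_including_cache, pct_used_excluding_cache)."""
    total = kbmemfree + kbmemused
    including_cache = 100.0 * kbmemused / total
    excluding_cache = 100.0 * (kbmemused - kbbuffers - kbcached) / total
    return round(including_cache, 2), round(excluding_cache, 2)


incl, excl = memory_usage(
    kbmemfree=7637488,
    kbmemused=107762712,
    kbbuffers=210720,
    kbcached=82850128,
)
print(f"%memused including buffers/cache: {incl}")
print(f"%memused excluding buffers/cache: {excl}")
```

The first figure reproduces the %memused column of sar, while the second is roughly the share that corresponds to the "used" column of free. Note that shared memory is accounted under the cache figure, so the second number may understate what the database actually holds.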

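This experiment is not concerned with how the operating system manages memory. It only explores how the buffer cache is used at the AntDB cluster level.
PostgreSQL provides an extension, pg_buffercache, for inspecting the contents of the shared buffer cache, and it can be used in AntDB as well. The following shows how to use it in AntDB: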

Compiling and installing pg_buffercache


cd ~/AntDB/contrib/pg_buffercache
USE_PGXS=1 make clean
USE_PGXS=1 make
USE_PGXS=1 make install
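### Connect to adbmgr and run deploy all to push the build to every node
### Before running deploy all, stop the AntDB cluster, including the agent processes: stop all MODE F && stop agent all
psql -d postgres -U postgres -p 6432 -c "deploy all password 'password'";
### Then start the agent processes and the AntDB cluster again, in this order: start agent all && start all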

How to use pg_buffercache

Connect to a coordinator and create the pg_buffercache extension:

postgres=# create extension pg_buffercache ;
CREATE EXTENSION
postgres=# \dx
                      List of installed extensions
      Name      | Version |   Schema   |           Description           
----------------+---------+------------+---------------------------------
 pg_buffercache | 1.0     | public     | examine the shared buffer cache
 plpgsql        | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)
postgres=# \d+ public.pg_buffercache
                 View "public.pg_buffercache"
     Column     |   Type   | Modifiers | Storage | Description 
----------------+----------+-----------+---------+-------------
 bufferid       | integer  |           | plain   | 
 relfilenode    | oid      |           | plain   | 
 reltablespace  | oid      |           | plain   | 
 reldatabase    | oid      |           | plain   | 
 relforknumber  | smallint |           | plain   | 
 relblocknumber | bigint   |           | plain   | 
 isdirty        | boolean  |           | plain   | 
 usagecount     | smallint |           | plain   | 
View definition:
 SELECT p.bufferid,
    p.relfilenode,
    p.reltablespace,
    p.reldatabase,
    p.relforknumber,
    p.relblocknumber,
    p.isdirty,
    p.usagecount
   FROM pg_buffercache_pages() p(bufferid integer, relfilenode oid, reltablespace oid, reldatabase oid, relforknumber smallint, relblocknumber bigint, isdirty boolean, usagecount smallint);

Common queries

-- Objects currently cached in shared_buffers:

postgres=# SELECT c.relname, count(*) AS buffers
postgres-#              FROM pg_buffercache b INNER JOIN pg_class c
postgres-#              ON b.relfilenode = pg_relation_filenode(c.oid) AND
postgres-#                 b.reldatabase IN (0, (SELECT oid FROM pg_database
postgres(#                                       WHERE datname = current_database()))
postgres-#              GROUP BY c.relname
postgres-#              ORDER BY 2 DESC
postgres-#              LIMIT 10; 
            relname             | buffers 
--------------------------------+---------
 pg_proc                        |      34
 pg_attribute                   |      32
 pg_class                       |      20
 pg_operator                    |      14
 pg_statistic                   |      12
 pg_proc_proname_args_nsp_index |      12
 pg_depend_reference_index      |      11
 pg_depend                      |      10
 pg_proc_oid_index              |       9
 pg_type                        |       8
(10 rows)

-- Share of shared_buffers occupied by each cached object (adjust the block size if necessary; unless it was changed when AntDB was compiled, the default is 8 kB):

postgres=# SELECT
postgres-#     c.relname,
postgres-#     pg_size_pretty(count(*) * 8192) as buffered,
postgres-#     round(100.0 * count(*) /
postgres(#     (SELECT setting FROM pg_settings
postgres(# WHERE name='shared_buffers')::integer,1)
postgres-#     AS buffers_percent,
postgres-#     round(100.0 * count(*) * 8192 /
postgres(#     pg_relation_size(c.oid),1)
postgres-#     AS percent_of_relation
postgres-# FROM pg_class c
postgres-# INNER JOIN pg_buffercache b
postgres-# ON b.relfilenode = c.relfilenode
postgres-# INNER JOIN pg_database d
postgres-# ON (b.reldatabase = d.oid AND d.datname = current_database())
postgres-# GROUP BY c.oid,c.relname
postgres-# ORDER BY 3 DESC
postgres-# LIMIT 5;
             relname              |  buffered  | buffers_percent | percent_of_relation 
----------------------------------+------------+-----------------+---------------------
 pg_extension_oid_index           | 16 kB      |             0.0 |               100.0
 pg_operator_oid_index            | 32 kB      |             0.0 |                80.0
 pgxc_class                       | 8192 bytes |             0.0 |               100.0
 pg_statistic_relid_att_inh_index | 48 kB      |             0.0 |               100.0
 pg_inherits_relid_seqno_index    | 8192 bytes |             0.0 |               100.0
(5 rows)

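Note: with the default block_size of 8 kB, hard-coding 8192 in the query above is fine. If the block size was changed when AntDB was compiled, however, the results will be wrong. If you do not remember whether it was changed, read the value from pg_settings instead; the query can be written in the following more general form: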

postgres=# SELECT                                                  
postgres-# c.relname,
postgres-# pg_size_pretty(count(*) * (select setting from pg_settings where name='block_size')::integer ) as buffered,
postgres-# round(100.0 * count(*) /
postgres(# (SELECT setting FROM pg_settings
postgres(# WHERE name='shared_buffers')::integer,1)
postgres-# AS buffers_percent,
postgres-# round(100.0 * count(*) * (select setting from pg_settings where name='block_size')::integer /
postgres(# pg_relation_size(c.oid),1)
postgres-# AS percent_of_relation
postgres-# FROM pg_class c
postgres-# INNER JOIN pg_buffercache b
postgres-# ON b.relfilenode = c.relfilenode
postgres-# INNER JOIN pg_database d
postgres-# ON (b.reldatabase = d.oid AND d.datname = current_database())
postgres-# GROUP BY c.oid,c.relname
postgres-# ORDER BY 3 DESC
postgres-# LIMIT 5;
             relname              |  buffered  | buffers_percent | percent_of_relation 
----------------------------------+------------+-----------------+---------------------
 pg_extension_oid_index           | 16 kB      |             0.0 |               100.0
 pg_operator_oid_index            | 32 kB      |             0.0 |                80.0
 pgxc_class                       | 8192 bytes |             0.0 |               100.0
 pg_statistic_relid_att_inh_index | 48 kB      |             0.0 |               100.0
 pg_inherits_relid_seqno_index    | 8192 bytes |             0.0 |               100.0
(5 rows)
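The arithmetic behind the three computed columns can be sketched outside the database as follows. The function name buffer_stats and the shared_buffers value of 131072 pages are hypothetical, since the source does not state the cluster's setting. The relation size of 40960 bytes is inferred from the pg_operator_oid_index row (32 kB buffered at 80.0%), assuming 8 kB pages.

```python
def buffer_stats(buffers, block_size, shared_buffers, relation_bytes):
    """Mirror the arithmetic of the SQL query for a single relation.

    buffers        -- number of pg_buffercache rows belonging to the relation
    block_size     -- bytes per page (pg_settings 'block_size')
    shared_buffers -- size of the buffer pool in pages (pg_settings 'shared_buffers')
    relation_bytes -- size of the relation in bytes (pg_relation_size)
    """
    buffered_bytes = buffers * block_size
    buffers_percent = round(100.0 * buffers / shared_buffers, 1)
    percent_of_relation = round(100.0 * buffered_bytes / relation_bytes, 1)
    return buffered_bytes, buffers_percent, percent_of_relation


# pg_operator_oid_index from the output above: 4 buffers of 8 kB each.
# shared_buffers=131072 pages is an assumed value used only for illustration.
print(buffer_stats(buffers=4, block_size=8192,
                   shared_buffers=131072, relation_bytes=40960))

# Same buffer count on a hypothetical build with 16 kB pages: hard-coding
# 8192 would report only half of the bytes actually held in the cache.
print(buffer_stats(buffers=4, block_size=16384,
                   shared_buffers=131072, relation_bytes=81920))
```

Only the byte-based columns depend on the block size. buffers_percent is a ratio of page counts and stays the same either way.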

-- Dirty pages currently in shared_buffers:

postgres=# select count(*) from pg_buffercache where isdirty is true;

 count 
-------
     0
(1 row)
-- There are no dirty pages at the moment, so create a simple example
postgres=# create table a (id int,name varchar(20));
CREATE TABLE
postgres=# insert into a values (1,'jay'),(2,'may');
INSERT 0 2
postgres=# update a set name='apple' where id=1;
UPDATE 1
postgres=# select count(*) from pg_buffercache where isdirty is true;
 count 
-------
    19
(1 row)

postgres=# select * from  pg_buffercache where isdirty is true;
 bufferid | relfilenode | reltablespace | reldatabase | relforknumber | relblocknumber | isdirty | usagecount 
----------+-------------+---------------+-------------+---------------+----------------+---------+------------
        5 |       12905 |          1663 |       13161 |             0 |              2 | t       |          5
       79 |       12908 |          1663 |       13161 |             0 |              1 | t       |          5
      141 |       12907 |          1663 |       13161 |             0 |              2 | t       |          5
      149 |       12885 |          1663 |       13161 |             0 |              1 | t       |          5
      151 |       12897 |          1663 |       13161 |             0 |             12 | t       |          5
      154 |       13079 |          1663 |       13161 |             0 |              1 | t       |          5
      155 |       13077 |          1663 |       13161 |             0 |              0 | t       |          5
      158 |       13013 |          1663 |       13161 |             0 |             24 | t       |          5
      159 |       13011 |          1663 |       13161 |             0 |             48 | t       |          5
      162 |       13014 |          1663 |       13161 |             0 |             28 | t       |          5
      163 |       13013 |          1663 |       13161 |             0 |             27 | t       |          5
      165 |       13013 |          1663 |       13161 |             0 |             21 | t       |          5
      166 |       13014 |          1663 |       13161 |             0 |              5 | t       |          5
      167 |       12884 |          1663 |       13161 |             0 |              2 | t       |          5
      168 |       12882 |          1663 |       13161 |             0 |              8 | t       |          5
      178 |       12894 |          1663 |       13161 |             0 |             52 | t       |          5
      217 |       13014 |          1663 |       13161 |             0 |             23 | t       |          5
      375 |       12896 |          1663 |       13161 |             0 |             13 | t       |          5
      379 |       12905 |          1663 |       13161 |             2 |              0 | t       |          2
(19 rows)
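A raw listing like the one above is easier to read once it is grouped by relation file and fork. The following is a minimal sketch, using a few rows copied from the output and assuming the usual PostgreSQL fork numbering (0 main, 1 free space map, 2 visibility map, 3 init). The helper name dirty_summary is illustrative, and mapping a relfilenode back to a relation name still requires a lookup in pg_class.

```python
from collections import Counter

# Fork numbers as used by PostgreSQL for relforknumber.
FORK_NAMES = {0: "main", 1: "fsm", 2: "vm", 3: "init"}

# (bufferid, relfilenode, relforknumber, relblocknumber)
# A subset of the dirty rows shown above.
rows = [
    (5, 12905, 0, 2),
    (158, 13013, 0, 24),
    (163, 13013, 0, 27),
    (165, 13013, 0, 21),
    (379, 12905, 2, 0),
]


def dirty_summary(rows):
    """Count dirty buffers per (relfilenode, fork name), largest first."""
    counts = Counter(
        (relfilenode, FORK_NAMES.get(fork, str(fork)))
        for _bufferid, relfilenode, fork, _block in rows
    )
    return counts.most_common()


for (relfilenode, fork), n in dirty_summary(rows):
    print(f"relfilenode={relfilenode} fork={fork} dirty_buffers={n}")
```

In this subset, relfilenode 12905 has one dirty page in its main fork and one in its visibility map fork, which matches the last row of the listing (relforknumber 2).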

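pg_buffercache can of course serve many other kinds of queries, which are not covered one by one here. Interested readers can look up the related material. My own knowledge is limited, so corrections are welcome if there are mistakes in this article.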

AntDB:
Open source URL: https://github.com/ADBSQL/AntDB
QQ discussion group: 496464280

Reference: http://www.csdn123.com/html/blogs/20130507/10013.htm

Reposted from blog.csdn.net/u011098015/article/details/78930603