PostgreSQL: usage of the database testing tool pgbench

Foreword:

The database is a basic yet critical component of a project; its status comes first, and ideally it should not run into too many problems.

Needless to say, database design is the key first step. It mainly involves how the database is deployed and operated, the logical design of tables, reasonable fields, reasonable indexes, the necessary roles, and security considerations, as well as functions, views, triggers, materialized views, and so on. In other words, which data in the project needs to be stored in the database, and how the related data should be stored, are the questions that must be settled during the database design phase.

After the design phase is complete, we move on to operating the database. Before operating it, we need to be clear about the state the database should reach. Simply put, the database should achieve the "three highs": high availability, high performance, and high concurrency. High availability is relatively easy to achieve, usually by building a cluster (that is, HA), and it is also relatively easy to verify: once the cluster is set up, the first time a master-slave switchover happens you will know whether it is truly highly available. High performance and high concurrency, however, require repeated testing combined with observation of actual production operation. Without tests and the corresponding test reports, there is no way to confirm whether the database really delivers high performance and high concurrency.

Therefore, database testing is a critical but often overlooked task. For PostgreSQL there are many tools that can be used to test whether a database meets our expectations, such as pg_profile, pg_reset, pg_stat and other built-in or external plug-ins that collect statistics and monitor the database, but the reports these tools generate are verbose, not particularly fast to produce, and not particularly intuitive.

The tool pgbench solves a large part of these pain points. Its testing process is simple, direct, efficient and easy to use, and, crucially, no special installation or deployment is needed: it is a small tool that ships with the PostgreSQL database. The overall impression is that pgbench is to database testing what the ab tool is to web testing, which makes it very convenient to use.

pgbench can be used to test the performance and concurrency capabilities of PostgreSQL. It simulates a simple bank-transfer scenario and can generate different loads by setting parameters. pgbench supports multi-threaded concurrent testing and can measure indicators such as transaction throughput, latency, and the number of concurrent connections. It is simple to use, but its functionality is limited: it can only perform basic load testing.

The following is a brief introduction to the use of pgbench.

One: Where is pgbench?

pgbench is normally installed along with the database as a built-in command.

Note in particular that, like most of the other PostgreSQL client commands, it needs to be executed as the postgres user; it cannot be used by the root user.

[root@node1 ~]# whereis pgbench
pgbench: /usr/local/pgsql/bin/pgbench
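
To confirm which binary is in use, you can switch to the postgres user and check the version; this is just a quick sanity check, and the output depends on your installation:

# switch to the postgres user, since pgbench should not be run as root
su - postgres
# print the pgbench version that ships with this PostgreSQL installation
pgbench --version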

Two: Introduction to the database used for testing

The operating system is CentOS 7, on two VMware virtual machines, each with 4 GB of memory and 4 CPU cores.

The database is PostgreSQL 12.4, left entirely at its defaults, which means no optimization has been done; optimization here refers to tuning the database's runtime parameters and the operating system's kernel parameters. The database is a simple master-slave replication cluster.

Master database IP: 192.168.123.11

Slave (standby) database IP: 192.168.123.12

Three: Data preparation for the test

The plan is to generate a large table with 20 million rows, and then run query and write tests against it to obtain the database's performance and concurrency indicators. The code to create the large table follows.

Random ID string generation function:

create or replace function gen_id(  
 a date,  
 b date  
)   
returns text as $$  
select lpad((random()*99)::int::text, 3, '0') ||   
    lpad((random()*99)::int::text, 3, '0') ||   
    lpad((random()*99)::int::text, 3, '0') ||   
    to_char(a + (random()*(b-a))::int, 'yyyymmdd') ||   
    lpad((random()*99)::int::text, 3, '0') ||   
    random()::int ||   
    (case when random()*10 >9 then 'xy' else (random()*9)::int::text end ) ;  
$$ language sql strict;
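
Before the bulk insert, the function can be sanity-checked with a single call (the dates here simply match the range used later; the result is random):

-- quick sanity check of gen_id(); returns one random ID-style string
select gen_id('1949-01-01', '2023-10-16');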

Create test table structure:

CREATE SEQUENCE test START 1;
create table if not exists testpg (
	"id" int8 not null DEFAULT nextval('test'::regclass),
	"suijishuzi" VARCHAR ( 255 ) COLLATE "pg_catalog"."default",
	CONSTRAINT "user_vendorcode_pkey" PRIMARY KEY ("id")
);

Insert 20 million (2000W) rows of data:

Depending on machine performance, this takes roughly 5 to 10 minutes.

insert into testpg SELECT generate_series(1,20000000) as xm, gen_id('1949-01-01', '2023-10-16') as num;

Four: View the test table
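
As a minimal sketch of how to inspect the table after the insert (these are generic psql checks, not necessarily the exact commands used originally):

-- show the table definition
\d testpg
-- confirm the row count (should be 20,000,000)
select count(*) from testpg;
-- look at a few sample rows
select * from testpg limit 5;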

Five: pgbench initialization

Note that before initialization you need to create the pgbench database; how to create it will not be belabored here.
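
For reference, one simple way to create it is with createdb (or an equivalent CREATE DATABASE statement); this is just a sketch:

# create an empty database named pgbench, owned by postgres
createdb -U postgres pgbench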

pgbench -U postgres -i pgbench

After initialization, you will see several tables in the pgbench database; what each table is for is not examined in detail here:

postgres=# \c pgbench 
You are now connected to database "pgbench" as user "postgres".
pgbench=# \dt
              List of relations
 Schema |       Name       | Type  |  Owner   
--------+------------------+-------+----------
 public | pgbench_accounts | table | postgres
 public | pgbench_branches | table | postgres
 public | pgbench_history  | table | postgres
 public | pgbench_tellers  | table | postgres
(4 rows)
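
By default, -i creates pgbench_accounts with 100,000 rows (scale factor 1); if a larger benchmark data set is wanted, the -s option scales it up. For example (illustrative, not what was run above):

# initialize with scale factor 100: pgbench_accounts gets 100 x 100,000 = 10,000,000 rows
pgbench -U postgres -i -s 100 pgbench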

Six: The modes of pgbench

pgbench has two kinds of modes: built-in and external. The built-in modes test directly against the four tables pgbench just created and are generally used for benchmarking (benchmark testing here meaning baseline and correctness testing). The external mode runs custom SQL statements and is generally used for stress and performance testing.

Built-in modes:

The built-in mode is further divided into three scripts. Judging from their names, the first is a simple test of overall performance, the second is a simple test of write performance, and the third is a simple test of read performance. All of them use the four tables that ship with pgbench and pgbench's own logic.

[postgres@node1 ~]$ pgbench -b list
Available builtin scripts:
	tpcb-like
	simple-update
	select-only

The first small mode (tpcb-like):

pgbench -U postgres -T 10 -c 10 -h 192.168.123.11 -d pgbench > 1111.txt 2>&1

Excerpting part of the output, you can see that pgbench performs update, insert, and select actions, all against the four tables above. This process is not controllable by the user, so it is basically not a very precise test.

client 5 executing script "<builtin: TPC-B (sort of)>"
client 5 executing \set aid
client 5 executing \set bid
client 5 executing \set tid
client 5 executing \set delta
client 5 sending BEGIN;
client 5 receiving
client 0 receiving
client 0 sending END;
client 0 receiving
client 5 receiving
client 5 sending UPDATE pgbench_accounts SET abalance = abalance + -1444 WHERE aid = 99838;
client 5 receiving
client 9 receiving
client 9 sending UPDATE pgbench_tellers SET tbalance = tbalance + -1294 WHERE tid = 6;
client 9 receiving
client 0 receiving
client 5 receiving
client 5 sending SELECT abalance FROM pgbench_accounts WHERE aid = 99838;
client 5 receiving
client 8 receiving
client 8 sending INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (1, 1, 78380, -2573, CURRENT_TIMESTAMP);
client 8 receiving
client 0 executing script "<builtin: TPC-B (sort of)>"
client 0 executing \set aid
client 0 executing \set bid
client 0 executing \set tid
client 0 executing \set delta
client 0 sending BEGIN;
client 0 receiving
client 0 receiving
client 0 sending UPDATE pgbench_accounts SET abalance = abalance + -2452 WHERE aid = 40167;
client 0 receiving
client 5 receiving
client 5 sending UPDATE pgbench_tellers SET tbalance = tbalance + -1444 WHERE tid = 10;
client 5 receiving
client 8 receiving
client 8 sending END;
client 8 receiving
client 5 receiving
client 5 sending UPDATE pgbench_branches SET bbalance = bbalance + -1444 WHERE bid = 1;
client 5 receiving

The second small mode (select-only):

pgbench -U postgres -b select-only -c 10 -h 192.168.123.11 -d pgbench > 1111.txt 2>&1

The third small mode (simple-update):

pgbench -U postgres -b simple-update -c 10 -h 192.168.123.11 -d pgbench > 1111.txt 2>&1
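
For comparison, the same built-in scripts can also be run for a fixed number of transactions per client instead of a fixed duration, for example (values are illustrative):

# 10 clients, each executing 1000 transactions of the select-only script
pgbench -U postgres -b select-only -c 10 -t 1000 -h 192.168.123.11 pgbench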

External mode:

pgbench -M prepared -v -r -P 5 -f ./ro.sql -c 60 -j 60 -T 120 -D scale=10000 -D range=500000 -U postgres -h 192.168.123.222 -p 15433 test
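
The contents of ro.sql are not shown above; as a purely hypothetical example of what a read-only custom script could look like, using the range variable passed in with -D (the table and column names here are assumptions, not taken from the original test):

-- ro.sql: hypothetical read-only script for pgbench -f
-- :range is supplied on the command line via -D range=500000
\set aid random(1, :range)
SELECT suijishuzi FROM testpg WHERE id = :aid;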

Seven: Description of the pgbench command parameters

Parameter description:

-r After the benchmark ends, report the average per-statement wait time (execution time from the client's perspective) for each command.

-j Number of worker threads in pgbench. Using more than one thread can be useful on multi-CPU machines. Clients are distributed as evenly as possible across available threads. Default is 1.

-c The number of simulated clients, that is, the number of concurrent database sessions. Default is 1.

-t Number of transactions to run per client. Default is 10.

-T runs the test for this many seconds instead of running a fixed number of transactions per client.

-D VARNAME=VALUE Define a variable for use by a custom script; this is how values are passed into the test script.

-v Vacuum all four standard tables before the test. Generally, in order to remove the influence of the previous test run, you need to vacuum the pgbench database before testing.
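
Putting several of these options together, a typical invocation might look like the following (the values here are illustrative only, not taken from the tests above):

# vacuum the standard tables first (-v), use 8 worker threads (-j) and 50 client
# connections (-c), run for 60 seconds (-T), and report per-statement latencies (-r)
pgbench -U postgres -h 192.168.123.11 -v -r -j 8 -c 50 -T 60 pgbench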

Report Description:

transaction type: the type of test (built-in or custom script) used in this run

scaling factor: the scaling factor for the amount of data, set by pgbench during initialization

query mode: the query mode specified, one of simple query mode (the default), extended query mode, or prepared query mode

number of clients: the number of client connections specified

number of threads: the number of pgbench worker threads used during the test

number of transactions actually processed: the number of transactions actually completed by the end of the test

latency average: the average response time over the test run

tps: the number of transactions executed per second

To be continued!!!

Origin: blog.csdn.net/alwaysbefine/article/details/133255834