Label
PostgreSQL, unlogged table, bulk load, dblink
Background
When bulk-importing data, how do we push the system close to its limits?
Where do the bottlenecks usually lie?
1. WAL lock contention
2. Index lock contention
3. Relation extension (EXTEND) lock contention
4. Autovacuum interference
The best approach is to eliminate each of these problems, for example:
1. Use multiple tables to avoid extension-lock contention on a single table.
2. Use unlogged tables (data is lost after a crash, so use them only in scenarios where that is acceptable), again spread across multiple tables, to avoid WAL lock contention.
3. Avoid indexes during the load to eliminate index-lock contention.
4. Disable autovacuum on the target tables during the import to avoid autovacuum interference.
With these measures in place, the machine's potential can largely be realized.
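Points 1, 2, and 4 above can be combined into a single setup step. A minimal sketch, assuming 32 parallel loaders; the table names and table count are illustrative:

```sql
-- Sketch only: create 32 unlogged target tables with autovacuum disabled,
-- so each parallel loader writes to its own table (no single-table
-- extension-lock contention, no WAL logging, no autovacuum interference).
do language plpgsql $$
declare
begin
  for i in 1..32 loop
    execute format(
      'create unlogged table ut%s(c1 int8) with (autovacuum_enabled=off, toast.autovacuum_enabled=off)',
      i);
  end loop;
end;
$$;
```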
Single-table test
1. Create a test table
postgres=# create unlogged table ut(c1 int8) with (autovacuum_enabled=off, toast.autovacuum_enabled=off);
CREATE TABLE
Time: 12.723 ms
2. Generate 100 million data
postgres=# insert into ut select generate_series(1,100000000);
INSERT 0 100000000
Time: 43378.465 ms (00:43.378)
postgres=# copy ut to '/data01/pg/ut.csv';
COPY 100000000
Time: 20292.684 ms (00:20.293)
# ll -ht /data01/pg/ut.csv
-rw-r--r-- 1 digoal digoal 848M Apr 27 22:02 /data01/pg/ut.csv
3. Create the dblink extension
create extension dblink;
4. Create a function that establishes a connection and does not raise an error when called repeatedly
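A minimal sketch of such a function, using dblink's standard dblink_connect(connname, connstr) form; the function name and the decision to swallow all errors are illustrative assumptions:

```sql
-- Sketch: try to open a named dblink connection; if it already exists
-- (or any other error occurs), swallow the error so repeated calls are safe.
create or replace function conn(name, text) returns void as $$
declare
begin
  perform dblink_connect($1, $2);
exception when others then
  return;
end;
$$ language plpgsql strict;
```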