PostgreSQL / TimescaleDB: periodically compressing hypertable chunks and deleting hypertable partitions (chunks)
In my work I use the TimescaleDB time-series extension for the PostgreSQL database to store real-time data. The volume of collected data is too large and the available storage is insufficient, so I had to consider how to reduce the space taken by the data in the hypertables. This can be done with TimescaleDB's own built-in functions.
Built-in functions of TimescaleDB
One: Compression with SELECT compress_chunk()
To avoid losing data, and to put off deleting it as long as possible, I first considered compressing the data using the built-in function compress_chunk().
1. Querying chunks by time: show_chunks()
CREATE OR REPLACE FUNCTION "hrmw"."show_chunks"("hypertable" regclass=NULL::regclass, "older_than" any=NULL::unknown, "newer_than" any=NULL::unknown)
RETURNS SETOF "pg_catalog"."regclass" AS '$libdir/timescaledb-1.7.1', 'ts_chunk_show_chunks'
LANGUAGE c STABLE
COST 1
ROWS 1000
show_chunks() usage:
select show_chunks(); -- list all chunks
select show_chunks(hypertable_name); -- list all chunks under a given hypertable
SELECT show_chunks(older_than => INTERVAL '10 days', newer_than => INTERVAL '20 days');
-- query the chunks holding data between 10 and 20 days old
To query the chunks holding data that is 180 to 182 days old:
SELECT show_chunks('hypertable_name', older_than => INTERVAL '180 days', newer_than => INTERVAL '182 days');
2. The compression function compress_chunk()
CREATE OR REPLACE FUNCTION "hrmw"."compress_chunk"("uncompressed_chunk" regclass, "if_not_compressed" bool=false)
RETURNS "pg_catalog"."regclass" AS '$libdir/timescaledb-1.7.1', 'ts_compress_chunk'
LANGUAGE c VOLATILE STRICT
COST 1
2.1 First enable compression on the hypertable
ALTER TABLE hypertable_name SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'key_column',
timescaledb.compress_orderby = 'time_column DESC');
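As a concrete sketch, assume a hypothetical hypertable `conditions(time timestamptz, device_id int, value float)`; the statement above would then look like:

```sql
-- Hypothetical hypertable: conditions(time timestamptz, device_id int, value float)
ALTER TABLE conditions SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'device_id', -- group compressed rows by device
  timescaledb.compress_orderby   = 'time DESC'  -- order rows inside each compressed batch by time
);
```

segmentby should be a column you filter or group by often; orderby is typically the time column, so decompressed scans come back in time order.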
2.2 Compress a partition (chunk)
Compress: SELECT compress_chunk();
To compress a chunk of 180-day-old data:
SELECT compress_chunk('_timescaledb_internal._hyper_4_238_chunk');
-- SELECT compress_chunk('_timescaledb_internal.chunk_name');
Check space usage after compression:
SELECT * FROM timescaledb_information.compressed_chunk_stats;
Decompress:
SELECT decompress_chunk('_timescaledb_internal._hyper_4_26_chunk');
-- SELECT decompress_chunk('_timescaledb_internal.chunk_name');
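Rather than compressing chunks one at a time, every chunk older than 180 days can be compressed in a single statement by feeding show_chunks() into compress_chunk(). A sketch, assuming a hypertable named `conditions`:

```sql
-- Compress every chunk of 'conditions' holding data older than 180 days;
-- the second argument (true) skips chunks that are already compressed
SELECT compress_chunk(c, true)
FROM show_chunks('conditions', older_than => INTERVAL '180 days') AS c;
```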
3. Use a function to automatically compress chunks older than 180 days
CREATE
OR REPLACE FUNCTION "hrmw"."target_compress_chunk" ( ) RETURNS "pg_catalog"."void" AS $BODY$ DECLARE -- declare variables
t_chunk REGCLASS; -- holds one chunk (partition) name per loop iteration
BEGIN -- function body
-- show_chunks() returns a set, so loop over every chunk in the 180-182 day window
FOR t_chunk IN SELECT show_chunks ( 'hypertable_name', older_than => INTERVAL '180 days', newer_than => INTERVAL '182 days' )
LOOP
PERFORM compress_chunk ( t_chunk, true ); -- true: skip chunks already compressed
END LOOP;
END; -- end
$BODY$ LANGUAGE plpgsql VOLATILE COST 100
4. Add a scheduled job
(Automatically compress the 180-day-old partitions (chunks) every day at 2:30.)
Use the pgAdmin tool that ships with the PostgreSQL installer to create the scheduled job. Code to add as the job step:
SET search_path TO hrmw;
select hrmw.target_compress_chunk(); -- run the function target_compress_chunk()
(Steps 1-3: screenshots of creating the scheduled job in pgAdmin.)
Two: Deleting partitions (chunks)
Because a lot of data remains even after compression, data older than half a year simply has to be deleted.
Chunks can be deleted in bulk with the drop_chunks() function.
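For a one-off manual cleanup, drop_chunks() can also be called directly. A sketch, assuming a hypertable named `conditions` (in the TimescaleDB 1.x signature the interval comes first):

```sql
-- Drop all chunks of 'conditions' whose data is older than 6 months
SELECT drop_chunks(INTERVAL '6 months', 'conditions');
```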
Being lazy, I used the automatic policy function add_drop_chunks_policy() instead:
-- Create the policy: keep only the last half year of data (chunks are dropped outright)
SELECT add_drop_chunks_policy('conditions', INTERVAL '6 months'); -- 'conditions' is the hypertable name
Query the existing policies:
select * from timescaledb_information.drop_chunks_policies;
Each hypertable can have only one such policy.
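Because only one policy per hypertable is allowed, an existing policy has to be removed before one with a different interval can be added (again assuming the hypertable name `conditions`):

```sql
-- Remove the existing retention policy, then re-add it with a new interval
SELECT remove_drop_chunks_policy('conditions');
SELECT add_drop_chunks_policy('conditions', INTERVAL '12 months');
```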
Field name | Description
---|---
hypertable | (REGCLASS) The hypertable the policy is applied to
older_than | (INTERVAL) Chunks holding data older than this are dropped when the policy runs
cascade | (BOOLEAN) Whether the policy runs with the cascade option on, which also drops dependent objects along with the chunks
job_id | (INTEGER) ID of the background job set up to implement the drop_chunks policy
schedule_interval | (INTERVAL) The interval at which the job runs
max_runtime | (INTERVAL) The maximum time the background job scheduler allows the job to run before stopping it
max_retries | (INTEGER) The number of times the job is retried if it fails
retry_period | (INTERVAL) The time the scheduler waits between failed retries
That's basically it. If you have any questions, leave a comment or message me privately.
By the way, there is very little material on TimescaleDB out there; additions and corrections are welcome, thanks.