[Cassandra] Cassandra3.x清理过期数据

版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/wawa8899/article/details/86496803

Cassandra3.x官方文档:https://docs.datastax.com/en/cql/3.3/cql/cql_using/useExpire.html

Cassandra数据库支持TTL(Time To Live),来控制数据的生存时长(expire time)。

1.设定生存时长

1.1 设定行数据生存时长

建表时不指定过期时间,插入数据时为每条数据单独指定过期时间。

语法

INSERT INTO table_name ... USING TTL 100;

其中,TTL后面的数字的单位是秒,写入的记录将在100s后过期。

示例:

创建表,不指定TTL

cassandra@cqlsh:test> CREATE TABLE test_ttl_01 (
        ... name text,
        ... age text,
        ... systime timestamp,
        ... PRIMARY KEY (name,systime));
cassandra@cqlsh:test>

写入数据时指定单条记录的TTL

cassandra@cqlsh:test> insert into test_ttl_01  json '{"name":"name01","age":"11","systime":"2019-01-16 01:00:00+0000"}';
cassandra@cqlsh:test> insert into test_ttl_01  json '{"name":"name02","age":"12","systime":"2019-01-16 02:00:00+0000"}';
cassandra1@cqlsh:test> insert into test_ttl_01  json '{"name":"name03","age":"13","systime":"2019-01-16 03:00:00+0000"}';
cassandra@cqlsh:test> insert into test_ttl_01  json '{"name":"name04","age":"14","systime":"2019-01-16 04:00:00+0000"}';
cassandra@cqlsh:test> insert into test_ttl_01  json '{"name":"name05","age":"15","systime":"2019-01-16 05:00:00+0000"}';
cassandra@cqlsh:test> insert into test_ttl_01  json '{"name":"name06","age":"16","systime":"2019-01-16 06:00:00+0000"}' USING TTL 100;
cassandra@cqlsh:test> insert into test_ttl_01  json '{"name":"name07","age":"17","systime":"2019-01-16 07:00:00+0000"}' USING TTL 100;
cassandra@cqlsh:test> insert into test_ttl_01  json '{"name":"name08","age":"18","systime":"2019-01-16 08:00:00+0000"}' USING TTL 100;
cassandra@cqlsh:test> insert into test_ttl_01  json '{"name":"name09","age":"19","systime":"2019-01-16 09:00:00+0000"}' USING TTL 100;
cassandra@cqlsh:test> insert into test_ttl_01  json '{"name":"name10","age":"20","systime":"2019-01-16 10:00:00+0000"}' USING TTL 100;
cassandra@cqlsh:test> select * from test_ttl_01;
 name   | systime                         | age
--------+---------------------------------+-----
 name02 | 2019-01-16 02:00:00.000000+0000 |  12
 name04 | 2019-01-16 04:00:00.000000+0000 |  14
 name05 | 2019-01-16 05:00:00.000000+0000 |  15
 name06 | 2019-01-16 06:00:00.000000+0000 |  16
 name03 | 2019-01-16 03:00:00.000000+0000 |  13
 name08 | 2019-01-16 08:00:00.000000+0000 |  18
 name10 | 2019-01-16 10:00:00.000000+0000 |  20
 name09 | 2019-01-16 09:00:00.000000+0000 |  19
 name01 | 2019-01-16 01:00:00.000000+0000 |  11
 name07 | 2019-01-16 07:00:00.000000+0000 |  17

(10 rows)
cassandra@cqlsh:test> 

其中,指定TTL的数据会在TTL到期后被删除;写入时未指定TTL的数据将会一直存在。

cassandra@cqlsh:test> /* sleep 100s */
cassandra@cqlsh:test> select * from test_ttl_01;

 name   | systime                         | age
--------+---------------------------------+-----
 name02 | 2019-01-16 02:00:00.000000+0000 |  12
 name04 | 2019-01-16 04:00:00.000000+0000 |  14
 name05 | 2019-01-16 05:00:00.000000+0000 |  15
 name03 | 2019-01-16 03:00:00.000000+0000 |  13
 name01 | 2019-01-16 01:00:00.000000+0000 |  11

(5 rows)
cassandra@cqlsh:test> 

1.2 设定表级别生存时长

表创建时指定表数据默认生存时长。

语法:

CREATE TABLE table_name (...) WITH default_time_to_live = 100;

其中,default_time_to_live为TTL参数,等号后面的数字为过期时间,单位是秒。写入此表中的记录若没有在写入时单独指定TTL,则默认将会在设定的TTL时间后过期。

示例:

创建表,指定TTL属性(default_time_to_live)

cassandra@cqlsh:test> CREATE TABLE test_ttl_02 (
        ... name text,
        ... age text,
        ... systime timestamp,
        ... PRIMARY KEY (name,systime))
        ... with default_time_to_live = 100;
cassandra@cqlsh:test>

写入数据,并等待数据过期

cassandra@cqlsh:test> insert into test_ttl_02 json '{"name":"name03","age":"13","systime":"2019-01-16 03:00:00+0000"}';
cassandra@cqlsh:test> insert into test_ttl_02 json '{"name":"name04","age":"14","systime":"2019-01-16 04:00:00+0000"}';
cassandra@cqlsh:test> insert into test_ttl_02 json '{"name":"name05","age":"15","systime":"2019-01-16 05:00:00+0000"}';
cassandra@cqlsh:test> insert into test_ttl_02 json '{"name":"name06","age":"16","systime":"2019-01-16 06:00:00+0000"}' USING TTL 20;
cassandra@cqlsh:test> insert into test_ttl_02 json '{"name":"name07","age":"17","systime":"2019-01-16 07:00:00+0000"}' USING TTL 20;
cassandra@cqlsh:test> insert into test_ttl_02 json '{"name":"name08","age":"18","systime":"2019-01-16 08:00:00+0000"}' USING TTL 20;
cassandra@cqlsh:test> insert into test_ttl_02 json '{"name":"name09","age":"19","systime":"2019-01-16 09:00:00+0000"}' USING TTL 1000;
cassandra@cqlsh:test> insert into test_ttl_02 json '{"name":"name10","age":"20","systime":"2019-01-16 10:00:00+0000"}' USING TTL 1000;
cassandra@cqlsh:test>

以上表的默认生存周期为100s,

  • 写入的前5行数据没指定TTL,数据会在表默认生存周期100s后过期;
  • 6、7、8三行指定TTL为20s(小于默认值100s),数据会在20s后过期;
  • 9、10两行指定TTL为100s(大于默认值100s),数据会在1000s后过期;

20s后,6、7、8行过期

cassandra@cqlsh:test> /* sleep 20s */
cassandra@cqlsh:test> select * from  test_ttl_02; 

 name   | systime                         | age
--------+---------------------------------+-----
 name02 | 2019-01-16 02:00:00.000000+0000 |  12
 name04 | 2019-01-16 04:00:00.000000+0000 |  14
 name05 | 2019-01-16 05:00:00.000000+0000 |  15
 name03 | 2019-01-16 03:00:00.000000+0000 |  13
 name10 | 2019-01-16 10:00:00.000000+0000 |  20
 name09 | 2019-01-16 09:00:00.000000+0000 |  19
 name01 | 2019-01-16 01:00:00.000000+0000 |  11

(7 rows)
cassandra@cqlsh:test>

100s后,前5行过期

cassandra@cqlsh:test> /* sleep 100s */
cassandra@cqlsh:test> select * from  test_ttl_02;

 name   | systime                         | age
--------+---------------------------------+-----
 name10 | 2019-01-16 10:00:00.000000+0000 |  20
 name09 | 2019-01-16 09:00:00.000000+0000 |  19

(2 rows)
cassandra@cqlsh:test>

最后两行数据会在1000s后过期

cassandra@cqlsh:test> /* sleep 100s */
cassandra@cqlsh:test> select * from test_ttl_02;

 name | systime | age
------+---------+-----

(0 rows)
cassandra@cqlsh:test> 

注意:建表时指定了表级别生存周期TTL不为0的表,写入数据(INSERT)时若再次指定该条记录的TTL,则写入数据时指定的TTL优先级大于建表时指定的TTL。且在修改表的TTL时,不影响该条记录的TTL。

2. 修改生存时长

2.1 修改行数据生存时长

语法:

UPDATE table_name USING TTL 100

SET column_name = ...

WHERE ... ;

该语句的作用是设定表中指定列的TTL的值为100,过期的效果是该指定列的值为null。并且WHERE条件需要指定完整的主键。

示例:

修改主键为("name01","2019-01-16 01:00:00+0000")的数据的TTL为10s,10s后该条记录过期

cassandra@cqlsh:test> UPDATE test_ttl_01 USING TTL 10 SET age = '21' where name = 'name01' and systime = '2019-01-16 01:00:00+0000';
cassandra@cqlsh:test> /*sleep 10s*/
cassandra@cqlsh:test> select * from test_ttl_01;

 name   | systime                         | age
--------+---------------------------------+------
 name02 | 2019-01-16 02:00:00.000000+0000 |   12
 name04 | 2019-01-16 04:00:00.000000+0000 |   14
 name05 | 2019-01-16 05:00:00.000000+0000 |   15
 name03 | 2019-01-16 03:00:00.000000+0000 |   13
 name01 | 2019-01-16 01:00:00.000000+0000 | null

(5 rows)
cassandra@cqlsh:test> 

注意:此语法的作用只将SET后面的列的数据过期,过期值为null。

2.2 修改表级别生存时长

语法:

ALTER TABLE table_name WITH default_time_to_live = 1000;

示例:

建表时指定TTL为100s,修改为1000s

cassandra@cqlsh:test> ALTER TABLE test_ttl_02 WITH default_time_to_live = 1000;
cassandra@cqlsh:test> desc test_ttl_02;

CREATE TABLE test.test_ttl_02 (
    name text,
    systime timestamp,
    age text,
    PRIMARY KEY (name, systime)
) WITH CLUSTERING ORDER BY (systime ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 1000
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

cassandra@cqlsh:test> 

注意:建表时指定了表级别生存周期TTL不为0的表,写入数据(INSERT)时若再次指定该条记录的TTL,则写入数据时指定的TTL优先级大于建表时指定的TTL。且在修改表的TTL时,不影响该条记录的TTL。

其他:

Cassandra2.x官方文档:cassandra2.x: https://docs.datastax.com/en/archived/cql/3.1/cql/cql_using/use_expire_c.html

猜你喜欢

转载自blog.csdn.net/wawa8899/article/details/86496803