CrateDB Preliminary Study (4): Optimistic Concurrency Control

Table of contents

system columns

 _seq_no

_primary_term

example

optimistic update/delete


 Other articles in this series:

A Preliminary Study of CrateDB (1): Docker Deployment of CrateDB Cluster

CrateDB Preliminary Study (2): PARTITION, SHARDING AND REPLICATION

A Preliminary Study of CrateDB (3): JDBC


In optimistic concurrency control, data is not locked when users read it. When a user updates data, the system checks whether another user has changed the data since it was read. If another user has changed it, an error is raised. Normally, a user who receives the error rolls back the transaction and starts over. This approach is called optimistic because it is primarily used in environments where data contention is low and the cost of occasionally rolling back a transaction is lower than the cost of locking data while it is being read.

Although CrateDB does not support transactions, optimistic concurrency control can be implemented through the two system columns _seq_no and _primary_term. This article gives a brief introduction to CrateDB's optimistic concurrency control; see the official documentation for details.

system columns

_seq_no and _primary_term are two of a table's system columns. They reflect the order of data operations and changes to the cluster configuration. By selecting them, you can observe how they change after each data operation.
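As a small illustration, the two columns can be named in the select list next to ordinary columns. This sketch assumes the parted_table used later in this article, with an id and a width column.

```sql
-- Read a row together with its two concurrency-control columns.
SELECT id, width, _seq_no, _primary_term
FROM parted_table
WHERE id = 1;
```

Running the same query before and after an update on the row should show how the values move.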

 _seq_no

When a row in the table is inserted, modified or deleted, the primary shard updates the row's _seq_no (which acts as a version number for the row). There is a similar system column, _version, but it is not recommended for optimistic concurrency control.

The CrateDB primary shards will increment a sequence number for every insert, update and delete operation executed against a row. The current sequence number of a row is exposed under this column. This column can be used in conjunction with the _primary_term column for Optimistic Concurrency Control, see Optimistic Concurrency Control for usage details.

_primary_term

This column reflects cluster configuration changes.

The sequence numbers give us an order of operations that happen at a primary shard, but they don't help us distinguish between old and new primaries. For example, if a primary is isolated in a minority partition, a possible up to date replica shard on the majority partition will be promoted to be the new primary shard and continue to process write operations, subject to the write.wait_for_active_shards setting. When this partition heals we need a reliable way to know that the operations that come from the other shard are from an old primary and, equally, the operations that we send to the shard re-joining the cluster are from the newer primary. The cluster needs to have a consensus on which shards are the current serving primaries. In order to achieve this we use the primary terms which are generational counters that are incremented when a primary is promoted. Used in conjunction with _seq_no we can obtain a total order of operations across shards and Optimistic Concurrency Control.

example

initial state

After updating the row with id=1, the row's _seq_no increases by 1 and its _primary_term changes from 1 to 6 (the change in _primary_term is presumably because the cluster was restarted at some point between the initial state and this update).

optimistic update/delete

Before updating or deleting, query the row to obtain its _seq_no and _primary_term, then include those values as conditions in the statement. If the row has been modified in the meantime, the statement is not executed. In this way, applications avoid conflicting writes without losing data.

Querying for the correct _seq_no and _primary_term ensures that no concurrent update and cluster configuration change has taken place

Concretely, specify _seq_no and _primary_term in the WHERE condition of the update or delete:

update parted_table set width = 200 where id = 1 and _seq_no = 6 and _primary_term = 6;

However, if the primary key is not included in the WHERE condition, an error is reported (the table tested here has no primary key, so an error is reported as well).

To experiment, create a new table (with the composite primary key id, day) and insert data:

CREATE TABLE parted_table2 (
    id bigint PRIMARY KEY,
    title text,
    content text,
    width double precision,
    day timestamp with time zone PRIMARY KEY
) PARTITIONED BY (day);
-- Copy the existing rows from the original test table.
insert into parted_table2 (select * from parted_table);

The WHERE condition needs to include (all) primary keys
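A minimal sketch of such a statement against parted_table2 follows. The day literal and the _seq_no and _primary_term values are placeholders, not values from the original test; in practice they come from a preceding select on the same row.

```sql
-- Step 1: read the current row state, including both system columns.
SELECT id, day, width, _seq_no, _primary_term
FROM parted_table2
WHERE id = 1;

-- Step 2: update only if the row is unchanged since step 1.
-- Both primary key columns (id, day) appear in the WHERE condition.
UPDATE parted_table2
SET width = 200
WHERE id = 1
  AND day = '2020-03-18T00:00:00Z'   -- placeholder: the day value read in step 1
  AND _seq_no = 6                    -- placeholder: the value read in step 1
  AND _primary_term = 6;             -- placeholder: the value read in step 1
```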

To sum up:

1. All primary key columns must be specified

The _seq_no and _primary_term columns can only be used when specifying the whole primary key in a query. 

2. _seq_no and _primary_term must both be specified

In order to use the optimistic concurrency control mechanism both the _seq_no and _primary_term columns need to be specified
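The same pattern applies to deletes, which are mentioned above but not shown. The sketch below reuses parted_table2 with placeholder values. According to the official documentation, a statement whose _seq_no or _primary_term no longer matches is not executed and reports zero affected rows, so the application can re-read the row and decide whether to retry.

```sql
-- Optimistic delete: removes the row only if it has not changed since it was read.
DELETE FROM parted_table2
WHERE id = 1
  AND day = '2020-03-18T00:00:00Z'   -- placeholder day value
  AND _seq_no = 7                    -- placeholder: value from the latest read
  AND _primary_term = 6;             -- placeholder: value from the latest read

-- If no row was affected, re-read the current values before retrying.
SELECT _seq_no, _primary_term
FROM parted_table2
WHERE id = 1
  AND day = '2020-03-18T00:00:00Z';
```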


Origin blog.csdn.net/gxf1027/article/details/104948354