On data consistency problems in a Cassandra cluster

A Cassandra cluster requires strict time synchronization; even a small clock skew between nodes causes problems like the one described here. I explained this in an earlier post on the strict time synchronization a Cassandra cluster requires, so time synchronization is a prerequisite for running Cassandra.

Cassandra uses an eventual consistency model: data that is updated concurrently may be inconsistent at first, but after that window of inconsistency the system converges to a consistent state, so that every client sees the same result.

How strong this eventual consistency is depends on the consistency level you choose. With Cassandra we usually choose the QUORUM level, which means the client gets its response once a majority of the replicas (more than half) have acknowledged the request. When both writes and reads use QUORUM, data that has been written is guaranteed to be visible to a subsequent read. There is a problem with concurrency, however: if clients update the same record, how does Cassandra decide the order of the requests? Only by time: Cassandra orders the writes by timestamp, which here is the time at which each request arrives at the server. For example:

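import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.RegularStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.querybuilder.QueryBuilder;

// Use QUORUM as the default consistency level for every request.
QueryOptions options = new QueryOptions();
options.setConsistencyLevel(ConsistencyLevel.QUORUM);

Cluster cluster = Cluster.builder()
        .addContactPoint("192.168.1.101")
        .withCredentials("cassandra", "cassandra")
        .withQueryOptions(options)
        .build();

Session session = cluster.connect();

// First update: set col2 = 10 for the row with key1 = 1.
RegularStatement update10 = QueryBuilder.update("myKeyspace", "tableName")
        .with(QueryBuilder.set("col2", 10))
        .where(QueryBuilder.eq("key1", 1));
session.execute(update10);

// Second update to the same row, issued immediately afterwards: set col2 = 20.
RegularStatement update20 = QueryBuilder.update("myKeyspace", "tableName")
        .with(QueryBuilder.set("col2", 20))
        .where(QueryBuilder.eq("key1", 1));
session.execute(update20);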

But a Cassandra cluster has multiple machines. What if the client's requests are sent to different servers? Then the data gets mixed up. Indeed, with the DataStax driver you will find that when you update the same record twice in quick succession, the final result is sometimes not that of the second update. In the example above, the result after each run could be 20 or it could be 10. This can happen even at consistency level ALL. The interval between the two requests is very short, and the clocks of the machines in the cluster cannot be perfectly synchronized: even with NTP the difference is on the order of milliseconds. When the two requests go to different machines, the problem appears.
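The effect of clock skew on two closely spaced writes can be illustrated with a small simulation. This is only a toy model, not Cassandra code: the class names (Cell, Replica, Coordinator), the microsecond timestamps and the skew values are assumptions made for illustration, and timestamp ties are ignored. It shows the "highest timestamp wins" rule picking the older value when the first coordinator's clock runs ahead.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of timestamp-based conflict resolution; all names are hypothetical.
final class LastWriteWinsDemo {

    /** A stored value together with the write timestamp used to resolve conflicts. */
    static final class Cell {
        final int value;
        final long timestampMicros;

        Cell(int value, long timestampMicros) {
            this.value = value;
            this.timestampMicros = timestampMicros;
        }
    }

    /** A replica that keeps, per key, only the cell with the highest timestamp. */
    static final class Replica {
        private final Map<Integer, Cell> cells = new HashMap<>();

        void apply(int key, Cell incoming) {
            Cell current = cells.get(key);
            // Strictly newer timestamps win; ties are not modelled in this sketch.
            if (current == null || incoming.timestampMicros > current.timestampMicros) {
                cells.put(key, incoming);
            }
        }

        int read(int key) {
            return cells.get(key).value;
        }
    }

    /** A coordinator that stamps writes with its own, possibly skewed, clock. */
    static final class Coordinator {
        private final long clockSkewMicros;

        Coordinator(long clockSkewMicros) {
            this.clockSkewMicros = clockSkewMicros;
        }

        Cell stamp(int value, long trueTimeMicros) {
            return new Cell(value, trueTimeMicros + clockSkewMicros);
        }
    }

    /** Writes col2 = 10, then col2 = 20 one millisecond later, via two coordinators. */
    static int run(long firstSkewMicros, long secondSkewMicros) {
        Coordinator first = new Coordinator(firstSkewMicros);
        Coordinator second = new Coordinator(secondSkewMicros);
        Replica replica = new Replica();

        long t0 = 1_000_000L;                               // arbitrary "true" start time
        replica.apply(1, first.stamp(10, t0));              // first update
        replica.apply(1, second.stamp(20, t0 + 1_000L));    // second update, 1 ms later
        return replica.read(1);
    }

    public static void main(String[] args) {
        System.out.println("no skew:   " + run(0L, 0L));
        System.out.println("3 ms skew: " + run(3_000L, 0L));
    }
}
```

With no skew the second write wins and the row holds 20. When the first coordinator's clock is 3 ms ahead, the first write carries the larger timestamp and the row ends up with 10, even though it was sent earlier.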

What can be done? When we switched to another Cassandra client, Astyanax, the situation described above did not occur. Why? Is something wrong with the client? Investigation showed that the two requests sent by the Astyanax client went to the same node in the cluster, while those sent by the official DataStax driver went to different nodes.

It turns out that the Astyanax client has the concept of a request strategy, with three options (TOKEN_AWARE, ROUND_ROBIN and BAG). TOKEN_AWARE sends requests for the same key to the same node, based on the token of the primary key.
Does the native DataStax client have such a concept? It does. It is called LoadBalancingPolicy and can be set with Cluster.builder().withLoadBalancingPolicy(policy). Its policies include:

DCAwareRoundRobinPolicy
RoundRobinPolicy
TokenAwarePolicy

Of these, TokenAwarePolicy sends requests for the same record to the same node according to the token. Looking at the code, we find that the default policy used by the DataStax driver is TokenAwarePolicy, so why does it not behave the same way as Astyanax?

Reading the code revealed the reason: when building the update you have to pass the TableMetadata of the table, otherwise the DataStax driver cannot know which columns make up the primary key and so cannot compute the token. It seems this client is not very smart...
Change the example above as follows, and everything works.

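import com.datastax.driver.core.TableMetadata;
// Look up the table's metadata so the driver knows which columns form the primary key.
TableMetadata metaData = cluster.getMetadata().getKeyspace("myKeyspace").getTable("tableName");
// Building the statement from the metadata lets TokenAwarePolicy route it by token.
RegularStatement update10 = QueryBuilder.update(metaData)
        .with(QueryBuilder.set("col2", 10))
        .where(QueryBuilder.eq("key1", 1));
session.execute(update10);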
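Why the routing key matters can be sketched with a simplified model of a token-aware policy. This is not the driver's implementation: the RoutingSketch class, the node names and the use of hashCode() as a stand-in for the partitioner's token are all assumptions made for illustration. The point is only that a statement with a known key always maps to the same node, while a statement without one falls back to round robin.

```java
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of token-aware routing with a round-robin fallback.
final class RoutingSketch {
    private final List<String> nodes;
    private int nextIndex = 0;

    RoutingSketch(List<String> nodes) {
        this.nodes = nodes;
    }

    /**
     * Picks the node that will coordinate a request.
     * A null routingKey models a statement built without table metadata,
     * for which the primary key (and therefore the token) is unknown.
     */
    String pickCoordinator(Integer routingKey) {
        if (routingKey == null) {
            // No token available: rotate through the nodes.
            String node = nodes.get(nextIndex);
            nextIndex = (nextIndex + 1) % nodes.size();
            return node;
        }
        // hashCode() stands in for the real partitioner's token function.
        return nodes.get(Math.floorMod(routingKey.hashCode(), nodes.size()));
    }

    public static void main(String[] args) {
        RoutingSketch policy = new RoutingSketch(Arrays.asList("node-a", "node-b", "node-c"));

        // Without a routing key, two consecutive updates land on different nodes.
        System.out.println(policy.pickCoordinator(null) + " / " + policy.pickCoordinator(null));

        // With the key known, both updates for key1 = 1 land on the same node.
        System.out.println(policy.pickCoordinator(1) + " / " + policy.pickCoordinator(1));
    }
}
```

In this model, two updates for the same key go through a single coordinator and are therefore stamped by a single clock, which matches the behavior observed once the statement is built from TableMetadata.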

 
