Using optimistic locking for concurrent operations in Elasticsearch

The previous article introduced adding, deleting, and modifying ES nested indexes, and this article continues that topic. The add and update operations shown there are actually unsafe under concurrency. Every database system has concurrency problems. Relational databases such as MySQL, Oracle, and SQL Server use pessimistic locking by default.

Elasticsearch uses optimistic locking. First, let's review what optimistic and pessimistic locking are:

A pessimistic lock, as the name suggests, assumes the worst. Every time it reads data, it assumes that others will modify it, so it locks the data on every read, and anyone else who wants the data blocks until they obtain the lock. Traditional relational databases use many such mechanisms, such as row locks, table locks, read locks, and write locks, all of which are acquired before the operation is performed.

An optimistic lock, as the name implies, assumes the best. Every time it reads data, it assumes that others will not modify it, so it takes no lock. When it updates, however, it checks whether anyone else has changed the data in the meantime, using a mechanism such as a version number. Optimistic locks suit read-heavy applications, where they can improve throughput. For example, if a database provides a mechanism similar to write_condition, that is in effect an optimistic lock.

Both kinds of lock have advantages and disadvantages, and neither is simply better than the other. Optimistic locks suit workloads with few writes, where conflicts are genuinely rare: they avoid the overhead of locking and increase the overall throughput of the system. If conflicts occur frequently, however, the upper-layer application keeps retrying, which reduces performance, so pessimistic locks are more appropriate in that case.

From the above it is easy to see why es adopts optimistic locking: in most scenarios es is a system with many reads and few writes, and a pessimistic locking strategy would greatly reduce its throughput. The concurrency problem is real, however. Let me share the concurrency problems we have encountered in actual work.

The best approach is to design concurrency problems out. For example, one of our projects consumes Kafka and stores the computed data in es. If we do not design a strategy for how data enters Kafka, we may run into concurrent inserts and updates. When Spark Streaming is integrated with Kafka, you need to give Spark a number of executor processes corresponding to the number of Kafka partitions: with 10 Kafka partitions, there are 10 Spark Streaming processes handling data. If data for the same user is processed at the same time on different machines, and each one writes its result to es, concurrency problems will occur.

Take an accumulation as an example. The original value is 100. Process A and process B read it at the same time and update it: A adds 10 and B adds 20. The correct result is 130, but with concurrent updates A's increment may be lost, giving 120, or B's may be lost, giving 110. Without some form of locking, the data ends up wrong either way.

If we send all data for the same user to the same Kafka partition, the problem becomes easy: data within one partition is processed serially, so the concurrency problem is avoided.
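The lost update and the partitioning idea can be sketched in a few lines of Scala. This is only an illustration: LostUpdateSketch, unsafeInterleaving, and partitionFor are hypothetical names, and the hashCode-based partition function is a simplified stand-in for key-based partitioning, not Kafka's actual hash function.

```scala
object LostUpdateSketch {
  // Replays the interleaving from the text: both writers read before either writes.
  def unsafeInterleaving(initial: Int, addA: Int, addB: Int): Int = {
    var stored = initial
    val readByA = stored      // process A reads the original value
    val readByB = stored      // process B reads the same original value
    stored = readByA + addA   // A writes its result
    stored = readByB + addB   // B overwrites it, so A's increment is lost
    stored
  }

  // Hypothetical stand-in for key-based partitioning: the same user id
  // always maps to the same partition, so its updates are handled serially.
  def partitionFor(userId: String, numPartitions: Int): Int =
    java.lang.Math.floorMod(userId.hashCode, numPartitions)
}
```

With the values from the example, LostUpdateSketch.unsafeInterleaving(100, 10, 20) yields 120 instead of 130. In a real pipeline, the producer would set the user id as the record key so that all records for one user land in one partition.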

Of course, if it cannot be avoided, we need to solve the concurrency problem with es's optimistic locking. Let's look at how to use it. First, consider concurrent inserts. Multiple processes obtain data for the same user at the same time and insert it into es at the same time. Without a lock, the later write overwrites the earlier one. What we actually want is that, when inserts collide, the second write is applied as an update rather than an overwrite.

How can this be achieved?

When inserting, use the create(true) method provided by es. Of the documents inserted at the same time, only one is inserted successfully; the others fail with a document-already-exists exception. The application catches that exception and retries the insert. On retry it checks whether the document already exists and, if so, applies an update instead.

The Scala code is as follows:
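As a minimal sketch of this create-then-update flow, the following uses an in-memory map as a stand-in for es rather than the real client. All names here (CreateThenUpdateSketch, DocumentExistsException, insertOrUpdate) are hypothetical and chosen for illustration only.

```scala
import scala.collection.concurrent.TrieMap

object CreateThenUpdateSketch {
  // Hypothetical stand-in for the "document already exists" error of a create-only insert.
  final class DocumentExistsException(id: String)
      extends RuntimeException(s"document [$id] already exists")

  // In-memory stand-in for an es index: id -> document fields.
  private val index = TrieMap.empty[String, Map[String, Int]]

  // Create-only insert: exactly one caller per id succeeds, the rest get an exception.
  def create(id: String, doc: Map[String, Int]): Unit =
    if (index.putIfAbsent(id, doc).isDefined) throw new DocumentExistsException(id)

  // Update: merge the new fields into the existing document,
  // re-reading and retrying if another writer changed it in between.
  def update(id: String, fields: Map[String, Int]): Unit = {
    var done = false
    while (!done) {
      index.get(id) match {
        case Some(old) => done = index.replace(id, old, old ++ fields)
        case None      => throw new NoSuchElementException(s"document [$id] not found")
      }
    }
  }

  // Try to create first; if the document already exists, fall back to an update.
  def insertOrUpdate(id: String, doc: Map[String, Int]): Unit =
    try create(id, doc)
    catch { case _: DocumentExistsException => update(id, doc) }

  def get(id: String): Option[Map[String, Int]] = index.get(id)
}
```

Calling insertOrUpdate twice for the same id with different fields leaves a single document containing both sets of fields, instead of the second call overwriting the first.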

That is the strategy for handling concurrency on insert. Next, let's see how to handle concurrency on update. There are two main approaches:

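(1) If you are incrementing or decrementing a numeric value, you can rely on the conflict retry mechanism on the es server side. This approach is fairly simple, because we do not need to handle the concurrency logic in our program. All we need to do is estimate how much concurrency the same document sees and set a reasonable number of retries. If the update still fails after the retries, an exception is thrown, and we handle it accordingly.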

The core code is as follows:
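The following sketch imitates the retry behavior in memory, to show what the retry count controls. It is not the es client: RetryOnConflictSketch and its members are hypothetical names, and the retryOnConflict parameter here only mirrors the idea of the retry_on_conflict setting of the update API.

```scala
import java.util.concurrent.atomic.AtomicReference

object RetryOnConflictSketch {
  final case class Versioned(value: Long, version: Long)
  final class VersionConflictException(msg: String) extends RuntimeException(msg)

  // In-memory stand-in for one es document holding a counter.
  private val doc = new AtomicReference(Versioned(100L, 1L))

  def current: Versioned = doc.get()

  // Mimics a server-side retry: re-read and re-apply the increment on each
  // conflict, giving up after `retryOnConflict` extra attempts.
  def increment(delta: Long, retryOnConflict: Int): Versioned = {
    var attemptsLeft = retryOnConflict + 1
    while (attemptsLeft > 0) {
      val seen = doc.get()
      val next = Versioned(seen.value + delta, seen.version + 1)
      // The write only goes through if nobody changed the document since we read it.
      if (doc.compareAndSet(seen, next)) return next
      attemptsLeft -= 1
    }
    throw new VersionConflictException(s"still conflicting after $retryOnConflict retries")
  }
}
```

Because every attempt re-reads the latest value before adding to it, no increment is lost. This only works for operations such as accumulation, where the order of application does not matter.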

(2) In addition, we can use the version field that es maintains internally to implement our own, more flexibly controlled optimistic lock.

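When a document is inserted successfully for the first time, the response returned by es reports _version=1 for it. Suppose we read the current version (1) before updating the document. The update then succeeds only if the version it carries is 1, and on success the version is incremented by 1. If two processes both carry version=1 and update the document at the same moment, only one update succeeds and the version becomes 2. The other process fails with a version conflict, because the latest version is 2 while it still carries 1, and a conflict exception is thrown.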

The internally maintained version can be used with the update and delete APIs.
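The version check described above can be sketched with a small in-memory store. This is an illustration under assumptions, not the es API: InternalVersionSketch, its methods, and the exception message format are hypothetical. Also note that newer Elasticsearch releases express this check through if_seq_no and if_primary_term rather than the version parameter.

```scala
object InternalVersionSketch {
  final case class Doc(source: String, version: Long)
  final class VersionConflictException(msg: String) extends RuntimeException(msg)

  // In-memory stand-in for an es index: id -> document with its internal version.
  private var docs = Map.empty[String, Doc]

  // Plain index operation: the first write gets version 1, later writes add 1.
  def index(id: String, source: String): Long = synchronized {
    val next = docs.get(id).map(_.version + 1).getOrElse(1L)
    docs = docs.updated(id, Doc(source, next))
    next
  }

  def get(id: String): Option[Doc] = synchronized(docs.get(id))

  // Versioned update: succeeds only if the caller carries the current version.
  def update(id: String, source: String, expectedVersion: Long): Long = synchronized {
    val current = docs.get(id).map(_.version).getOrElse(0L)
    if (current != expectedVersion)
      throw new VersionConflictException(
        s"[$id]: version conflict, current [$current], provided [$expectedVersion]")
    docs = docs.updated(id, Doc(source, current + 1))
    current + 1
  }
}
```

If two writers both read version 1, the first update(id, source, 1) returns 2 and the second throws the conflict exception. The losing writer then has to re-read the document and decide whether to reapply its change.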

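Next, let's look at using an external version to control optimistic locking. The +1 applied on each successful update above is maintained internally by es. Besides that, we can also use an externally maintained, custom version for insert, delete, and update operations: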

For example:

Result:

Now, after we update with version=10 specified, the new response returned is as follows:

If we execute the above request again, it fails, because the new version must be greater than the version number that already exists.

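Using this property, we can also pass a timestamp as the version, which guarantees that a document can only be inserted or updated by data newer than what is already stored.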
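The external version rule can be sketched in the same in-memory style. ExternalVersionSketch and its members are hypothetical names, and returning false is a simplification of the version conflict that es would report.

```scala
object ExternalVersionSketch {
  final case class Doc(source: String, version: Long)

  // In-memory stand-in for an es index that stores the caller-supplied version.
  private var docs = Map.empty[String, Doc]

  // Accept the write only when the caller's version is strictly greater than
  // the stored one. Returns false where es would report a version conflict.
  def index(id: String, source: String, externalVersion: Long): Boolean = synchronized {
    val accepted = docs.get(id).forall(_.version < externalVersion)
    if (accepted) docs = docs.updated(id, Doc(source, externalVersion))
    accepted
  }

  def get(id: String): Option[Doc] = synchronized(docs.get(id))
}
```

When an event timestamp is passed as externalVersion, a late-arriving older event is rejected instead of overwriting newer data, and repeating a write with the same version is rejected as well.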

Summary:

This article introduced the use of optimistic locking in es. If you only perform increments or decrements, do not care about ordering, and only care about the final result, you can let the es server retry on conflict, which is a very convenient way to resolve concurrency conflicts. If ordering matters, as with index and update operations where by default the last write overwrites the previous data, you can use the version field to handle conflicts. The version value can be the one maintained internally by es, or a value passed in from the external application and specified explicitly when updating under optimistic locking.

 

 

http://mp.weixin.qq.com/s/yapfvRIIlXvHsjOEdpCy1g
