Talking about pessimistic locking and optimistic locking

1. Why do pessimistic and optimistic locks exist?

When a program may be accessed concurrently, we need to make sure the data stays correct under concurrency, so that a user operating alongside other users gets the same result as if he were operating alone. This is called concurrency control. The purpose of concurrency control is to ensure that one user's work does not unreasonably affect another user's work.
Without good concurrency control, problems such as dirty reads, phantom reads, and non-repeatable reads may occur.
The concurrency control we usually talk about is the concurrency control of the database management system (DBMS). Its task is to ensure that when multiple transactions insert, delete, update and query the same data at the same time, transaction isolation and consistency are preserved and database integrity is not violated.
The main means to achieve concurrency control are divided into optimistic concurrency control and pessimistic concurrency control.
Pessimistic locking and optimistic locking are concepts defined by developers; they are best understood as ideas rather than concrete locks. They are not unique to relational database systems, either: similar concepts exist in Hibernate, Tair, Memcached and so on. Therefore, optimistic locking and pessimistic locking should not be equated with specific database locks.

  • Optimistic locking is better suited to read-heavy scenarios (many reads, few writes).
  • Pessimistic locking is better suited to write-heavy scenarios (many writes, few reads).


Before explaining pessimistic and optimistic locking, we need a brief understanding of dirty reads, phantom reads and non-repeatable reads.

2. Dirty reads, phantom reads, and non-repeatable reads

2.1 Dirty reads

While transaction A is executing, transaction B reads data that A has modified. If A then fails to commit and rolls back, the data B has read is invalid. Reading such uncommitted data is a dirty read.

2.2 Phantom reads

Transaction B reads the same range of data twice, and between the two reads transaction A inserts new rows, so the two result sets B reads are different. This is a phantom read.

2.3 Non-repeatable reads

Transaction B reads the same row twice, and between the two reads transaction A modifies it, so the data B reads is different each time. This is a non-repeatable read.

The concepts of non-repeatable read and phantom read are very similar, but the essential difference is this: a non-repeatable read is about the modification of existing rows, so the same row has different values when read again; a phantom read is about the number of rows in the result set changing. For example, the first read might return 12 rows, but some rows are deleted before the second read, so only 9 rows come back, as if those rows had vanished like an illusion.

2.4 Database isolation level

Read Uncommitted

At this isolation level, every transaction can see the uncommitted changes of other transactions. It is the lowest isolation level: it offers very high concurrency and low system overhead, but it is rarely used in practice, because it only prevents the first kind of lost update and cannot prevent dirty reads, non-repeatable reads or phantom reads.

Read Committed

This is the default isolation level of most database systems (but not MySQL's default). It satisfies the simple definition of isolation: a transaction only sees changes made by already committed transactions.
This level prevents dirty reads, but non-repeatable reads and phantom reads can still occur.

Repeatable Read

This is MySQL's default transaction isolation level. It ensures that multiple reads of the same data within one transaction see the same rows. This isolation level prevents all of the problems above except phantom reads.

Serializable

This is the highest isolation level. By forcing transactions to execute in a serial order so they cannot conflict with each other, it solves phantom reads and the second kind of lost update. At this level all of the concurrency problems mentioned above disappear, but it can lead to heavy lock contention and many timeouts, so databases rarely run at this level. We need other mechanisms to handle these problems: optimistic locking and pessimistic locking.
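For reference, the isolation level can be checked and changed per session; a minimal sketch assuming MySQL 8.0 (where the session variable is named transaction_isolation):

SELECT @@transaction_isolation;                            -- show the current session isolation level
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;    -- applies to subsequent transactions in this session
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;   -- back to the InnoDB default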

3. Pessimistic Lock

When modifying a piece of data in the database, the most direct way to prevent someone else from modifying it at the same time is to lock the data so that no concurrent access is possible.
This approach of locking the data with the database's locking mechanism before modifying it is called pessimistic concurrency control.

A pessimistic lock has strongly exclusive characteristics. It takes a conservative attitude toward the data being modified by the outside world (other transactions in the current system, as well as transactions from external systems), so the data is kept locked during the whole processing.

3.1 Implementation of pessimistic locking

The implementation of pessimistic locking usually relies on the locking mechanism provided by the database (only a database-level lock can truly guarantee exclusive access to the data; a lock implemented only inside the application cannot prevent an external system from modifying the data).
The general flow is: lock the data first, then read and modify it, and release the lock only when the transaction finishes (the detailed steps are listed in section 5.1).

It is called a pessimistic lock because it takes a pessimistic attitude toward data modification: it always assumes the worst case, i.e. that some other thread will change the data every time it is read, so it locks before every access, and any other thread that wants to touch the data has to block and wait.

Common pessimistic lock implementations:

  • Traditional relational databases use such locking mechanisms, e.g. row locks, table locks, read locks and write locks, all of which are acquired before the operation.
  • The implementation of the synchronized keyword in Java.

Pessimistic locks are mainly divided into shared locks and exclusive locks:

  • 1. A shared lock, also known as a read lock (S lock), can be held on the same data by multiple transactions at the same time; the holders can all read the data but none of them may modify it.
  • 2. An exclusive lock, also known as a write lock (X lock), cannot coexist with any other lock: once a transaction holds an exclusive lock on a row, no other transaction can acquire any lock on that row, shared or exclusive. The transaction holding the exclusive lock can both read and modify the row (see the SQL sketch after this list).
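As an illustration, in MySQL InnoDB both kinds of locks can be requested explicitly inside a transaction; a minimal sketch assuming a hypothetical account table (MySQL 8.0 also accepts FOR SHARE instead of LOCK IN SHARE MODE):

BEGIN;
-- shared (S) lock: other transactions may also read and S-lock the row, but cannot modify it
SELECT balance FROM account WHERE id = 1 LOCK IN SHARE MODE;
-- exclusive (X) lock: no other transaction can acquire any lock on the row until COMMIT
SELECT balance FROM account WHERE id = 1 FOR UPDATE;
COMMIT;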

Pessimistic concurrency control is essentially a conservative "lock first, access later" strategy. It guarantees the safety of data processing, but the locking machinery adds overhead to the database, increases the chance of deadlocks, and reduces parallelism: once a transaction has locked a row, other transactions must wait for it to finish before they can touch that row.

4. Optimistic Locking

Optimistic locking is the opposite of pessimistic locking. It assumes that, in general, data will not conflict, so conflicts are only checked at the moment the update is submitted; if a conflict is detected, an error is returned to the caller, who decides what to do next. Optimistic locking suits scenarios with many reads and few writes, and can improve program throughput.

Optimistic locking is a much more relaxed approach. It is also a mechanism for avoiding data errors caused by phantom reads and long-running business processing, but it does not deliberately use the database's own locking mechanism; instead it guarantees correctness based on the data itself (for example a version column).

4.1 Implementation of optimistic locking

  • CAS implementation: the atomic classes under the java.util.concurrent.atomic package in Java are an optimistic-locking style implementation based on CAS.
  • Version number control: add a version column to the table to record how many times the row has been modified; every modification increments it by 1. When thread A wants to update the row, it reads the version together with the data, and the update is applied only if the version it read still equals the version currently in the database; otherwise it re-reads and retries until the update succeeds (see the SQL sketch after this list).
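A minimal SQL sketch of the version-number approach, assuming a hypothetical product table with stock and version columns:

-- 1. read the data together with its current version (suppose this returns stock = 10, version = 5)
SELECT stock, version FROM product WHERE id = 1;
-- 2. update only if the version has not changed, and bump the version in the same statement
UPDATE product SET stock = 9, version = version + 1 WHERE id = 1 AND version = 5;
-- if 0 rows were affected, another transaction got there first: re-read and retry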

Optimistic concurrency control assumes that the probability of a data race between transactions is small, so it does as much work as possible without locking and only checks at commit time; as a result there are no locks and no deadlocks.

5. Concrete implementation

5.1 Implementation of pessimistic locking

The implementation of pessimistic locking often relies on the locking mechanism provided by the database. In the database, the process of pessimistic locking is as follows:

  • 1. Before modifying the record, try to acquire an exclusive lock on it.
  • 2. If locking fails, the record is being modified by someone else; the current request can wait or throw an exception, as the developer sees fit.
  • 3. If locking succeeds, the record can be modified; the lock is released when the transaction completes.
  • 4. In the meantime, any other operation that wants to modify the record or acquire an exclusive lock on it must wait for the unlock or fail immediately.

Specifically, let us take the MySQL InnoDB engine as an example to illustrate how pessimistic locking is used in SQL.
To use pessimistic locking, MySQL's autocommit must be disabled (SET autocommit = 0), because MySQL runs in autocommit mode by default and would otherwise commit every update immediately.
The lock itself is acquired with select ... for update, as sketched below:
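A minimal sketch of the flow, assuming a hypothetical goods table:

SET autocommit = 0;   -- disable autocommit so the lock is held until COMMIT
BEGIN;
-- acquire an exclusive lock on the row; other writers (and FOR UPDATE readers) block here
SELECT stock FROM goods WHERE id = 1 FOR UPDATE;
-- ... business logic ...
UPDATE goods SET stock = stock - 1 WHERE id = 1;
COMMIT;               -- the lock is released here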

"select...for update will lock the table or lock the row"

5.2 Implementation of optimistic locking

Optimistic locking does not need to rely on the locking mechanism of the database.
There are two main steps:

  • Conflict detection
  • Data update

A typical implementation is CAS (Compare And Swap).

CAS is a mechanism for avoiding the performance cost of locks under multi-threaded parallelism. A CAS operation involves three operands: a memory location (V), the expected old value (A) and the new value (B). If the value at the memory location (V) matches the expected value (A), the processor atomically sets the location to the new value (B); otherwise it does nothing. In either case it reports what the location contained before the instruction. In effect CAS says: "I think location V should contain value A; if it does, put B there, otherwise leave it alone and just tell me what it actually contains." In Java, the sun.misc.Unsafe class exposes hardware-level atomic operations that implement CAS, and many classes under the java.util.concurrent package are built on these Unsafe operations.
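A small Java sketch of these CAS semantics using AtomicInteger.compareAndSet(expected A, new B); note that the Java method returns a boolean rather than the old value:

import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(3);    // the location V currently holds 3
        boolean first = v.compareAndSet(3, 4);     // A = 3 matches, so V becomes 4; returns true
        boolean second = v.compareAndSet(3, 5);    // A = 3 no longer matches (V is 4); returns false, V unchanged
        System.out.println(first + " " + second + " " + v.get());   // prints: true false 4
    }
}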

When multiple threads try to update the same variable with CAS at the same time, only one of them succeeds; the others fail, but a failing thread is not suspended: it is simply told that it lost the race and may try again. Take deducting inventory as an example: it can be implemented with optimistic locking.

Before updating, read the current stock from the inventory table, then use that stock value as a condition of the UPDATE.
When the update is submitted, the current stock in the database is compared with the stock read at the beginning: the update only succeeds if the two are equal; otherwise the data is considered stale.
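A minimal sketch of this check-and-update in a single statement, assuming a hypothetical goods table and that the first read returned stock = 3:

-- succeeds only if the stock is still the value we read; 0 affected rows means the data was stale
UPDATE goods SET stock = 2 WHERE id = 1 AND stock = 3;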
Does this look correct?
In fact it runs into a very common problem, the ABA problem:

  • Thread 1 reads stock = 3 from the database; at the same time thread 2 also reads stock = 3, performs some operations and changes it to 2.
  • Thread 2 then changes the stock back to 3. When thread 1 performs its CAS it finds the value in the database is still 3, so its update succeeds.
  • Although thread 1's CAS succeeds, that does not mean nothing went wrong: the value went 3 → 2 → 3 in between, and thread 1 cannot see it.

A better solution is to add a separate, monotonically increasing version field. The optimized approach is sketched below:
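A sketch of the same update guarded by a version column instead of the stock value (hypothetical goods table; version = 0 was read together with the stock):

-- even if the stock went 3 -> 2 -> 3 in the meantime, the version has moved on, so this update fails
UPDATE goods SET stock = 2, version = version + 1 WHERE id = 1 AND version = 0;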

With this scheme, every modification carries the version number it read: the update is applied only if that version still matches the version in the database, and the version is then incremented by 1; otherwise the update fails. Because the version increases with every operation, the ABA problem cannot occur. Besides a version number, a timestamp can also be used, since timestamps naturally increase over time.
Is this good enough now? Not yet.
Under high concurrency only one thread can succeed at a time, so there will be a large number of failures. For an e-commerce site like Taobao, high concurrency is the norm, and letting users see failures is clearly unacceptable. We therefore need to reduce the granularity of the optimistic lock to maximize throughput and concurrency, as follows:
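A sketch of this style of update, assuming a hypothetical goods table and an order quantity of 1:

-- the read, the check and the decrement all happen inside one atomic UPDATE
UPDATE goods SET quantity = quantity - 1 WHERE id = 1 AND quantity - 1 > 0;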

In the SQL statement above, if the user orders 1 item, the optimistic-lock check is the condition quantity - 1 > 0: the read, the check and the decrement all happen inside one atomic UPDATE.

Controlling lock granularity is an important topic in high-concurrency systems: choosing the right lock can greatly improve throughput and performance while keeping the data safe.

6. Understanding CAS under the hood


If three threads concurrently modify the value of an AtomicInteger, the underlying mechanism works as follows:

  • First, each thread gets the current value and then performs an atomic CAS operation.
    Atomic means the CAS operation executes as a single indivisible step and cannot be interrupted by anyone else.
  • Inside the CAS, the current value is compared with the value just read. If they are equal, nobody has changed the value in the meantime, so it is set to the old value plus 1.
  • If a thread finds during CAS that the value it read earlier no longer matches the current value, the CAS fails; the thread then re-reads the value and attempts the CAS again in a loop until it succeeds (see the sketch after this list).
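A conceptual Java sketch of this read-then-CAS retry loop (not the actual JDK source of AtomicInteger, just an illustration):

import java.util.concurrent.atomic.AtomicInteger;

public class SpinIncrement {

    // conceptually what incrementAndGet() does: read, compute, CAS, retry on failure
    public static int incrementAndGet(AtomicInteger counter) {
        for (;;) {
            int current = counter.get();   // read the current value
            int next = current + 1;        // compute the new value
            if (counter.compareAndSet(current, next)) {
                return next;               // CAS succeeded: nobody changed the value in between
            }
            // CAS failed: another thread updated the value first; loop and try again
        }
    }
}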

7. A typical CAS application: the Atomic classes

Most of the classes under the java.util.concurrent.atomic package, such as AtomicInteger, AtomicBoolean and AtomicLong, are implemented with CAS operations.
When contention is not fierce, the atomic operations of this package are much more efficient than the synchronized keyword (look at getAndSet(): if contention for the resource is very fierce, its retry loop may spin for a long time before it manages to exit; in that case the real fix is to reduce the contention).
A very typical application is counting:

public class Increment {

    private int count = 0;

    public void add() {
        count++;
    }
}

The code above faces a problem: incrementing count in a concurrent environment is not safe. Why is it unsafe, and how do we solve it?

Why is the auto-increment of count unsafe under concurrency? Because count++ is not an atomic operation; it is a combination of three separate steps:

  • Read the count value in memory and assign it to the local variable temp;
  • Execute temp+1 operation;
  • Assign temp to count.

Therefore, if two threads execute count++ at the same time, there is no guarantee that thread 1 will execute the above three steps in sequence before thread 2 starts executing.

Solutions to the unsafe count++ in a concurrent environment

synchronized lock

Only one thread can hold the lock at a time; the other threads have to wait for it, so the count can no longer become inaccurate:

public class Increment {

    private int count = 0;

    public synchronized void add() {
        count++;
    }
}

Solving thread safety is not the end of the story: the code must not only produce the correct result, it must also perform acceptably.
Introducing synchronized makes the threads queue up: they are effectively serialized, one at a time. Queue, take the lock, process the data, release the lock, let the next one in. Only one thread runs at a time, so this kind of lock is rather "heavyweight".
This is essentially the pessimistic-lock approach: whoever needs the resource locks it, and no other thread can touch it until the lock is released. Although synchronized has been heavily optimized in newer Java versions, it is still too heavy for such a simple accumulation.

The Atomic classes

The count++ operation can also be done differently: the Java concurrency package provides a series of Atomic classes, such as AtomicInteger:

import java.util.concurrent.atomic.AtomicInteger;

public class Increment {

    private static final AtomicInteger count = new AtomicInteger(0);

    public static void increase() {
        count.incrementAndGet();
    }
}

Multiple threads can call AtomicInteger's incrementAndGet() concurrently: it increments the value of count by 1 and returns the new value. Under the hood, the Atomic classes do not use a traditional lock but the lock-free CAS mechanism, which guarantees that concurrent modifications of a value are safe.

8. CAS performance optimization

When a large number of threads modify one AtomicInteger concurrently, many of them may keep spinning in that retry loop: they read the value, attempt the CAS, find that someone else has already changed the value, and go around again, read, CAS, fail, repeat. Under heavy concurrent updates to a single AtomicInteger this becomes noticeable: lots of threads burn CPU in empty spin loops and performance suffers. So how can this be optimized?
Java 8 added a new class, LongAdder, which uses segmented CAS and automatic migration into segments to greatly improve the performance of highly concurrent CAS updates. How does it optimize performance?

The core idea of LongAdder is hotspot separation, similar to the design idea of ConcurrentHashMap: split the single value into an array of cells. When multiple threads update the counter, each thread is mapped by a hash to one of the cells and only updates that cell; the final result is the sum of the base value and all cells. In this way the contention (effectively the lock granularity) is greatly reduced.
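A minimal usage sketch of LongAdder (java.util.concurrent.atomic.LongAdder, available since Java 8):

import java.util.concurrent.atomic.LongAdder;

public class LongAdderDemo {

    private static final LongAdder counter = new LongAdder();

    public static void add() {
        counter.increment();   // each thread hits one of several internal cells, so CAS contention is spread out
    }

    public static long total() {
        return counter.sum();  // sum() adds the base value and all cells; it is not an atomic snapshot
    }
}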

9. How to choose between pessimistic and optimistic locking

Choosing between optimistic and pessimistic locking mainly comes down to their differences and the scenario at hand.

  • Response time: if very fast responses are required, optimistic locking is recommended; an update either succeeds or fails immediately, without waiting for other transactions to release a lock. Optimistic locking does not actually lock anything and is efficient, but if the lock granularity is chosen badly the update-failure rate rises and business failures become frequent.
  • Conflict frequency: if conflicts are very frequent, pessimistic locking is recommended to guarantee the success rate; with frequent conflicts, optimistic locking needs many retries to succeed, which is costly.
  • Retry cost: if a retry is expensive, pessimistic locking is recommended; it relies on the database lock and is less efficient, but the probability of an update failing is low.
  • With optimistic locking, if someone updated the data before you, your update is rejected and the user has to redo the operation; with pessimistic locking you simply wait for the previous update to finish. That is also the essential difference between them.

Source: blog.csdn.net/zhiyikeji/article/details/123562209