Foreword

Optimistic locking and pessimistic locking come up frequently in interviews. This article works from the basics to the more advanced points: their basic concepts, implementation methods (with examples), applicable scenarios, and the follow-up questions an interviewer may ask.

1. Basic concepts

Optimistic locking and pessimistic locking are two approaches to handling data races in concurrent scenarios.

Optimistic lock: an optimistic lock takes an optimistic view when operating on data, assuming that others will not modify the data at the same time. It therefore does not lock; it only checks at update time whether someone else has modified the data. If the data has been modified, the operation is abandoned; otherwise it is performed.
Pessimistic lock: a pessimistic lock takes a pessimistic view, assuming that others will modify the data at the same time. It therefore locks the data before operating on it and does not release the lock until the operation is complete; while the lock is held, no one else can modify the data.

2. Implementation method (including examples)

Before explaining the implementation methods, one point needs to be clear: optimistic locking and pessimistic locking are ideas rather than specific mechanisms. They are used very widely and are not limited to a particular programming language or database.

Pessimistic locking is implemented by taking a lock. The lock can protect a code block (such as the synchronized keyword in Java) or data (such as exclusive locks in MySQL).

There are two main ways to implement optimistic locking: the CAS mechanism and the version number mechanism, which are described in detail below.

1. CAS (Compare And Swap)

A CAS operation takes three operands:

The memory location to be read and written (V)
The expected value to compare against (A)
The new value to write (B)

The logic of a CAS operation is as follows: if the value at memory location V equals the expected value A, update the location to the new value B; otherwise do nothing. Many CAS-based operations spin: if the operation fails, it is retried until it succeeds.
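The V/A/B logic and the spinning retry can be sketched with AtomicInteger.compareAndSet from java.util.concurrent.atomic. This is only a minimal illustration: the class name CasSketch and the helper addWithCas are made up for this example and are not part of any library.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasSketch {
    // Hypothetical helper: spin until the CAS succeeds.
    // Read the current value (A), compute the new value (B),
    // and write B only if the location (V) still holds A.
    static int addWithCas(AtomicInteger v, int delta) {
        while (true) {
            int expected = v.get();          // A: the value we believe is in memory
            int updated = expected + delta;  // B: the value we want to write
            if (v.compareAndSet(expected, updated)) {
                return updated;              // CAS succeeded, stop spinning
            }
            // CAS failed: another thread changed v in between, so retry with a fresh read
        }
    }

    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(5);
        boolean first = v.compareAndSet(5, 6);   // V holds 5, which equals A, so V becomes 6
        boolean second = v.compareAndSet(5, 7);  // V holds 6, which differs from A, so nothing happens
        System.out.println(first + " " + second + " " + v.get()); // true false 6
        System.out.println(addWithCas(v, 10));                    // 16
    }
}
```

A failed compareAndSet leaves the value untouched and simply returns false; the caller decides whether to retry.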

This raises a new question: since CAS consists of two steps, compare and swap, how is its atomicity guaranteed? The answer is that CAS is an atomic operation supported by the CPU, so its atomicity is guaranteed at the hardware level.

Let's take the increment operation (i++) in Java as an example to see how pessimistic locking and CAS each ensure thread safety. The increment operation in Java is not atomic; it actually consists of three separate steps: (1) read the value of i; (2) add 1; (3) write the new value back to i.

Therefore, if increments are performed concurrently, the result may be wrong. In the following code example, value1 has no thread safety protection, value2 uses optimistic locking (CAS), and value3 uses pessimistic locking (synchronized). The program uses 1000 threads to increment value1, value2 and value3 at the same time. Running it shows that value2 and value3 always equal 1000, while value1 is often less than 1000.

import java.util.concurrent.atomic.AtomicInteger;

public class Test {

    // value1: not thread-safe
    private static int value1 = 0;
    // value2: optimistic locking (CAS)
    private static AtomicInteger value2 = new AtomicInteger(0);
    // value3: pessimistic locking (synchronized)
    private static int value3 = 0;
    private static synchronized void increaseValue3(){
        value3++;
    }

    public static void main(String[] args) throws Exception {
        // start 1000 threads, each performing the increments
        for(int i = 0; i < 1000; ++i){
            new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        Thread.sleep(100);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                    value1++;
                    value2.getAndIncrement();
                    increaseValue3();
                }
            }).start();
        }
        // wait for the threads to finish, then print the results
        Thread.sleep(1000);
        System.out.println("Not thread-safe: " + value1);
        System.out.println("Optimistic lock (AtomicInteger): " + value2);
        System.out.println("Pessimistic lock (synchronized): " + value3);
    }
}

First, let's introduce AtomicInteger. AtomicInteger is an atomic class provided by the java.util.concurrent.atomic package. It uses the CAS operation provided by the CPU to ensure atomicity. In addition to AtomicInteger, there are many atomic classes such as AtomicBoolean, AtomicLong, and AtomicReference.

Let's look at the source code of AtomicInteger to see how its increment operation getAndIncrement() is implemented (the source shown is from Java 7; Java 8 differs, but the idea is similar).

public class AtomicInteger extends Number implements java.io.Serializable {

    // stores the integer value; volatile guarantees visibility
    private volatile int value;

    // Unsafe provides access to low-level resources
    private static final Unsafe unsafe = Unsafe.getUnsafe();

    // valueOffset is the offset of value in memory
    private static final long valueOffset;

    // obtain valueOffset through Unsafe
    static {
        try {
            valueOffset = unsafe.objectFieldOffset(AtomicInteger.class.getDeclaredField("value"));
        } catch (Exception ex) { throw new Error(ex); }
    }

    // returns the current value
    public final int get() {
        return value;
    }

    public final boolean compareAndSet(int expect, int update) {
        return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
    }

    public final int getAndIncrement() {
        for (;;) {
            int current = get();
            int next = current + 1;
            if (compareAndSet(current, next))
                return current;
        }
    }
}

The source code analysis is as follows:

(1) The increment implemented by getAndIncrement() is a spinning CAS operation: it calls compareAndSet in a loop, exits if the call succeeds, and otherwise keeps retrying.

(2) compareAndSet is the core of the CAS operation, and it is implemented through the Unsafe object.

(3) What is Unsafe? Unsafe is a class that gives Java access to low-level resources of the operating system (such as allocating and freeing memory). Through Unsafe, Java gains low-level capabilities that can improve efficiency, but these powerful capabilities also bring safety risks (as the class name Unsafe reminds us), so under normal circumstances it is not available to user code. Here, AtomicInteger uses the CAS functionality provided by Unsafe.

(4) valueOffset can be understood as the offset of value in memory, and it corresponds to V among the three CAS operands (V/A/B); the offset is also obtained through Unsafe.

(5) The volatile modifier on the value field: thread safety in Java concurrent programming requires atomicity, visibility, and ordering. CAS guarantees atomicity, while volatile guarantees visibility and a degree of ordering; in AtomicInteger, volatile and CAS together ensure thread safety. Explaining how volatile works involves the Java Memory Model (JMM), which is not covered in detail here.

After AtomicInteger, let's turn to synchronized. synchronized ensures thread safety by locking a code block: at any moment, only one thread can execute the code in the block. synchronized is a heavyweight operation, not only because locking requires additional resources, but also because switching thread states involves transitions between the operating system's kernel mode and user mode. However, as the JVM has optimized locking (spin locks, lightweight locks, lock coarsening, etc.), the performance of synchronized has steadily improved.

2. Version number mechanism

In addition to CAS, the version number mechanism can also be used to implement optimistic locking. The basic idea is to add a version field to the data, indicating the version number of the data. Whenever the data is modified, the version number is incremented by 1. When a thread queries the data, it reads the version number along with it; when the thread updates the data, it checks whether the current version number matches the one it read earlier, and performs the update only if they match.

Note that the version number is used here only as a marker for detecting changes to the data. In practice, other fields that can mark the data version, such as a timestamp, can be chosen as appropriate.
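The read-then-conditionally-update flow can be sketched entirely in memory. This is a hedged illustration, not a database implementation: the class VersionedAccount and its methods are hypothetical, and the synchronized keyword merely stands in for the atomicity that a database gives a single UPDATE statement.

```java
class VersionedAccount {
    private long coins;
    private int version;

    // Snapshot handed to readers: the value plus the version it was read at.
    static final class Snapshot {
        final long coins;
        final int version;
        Snapshot(long coins, int version) {
            this.coins = coins;
            this.version = version;
        }
    }

    VersionedAccount(long coins) {
        this.coins = coins;
    }

    // Read the data together with its version number.
    synchronized Snapshot read() {
        return new Snapshot(coins, version);
    }

    // Apply the write only if nobody has bumped the version since the read;
    // on success the version is incremented by 1.
    synchronized boolean updateIfUnchanged(long newCoins, int expectedVersion) {
        if (version != expectedVersion) {
            return false; // stale read: the caller must re-read and decide what to do
        }
        coins = newCoins;
        version++;
        return true;
    }

    public static void main(String[] args) {
        VersionedAccount account = new VersionedAccount(100);
        Snapshot a = account.read(); // both readers see version 0
        Snapshot b = account.read();
        System.out.println(account.updateIfUnchanged(a.coins + 10, a.version)); // true
        System.out.println(account.updateIfUnchanged(b.coins + 50, b.version)); // false
        System.out.println(account.read().coins);                               // 110
    }
}
```

The second writer is rejected because its snapshot carries version 0 while the stored version is already 1; without the version check its write would silently overwrite the first update.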

Let's take "updating a player's gold coins" as an example (the database is MySQL; other databases are similar) to see how pessimistic locking and the version number mechanism deal with concurrency issues.

Consider this scenario: the game system needs to update a player's gold coin count, and the new count depends on the player's current state (such as the current gold coins, level, etc.), so the current state must be queried before the update.

The implementation below provides no concurrency protection. If another thread updates the player's information between the query and the update, the player's gold coin count will be inaccurate.

@Transactional
public void updateCoins(Integer playerId){
    // query the player's information by player_id
    Player player = query("select coins, level from player where player_id = {0}", playerId);
    // compute the new gold coin count from the player's current state and other information
    Long newCoins = ……;
    // update the gold coin count
    update("update player set coins = {0} where player_id = {1}", newCoins, playerId);
}

To avoid this problem, pessimistic locking uses a lock, as in the code below. The player information is queried with select ... for update; this statement places an exclusive lock on the player's data, which is not released until the transaction commits or rolls back. During this period, any other thread that tries to update the player's information or execute select ... for update on it is blocked.

@Transactional
public void updateCoins(Integer playerId){
    // query the player's information by player_id (taking an exclusive lock)
    Player player = queryForUpdate("select coins, level from player where player_id = {0} for update", playerId);
    // compute the new gold coin count from the player's current state and other information
    Long newCoins = ……;
    // update the gold coin count
    update("update player set coins = {0} where player_id = {1}", newCoins, playerId);
}

The version number mechanism takes a different approach: it adds a version field to the player information. The version is read together with the player information in the initial query; when the update is executed, it checks whether the version has changed, and if it has, the update is not applied.

@Transactional
public void updateCoins(Integer playerId){
    // query the player's information by player_id, including the version
    Player player = query("select coins, level, version from player where player_id = {0}", playerId);
    // compute the new gold coin count from the player's current state and other information
    Long newCoins = ……;
    // update the gold coin count, adding a check on version to the condition
    update("update player set coins = {0}, version = version + 1 where player_id = {1} and version = {2}", newCoins, playerId, player.version);
}

3. Advantages and disadvantages and applicable scenarios

Neither optimistic locking nor pessimistic locking is inherently better; each has its suitable scenarios. The comparison below covers two aspects.

1. Functional limitations

Compared with pessimistic locking, the applicable scenarios of optimistic locking are more restricted, whether it is CAS or version number mechanism.

For example, CAS can only guarantee the atomicity of an operation on a single variable. When multiple variables are involved, CAS cannot help, while synchronized can handle the case by locking the entire code block. Another example is the version number mechanism: if the query reads table 1 and the update writes table 2, it is difficult to implement optimistic locking with a simple version number.
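As a rough sketch of the multi-variable case, the class below moves coins between two fields under one lock, so no thread can observe a half-finished transfer. The class TransferSketch and its fields are invented for this illustration.

```java
class TransferSketch {
    private long coinsA;
    private long coinsB;

    TransferSketch(long coinsA, long coinsB) {
        this.coinsA = coinsA;
        this.coinsB = coinsB;
    }

    // Both fields change inside one synchronized method, so the pair is
    // updated as a unit; two separate single-variable CAS operations could
    // not give this guarantee.
    synchronized boolean transfer(long amount) {
        if (amount < 0 || coinsA < amount) {
            return false; // reject invalid or unaffordable transfers
        }
        coinsA -= amount;
        coinsB += amount;
        return true;
    }

    // Reads both fields under the same lock, so the sum is always consistent.
    synchronized long total() {
        return coinsA + coinsB;
    }

    public static void main(String[] args) throws InterruptedException {
        TransferSketch sketch = new TransferSketch(1000, 0);
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 100; j++) {
                    sketch.transfer(1);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(sketch.total()); // 1000
    }
}
```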

2. The intensity of competition

If both pessimistic and optimistic locking are possible, the choice should take the level of contention into account:

When contention is low (the probability of concurrent conflicts is small), optimistic locking has the advantage: pessimistic locking locks code blocks or data so that other threads cannot access them at the same time, which hurts concurrency, and acquiring and releasing locks adds extra overhead.
When contention is high (the probability of concurrent conflicts is large), pessimistic locking has the advantage: optimistic locking fails frequently when performing updates and has to retry again and again, wasting CPU resources.

4. The interviewer asked: Does optimistic locking take a lock?

The author once got this question in an interview. Here is my understanding of it:

(1) Optimistic locking itself does not lock; it only checks at update time whether the data has been updated by another thread. AtomicInteger is an example.

(2) Optimistic locking sometimes works together with locking. For example, in the updateCoins() example above, MySQL takes an exclusive lock while executing the update. But this is only an example of optimistic locking cooperating with a lock; it does not change the fact that optimistic locking itself does not lock.

5. The interviewer asked: What are the disadvantages of CAS?

At this point in the interview, the interviewer may already like you. But there is one final question coming: do you know what the shortcomings of CAS are?

Here are some of the less-than-perfect aspects of CAS:

1. The ABA problem

Suppose there are two threads - thread 1 and thread 2, and the two threads perform the following operations in order:

(1) Thread 1 reads the data in the memory as A;

(2) Thread 2 modifies the data to B;

(3) Thread 2 modifies the data to A;

(4) Thread 1 performs a CAS operation on the data.

In step (4), since the data in memory is still A, the CAS operation succeeds, even though the data has in fact been modified by thread 2. This is the ABA problem.
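The four steps can be replayed in a single thread with AtomicReference to show that the final CAS cannot tell the value was ever changed. This is a simplified sketch: the "threads" are simulated sequentially, and the class name AbaSketch is made up for the example.

```java
import java.util.concurrent.atomic.AtomicReference;

class AbaSketch {
    static final String A = "A";
    static final String B = "B";
    static final String C = "C";

    // Replays steps (1) to (4) in order and returns the result of thread 1's CAS.
    static boolean runScenario() {
        AtomicReference<String> data = new AtomicReference<>(A);
        String seenByThread1 = data.get();        // (1) thread 1 reads A
        data.compareAndSet(A, B);                 // (2) thread 2 changes A to B
        data.compareAndSet(B, A);                 // (3) thread 2 changes B back to A
        return data.compareAndSet(seenByThread1, C); // (4) thread 1's CAS sees A again
    }

    public static void main(String[] args) {
        System.out.println(runScenario()); // true
    }
}
```

The CAS in step (4) succeeds because it compares only the current value with the expected one; the intermediate change to B leaves no trace.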

In the case of AtomicInteger, ABA does not seem to do any harm. In some scenarios, however, ABA introduces hidden dangers. Take the top of a stack: the top element may be changed twice (or more) and then restored to its original value, yet the rest of the stack may have changed in the meantime.

A more effective solution to the ABA problem is to introduce a version number. Every time the value in memory changes, the version number is incremented by 1; when performing a CAS operation, both the value in memory and the version number are compared, and the CAS succeeds only when neither has changed. The AtomicStampedReference class in Java uses a version number (stamp) to solve the ABA problem.
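A minimal sketch of the stamped approach using java.util.concurrent.atomic.AtomicStampedReference follows. The scenario is again replayed sequentially in one thread, and the class name StampedSketch is hypothetical.

```java
import java.util.concurrent.atomic.AtomicStampedReference;

class StampedSketch {
    // Replays the A -> B -> A sequence, this time with a stamp (version number)
    // attached to the value, and returns the result of thread 1's CAS.
    static boolean runScenario() {
        AtomicStampedReference<String> data = new AtomicStampedReference<>("A", 0);

        int[] stampHolder = new int[1];
        String seenValue = data.get(stampHolder); // thread 1 reads value A ...
        int seenStamp = stampHolder[0];           // ... together with stamp 0

        data.compareAndSet("A", "B", 0, 1);       // thread 2: A -> B, stamp 0 -> 1
        data.compareAndSet("B", "A", 1, 2);       // thread 2: B -> A, stamp 1 -> 2

        // thread 1: the value is A again, but the stamp is now 2 instead of 0
        return data.compareAndSet(seenValue, "C", seenStamp, seenStamp + 1);
    }

    public static void main(String[] args) {
        System.out.println(runScenario()); // false
    }
}
```

Here the final CAS fails because the stamp it expects (0) no longer matches the current stamp (2), even though the value itself looks unchanged.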

2. Overhead problems under high competition

In a high-contention environment where concurrent conflicts are likely, a CAS that keeps failing keeps retrying, and the CPU overhead becomes high. One way to address this is to introduce an exit mechanism, such as giving up and reporting failure once the number of retries exceeds a threshold. More importantly, avoid using optimistic locking in high-contention environments in the first place.
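One possible shape for such an exit mechanism is sketched below. The class BoundedRetrySketch, the method incrementWithLimit, and the maxAttempts parameter are all assumptions made for this example; what to do after giving up (back off, fall back to a lock, report an error) is left to the caller.

```java
import java.util.concurrent.atomic.AtomicInteger;

class BoundedRetrySketch {
    // Try to increment the counter with CAS, giving up after maxAttempts tries
    // instead of spinning forever. Returns true if the increment was applied.
    static boolean incrementWithLimit(AtomicInteger counter, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int current = counter.get();
            if (counter.compareAndSet(current, current + 1)) {
                return true;  // CAS succeeded
            }
            // CAS failed because another thread won the race; try again
        }
        return false;         // retry budget exhausted, let the caller decide
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger(0);
        System.out.println(incrementWithLimit(counter, 3)); // true (no contention here)
        System.out.println(counter.get());                  // 1
    }
}
```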

3. Functional limitations

The functionality of CAS is relatively limited. CAS can only guarantee the atomicity of an operation on a single variable (a single memory value), which means: (1) atomicity alone does not guarantee thread safety; in Java, for example, CAS must be combined with volatile to ensure thread safety; (2) when multiple variables (memory values) are involved, CAS cannot help.

In addition, CAS requires processor support at the hardware level. In Java, ordinary users cannot use it directly and must go through the atomic classes in the atomic package, which limits flexibility.

Doesn't MySQL already implement MVCC for its transactions? Why add a version yourself when updating data?

MVCC and optimistic/pessimistic locking achieve different things. Take the earlier gold coin example:
1. If you use neither optimistic nor pessimistic locking and rely only on MVCC, then in the first example, if another thread updates the player's information between the query and the update, the update still succeeds, and the player's gold coin count becomes inaccurate. The main purpose of MVCC is to ensure that data read multiple times within a transaction is consistent, so it does not help with this problem.
2. Adding the version ourselves ensures that when another thread has modified the version, the update does not succeed, so the gold coin count stays accurate.
