Concurrency issues in Kotlin coroutines: I clearly locked it with a Mutex, so why didn't it work?

Foreword

In a project I recently took over, my supervisor handed me a bug that had been sitting around for a long time and asked me to investigate and fix it.

The problem, roughly: one piece of business logic inserts data into the database and must guarantee that each type of data is inserted only once, yet the data was being inserted repeatedly.

I opened the code and saw that the previous developer, who has since moved on, had been very careful. The duplicate check was nested layer upon layer: first query the local database for duplicates, then ask the server to check again, and finally query the local database once more right before inserting. Three layers of checks in total. So why were there still duplicates?

A closer look: oh, the queries run asynchronously in coroutines. No wonder.

But wait, didn't you lock it with a Mutex? How can there still be duplicates?

Mutex, what are you doing? What exactly did you lock? Look at what you are supposed to be guarding.

At this moment the Mutex is just like me: unable to protect anything.

But is the Mutex really to blame? In this article we briefly analyze why using a Mutex for coroutine concurrency control can appear to fail, and clear the name of our honest Mutex.

Prerequisite knowledge: about coroutines and concurrency

It is well known that multi-threaded programs can run into synchronization problems. Take the following classic example:

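import kotlinx.coroutines.*

fun main() {
    var count = 0

    runBlocking {
        // Start 1000 coroutines on the IO thread pool; each increments count once
        repeat(1000) {
            launch(Dispatchers.IO) {
                count++ // unsynchronized read-modify-write: updates can be lost
            }
        }
    }

    println(count)
}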

What do you think the code above will print?

I don't know, and there is no way to know. Really.

In the code above we loop 1000 times, start a new coroutine on each iteration, and increment count inside that coroutine.

The problem is that the operations on count are not synchronized: we don't know when each coroutine will run, and we can't guarantee that count has not been modified by another coroutine while one of them is in the middle of its increment.

As a result, the final value of count is unpredictable.

Another well-known fact is that Kotlin coroutines on the JVM run on top of threads: a dispatcher decides which thread executes each coroutine, so different coroutines may run on the same thread or on different threads.

Let's print the current thread name in the code above:

import kotlinx.coroutines.*

fun main() {
    var count = 0

    runBlocking {
        repeat(1000) {
            launch(Dispatchers.IO) {
                // Show which thread this coroutine is running on
                println("Running on ${Thread.currentThread().name}")
                count++
            }
        }
    }

    println(count)
}

Part of the output:

Running on DefaultDispatcher-worker-1
Running on DefaultDispatcher-worker-4
Running on DefaultDispatcher-worker-3
Running on DefaultDispatcher-worker-2
Running on DefaultDispatcher-worker-5
Running on DefaultDispatcher-worker-5
Running on DefaultDispatcher-worker-2
Running on DefaultDispatcher-worker-6
Running on DefaultDispatcher-worker-2
Running on DefaultDispatcher-worker-2
Running on DefaultDispatcher-worker-7
Running on DefaultDispatcher-worker-7
Running on DefaultDispatcher-worker-7
Running on DefaultDispatcher-worker-7

……

As you can see, different coroutines may run on different threads, and one thread may run several different coroutines. Because of this, coroutines face the same concurrency problems as multi-threaded code.

So, what is concurrency?

Put simply, it means executing multiple tasks within the same period of time. To achieve this, the tasks may be split into pieces whose execution is interleaved.

There is also the related concept of parallelism, which simply means that multiple tasks execute at the same instant.

[Figure 1.png: concurrency compared with parallelism]
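Concurrency does not require more than one thread. As a small illustration, the sketch below interleaves two coroutines on the single thread that runBlocking runs on. The function name interleaveOnOneThread and the step labels are made up for this example, and yield() is used only to hand the thread over to the other coroutine.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: runs two coroutines on the single thread of runBlocking
// and records which task ran each step and on which thread.
fun interleaveOnOneThread(): Pair<List<String>, Set<Thread>> {
    val steps = mutableListOf<String>()
    val threads = mutableSetOf<Thread>()

    runBlocking {
        for (task in listOf("A", "B")) {
            // No dispatcher given: both coroutines inherit runBlocking's thread
            launch {
                repeat(2) { step ->
                    steps += "$task$step"
                    threads += Thread.currentThread()
                    yield() // give the other coroutine a turn on the same thread
                }
            }
        }
    }
    return steps to threads
}

fun main() {
    val (steps, threads) = interleaveOnOneThread()
    println(steps)        // the steps of A and B, taking turns
    println(threads.size) // 1: everything ran on one thread
}
```

Both tasks make progress during the same period, yet at any instant only one of them is executing. That is concurrency without parallelism.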

In short, both parallelism and concurrency involve competition for resources, because several threads may need to operate on the same resource at the same time. That is exactly what happens in the example above: multiple threads operate on count, so the final value of count can end up less than 1000. This is easy to understand. Suppose count is 1. Thread 1 reads it and starts adding 1, but before thread 1 finishes writing, thread 2 arrives, reads count, also sees 1, and also adds 1. No matter which of the two threads finishes writing first, count ends up as 2. What we wanted, of course, was 3.
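The interleaving just described can be replayed by hand, deterministically, without any threads. This is only a sketch: readByThread1 and readByThread2 are illustrative names standing in for the value each thread has loaded before it writes.

```kotlin
// Replays the lost update step by step; no real threads are involved.
fun replayLostUpdate(): Int {
    var count = 1

    val readByThread1 = count // thread 1 reads 1
    val readByThread2 = count // thread 2 reads 1 before thread 1 has written

    count = readByThread1 + 1 // thread 1 writes 2
    count = readByThread2 + 1 // thread 2 also writes 2, overwriting thread 1's update

    return count
}

fun main() {
    println(replayLostUpdate()) // 2, although two increments starting from 1 should give 3
}
```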

Then the fix is easy, isn't it? Just don't use so many threads; a single thread will do.

Indeed. Let's make all the coroutines run on one thread:

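import kotlinx.coroutines.*

fun main() {
    // Create a single-threaded context and use it as the dispatcher for launch
    val dispatcher = newSingleThreadContext("singleThread")

    var count = 0

    runBlocking {
        repeat(1000) {
            // You can also omit the dispatcher here: launch then inherits the
            // context of runBlocking, so every coroutine still runs on one thread
            launch(dispatcher) {
                println("Running on ${Thread.currentThread().name}")
                count++
            }
        }
    }

    println(count)
    dispatcher.close() // release the dedicated thread
}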

The tail of the output:

……
Running on singleThread
Running on singleThread
Running on singleThread
Running on singleThread
Running on singleThread
Running on singleThread
Running on singleThread
Running on singleThread
Running on singleThread
Running on singleThread
Running on singleThread
1000

Process finished with exit code 0

As you can see, the final value of count is now correct. So why does this article still have anything to say?

Haha, got you.

Why do we use coroutines (or threads) in the first place? To run time-consuming tasks, or to run several tasks at once and cut the total execution time. If everything runs on a single thread, what is the point?

The example here only touches the single variable count, so it really has no need for multiple threads. Real work, however, involves far more than one such operation. Should everything else stop just because some variable is occupied by another thread? Block in place and wait? Obviously not. Wake up: the world is more than count, and plenty of other data is waiting to be processed. The point of multithreading is that while one variable (resource) is unavailable, we can work on other resources that are free, which shortens the total execution time.
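One way to keep the parallelism for everything else while serializing only the shared variable is to confine just that variable to a single thread. The sketch below assumes that approach; confinedCount, the "counter" thread name, and the placeholder for unrelated work are illustrative and not taken from the project.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: the coroutines run on the shared Default pool, and only
// the increment hops onto a dedicated single thread.
fun confinedCount(): Int {
    var count = 0
    // newSingleThreadContext is flagged as a delicate API in recent
    // kotlinx.coroutines versions; use {} closes its thread when we are done.
    newSingleThreadContext("counter").use { counterThread ->
        runBlocking {
            repeat(1000) {
                launch(Dispatchers.Default) {
                    // ...work that does not touch count could run in parallel here...
                    withContext(counterThread) {
                        count++ // only ever executed on the "counter" thread
                    }
                }
            }
        }
    }
    return count
}

fun main() {
    println(confinedCount())
}
```

Every increment still happens on one thread, so none is lost, while the rest of each coroutine is free to run in parallel.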

But what if the other code reaches a point where it cannot go around and must use the occupied resource?

Ignore whether another thread still holds it, grab the resource, and carry on? Obviously not, because that leads straight to the situation described in the foreword.

So when we need a resource that is occupied, we should pause the current thread until the resource is released.

Java usually offers three ways to solve this problem:

  1. synchronized
  2. AtomicInteger
  3. ReentrantLock
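For reference, this is roughly what the second option looks like on the counter from earlier. It is a sketch, and atomicCount is a name made up here. incrementAndGet() performs the whole read-modify-write as one atomic step without taking a lock, so it can also be called from coroutines, though it only protects a single variable.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import kotlinx.coroutines.*

// Hypothetical helper: the same 1000 increments, using an atomic counter.
fun atomicCount(): Int {
    val count = AtomicInteger(0)
    runBlocking {
        repeat(1000) {
            launch(Dispatchers.IO) {
                count.incrementAndGet() // atomic read-modify-write, no lock involved
            }
        }
    }
    return count.get()
}

fun main() {
    println(atomicCount())
}
```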

But the blocking ones (synchronized and ReentrantLock) are a poor fit for Kotlin coroutines, because coroutines are designed to be non-blocking. When a coroutine needs to pause (for example with delay(1000)), it is suspended, and a suspended coroutine does not block the thread it was running on; that thread is free to run other tasks in the meantime. (AtomicInteger does not block and works fine for a lone counter, but it cannot protect a multi-step operation.)

In Java, when a thread needs to pause (for example with Thread.sleep(1000)), the thread itself is blocked and can do nothing else until the blocking ends.
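The difference between suspending and blocking can be measured. The sketch below runs ten coroutines on the single thread of runBlocking, once pausing with delay and once with Thread.sleep. The helper name timeTenPauses is an assumption made for this example, and the exact timings will vary from machine to machine.

```kotlin
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

// Hypothetical helper: runs ten coroutines on the single thread of runBlocking
// and returns how long they take in total.
fun timeTenPauses(pause: suspend () -> Unit): Long = measureTimeMillis {
    runBlocking {
        repeat(10) {
            launch { pause() }
        }
    }
}

fun main() {
    // delay suspends: the thread moves on to the next coroutine, so the pauses overlap
    val suspending = timeTenPauses { delay(100) }
    // Thread.sleep blocks: the only thread is stuck, so the pauses run one after another
    val blocking = timeTenPauses { Thread.sleep(100) }
    println("delay: $suspending ms, Thread.sleep: $blocking ms")
}
```

With delay the ten pauses overlap and the total stays close to a single pause; with Thread.sleep they add up to roughly ten times that.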

For coroutines, kotlinx.coroutines provides a lightweight synchronization lock: Mutex.

What is Mutex

Mutex is the class used in Kotlin coroutines in place of synchronized or ReentrantLock from Java threading. It locks code that must not be executed by several coroutines at the same time, such as the increment of count in the earlier example. This guarantees that only one coroutine executes that code at any given moment, which avoids the data corruption caused by multithreading.

Mutex has two core methods, lock() and unlock(), which acquire and release the lock respectively:

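import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex

fun main() {
    var count = 0
    val mutex = Mutex()

    runBlocking {
        repeat(1000) {
            launch(Dispatchers.IO) {
                println("Running on ${Thread.currentThread().name}")
                mutex.lock()   // suspend until this coroutine holds the lock
                count++
                mutex.unlock() // let the next coroutine in
            }
        }
    }

    println(count)
}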

The tail of the output:

……
Running on DefaultDispatcher-worker-47
Running on DefaultDispatcher-worker-20
Running on DefaultDispatcher-worker-38
Running on DefaultDispatcher-worker-15
Running on DefaultDispatcher-worker-14
Running on DefaultDispatcher-worker-19
Running on DefaultDispatcher-worker-48
1000

Process finished with exit code 0

As you can see, although the coroutines run on different threads, count is still updated correctly.

This is because we call mutex.lock() before modifying count, which guarantees that only the current coroutine may execute the code that follows. Only after mutex.unlock() releases the lock can another coroutine go on to execute it.

The principle behind lock and unlock can be summarized as follows. When lock is called and no other coroutine holds the lock, the caller takes the lock and executes the code that follows. If another coroutine already holds the lock, the current coroutine is suspended until the lock is released and it manages to acquire it. While the coroutine is suspended, the thread it was running on is not blocked and can perform other tasks. The detailed principle can be found in Reference 2.
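That a waiting lock() suspends the coroutine rather than blocking the thread can be observed on a single thread. In the sketch below, three coroutines share the thread of runBlocking; the function name and the event strings are made up for illustration.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex

// Hypothetical demo: three coroutines share the single thread of runBlocking.
fun lockSuspendsInsteadOfBlocking(): List<String> {
    val events = mutableListOf<String>()
    val mutex = Mutex()

    runBlocking {
        launch {
            mutex.lock()
            events += "A holds the lock"
            delay(50) // keep the lock for a while
            events += "A releases the lock"
            mutex.unlock()
        }
        launch {
            events += "B asks for the lock"
            mutex.lock() // the lock is taken, so B suspends here
            events += "B holds the lock"
            mutex.unlock()
        }
        launch {
            events += "C runs while B is waiting"
        }
    }
    return events
}

fun main() {
    lockSuspendsInsteadOfBlocking().forEach { println(it) }
}
```

C gets to run even though B is stuck waiting for the lock on the very same thread. With a blocking lock, that one thread would have been frozen and C could not have run until B was done.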

In practice we generally do not call lock() and unlock() directly: if the code after lock() throws an exception, the lock is never released, which causes a deadlock. The other coroutines wait for a lock that will never be released, so they stay suspended forever:

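import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex

fun main() {
    var count = 0
    val mutex = Mutex()

    runBlocking {
        repeat(1000) {
            launch(Dispatchers.IO) {
                try {
                    mutex.lock()
                    println("Running on ${Thread.currentThread().name}")
                    count++
                    count / 0      // throws ArithmeticException
                    mutex.unlock() // never reached, so the lock is never released
                } catch (tr: Throwable) {
                    println(tr)
                }
            }
        }
    }

    println(count)
}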

The above code outputs:

Running on DefaultDispatcher-worker-1
java.lang.ArithmeticException: / by zero

After that the program hangs and never terminates.

The fix is simple: release the lock in a finally block, so that it is released whether or not the code succeeds:

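import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex

fun main() {
    var count = 0
    val mutex = Mutex()

    runBlocking {
        repeat(1000) {
            launch(Dispatchers.IO) {
                // Acquire outside try: finally must only unlock a lock we hold
                mutex.lock()
                try {
                    println("Running on ${Thread.currentThread().name}")
                    count++
                    count / 0 // throws ArithmeticException
                } catch (tr: Throwable) {
                    println(tr)
                } finally {
                    // Release exactly once; unlocking a mutex that is not locked
                    // throws IllegalStateException
                    mutex.unlock()
                }
            }
        }
    }

    println(count)
}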

The tail of the output:

……

Running on DefaultDispatcher-worker-45
java.lang.ArithmeticException: / by zero
Running on DefaultDispatcher-worker-63
java.lang.ArithmeticException: / by zero
Running on DefaultDispatcher-worker-63
java.lang.ArithmeticException: / by zero
Running on DefaultDispatcher-worker-63
java.lang.ArithmeticException: / by zero
1000

Process finished with exit code 0

As you can see, although every coroutine throws, the program still runs to completion instead of hanging.

In fact, we can simply use Mutex's extension function withLock here:

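import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

fun main() {
    var count = 0
    val mutex = Mutex()

    runBlocking {
        repeat(1000) {
            launch(Dispatchers.IO) {
                // withLock acquires the lock, runs the block, and always releases it
                mutex.withLock {
                    try {
                        println("Running on ${Thread.currentThread().name}")
                        count++
                        count / 0 // throws ArithmeticException
                    } catch (tr: Throwable) {
                        println(tr)
                    }
                }
            }
        }
    }

    println(count)
}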

The tail of the output:

……
Running on DefaultDispatcher-worker-31
java.lang.ArithmeticException: / by zero
Running on DefaultDispatcher-worker-31
java.lang.ArithmeticException: / by zero
Running on DefaultDispatcher-worker-51
java.lang.ArithmeticException: / by zero
Running on DefaultDispatcher-worker-51
java.lang.ArithmeticException: / by zero
Running on DefaultDispatcher-worker-51
java.lang.ArithmeticException: / by zero
1000

As you can see, with withLock we no longer need to handle locking and unlocking ourselves; we only need to put the code that must run under the lock into the lambda passed as its argument.

Here is the source code of withLock:

public suspend inline fun <T> Mutex.withLock(owner: Any? = null, action: () -> T): T {
    // ……

    lock(owner)
    try {
        return action()
    } finally {
        unlock(owner)
    }
}

It is very simple: it calls lock() before executing the action we passed in, and calls unlock() in a finally block once the action has finished.
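Two consequences of that implementation can be checked directly: withLock hands back whatever the action returns, and the lock is released even when the action throws. The function name withLockDemo and the values used are illustrative only.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Hypothetical demo: returns the value produced under the lock and whether the
// mutex is still locked after an action has thrown.
fun withLockDemo(): Pair<Int, Boolean> = runBlocking {
    val mutex = Mutex()

    // withLock returns whatever the action returns
    val answer = mutex.withLock { 21 * 2 }

    try {
        mutex.withLock<Unit> { throw IllegalStateException("boom") }
    } catch (e: IllegalStateException) {
        // the finally block inside withLock has already released the lock
    }

    answer to mutex.isLocked
}

fun main() {
    println(withLockDemo()) // (42, false)
}
```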

Having read this far, you may want to ask: you have been going on for a long time, haven't you drifted off topic? What about your title? Why haven't you gotten to it?

Don't worry, here it comes.

Why is it useless for me to use mutex.withLock?

Back to our title and the scene in the foreword: why do repeated insertions still happen even though the project clearly wraps the duplicate-check code in mutex.withLock?

I know you are in a hurry, but bear with me. Let me show you another example:

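import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

fun main() {
    var count = 0
    val mutex = Mutex()

    runBlocking {
        // The lock wraps the loop that launches the coroutines
        mutex.withLock {
            repeat(10000) {
                launch(Dispatchers.IO) {
                    count++
                }
            }
        }
    }

    println(count)
}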

Do you think this code prints 10000? Here is another piece of code:

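import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

fun main() {
    var count = 0
    val mutex = Mutex()

    runBlocking {
        mutex.withLock {
            repeat(100) {
                launch(Dispatchers.IO) {
                    // Each coroutine launches 100 more coroutines of its own
                    repeat(100) {
                        launch(Dispatchers.IO) {
                            count++
                        }
                    }
                }
            }
        }
    }

    println(count)
}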

What about this one? Do you think it prints 10000?

Think about it for a moment and it is clear that neither is guaranteed to print 10000.

We did add mutex.withLock at the top level. But inside it we start a lot of new coroutines, which means the lock might as well not be there.

Remember the source code of mutex.withLock that we looked at above?

What happens here amounts to calling lock, launching the new coroutines, and then immediately calling unlock, because launch returns without waiting for the coroutine it starts. The code that actually needs the lock is the code inside the newly started coroutines.
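That the lock is already gone by the time the launched coroutine runs can be checked directly. The sketch below is an assumption-laden simplification: it launches the child without a dispatcher, so the result is deterministic on runBlocking's single thread. With Dispatchers.IO, as in the examples above, the child may start sooner, but nothing makes unlock wait for it either. The name childSawLockHeld is made up here.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Hypothetical demo: was the mutex still locked when the child coroutine ran?
fun childSawLockHeld(): Boolean {
    val mutex = Mutex()
    var lockedWhenChildRan = true

    runBlocking {
        mutex.withLock {
            // launch only schedules the child and returns at once. No dispatcher is
            // given, so the child waits for runBlocking's single thread to be free.
            launch {
                lockedWhenChildRan = mutex.isLocked
            }
        } // unlock() happens here, before the child has executed a single line
    }
    return lockedWhenChildRan
}

fun main() {
    println(childSawLockHeld()) // false: the lock was gone when the child ran
}
```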

Therefore, we should keep the locking granularity as fine as possible and lock only the code that needs it:

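import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

fun main() {
    var count = 0
    val mutex = Mutex()

    runBlocking {
        repeat(100) {
            launch(Dispatchers.IO) {
                repeat(100) {
                    launch(Dispatchers.IO) {
                        // Lock only the code that touches the shared state
                        mutex.withLock {
                            count++
                        }
                    }
                }
            }
        }
    }

    println(count)
}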

What we need to protect here is the operation on count, so we only need to wrap count++ in the lock. Run the code and it prints 10000, perfectly.

With the above foreshadowing, let's take a look at the simplified code prototype of the project I took over:

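import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

fun main() {
    val mutex = Mutex()

    runBlocking {
        mutex.withLock {
            // Simulate the insert function being called many times at once
            insertData("1")
            insertData("1")
            insertData("1")
            insertData("1")
            insertData("1")
        }
    }
}

fun insertData(data: String) {
    CoroutineScope(Dispatchers.IO).launch {
        // Business logic unrelated to the data goes here
        // xxxxxxx

        // Run the duplicate check here; couldInsert holds its result
        // (hard-coded placeholder in this simplified prototype)
        val couldInsert = true
        if (couldInsert) {
            launch(Dispatchers.IO) {
                // Insert the data into the database here
            }
        }
    }
}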

Guess how many times "1" is inserted into the database in this case?

The answer is unpredictable: once, twice, three, four, or five times are all possible.

Let's guess at what was going through this developer's mind when the code was written:


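Product: Note that this insertion must allow only one record per type.

Developer: Sure, easy. I'll just add a duplicate check before the insert.

After the build was submitted for testing

QA: Hey, there's a problem here. This data can be inserted more than once.

Developer: Oh? Let me see. Ah, the database query runs asynchronously in a coroutine, so this is a concurrency problem, right? Let me search for how Kotlin coroutines handle concurrency. Oh, use a mutex. Easy.

So the developer got to work and added a mutex.withLock to the top-level function that calls the duplicate check and the insert, locking the entire processing logic in one go. Convinced that this was foolproof and that there was nothing left to worry about, the developer even gave Kotlin a thumbs-up on the way out: locking is so convenient, unlike Java, where you have to write a pile of handling code yourself.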

So, how did I solve this problem? The best solution would be to narrow the locking granularity down to the specific database operations. But remember what I said earlier: this project nests query code layer upon layer, and inserting the locking code into it is clearly not easy. I did not want the whole mountain to collapse just because I wedged a lock into it.
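For illustration, this is roughly what locking at the level of the database operation could look like. It is a minimal sketch and not the project's code: FakeTable, insertIfAbsent, and snapshot are hypothetical names, and an in-memory list stands in for the real database.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Hypothetical stand-in for the database table; the real project uses a real database.
class FakeTable {
    private val mutex = Mutex()
    private val rows = mutableListOf<String>()

    // The duplicate check and the insert form one critical section, so no other
    // coroutine can slip in between "not there yet" and "insert".
    suspend fun insertIfAbsent(data: String): Boolean = mutex.withLock {
        if (data in rows) {
            false
        } else {
            rows += data
            true
        }
    }

    suspend fun snapshot(): List<String> = mutex.withLock { rows.toList() }
}

fun insertConcurrently(): List<String> = runBlocking {
    val table = FakeTable()
    // Five concurrent attempts to insert the same value
    val jobs = List(5) { launch(Dispatchers.IO) { table.insertIfAbsent("1") } }
    jobs.joinAll()
    table.snapshot()
}

fun main() {
    println(insertConcurrently()) // [1]
}
```

The key point is that the check and the insert sit inside the same withLock block, within the coroutine that performs them, rather than around the code that merely launches those coroutines.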

So what I chose to do was add a lock at every place where a new coroutine is launched...

The mountain has grown even taller because of me, hahahahaha.

So the problem is not really the Mutex, but the people who use it.

References

  1. Kotlin coroutine - several solutions and performance comparison of concurrent security
  2. Solutions to Concurrency Problems in Kotlin Coroutines
  3. Coroutine concurrent synchronization Mutex Actor
  4. Read Concurrency and Parallelism in One Article
  5. Shared mutable state and concurrency

Origin blog.csdn.net/sinat_17133389/article/details/130894330