Talking about thread safety

In concurrent programming, when multiple threads access the same resource, we need to make sure their accesses do not conflict and the data is not corrupted. This is what we usually mean by thread safety.

When is it safe to access shared data, and when is it not? How can you tell whether your code is thread-safe? And how should you access data so that it stays consistent?

This article will answer your questions one by one.

1. What is thread-unsafe code?

To understand what thread safety is, you must first understand what it means for code to be thread-unsafe.

For example, the following code starts two threads, and each of them increments the global variable number 1,000,000 times, adding 1 each time.

from threading import Thread

number = 0

def target():
    global number
    for _ in range(1000000):
        number += 1

thread_01 = Thread(target=target)
thread_02 = Thread(target=target)
thread_01.start()
thread_02.start()

thread_01.join()
thread_02.join()

print(number)

The expected result is easy to work out: one thread adds 1 million, so two threads add 2 million, and the output should be 2 million.

But that is not what happens. When I ran it, the output was different every time, and every result was less than 2 million. (How often updates are lost depends on the interpreter version, but the code is unsafe either way.)

The following are the results of three runs:

1459782
1379891
1432921

This is what thread-unsafe code looks like. The root cause is that the operation number += 1 is not an atomic operation.

2. What is an atomic operation?

An atomic operation is an operation that cannot be interrupted by the thread scheduler. Once it starts, it runs to completion without a switch to another thread.

It is somewhat similar to a database transaction.

The official Python documentation lists some common atomic operations:

L.append(x)
L1.extend(L2)
x = L[i]
x = L.pop()
L1[i:j] = L2
L.sort()
x = y
x.field = y
D[x] = y
D1.update(D2)
D.keys()
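To see one of these in practice, here is a minimal sketch, assuming CPython, in which two threads call L.append(x) on a shared list without any lock. The names items and worker are made up for this illustration.

```python
from threading import Thread

items = []  # list shared by both threads


def worker():
    for i in range(100_000):
        # A single list.append call; no lock is taken here.
        items.append(i)


threads = [Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# No append is lost, so the length is the total number of calls.
print(len(items))
```

The order of the elements is not predictable, but none of the appends is lost.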

The following are not atomic operations:

i = i+1
L.append(L[-1])
L[i] = L[j]
D[x] = D[x] + 1

The increment number += 1 that I used above is equivalent to number = number + 1. It can be broken down into several steps (read the value, add 1, assign the result back), so it is not an atomic operation.

As a result, two threads can read the same value of number at the same moment. Both add 1 to that same value and both write back the same result, so two increments only raise number by 1. This is why the final count ends up lower than expected.
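The lost update described here can be replayed by hand in a single thread. This is only an illustrative sketch: the variables value_seen_by_a and value_seen_by_b are hypothetical stand-ins for what each thread holds after its read step.

```python
# Replay, step by step, one unlucky interleaving of two threads
# that both execute `number += 1`.
number = 0

value_seen_by_a = number      # thread A reads 0
value_seen_by_b = number      # thread B reads 0 before A has written
number = value_seen_by_a + 1  # thread A writes 1
number = value_seen_by_b + 1  # thread B also writes 1, overwriting A's update

print(number)  # 1, although two increments ran
```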

When we cannot tell whether a piece of code is atomic, we can use the dis function from the dis module to look at the bytecode it compiles to.
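The original listing is not available here, so the following is a minimal sketch of how such a check could look. The function name increment is an assumption made for this example, and the exact instruction names printed depend on the Python version.

```python
import dis

number = 0


def increment():
    global number
    number += 1


# Print the bytecode of the function, one instruction per line.
dis.dis(increment)

# Collect just the instruction names for easier inspection.
opnames = [ins.opname for ins in dis.get_instructions(increment)]
print(opnames)
```

The read (LOAD_GLOBAL) and the write (STORE_GLOBAL) show up as separate instructions, with the addition in between.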

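Disassembling the line number += 1 shows that this single line compiles to several bytecode instructions. After the current value has been loaded, the two that matter are:

  • INPLACE_ADD (BINARY_OP on Python 3.11 and later): Add the two values
  • STORE_GLOBAL: Assign the result back to the global variable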

In CPython, a thread switch can only happen between bytecode instructions, never in the middle of one. A single bytecode instruction therefore behaves like an atomic operation.

When a line of code compiles to several bytecode instructions, a thread switch may happen after only some of them have run. If that line works on variables or resources shared by multiple threads, and one of its instructions writes to the shared variable, the threads can conflict and leave the data in an incorrect state.

For comparison, let's take one of the atomic operations from the list above and check whether it really is atomic, as the official documentation says.

Here I take the dictionary update operation, info.update(new), as an example and disassemble it in the same way.
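Since the original listing is missing, here is a minimal sketch of what the check could look like. The names info and new come from the text; the wrapper function update_info is an assumption added for this example, and instruction names differ between Python versions.

```python
import dis

info = {}


def update_info(new):
    info.update(new)


# Show the bytecode behind the single line `info.update(new)`.
dis.dis(update_info)

opnames = [ins.opname for ins in dis.get_instructions(update_info)]
print(opnames)

# The function still works as a normal dictionary update.
update_info({"a": 1})
print(info)
```

Unlike number += 1, there is no separate store instruction here: the dictionary is changed inside the call instruction itself.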

info.update(new) also compiles to several instructions:

  • LOAD_GLOBAL: Load the global variable info
  • LOAD_ATTR: Look up the attribute, that is, get the update method
  • LOAD_FAST: Load the variable new
  • CALL_FUNCTION: Call the method; the dictionary is modified here
  • POP_TOP: Discard the return value of the call (None)

The exact instruction names vary between Python versions.

But we need to remember that what really leads to data conflicts is not a read operation, but a write operation.

Of all the bytecode instructions above, only one writes to shared data: the call itself, which performs the whole update inside a single instruction (POP_TOP merely throws away the return value). That is why the dictionary update method is an atomic operation.

3. Making operations atomic by hand

In multithreaded code we cannot rely on every operation being atomic, so knowing how to make our own code "atomic" is very important.

The method is simple: whenever you access a resource shared between threads, protect the access with a lock. This gives an effect similar to an atomic operation. A locked section either has not started yet, or it runs to the end before any other thread can enter the same section.

So we take the first example and add a lock around the increment to make it atomic.

from threading import Thread, Lock


number = 0
lock = Lock()


def target():
    global number
    for _ in range(1000000):
        with lock:
            number += 1

thread_01 = Thread(target=target)
thread_02 = Thread(target=target)
thread_01.start()
thread_02.start()

thread_01.join()
thread_02.join()

print(number)

At this point, no matter how many times you execute it, the output is 2000000.

4. Why is Queue thread safe?

There are three main mechanisms for communication between threads in Python, Event and Condition from the threading module and Queue from the queue module:

  1. Event
  2. Condition
  3. Queue

The most widely used is Queue, and we all know that it is thread-safe. Putting items into it and getting items out of it cannot be interrupted halfway in a way that corrupts the queue, which is why we do not need any extra locks when using it.
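As an illustration, here is a small producer/consumer sketch that shares two queue.Queue objects between four threads without any explicit lock. The function names, the item counts and the None sentinel are choices made for this example only.

```python
import queue
from threading import Thread

tasks = queue.Queue()
results = queue.Queue()


def producer(start, count):
    for value in range(start, start + count):
        tasks.put(value)


def consumer():
    while True:
        value = tasks.get()
        if value is None:  # sentinel: no more work for this consumer
            break
        results.put(value * 2)


producers = [Thread(target=producer, args=(i * 1000, 1000)) for i in range(2)]
consumers = [Thread(target=consumer) for _ in range(2)]
for t in producers + consumers:
    t.start()
for t in producers:
    t.join()
for _ in consumers:
    tasks.put(None)  # one sentinel per consumer
for t in consumers:
    t.join()

# All threads have finished, so draining the queue here is safe.
total = 0
while not results.empty():
    total += results.get()
print(total)
```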

How does it do that?

The fundamental reason is that Queue is built on the lock primitive: it guards its internal state with a lock, so its operations become hand-made atomic operations like the one in Section 3.
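The idea can be sketched with a much simplified queue. TinyQueue below is a hypothetical class written for this article, not the real queue.Queue implementation; it only assumes the general approach of a lock plus a condition variable guarding an internal deque.

```python
from collections import deque
from threading import Condition, Lock, Thread


class TinyQueue:
    """Hypothetical, simplified illustration; not the real queue.Queue."""

    def __init__(self):
        self._items = deque()
        self._mutex = Lock()
        # The condition shares the mutex, so waiting and notifying
        # always happen while the lock is held.
        self._not_empty = Condition(self._mutex)

    def put(self, item):
        with self._not_empty:           # acquires the mutex
            self._items.append(item)
            self._not_empty.notify()    # wake one waiting consumer

    def get(self):
        with self._not_empty:           # acquires the mutex
            while not self._items:
                self._not_empty.wait()  # releases the mutex while waiting
            return self._items.popleft()


q = TinyQueue()
received = []


def consume():
    for _ in range(3):
        received.append(q.get())


consumer_thread = Thread(target=consume)
consumer_thread.start()
for n in range(3):
    q.put(n)
consumer_thread.join()
print(received)
```

Every method touches the internal deque only while holding the lock, so callers never need a lock of their own.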

A primitive is a piece of program made up of several machine instructions that together complete one function. It is indivisible: its execution must be continuous and cannot be interrupted partway through.

