Python multithreading basics

Official reference documentation

https://docs.python.org/zh-cn/3.7/library/threading.html#module-threading

Creating child threads directly with Thread

import threading
import time


def work(internal):
    name = threading.current_thread().name
    print(f"{name} start")
    time.sleep(internal)
    print(f"{name} end")


print("Main: ", threading.current_thread().name)
for i in range(5):
    thread_instance = threading.Thread(target=work, args=(i, ))
    thread_instance.start()

print("Main: end")

This creates six threads in total: the main thread, MainThread, and five child threads, Thread-1 through Thread-5 (newer Python versions append the target name, for example "Thread-1 (work)"). The main thread prints "Main: end" as soon as it has started the children,
while the child threads that are still sleeping finish afterwards, one after another, as their sleeps of up to 4 seconds expire. This shows that the main thread does not wait for the child threads to finish before it reaches the end of its own code.
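
The ordering described above can be checked with a small sketch. The events list and the still_running variable are illustrative names, not part of the original example, and the final join (covered in the next section) is only there so the recorded order can be inspected.

```python
import threading
import time

events = []  # records the order in which things happen (illustrative helper)


def work(interval):
    time.sleep(interval)
    events.append("child end")


child = threading.Thread(target=work, args=(0.2,))
child.start()

# The main thread carries on immediately; the child is still sleeping.
events.append("main continues")
still_running = child.is_alive()

# Wait for the child only so the final order can be inspected.
child.join()
print(events, still_running)
```

The main thread's entry is recorded first, even though the child was started before it.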

Making the main thread exit after the child threads

import threading
import time


def work(internal):
    name = threading.current_thread().name
    print(f"{name} start")
    time.sleep(internal)
    print(f"{name} end")


print("Main: ", threading.current_thread().name)
for i in range(5):
    thread_instance = threading.Thread(target=work, args=(i, ))
    thread_instance.start()
    # Make the main thread wait for this child thread to finish
    thread_instance.join()

print("Main: end")

About join

If you time the run, you will find that this join version takes about 10s (10 = 0 + 1 + 2 + 3 + 4), the same as running the five tasks one after another without threads.
That seems to defeat the purpose of multithreading, but it is really the result of using join incorrectly.

So what does join really do?
join blocks the calling thread (here, the main thread) until the thread whose .join() was called has finished. Any other threads that have already been started keep running in the meantime.
So in this example we only need to join the thread that runs longest.

import threading
import time


now = lambda :time.time()


def work(internal):
    name = threading.current_thread().name
    print(f"{name} start")
    time.sleep(internal)
    print(f"{name} end")


t1 = now()
print("Main: ", threading.current_thread().name)
for i in range(5):
    thread_instance = threading.Thread(target=work, args=(i, ))
    thread_instance.start()
    # Join only the longest-running thread so the main thread waits for it
    if i == 4:
        thread_instance.join()

print(f"Main: end, Time: {now() - t1}")

Of course, this only works when we know in advance which thread finishes first and which finishes last.
If we do not know the order in which the threads will finish, we need to join every one of them.

Imagine a scenario: you use 10 threads to crawl 100 URLs, and the main thread must wait until every URL has been crawled before it can analyze the data. Here you can use join to block the main thread
until all 10 child threads have finished, and only then continue with the analysis.
Since we cannot know in advance which thread will finish last, we call join on every thread.
In this case, joining each thread is the right approach:

thread_list = []
for _ in range(10):
    thread = threading.Thread(target=xxx, args=(xxx, xxx))
    thread.start()
    thread_list.append(thread)

for thread in thread_list:
    thread.join()
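
As a runnable sketch of this pattern, the version below fills in the placeholders with assumptions: fake_fetch is a hypothetical stand-in that only sleeps instead of making a request, the example.com URLs are made up, and one thread is started per URL for brevity rather than 10 threads sharing 100 URLs. A lock (covered in the Mutex section) guards the shared dictionary.

```python
import threading
import time

results = {}
results_lock = threading.Lock()


def fake_fetch(url):
    # Hypothetical stand-in for a real HTTP request: just sleep briefly.
    time.sleep(0.1)
    with results_lock:
        results[url] = f"content of {url}"


# Made-up URLs, used only as dictionary keys here.
urls = [f"https://example.com/page/{n}" for n in range(10)]

start = time.time()
thread_list = []
for url in urls:
    thread = threading.Thread(target=fake_fetch, args=(url,))
    thread.start()
    thread_list.append(thread)

# Wait for every worker; the total wait is roughly the longest single task.
for thread in thread_list:
    thread.join()
elapsed = time.time() - start

# Only now is it safe for the main thread to analyze the results.
print(f"fetched {len(results)} pages in {elapsed:.2f}s")
```

Because all threads are started before any of them is joined, the elapsed time should be close to one sleep rather than the sum of all ten.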

Creating threads by subclassing Thread

import threading
import time


class MyThread(threading.Thread):
    def __init__(self, interval):
        super(MyThread, self).__init__()
        self.interval = interval

    def run(self):
        name = threading.current_thread().name
        print(f"{name} start")
        time.sleep(self.interval)
        print(f"{name} end")


print("Main: ", threading.current_thread().name)
for i in range(5):
    thread_instance = MyThread(i)
    thread_instance.start()
    # Join only the longest-running thread so the main thread waits for it
    if i == 4:
        thread_instance.join()
print("Main: end")

The two implementations behave the same way.

Daemon threads

Threads can be marked as daemon threads. Marking a thread as a daemon means that it is "not important": if the main thread and all other non-daemon threads have finished while the daemon thread is still running,
the daemon thread is forcibly terminated. In Python, a thread is made a daemon by calling its setDaemon method before start(); newer Python versions prefer setting the daemon attribute or passing daemon=True to the constructor.

import threading
import time


now = lambda:time.time()


def work(internal):
    name = threading.current_thread().name
    print(f"{name} start")
    time.sleep(internal)
    print(f"{name} end")


thread_1 = threading.Thread(target=work, args=(1, ))
thread_2 = threading.Thread(target=work, args=(5, ))
thread_2.setDaemon(True)
thread_1.start()
thread_2.start()

print("Main End.")
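
In the example above, the program exits about one second after starting, once Thread-1 has finished, so the daemon thread's "end" line is never printed. The sketch below shows the same setup with the daemon=True constructor argument instead of setDaemon. The names background and normal are illustrative, and the intervals are shortened so the state can be inspected quickly.

```python
import threading
import time


def work(interval):
    time.sleep(interval)


# daemon=True in the constructor has the same effect as calling
# setDaemon(True) or setting thread.daemon = True before start().
background = threading.Thread(target=work, args=(5,), daemon=True)
normal = threading.Thread(target=work, args=(0.1,))

background.start()
normal.start()
normal.join()

print(background.daemon, normal.daemon)  # True False
print(background.is_alive())  # the daemon is still sleeping at this point
```

When the script reaches its end, the interpreter exits without waiting for the daemon thread's remaining sleep.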

Mutex

The threads in a process share its resources. Suppose a process has a global variable count used as a counter, and we create multiple threads that each add one to count when they run.
Let's see what happens. The code is as follows:

import threading
import time

count = 0


class MyThread(threading.Thread):
    def __init__(self):
        super(MyThread, self).__init__()

    def run(self):
        global count
        temp = count + 1
        time.sleep(0.001)
        count = temp


def main():
    threads = []
    for _ in range(1000):
        thread_ = MyThread()
        thread_.start()
        threads.append(thread_)

    for t in threads:
        t.join()

    print("Final count: ", count)


main()

By common sense, the final value of count should be 1000. But it is not. Run it and see.
A sample run printed the following (the exact value varies from run to run):
Final count:  69

Why? count is shared, and every thread reads its current value when it executes temp = count + 1. Because the threads run concurrently,
several of them may read the same value of count, so some increments overwrite each other and are lost, and the final result is too small.

So if multiple threads read or modify the same data at the same time, the results are unpredictable. To avoid this, the threads must be synchronized:
we protect the data they operate on with a lock, using threading.Lock.

What does protecting data with a lock mean? Before a thread operates on the data, it must acquire the lock. Other threads that find the lock already held cannot proceed and wait until it is released.
Only after the thread holding the lock releases it can another thread acquire the lock, modify the data, and release the lock in turn. This ensures that only one thread operates on the data at a time, so multiple threads never read and modify the same data simultaneously,
and the final result is correct.

import threading
import time

count = 0
lock = threading.Lock()


class MyThread(threading.Thread):
    def __init__(self):
        super(MyThread, self).__init__()

    def run(self):
        global count
        # Acquire the lock
        lock.acquire()
        temp = count + 1
        time.sleep(0.001)
        count = temp
        # Release the lock
        lock.release()


def main():
    threads = []
    for _ in range(1000):
        thread_ = MyThread()
        thread_.start()
        threads.append(thread_)

    for t in threads:
        t.join()

    print("Final count: ", count)


main()
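
The same protection can also be written with the lock as a context manager, which threading.Lock supports. The sketch below is a variation on the example above, not a replacement for it: it uses a plain function named increment instead of a Thread subclass, and fewer threads with a shorter sleep, so that it runs quickly.

```python
import threading
import time

count = 0
lock = threading.Lock()


def increment():
    global count
    # "with lock" acquires the lock on entry and releases it on exit,
    # even if the body raises an exception.
    with lock:
        temp = count + 1
        time.sleep(0.0001)
        count = temp


threads = [threading.Thread(target=increment) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("Final count:", count)  # Final count: 100
```

Compared with separate acquire() and release() calls, the with form cannot forget to release the lock when the protected code fails midway.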

About multithreading in Python

Because of the GIL in Python, only one thread can run at any moment, whether on a single-core or a multi-core machine, so Python threads cannot take advantage of multiple cores to run in parallel.
GIL stands for Global Interpreter Lock. It was originally designed for data safety. In a multithreaded Python program, each thread runs as follows:

Acquire the GIL
Run the thread's code
Release the GIL

As you can see, a thread must acquire the GIL before it can run. You can think of the GIL as a pass, and each Python process has only one. A thread that does not hold the pass is not allowed to run.
As a result, even on a multi-core machine, a Python process with several threads can execute only one thread at a time.

For IO-bound tasks such as web crawlers, however, this is not much of a problem. For CPU-bound tasks, because of the GIL, a multithreaded program may actually be slower overall than a single-threaded one.
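
The difference between the two kinds of task can be measured with a rough sketch like the one below. The names io_task, cpu_task and timed are illustrative, time.sleep stands in for waiting on the network, and the CPU timings depend on the machine and on running a standard interpreter build with the GIL, so no particular numbers are claimed.

```python
import threading
import time


def io_task():
    # Sleeping releases the GIL, much like waiting on a socket does.
    time.sleep(0.2)


def cpu_task():
    # Pure Python arithmetic holds the GIL while it runs.
    total = 0
    for n in range(200_000):
        total += n * n
    return total


def timed(task, workers, threaded):
    """Run task `workers` times, in threads or one after another."""
    start = time.perf_counter()
    if threaded:
        threads = [threading.Thread(target=task) for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        for _ in range(workers):
            task()
    return time.perf_counter() - start


io_sequential = timed(io_task, 4, threaded=False)
io_threaded = timed(io_task, 4, threaded=True)
cpu_sequential = timed(cpu_task, 4, threaded=False)
cpu_threaded = timed(cpu_task, 4, threaded=True)

print(f"IO : sequential {io_sequential:.2f}s, threaded {io_threaded:.2f}s")
print(f"CPU: sequential {cpu_sequential:.2f}s, threaded {cpu_threaded:.2f}s")
```

The IO-bound timings should show the threaded run finishing in roughly the time of a single task, while the CPU-bound timings typically show little or no gain from threads.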

furuiyang_
