Understanding multithreading
Multithreading is a way of running multiple tasks at the same time. For example, treat each iteration of a loop as a task: we would like the second iteration to start before the first one has finished, so as to save time.
In Python, the point of running tasks concurrently like this is to put waiting time to use: while one task is waiting, the CPU can work on another. It follows that if a program is slow not because it waits but because it has a lot to compute, multithreading will not reduce its running time.
For more background on multithreading, see the following materials:
- Liao Xuefeng's tutorial
- A Zhihu answer
- There are also many explanations on Baidu, so I won't repeat them here.
Basic usage
Consider the following function:
    import time

    def myfun():
        time.sleep(1)
        a = 1 + 1
        print(a)
If we want to run this function 5 times, the running time is dominated by the one-second sleep in each call; the 1 + 1 calculation takes almost no time. In this case, multithreading can be used to improve efficiency.
Let's compare the time taken without and with multithreading.
Without multithreading
    t = time.time()
    for _ in range(5):
        myfun()
    print(time.time() - t)
The result is 5.002434492111206 seconds.
With multithreading
    from threading import Thread

    for _ in range(5):
        th = Thread(target=myfun)
        th.start()
That is all it takes to use multithreading. After about 1 second, the number 2 is printed five times at almost the same moment, which shows that the 5 iterations really do run almost simultaneously.
Multithreading here involves only two steps:
- Create a thread. Thread is used here to turn each iteration into a new thread, and each thread executes the myfun function once.
- Call start() to begin running the thread. Every thread has to be started explicitly in this way. Once a thread has been started, the program carries on without waiting for it to finish: it moves to the next iteration, creates and starts the second thread, then the third before the earlier ones have ended, and so on.
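One thing to note: the Thread has to be created and started inside the loop, one per iteration. A loop that has already been written cannot be made multithreaded from the outside; wrapping the whole loop in a single thread still runs its iterations one after another.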
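As a minimal sketch of the two steps, the snippet below measures only how long the loop itself takes to create and start three threads. The name slow_task and the 0.2-second sleep are made up for illustration. The point is that start() returns almost immediately, well before any thread has finished.

```python
import time
from threading import Thread

def slow_task():
    # Hypothetical stand-in for a task that mostly waits.
    time.sleep(0.2)

begin = time.time()
for _ in range(3):
    th = Thread(target=slow_task)  # step 1: create the thread
    th.start()                     # step 2: start it; this call does not wait
launch_time = time.time() - begin  # time spent creating and starting only

# The loop is normally done long before the 0.2 s sleeps are over.
print(launch_time < 0.2)
```

The script still takes about 0.2 seconds to exit, because the interpreter waits for the three (non-daemon) threads before shutting down.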
The reader may have noticed that the version without multithreading measures its running time in code, while the multithreaded version does not. Measuring the time requires a little extra code that would obscure the simplest possible example, so it was left out. Next we introduce join() and use it to measure the time.
Using join()
Calling a thread's join() method makes the program wait until that thread has finished before it continues. Let's look at the following example:
    from threading import Thread

    t = time.time()
    for _ in range(5):
        th = Thread(target=myfun)
        th.start()
        th.join()  # wait for this thread before starting the next one
    print(time.time() - t)
    # The result is 5.0047078132629395 seconds
Here join() is called immediately after start(), so each thread must run to completion before the next iteration can begin. This is no different from not using multithreading at all. However, join() is needed if you want to measure how long the threads take to run.
Let's see what happens without join():
    from threading import Thread

    t = time.time()
    for _ in range(5):
        th = Thread(target=myfun)
        th.start()
    print(time.time() - t)
    # The result is 0.0009980201721191406 seconds
The result was printed without waiting even 1 second, and the five 2s appeared only after it. This is because print(time.time() - t) runs in the main thread, which is in effect a sixth thread alongside the 5 threads created in the loop, and it does not wait for those threads to finish before it runs. The running time of the 5 threads therefore cannot be measured this way. We need join() to wait until all 5 threads have finished.
The code is as follows:
    from threading import Thread

    t = time.time()
    ths = []
    for _ in range(5):
        th = Thread(target=myfun)
        th.start()
        ths.append(th)
    for th in ths:
        th.join()
    print(time.time() - t)
    # The result is 1.0038363933563232
The ths list defined above stores the threads, and the final loop makes sure that every thread has finished before the time difference is calculated.
join() is not only useful for timing. Whenever a piece of code depends on work done in a thread having completed, call join() on that thread before running it.
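A small sketch of that dependency, under assumed names: four threads each append a value to a shared results list, and the final sum is only meaningful once all of them are done. The function compute and the list results are hypothetical and exist only for this illustration.

```python
import time
from threading import Thread

results = []  # shared list that the worker threads fill in

def compute():
    # Hypothetical slow step: wait a little, then record a value.
    time.sleep(0.1)
    results.append(1 + 1)

threads = [Thread(target=compute) for _ in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()  # the sum below needs every result, so wait here

total = sum(results)
print(total)  # prints 8
```

Without the second loop, sum(results) would most likely run while the list is still empty, because the main thread does not wait on its own.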
Now that we have covered the general usage of multithreading, we can apply it in most scenarios. Here are some further details.
Other details
(1) Thread name
Let's look directly at the code below:
    import threading
    import time

    print(threading.current_thread().getName())

    def myfun():
        time.sleep(1)
        print(threading.current_thread().name)
        a = 1 + 1

    for i in range(5):
        th = threading.Thread(target=myfun, name='thread {}'.format(i))
        th.start()

    # Output (the order of the last five lines may vary between runs)
    # MainThread
    # thread 0
    # thread 1
    # thread 4
    # thread 3
    # thread 2
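Some explanation:
- threading.current_thread() returns the current thread; its name attribute or getName() method gives the thread's name.
- Every process starts one thread by default, named MainThread. In other words, the main program occupies a thread of its own, which is independent of the threads added later with Thread, and the main thread carries on without waiting for the other threads to finish. This is why the running time could not be measured earlier without join(): the main thread reached the end first.
- Thread creates a new thread that will run the given function. The name argument sets the name of that thread, so printing the thread name inside the function shows the value passed as name.
- There are two ways to print a name inside the loop. The first, print(threading.current_thread().name), prints MainThread; the second, print(th.name), prints thread 1 and so on.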
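The difference between the two ways of reading a name can be sketched as follows. The helper report and the list seen are hypothetical; they record names instead of printing them so that the result does not depend on output order.

```python
import threading

seen = []  # names recorded from inside the worker function

def report():
    # Runs inside a worker thread, so this is the worker's own name.
    seen.append(threading.current_thread().name)

# Read in the thread that runs this script, normally MainThread.
main_name = threading.current_thread().name

workers = [threading.Thread(target=report, name='thread {}'.format(i))
           for i in range(3)]
for th in workers:
    th.start()
for th in workers:
    th.join()

print(main_name)
print(sorted(seen))                  # names seen from inside the threads
print([th.name for th in workers])   # the same names, read from the objects
```

Here th.name always refers to the thread object held in th, while current_thread() depends on which thread is executing the call.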
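(2) The Thread constructor
Above we used the target and name arguments of Thread. Here are its other arguments:
- args: the arguments passed to the target function, given as a tuple, for example args = (3, ).
- daemon: False for the main thread; if not specified, a new thread inherits the value from the thread that creates it. If True, the thread is stopped when the program exits, which happens once the main thread and all other non-daemon threads have ended. If False, the thread keeps running until it has finished, regardless of the main thread. (To see the effect of this argument, write the code in a .py file and run it from cmd rather than in a Jupyter notebook, where extra threads get in the way.)
- group: reserved for a future extension with a ThreadGroup class; currently unused.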
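A short sketch of args and daemon, using made-up names (work, background, received). It only inspects the daemon attribute and does not demonstrate a daemon thread being stopped at exit, which needs a standalone script as noted above.

```python
import threading
import time

received = []  # filled in by the worker so the result can be inspected

def work(n, label):
    # Hypothetical target function taking two positional arguments.
    received.append((n, label))

# args is a tuple; a single argument needs a trailing comma, e.g. (3, )
th = threading.Thread(target=work, args=(3, 'a'))
th.start()
th.join()
print(received)  # [(3, 'a')]

def background():
    time.sleep(0.05)

explicit = threading.Thread(target=background, daemon=True)
inherited = threading.Thread(target=background)  # daemon not given

print(explicit.daemon)  # True
# An unspecified daemon value is copied from the creating thread.
print(inherited.daemon == threading.current_thread().daemon)  # True
```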
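(3) The Thread object
Both threading.Thread and threading.current_thread() above give you a Thread object: the former creates a new one, the latter returns the one that is currently running. A Thread object has the following attributes and methods:
- getName(), .name: get the thread name.
- setName(): set the thread name.
- start(), join(): both were covered earlier. join() takes a timeout argument: if waiting for the thread takes longer than this many seconds, the caller stops waiting and carries on with the code that follows, but the thread itself is not interrupted.
- run(): also runs the thread's function, but in the calling thread, so the code that follows has to wait until it has finished (replacing every start above with run is equivalent to not using multithreading).
- is_alive(): True if the thread has been started and has not yet finished, otherwise False.
- daemon: returns the thread's daemon value.
- setDaemon(True): sets the thread's daemon value.
Since Python 3.10, getName(), setName() and setDaemon() are deprecated in favor of the name and daemon attributes.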
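The behavior of join(timeout), is_alive() and run() described above can be checked with a sketch like the following. The functions nap and where and the thread name worker are hypothetical, and the sleep lengths are arbitrary values chosen so that the timeout expires first.

```python
import threading
import time

def nap():
    time.sleep(0.3)

th = threading.Thread(target=nap)
before_start = th.is_alive()   # not started yet
th.start()
th.join(timeout=0.05)          # stop waiting after 0.05 s
after_timeout = th.is_alive()  # the thread itself keeps running
th.join()                      # now wait until it really ends
after_join = th.is_alive()
print(before_start, after_timeout, after_join)  # False True False

ran_in = []  # records which thread executed the function

def where():
    ran_in.append(threading.current_thread().name)

caller = threading.current_thread().name
# run() executes the function in the calling thread: no new thread.
threading.Thread(target=where, name='worker').run()
# start() executes it in a new thread named 'worker'.
t2 = threading.Thread(target=where, name='worker')
t2.start()
t2.join()
print(ran_in == [caller, 'worker'])  # True
```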
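(4) The threading module
Some functions that can be called directly on the module:
- threading.currentThread(): returns the current thread object.
- threading.enumerate(): returns a list of the threads that are currently running.
- threading.activeCount(): returns the number of running threads; it gives the same result as len(threading.enumerate()).
Since Python 3.10, currentThread() and activeCount() are deprecated aliases of current_thread() and active_count().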
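As a rough sketch of these module-level functions, the snippet below starts one thread and looks at what the module reports while it is running and after it has ended. The function idle and the name idle-worker are made up, and active_count() is used as the current spelling of activeCount().

```python
import threading
import time

def idle():
    time.sleep(0.2)

th = threading.Thread(target=idle, name='idle-worker')
th.start()

# While the worker sleeps, it is listed next to the main thread.
names_during = [t.name for t in threading.enumerate()]
count_during = threading.active_count()

th.join()
# Once it has finished, it no longer appears in the list.
names_after = [t.name for t in threading.enumerate()]

print('idle-worker' in names_during)  # True
print(count_during >= 2)              # True
print('idle-worker' in names_after)   # False
```

The exact count depends on the environment; a Jupyter notebook, for instance, runs extra threads of its own, so only a lower bound is checked here.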