A thread is the smallest unit of execution; a process is the smallest unit of resource allocation and consists of at least one thread;
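The relationship can be checked directly. In this minimal sketch, using only the standard library, threads started inside one process all report the same process ID from `os.getpid()`, but each thread has its own identifier from `threading.get_ident()`. The names `report`, `N` and `barrier` are hypothetical. The barrier keeps all threads alive at the same time, because thread IDs may be reused once a thread has exited.

```python
import os
import threading

N = 3
barrier = threading.Barrier(N)  # keeps all N threads alive at once so their IDs can't be reused
results = []

def report():
    # Record (process ID, thread ID) for the thread running this function
    results.append((os.getpid(), threading.get_ident()))
    barrier.wait()

threads = [threading.Thread(target=report) for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print('process IDs:', {pid for pid, _ in results})      # a single PID shared by every thread
print('thread IDs:', len({tid for _, tid in results}))  # 3 distinct thread IDs
```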
1. Multi-process
On Unix/Linux, the fork() system call copies the current process (the parent process) to create a new process (the child process). It is called once but returns twice, once in each process: the child receives 0 and the parent receives the child's process ID. The child can get its parent's ID by calling getppid(). fork() is not available on Windows;
Python's os module wraps these system calls, including fork():
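```python
import os

print(f'Process {os.getpid()} start...')
# Unix/Linux only: fork() does not exist on Windows
pid = os.fork()
if pid == 0:
    # fork() returned 0, so this branch runs in the child
    print(f'child process {os.getpid()}, parent is {os.getppid()}')
else:
    # fork() returned the child's PID, so this branch runs in the parent
    print(f'process {os.getpid()} created a child process {pid}')
```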
```
Process 1798 start...
child process 1799, parent is 1798
process 1798 created a child process 1799
```
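In the example above the parent never waits for its child. This minimal sketch (Unix only) shows how a parent can block until a child exits and read its exit status with `os.waitpid()`. The function name `fork_and_wait` and the exit code 7 are arbitrary choices made for illustration.

```python
import os

def fork_and_wait(code):
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately with the given status.
        # os._exit() skips Python's cleanup; it is the usual way to leave a forked child.
        os._exit(code)
    # Parent: block until that particular child terminates
    _, status = os.waitpid(pid, 0)
    # Decode the raw wait status into the child's exit code
    return os.WEXITSTATUS(status)

if __name__ == '__main__':
    print('child exited with', fork_and_wait(7))  # child exited with 7
```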
multiprocessing
A cross-platform multiprocessing module (unlike fork(), it also works on Windows); its Process class represents a child process;
```python
from multiprocessing import Process
import os
import time

# Code executed in the child process
def proc(name):
    # time.sleep(5)
    print(f'run child process {name} ({os.getpid()})')

if __name__ == "__main__":
    print(f'parent process {os.getpid()}')
    # Create a Process instance
    p = Process(target=proc, args=('test', ))
    print('child process will start.')
    # Start child process p
    p.start()
    # Block until child process p has finished, then continue
    p.join()
    print('child process end.')
```
```
parent process 4872
child process will start.
run child process test (14360)
child process end.
```
Because of join(), the parent process waits for the child process to finish before it continues and exits;
Pool
A process pool, used to start many child processes in batches;
```python
from multiprocessing import Pool
import os, time, random

def proc(name):
    print(f'run task {name} ({os.getpid()})')
    start = time.time()
    # Simulate 0 to 3 seconds of work
    time.sleep(random.random() * 3)
    end = time.time()
    print(f'task {name} runs: {end-start}')

if __name__ == "__main__":
    print(f'parent process {os.getpid()}')
    # At most 4 child processes run at the same time
    p = Pool(4)
    for i in range(5):
        p.apply_async(proc, args=(i, ))
    print('waiting for all subprocesses done...')
    # After close(), no new tasks can be submitted
    p.close()
    # Wait for all child processes to finish; must be called after close()
    p.join()
    print('all subprocess done.')
```
```
parent process 14360
waiting for all subprocesses done...
run task 0 (10800)
run task 1 (7996)
run task 2 (4084)
run task 3 (3492)
task 0 runs: 1.7698190212249756
run task 4 (10800)
task 1 runs: 1.9010038375854492
task 2 runs: 2.1044681072235107
task 3 runs: 2.431309700012207
task 4 runs: 2.9220056533813477
all subprocess done.
```
Task 4 starts only after task 0 finishes: Pool(4) runs at most four tasks at once, and task 4 reuses the worker process that ran task 0 (PID 10800);
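The example above ignores return values. apply_async() actually returns an AsyncResult object whose get() method blocks until that task finishes and returns its result. This is a minimal sketch: the `square` and `squares_async` helpers are hypothetical, and the explicit 'fork' start method is an assumption (Unix only) that lets the sketch also run outside a script file.

```python
import multiprocessing as mp

def square(x):
    # Hypothetical task; runs inside a worker process
    return x * x

def squares_async(values):
    # Assumption: Unix 'fork' start method; Pool(4) from the default context also works in a script
    ctx = mp.get_context('fork')
    with ctx.Pool(4) as pool:
        # apply_async() returns an AsyncResult immediately for each task
        pending = [pool.apply_async(square, args=(v,)) for v in values]
        # get() blocks until that task is done and returns its value (or re-raises its error)
        return [r.get(timeout=10) for r in pending]

if __name__ == '__main__':
    print(squares_async(range(5)))  # [0, 1, 4, 9, 16]
```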
subprocess
Starts an external program as a child process and controls its input and output, for example running the nslookup command and interacting with it;
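```python
import subprocess

# Run nslookup as a child process; call() waits for it and returns its exit code
r = subprocess.call(['nslookup', 'www.python.org'])
print('Exit code:', r)
# Start an nslookup child process with its stdin/stdout/stderr connected to pipes
p = subprocess.Popen(['nslookup'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# Send commands to the child's stdin and collect its output
output, err = p.communicate(b'set q=mx\npython.org\nexit\n')
# Print the output; 'gbk' suits a Chinese-locale Windows console, use 'utf-8' on most other systems
print(output.decode('gbk'))
print('Exit code:', p.returncode)
```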
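Since Python 3.5, subprocess.run() handles the common case of running a command, feeding it input and collecting its output in one call. In this minimal sketch, the current Python interpreter (`sys.executable`) stands in for nslookup so that the example runs on any system. The one-line child program is only an illustration.

```python
import subprocess
import sys

# Run a child Python process, feed it stdin, and capture its stdout/stderr as text
completed = subprocess.run(
    [sys.executable, '-c', 'import sys; print(sys.stdin.read().upper())'],
    input='hello from parent',
    capture_output=True,   # Python 3.7+: same as stdout=PIPE, stderr=PIPE
    text=True,             # Python 3.7+: work with str instead of bytes
    check=True,            # raise CalledProcessError on a non-zero exit code
)
print(completed.returncode)      # 0
print(completed.stdout.strip())  # HELLO FROM PARENT
```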
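Inter-process communication
Processes do not share memory, so they need explicit channels to exchange data. Python's multiprocessing module provides several, such as Queue and Pipe. Here one process writes to a Queue and another reads from it: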
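```python
from multiprocessing import Process, Queue
import os, time, random

# Runs in the writer process: put three values into the queue
def write(q):
    print(f'process to write: {os.getpid()}')
    for value in ['A', 'B', 'C']:
        print(f'put {value} to queue...')
        q.put(value)
        time.sleep(random.random())

# Runs in the reader process: read from the queue forever
def read(q):
    print(f'process to read: {os.getpid()}')
    while True:
        # Block until an item is available
        value = q.get(True)
        print(f'get {value} from queue.')

if __name__ == "__main__":
    # The parent creates the Queue and passes it to both children
    q = Queue()
    pw = Process(target=write, args=(q, ))
    pr = Process(target=read, args=(q, ))
    pw.start()
    pr.start()
    # Wait for the writer to finish
    pw.join()
    # read() loops forever and cannot be joined, so terminate it once writing is done
    pr.terminate()
```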
```
process to write: 9472
put A to queue...
process to read: 10084
get A from queue.
put B to queue...
get B from queue.
put C to queue...
get C from queue.
```
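The text also mentions Pipe. multiprocessing.Pipe() returns two connected Connection objects: one process sends on one end and another receives on the other. This minimal sketch assumes the Unix 'fork' start method. The `producer` and `collect` names are hypothetical, and so is the convention of sending `None` to signal the end of the data. The sentinel lets the reader stop cleanly instead of being terminated.

```python
import multiprocessing as mp

def producer(conn, items):
    # Child process: send each item through its end of the pipe, then a sentinel
    for item in items:
        conn.send(item)
    conn.send(None)  # sentinel meaning "no more data" (our own convention)
    conn.close()

def collect(items):
    ctx = mp.get_context('fork')          # assumption: Unix 'fork' start method
    parent_conn, child_conn = ctx.Pipe()  # duplex by default
    p = ctx.Process(target=producer, args=(child_conn, items))
    p.start()
    received = []
    while True:
        item = parent_conn.recv()  # blocks until the child sends something
        if item is None:
            break
        received.append(item)
    p.join()
    return received

if __name__ == '__main__':
    print(collect(['A', 'B', 'C']))  # ['A', 'B', 'C']
```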
2. Multithreading
threading
The Python standard library provides two threading modules: _thread (low level) and threading (high level, a wrapper around _thread). In most cases threading is all you need:
```python
import time, threading

# Code executed by the new thread
def loop():
    print(f'thread {threading.current_thread().name} is running...')
    n = 0
    while n < 5:
        n = n + 1
        print(f'thread {threading.current_thread().name} >>> {n}')
        time.sleep(1)
    print(f'thread {threading.current_thread().name} ended.')

print(f'thread {threading.current_thread().name} is running...')
# Create a Thread instance by passing in a function
t = threading.Thread(target=loop, name='LoopThread')
# Start the child thread
t.start()
# Block until the child thread finishes
t.join()
print(f'thread {threading.current_thread().name} ended.')
```
```
thread MainThread is running...
thread LoopThread is running...
thread LoopThread >>> 1
thread LoopThread >>> 2
thread LoopThread >>> 3
thread LoopThread >>> 4
thread LoopThread >>> 5
thread LoopThread ended.
thread MainThread ended.
```
Lock
In multiprocessing, each process gets its own copy of every variable, so processes do not affect one another;
In multithreading, all variables are shared by all threads;
A single high-level statement may become several CPU instructions (for example, x = x + 1 means read x, add 1, write x back). When several threads modify the same variable at the same time, these steps can interleave, and one thread may overwrite a value another thread has just written. The result is then no longer consistent (analogous to consistency in database transactions);
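```python
import threading

balance = 0
lock = threading.Lock()

def change_it(n):
    # Helper assumed for this example: add n then subtract it, so balance should stay 0
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for i in range(10000):
        lock.acquire()  # acquire the lock
        try:
            change_it(n)
        finally:
            lock.release()  # always release the lock, even if change_it raises

t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)  # 0, because the lock serializes change_it
```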
A lock avoids the problem above. At any moment at most one thread can hold a given lock, and a thread continues past acquire() only once it has obtained the lock. Other threads trying to acquire the same lock block until the thread holding it releases it;
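threading.Lock also works as a context manager, so `with lock:` is a shorter equivalent of the acquire()/try/finally/release() pattern. In this minimal sketch, the `add` helper, the `counter` variable and the thread and iteration counts are arbitrary choices.

```python
import threading

counter = 0
lock = threading.Lock()

def add(times):
    global counter
    for _ in range(times):
        # The lock is acquired on entry and released on exit, even if the body raises
        with lock:
            counter += 1  # read-modify-write, protected by the lock

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000
```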
Drawbacks
A lock prevents the protected code from running concurrently: that section effectively runs in one thread at a time, which reduces efficiency;
Deadlock
When there are multiple locks and threads each try to acquire a lock held by another, all of them hang forever. None of them can proceed, and they can only be stopped by having the operating system kill the process;
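A common way to avoid this is to make every thread acquire multiple locks in the same fixed order. In this minimal sketch, the `transfer` function and the two "accounts" are hypothetical, and ordering the locks by `id()` is just one way to choose a fixed order.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
balances = {'a': 100, 'b': 100}

def transfer(first, second, src, dst, amount, times):
    # Always take the two locks in the same order (here: by id()),
    # whatever order the caller passes them in, so no cycle of waits can form
    lo, hi = sorted((first, second), key=id)
    for _ in range(times):
        with lo:
            with hi:
                balances[src] -= amount
                balances[dst] += amount

# These opposite argument orders could deadlock if each thread took the locks as passed
t1 = threading.Thread(target=transfer, args=(lock_a, lock_b, 'a', 'b', 1, 10_000))
t2 = threading.Thread(target=transfer, args=(lock_b, lock_a, 'b', 'a', 1, 10_000))
t1.start()
t2.start()
t1.join()
t2.join()
print(balances)  # {'a': 100, 'b': 100}
```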
GIL
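Global Interpreter Lock
In CPython, a thread must acquire the Global Interpreter Lock (GIL) before it can execute Python bytecode. The interpreter periodically forces the running thread to release the GIL so that other threads can run: in Python 2 this happened every 100 bytecode instructions, and since Python 3.2 it happens after a time-based switch interval (5 ms by default). As a result, on a standard CPython build multiple threads can never execute Python code on several CPU cores at the same time; they can only take turns;
Multithreading in Python therefore cannot take advantage of multi-core CPUs for CPU-bound work; use multiple processes for that;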
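A rough way to observe this is to time the same CPU-bound loop, first run twice in sequence and then run in two threads. In this minimal sketch, `count_down` and the loop size are arbitrary. On a standard CPython build the threaded run is typically no faster, but timings depend on the machine, so no figures are given. sys.getswitchinterval() reports the GIL switch interval.

```python
import sys
import threading
import time

def count_down(n):
    # Pure-Python CPU work: the thread needs the GIL for every step
    while n > 0:
        n -= 1

N = 2_000_000

start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f'sequential: {sequential:.2f}s, two threads: {threaded:.2f}s')
print('GIL switch interval (s):', sys.getswitchinterval())  # 0.005 unless changed
```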
3. ThreadLocal
In a multithreaded environment, it is better for a thread to use its own local variables than global ones: a local variable is visible only to its own thread, while changes to a global variable must be protected by a lock;
However, passing local variables as parameters through every function call is cumbersome. This is where ThreadLocal helps;
A ThreadLocal object (threading.local() in Python) is itself a global variable, but each thread can only read and write its own independent copy of its attributes, so threads do not interfere with each other;
```python
import threading

# Create a global ThreadLocal object
local_school = threading.local()

def process_student():
    # Get the student bound to the current thread
    std = local_school.student
    print('Hello, %s (in %s)' % (std, threading.current_thread().name))

def process_thread(name):
    # Bind the student to this thread's ThreadLocal
    local_school.student = name
    process_student()

t1 = threading.Thread(target=process_thread, args=('Alice', ), name='Thread-A')
t2 = threading.Thread(target=process_thread, args=('Bob', ), name='Thread-B')
t1.start()
t2.start()
t1.join()
t2.join()
```
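Each thread sees only the attributes it set itself. This minimal sketch checks that: an attribute set in the main thread is invisible in a new thread, and the value the worker sets does not leak back into the main thread. The names `data`, `seen` and `worker` are hypothetical.

```python
import threading

data = threading.local()
data.value = 'main'  # set in the main thread only
seen = {}

def worker():
    # This thread starts with its own, empty namespace on `data`
    seen['before'] = getattr(data, 'value', None)
    data.value = 'worker'
    seen['after'] = data.value

t = threading.Thread(target=worker)
t.start()
t.join()
print(seen)        # {'before': None, 'after': 'worker'}
print(data.value)  # main
```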
4. Process vs. Thread
Multiprocessing
- Advantages: high stability; if one child process crashes, the others are unaffected;
- Disadvantages: creating processes and switching between them is expensive;
Multithreading
- Advantages: somewhat lower overhead than processes;
- Disadvantages: all threads share their process's memory, so a crash in one thread brings down the whole process;
Thread switching
On a single core, "parallel" threads are really the core switching rapidly between them. Every switch has a cost (saving one thread's state and restoring another's), so once there are too many threads, the system spends much of its time switching and efficiency drops sharply;
Compute-intensive vs. IO-intensive
Compute-intensive
The cost is almost entirely CPU time, and the goal is to keep the CPU fully busy; the number of tasks running at once should equal the number of CPU cores;
Python executes relatively slowly, so it is poorly suited to compute-intensive programs; such code is better written in C;
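Following the rule of thumb above, a CPU-bound job can be spread over a process pool with one worker per core. This is a minimal sketch with several assumptions. The `is_prime` and `count_primes` helpers are hypothetical. The 'fork' start method is assumed (Unix only). `os.cpu_count()` may return `None`, which is why there is a fallback.

```python
import multiprocessing as mp
import os

def is_prime(n):
    # Deliberately simple CPU-bound check (trial division)
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def count_primes(limit):
    workers = os.cpu_count() or 1  # one worker process per CPU core
    ctx = mp.get_context('fork')   # assumption: Unix 'fork' start method
    with ctx.Pool(workers) as pool:
        # map() splits the inputs into chunks and returns results in input order
        flags = pool.map(is_prime, range(limit), chunksize=1000)
    return sum(flags)

if __name__ == '__main__':
    print(count_primes(100_000))  # 9592
```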
I/O-intensive
Most of the time is spent waiting for I/O and the CPU does little work; up to a point, running more tasks at once makes better use of the CPU;
For I/O-intensive programs, Python and C run at nearly the same speed because both spend most of their time waiting, and Python is usually faster to develop in;
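For I/O-bound work, threads do help even under the GIL, because a thread waiting for I/O releases it. This minimal sketch simulates I/O with time.sleep(), which also releases the GIL. The `fake_io` and `run` helpers and the 0.2 s delay are hypothetical.

```python
import threading
import time

def fake_io(delay, results, i):
    time.sleep(delay)  # stands in for a network or disk wait
    results[i] = i

def run(n, delay=0.2):
    results = [None] * n
    threads = [threading.Thread(target=fake_io, args=(delay, results, i)) for i in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start

if __name__ == '__main__':
    results, elapsed = run(10)
    # The waits overlap, so this takes roughly one delay rather than ten delays in a row
    print(f'{len(results)} tasks in {elapsed:.2f}s')
```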
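Asynchronous I/O
The operating system provides asynchronous I/O support, which enables an event-driven model: a single process with a single thread can handle many I/O tasks concurrently;
In Python, this single-threaded asynchronous programming model is called coroutines;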
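In current Python versions, coroutines are written with async/await and driven by the asyncio event loop. This minimal sketch shows one thread waiting on several simulated I/O operations concurrently. The `fetch` coroutine and its delays are hypothetical.

```python
import asyncio
import time

async def fetch(name, delay):
    # await hands control back to the event loop while this "I/O" is pending
    await asyncio.sleep(delay)
    return f'{name} done'

async def main():
    # Run three coroutines concurrently on a single thread; results keep argument order
    return await asyncio.gather(fetch('a', 0.3), fetch('b', 0.2), fetch('c', 0.1))

if __name__ == '__main__':
    start = time.perf_counter()
    print(asyncio.run(main()))  # ['a done', 'b done', 'c done']
    # Close to the longest single delay, not the sum of all three
    print(f'{time.perf_counter() - start:.2f}s')
```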
5. Distributed process
Processes can be distributed across multiple machines, whereas threads can at most be spread across the CPUs of a single machine;
The managers submodule of Python's multiprocessing module supports distributing processes across machines. The master publishes a task queue and a result queue on the network:
```python
# task_master.py
import random, queue
from multiprocessing.managers import BaseManager

# Queue for sending tasks
task_queue = queue.Queue()
# Queue for receiving results
result_queue = queue.Queue()

def get_task_queue():
    return task_queue

def get_result_queue():
    return result_queue

# QueueManager inherits from BaseManager
class QueueManager(BaseManager):
    pass

if __name__ == "__main__":
    # Register both queues on the network; `callable` ties each name to a Queue object
    QueueManager.register('get_task_queue', callable=get_task_queue)
    QueueManager.register('get_result_queue', callable=get_result_queue)
    # Bind to port 5000 with authkey b'abc'; 127.0.0.1 accepts local workers only,
    # so use an address reachable from the other machines to actually distribute work
    manager = QueueManager(address=('127.0.0.1', 5000), authkey=b'abc')
    # Start the manager's server process
    manager.start()
    # Get proxies for the Queue objects, accessed through the network
    task = manager.get_task_queue()
    result = manager.get_result_queue()
    # Put some tasks in
    for i in range(10):
        n = random.randint(0, 10000)
        print('Put task %d...' % n)
        task.put(n)
    # Read results from the result queue
    print('Try get results...')
    for i in range(10):
        r = result.get(timeout=100)
        print('Result: %s' % r)
    # Shut down
    manager.shutdown()
    print('master exit.')
```
The worker, which may run on another machine, connects to the master, takes tasks from the task queue and writes results to the result queue:
```python
import time, queue
from multiprocessing.managers import BaseManager

# Create a matching QueueManager
class QueueManager(BaseManager):
    pass

if __name__ == "__main__":
    # This QueueManager only fetches the queues from the network, so register the names only
    QueueManager.register('get_task_queue')
    QueueManager.register('get_result_queue')
    # Connect to the server, i.e. the machine running task_master.py
    server_addr = '127.0.0.1'
    print('Connect to server %s...' % server_addr)
    # The port and authkey must match task_master.py exactly
    manager = QueueManager(address=(server_addr, 5000), authkey=b'abc')
    # Connect over the network
    manager.connect()
    # Get proxies for the Queue objects
    task = manager.get_task_queue()
    result = manager.get_result_queue()
    # Take tasks from the task queue and write results to the result queue
    for i in range(10):
        try:
            n = task.get(timeout=1)
            print('run task %d * %d...' % (n, n))
            r = '%d * %d = %d' % (n, n, n * n)
            time.sleep(1)
            result.put(r)
        except queue.Empty:  # raised when get() times out
            print('task queue is empty.')
    # Done
    print('worker exit.')
```
Through the managers module, the Queue is exposed to the network so that processes on other machines can access it. The worker holds no Queue of its own; the Queue objects live in the master;
The Queue is used to send tasks and receive results, so each task description should be as small as possible. To pass a large amount of data, let the worker read it itself, for example from a shared disk;