"Python Basics" process and thread

  • Thread (线程): the smallest unit of execution;
  • Process (进程): the smallest unit of resource allocation; a process consists of at least one thread;

1. Multi-process

On Unix/Linux, the fork() system call copies the current process (the parent) to create a child process, then returns in both processes: the child receives 0 and the parent receives the child's process ID. The child can obtain its parent's process ID via getppid();

Python's os module wraps these system calls, including fork();

import os

print(f'Process {os.getpid()} start...')
pid = os.fork()
if pid == 0:
    print(f'child process {os.getpid()}, parent is {os.getppid()}')
else:
    print(f'process {os.getpid()} created a child process {pid}')
Process 1798 start...
child process 1799, parent is 1798
process 1798 created a child process 1799

multiprocessing

A cross-platform multiprocessing module;

from multiprocessing import Process

import os
import time


def proc(name):
    # time.sleep(5)
    print(f'run child process {name} ({os.getpid()})')


if __name__ == "__main__":
    print(f'parent process {os.getpid()}')
    # create a Process instance
    p = Process(target=proc, args=('test', ))
    print('child process will start.')
    # start the child process p
    p.start()
    # block until child process p finishes before continuing
    p.join()
    print('child process end.')
parent process 4872
child process will start.
run child process test (14360)
child process end.

Because of join(), the main process exits only after the child process ends;

Pool

A process pool, used to start child processes in batches;

from multiprocessing import Pool
import os, time, random


def proc(name):
    print(f'run task {name} ({os.getpid()})')
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print(f'task {name} runs: {end - start}')


if __name__ == "__main__":
    print(f'parent process {os.getpid()}')
    # at most 4 child processes run at the same time
    p = Pool(4)
    for i in range(5):
        p.apply_async(proc, args=(i, ))
    print('waiting for all subprocesses done...')
    # after close(), no new processes can be submitted
    p.close()
    # wait for all child processes to finish; must be called after close()
    p.join()
    print('all subprocess done.')
parent process 14360
waiting for all subprocesses done...
run task 0 (10800)
run task 1 (7996)
run task 2 (4084)
run task 3 (3492)
task 0 runs: 1.7698190212249756
run task 4 (10800)
task 1 runs: 1.9010038375854492
task 2 runs: 2.1044681072235107
task 3 runs: 2.431309700012207
task 4 runs: 2.9220056533813477
all subprocess done.

subprocess

Creates an external process and controls its input and output, e.g. invoking the nslookup command and interacting with it;

import subprocess

# run the nslookup child process
p = subprocess.call(['nslookup', 'www.python.org'])

# start an nslookup child process and wire up its input/output streams
p = subprocess.Popen(['nslookup'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# send commands to the child process and collect its output
output, err = p.communicate(b'set q=mx\npython.org\nexit\n')
# print the output (decoded as gbk, the console encoding on Chinese Windows)
print(output.decode('gbk'))
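
For one-shot commands where no interaction is needed, the newer subprocess.run() API (Python 3.5+; capture_output requires 3.7+) is a convenient alternative; a minimal sketch:

import subprocess

# run nslookup once and capture its output as text
r = subprocess.run(['nslookup', 'www.python.org'],
                   capture_output=True, text=True)
print('exit code:', r.returncode)
print(r.stdout)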

Inter-process communication

Python's multiprocessing module provides several ways to exchange data between processes, such as Queue and Pipe;

from multiprocessing import Process, Queue
import os, time, random


def write(q):
    print(f'process to write: {os.getpid()}')
    for value in ['A', 'B', 'C']:
        print(f'put {value} to queue...')
        q.put(value)
        time.sleep(random.random())


def read(q):
    print(f'process to read: {os.getpid()}')
    while True:
        value = q.get(True)
        print(f'get {value} from queue.')


if __name__ == "__main__":
    q = Queue()
    pw = Process(target=write, args=(q, ))
    pr = Process(target=read, args=(q, ))
    pw.start()
    pr.start()
    pw.join()
    # once writing is done, forcibly terminate the (infinite) reader
    pr.terminate()
process to write: 9472
put A to queue...
process to read: 10084
get A from queue.
put B to queue...
get B from queue.
put C to queue...
get C from queue.
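
The Pipe mentioned above works in a similar spirit but connects exactly two endpoints; a minimal sketch with a hypothetical child() function:

from multiprocessing import Process, Pipe


def child(conn):
    # send one message through the child end of the pipe
    conn.send('hello from child')
    conn.close()


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=child, args=(child_conn, ))
    p.start()
    # receive the message on the parent end
    print(parent_conn.recv())
    p.join()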

2. Multithreading

threading

The Python standard library provides two threading modules: _thread (a low-level module) and threading (a high-level module that wraps _thread);

import time, threading


def loop():
    print(f'thread {threading.current_thread().name} is running...')
    n = 0
    while n < 5:
        n = n + 1
        print(f'thread {threading.current_thread().name} >>> {n}')
        time.sleep(1)
    print(f'thread {threading.current_thread().name} ended.')


print(f'thread {threading.current_thread().name} is running...')
# create a Thread instance from a function
t = threading.Thread(target=loop, name='LoopThread')
# start the child thread
t.start()
# block until the child thread finishes
t.join()
print(f'thread {threading.current_thread().name} ended.')
thread MainThread is running...
thread LoopThread is running...
thread LoopThread >>> 1
thread LoopThread >>> 2
thread LoopThread >>> 3
thread LoopThread >>> 4
thread LoopThread >>> 5
thread LoopThread ended.
thread MainThread ended.

Lock

With multiple processes, each process gets its own copy of every variable, and the copies do not affect each other;

With multiple threads, all variables are shared by all threads;

A single statement in a high-level language may compile to several CPU-level instructions. When multiple threads modify the same variable at the same time, those instructions can interleave, leaving the variable in an inconsistent intermediate state, so the result violates "consistency" (analogous to the consistency property of database transactions);
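
To see the inconsistency concretely, here is a minimal sketch of the race, assuming a shared balance and a change_it() that adds and then subtracts n (so the balance should always end at 0):

import threading

balance = 0


def change_it(n):
    # balance + n - n should leave balance unchanged...
    global balance
    balance = balance + n
    balance = balance - n


def run_thread(n):
    for i in range(100000):
        change_it(n)  # no lock: the two assignments can interleave across threads


t1 = threading.Thread(target=run_thread, args=(5, ))
t2 = threading.Thread(target=run_thread, args=(8, ))
t1.start(); t2.start()
t1.join(); t2.join()
print(balance)  # often nonzero; the exact value varies between runs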

import threading

balance = 0
lock = threading.Lock()


def run_thread(n):
    for i in range(10000):
        # acquire the lock
        lock.acquire()
        try:
            # change_it() as defined in the sketch above
            change_it(n)
        finally:
            # release the lock
            lock.release()
  • A Lock avoids the problem above: at most one thread can hold a given lock at any time, only the thread that successfully acquires it may continue executing the guarded code, and other threads trying to acquire it block until the holding thread releases the lock (see the usage sketch below);
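
Continuing the example, a short usage sketch: with the lock guarding change_it(), the final balance is always 0:

t1 = threading.Thread(target=run_thread, args=(5, ))
t2 = threading.Thread(target=run_thread, args=(8, ))
t1.start(); t2.start()
t1.join(); t2.join()
print(balance)  # always 0: change_it() now runs atomically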

Drawback

Locks prevent the guarded code from executing concurrently across threads, and acquiring/releasing them adds overhead;

Deadlock

With multiple locks, threads can each hold one lock while trying to acquire a lock held by another; all of them then hang at the same time, and the process can only be terminated by the operating system;
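
A minimal sketch of such a deadlock with two locks (hypothetical worker functions; running this hangs both threads, so it is for illustration only):

import threading, time

lock_a = threading.Lock()
lock_b = threading.Lock()


def worker1():
    with lock_a:
        time.sleep(0.1)  # give worker2 time to grab lock_b
        with lock_b:     # blocks forever: worker2 holds lock_b
            pass


def worker2():
    with lock_b:
        time.sleep(0.1)  # give worker1 time to grab lock_a
        with lock_a:     # blocks forever: worker1 holds lock_a
            pass


t1 = threading.Thread(target=worker1)
t2 = threading.Thread(target=worker2)
t1.start(); t2.start()
t1.join(); t2.join()  # never returns: the threads are deadlocked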

GIL

Global Interpreter Lock: any Python thread must acquire the GIL before it executes, and the interpreter releases it periodically (every 100 bytecode instructions in older CPython versions; modern CPython switches threads on a time interval). Consequently, multiple threads can never truly run simultaneously on multiple CPU cores; they only execute alternately;

Multithreaded Python code therefore cannot take advantage of multi-core CPUs;
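
A classic way to observe this (a sketch: run it and watch CPU usage in a task manager, then kill the process manually, since the loops never end): start one busy loop per core, yet on CPython total usage stays near 100% of a single core:

import threading
import multiprocessing


def loop():
    # a CPU-bound busy loop
    x = 0
    while True:
        x = x ^ 1


# one thread per CPU core; the GIL still lets only one run at a time
for _ in range(multiprocessing.cpu_count()):
    t = threading.Thread(target=loop)
    t.start()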

3. ThreadLocal

In a multithreaded environment, a global variable must be protected by a lock, so local variables are preferable to globals;

However, local variables have to be passed as parameters through every function call, which is cumbersome; this is where ThreadLocal variables come in;

  • A ThreadLocal variable is a global variable, yet each thread reads and writes only its own independent copy, so threads do not interfere with each other;
import threading

# create a global ThreadLocal object:
local_school = threading.local()


def process_student():
    # get the student bound to the current thread:
    std = local_school.student
    print('Hello, %s (in %s)' % (std, threading.current_thread().name))


def process_thread(name):
    # bind this thread's student to the ThreadLocal object:
    local_school.student = name
    process_student()


t1 = threading.Thread(target=process_thread, args=('Alice', ), name='Thread-A')
t2 = threading.Thread(target=process_thread, args=('Bob', ), name='Thread-B')
t1.start()
t2.start()
t1.join()
t2.join()

4. Process vs. Thread

Multi-process

  • Advantages: good stability; a crashed child process does not affect the others;
  • Disadvantages: creating and switching processes carries heavy overhead;

Multithreading

  • Advantages: lower overhead than processes;
  • Disadvantages: all threads share their process's memory, so one crashing thread can bring down the whole process;

Thread switching

What looks like multithreaded parallelism on a single core is really the core switching rapidly between threads. Once the number of threads grows large enough, switching itself consumes substantial system resources and efficiency drops sharply;

Compute-intensive vs. I/O-intensive

Compute-intensive

Compute-intensive tasks spend their time in the CPU; CPU utilization is most efficient when the number of simultaneous tasks equals the number of CPU cores;

Python executes slowly and is poorly suited to compute-intensive programs; C is the better choice;
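
In code, this rule of thumb amounts to sizing a process pool to the core count; a minimal sketch with a hypothetical CPU-bound square() task:

import os
from multiprocessing import Pool


def square(n):
    # stand-in for a CPU-bound computation
    return n * n


if __name__ == "__main__":
    # one worker per CPU core for compute-intensive work
    with Pool(os.cpu_count()) as pool:
        print(pool.map(square, range(8)))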

I/O-intensive

I/O-intensive tasks spend most of their time waiting on I/O and consume little CPU, so running more tasks concurrently raises CPU utilization;

For I/O-intensive programs, Python and C run at similar speed, while Python development is usually faster;

Asynchronous I/O

Operating systems provide asynchronous I/O support, which enables the event-driven model;

Python's single-threaded asynchronous programming model is called the coroutine (协程);
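
A minimal asyncio sketch (with a hypothetical fetch() task) shows this single-threaded, event-driven style: two simulated I/O waits overlap on one thread:

import asyncio


async def fetch(name, delay):
    # await yields control to the event loop instead of blocking the thread
    await asyncio.sleep(delay)
    print(f'{name} done after {delay}s')


async def main():
    # both one-second "I/O waits" complete in about one second total
    await asyncio.gather(fetch('a', 1), fetch('b', 1))


asyncio.run(main())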

5. Distributed process

A Process can be distributed to multiple machines (distributed computing), whereas a Thread can at most be distributed across the CPUs of one machine;

The managers submodule of Python's multiprocessing package supports distributing processes across a network;

# task_master.py

import random, queue
from multiprocessing.managers import BaseManager

# queue for sending tasks:
task_queue = queue.Queue()
# queue for receiving results:
result_queue = queue.Queue()


def get_task_queue():
    return task_queue


def get_result_queue():
    return result_queue


# QueueManager derived from BaseManager:
class QueueManager(BaseManager):
    pass


if __name__ == "__main__":

    # register both queues on the network; the callable argument links each name to a Queue object:
    QueueManager.register('get_task_queue', callable=get_task_queue)
    QueueManager.register('get_result_queue', callable=get_result_queue)
    # bind to port 5000 and set the auth key 'abc':
    manager = QueueManager(address=('127.0.0.1', 5000), authkey=b'abc')
    # start the manager:
    manager.start()
    # obtain Queue objects that are accessible over the network:
    task = manager.get_task_queue()
    result = manager.get_result_queue()
    # put a few tasks in:
    for i in range(10):
        n = random.randint(0, 10000)
        print('Put task %d...' % n)
        task.put(n)
    # read results from the result queue:
    print('Try get results...')
    for i in range(10):
        r = result.get(timeout=100)
        print('Result: %s' % r)
    # shut down:
    manager.shutdown()
    print('master exit.')
# task_worker.py

import time, queue
from multiprocessing.managers import BaseManager


# create a similar QueueManager:
class QueueManager(BaseManager):
    pass


if __name__ == "__main__":

    # this QueueManager only fetches the queues over the network, so register names only:
    QueueManager.register('get_task_queue')
    QueueManager.register('get_result_queue')

    # connect to the server, i.e. the machine running task_master.py:
    server_addr = '127.0.0.1'
    print('Connect to server %s...' % server_addr)
    # the port and auth key must exactly match the settings in task_master.py:
    manager = QueueManager(address=(server_addr, 5000), authkey=b'abc')
    # connect over the network:
    manager.connect()
    # obtain the Queue objects:
    task = manager.get_task_queue()
    result = manager.get_result_queue()
    # take tasks from the task queue and write results to the result queue:
    for i in range(10):
        try:
            n = task.get(timeout=1)
            print('run task %d * %d...' % (n, n))
            r = '%d * %d = %d' % (n, n, n * n)
            time.sleep(1)
            result.put(r)
        except queue.Empty:
            print('task queue is empty.')
    # done:
    print('worker exit.')

Through the managers module, the Queue is exposed over the network so that processes on other machines can access it. The worker does not hold the Queue; the Queue lives in the master process;

The Queue is used to dispatch tasks and collect results, so task descriptions should be kept as small as possible. To move large amounts of data, have the worker read it itself, e.g. from a shared disk;


PS: Thanks to all like-minded readers! Feel free to follow, comment, and like!

Origin: blog.csdn.net/ChaoMing_H/article/details/129455599