Python Basics: Processes and Threads

A thread is the smallest unit of execution.
A process is the smallest unit of resource allocation and consists of at least one thread.

1. Multiple processes

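On Unix/Linux, the fork() system call copies the current process (the parent process) to create a child process, and then returns in both the parent and the child. In the child it returns 0; in the parent it returns the child's process ID. The child can get its parent's process ID with getppid().

The os module wraps common system calls, including fork.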

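import os

# os.fork() is only available on Unix-like systems.
print(f'Process {os.getpid()} start...')
pid = os.fork()
if pid == 0:
    # fork() returns 0 in the child process
    print(f'child process {os.getpid()}, parent is {os.getppid()}')
else:
    # fork() returns the child's PID in the parent process
    print(f'process {os.getpid()} created a child process {pid}')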

Process 1798 start...
child process 1799, parent is 1798
process 1798 created a child process 1799
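
The parent usually also wants to know when the child has finished and how it exited. The following is a minimal sketch, assuming a Unix-like system where os.fork is available. The helper name run_child_and_wait is hypothetical and not part of the os module; it forks, lets the child exit with a chosen status, and has the parent collect that status with os.waitpid.

```python
import os


def run_child_and_wait(exit_code):
    """Fork a child that exits with exit_code; return the code the parent sees.

    run_child_and_wait is a hypothetical helper, not part of the os module.
    """
    pid = os.fork()
    if pid == 0:
        # Child branch: fork() returned 0. Exit right away without running
        # cleanup handlers inherited from the parent.
        os._exit(exit_code)
    # Parent branch: fork() returned the child's PID. Block until it exits.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(run_child_and_wait(3))  # prints 3
```

Waiting on the child also keeps it from lingering as a zombie process after it exits.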

multiprocessing

A cross-platform module for working with multiple processes.

from multiprocessing import Process

import os
import time


def proc(name):
    # time.sleep(5)
    print(f'run child process {name} ({os.getpid()})')


if __name__ == "__main__":
    print(f'parent process {os.getpid()}')
    # Create a Process instance
    p = Process(target=proc, args=('test', ))
    print('child process will start.')
    # Start the child process p
    p.start()
    # Block until child process p has finished, then continue
    p.join()
    print('child process end.')

parent process 4872
child process will start.
run child process test (14360)
child process end.

Because of p.join(), the main process waits for the child process to finish before it continues and exits.

Pool

A process pool is used to start child processes in batches.

from multiprocessing import Pool
import os, time, random


def proc(name):
    print(f'run task {name} ({os.getpid()})')
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print(f'task {name} runs: {end-start}')


if __name__ == "__main__":
    print(f'parent process {os.getpid()}')
    # Number of child processes that may run at the same time
    p = Pool(4)
    for i in range(5):
        p.apply_async(proc, args=(i, ))
    print('waiting for all subprocesses done...')
    # After close(), no new tasks can be added to the pool
    p.close()
    # Wait for all child processes to finish; must be called after close()
    p.join()
    print('all subprocess done.')

parent process 14360
waiting for all subprocesses done...
run task 0 (10800)
run task 1 (7996)
run task 2 (4084)
run task 3 (3492)
task 0 runs: 1.7698190212249756
run task 4 (10800)
task 1 runs: 1.9010038375854492
task 2 runs: 2.1044681072235107
task 3 runs: 2.431309700012207
task 4 runs: 2.9220056533813477
all subprocess done.
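
The tasks above only print; often the parent needs the return values as well. A minimal sketch of collecting results from a pool, where square is a hypothetical task function chosen for illustration:

```python
from multiprocessing import Pool


def square(n):
    # Runs in a worker process; the return value is sent back to the parent.
    return n * n


if __name__ == "__main__":
    # Leaving the with block terminates the pool.
    with Pool(4) as pool:
        # map() blocks until every task is done and keeps the input order.
        print(pool.map(square, range(5)))  # prints [0, 1, 4, 9, 16]
        # apply_async() returns an AsyncResult; get() waits for that one task.
        async_result = pool.apply_async(square, args=(7,))
        print(async_result.get(timeout=10))  # prints 49
```

map() is convenient when every task uses the same function; apply_async() suits tasks that are submitted one at a time.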

subprocess

The subprocess module creates an external process and controls its input and output, for example to run a command such as nslookup.

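import subprocess

# Run nslookup as a child process; call() waits and returns its exit code
returncode = subprocess.call(['nslookup', 'www.python.org'])

# Start an nslookup child process and attach pipes to its input and output streams
p = subprocess.Popen(['nslookup'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# Send commands to child process p and collect its output
output, err = p.communicate(b'set q=mx\npython.org\nexit\n')
# Print the output; 'gbk' suits a Chinese-locale Windows console,
# use your system's encoding (for example 'utf-8') elsewhere
print(output.decode('gbk'))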
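
For a one-shot command, subprocess.run covers the common case of "start, wait, and capture the output" in a single call. A minimal sketch, assuming Python 3.7 or later; it launches a second Python interpreter instead of nslookup so that it does not depend on which tools are installed:

```python
import subprocess
import sys

# Run a child Python interpreter so the example does not depend on nslookup.
completed = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    capture_output=True,  # collect stdout and stderr instead of inheriting them
    text=True,            # decode the streams to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)
print(completed.returncode)      # prints 0
print(completed.stdout.strip())  # prints hello from child
```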

Inter-process communication

Python's multiprocessing module provides several ways to exchange data between processes, such as Queue and Pipe.

from multiprocessing import Process, Queue
import os, time, random


def write(q):
    print(f'process to write: {os.getpid()}')
    for value in ['A', 'B', 'C']:
        print(f'put {value} to queue...')
        q.put(value)
        time.sleep(random.random())


def read(q):
    print(f'process to read: {os.getpid()}')
    while True:
        value = q.get(True)
        print(f'get {value} from queue.')


if __name__ == "__main__":
    q = Queue()
    pw = Process(target=write, args=(q, ))
    pr = Process(target=read, args=(q, ))
    pw.start()
    pr.start()
    pw.join()
    # read() loops forever, so terminate the reader once the writer has finished
    pr.terminate()

process to write: 9472
put A to queue...
process to read: 10084
get A from queue.
put B to queue...
get B from queue.
put C to queue...
get C from queue.
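
Pipe is the other mechanism mentioned above. A minimal sketch, with hypothetical helper names send_values and receive_values: the writer ends the stream with a None sentinel, so the reader can stop on its own instead of being terminated.

```python
from multiprocessing import Pipe, Process


def send_values(conn, values):
    # Send each value, then a None sentinel to signal the end of the stream.
    for value in values:
        conn.send(value)
    conn.send(None)
    conn.close()


def receive_values(conn):
    # Collect values until the sentinel arrives.
    received = []
    while True:
        value = conn.recv()
        if value is None:
            break
        received.append(value)
    return received


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    writer = Process(target=send_values, args=(child_conn, ["A", "B", "C"]))
    writer.start()
    print(receive_values(parent_conn))  # prints ['A', 'B', 'C']
    writer.join()
```

A Pipe connects exactly two endpoints; a Queue is the better fit when several producers or consumers share one channel.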

2. Multithreading

threading

The Python standard library provides two thread modules: _thread (low-level) and threading (high-level, a wrapper around _thread).

import time, threading


def loop():
    print(f'thread {threading.current_thread().name} is running...')
    n = 0
    while n < 5:
        n = n + 1
        print(f'thread {threading.current_thread().name} >>> {n}')
        time.sleep(1)
    print(f'thread {threading.current_thread().name} ended.')


print(f'thread {threading.current_thread().name} is running...')
# Pass a function to create a Thread instance
t = threading.Thread(target=loop, name='LoopThread')
# Start the child thread
t.start()
# Block until the child thread has finished
t.join()
print(f'thread {threading.current_thread().name} ended.')

thread MainThread is running...
thread LoopThread is running...
thread LoopThread >>> 1
thread LoopThread >>> 2
thread LoopThread >>> 3
thread LoopThread >>> 4
thread LoopThread >>> 5
thread LoopThread ended.
thread MainThread ended.

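Lock

With multiple processes, each process gets its own copy of a variable, so the processes do not affect each other.

With multiple threads, all variables are shared by all threads.

A single statement in a high-level language may become several instructions when the CPU executes it. If multiple threads operate on the same variable at the same time, a thread can be switched out between those instructions while it still holds a stale intermediate value, and the result then no longer satisfies consistency (similar to consistency in database transactions).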

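import threading

balance = 0
lock = threading.Lock()


def change_it(n):
    # Not shown in the original; assumed here: add then subtract, so balance should stay 0
    global balance
    balance = balance + n
    balance = balance - n


def run_thread(n):
    for i in range(10000):
        # Acquire the lock
        lock.acquire()
        try:
            change_it(n)
        finally:
            # Always release the lock
            lock.release()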

A lock avoids this problem. At most one thread can hold a given lock at any time. Only the thread that has acquired the lock can continue; other threads that try to acquire it are blocked until the holder releases it.
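
The acquire/try/finally/release pattern can also be written with a with statement, which releases the lock automatically. A minimal sketch, where deposit_many is a hypothetical function used only for this illustration:

```python
import threading

balance = 0
lock = threading.Lock()


def deposit_many(times):
    global balance
    for _ in range(times):
        # "with lock" acquires the lock and always releases it on exit,
        # including when the body raises an exception.
        with lock:
            balance += 1


threads = [threading.Thread(target=deposit_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # prints 40000
```

Because every increment happens while the lock is held, the final value is always 4 × 10000.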

Drawbacks

A lock prevents multiple threads from running the locked code concurrently.

Deadlock

When there are several locks and different threads each try to acquire a lock held by another, the threads all hang at the same time and can only be terminated by the operating system.
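
One common way to avoid this situation is to make every thread acquire the locks in the same order. A minimal sketch of that idea, assuming the locks can be ordered by id(); use_both is a hypothetical helper, and this is one possible technique rather than the only one:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
log = []


def use_both(first, second, label):
    # Sort the two locks by id() so every thread acquires them in the same order.
    ordered = sorted((first, second), key=id)
    with ordered[0]:
        with ordered[1]:
            log.append(label)


# The two threads name the locks in opposite order, the classic deadlock setup.
t1 = threading.Thread(target=use_both, args=(lock_a, lock_b, "t1"))
t2 = threading.Thread(target=use_both, args=(lock_b, lock_a, "t2"))
t1.start()
t2.start()
t1.join(timeout=5)
t2.join(timeout=5)
print(sorted(log))  # prints ['t1', 't2']
```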

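GIL

The Global Interpreter Lock (GIL): in CPython, every Python thread must acquire the GIL before it runs. Older versions released it automatically after every 100 bytecode instructions; since Python 3.2 it is released after a time-based switch interval (5 ms by default). Either way, multiple threads never truly run Python bytecode on several cores at once; they take turns.

As a result, multithreaded Python code cannot take advantage of a multi-core CPU for computation.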
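
The hand-off between threads can be inspected from Python. A minimal sketch, assuming CPython 3.2 or later, where the timing is exposed through sys.getswitchinterval() and sys.setswitchinterval():

```python
import sys

# Seconds a thread may run before the interpreter asks it to give up the GIL.
interval = sys.getswitchinterval()
print(interval)

# The interval can be tuned; restore the previous value afterwards.
sys.setswitchinterval(0.01)
changed = sys.getswitchinterval()
sys.setswitchinterval(interval)
print(changed)
```

Changing the interval only affects how often threads take turns; it does not let them run Python code in parallel.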

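3. ThreadLocal

In a multithreaded environment, global variables must be protected by locks, so using local variables is the better choice.

However, local variables have to be passed as parameters from one function call to the next, which is tedious. This is where ThreadLocal comes in.

A ThreadLocal variable is global, but each thread can read and write only its own independent copy, so threads do not interfere with each other.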

import threading

# Create a global ThreadLocal object:
local_school = threading.local()


def process_student():
    # Get the student bound to the current thread:
    std = local_school.student
    print('Hello, %s (in %s)' % (std, threading.current_thread().name))


def process_thread(name):
    # Bind the ThreadLocal student for this thread:
    local_school.student = name
    process_student()


t1 = threading.Thread(target=process_thread, args=('Alice', ), name='Thread-A')
t2 = threading.Thread(target=process_thread, args=('Bob', ), name='Thread-B')
t1.start()
t2.start()
t1.join()
t2.join()
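
The isolation between threads can be checked directly. A minimal sketch with hypothetical names (local_data, seen, worker): each thread stores its own value, and the main thread, which never set the attribute, does not see one at all.

```python
import threading

local_data = threading.local()
seen = {}


def worker(value):
    # The attribute set here is visible only to the current thread.
    local_data.value = value
    seen[threading.current_thread().name] = local_data.value


threads = [threading.Thread(target=worker, args=(i,), name=f"T{i}") for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(seen.items()))          # prints [('T0', 0), ('T1', 1), ('T2', 2)]
# The main thread never set the attribute, so it does not exist here.
print(hasattr(local_data, "value"))  # prints False
```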

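4. Processes vs. threads

Multiple processes

Pros: high stability; a child process that crashes does not affect the others.
Cons: creating and switching processes carries a large overhead.

Multiple threads

Pros: slightly lower overhead.
Cons: all threads share the process's memory, so one thread crashing can bring down the whole process.

Thread switching

On a single core, multithreaded concurrency is really the result of the core switching quickly between threads. Once there are too many threads, the switching itself consumes a large share of system resources and efficiency drops sharply.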

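CPU-bound vs. I/O-bound

CPU-bound

The time is spent almost entirely on the CPU. To use the CPU most efficiently, the number of tasks running at the same time should equal the number of CPU cores.

Python executes relatively slowly and is a poor fit for compute-heavy programs; C is usually the better choice.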
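
When CPU-bound work does have to stay in Python, matching the number of worker processes to the number of cores follows the rule above. A minimal sketch using concurrent.futures.ProcessPoolExecutor; count_primes is a hypothetical, deliberately naive workload:

```python
import os
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    # Deliberately naive, CPU-bound work: count the primes below limit.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    workers = os.cpu_count() or 1  # cpu_count() may return None
    with ProcessPoolExecutor(max_workers=workers) as executor:
        print(list(executor.map(count_primes, [10, 100, 1000])))  # prints [4, 25, 168]
```

Processes are used here rather than threads because each process has its own interpreter and therefore its own GIL.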

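I/O-bound

Most of the time is spent waiting for I/O and little on the CPU, so running more tasks raises CPU utilization (up to a limit).

For I/O-bound programs, Python and C differ little in execution speed, and Python is often the more productive language to develop in.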
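
For I/O-bound work, threads are usually enough, because a thread that is waiting does not occupy the CPU. A minimal sketch, in which fake_io is a hypothetical stand-in that simulates an I/O wait with time.sleep:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id):
    # time.sleep stands in for a network or disk wait; it releases the GIL.
    time.sleep(0.2)
    return task_id


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as executor:
    results = list(executor.map(fake_io, range(8)))
elapsed = time.perf_counter() - start
print(results)  # prints [0, 1, 2, 3, 4, 5, 6, 7]
print(f"elapsed: {elapsed:.2f}s")
```

Run one after another, the eight waits would take about 1.6 seconds; with eight threads they overlap, so the total should be close to a single wait.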

Asynchronous I/O

Operating systems provide asynchronous I/O support, which enables an event-driven model.

In Python, the single-threaded asynchronous programming model is called a coroutine.
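
A minimal sketch of coroutines with the standard asyncio module, assuming Python 3.7 or later for asyncio.run; fetch is a hypothetical coroutine that simulates an I/O wait with asyncio.sleep:

```python
import asyncio


async def fetch(name, delay):
    # await hands control back to the event loop while this task waits.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # Run three coroutines concurrently on one thread; results keep argument order.
    return await asyncio.gather(
        fetch("a", 0.2),
        fetch("b", 0.1),
        fetch("c", 0.3),
    )


results = asyncio.run(main())
print(results)  # prints ['a done', 'b done', 'c done']
```

All three waits overlap inside a single thread, so no locks are needed between the coroutines.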

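5. Distributed processes

Processes can be distributed across multiple machines, whereas threads can only be distributed across the CPUs of a single machine.

The managers submodule of Python's multiprocessing module supports this kind of distribution.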

# task_master.py

import random, queue
from multiprocessing.managers import BaseManager

# Queue for sending tasks:
task_queue = queue.Queue()
# Queue for receiving results:
result_queue = queue.Queue()


def get_task_queue():
    return task_queue


def get_result_queue():
    return result_queue


# QueueManager inherits from BaseManager:
class QueueManager(BaseManager):
    pass


if __name__ == "__main__":

    # Register both queues on the network; the callable argument binds each name to a Queue object:
    QueueManager.register('get_task_queue', callable=get_task_queue)
    QueueManager.register('get_result_queue', callable=get_result_queue)
    # Bind to port 5000 and set the auth key to b'abc':
    manager = QueueManager(address=('127.0.0.1', 5000), authkey=b'abc')
    # Start the manager:
    manager.start()
    # Get the Queue objects that are accessed through the network:
    task = manager.get_task_queue()
    result = manager.get_result_queue()
    # Put a few tasks in:
    for i in range(10):
        n = random.randint(0, 10000)
        print('Put task %d...' % n)
        task.put(n)
    # Read the results from the result queue:
    print('Try get results...')
    for i in range(10):
        r = result.get(timeout=100)
        print('Result: %s' % r)
    # Shut down:
    manager.shutdown()
    print('master exit.')

# task_worker.py

import time, queue
from multiprocessing.managers import BaseManager


# Create a matching QueueManager:
class QueueManager(BaseManager):
    pass


if __name__ == "__main__":

    # This QueueManager only fetches the queues over the network, so register the names only:
    QueueManager.register('get_task_queue')
    QueueManager.register('get_result_queue')

    # Connect to the server, that is, the machine running task_master.py:
    server_addr = '127.0.0.1'
    print('Connect to server %s...' % server_addr)
    # The port and auth key must match the settings in task_master.py exactly:
    manager = QueueManager(address=(server_addr, 5000), authkey=b'abc')
    # Connect over the network:
    manager.connect()
    # Get the Queue objects:
    task = manager.get_task_queue()
    result = manager.get_result_queue()
    # Take tasks from the task queue and write the results to the result queue:
    for i in range(10):
        try:
            n = task.get(timeout=1)
            print('run task %d * %d...' % (n, n))
            r = '%d * %d = %d' % (n, n, n * n)
            time.sleep(1)
            result.put(r)
        except queue.Empty:
            # queue.Empty is raised when get() times out
            print('task queue is empty.')
    # Done:
    print('worker exit.')

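Through the managers module, a Queue is exposed on the network so that processes on other machines can access it. The worker and the master then operate on the same Queue.

The Queue is used to send tasks and receive results. The data describing each task should be as small as possible; to pass large data, place it somewhere such as a shared disk and let the worker read it itself.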
