Concurrent programming (threads and processes)

Concurrent processes and threads

Before talking about concurrency, we need to understand three concepts: serial, concurrent, and parallel execution.

  • Serial: one program runs to completion before the next one starts.
  • Concurrent: several programs appear to run simultaneously, because the CPU switches between them quickly.
  • Parallel: several programs truly run at the same time, which requires multiple CPU cores.

In addition, multiprogramming relies on two kinds of multiplexing (extended content in parentheses):

  • Space multiplexing: multiple processes share physical memory (blocks -> pages -> segmented paging), but each process has its own separate memory space (base register, dynamic address relocation), so processes do not interfere with each other and are isolated at the physical level.
  • Time multiplexing: processes share the CPU by taking turns (time-sharing system).
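
The separate memory spaces described above can be observed from Python. The following is a minimal sketch and not part of the original article; the names counter and bump are made up for illustration. A child process increments a module-level variable, and the parent's copy stays unchanged.

```python
from multiprocessing import Process

counter = 0  # hypothetical module-level variable; each process gets its own copy


def bump():
    """Increment the counter in whichever process calls this function."""
    global counter
    counter += 1
    print('child sees counter =', counter)
    return counter


if __name__ == '__main__':
    p = Process(target=bump)
    p.start()
    p.join()  # wait for the child to finish
    # The child changed only its own copy of the variable.
    print('parent still sees counter =', counter)
```

Run as a script, the child should report 1 while the parent still reports 0, because the two processes do not share the variable.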

Background knowledge:

Processes and threads have five states: New, Ready, Running, Blocked, and Dead.


Process

A program that is running is called a process. Now we can use code to learn about concurrent processes.

Two ways to start a child process:

  1. Create a Process object with a function as its target, then start it to spawn a new process

    from multiprocessing import Process
    import time
    
    def task():
        print('child process start')
        time.sleep(2)
        print('child process end')
    
    
    if __name__ == '__main__':
        p = Process(target=task)
        p.start()  # Ask the OS to start a child process; when it actually starts and how long it runs is up to the OS.
        time.sleep(5)
        print('main process / parent process')
  2. Subclass Process and override the run() method, then create an instance of the subclass

    from multiprocessing import Process
    import time
    
    class Test(Process):
        def __init__(self,sex):
            super().__init__()
            self.sex = sex
    
        def run(self):
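            '''run() is the body of the process; start() invokes it automatically, so do not call it directly.'''
    
            print(f'child process, sex is {self.sex}, start')
            time.sleep(2)
            print('child process end')
    
    if __name__ == '__main__':
        p = Test('female')
        p.start()  # Ask the OS to start the child process
        print('main process')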

Process class attributes and methods (outline)

  • run(): the body of the process; override this method to define what the process does
  • start(): starts the process
  • join([timeout]): the calling process waits until the joined process finishes (or the timeout expires) before continuing
  • name: property used to set or get the name of the process
  • is_alive(): reports whether the process is still alive
  • daemon: flag that marks the process as a daemon (background) process; it must be set before start()
  • pid: the process ID
  • authkey: the authentication key of the process
  • terminate(): requests termination of the process; exactly when it stops is decided by the operating system
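
Several of the attributes above (name, pid, is_alive(), join() with a timeout, terminate()) can be seen together in one short run. This is a sketch under assumptions: sleeper and describe are hypothetical helpers invented for the illustration, and the delays are arbitrary.

```python
from multiprocessing import Process
import time


def sleeper(seconds):
    """Hypothetical worker: just sleeps so there is something to wait for."""
    time.sleep(seconds)


def describe(p):
    """Collect a few of the attributes listed above into a dict."""
    return {'name': p.name, 'pid': p.pid, 'alive': p.is_alive()}


if __name__ == '__main__':
    p = Process(target=sleeper, args=(10,), name='sleeper-demo')
    print(describe(p))    # before start(): pid is None and alive is False
    p.start()
    print(describe(p))    # after start(): pid is set and alive is True
    p.join(timeout=0.5)   # wait at most 0.5 s; the worker is still sleeping
    print(p.is_alive())   # True, because the timeout expired first
    p.terminate()         # request termination
    p.join()              # reap the child so it does not linger as a zombie
    print(p.is_alive(), p.exitcode)
```

On POSIX systems a negative exitcode after terminate() indicates that the child was ended by a signal.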

Process class attributes and methods (detail)

  • join([timeout]): the calling process waits until the joined process finishes before continuing

    If no timeout is given, the caller waits indefinitely for the joined process to end. If a timeout is given, the caller waits at most that many seconds; if the joined process has still not finished by then, the caller stops waiting and continues.

    from multiprocessing import Process
    import time
    def foo(x):
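        print('process start')
        time.sleep(x)
        print('process end')
    
    if __name__ == '__main__':
        p1 = Process(target=foo,args=(1,))
        p2 = Process(target=foo,args=(2,))
        p3 = Process(target=foo,args=(3,))
        start = time.time()
        p1.start()
        p2.start()
        p3.start()
        # The three children run at the same time; join() only makes the parent wait.
        p3.join()  # waits about 3s for the longest child
        p1.join()  # returns immediately, p1 has already finished
        p2.join()  # returns immediately, p2 has already finished
        # Total time: slightly more than the longest child.
        end = time.time()
        print(end-start)  # a bit over 3s, not a bit over 6s
        print('main')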
  • daemon: flag that marks the process as a daemon (background) process

    A daemon process behaves like this: when the main process's code finishes, its daemon child processes are terminated automatically, even if they have not finished their work.

    from multiprocessing import Process
    import time
    def foo():
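        print('daemon process start')
        time.sleep(3)
        print('daemon process end')  # not reached here: the daemon is terminated when the main code ends
    
    def task():
        print('child process start')
        time.sleep(5)
        print('child process end')
    
    if __name__ == '__main__':
        p = Process(target=foo)
        p2 = Process(target=task)
        p.daemon = True  # mark this child as a daemon process; must be set before start()
        p.start()
        p2.start()
        time.sleep(1)
        print('main')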
  • pid: the process ID

    • Every process has its own ID. Normally, after a child process finishes, its parent reaps it by calling wait(), which releases the child's pid. A child that has already exited but has not yet been reaped is called a zombie process. A zombie no longer runs, but it still occupies an entry in the system process table, and a large number of zombies wastes those entries. The simplest way to clear zombies is to kill the parent that produced them: the zombies then become orphans and are adopted by the init process, which calls wait() on them and frees their process-table entries.

    • Several ways to view a pid:

      from multiprocessing import Process,current_process
      import time,os
      
      def task():
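          print('child process start')
          print('child: my own pid', current_process().pid)  # the child's own pid
          print('child: my parent pid', os.getppid())  # the pid of the child's parent
          time.sleep(2)
          print('child process end')
      
      if __name__ == '__main__':
          p = Process(target=task)
          p.start()
          print('main: pid of the child', p.pid)  # must come after start()
          print('main: my own pid', os.getpid())
          print('main: my parent pid', os.getppid())
          print('main')
      
      '''
      # Remember these three; they are the essentials.
      # Everything is relative to the current process.
      os.getpid()         # pid of the current process
      os.getppid()        # pid of the parent of the current process
      child_process.pid   # pid of a child of the current process (child_process is a Process object)
      '''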

Process safety and synchronization locks (Lock)

When multiple processes run at the same time, errors come easily. In a ticket-grabbing scenario, for example, several processes may see the same remaining ticket and all grab it. To prevent that we need a lock. The multiprocessing module provides two classes, Lock and RLock, which both offer the following two methods for acquiring and releasing the lock:

  • acquire(block=True, timeout=None): requests the Lock or RLock; the timeout parameter specifies the maximum number of seconds to wait for the lock (which helps avoid deadlock). The threading versions spell this acquire(blocking=True, timeout=-1).

  • release(): releases the lock

    The difference between Lock and RLock is as follows:

  • Lock: a mutual-exclusion lock (mutex). It is the basic lock object and can be held only once at a time; any other acquire request must wait until the lock is released.

  • RLock: a reentrant lock, also known as a recursive lock (recursive mutex). The process that holds it can acquire it again multiple times, and must release it the same number of times: acquire() and release() calls must be paired, so after n calls to acquire() you need n calls to release() before the lock is really free. An RLock object maintains a counter to keep track of this.
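
The difference between the two lock types shows up as soon as the holder tries to acquire the lock a second time. The sketch below is an illustration only; try_twice is a hypothetical helper, and the second acquire is non-blocking so that the plain Lock case returns instead of hanging.

```python
from multiprocessing import Lock, RLock


def try_twice(lock):
    """Acquire `lock`, then attempt a second, non-blocking acquire from the same process."""
    lock.acquire()
    second = lock.acquire(block=False)  # do not wait if the lock is already held
    if second:
        lock.release()  # undo the nested acquire (RLock case)
    lock.release()      # undo the first acquire
    return second


if __name__ == '__main__':
    print(try_twice(Lock()))   # False: a plain Lock cannot be taken twice
    print(try_twice(RLock()))  # True: the holder may re-enter an RLock
```

With a blocking second acquire, the plain Lock case would wait forever, which is the simplest form of deadlock.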

Now let's write the ticket-grabbing code:

from multiprocessing import Process,Lock
import json,time,os

# Assumes db.txt holds JSON with a count field, for example {"count": 1}

def search():
    time.sleep(1)  # simulate network I/O
    with open('db.txt',mode='rt',encoding='utf-8') as f:
        res = json.load(f)
        print(f'{res["count"]} tickets left')

def get():
    with open('db.txt',mode='rt',encoding='utf-8') as f:
        res = json.load(f)
    time.sleep(1)  # simulate network I/O
    if res['count'] > 0:
        res['count'] -= 1
        with open('db.txt',mode='wt',encoding='utf-8') as f:
            json.dump(res,f)
            print(f'process {os.getpid()} got a ticket')
        time.sleep(1.5)  # simulate network I/O
    else:
        print('Sold out!')

def task(lock):
    search()

    with lock:  # acquire the lock; it is released even if get() raises
        get()

if __name__ == '__main__':
    lock = Lock()  # created in the main process so that every child gets the same lock
    for i in range(15):
        p = Process(target=task,args=(lock,))
        p.start()
    # A process lock makes the locked section run serially.

Supplementary (for reference): semaphore (Semaphore):

from multiprocessing import Process, current_process, Semaphore
import time

def task(sm):
    sm.acquire()
    print(f'{current_process().name} is running')
    time.sleep(3)
    sm.release()

if __name__ == '__main__':
    sm = Semaphore(5)  # at most 5 processes may hold the semaphore at the same time
    for i in range(15):
        t = Process(target=task,args=(sm,))
        t.start()

Supplementary (for reference): the GIL:

The CPython interpreter has a lock called the GIL (Global Interpreter Lock). The GIL is essentially a mutex.

As a result, within one process only one thread can execute Python bytecode at any moment, so threads cannot take advantage of multiple cores.

Multiple threads in the same process can therefore run concurrently, but not in parallel.

The reason: CPython's memory management is not thread-safe, so the GIL is needed to protect it.
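
One way to see the effect of the GIL is to time a CPU-bound loop run twice in a row and then in two threads. This is a rough sketch, not a benchmark: count_down, run_serial, and run_threaded are hypothetical names, and the timings depend on the machine and on the interpreter build (a free-threaded CPython build would behave differently).

```python
import threading
import time


def count_down(n):
    """CPU-bound loop with no I/O, so it never voluntarily gives up the GIL."""
    while n > 0:
        n -= 1
    return n


def run_serial(n, jobs=2):
    """Run the loop `jobs` times, one after another, and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(jobs):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, jobs=2):
    """Run the loop in `jobs` threads at once and return the elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == '__main__':
    n = 5_000_000
    print(f'serial:   {run_serial(n):.2f}s')
    print(f'threaded: {run_threaded(n):.2f}s')
```

On a standard CPython build the two timings are usually close, since only one thread runs bytecode at a time. Replacing the threads with processes is what allows such work to spread across cores.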

Deadlock

Having talked about locks, we have to mention the classic deadlock problem. Put simply, two parties each hold a lock the other needs, and each waits for the other to release it.

Here is a little story:

Interviewer: Explain deadlock to me simply and clearly, and I will send you the offer.

Candidate: Send me the offer, and I will explain deadlock to you.

Deadlock should not occur in a program, so try to avoid it when writing code. Here are several common ways to prevent deadlock:

  • Use a reentrant lock: use RLock, so that a process re-acquiring a lock it already holds does not block itself.
  • Avoid nested locking: try not to have the same process hold several Locks at once.
  • Lock in the same order: have all processes acquire the locks in the same order.
  • Use timed locking: pass a timeout argument when calling acquire().
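
The timed-locking approach from the list above can be sketched as follows. This is an illustration under assumptions: acquire_both is a hypothetical helper, and the timeout values are arbitrary. Instead of waiting forever for the second lock, the caller gives up and releases the first one, which breaks the circular wait.

```python
from multiprocessing import Lock


def acquire_both(first, second, timeout=0.5):
    """Try to hold both locks; back off instead of waiting forever."""
    if not first.acquire(timeout=timeout):
        return False
    try:
        if not second.acquire(timeout=timeout):
            return False  # give up; `first` is released in the finally clause
        try:
            return True   # the critical section would go here
        finally:
            second.release()
    finally:
        first.release()


if __name__ == '__main__':
    a, b = Lock(), Lock()
    print(acquire_both(a, b))               # True: both locks are free
    b.acquire()                             # simulate another party holding b
    print(acquire_both(a, b, timeout=0.2))  # False: timed out waiting for b
    b.release()
```

A caller that gets False can retry later, ideally after a short random delay so that two competing parties do not keep colliding.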

Using a queue (Queue) for communication between processes

The queue module provides several blocking queues. There are three main queue classes, and they differ in the order in which items come out. These classes are meant for threads within one process; for communication between processes, multiprocessing.Queue offers nearly the same interface as queue.Queue. Briefly:

  • queue.Queue(maxsize=0): a regular FIFO (first in, first out) queue. maxsize limits the size of the queue; once the queue reaches that limit, further puts block until items are consumed. If maxsize is 0 or negative, the queue size is unlimited.
  • queue.LifoQueue(maxsize=0): a LIFO (last in, first out) queue.
  • queue.PriorityQueue(maxsize=0): a priority queue; the item with the lowest priority value comes out first. Priorities are usually numbers, and the smallest number comes out first.

The three queues have essentially the same attributes and methods:

  • put(item, block=True, timeout=None): puts an item into the queue. If the queue is full and block is True, the caller blocks for at most timeout seconds and then raises queue.Full; if timeout is None, it blocks until an item has been consumed. If the queue is full and block is False, queue.Full is raised immediately.
  • get(block=True, timeout=None): removes and returns an item from the queue (consumes it). The arguments work as for put(), and queue.Empty is raised when no item is available.
  • put_nowait(item): equivalent to put(item, block=False).
  • get_nowait(): equivalent to get(block=False).
  • empty(): reports whether the queue is empty.
  • full(): reports whether the queue is full.
  • qsize(): returns the approximate number of items in the queue.

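Example:

# multiprocessing provides its own Queue with the same basic interface
import queue
from multiprocessing import Queue
q = Queue(2)
q.put('1')
q.put('2')

# q.put('3', timeout=1)    # wait 1 second, then raise queue.Full if there is still no room
try:
    q.put('4', block=False)  # the queue is full, so this raises queue.Full at once
except queue.Full:
    print('queue is full')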
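
The ordering difference between the three queue classes described earlier can be sketched in a few lines. This is an illustration only; fill_and_drain is a hypothetical helper, and it runs in a single thread so that empty() is reliable.

```python
import queue


def fill_and_drain(q, values=(3, 1, 2)):
    """Put the values in, then take everything out and return the order seen."""
    for v in values:
        q.put(v)
    out = []
    while not q.empty():
        out.append(q.get())
    return out


if __name__ == '__main__':
    print(fill_and_drain(queue.Queue()))          # [3, 1, 2]  first in, first out
    print(fill_and_drain(queue.LifoQueue()))      # [2, 1, 3]  last in, first out
    print(fill_and_drain(queue.PriorityQueue()))  # [1, 2, 3]  smallest value first
```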

Process pool

Process pools are built on Executor, the base class in the concurrent.futures module. Executor has two subclasses, ProcessPoolExecutor and ThreadPoolExecutor: ProcessPoolExecutor creates a process pool, and ThreadPoolExecutor creates a thread pool.

The purpose of a process pool / thread pool: when the number of concurrent tasks is far greater than the machine can handle, we cannot start that many processes or threads at once. Limiting the number of processes or threads keeps the server from collapsing.

When we use a pool to manage concurrency, we only need to submit the task function to the process pool / thread pool; the pool takes care of the rest.

Executor provides the following common methods:

  • submit(fn, *args, **kwargs): submits the function fn to the pool. *args and **kwargs are the positional and keyword arguments passed to fn.

  • shutdown(wait=True): shuts down the pool

  • map(func, *iterables, timeout=None, chunksize=1): (for reference) similar to the built-in map(func, *iterables), but the calls to func are executed asynchronously by the pool's workers.

    After the program submits a task function to the pool, submit() returns a Future object, which is mainly used to obtain the return value of the task function. The task runs asynchronously, so its result only becomes available at some point in the future, which is why it is represented by a Future.

    Main Future methods:

  • result(timeout=None): returns the final result of the task that the Future represents. If the task has not finished yet, this call blocks the current process; the timeout parameter specifies the maximum number of seconds to block.

  • add_done_callback(fn): registers a callback function for the task that the Future represents. When the task finishes (or is cancelled), fn is called automatically with the Future as its only argument.

    When you are done with a process pool, call its shutdown() method to start the shutdown sequence. After shutdown() is called the pool accepts no new tasks, but it finishes all tasks that were already submitted. Once they have all run, the worker processes in the pool exit.

Example:

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from threading import current_thread
from multiprocessing import current_process
import time


def task(i):
    # print(f'{current_thread().name} is running task {i}')
    print(f'process {current_process().name} is running task {i}')
    time.sleep(1)
    return i * 2


if __name__ == '__main__':
    # pool = ThreadPoolExecutor(4)  # a pool with only 4 threads
    pool = ProcessPoolExecutor(4)  # a pool with only 4 processes
    fu_list = []
    for i in range(20):
        future = pool.submit(task, i)  # 20 tasks, handled by the 4 worker processes
        # print(future.result())  # calling result() here would wait for each task in turn and serialize everything
        fu_list.append(future)
    pool.shutdown()  # closes the pool to new tasks and blocks until all submitted tasks finish
    for fu in fu_list:
        print(fu.result())

The previous program calls the Future's result() method to get the return value of each task. That method blocks the current process; it returns only once the task has finished.

If we do not want result() to block the process, we can register a callback with the Future's add_done_callback() method. The callback has the form fn(future). When the task finishes, the callback is triggered automatically and receives the corresponding Future object as its argument.

The next program uses add_done_callback() to get the return value of each task.

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from threading import current_thread
from multiprocessing import current_process
import time


def task(i):
    # print(f'{current_thread().name} is running task {i}')
    print(f'process {current_process().name} is running task {i}')
    time.sleep(1)
    return i ** 2


def parse(future):
    # handle the result; the task is already done, so result() does not block
    print(future.result())


if __name__ == '__main__':
    # with ThreadPoolExecutor(4) as pool:  # a pool with only 4 threads
    with ProcessPoolExecutor(4) as pool:  # a pool with only 4 processes
        fu_list = []
        for i in range(20):
            future = pool.submit(task, i)  # 20 tasks, handled by the 4 worker processes
            future.add_done_callback(parse)
            # Binds a function to this task; it is triggered when the task finishes
            # and receives the future object as its argument.
            # This is called a callback: once the work is done, the function is called back.

Tip: process pools and thread pools implement the context management protocol, so a with statement can manage the pool and you do not have to shut it down manually.
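
The map() method listed among the Executor methods combines well with a with statement. The sketch below is an illustration and not the author's code: square and squares are hypothetical names, and a ThreadPoolExecutor is used to keep the example small. ProcessPoolExecutor accepts the same calls, provided the task function is defined at module level and the pool is created under the if __name__ == '__main__' guard.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Hypothetical task function."""
    return x * x


def squares(values, workers=4):
    """Apply square() to every value using a pool that is shut down automatically."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order of the inputs, not in completion order
        return list(pool.map(square, values))


if __name__ == '__main__':
    print(squares(range(6)))  # [0, 1, 4, 9, 16, 25]
```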

Also: a process pool is available as well through from multiprocessing.pool import Pool.
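
A minimal sketch of that alternative, assuming a hypothetical task function named double and an arbitrary pool size of 4:

```python
from multiprocessing.pool import Pool


def double(x):
    """Hypothetical task; defined at module level so worker processes can find it."""
    return x * 2


if __name__ == '__main__':
    with Pool(processes=4) as pool:
        # map() blocks until every result is ready and keeps the input order
        print(pool.map(double, range(5)))  # [0, 2, 4, 6, 8]
```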


Thread

Threads are used in much the same way as processes. The main difference between them is:

 The operating system can run multiple tasks at the same time, and each task is a process; a process can run multiple tasks at the same time, and each of those tasks is a thread.

For threads there are two more things to know:

  • Timer (Timer): runs a function after a given delay.
  • Coroutines: implement concurrency within a single thread. Coroutines are an abstraction created by the programmer; the operating system does not know they exist. When one task in the thread hits I/O, the thread switches directly to another task inside itself, so the operating system never sees the thread block, which squeezes maximum efficiency out of a single thread.
    • Advantage: switching under the program's own control is faster than switching by the operating system.
    • Disadvantages: the program has to detect all I/O itself, and a single blocking call stalls every coroutine in the thread; coroutines also cannot take advantage of multiple cores.
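
Both items can be tried with the standard library. The sketch below is an illustration under assumptions: it uses asyncio as one possible coroutine implementation (the article does not name one), and worker, main, and run_timer are hypothetical names. The two coroutines share one thread; while one waits on simulated I/O, the other runs.

```python
import asyncio
import threading


async def worker(name, delay, log):
    """Record start and end around a simulated I/O wait."""
    log.append(f'{name} start')
    await asyncio.sleep(delay)  # hands control back to the event loop while waiting
    log.append(f'{name} end')


async def main():
    log = []
    # Both coroutines run in the same thread and overlap their waits
    await asyncio.gather(worker('a', 0.2, log), worker('b', 0.1, log))
    return log


def run_timer(delay=0.1):
    """Start a threading.Timer and wait for its callback to fire."""
    fired = []
    t = threading.Timer(delay, fired.append, args=('timer fired',))
    t.start()
    t.join()  # Timer is a Thread subclass, so join() waits for the callback
    return fired


if __name__ == '__main__':
    print(asyncio.run(main()))  # ['a start', 'b start', 'b end', 'a end']
    print(run_timer())          # ['timer fired']
```

Task b finishes before task a even though it started later, which shows that the thread switched tasks during the waits. Replacing asyncio.sleep with time.sleep would block the whole thread, which is the disadvantage noted above.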

Origin www.cnblogs.com/Du704/p/11569569.html