Python process operations - the process pool (multiprocessing.Pool)

First, the process pool

Why do we need a process pool? The concept of a process pool.

In practice, a program may have thousands of tasks to perform at busy times, and only sporadic tasks when idle. So when thousands of tasks need to be executed, do we need to create thousands of processes? First, creating a process takes time, and destroying a process also takes time. Second, even if we did open thousands of processes, the operating system could not let them all run at the same time, so doing so would hurt the program's efficiency. Therefore, we cannot open and close processes without limit according to the number of tasks. So what should we do?

Here we introduce the concept of a process pool: define a pool and put a fixed number of processes in it. When a request arrives, take a process from the pool to handle the task; when the task is finished, the process is not shut down but is returned to the pool to wait for the next task. If there are many tasks to perform and the number of processes in the pool is not enough, the remaining tasks must wait until a process finishes its current task and becomes idle again. In other words, the number of processes in the pool is fixed, so at any moment at most that fixed number of processes is running. This does not increase the operating system's scheduling burden, saves the time spent opening and closing processes, and still achieves concurrency to a certain extent.

Second, concept introduction - multiprocessing.Pool

Pool([processes[, initializer[, initargs]]]): creates a process pool

Third, parameter usage

  1. processes: the number of worker processes to create; if omitted, it defaults to the value of os.cpu_count()
  2. initializer: a callable that each worker process executes on startup; defaults to None (see the sketch after this list)
  3. initargs: a tuple of arguments passed to initializer
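
A minimal sketch of how initializer and initargs behave (the names init_worker, square, and the greeting argument are illustrative assumptions, not from the original text): each worker process runs init_worker once as it starts up.

import os
from multiprocessing import Pool

def init_worker(greeting):
    # runs once in every worker process at startup
    print('%s: worker pid %s started' % (greeting, os.getpid()))

def square(n):
    return n ** 2

if __name__ == '__main__':
    # initargs must be a tuple, even for a single argument
    p = Pool(2, initializer=init_worker, initargs=('hello',))
    print(p.map(square, range(5)))  # [0, 1, 4, 9, 16]
    p.close()
    p.join()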

Fourth, the main methods

p.apply(func[, args[, kwargs]]): executes func(*args, **kwargs) in one of the pool's worker processes and returns the result. It should be emphasized that this does not execute func concurrently across all the pool's workers. To run func concurrently with different arguments, either call p.apply() from different threads or use p.apply_async().

p.apply_async(func[, args[, kwargs[, callback]]]): executes func(*args, **kwargs) in one of the pool's worker processes and returns immediately. The return value is an instance of the AsyncResult class. callback is a callable that takes a single argument: when func's result becomes available, it is passed to callback. callback must not perform any blocking operation, or it will hold up the delivery of results from other asynchronous operations.

p.close(): closes the pool, preventing any further tasks from being submitted. Once all pending tasks have completed, the worker processes exit.

p.join(): waits for all worker processes to exit. This method may only be called after close() or terminate().

Fifth, other methods (good to know)

The apply_async() and map_async() methods return an AsyncResult instance obj, which has the following methods (a short demonstration follows the list):

obj.get([timeout]): returns the result, waiting for it to arrive if necessary. timeout is optional; if the result does not arrive within the specified time, multiprocessing.TimeoutError is raised. If the remote call raised an exception, that exception is re-raised when this method is called.

obj.ready(): returns True if the call has completed.

obj.successful(): returns True if the call completed without raising an exception; if this method is called before the result is ready, it raises an exception.

obj.wait([timeout]): waits for the result to become available.

p.terminate(): immediately terminates all worker processes without performing any cleanup or finishing pending work (note this is a method of the pool p itself, not of the AsyncResult object). If p is garbage-collected, terminate() is called automatically.
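
A short demonstration of these AsyncResult methods, as a sketch (the function name slow_square is an illustrative assumption): get() raises multiprocessing.TimeoutError when the timeout expires.

import time
from multiprocessing import Pool, TimeoutError

def slow_square(n):
    time.sleep(2)
    return n ** 2

if __name__ == '__main__':
    p = Pool(1)
    obj = p.apply_async(slow_square, args=(3,))
    print(obj.ready())        # False: the task is still running
    try:
        obj.get(timeout=1)    # the result does not arrive within 1 second
    except TimeoutError:
        print('not ready yet')
    obj.wait()                # block until the result becomes available
    print(obj.ready())        # True
    print(obj.successful())   # True: no exception was raised
    print(obj.get())          # 9
    p.close()
    p.join()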

Sixth, code examples - multiprocessing.Pool

6.1 Synchronous calls

import os,time
from multiprocessing import Pool

def work(n):
    print('%s run' %os.getpid())
    time.sleep(3)
    return n**2

if __name__ == '__main__':
    p=Pool(3) # the pool creates three processes; these same three processes then execute all the tasks
    res_l=[]
    for i in range(10):
        res=p.apply(work,args=(i,)) # synchronous call: wait here until this task finishes and res is obtained;
                                    # the task work may or may not block internally, but either way
                                    # a synchronous call waits in place until it completes
        res_l.append(res)
    print(res_l)

6.2 Asynchronous calls

import os
import time
import random
from multiprocessing import Pool

def work(n):
    print('%s run' %os.getpid())
    time.sleep(random.random())
    return n**2

if __name__ == '__main__':
    p=Pool(3) # the pool creates three processes; these same three processes then execute all the tasks
    res_l=[]
    for i in range(10):
        res=p.apply_async(work,args=(i,)) # asynchronous call: with three processes in the pool, at most three
                                          # child processes execute at any one time; when a result comes back,
                                          # it is stored and the process returns to the pool to take a new task.
                                          # Note that the three processes do not all start or finish together:
                                          # as each finishes a task it is freed to pick up the next one.
        res_l.append(res)

    # apply_async usage: with asynchronously submitted tasks, the main process must call close() and join()
    # to wait until all tasks in the pool are done, after which get() can collect the results.
    # Otherwise the main process may exit while the pool has not yet run the tasks, taking the pool down with it.
    p.close()
    p.join()
    for res in res_l:
        print(res.get()) # use get() to fetch the result of apply_async; apply has no get() method, because
                         # apply runs synchronously and returns its result immediately, so no get() is needed

Seventh, a process-pool version of concurrent socket chat

7.1 Server

#The number of processes in a Pool defaults to the number of CPU cores; suppose it is 4 (check with os.cpu_count())
#If 6 clients are opened, 2 of them will be seen waiting
#Printing the pid inside each process shows only 4 distinct pids, i.e. multiple clients share the 4 processes
from socket import *
from multiprocessing import Pool
import os

server=socket(AF_INET,SOCK_STREAM)
server.setsockopt(SOL_SOCKET,SO_REUSEADDR,1)
server.bind(('127.0.0.1',8080))
server.listen(5)

def talk(conn):
    print('process pid: %s' %os.getpid())
    while True:
        try:
            msg=conn.recv(1024)
            if not msg:break
            conn.send(msg.upper())
        except Exception:
            break

if __name__ == '__main__':
    p=Pool(4)
    while True:
        conn,*_=server.accept()
        p.apply_async(talk,args=(conn,))
        # p.apply(talk,args=(conn,)) # with a synchronous call, only one client can be served at a time

7.2 Client

from socket import *

client=socket(AF_INET,SOCK_STREAM)
client.connect(('127.0.0.1',8080))


while True:
    msg=input('>>: ').strip()
    if not msg:continue

    client.send(msg.encode('utf-8'))
    msg=client.recv(1024)
    print(msg.decode('utf-8'))

Observation: with multiple clients connected concurrently, the server shows only four distinct pids at any one time; only when one client disconnects can another waiting client be served.

Eighth, the callback function

Scenario that calls for a callback: as soon as any task in the pool finishes, it should immediately notify the main process: "I am done; you can handle my result now." The main process then calls a function to process that result, and that function is the callback.

We can put the time-consuming (blocking) tasks into the process pool and specify a callback function that the main process is responsible for executing. The main process then skips the I/O step and receives each task's result directly.

8.1 Using multiple processes to cut the time wasted waiting on the network during URL requests

from multiprocessing import Pool
import requests
import os

def get_page(url):
    print('<process %s> get %s' %(os.getpid(),url))
    response=requests.get(url)
    if response.status_code == 200:
        return {'url':url,'text':response.text}

def parse_page(res):
    print('<process %s> parse %s' %(os.getpid(),res['url']))
    parse_res='url:<%s> size:[%s]\n' %(res['url'],len(res['text']))
    with open('db.txt','a') as f:
        f.write(parse_res)


if __name__ == '__main__':
    urls=[
        'https://www.baidu.com',
        'https://www.python.org',
        'https://www.openstack.org',
        'https://help.github.com/',
        'http://www.sina.com.cn/'
    ]

    p=Pool(3)
    res_l=[]
    for url in urls:
        res=p.apply_async(get_page,args=(url,),callback=parse_page)
        res_l.append(res)

    p.close()
    p.join()
    print([res.get() for res in res_l]) # these are get_page's results; collecting them is actually unnecessary here,
                                        # since each result has already been handed to the callback for processing

'''
Printed output:
<process 3388> get https://www.baidu.com
<process 3389> get https://www.python.org
<process 3390> get https://www.openstack.org
<process 3388> get https://help.github.com/
<process 3387> parse https://www.baidu.com
<process 3389> get http://www.sina.com.cn/
<process 3387> parse https://www.python.org
<process 3387> parse https://help.github.com/
<process 3387> parse http://www.sina.com.cn/
<process 3387> parse https://www.openstack.org
[{'url': 'https://www.baidu.com', 'text': '<!DOCTYPE html>\r\n...',...}]
'''
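
As a side note on callbacks: in Python 3, apply_async also accepts an error_callback parameter, a callable that receives the exception instance if func raises instead of returning. A minimal sketch (the names fail and on_error are illustrative assumptions):

from multiprocessing import Pool

def fail(n):
    raise ValueError('bad input: %s' % n)

def on_error(exc):
    # runs in the main process and receives the exception raised in the worker
    print('task failed:', exc)

if __name__ == '__main__':
    p = Pool(2)
    p.apply_async(fail, args=(1,), error_callback=on_error)
    p.close()
    p.join()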

8.2 A crawler example

import re
from urllib.request import urlopen
from multiprocessing import Pool

def get_page(url,pattern):
    response=urlopen(url).read().decode('utf-8')
    return pattern,response

def parse_page(info):
    pattern,page_content=info
    res=re.findall(pattern,page_content)
    for item in res:
        dic={
            'index':item[0].strip(),
            'title':item[1].strip(),
            'actor':item[2].strip(),
            'time':item[3].strip(),
        }
        print(dic)
if __name__ == '__main__':
    regex = r'<dd>.*?<.*?class="board-index.*?>(\d+)</i>.*?title="(.*?)".*?class="movie-item-info".*?<p class="star">(.*?)</p>.*?<p class="releasetime">(.*?)</p>'
    pattern1=re.compile(regex,re.S)

    url_dic={
        'http://maoyan.com/board/7':pattern1,
    }

    p=Pool()
    res_l=[]
    for url,pattern in url_dic.items():
        res=p.apply_async(get_page,args=(url,pattern),callback=parse_page)
        res_l.append(res)

    p.close()
    p.join()
    for i in res_l:
        i.get() # the callback parse_page has already handled the parsed entries;
                # get() also re-raises any exception that occurred in the worker

Ninth, when no callback function is needed

If the main process simply waits until all tasks in the pool have finished and then collects the results for unified processing, no callback function is needed.

from multiprocessing import Pool
import time,random,os

def work(n):
    time.sleep(1)
    return n**2
if __name__ == '__main__':
    p=Pool()

    res_l=[]
    for i in range(10):
        res=p.apply_async(work,args=(i,))
        res_l.append(res)

    p.close()
    p.join() #wait until all the processes in the pool have finished

    nums=[]
    for res in res_l:
        nums.append(res.get()) #collect all the results
    print(nums) #the main process now has all the results and can process them together
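
For this submit-everything-then-collect pattern, Pool.map is a more concise equivalent: it blocks until all tasks are done and returns the results in submission order. A minimal sketch reusing the work function above:

from multiprocessing import Pool
import time

def work(n):
    time.sleep(1)
    return n**2

if __name__ == '__main__':
    with Pool() as p:  # the pool is terminated automatically when the with-block exits
        nums = p.map(work, range(10))
    print(nums) # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]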

Origin www.cnblogs.com/Lin2396/p/11568390.html