Python process pools and callback functions

First, data sharing

1. Inter-process communication should try to avoid sharing data directly.

2. Data in different processes is independent; processes can communicate by means of a queue or a pipe, both of which are based on message passing.

Although process data is independent, a Manager can be used to share data; in fact, Manager can do much more than that.

A command is a program; press Enter and it executes (this applies on Windows):
tasklist                    # list processes
tasklist | findstr pycharm  # findstr filters the output; | is the pipe (tasklist's output
                            # goes into the pipe, and findstr pycharm after the pipe receives it)

3. There are two ways to achieve inter-process communication (IPC): pipes and queues.
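Neither mechanism is demonstrated in the original post, so here is a minimal sketch of queue-based IPC (a pipe works similarly via multiprocessing.Pipe); the producer function and the message are made up for the illustration:

from multiprocessing import Process, Queue

def producer(q):
    q.put('hello')  # illustrative message: put it into the shared queue

if __name__ == '__main__':
    q = Queue()
    p = Process(target=producer, args=(q,))
    p.start()
    print(q.get())  # the parent blocks here until the child's message arrives
    p.join()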

from multiprocessing import Manager, Process, Lock
def work(dic, mutex):
    # mutex.acquire()
    # dic['count'] -= 1
    # mutex.release()
    # the lock can also be taken like this:
    with mutex:
        dic['count'] -= 1
if __name__ == '__main__':
    mutex = Lock()
    m = Manager()  # shared data; since the dictionary is shared, a lock must be added
    share_dic = m.dict({'count': 100})
    p_l = []
    for i in range(100):
        p = Process(target=work, args=(share_dic, mutex))
        p_l.append(p)  # append it first
        p.start()
    for i in p_l:
        i.join()
    print(share_dic)
# sharing means there will be competition

Second, the process pool

When using Python for system administration, especially when operating on many files in a directory or on multiple remote hosts at the same time, parallel operation can save a lot of time. Multiprocessing is one way to achieve this concurrency, but there are some points to note:

  1. The number of tasks that need to run concurrently is usually much larger than the number of CPU cores
  2. An operating system cannot open an unlimited number of processes; usually the number of processes should match the number of cores
  3. Opening too many processes actually lowers efficiency (each process takes up system resources, and processes beyond the number of cores cannot truly run in parallel)

For example, when the number of objects to operate on is small, you can use Process from multiprocessing to spawn a handful of processes dynamically; a dozen or so is fine. But if there are hundreds or thousands, limiting the number of processes by hand becomes far too cumbersome. This is where a process pool comes in.

So what is a process pool? A process pool is a way to control the number of processes.

 ps: For advanced applications such as remote procedure calls, a process pool should be used. Pool provides a specified number of processes for the user to call. When a new request is submitted to the pool, if the pool is not yet full, a new process is created to execute the request; but if the number of processes in the pool has already reached the specified maximum, the request waits until a process in the pool finishes, and that process is then reused.

Process pool structure:

 Creating the process pool class: if numprocess is specified as 3, the pool creates three processes at the start and then uses those same three processes to execute all tasks; no other processes are opened.

1. Create a process pool

Pool([numprocess [, initializer [, initargs]]]): create a process pool

2. Parameter Description

numprocess: the number of processes to create; if omitted, it defaults to the value of cpu_count(), which can be checked with os.cpu_count()
initializer: a callable that each worker process executes when it starts; defaults to None
initargs: the tuple of arguments to pass to initializer
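initializer and initargs are not used in the examples that follow, so here is a small illustrative sketch of what they do; the init and square functions are made up for the illustration:

from multiprocessing import Pool
import os

def init(tag):
    # runs once in each worker process as it starts (illustrative only)
    print('worker %s started with tag %r' % (os.getpid(), tag))

def square(n):
    return n ** 2

if __name__ == '__main__':
    p = Pool(2, initializer=init, initargs=('demo',))
    print(p.map(square, range(5)))  # [0, 1, 4, 9, 16]
    p.close()
    p.join()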

3. Introduction to the methods

p.apply(func [, args [, kwargs]]): execute func(*args, **kwargs) in one of the pool's
worker processes and return the result.
It must be stressed that this does NOT run func concurrently across all worker processes.
To execute func concurrently with different arguments, you must call p.apply()
from different threads, or use p.apply_async().


p.apply_async(func [, args [, kwargs]]): execute func(*args, **kwargs) in one of the pool's worker processes. The return value is an instance of the AsyncResult class.
callback is a callable that takes one input argument. When the result of func becomes
available, it is passed to callback immediately. callback must not perform any blocking
operation, or it will hold up the results of other asynchronous operations.

p.close(): close the process pool to prevent further operations. No more tasks may be submitted to the pool (note that task submission must be written above close())

p.join(): wait for all worker processes to exit. This method may only be called after close() or terminate()

Application 1:

from multiprocessing import Pool
import os, time
def task(n):
    print('[%s] is running' % os.getpid())
    time.sleep(2)
    print('[%s] is done' % os.getpid())
    return n ** 2
if __name__ == '__main__':
    # print(os.cpu_count())  # check the number of CPUs
    p = Pool(4)  # at most four processes
    for i in range(1, 7):  # submit six tasks
        res = p.apply(task, args=(i,))  # synchronous: wait for one task to finish before running the next
        print('this task finished: %s' % res)
    p.close()  # no more tasks may be added to the pool
    p.join()  # wait for the pool's processes to finish
    print('main')

# ----------------
# So why do we use a process pool? Because a process pool controls the number of
# processes: we only open as many as we need. Achieving concurrency without a pool
# would open a great many processes, and if you open an especially large number,
# your machine slows right down. With a pool we control the processes, opening
# only as many as needed, so we do not take up too much memory.
from multiprocessing import Pool
import os, time
def walk(n):
    print('task [%s] running...' % os.getpid())
    time.sleep(3)
    return n ** 2
if __name__ == '__main__':
    p = Pool(4)
    res_obj_l = []
    for i in range(10):
        res = p.apply_async(walk, args=(i,))
        # print(res)  # prints the AsyncResult object
        res_obj_l.append(res)  # what we have now is a list of objects; how do we get the values? Use the .get() method
    p.close()  # no more tasks may be added to the pool
    p.join()
    # print(res_obj_l)
    print([obj.get() for obj in res_obj_l])  # this is how the values are obtained

So what is synchronous, and what is asynchronous?

Synchronous means that when a process makes a request, if the request takes some time to return information, the process waits until it receives the return information before continuing execution.

Asynchronous means the process does not need to keep waiting; it carries on with what follows, regardless of the state of other processes. When a message comes back, the system notifies the process to handle it, which improves execution efficiency.
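To make the difference concrete, here is a rough sketch (not from the original post) that times the same four one-second tasks submitted both ways; the timings in the comments are approximate:

from multiprocessing import Pool
import time

def task(n):
    time.sleep(1)  # stand-in for real work
    return n

if __name__ == '__main__':
    p = Pool(4)

    start = time.time()
    for i in range(4):
        p.apply(task, args=(i,))  # synchronous: one task at a time
    print('sync:  %.1fs' % (time.time() - start))  # roughly 4 seconds

    start = time.time()
    results = [p.apply_async(task, args=(i,)) for i in range(4)]  # asynchronous
    print([r.get() for r in results])
    print('async: %.1fs' % (time.time() - start))  # roughly 1 second

    p.close()
    p.join()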

And what is serial, and what is parallel?

For example: several cars driving side by side can be called "parallel"; only one car driving at a time is "serial". Clearly, parallel is much faster than serial. (Parallel tasks are independent of each other; serial tasks wait for one another, one after the other.)

Application 2:

Use a process pool to maintain a fixed number of processes (improving the earlier client and server).

from socket import *
from multiprocessing import Pool
s = socket(AF_INET, SOCK_STREAM)
s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)  # reuse the port
s.bind(('127.0.0.1', 8081))
s.listen(5)
print('start running...')
def talk(coon, addr):
    while True:  # communication loop
        try:
            cmd = coon.recv(1024)
            print(cmd.decode('utf-8'))
            if not cmd: break
            coon.send(cmd.upper())
            print('sent: %s' % cmd.upper().decode('utf-8'))
        except Exception:
            break
    coon.close()
if __name__ == '__main__':
    p = Pool(4)
    while True:  # accept loop
        coon, addr = s.accept()
        print(coon, addr)
        p.apply_async(talk, args=(coon, addr))
    s.close()
    # because this is an infinite loop, there is no need for p.join()

from socket import *
c = socket(AF_INET, SOCK_STREAM)
c.connect(('127.0.0.1', 8081))
while True:
    cmd = input('>>:').strip()
    if not cmd: continue
    c.send(cmd.encode('utf-8'))
    data = c.recv(1024)
    print('received: %s' % data.decode('utf-8'))
c.close()

 

Third, the callback function

When should a callback function be used? (Callbacks are most commonly used in crawlers.)
Producing the data is very time-consuming;
processing the data is not time-consuming.


If a download of an address completes, the main process is automatically notified to parse it;
whichever task finishes first notifies the parse function to parse it (this is the power of the callback function).

Scenario that needs a callback: as soon as any task in the process pool finishes, it immediately informs the main process: "I am done, you can handle my result." The main process then calls a function to process that result; that function is the callback.

We can put the time-consuming (blocking) tasks into the process pool and then specify a callback function (which the main process is responsible for executing). That way, when the main process executes the callback it skips the I/O step and gets the task's result directly.

from multiprocessing import Pool
import requests
import os
import time
def get_page(url):
    print('<%s> is getting [%s]' % (os.getpid(), url))
    response = requests.get(url)  # fetch the address
    time.sleep(2)
    print('<%s> is done [%s]' % (os.getpid(), url))
    return {'url': url, 'text': response.text}
def parse_page(res):
    '''parse function'''
    print('<%s> parse [%s]' % (os.getpid(), res['url']))
    with open('db.txt', 'a') as f:
        parse_res = 'url:%s size:%s\n' % (res['url'], len(res['text']))
        f.write(parse_res)
if __name__ == '__main__':
    p = Pool(4)
    urls = [
        'https://www.baidu.com',
        'http://www.openstack.org',
        'https://www.python.org',
        'https://help.github.com/',
        'http://www.sina.com.cn/'
    ]
    for url in urls:
        obj = p.apply_async(get_page, args=(url,), callback=parse_page)
    p.close()
    p.join()
    print('main', os.getpid())  # no need for the .get() method here

 If the main process simply waits for all tasks in the pool to finish and then collects the results itself, there is no need for a callback function.

from multiprocessing import Pool
import requests
import os
def get_page(url):
    print('<%s> get [%s]' % (os.getpid(), url))
    response = requests.get(url)  # fetch the address; response is the HTTP response
    return {'url': url, 'text': response.text}
if __name__ == '__main__':
    p = Pool(4)
    urls = [
        'https://www.baidu.com',
        'http://www.openstack.org',
        'https://www.python.org',
        'https://help.github.com/',
        'http://www.sina.com.cn/'
    ]
    obj_l = []
    for url in urls:
        obj = p.apply_async(get_page, args=(url,))
        obj_l.append(obj)
    p.close()
    p.join()
    print([obj.get() for obj in obj_l])

 
