Python processes

Process creation - fork

1. Process vs Program

Code that has been written but is not currently running is called a program.

Once that code is running, it becomes a process.

A process contains not only the code but also the runtime environment and other resources it needs to run, which is what distinguishes it from a program.

2. fork()

Python's os module wraps common system calls, including fork, which makes it easy to create child processes in Python programs:

    import os

    # Note: the fork function is only available on Unix/Linux/macOS, not on Windows
    pid = os.fork()

    if pid == 0:
        print('haha1')
    else:
        print('haha2')

Output (the order of the two lines may vary):

haha1
haha2

Explanation:

  • When the program reaches os.fork(), the operating system creates a new process (the child process) and copies all of the parent process's information into it
  • Both the parent and the child then return from the fork() call: the return value is always 0 in the child process, and the PID of the child in the parent process

Unix/Linux operating systems provide a fork() system call, which is special.

An ordinary function is called once and returns once, but fork() is called once and returns twice, because the operating system automatically makes a copy of the current process (the parent process) as a new process (the child process) and then returns in both the parent and the child.

The child process always gets a return value of 0, while the parent process gets the PID of the child.

The reason is that a parent process can fork many child processes, so the parent needs to record the PID of each child, while a child process only needs to call getppid() to get the PID of its parent.

3. getpid() and getppid()


import os

rpid = os.fork()
if rpid < 0:
    print("fork call failed.")
elif rpid == 0:
    print("I am the child process (%s), my parent process is (%s)" % (os.getpid(), os.getppid()))
else:
    print("I am the parent process (%s), my child process is (%s)" % (os.getpid(), rpid))

print("Both the parent and the child process run this code")

Output:

I am the parent process (19360), my child process is (19361)
Both the parent and the child process run this code
I am the child process (19361), my parent process is (19360)
Both the parent and the child process run this code


Multiple processes modifying a global variable

#coding=utf-8
import os
import time

num = 0

# Note: the fork function is only available on Unix/Linux/macOS, not on Windows
pid = os.fork()
if pid == 0:
    num += 1
    print('haha1---num=%d' % num)
else:
    time.sleep(1)
    num += 1
    print('haha2---num=%d' % num)

Output:

haha1---num=1
haha2---num=1

Summary:

  • In multi-processing, each process keeps its own copy of all data (including global variables), so the processes do not affect each other

Multiple fork calls

If a program calls the fork function twice, will there be three processes?


#coding=utf-8
import os
import time

# Note: the fork function is only available on Unix/Linux/macOS, not on Windows
pid = os.fork()
if pid == 0:
    print('haha1')
else:
    print('haha2')

pid = os.fork()
if pid == 0:
    print('haha3')
else:
    print('haha4')

time.sleep(1)

Explanation:

Both the original parent process and its first child execute the second fork() call, so two fork() calls produce four processes in total, not three: 'haha1' and 'haha2' are each printed once, while 'haha3' and 'haha4' are each printed twice.

Execution order of parent and child processes

The execution order of the parent and child processes is not fixed; it depends entirely on the operating system's scheduling algorithm. If the parent must only continue after the child has finished, it can wait for the child explicitly, as in the sketch below.
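A minimal sketch of such explicit waiting, assuming a Unix-like system (it relies on os.fork() and os.wait()):

#coding=utf-8
import os

pid = os.fork()
if pid == 0:
    # child process: does its work and exits
    print("child (%s) doing its work" % os.getpid())
    os._exit(0)  # exit the child explicitly, skipping normal cleanup
else:
    # parent process: os.wait() blocks until a child terminates,
    # so the line below is always printed after the child's output
    finished_pid, status = os.wait()
    print("parent (%s): child %s exited with status %s" % (os.getpid(), finished_pid, status))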

multiprocessing

If you plan to write a multi-process server program, Unix/Linux is undoubtedly the right choice. But since Windows has no fork call, does that mean we cannot write multi-process programs in Python on Windows?

Since Python is cross-platform, it naturally provides cross-platform support for multiple processes: the multiprocessing module is the cross-platform way to work with multiple processes.

The multiprocessing module provides a Process class to represent a process object. The following example demonstrates starting a child process and waiting for it to finish:

#coding=utf-8
from multiprocessing import Process
import os

# code to be executed by the child process
def run_proc(name):
    print('Child process running, name=%s, pid=%d...' % (name, os.getpid()))

if __name__ == '__main__':
    print('Parent process %d.' % os.getpid())
    p = Process(target=run_proc, args=('test',))
    print('Child process is about to start')
    p.start()
    p.join()
    print('Child process has finished')

Explanation:

  • To create a child process, you only need to pass in a function and its arguments, create a Process instance, and start it with the start() method. This makes creating a process simpler than using fork().
  • The join() method waits for the child process to finish before the parent continues; it is usually used to synchronize processes.

The syntax structure of Process is as follows:

Process([group [, target [, name [, args [, kwargs]]]]])

  • target: the callable object that this process instance will run;

  • args: a tuple of positional arguments for the callable;

  • kwargs: a dictionary of keyword arguments for the callable;

  • name: an alias for this process instance;

  • group: almost never used;

Common methods of the Process class:

  • is_alive(): return whether the process instance is still running;

  • join([timeout]): wait for the process instance to finish, or for at most timeout seconds;

  • start(): start the process instance (create the child process);

  • run(): if no target argument is given, calling start() on the object executes the object's run() method;

  • terminate(): terminate the process immediately, whether or not its task is finished;

Common properties of the Process class:

  • name: the alias of the current process instance; defaults to Process-N, where N is an integer starting from 1;

  • pid: the PID of the current process instance;

Example 1


from multiprocessing import Process
import os
from time import sleep

# code to be executed by the child process
def run_proc(name, age, **kwargs):
    for i in range(10):
        print('Child process running, name=%s, age=%d, pid=%d...' % (name, age, os.getpid()))
        print(kwargs)
        sleep(0.5)

if __name__ == '__main__':
    print('Parent process %d.' % os.getpid())
    p = Process(target=run_proc, args=('test', 18), kwargs={"m": 20})
    print('Child process is about to start')
    p.start()
    sleep(1)
    p.terminate()
    p.join()
    print('Child process has finished')

Output:

Parent process 21378.
Child process is about to start
Child process running, name=test, age=18, pid=21379...
{'m': 20}
Child process running, name=test, age=18, pid=21379...
{'m': 20}
Child process has finished

Example 2


#coding=utf-8
from multiprocessing import Process
import time
import os

# the two functions the two child processes will run
def worker_1(interval):
    print("worker_1, parent process (%s), current process (%s)" % (os.getppid(), os.getpid()))
    t_start = time.time()
    time.sleep(interval)  # the process sleeps for `interval` seconds
    t_end = time.time()
    print("worker_1 ran for '%0.2f' seconds" % (t_end - t_start))

def worker_2(interval):
    print("worker_2, parent process (%s), current process (%s)" % (os.getppid(), os.getpid()))
    t_start = time.time()
    time.sleep(interval)
    t_end = time.time()
    print("worker_2 ran for '%0.2f' seconds" % (t_end - t_start))

# print the PID of the current program
print("process ID: %s" % os.getpid())

# Create two process objects. target is the callable the process will run and
# args is the tuple of arguments passed to it; worker_1 takes a single
# interval argument, so the integer 2 is passed here. If the name argument is
# not given, the process object is named Process-N, where N is an increasing integer.
p1 = Process(target=worker_1, args=(2,))
p2 = Process(target=worker_2, name="dongGe", args=(1,))

# "process object.start()" creates and runs a child process; after start(),
# the two process objects run worker_1 and worker_2 respectively
p1.start()
p2.start()

# meanwhile the parent keeps running; if p2 is still running, this prints True
print("p2.is_alive=%s" % p2.is_alive())

# print the aliases and pids of p1 and p2
print("p1.name=%s" % p1.name)
print("p1.pid=%s" % p1.pid)
print("p2.name=%s" % p2.name)
print("p2.pid=%s" % p2.pid)

# join() without an argument makes the parent wait here until p1 has finished
# before running the statements below; it is usually used to synchronize data
# between processes. Without this line, the is_alive check below would print
# True. Try changing this line to p1.join(1): p1 needs about 2 seconds to
# finish, so waiting only 1 second is usually not enough and the print below
# would show True, i.e. p1 is still running.
p1.join()
print("p1.is_alive=%s" % p1.is_alive())

Output:

process ID: 19866
p2.is_alive=True
p1.name=Process-1
p1.pid=19867
p2.name=dongGe
p2.pid=19868
worker_1, parent process (19866), current process (19867)
worker_2, parent process (19866), current process (19868)
worker_2 ran for '1.00' seconds
worker_1 ran for '2.00' seconds
p1.is_alive=False

Process creation - Process subclass

Processes can also be created with a class-based approach: define a class that inherits from Process, and each instance of that class is a process object. See the following example:

from multiprocessing import Process
import time
import os

# subclass Process
class Process_Class(Process):
    # Process itself has an __init__ method, and this subclass overrides it.
    # If Process were not fully initialized, the methods and attributes
    # inherited from it could not be used, so the subclass must call
    # Process.__init__ to complete that initialization.
    def __init__(self, interval):
        Process.__init__(self)
        self.interval = interval

    # override the run() method of Process
    def run(self):
        print("child process (%s) starts running, parent process is (%s)" % (os.getpid(), os.getppid()))
        t_start = time.time()
        time.sleep(self.interval)
        t_stop = time.time()
        print("(%s) finished, took %0.2f seconds" % (os.getpid(), t_stop - t_start))

if __name__ == "__main__":
    t_start = time.time()
    print("current program process (%s)" % os.getpid())
    p1 = Process_Class(2)
    # calling start() on a Process object that has no target runs the class's
    # run() method, so this executes p1.run()
    p1.start()
    p1.join()
    t_stop = time.time()
    print("(%s) finished, took %0.2f" % (os.getpid(), t_stop - t_start))

Process pool - Pool

When the number of child processes to create is small, you can use the Process class in multiprocessing to generate them directly. But when there are hundreds or even thousands of tasks, creating processes by hand becomes a huge amount of work, and you can turn to the Pool class provided by the multiprocessing module.

When a Pool is initialized, a maximum number of processes can be specified. When a new request is submitted to the Pool and the pool is not yet full, a new process is created to execute the request; if the number of processes in the pool has already reached the specified maximum, the request waits until a process in the pool finishes and becomes available to execute it. See the following example:

from multiprocessing import Pool
import os,time,random

def worker(msg):
    t_start = time.time()
    print("%s starts, process id %d" % (msg, os.getpid()))
    # random.random() returns a random float between 0 and 1
    time.sleep(random.random() * 2)
    t_stop = time.time()
    print(msg, "finished, took %0.2f" % (t_stop - t_start))

po = Pool(3)  # create a process pool with a maximum of 3 processes
for i in range(0, 10):
    # Pool.apply_async(callable, (argument tuple,))
    # each iteration dispatches the task to an idle worker process
    po.apply_async(worker, (i,))

print("----start----")
po.close()  # close the pool; po accepts no new requests after this
po.join()   # wait for all child processes in po to finish; must come after close()
print("-----end-----")

Output:

----start----
0 starts, process id 21466
1 starts, process id 21468
2 starts, process id 21467
0 finished, took 1.01
3 starts, process id 21466
2 finished, took 1.24
4 starts, process id 21467
3 finished, took 0.56
5 starts, process id 21466
1 finished, took 1.68
6 starts, process id 21468
4 finished, took 0.67
7 starts, process id 21467
5 finished, took 0.83
8 starts, process id 21466
6 finished, took 0.75
9 starts, process id 21468
7 finished, took 1.03
8 finished, took 1.05
9 finished, took 1.69
-----end-----

Common methods of multiprocessing.Pool:

  • apply_async(func[, args[, kwds]]): call func in a non-blocking way (tasks run in parallel; the blocking mode, by contrast, waits for the previous task to finish before starting the next). args is a tuple of positional arguments passed to func, kwds is a dictionary of keyword arguments passed to func. A short sketch after this list shows how to collect results from apply_async;

  • apply(func[, args[, kwds]]): call func in a blocking way;

  • close(): close the Pool so that it accepts no new tasks;

  • terminate(): terminate the worker processes immediately, whether or not their tasks are finished;

  • join(): the main process blocks waiting for the child processes to exit; it must be called after close() or terminate();
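apply_async() returns an AsyncResult object, which can be used to retrieve whatever the worker function returned. A minimal sketch (the square function and the pool size of 3 are arbitrary choices for illustration):

from multiprocessing import Pool

def square(x):
    # a trivial task whose return value we want back in the parent
    return x * x

if __name__ == '__main__':
    po = Pool(3)
    # submit the tasks without blocking and keep the AsyncResult handles
    results = [po.apply_async(square, (i,)) for i in range(5)]
    po.close()
    po.join()
    # AsyncResult.get() returns the value the worker function returned
    print([r.get() for r in results])  # [0, 1, 4, 9, 16]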

apply (blocking)

from multiprocessing import Pool
import os,time,random

def worker(msg):
    t_start = time.time()
    print("%s starts, process id %d" % (msg, os.getpid()))
    # random.random() returns a random float between 0 and 1
    time.sleep(random.random() * 2)
    t_stop = time.time()
    print(msg, "finished, took %0.2f" % (t_stop - t_start))

po = Pool(3)  # create a process pool with a maximum of 3 processes
for i in range(0, 10):
    po.apply(worker, (i,))

print("----start----")
po.close()  # close the pool; po accepts no new requests after this
po.join()   # wait for all child processes in po to finish; must come after close()
print("-----end-----")

Output:

0 starts, process id 21532
0 finished, took 1.91
1 starts, process id 21534
1 finished, took 1.72
2 starts, process id 21533
2 finished, took 0.50
3 starts, process id 21532
3 finished, took 1.27
4 starts, process id 21534
4 finished, took 1.05
5 starts, process id 21533
5 finished, took 1.60
6 starts, process id 21532
6 finished, took 0.25
7 starts, process id 21534
7 finished, took 0.63
8 starts, process id 21533
8 finished, took 1.21
9 starts, process id 21532
9 finished, took 0.60
----start----
-----end-----

Interprocess communication - Queue

Processes sometimes need to communicate with each other, and the operating system provides many mechanisms for inter-process communication.

1. Use of Queue

The Queue class of the multiprocessing module can be used to pass data between multiple processes; a Queue is essentially a message queue. A small example first demonstrates how Queue works:


#coding=utf-8
from multiprocessing import Queue
q = Queue(3)  # initialize a Queue object that holds at most three put messages
q.put("message 1")
q.put("message 2")
print(q.full())  # False
q.put("message 3")
print(q.full())  # True

# The queue is now full, so both of the following try blocks raise an exception;
# the first waits 2 seconds before raising, the second raises immediately
try:
    q.put("message 4", True, 2)
except:
    print("the queue is full; current number of messages: %s" % q.qsize())

try:
    q.put_nowait("message 4")
except:
    print("the queue is full; current number of messages: %s" % q.qsize())

# Recommended approach: check whether the queue is full before writing
if not q.full():
    q.put_nowait("message 4")

# When reading, check whether the queue is empty before reading
if not q.empty():
    for i in range(q.qsize()):
        print(q.get_nowait())

Output:

False
True
the queue is full; current number of messages: 3
the queue is full; current number of messages: 3
message 1
message 2
message 3

Explanation

When a Queue() object is initialized (for example q = Queue()), if no maximum number of messages is given in the parentheses, or the number given is negative, there is no upper limit on the number of messages the queue can hold (other than available memory);

  • Queue.qsize(): Returns the number of messages contained in the current queue;

  • Queue.empty(): If the queue is empty, return True, otherwise False;

  • Queue.full(): If the queue is full, return True, otherwise False;

  • Queue.get([block[, timeout]]): get a message from the queue and remove it from the queue; block defaults to True (the sketch after this list shows the timeout and exception behaviour in code);

1) With the default block=True and no timeout (in seconds) set, the call blocks (stays in the reading state) if the queue is empty, until a message can be read. If timeout is set, it waits at most timeout seconds and then raises a "Queue.Empty" exception if no message has been read;

2) If block is False and the queue is empty, a "Queue.Empty" exception is raised immediately;

  • Queue.get_nowait(): equivalent to Queue.get(False);

  • Queue.put(item[, block[, timeout]]): write the message item to the queue; block defaults to True;

1) With the default block=True and no timeout (in seconds) set, the call blocks (stays in the writing state) if the queue has no free space, until space becomes available. If timeout is set, it waits at most timeout seconds and then raises a "Queue.Full" exception if there is still no space;

2) If block is False and the queue has no free space, a "Queue.Full" exception is raised immediately;

  • Queue.put_nowait(item): equivalent to Queue.put(item, False);
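A minimal sketch of the blocking and timeout behaviour described above; the Empty and Full exceptions raised by multiprocessing.Queue come from the standard queue module, and the 2-second timeouts and one-slot queue are arbitrary choices for illustration:

from multiprocessing import Queue
from queue import Empty, Full  # the exceptions raised by multiprocessing.Queue

q = Queue(1)
q.put("only message")  # fills the one-slot queue

try:
    q.put("second message", True, 2)  # block for at most 2 seconds
except Full:
    print("put timed out: the queue is still full")

print(q.get())  # "only message"

try:
    q.get(True, 2)  # the queue is now empty; block for at most 2 seconds
except Empty:
    print("get timed out: the queue is empty")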

2. Queue instance

Using Queue as an example, create two child processes in the parent process: one writes data to the Queue, and the other reads data from it:

from multiprocessing import Process, Queue
import os, time, random

# code executed by the writing process:
def write(q):
    for value in ['A', 'B', 'C']:
        print('Put %s to queue...' % value)
        q.put(value)
        time.sleep(random.random())

# code executed by the reading process:
def read(q):
    while True:
        if not q.empty():
            value = q.get(True)
            print('Get %s from queue.' % value)
            time.sleep(random.random())
        else:
            break

if __name__ == '__main__':
    # the parent process creates the Queue and passes it to each child process:
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # start the writing process pw:
    pw.start()
    # wait for pw to finish:
    pw.join()
    # start the reading process pr and wait for it to finish
    # (read() exits on its own once the queue is empty):
    pr.start()
    pr.join()
    print('')
    print('All data has been written and read')

Output:

Put A to queue...
Put B to queue...
Put C to queue...
Get A from queue.
Get B from queue.
Get C from queue.

All data has been written and read

3. Queue in the process pool

If you want to use a Pool to create processes, you must use the Queue() from multiprocessing.Manager() rather than multiprocessing.Queue(); otherwise you get an error like the following:

RuntimeError: Queue objects should only be shared between processes through inheritance.

The following example demonstrates how processes in a process pool communicate:


#coding=utf-8

# import Manager instead of Queue
from multiprocessing import Manager, Pool
import os, time, random

def reader(q):
    print("reader starts (%s), parent process is (%s)" % (os.getpid(), os.getppid()))
    for i in range(q.qsize()):
        print("reader got a message from the Queue: %s" % q.get(True))

def writer(q):
    print("writer starts (%s), parent process is (%s)" % (os.getpid(), os.getppid()))
    for i in "dongGe":
        q.put(i)

if __name__ == "__main__":
    print("(%s) start" % os.getpid())
    q = Manager().Queue()  # initialize the queue with Manager's Queue
    po = Pool()
    # Create the tasks in blocking mode, so there is no need for an infinite
    # loop in reader: writer runs to completion first, then reader reads.
    po.apply(writer, (q,))
    po.apply(reader, (q,))
    po.close()
    po.join()
    print("(%s) End" % os.getpid())

Output:

(21156) start
writer starts (21162), parent process is (21156)
reader starts (21162), parent process is (21156)
reader got a message from the Queue: d
reader got a message from the Queue: o
reader got a message from the Queue: n
reader got a message from the Queue: g
reader got a message from the Queue: G
reader got a message from the Queue: e
(21156) End
