How to use the Python multiprocessing module

In this article [1], we will learn how to use a specific class from the Python multiprocessing module: the Process class. I'll give you a quick overview with examples.

What is the multiprocessing module?

What better way to describe a module than to quote its official documentation? "multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads."

The threading module is not the focus of this article, but in short: threads run small pieces of code inside a single process (lightweight, with shared memory), while the multiprocessing module runs code in separate processes (heavier, and fully isolated from each other).
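To make the "shared versus isolated" difference concrete, here is a small sketch (not from the original article) in which a thread and a child process both try to increment the same module-level variable. The names `counter`, `increment`, `value_after_thread`, and `value_after_process` are illustrative only. The thread modifies the parent's variable, while the child process only modifies its own copy.

```python
import multiprocessing
import threading

# Module-level counter that both the thread and the child process try to change.
counter = 0


def increment() -> None:
    global counter
    counter += 1


def value_after_thread() -> int:
    """Run increment() in a thread and return the counter as the caller sees it."""
    global counter
    counter = 0
    worker = threading.Thread(target=increment)
    worker.start()
    worker.join()
    return counter  # the thread shares our memory, so the change is visible


def value_after_process() -> int:
    """Run increment() in a child process and return the counter as the caller sees it."""
    global counter
    counter = 0
    worker = multiprocessing.Process(target=increment)
    worker.start()
    worker.join()
    return counter  # the child changed its own copy, so ours is untouched


if __name__ == "__main__":
    print(f"After thread:  counter = {value_after_thread()}")   # 1
    print(f"After process: counter = {value_after_process()}")  # 0
```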

In general, the multiprocessing module provides many other classes, functions, and utilities for managing multiple processes while a program runs. It is designed to be the main point of interaction when a program needs to apply parallelism in its workflow. We won't cover every class and utility in the module; instead, we'll focus on one specific class, Process.

What is the Process class?

In this section, we will try to give a better introduction to what a process is and how to identify, use, and manage one in Python. As the GNU C Library manual explains: "Processes are the primitive units for allocation of system resources. Each process has its own address space and (usually) one thread of control. A process executes a program; you can have multiple processes executing the same program, but each process has its own copy of the program within its own address space and executes it independently of the other copies."

But what does this look like in Python? So far we have described what a process is and how it differs from a thread, but we haven't touched any code yet. Let's change that with a very simple process example in Python:

#!/usr/bin/env python
import os

# A very, very simple process.
if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")

This will produce the following output:

[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 144112

As you can see, any running Python script or program is a process of its own.

Create a child process

So what about spawning child processes from within the parent process? To do that, we need the Process class from the multiprocessing module. It looks like this:

#!/usr/bin/env python
import os
import multiprocessing

def child_process():
    print(f"Hi! I'm a child process {os.getpid()}")

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")

    # Here we create a new instance of the Process class and assign our
    # `child_process` function to be executed.
    process = multiprocessing.Process(target=child_process)

    # We then start the process
    process.start()

    # And finally, we join the process. This blocks the script until the
    # child process is done.
    process.join()

This will produce the following output:

[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 144078
Hi! I'm a child process 144079

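A very important note about the previous script: if you don't call process.join() to wait for the child process to run and finish, any code after that point executes right away, and it can become harder to keep your workflow synchronized.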

Consider the following example:

#!/usr/bin/env python
import os
import multiprocessing

def child_process():
    print(f"Hi! I'm a child process {os.getpid()}")

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")

    # Here we create a new instance of the Process class and assign our
    # `child_process` function to be executed.
    process = multiprocessing.Process(target=child_process)

    # We then start the process
    process.start()

    # The join() call below is commented out on purpose, so the parent does
    # not wait for the child process to finish.
    #process.join()

    print("AFTER CHILD EXECUTION! RIGHT?!")

This snippet produces the following output:

[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 145489
AFTER CHILD EXECUTION! RIGHT?!
Hi! I'm a child process 145490

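Of course, it would also be wrong to claim that the snippet above is incorrect. It all depends on how you want to use the module and how your child processes need to run, so use join() deliberately.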
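If the parent has its own work to do while the child runs, one option is to start the child, do that work, and join later. The sketch below (not from the original article; `slow_child` and `run_with_timeout` are hypothetical names) also uses `join(timeout=...)`, which returns after at most that many seconds, and `is_alive()`, `pid`, and `exitcode` to inspect the child from the parent.

```python
import multiprocessing
import time


def slow_child() -> None:
    # Simulate work that takes about half a second.
    time.sleep(0.5)


def run_with_timeout() -> tuple:
    process = multiprocessing.Process(target=slow_child)
    # Before start(), the Process object has no PID and no exit code yet (None).
    print(f"before start: pid={process.pid}, exitcode={process.exitcode}")

    process.start()
    print(f"started child with pid={process.pid}")

    # The parent is free to do its own work here while the child runs.
    print("Parent doing other work...")

    # join() with a timeout waits at most that many seconds and then returns,
    # whether or not the child has finished.
    process.join(timeout=0.1)
    alive_after_timeout = process.is_alive()
    print(f"after join(timeout=0.1): alive={alive_after_timeout}")

    # A plain join() waits for as long as the child needs.
    process.join()
    print(f"after join(): alive={process.is_alive()}, exitcode={process.exitcode}")
    return alive_after_timeout, process.is_alive(), process.exitcode


if __name__ == "__main__":
    run_with_timeout()
```

An exit code of 0 after the final join() means the child's target function returned normally.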

Create multiple child processes

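If you want to spawn multiple processes, you can use a for loop (or any other kind of loop). It lets you create as many Process objects as you need and start or join them at a later stage.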

#!/usr/bin/env python
import os
import multiprocessing

def child_process(id):
    print(f"Hi! I'm a child process {os.getpid()} with id#{id}")

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")
    list_of_processes = []

    # Loop through the numbers 0 to 9 and create a process for each one of
    # them.
    for i in range(0, 10):
        # Here we create a new instance of the Process class and assign our
        # `child_process` function to be executed. Note the difference now that
        # we are using the `args` parameter now, this means that we can pass
        # down parameters to the function being executed as a child process.
        process = multiprocessing.Process(target=child_process, args=(i,))
        list_of_processes.append(process)

    for process in list_of_processes:
        # We then start the process
        process.start()

        # And finally, we join the process. This blocks the script until the
        # child process is done, so the next child only starts after this one
        # has finished.
        process.join()

This will produce the following output:

[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 146056
Hi! I'm a child process 146057 with id#0
Hi! I'm a child process 146058 with id#1
Hi! I'm a child process 146059 with id#2
Hi! I'm a child process 146060 with id#3
Hi! I'm a child process 146061 with id#4
Hi! I'm a child process 146062 with id#5
Hi! I'm a child process 146063 with id#6
Hi! I'm a child process 146064 with id#7
Hi! I'm a child process 146065 with id#8
Hi! I'm a child process 146066 with id#9
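Note that the loop above calls start() and join() on each process in the same iteration, so every child finishes before the next one starts: the children actually run one at a time. A common alternative, sketched below (not the original article's code), is to start all children first and join them in a second loop so they can run concurrently. The helper name `run_all` is hypothetical, and the parameter is renamed `worker_id` to avoid shadowing the built-in `id`. With this version, the order of the printed lines is no longer guaranteed.

```python
import multiprocessing
import os


def child_process(worker_id: int) -> None:
    print(f"Hi! I'm a child process {os.getpid()} with id#{worker_id}")


def run_all(count: int = 10) -> list:
    processes = [
        multiprocessing.Process(target=child_process, args=(i,))
        for i in range(count)
    ]

    # First loop: start every child so they can all run at the same time.
    for process in processes:
        process.start()

    # Second loop: wait for every child to finish.
    for process in processes:
        process.join()

    # 0 for each child whose target function returned normally.
    return [process.exitcode for process in processes]


if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")
    run_all()
```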

Data communication

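In the previous section, I introduced a new parameter to the multiprocessing.Process constructor: args. It lets you pass values to the child process for use inside the target function. But how do you return data from a child process?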

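You might think that a return statement in the child's function would hand the data back to the parent. However, processes run functions in isolation, without sharing memory with the parent, so the usual way of returning data from a function does not work here: the return value of the target function is simply discarded.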

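Instead, we can use the Queue class, which gives us an interface for passing data between a parent process and its children. multiprocessing.Queue is a plain FIFO (first in, first out) queue with built-in mechanisms that make it safe to use across multiple processes.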

Consider the following example:

#!/usr/bin/env python
import os
import multiprocessing

def child_process(queue, number1, number2):
    print(f"Hi! I'm a child process {os.getpid()}. I do calculations.")
    # Use `result` rather than `sum` to avoid shadowing the built-in sum().
    result = number1 + number2

    # Putting data into the queue
    queue.put(result)

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")

    # Defining a new Queue()
    queue = multiprocessing.Queue()

    # Here we create a new instance of the Process class and assign our
    # `child_process` function to be executed. Note the difference now that
    # we are using the `args` parameter now, this means that we can pass
    # down parameters to the function being executed as a child process.
    process = multiprocessing.Process(target=child_process, args=(queue, 1, 2))

    # We then start the process
    process.start()

    # And finally, we join the process. This blocks the script until the
    # child process is done.
    process.join()

    # Accessing the result from the queue.
    print(f"Got the result from child process as {queue.get()}")

It will produce the following output:

[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 149002
Hi! I'm a child process 149003. I do calculations.
Got the result from child process as 3
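The same idea scales to several children. The sketch below is one possible pattern, not the original article's code; `add` and `run_tasks` are hypothetical names. Each child puts a `(task_id, result)` tuple on a shared queue, because results from several children can arrive in any order. It also drains the queue before calling join(): the Python multiprocessing programming guidelines warn that joining a process that still has items buffered in a queue can deadlock. The single small integer in the example above is unlikely to trigger that, but the get-then-join order is safer in general.

```python
import multiprocessing


def add(queue, task_id: int, number1: int, number2: int) -> None:
    # Send back the task id together with the result, because results from
    # several children can arrive in any order.
    queue.put((task_id, number1 + number2))


def run_tasks(pairs: list) -> dict:
    queue = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(target=add, args=(queue, task_id, a, b))
        for task_id, (a, b) in enumerate(pairs)
    ]
    for process in processes:
        process.start()

    # Read exactly one result per child before joining.
    results = dict(queue.get() for _ in processes)

    for process in processes:
        process.join()
    return results


if __name__ == "__main__":
    print(run_tasks([(1, 2), (10, 20), (100, 200)]))
```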

Exception handling

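Handling exceptions is a special and somewhat tricky task that we have to deal with from time to time when using the multiprocessing module. The reason is that, by default, an unhandled exception raised inside a child process stays in that child: the Process machinery prints the traceback and the child exits with a non-zero exit code, but the exception is never re-raised in the parent.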

The following code raises an exception with a message:

#!/usr/bin/env python
import os
import multiprocessing

def child_process():
    print(f"Hi! I'm a child process {os.getpid()}.")
    raise Exception("Oh no! :(")

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")

    # Here we create a new instance of the Process class and assign our
    # `child_process` function to be executed.
    process = multiprocessing.Process(target=child_process)

    try:
        # We then start the process
        process.start()

        # And finally, we join the process. This blocks the script until the
        # child process is done.
        process.join()

        print("AFTER CHILD EXECUTION! RIGHT?!")
    except Exception:
        print("Uhhh... It failed?")

Output:

[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 149505
Hi! I'm a child process 149506.
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib64/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/tmp/tmp.iuW2VAurGG/scratch.py", line 7, in child_process
    raise Exception("Oh no! :(")
Exception: Oh no! :(
AFTER CHILD EXECUTION! RIGHT?!

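If you trace through the code, you will notice that a print statement was deliberately placed after the process.join() call to show that the parent process keeps running even after an unhandled exception was raised in the child process. Note also that the parent's except block never runs, because the exception is not propagated to the parent.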
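Since join() does not re-raise the child's exception, a parent that needs to know whether the child failed can check `Process.exitcode` after joining. The sketch below is an illustration, not the original article's code (`failing_child` and `run_and_check` are hypothetical names). In CPython, an unhandled exception in the child's target produces an exit code of 1.

```python
import multiprocessing


def failing_child() -> None:
    raise Exception("Oh no! :(")


def run_and_check() -> int:
    process = multiprocessing.Process(target=failing_child)
    process.start()
    process.join()
    # join() never re-raises the child's exception, so inspect the exit code:
    # 0 means the target returned normally, a positive value N means the child
    # exited with status N (an unhandled exception gives 1), and a negative
    # value -N means the child was terminated by signal N.
    if process.exitcode != 0:
        print(f"Child failed with exit code {process.exitcode}")
    return process.exitcode


if __name__ == "__main__":
    run_and_check()
```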

One way to overcome this is to handle the exception inside the child process itself, like this:

#!/usr/bin/env python
import os
import multiprocessing

def child_process():
    try:
        print(f"Hi! I'm a child process {os.getpid()}.")
        raise Exception("Oh no! :(")
    except Exception:
        print("Uh, I think it's fine now...")

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")

    # Here we create a new instance of the Process class and assign our
    # `child_process` function to be executed.
    process = multiprocessing.Process(target=child_process)

    # We then start the process
    process.start()

    # And finally, we join the process. This blocks the script until the
    # child process is done.
    process.join()

    print("AFTER CHILD EXECUTION! RIGHT?!")

Now the exception is handled inside the child process, which means you control what happens and what should be done in that case.
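If the parent also needs to react to the failure, one possible pattern (a sketch, not from the original article) combines the child-side try/except with the Queue from the data communication section: the child sends the caught exception object back, and the parent re-raises it so an ordinary try/except in the parent can catch it. The names `should_fail` and `run_child` are hypothetical, and the approach assumes the exception can be pickled, which holds for built-in exceptions with simple arguments. The child's original traceback is not transferred.

```python
import multiprocessing


def child_process(queue, should_fail: bool) -> None:
    try:
        if should_fail:
            raise ValueError("Oh no! :(")
    except Exception as exc:
        # Send the exception object to the parent instead of only printing it.
        # Assumes the exception is picklable (true for this ValueError).
        queue.put(exc)
    else:
        # None tells the parent that no error occurred.
        queue.put(None)


def run_child(should_fail: bool) -> None:
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=child_process, args=(queue, should_fail))
    process.start()
    error = queue.get()  # read the queue before join()
    process.join()
    if error is not None:
        # Re-raise in the parent so an ordinary try/except there can catch it.
        raise RuntimeError("child process failed") from error


if __name__ == "__main__":
    run_child(should_fail=False)
    print("First child finished cleanly")
    try:
        run_child(should_fail=True)
    except RuntimeError as exc:
        print(f"Caught in parent: {exc!r}, caused by {exc.__cause__!r}")
```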

Summary

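The multiprocessing module is very powerful when you are building solutions that rely on parallel execution, especially together with the Process class, which lets you run a function in its own isolated process.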

Reference

[1] Source: https://developers.redhat.com/articles/2023/07/27/how-use-python-multiprocessing-module


Origin blog.csdn.net/swindler_ice/article/details/132663046