Python subprocess + ThreadPool: notes on an elegant task-management solution

What I want to achieve: a download task manager that runs at most 5 downloads at the same time. If there are 20 download tasks, only 5 of them download concurrently; as soon as one of the 5 running downloads finishes, the next queued download starts.

Background: I have a Python program that downloads a video whenever I click on it at runtime. I use subprocess to run the download program, so several downloads can run in parallel as separate processes.
But this has a problem. When I click videos frequently, new download tasks start before the earlier ones finish, and soon a large number of download programs are running at the same time. The computer becomes extremely sluggish, and many download programs get killed as a result. This is not elegant.
So I want the Python program to run at most n (say, 5) download programs. If 5 are already running, a later video click should not launch a download immediately; instead it should wait until one of the 5 running downloads finishes, then start, and so on. Subsequent clicks queue up, and each one starts only when one of the earlier downloads completes.

What I thought of first was a queue only 5 slots long: when a download completes, a task waiting in line moves into the queue. But I didn't know how to simulate the "waiting in line" effect, and a queue doesn't quite match my idea anyway, because a queue is strictly first in, first out, while a download that started earlier may well finish later than one that started after it.

Searching Baidu for the keywords queue and subprocess turned up nothing; there seem to be very few Chinese posts about subprocess. So I searched Google and found a post that matched my needs exactly.
Link: Stack Overflow: Python multiple subprocess with a pool/queue recover output as soon as one finishes and launch next job in queue?
The question in that post is roughly: I use Python's subprocess to create multiple processes, and I want to combine it with a pool or a queue so that when one task finishes, the next task leaves the queue and starts executing.

The top-voted answer was as follows:

ThreadPool could be a good fit for your problem, you set the number of worker threads and add jobs, and the threads will work their way through all the tasks.


The source code attached to this answer is as follows

from multiprocessing.pool import ThreadPool
import subprocess


def work(sample):
    my_tool_subprocess = subprocess.Popen('mytool {}'.format(sample), shell=True, stdout=subprocess.PIPE)
    # read the tool's output line by line until it closes its stdout
    line = my_tool_subprocess.stdout.readline()
    while line:
        # here I parse stdout..
        line = my_tool_subprocess.stdout.readline()


num = None  # set to the number of workers you want (it defaults to the cpu count of your machine)
tp = ThreadPool(num)
for sample in all_samples:
    tp.apply_async(work, (sample,))

tp.close()
tp.join()

ThreadPool: a thread pool.
Although it is called a "thread" pool, that doesn't really matter here: the actual downloads still run as separate processes via subprocess, and the ThreadPool only acts as a task manager. I just have to care about the tasks themselves and the maximum number allowed to run at the same time; everything else it handles for me.
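As a quick sanity check that the pool really caps concurrency, here is a minimal, self-contained sketch (not from the original post; fake_download and the sleep are just stand-ins for a real download):

# Minimal sketch (illustrative only): check that ThreadPool caps concurrency.
import time
from multiprocessing.pool import ThreadPool


def fake_download(task_id):
    print("start", task_id)
    time.sleep(0.5)   # stands in for a long-running download
    print("done ", task_id)


tp = ThreadPool(3)    # at most 3 tasks run at the same time
for task_id in range(10):
    tp.apply_async(fake_download, (task_id,))

tp.close()            # no new tasks will be submitted
tp.join()             # block until all queued tasks have finished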
Then I adapted it to my own code:

# Pseudocode
import subprocess
from multiprocessing.pool import ThreadPool


def startDownload(url):
    subprocess.Popen("start python download.py " + url, shell=True)


threadManager = ThreadPool(5)

while a_video_is_clicked():
    url = get_clicked_video_url()
    threadManager.apply_async(startDownload, (url,))

Next I ran a set of controlled experiments, changing one variable at a time.
Experimental setup: threadManager = ThreadPool(3), with 5 tasks submitted. (In the tables below, * means the setting does not apply to that row.)

| How the worker launches download.py | "start" in the command? | shell=True? | How download.py launches its download | wait() inside download.py? | apply_async? | close()/join()? | Result |
| --- | --- | --- | --- | --- | --- | --- | --- |
| os.system | no | * | subprocess.Popen | wait() | yes | yes | Goal achieved, but the output is jumbled. Pressing the close button while the first three tasks are still running does not close the program: the first three tasks appear to be killed, tasks 4 and 5 start, and a second press of the close button finally closes the program. |
| os.system | start | * | subprocess.Popen | wait() | yes | yes | All 5 processes open in new windows at once; closing the main process does not close the children, which appear to be "independent". |
| subprocess.Popen | no | False | subprocess.Popen | wait() | yes | yes | All 5 processes run at the same time and the output is jumbled; the close button closes the whole program immediately. |
| subprocess.Popen | no | True | subprocess.Popen | wait() | yes | yes | All 5 processes run at the same time and the output is jumbled; the close button closes the whole program immediately. |
| subprocess.Popen | start | False | subprocess.Popen | wait() | yes | yes | The download tasks never start at all: no error, no response. |
| subprocess.Popen | start | True | subprocess.Popen | wait() | yes | yes | All 5 processes open in new windows at once; closing the main process does not close the children, which appear to be "independent". |

So it seems the process should be launched with os.system, and "start" must not be added, which means no new window for now.
Then I ran a new round of experiments.

| How the worker launches download.py | "start" in the command? | shell=True? | How download.py launches its download | wait() inside download.py? | apply_async? | close()/join()? | Result |
| --- | --- | --- | --- | --- | --- | --- | --- |
| os.system | no | * | subprocess.Popen | wait() | yes | yes | Goal achieved, but the output is jumbled. Pressing the close button while the first three tasks are still running does not close the program: the first three tasks appear to be killed, tasks 4 and 5 start, and a second press of the close button finally closes the program. |
| os.system | no | * | subprocess.Popen | wait() | no | yes | Only one download task starts, and the video-click listener looks blocked (it is actually still listening); when that download finishes, the next one starts automatically. |
| os.system | no | * | subprocess.Popen | wait() | yes | no | Goal achieved, same as the first row. Since my click-listening loop never ends on its own, I see no need for close() and join() to keep the main process from exiting. |

It seems that apply_async must be used, and if the main process never exits on its own, there is no need for threadManager.close() and threadManager.join() to make the main process wait for the workers to finish their tasks.
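For reference, a small sketch of the difference (illustrative names only): apply() blocks the caller until the task returns, apply_async() just queues the task and returns at once, and close()/join() only matter when the main program would otherwise exit before the queued tasks finish.

# Sketch of apply vs apply_async (illustrative only).
import time
from multiprocessing.pool import ThreadPool


def slow_task(n):
    time.sleep(1)
    return n * n


tp = ThreadPool(3)

print(tp.apply(slow_task, (2,)))        # blocks about 1 s, then prints 4

res = tp.apply_async(slow_task, (3,))   # returns immediately with an AsyncResult
print(res.get())                        # get() blocks until the result is ready

tp.close()   # needed here because this script ends right away;
tp.join()    # an event loop that never exits can skip these two calls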
Then I ran a new round of experiments.

| How the worker launches download.py | shell=True? | How download.py launches its download | "start" inside download.py? | wait() inside download.py? | Result |
| --- | --- | --- | --- | --- | --- |
| os.system | * | subprocess.Popen | no | wait() | Goal achieved, but the output is jumbled. Pressing the close button while the first three tasks are still running does not close the program: the first three tasks appear to be killed, tasks 4 and 5 start, and a second press of the close button finally closes the program. |
| os.system | * | subprocess.Popen | no | no | All 5 download tasks run at the same time and the output is jumbled. |
| os.system | * | subprocess.Popen | start | wait() | All 5 processes open in new windows at once; closing the main process does not close the children, which appear to be "independent". |
| os.system | * | subprocess.Popen | start | no | All 5 processes open in new windows at once; closing the main process does not close the children, which appear to be "independent". |
| os.system | * | os.system | no | * | Goal achieved, but the output is jumbled. Pressing the close button while the first three tasks are still running does not close the program: the first three tasks appear to be killed, tasks 4 and 5 start, and a second press of the close button finally closes the program. |
| os.system | * | os.system | start | * | All 5 processes open in new windows at once; closing the main process does not close the children, which appear to be "independent". |

So: with subprocess.Popen you must call wait() to block, while os.system blocks on its own; and you must not add "start", because once "start" has successfully handed the command over, the calling program treats its own work as finished.
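A small sketch of those three behaviours, assuming Windows and a placeholder download.py command:

# Sketch of the three blocking behaviours summarised above.
import os
import subprocess

url = "http://example.com/video"   # placeholder

# 1) os.system blocks until the command it runs has finished.
os.system("python download.py " + url)

# 2) subprocess.Popen returns immediately; only wait() blocks.
p = subprocess.Popen("python download.py " + url, shell=True)
p.wait()

# 3) With "start", the shell hands the command to a new window and returns at
#    once, so neither os.system nor wait() actually waits for the download.
os.system("start python download.py " + url)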

Summary: using apply_async together with os.system, without "start", meets the requirements.

# Pseudocode
import os
from multiprocessing.pool import ThreadPool


def startDownload(url):
    os.system("python download.py " + url)


threadManager = ThreadPool(5)

while a_video_is_clicked():
    url = get_clicked_video_url()
    threadManager.apply_async(startDownload, (url,))

What happens if the main process uses subprocess.Popen plus wait() instead of os.system?
So I did the following experiment.

| How the worker launches download.py | "start" in the command? | shell=True? | Result |
| --- | --- | --- | --- |
| os.system | no | * | Goal achieved, but the output is jumbled. Pressing the close button while the first three tasks are still running does not close the program: the first three tasks appear to be killed, tasks 4 and 5 start, and a second press of the close button finally closes the program. |
| subprocess.Popen + wait() | no | True | Goal achieved, but the output is jumbled. Pressing the close button while the first three tasks are still running does not close the program: the first three tasks appear to be killed, tasks 4 and 5 start, and a second press of the close button finally closes the program. |

So adding wait() to make the worker function wait for the subprocess also achieves the blocking effect. My guess is that without wait(), ThreadPool considers the function finished as soon as the process has been created, whereas with wait() the function keeps its pool slot occupied until the download ends, which is why adding wait() works just as well.
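In other words, the pool only counts a task as running while the worker function has not returned. A minimal sketch of the two variants (download.py and the URLs are placeholders):

# Sketch: why wait() matters inside a pool worker.
import subprocess
from multiprocessing.pool import ThreadPool


def launch_and_forget(url):
    # Returns as soon as the process is created, so the pool frees the slot
    # immediately and all downloads end up running at once.
    subprocess.Popen("python download.py " + url, shell=True)


def launch_and_wait(url):
    # Blocks until the download process exits, so the slot stays occupied
    # and at most 5 downloads run at the same time.
    p = subprocess.Popen("python download.py " + url, shell=True)
    p.wait()


tp = ThreadPool(5)
for url in ["url1", "url2", "url3"]:   # placeholder URLs
    tp.apply_async(launch_and_wait, (url,))
tp.close()
tp.join()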

But the output is still jumbled at this point. Why did I look into "start" in the first place? Because "start" can open a new window in which to run the python command. Yet the earlier experiments also show that once "start" is added, the child processes become independent and detached from the main process: when the main process is closed, the children keep running. From this it can be inferred that "start" asks the system to launch the download program, rather than the main process launching it directly.

I also tried the combination of subprocess + wait() + "start", and again the child processes were independent and all 5 tasks downloaded at the same time.

So what I really need is a way to run the download program in a new window while still having it belong to, and be managed by, the ThreadPool.

I found this on Google:
Stack Overflow: subprocess.Popen in different console
The top-voted code is:

from subprocess import Popen, CREATE_NEW_CONSOLE

# CREATE_NEW_CONSOLE is Windows-only: the child process gets its own console window
Popen('cmd', creationflags=CREATE_NEW_CONSOLE)

input('Enter to exit from Python script...')

I added a wait() and adapted it into my code:

# Pseudocode
from subprocess import Popen, CREATE_NEW_CONSOLE
from multiprocessing.pool import ThreadPool


def startDownload(url):
    # CREATE_NEW_CONSOLE opens a new console window for the child,
    # so "start" is no longer needed in the command
    p = Popen("python download.py " + url, creationflags=CREATE_NEW_CONSOLE)
    p.wait()


threadManager = ThreadPool(5)

while a_video_is_clicked():
    url = get_clicked_video_url()
    threadManager.apply_async(startDownload, (url,))

And download.py itself uses os.system or subprocess + wait(), which perfectly meets my needs: the download tasks are managed, and each one shows its progress in its own window.

Elegant and timeless
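For completeness, here is a rough sketch of what download.py itself can look like under these constraints; "some_downloader" is a placeholder for the real download command, not the tool actually used here:

# download.py -- rough sketch; "some_downloader" is a placeholder command.
import subprocess
import sys

url = sys.argv[1]

# subprocess.Popen + wait() (or simply os.system) blocks until the download
# finishes, which keeps this console window open and shows the progress output.
p = subprocess.Popen("some_downloader " + url, shell=True)
p.wait()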

So, to sum up, the factors involved this time are:
os.system()
Popen + CREATE_NEW_CONSOLE + wait()
subprocess.Popen + wait()
whether the command contains "start"

os.system blocks, and subprocess.Popen + wait() blocks. But as soon as either of them runs a command containing "start", it no longer blocks: executing "start" amounts to "asking the system to launch it" instead of launching it yourself, so once the call to "start" returns, the caller's own work is done. That is also why "start" can open a new window.

Origin blog.csdn.net/weixin_45518621/article/details/126584946