Project address: https://github.com/qmj0923/Concurrent-Executor-for-Python
Table of contents
need
- Call the same function with different parameters (similar to Python's built-in map() function) to shorten the running time through concurrent execution.
- Automatically save the execution results of subtasks midway. If the entire program is interrupted, it can continue from where it was interrupted when it is re-executed.
- Display a progress bar to observe the progress of concurrent execution.
existing tools
- concurrent.futures.Executor.map : Satisfies requirement 1, but not requirements 2 and 3.
- tqdm.contrib.concurrent.process_map : Satisfy requirement 1. Requirement 2 is not met. Requirement 3 shows the overall progress bar, but cannot observe the execution progress of each subtask.
project framework
The implemented framework can be applied to any callable function. foo()
A random function is written here , assuming we need to execute it foo(0), foo(1), ..., foo(4999)
. Then you can apply the framework as follows to start concurrent execution.
import time
from concurrent_executor import ConcurrentExecutor
def foo(x):
if x % 300 == 7:
raise ValueError('foo')
time.sleep(0.01)
return x
if __name__ == '__main__':
executor = ConcurrentExecutor()
result = executor.run(
data=[[i] for i in range(5000)],
func=foo,
output_dir='data/',
)
Here is a detailed explanation of ConcurrentExecutor.run()
the three parameters passed to .
func
The parameter is the function that needs to be called multiple times.data
The parameter is a list, each element of which will befunc
the input of the function (that is,func
the list of actual parameters). In other words, this framework is actually executingfunc(*data[0])
,foo(*data[1])
, ….output_dir
The argument is a directory to store intermediate files for recovery after an interruption.
For more parameters, see the comments in the ConcurrentExecutor.run() function, so I won’t go into details here.
running result
Like tqdm.contrib.concurrent.process_map , the total progress can be displayed on the command line:
2023-06-18 14:47:20,987 INFO 0%| | 0/5 [00:00<?, ?it/s]
2023-06-18 14:47:37,224 INFO 20%|## | 1/5 [00:16<01:04, 16.24s/it]
2023-06-18 14:47:37,340 INFO 40%|#### | 2/5 [00:16<00:20, 6.75s/it]
2023-06-18 14:47:37,342 INFO 100%|##########| 5/5 [00:16<00:00, 3.27s/it]
_tmp/
A folder is created under the output directory . Among them, log_<i>.log
the progress of subtasks is recorded in real time, part_<i>.json
which is used to save the execution results of subtasks:
<output_dir>
└── _tmp
├── log_<i>.log
└── part_<i>.json