[Homemade gadget] Python concurrent executor

Project address: https://github.com/qmj0923/Concurrent-Executor-for-Python

need

  1. Call the same function with different parameters (similar to Python's built-in map() function) to shorten the running time through concurrent execution.
  2. Automatically save the execution results of subtasks midway. If the entire program is interrupted, it can continue from where it was interrupted when it is re-executed.
  3. Display a progress bar to observe the progress of concurrent execution.

existing tools

project framework

The implemented framework can be applied to any callable function. foo()A random function is written here , assuming we need to execute it foo(0), foo(1), ..., foo(4999). Then you can apply the framework as follows to start concurrent execution.

import time
from concurrent_executor import ConcurrentExecutor

def foo(x):
    if x % 300 == 7:
        raise ValueError('foo')
    time.sleep(0.01)
    return x

if __name__ == '__main__':
    executor = ConcurrentExecutor()
    result = executor.run(
        data=[[i] for i in range(5000)],
        func=foo,
        output_dir='data/',
    )

Here is a detailed explanation of ConcurrentExecutor.run()the three parameters passed to .

  • funcThe parameter is the function that needs to be called multiple times.
  • dataThe parameter is a list, each element of which will be functhe input of the function (that is, functhe list of actual parameters). In other words, this framework is actually executing func(*data[0]), foo(*data[1]), ….
  • output_dirThe argument is a directory to store intermediate files for recovery after an interruption.

For more parameters, see the comments in the ConcurrentExecutor.run() function, so I won’t go into details here.

running result

Like tqdm.contrib.concurrent.process_map , the total progress can be displayed on the command line:

2023-06-18 14:47:20,987 INFO     0%|          | 0/5 [00:00<?, ?it/s]
2023-06-18 14:47:37,224 INFO     20%|##        | 1/5 [00:16<01:04, 16.24s/it]
2023-06-18 14:47:37,340 INFO     40%|####      | 2/5 [00:16<00:20,  6.75s/it]
2023-06-18 14:47:37,342 INFO     100%|##########| 5/5 [00:16<00:00,  3.27s/it]

_tmp/A folder is created under the output directory . Among them, log_<i>.logthe progress of subtasks is recorded in real time, part_<i>.jsonwhich is used to save the execution results of subtasks:

<output_dir>
└── _tmp
    ├── log_<i>.log
    └── part_<i>.json

Guess you like

Origin blog.csdn.net/Qmj2333333/article/details/131189617