About Python concurrent.futures

1. What is a future?
1. The standard library documentation says: "The Future class encapsulates the asynchronous execution of a callable."

Fluent Python puts it this way: a future encapsulates a pending operation, so it can be put into a queue, its completion status can be queried, and its result (or exception) can be retrieved once it is available.
Andao's translation of "future" is quite vivid, and the books translated by Andao are of reliably good quality.

2. In the source code, a Future object has five states (PENDING, RUNNING, CANCELLED, CANCELLED_AND_NOTIFIED, and FINISHED) and encapsulates a condition variable (a lock), a state value, a result, an exception, and a list of callback functions. The state is queried through cancelled(), running(), and done().
set_result() and set_exception() store the result or exception and trigger the callbacks. A callback takes a single argument, the future itself, so the result of the operation bound to the future can be read inside the callback.
result() returns the final result, or raises the exception if the operation raised one. exception() returns the exception without raising it.
With concurrent.futures.ThreadPoolExecutor, the thread that sets the result or exception is not the thread that reads the result. This is where the self._condition lock comes into play: it is the reason result() blocks until the operation bound to the future has completed.
3. In concurrent.futures, it is less that a future encapsulates an operation and more that each future is bound to one operation.
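
The state queries and the callback mechanism described above can be tried on a Future created by hand. This is only a sketch for illustration: in real code the executor creates the future and its worker thread calls set_result(). The names on_done and events are made up for this example.

```python
from concurrent.futures import Future

events = []  # records what the callback observes


def on_done(fut):
    # The callback receives the future itself as its only argument.
    events.append(("callback", fut.result()))


future = Future()  # created by hand only for illustration
future.add_done_callback(on_done)

# (running, done, cancelled) while the future is still pending
state_before = (future.running(), future.done(), future.cancelled())

future.set_result(42)  # normally called by the executor's worker thread

# (running, done, cancelled) after the result has been set
state_after = (future.running(), future.done(), future.cancelled())

print(state_before, state_after, events)
```

Setting the result moves the future to the finished state and runs the callback in the thread that called set_result().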

Note: When an exception occurs inside the thread pool, it is not raised immediately. Instead, it is stored in the future through the future's set_exception() method. Once the operation bound to the future has finished, calling result() re-raises the exception that occurred inside the thread pool.
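
A minimal sketch of the note above, assuming a made-up task named fail that always raises. It shows that submit() itself does not raise, that exception() hands back the stored exception, and that result() re-raises it in the calling thread.

```python
from concurrent.futures import ThreadPoolExecutor


def fail():
    # Hypothetical task that always fails.
    raise ValueError("boom")


with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(fail)  # no exception is raised here

# exception() returns the stored exception without raising it.
stored = future.exception()

# result() re-raises the stored exception in the calling thread.
try:
    future.result()
    raised = None
except ValueError as exc:
    raised = exc

print(type(stored).__name__, str(raised))
```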

2. ThreadPoolExecutor: where threads are created and how their number is controlled
1. ThreadPoolExecutor holds a task queue and a set of thread objects.
2. The __init__ method shows that the default maximum number of threads in the pool is (number of CPUs * 5) in the version quoted here. Python 3.8 changed the default to min(32, os.cpu_count() + 4).
if max_workers is None:
    max_workers = (os.cpu_count() or 1) * 5
3. The submit method binds a future object and an operation fn to a _WorkItem, whose run method stores the result or exception of fn in that future. Each fn corresponds to one future, and submit returns that future, so code outside the thread pool can use the future to query the status of fn and obtain its result or exception. The _WorkItem object is then put into the task queue.
4. submit also checks the number of threads. If it is below the maximum, a new thread is created. The thread's target is _worker, whose job is to take _WorkItem objects out of the task queue and call run() on them.

class _WorkItem(object):
    def __init__(self, future, fn, args, kwargs):
        self.future = future
        self.fn = fn
        self.args = args
        self.kwargs = kwargs

    def run(self):
        if not self.future.set_running_or_notify_cancel():
            return

        try:
            result = self.fn(*self.args, **self.kwargs)
        except BaseException as e:
            self.future.set_exception(e)
        else:
            self.future.set_result(result)


def _worker(executor_reference, work_queue):
    try:
        while True:
            work_item = work_queue.get(block=True)
            if work_item is not None:
                work_item.run()
                # Delete references to object. See issue16284
                del work_item
                continue
            executor = executor_reference()
            # Exit if:
            #   - The interpreter is shutting down OR
            #   - The executor that owns the worker has been collected OR
            #   - The executor that owns the worker has been shutdown.
            if _shutdown or executor is None or executor._shutdown:
                # Notice other workers
                work_queue.put(None)
                return
            del executor
    except BaseException:
        _base.LOGGER.critical('Exception in worker', exc_info=True)


class ThreadPoolExecutor(_base.Executor):
    def submit(self, fn, *args, **kwargs):
        with self._shutdown_lock:
            if self._shutdown:
                raise RuntimeError('cannot schedule new futures after shutdown')

            f = _base.Future()
            w = _WorkItem(f, fn, args, kwargs)

            self._work_queue.put(w)
            self._adjust_thread_count()
            return f
    submit.__doc__ = _base.Executor.submit.__doc__

    def _adjust_thread_count(self):
        # When the executor gets lost, the weakref callback will wake up
        # the worker threads.
        def weakref_cb(_, q=self._work_queue):
            q.put(None)
        # TODO(bquinlan): Should avoid creating new threads if there are more
        # idle threads than items in the work queue.
        if len(self._threads) < self._max_workers:
            t = threading.Thread(target=_worker,
                                 args=(weakref.ref(self, weakref_cb),
                                       self._work_queue))
            t.daemon = True
            t.start()
            self._threads.add(t)
            _threads_queues[t] = self._work_queue
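
The effect of the max_workers check can be observed from the outside without touching private attributes. The following is a sketch under the assumption that recording each worker's thread name is enough to count the threads. The function task and the variable worker_names are invented for this example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def task(_):
    time.sleep(0.05)  # keep the worker busy briefly
    return threading.current_thread().name


with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(task, i) for i in range(10)]
    # Collect the distinct names of the threads that ran the tasks.
    worker_names = {f.result() for f in futures}

print(len(worker_names), sorted(worker_names))
```

Ten tasks are submitted, but the set never holds more than three names, because submit stops creating threads once the pool has reached max_workers. Python 3.8 and later also reuse idle workers before starting new ones, so the count can be lower than the maximum.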

3. Why does Executor.map return results in the order of the input iterables, while as_completed yields whichever futures finish first?

1. Executor implements the map method, and ThreadPoolExecutor inherits it.
2. Executor.map first calls submit for every set of arguments, which hands each fn call to the thread pool and produces a list of futures. It then iterates over that list in order and yields the result of each future in turn.
3. as_completed first checks the state of all the futures (fs) and yields those that are already done. It then keeps waiting for further futures to complete and yields each one as it finishes. The order is therefore independent of the order in which the tasks were submitted: whichever task completes first is returned first.

4. The code at the end of the article compares the behavior of map and as_completed.


class Executor(object):
    def map(self, fn, *iterables, timeout=None, chunksize=1):

        if timeout is not None:
            end_time = timeout + time.time()

        fs = [self.submit(fn, *args) for args in zip(*iterables)]

        # Yield must be hidden in closure so that the futures are submitted
        # before the first iterator value is required.
        def result_iterator():
            try:
                for future in fs:
                    if timeout is None:
                        yield future.result()
                    else:
                        yield future.result(end_time - time.time())
            finally:
                for future in fs:
                    future.cancel()
        return result_iterator()
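
The ordering difference can be reproduced without network access. This sketch assumes a made-up task, sleep_and_return, whose running time is its argument, so the completion order is known in advance.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def sleep_and_return(delay):
    time.sleep(delay)
    return delay


delays = [0.3, 0.1, 0.2]

with ThreadPoolExecutor(max_workers=3) as executor:
    # map yields results in the order of the input iterable.
    map_order = list(executor.map(sleep_and_return, delays))

    # as_completed yields futures in the order in which they finish.
    futures = [executor.submit(sleep_and_return, d) for d in delays]
    completed_order = [f.result() for f in as_completed(futures)]

print(map_order)
print(completed_order)
```

map_order always matches delays. completed_order typically lists the shortest delay first, because that task finishes first.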

4. Why do Executor.map and as_completed block while the futures are not yet completed?
1. The call to self._condition.wait(timeout) is the reason for blocking. The client code that calls result() and the thread pool code that calls set_result() or set_exception() run in different threads. Only after the task bound to the future has completed does the pool thread call self._condition.notify_all() inside set_result() or set_exception(), which wakes the client thread that is waiting. The blocking then ends and the client obtains the result of the completed future. as_completed blocks in a similar way, waiting on an event that set_result() and set_exception() signal through the future's waiters.

class Future(object):
    def result(self, timeout=None):
        with self._condition:
            if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:
                raise CancelledError()
            elif self._state == FINISHED:
                return self.__get_result()

            self._condition.wait(timeout)

            if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:
                raise CancelledError()
            elif self._state == FINISHED:
                return self.__get_result()
            else:
                raise TimeoutError()
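
Both branches of result() can be exercised in a small experiment. This is a sketch that assumes a threading.Event (named release here, a made-up name) is used to hold the task back, so the outcome does not depend on timing.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError

release = threading.Event()  # hypothetical gate that holds the task back


def gated_task():
    release.wait()  # blocks until the main thread sets the event
    return "done"


with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(gated_task)
    try:
        # The task cannot finish yet, so the wait expires.
        future.result(timeout=0.05)
        timed_out = False
    except FutureTimeoutError:
        timed_out = True

    release.set()  # let the worker finish and call set_result()
    value = future.result()  # returns once the worker has notified us

print(timed_out, value)
```

The first call waits on the condition and gives up when the timeout expires. The second call returns only after the worker thread has stored the result and woken the waiting thread.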

Attachment: test code

from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import as_completed
import requests
from requests.exceptions import ConnectionError
from functools import partial
from os import cpu_count


def get_url(url):
    try:
        r = requests.get(url)
    except ConnectionError:
        raise ConnectionError('Check your network connection!')

    return url, r.status_code

URLS = [
    'https://my.oschina.net/u/2255341/blog',
    'https://github.com/timeline.json',
    'http://www.oschina.net/',
]


if __name__ == '__main__':
    # get_url('https://github.com/timeline.json')
    executor = ThreadPoolExecutor(max_workers=2)
    for res in executor.map(get_url, URLS):
        print(res)

    print('------------------------------------------------')
    for future in as_completed(map(partial(executor.submit, get_url), URLS)):
        res = future.result()
        print(res)

(py3env) ➜ concurrent_futures git:(master) ✗ python download.py
('https://my.oschina.net/u/2255341/blog', 403)
('https://github.com/timeline.json', 410)
('http://www.oschina.net/', 403)
------------------------------------------------
('https://my.oschina.net/u/2255341/blog', 403)
('http://www.oschina.net/', 403)
('https://github.com/timeline.json', 410)

Code address: https://github.com/kagxin/recipe/blob/master/concurrent_futures/download.py
