Python: Handling Concurrency with Futures and asyncio

What concurrency means

To handle network I/O efficiently you need concurrency: the network has high latency, so rather than wasting CPU cycles waiting, it is better to do something else until the response arrives.

In I/O-bound applications, if the code is written correctly, it will have much higher throughput than sequential code, regardless of which concurrency strategy is used (threads or the asyncio package).

Concurrency is about dealing with many things at once. Parallelism is about doing many things at once. One is about structure, the other about execution.

Parallelism means literally doing more than one thing at the same instant, whereas concurrency is the model we usually have in mind when we talk about threads.

Concurrency presents several things as happening at the same time by switching between them in time slices, so even a single core can achieve the effect of "doing more than one thing at once".

Whether concurrency and parallelism coincide depends on the underlying hardware (whether there are multiple processors); the two concepts are not mutually exclusive.

For example, in development we often say that a resource receives 10,000 concurrent requests. This means that 10,000 requests arrive in the same period. It does not mean that all 10,000 are really being processed at the same instant.

If the machine's processor has four cores and we ignore Hyper-Threading, then at most four threads are running at any given instant.

In other words, the number of concurrent requests is 10,000, while the number of requests actually being processed in parallel is 4.

If the number of concurrent requests were only 4, or if your machine had 10,000 cores, concurrency and parallelism would amount to the same thing here.

In other words, concurrency can be a virtual kind of simultaneous execution, or it can be truly simultaneous. Parallelism always means truly simultaneous execution.

The conclusion: parallelism is simultaneous execution in physical time, while concurrency is "simultaneous" execution as seen from the point of view of the thread abstraction that the operating system provides.
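As a rough illustration of the difference, the sketch below accepts many requests but lets only four be handled at any moment. The names handle_request, NUM_REQUESTS and NUM_WORKERS are made up for this example, the numbers are scaled down from the 10,000 requests mentioned above, and the pool size only stands in for the number of cores (CPython threads do not run bytecode in parallel).

```python
import threading
import time
from concurrent import futures

NUM_REQUESTS = 40   # hypothetical number of concurrent requests
NUM_WORKERS = 4     # stands in for the four cores in the text

lock = threading.Lock()
running = 0         # requests being handled right now
peak = 0            # highest value that `running` ever reached

def handle_request(i):
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.01)            # pretend to do some work
    with lock:
        running -= 1
    return i

with futures.ThreadPoolExecutor(max_workers=NUM_WORKERS) as executor:
    handled = list(executor.map(handle_request, range(NUM_REQUESTS)))

print('requests accepted:', len(handled))
print('most requests in progress at once:', peak)
```

All 40 requests are accepted, but the peak never exceeds the number of workers.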

Future

One, a first look at futures

The main features of the concurrent.futures module are the ThreadPoolExecutor and ProcessPoolExecutor classes. Both implement an interface that lets you run callables in different threads or in different processes, respectively.

Internally, each class maintains a pool of worker threads or processes and a queue of tasks to be executed.

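from concurrent import futures

MAX_WORKERS = 20

# download_one(url) and url_list are assumed to be defined elsewhere.

def download_many():
    # (1) Never create more threads than there are URLs to process.
    workers = min(MAX_WORKERS, len(url_list))
    with futures.ThreadPoolExecutor(workers) as executor:
        # (2) map() calls download_one concurrently across the pool.
        res = executor.map(download_one, sorted(url_list))
    return len(list(res))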

(1) Set the number of worker threads to the smaller of the maximum allowed and the number of items to process, so that no unnecessary threads are created.

(2) The download_one function is called concurrently from several threads. The map method returns a generator that can be iterated to retrieve the value returned by each call.
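To try the function above without a real server, here is a minimal runnable sketch. The url_list contents and the body of download_one are hypothetical stand-ins (a short sleep simulates network latency); only the structure of download_many comes from the example.

```python
import time
from concurrent import futures

MAX_WORKERS = 20

# Hypothetical stand-ins for the real URL list and download function.
url_list = ['http://example.com/{}'.format(i) for i in range(10)]

def download_one(url):
    time.sleep(0.05)        # simulate network latency
    return url

def download_many():
    workers = min(MAX_WORKERS, len(url_list))
    with futures.ThreadPoolExecutor(workers) as executor:
        res = executor.map(download_one, sorted(url_list))
    return len(list(res))

start = time.perf_counter()
count = download_many()
elapsed = time.perf_counter() - start
print('{} downloads in {:.2f}s'.format(count, elapsed))
```

Because the ten simulated downloads overlap, the total time should be close to one download rather than ten.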

Futures are an essential component of both the concurrent.futures module and the asyncio package.

Since Python 3.4 the standard library has two classes named Future: concurrent.futures.Future and asyncio.Future.

The two classes serve the same purpose: an instance of either Future class represents a deferred computation that may or may not have completed. They are similar to the Deferred class in Twisted and the Future class in the Tornado framework.

A future encapsulates a pending operation so that it can be put in a queue, its completion state can be queried, and its result (or exception) can be retrieved once it is available.

▲ Normally you should not create futures yourself; they should be instantiated only by the concurrency framework (concurrent.futures or asyncio).

A future represents something that will eventually happen, and the only way to be sure that something will happen is to schedule its execution.

A concurrent.futures.Future instance is created only when something is scheduled for execution by a concurrent.futures.Executor subclass.

Executor.submit(fn, *args, **kwargs)

Executor.submit() takes a callable as its argument, schedules that callable to run, and returns a future.

▲ The call does not block; it returns immediately. Use the future's done() method to check whether the task has finished.

Use the cancel() method to cancel a submitted task. A task that is already running in the thread pool cannot be cancelled.

Client code should not change the state of a future; the concurrency framework changes it when the deferred computation that the future represents finishes. We cannot control when that happens.
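The behavior of submit(), done() and cancel() can be sketched as follows. The blocker function and the two threading.Event objects are made up for this example; they only serve to keep the single worker busy so that the second task is still waiting in the queue when it is cancelled.

```python
import threading
from concurrent import futures

started = threading.Event()   # set once the first task is running
release = threading.Event()   # set to let the first task finish

def blocker():
    started.set()
    release.wait()
    return 'first'

with futures.ThreadPoolExecutor(max_workers=1) as executor:
    running = executor.submit(blocker)        # returns immediately
    started.wait()                            # the single worker is now busy
    queued = executor.submit(pow, 2, 10)      # waits in the queue

    done_before = running.done()              # False: still running
    cancel_running = running.cancel()         # False: already running
    cancel_queued = queued.cancel()           # True: not started yet

    release.set()                             # let the first task finish
    first_result = running.result()

print(done_before, cancel_running, cancel_queued, first_result)
```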

Executor.shutdown(wait=True)

Releases system resources. Call it after Executor.submit(), Executor.map() and other asynchronous operations. Using a with statement avoids having to call this method explicitly.

shutdown(wait=True) is equivalent to pool.close() + pool.join() on a process pool.

wait=True: wait until all tasks in the pool have finished and their resources have been reclaimed before continuing. This is the default.

wait=False: return immediately, without waiting for the tasks in the pool to finish.

Whatever the value of wait, the program as a whole does not exit until all tasks have finished.
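A minimal sketch of both ways of shutting down a pool, using a made-up square function as the task:

```python
from concurrent import futures

def square(n):
    return n * n

# Explicit shutdown.
executor = futures.ThreadPoolExecutor(max_workers=2)
pending = [executor.submit(square, n) for n in range(5)]
executor.shutdown(wait=True)          # blocks until every task has finished
all_done = all(f.done() for f in pending)

try:
    executor.submit(square, 10)       # the pool no longer accepts work
except RuntimeError:
    rejected = True
else:
    rejected = False

# The with statement calls shutdown(wait=True) on exit.
with futures.ThreadPoolExecutor(max_workers=2) as executor:
    squares = list(executor.map(square, range(5)))

print(all_done, rejected, squares)
```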

Future.add_done_callback(fn)

A future has an .add_done_callback(fn) method. It takes a single argument, a callable, which is invoked once the future has finished running.

The callable fn receives the future as its only argument and can obtain the outcome of the execution through the future's result() method.
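A small sketch of a done callback; triple and on_done are hypothetical names chosen for the example.

```python
from concurrent import futures

collected = []

def on_done(future):
    # Called with the finished future as its only argument.
    collected.append(future.result())

def triple(n):
    return n * 3

with futures.ThreadPoolExecutor(max_workers=2) as executor:
    for n in range(4):
        executor.submit(triple, n).add_done_callback(on_done)

print(sorted(collected))
```

The callbacks run as each future finishes, so the order of collected is not guaranteed; that is why it is sorted before printing.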

Future.result()

When called after the future has finished, .result() returns the result of the callable, or re-raises whatever exception was raised while the callable was executing.

If the future has not finished, result() on a concurrent.futures future blocks the caller's thread until the result is ready.
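The two outcomes of result() can be seen in this sketch, where parse_port is an invented task that fails for non-numeric input:

```python
from concurrent import futures

def parse_port(text):
    return int(text)      # raises ValueError for non-numeric text

with futures.ThreadPoolExecutor(max_workers=2) as executor:
    good = executor.submit(parse_port, '8080')
    bad = executor.submit(parse_port, 'http')

    port = good.result()          # blocks until the value is ready
    try:
        bad.result()              # re-raises the ValueError from the worker
    except ValueError as exc:
        error = exc

print(port, repr(error))
```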

concurrent.futures.as_completed()

The concurrent.futures.as_completed function takes an iterable of futures (for example a list, or a dictionary whose keys are futures) and returns a generator.

While no task has completed, iterating over the generator blocks. As soon as a task completes, its future is yielded and the body of the for loop runs; the loop then blocks again, and so on until all tasks have finished.

The results also show that whichever task finishes first is reported to the main thread first.
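A short sketch of as_completed with a dictionary that maps each future to the argument it was created with. The nap function and the delays are made up; the point is that futures come out in completion order, not submission order.

```python
import time
from concurrent import futures

def nap(seconds):
    time.sleep(seconds)
    return seconds

delays = [0.3, 0.1, 0.2]
finish_order = []

with futures.ThreadPoolExecutor(max_workers=3) as executor:
    # Map each future to the argument it was created with.
    to_do = {executor.submit(nap, d): d for d in delays}
    for future in futures.as_completed(to_do):
        finish_order.append(to_do[future])

print(finish_order)   # the shortest nap is normally reported first
```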

Executor.map(func, *iterables, timeout=None)

The return value of Executor.map() is an iterator. Its __next__ method calls the result() method of each future in turn, so what you get are the results of the futures, not the futures themselves.

*iterables: one or more iterables, such as lists. Each call to func takes its arguments from the iterables.

timeout: the maximum number of seconds to wait for a result, counted from the original call to Executor.map(). If a result is not available in time, the iterator raises TimeoutError.
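For instance, with two iterables each call receives one argument from each of them. This sketch uses the built-in pow and len so that nothing beyond the standard library is assumed:

```python
from concurrent import futures

with futures.ThreadPoolExecutor(max_workers=2) as executor:
    # Each call takes one argument from every iterable:
    # pow(2, 3), pow(3, 2), pow(4, 1)
    powers = list(executor.map(pow, [2, 3, 4], [3, 2, 1]))

    # timeout limits how long the iterator waits, counted from the map() call.
    lengths = list(executor.map(len, ['a', 'bb', 'ccc'], timeout=5))

print(powers, lengths)
```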

The Executor.map call can be replaced by two for loops: one to create and schedule the futures, and one to retrieve their results.

def download_many():

    with futures.ThreadPoolExecutor(max_workers=3) as executor:

        # First loop: schedule the calls and keep the futures.
        to_do = []
        for cc in sorted(url_list):
            future = executor.submit(download_one, cc)
            to_do.append(future)

        # Second loop: collect results as the futures finish.
        result = []
        for future in futures.as_completed(to_do):
            res = future.result()
            result.append(res)

    return len(result)

The executor.submit() method schedules the callable for execution and returns a Future representing the pending operation.

In this example future.result() never blocks, because the futures are produced by as_completed, which yields only futures that have already finished.

▲ Wrap the call to future.result() in a try block to catch exceptions.
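The advice about wrapping future.result() in a try block might look like this. The fetch function is a hypothetical task that fails for some inputs, and ValueError is just the exception chosen for the sketch.

```python
from concurrent import futures

def fetch(item):
    # Hypothetical task that fails for some inputs.
    if item % 3 == 0:
        raise ValueError('cannot fetch {}'.format(item))
    return item * 10

results, errors = [], []

with futures.ThreadPoolExecutor(max_workers=3) as executor:
    to_do = [executor.submit(fetch, item) for item in range(1, 7)]
    for future in futures.as_completed(to_do):
        try:
            res = future.result()
        except ValueError as exc:
            errors.append(str(exc))      # one failure does not stop the loop
        else:
            results.append(res)

print(sorted(results), sorted(errors))
```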

Two, blocking I/O and the GIL

The CPython interpreter itself is not thread-safe, so it has a Global Interpreter Lock (GIL) that allows only one thread at a time to execute Python bytecode. For this reason a single Python process usually cannot use more than one CPU core.

Every standard library function that performs blocking I/O releases the GIL while waiting for the operating system to return a result. I/O-bound Python programs can therefore benefit from threads.

While one Python thread is waiting for a network response, the blocking I/O function has released the GIL, so another thread can run.
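One way to observe this effect is to time several blocking waits running in threads. In this sketch time.sleep stands in for a blocking network call (it also releases the GIL); blocking_io is an invented name.

```python
import time
from concurrent import futures

def blocking_io(_):
    time.sleep(0.2)     # sleep releases the GIL, like a blocking network call
    return True

start = time.perf_counter()
with futures.ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(blocking_io, range(5)))
elapsed = time.perf_counter() - start

print('5 waits of 0.2s took {:.2f}s in total'.format(elapsed))
```

Run sequentially, the five waits would take about one second; with five threads they overlap and the total stays close to a single wait.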

Three, ProcessPoolExecutor

ProcessPoolExecutor and ThreadPoolExecutor both implement the generic Executor interface, so with concurrent.futures it is particularly easy to turn a thread-based program into a process-based one.

The ThreadPoolExecutor.__init__ method takes a max_workers argument that sets the number of threads in the pool (for example 10, 100 or 1000). It was required in Python 3.4 and is optional in later versions.

For ProcessPoolExecutor this argument is optional and is usually left out; the default is the number of CPUs returned by os.cpu_count(). On a quad-core CPU this limits the program to four concurrent workers, whereas the thread pool version can have hundreds.

ProcessPoolExecutor distributes work among multiple Python processes, so for CPU-bound processing it lets you bypass the GIL and use every CPU core.

In principle, a ProcessPoolExecutor creates N independent Python interpreters, where N is the number of CPU cores available on the system.
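As a sketch of CPU-bound work spread over processes, the following uses a deliberately naive prime counter. count_primes and the limits are invented for the example; the `if __name__ == '__main__'` guard is there because worker processes may re-import the main module on some platforms.

```python
from concurrent import futures

def count_primes(limit):
    # Deliberately naive, CPU-bound work: count the primes below limit.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def main():
    limits = [5000, 10000, 15000, 20000]
    # max_workers is omitted, so the pool size defaults to the number of CPUs.
    with futures.ProcessPoolExecutor() as executor:
        for limit, found in zip(limits, executor.map(count_primes, limits)):
            print('{} primes below {}'.format(found, limit))

if __name__ == '__main__':
    main()
```

Swapping ProcessPoolExecutor for ThreadPoolExecutor here would still work, but the GIL would keep the computation on a single core.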

It is used in the same way as ThreadPoolExecutor. The following script uses a thread pool:

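from time import sleep, strftime
from concurrent import futures

def display(*args):
    # Print the arguments prefixed with a [HH:MM:SS] timestamp.
    print(strftime('[%H:%M:%S]'), end=' ')
    print(*args)

def loiter(n):
    # Sleep for 2*n seconds, then return n*10; tabs indent the output per task.
    msg = '{}loiter({}): doing nothing for {}s'
    display(msg.format('\t' * n, n, n * 2))
    sleep(n * 2)
    msg = '{}loiter({}): done.'
    display(msg.format('\t' * n, n))
    return n * 10

def main():
    display('Script starting...')
    executor = futures.ThreadPoolExecutor(max_workers=3)
    # Five tasks for three workers: two of them wait for a free thread.
    results = executor.map(loiter, range(5))
    display('result:', results)   # results is a generator, not a list
    display('Waiting for individual results:')

    # Iterating blocks until each result, in submission order, is ready.
    for i, result in enumerate(results):
        display('result {}:{}'.format(i, result))

if __name__ == '__main__':
    main()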

Executor.map returns results in the same order in which the calls were started.

If the first call takes 10 seconds to produce a result and the others take only 1 second each, the code blocks for 10 seconds while retrieving the first result from the generator returned by map.

After that, retrieving the remaining results does not block, because those calls have already finished.

If you need results as soon as they are ready, regardless of submission order, use Executor.submit() together with the futures.as_completed() function.

Four, showing a download progress bar

The tqdm package is exceptionally easy to use.

from tqdm import tqdm
import time

for i in tqdm(range(1000)):
    time.sleep(.01)

The tqdm function accepts any iterable and returns an iterator.

While you consume this iterator, it displays a progress bar and an estimate of the time remaining to complete all iterations.

To estimate the remaining time, tqdm needs either an iterable that supports len, or the expected number of items passed as the total argument.

For example: iterable = tqdm.tqdm(iterable, total=len(xx_list))

Asyncio

One, handling concurrency with the asyncio package

This package implements concurrency mainly with coroutines driven by an event loop.

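# Written in the generator-based coroutine style of Python 3.4.
# From Python 3.5 on, async def / await replace @asyncio.coroutine / yield from,
# and @asyncio.coroutine was removed in Python 3.11.
import asyncio
import itertools
import sys

@asyncio.coroutine
def spin(msg):

    write, flush = sys.stdout.write, sys.stdout.flush

    for char in itertools.cycle('|/-\\'):
        status = char + ' ' + msg
        write(status)
        flush()
        write('\x08' * len(status))  # move the cursor back with backspaces
        try:
            yield from asyncio.sleep(.1)
        except asyncio.CancelledError:
            break
    write(' ' * len(status) + '\x08' * len(status))  # clear the status line

@asyncio.coroutine
def slow_function():
    yield from asyncio.sleep(3)  # pretend to wait for I/O
    return 42

@asyncio.coroutine
def supervisor():
    # asyncio.async() was renamed asyncio.ensure_future() in Python 3.4.4;
    # "async" has been a reserved word since Python 3.7.
    spinner = asyncio.ensure_future(spin('thinking'))
    print('spinner object:', spinner)
    result = yield from slow_function()
    spinner.cancel()
    return result

def main():
    loop = asyncio.get_event_loop()
    result = loop.run_until_complete(supervisor())
    loop.close()
    print('Answer:', result)

if __name__ == '__main__':
    main()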

(1) Coroutines intended for use with asyncio should be decorated with @asyncio.coroutine.

(2) Use yield from asyncio.sleep(...) instead of time.sleep(...), so that sleeping does not block the event loop.

(3) asyncio.async(...) (renamed asyncio.ensure_future(...) in Python 3.4.4) schedules the spin coroutine to run, wraps it in a Task object, and returns immediately.

(4) Get a reference to the event loop and use it to drive the supervisor coroutine.

▲ If a coroutine needs to do nothing for a while, use yield from asyncio.sleep(DELAY).

An asyncio.Task is roughly the equivalent of a threading.Thread. A Task is like a green thread in libraries that implement cooperative multitasking, such as gevent.

A Task object is already scheduled to run when you get it, whereas a Thread instance must be told explicitly to run by calling its start method.

There is no API to terminate a thread from the outside, because a thread could be interrupted at any point, leaving the system in an invalid state.

To terminate a task, use the Task.cancel() instance method, which raises CancelledError inside the coroutine. The coroutine can catch this exception at the yield where it is suspended and handle the termination request.
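For readers on current Python versions, here is a hedged sketch of the same cancellation pattern written with async def / await (Python 3.7 or later) rather than the yield from style used in this article. The log list replaces the terminal spinner so that the behavior is easy to inspect, and the delays are shortened.

```python
import asyncio

async def spin(msg, log):
    try:
        while True:
            log.append('spinning ' + msg)
            await asyncio.sleep(0.01)
    except asyncio.CancelledError:
        log.append('cancelled')      # handle the termination request

async def slow_function():
    await asyncio.sleep(0.1)         # pretend to wait for I/O
    return 42

async def supervisor(log):
    spinner = asyncio.create_task(spin('thinking', log))  # already scheduled
    result = await slow_function()
    spinner.cancel()                 # raises CancelledError inside spin
    await asyncio.wait([spinner])    # give spin a chance to handle it
    return result

log = []
answer = asyncio.run(supervisor(log))
print('Answer:', answer, '| last log entry:', log[-1])
```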

Two, asyncio.Future and concurrent.futures.Future

The asyncio.Future and concurrent.futures.Future classes have mostly the same interface, but they are implemented differently and are not interchangeable.

A future is created only as the result of scheduling something for execution.

In the asyncio package, the BaseEventLoop.create_task(...) method takes a coroutine, schedules it to run, and returns an asyncio.Task instance. That instance is also an asyncio.Future, because Task is a subclass of Future designed to wrap a coroutine.

asyncio.Future objects are designed to be used with yield from, so the following methods are usually not needed.

(1) There is no need to call my_future.add_done_callback(...): whatever you would do when the future finishes can simply be written in the coroutine after the yield from my_future expression.

(2) There is no need to call my_future.result(): the value of a yield from expression on a future is the result (result = yield from my_future).

asyncio.Future objects are driven by yield from, not by calling these methods.
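In the modern syntax, await plays the role of yield from. The sketch below creates a future by hand only to have something to wait on (normally the framework creates futures for you), and a timer stands in for whatever would eventually complete it.

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    my_future = loop.create_future()
    # Something else completes the future later; here a timer stands in for it.
    loop.call_later(0.05, my_future.set_result, 'ready')
    result = await my_future     # no add_done_callback() or result() needed
    return result, my_future.done()

outcome = asyncio.run(main())
print(outcome)
```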

There are two ways to obtain a Task object:

(1) asyncio.async(coro_or_future, *, loop=None), renamed asyncio.ensure_future(...) in Python 3.4.4

If the first argument is a Future or Task, it is returned unchanged. If it is a coroutine, the function calls loop.create_task(...) on it to create a Task.

(2) BaseEventLoop.create_task(coro)

Schedules the coroutine for execution and returns an asyncio.Task object.
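The two ways of obtaining a Task can be compared in a short sketch. It uses asyncio.ensure_future, the current name of asyncio.async, and async def / await syntax; compute is a made-up coroutine.

```python
import asyncio

async def compute():
    await asyncio.sleep(0)
    return 42

async def main():
    loop = asyncio.get_running_loop()
    task = loop.create_task(compute())          # wraps the coroutine in a Task
    same = asyncio.ensure_future(task)          # a Task is returned unchanged
    other = asyncio.ensure_future(compute())    # a coroutine gets a new Task
    return (isinstance(task, asyncio.Future),   # Task is a subclass of Future
            same is task,
            isinstance(other, asyncio.Task),
            await task,
            await other)

checks = asyncio.run(main())
print(checks)
```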

Three, asyncio and aiohttp

The asyncio package directly supports only TCP and UDP. To use HTTP or any other protocol, you have to rely on third-party packages.

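# Generator-based coroutines and the early aiohttp API, as used with Python 3.4.
# Recent aiohttp versions use aiohttp.ClientSession with async def / await.
import asyncio
import aiohttp

# url_list is assumed to be defined elsewhere.

@asyncio.coroutine
def get_flag(url):
    resp = yield from aiohttp.request('GET', url)
    data = yield from resp.read()
    return data

@asyncio.coroutine
def download_one(url):
    data = yield from get_flag(url)
    return url

def download_many():
    loop = asyncio.get_event_loop()
    # Build a list of coroutine objects, one per URL.
    to_do = [download_one(url) for url in sorted(url_list)]
    # wait() completes once every coroutine passed to it has finished.
    wait_coro = asyncio.wait(to_do)
    res, _ = loop.run_until_complete(wait_coro)
    loop.close()

    return len(res)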

Blocking operations are implemented as coroutines, and client code delegates to them with yield from so that they run asynchronously.

Build a list of coroutine objects.

asyncio.wait is a coroutine that completes when all the coroutines passed to it have finished. This is the default behavior of wait.

loop.run_until_complete(wait_coro) runs the event loop until wait_coro finishes; the script blocks here while the event loop is running.

When asyncio.wait finishes it returns a tuple: the first element is the set of futures that completed, the second is the set of futures that did not.

(If timeout or return_when is set, wait may return while some futures are still unfinished.)
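The (done, pending) tuple is easiest to see with a timeout. This sketch uses async def / await syntax and a made-up job coroutine; one job is deliberately much slower than the timeout.

```python
import asyncio

async def job(delay):
    await asyncio.sleep(delay)
    return delay

async def main():
    tasks = [asyncio.ensure_future(job(d)) for d in (0.01, 0.02, 30)]
    # With a timeout, wait() may return while some tasks are still pending.
    done, pending = await asyncio.wait(tasks, timeout=0.5)
    finished = sorted(t.result() for t in done)
    for t in pending:
        t.cancel()                 # do not leave the slow task running
    return finished, len(pending)

finished, still_pending = asyncio.run(main())
print(finished, still_pending)
```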

▲ To use the asyncio package, every function that accesses the network must be replaced with an asynchronous version that uses yield from for the network operation, so that control goes back to the event loop.

To sum up:

(1) The coroutine chains we write are always driven by passing the outermost delegating generator to a function in the asyncio API, such as loop.run_until_complete(...).

The asyncio package itself makes the next(...) or .send(...) calls.

(2) The coroutine chains we write always end by delegating, through yield from, to a coroutine function or coroutine method of the asyncio package (yield from asyncio.sleep(...)) or of a library that implements a higher-level protocol (yield from aiohttp.request(...)).

That is, the innermost subgenerator is a library function that actually performs the I/O, not a function we wrote ourselves.

Four, combining asyncio with a progress bar

When the coroutines are driven through loop.run_until_complete and asyncio.wait, the results of all downloads are returned only after every coroutine has finished.

To update a progress bar, however, we need to get each result as soon as its coroutine finishes.

# Generator-based coroutines and the early aiohttp API, as used with Python 3.4.
import asyncio
import collections

import aiohttp
from tqdm import tqdm

# url_list, concur_req and save_data(data) are assumed to be defined elsewhere.

@asyncio.coroutine
def get_flag(url):
    resp = yield from aiohttp.request('GET', url)
    data = yield from resp.read()
    return data

@asyncio.coroutine
def download_one(url, semaphore):

    try:
        # The semaphore limits how many coroutines run this block at once.
        with (yield from semaphore):
            data = yield from get_flag(url)
    except Exception as exc:
        pass  # error handling omitted
    else:
        save_data(data)
    return url

@asyncio.coroutine
def download_coro(url_list, concur_req):

    counter = collections.Counter()
    semaphore = asyncio.Semaphore(concur_req)

    to_do = [download_one(url, semaphore) for url in url_list]
    to_do_iter = asyncio.as_completed(to_do)

    # as_completed has no len(), so tqdm needs the total explicitly.
    to_do_iter = tqdm(to_do_iter, total=len(url_list))
    for future in to_do_iter:

        try:
            res = yield from future
        except Exception as exc:
            status = 'error'  # 'ok' / 'error' are illustrative status labels
        else:
            status = 'ok'
        counter[status] += 1
    return counter

def download_many():
    loop = asyncio.get_event_loop()
    coro = download_coro(url_list, concur_req)
    res = loop.run_until_complete(coro)
    loop.close()

    return res

(1) A throttling mechanism is needed so that we do not flood the server with too many concurrent requests. In the thread version, this limit was set through the ThreadPoolExecutor class, by the number of worker threads in the pool.

(2) An asyncio.Semaphore object maintains an internal counter. Used as a context manager, the semaphore guarantees that no more than the configured number of coroutines are active at any time.

(3) asyncio.as_completed(...) returns an iterator that yields futures as they finish.

(4) To get the result of an asyncio.Future yielded by the iteration, use yield from rather than the future.result() method.

(5) A dictionary mapping futures to their inputs cannot be used here, because the futures returned by asyncio.as_completed are not necessarily the same futures that were passed in. Internally, asyncio replaces the futures we provide with others that produce the same results in the end.
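The points above can be sketched without aiohttp or tqdm, using async def / await syntax. Here fake_download is a stand-in that sleeps instead of making a request, the URLs are invented, and the stats dictionary exists only to show that the semaphore really caps the number of active coroutines.

```python
import asyncio
import collections

async def fake_download(url, semaphore, stats):
    async with semaphore:                       # at most concur_req enter at once
        stats['active'] += 1
        stats['peak'] = max(stats['peak'], stats['active'])
        await asyncio.sleep(0.01)               # stands in for the network request
        stats['active'] -= 1
    if url.endswith('7'):
        raise ValueError('bad url: ' + url)     # simulate an occasional failure
    return url

async def download_coro(url_list, concur_req):
    counter = collections.Counter()
    stats = {'active': 0, 'peak': 0}
    semaphore = asyncio.Semaphore(concur_req)
    to_do = [asyncio.ensure_future(fake_download(u, semaphore, stats))
             for u in url_list]
    for future in asyncio.as_completed(to_do):
        try:
            await future                        # await, not future.result()
        except ValueError:
            counter['error'] += 1
        else:
            counter['ok'] += 1
    return counter, stats['peak']

urls = ['http://example.com/{}'.format(i) for i in range(20)]
counter, peak = asyncio.run(download_coro(urls, concur_req=3))
print(dict(counter), 'peak concurrency:', peak)
```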

Five, using an Executor to avoid blocking the event loop

In the example above, save_data(...) performs disk I/O, which should be executed asynchronously.

In the threaded version, save_data(...) blocks the thread running download_one, but that is only one of many worker threads.

Behind the scenes the blocking I/O call releases the GIL, so another thread can proceed.

In asyncio, however, save_data(...) blocks the single thread that our client code shares with the asyncio event loop, so the whole application freezes while the file is being saved.

The asyncio event loop maintains a ThreadPoolExecutor behind the scenes, and we can call the loop's run_in_executor method to send callables to it for execution.

@asyncio.coroutine
def download_one(url,semaphore):

    try:
        with (yield from semaphore):
            data = yield from get_flag(url)
    except Exception as exc:
        ''''''
    else:
        loop = asyncio.get_event_loop()
        loop.run_in_executor(None, save_data, data)
    return url

(1) Get a reference to the event loop object.

(2) The first argument to run_in_executor is an Executor instance; if it is None, the event loop's default ThreadPoolExecutor is used.

(3) The remaining arguments are the callable and the positional arguments to pass to it.
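A self-contained sketch of run_in_executor in async def / await syntax follows. This save_data writes to a temporary file and takes a path argument, which is an assumption of the example; awaiting the returned future is also a choice made here so that the caller knows the file has been written, whereas the code above does not wait for it.

```python
import asyncio
import os
import tempfile

def save_data(path, data):
    # Ordinary blocking disk I/O.
    with open(path, 'wb') as fp:
        fp.write(data)
    return len(data)

async def download_one(path):
    data = b'pretend this came from the network'
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor; the remaining
    # arguments are the callable and its positional arguments.
    written = await loop.run_in_executor(None, save_data, path, data)
    return written

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'flag.bin')
    written = asyncio.run(download_one(target))
    size_on_disk = os.path.getsize(target)

print(written, size_on_disk)
```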

Making multiple requests for each download:

# get_country(url) is assumed to be a coroutine defined like get_flag.

@asyncio.coroutine
def get_flag(url):
    resp = yield from aiohttp.request('GET', url)
    data = yield from resp.read()
    json = yield from resp.json()
    return data

@asyncio.coroutine
def download_one(url, semaphore):

    try:
        # Each request acquires the semaphore separately.
        with (yield from semaphore):
            flag = yield from get_flag(url)
        with (yield from semaphore):
            country = yield from get_country(url)
    except Exception as exc:
        pass  # error handling omitted
    return url

Origin blog.csdn.net/sinat_38682860/article/details/105419842