Tornado executes multiple asynchronous requests in parallel

When implementing backend logic with Tornado, you may run into this situation: you need to request data from several third-party sources at once, such as fetching from multiple URLs, and these requests are independent of one another. The easiest approach is to write multiple yields: after the first yield returns its result, issue the second, and so on. This does not hurt overall throughput, because while a yield is in progress the process can continue serving other requests instead of blocking. But from the point of view of a single request, the third-party data is fetched sequentially before the response is returned, so that one request takes a long time to process.

So does Tornado have a mechanism whereby, for a single request that needs several unrelated pieces of external data, you can send all the data requests at the same time and continue processing only after all of them have returned? It is like a classic elementary-school math problem: brushing your teeth takes 2 minutes and making instant noodles takes 3. You don't have to brush first and then make the noodles; do both at the same time and the 5-minute sequence shrinks to 3 minutes. The answer is yes: Tornado has exactly such a mechanism.


Tornado can issue n requests concurrently like this:

response1, response2, ..., responsen = yield [http_client.fetch(url1), http_client.fetch(url2), ..., http_client.fetch(urln)]

Control returns to the program only after all n requests have received their responses.


For the underlying principle, I found a very detailed article online. The following is reproduced content; the original article is at: http://www.pulpcode.cn/2016/03/06/tornado-yield-futures-run-in-parallel/

How Tornado executes multiple asynchronous operations in parallel


origin

In fact, I had been using Tornado asynchronously for a long time and roughly understood how Tornado's coroutine yield works, but I never knew that Tornado supports "starting multiple asynchronous operations at the same time and returning only after all of their results are in." The official site gives this style of code:

You can also yield a list or dict of Futures, which will be started at the same time and run in parallel; a list or dict of results will be returned when they are all finished:

@gen.coroutine
def get(self):
    http_client = AsyncHTTPClient()
    response1, response2 = yield [http_client.fetch(url1),
                                  http_client.fetch(url2)]
    response_dict = yield dict(response3=http_client.fetch(url3),
                               response4=http_client.fetch(url4))
    response3 = response_dict['response3']
    response4 = response_dict['response4']

So this post digs into why that works.

prerequisite knowledge

To follow this article, you need to know what yield is, what asynchrony is, and what a coroutine is; you should understand how Tornado's asynchrony works, the Future objects and coroutines it uses, and how the Tornado IOLoop uses them for context switching.

When you use yield in a function, that function becomes a generator. Unlike an ordinary function, it can be interrupted. What does "interrupted" mean? Suppose you have an ordinary function: once you call it, you can only wait for the result of the call and can do nothing else, because control has been handed over to the function. A slightly more advanced option is to pass in a callback function, so that the called function can invoke the callback when certain conditions occur.

yield is more advanced still: control can be returned to the caller partway through the function's execution, and after doing something else we can let the function continue where it left off. This is the real essence of yield, and it goes far beyond using it merely for iteration as in xrange.

pass parameters

If a function contains a statement like this:

m = yield n

it means that in the communication between the function and the outside world, n is sent out by the generator, and m is passed in by the external caller (via send).
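This two-way traffic is plain Python; here is a minimal, self-contained illustration (the names are made up for the example):

```python
def echo():
    n = 10
    m = yield n          # n is sent out to the caller; m comes back via send()
    yield ("got", m)

g = echo()
first = next(g)          # drive the generator to its first yield; first == 10
second = g.send("hi")    # resume it: m = "hi"; second == ("got", "hi")
print(first, second)
```

next() starts the generator, and every subsequent send() both delivers a value into the paused yield expression and runs the generator to its next yield.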

The corresponding Tornado statement is: response = yield http_client.fetch(url)

In fact, http_client.fetch(url) returns a Future object. When the handler method (such as get) is wrapped by the decorator, the decorator starts the generator with generator.next(), and then, through the ioloop and generator.send(value), drives the generator forward, achieving asynchronous execution.

In Tornado's coroutine-based asynchronous processing, the asynchronous callback is encapsulated in a Future object. Seeing Future, you will surely recall the concurrent.futures module added in Python 3.2; the functionality is similar, and the idea is the same: "encapsulate a callable for asynchronous execution." You can summarize it as: a Tornado asynchronous function returns a Future object; yield that object and you will get the final result.

Future object

This Future object has an internal _set_done step: when set_result(self, result) is called to store the result on the Future, _set_done marks it as done. Once it is done, all callbacks registered with add_done_callback are executed, and the result method can then be used to obtain the final result.
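Tornado's Future is not needed to see this pattern; the stdlib concurrent.futures.Future (Python 3.2+) exposes the same set_result / add_done_callback / result mechanics, so here is a quick sketch with it:

```python
from concurrent.futures import Future

calls = []

fut = Future()
fut.add_done_callback(lambda f: calls.append(f.result()))

assert not fut.done()    # no result yet, callback has not fired
fut.set_result("test")   # marks the future done and fires every callback
assert fut.done()
print(calls)             # ['test']
```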

A simple example of Future:

from tornado import gen
from tornado.concurrent import Future
from tornado.web import RequestHandler

class HelloHandler(RequestHandler):
    @gen.coroutine
    def get(self):
        x = yield self.do_test()
        self.render("xxxx")

    def do_test(self):
        fut = Future()
        fut.set_result("test")
        return fut

find the problem

Having covered the basics, we should return to the original question: where is the code that supports "starting multiple asynchronous operations at the same time and returning only after all of their results are in"?

First, we go to Tornado's source code and find that coroutine is actually implemented like this:

def _make_coroutine_wrapper(func, replace_callback):
    """The inner workings of ``@gen.coroutine`` and ``@gen.engine``.
    The two decorators differ in their treatment of the ``callback``
    argument, so we cannot simply implement ``@engine`` in terms of
    ``@coroutine``.
    """
    # On Python 3.5, set the coroutine flag on our generator, to allow it
    # to be used with 'await'.
    if hasattr(types, 'coroutine'):
        func = types.coroutine(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        future = TracebackFuture()

        if replace_callback and 'callback' in kwargs:
            callback = kwargs.pop('callback')
            IOLoop.current().add_future(
                future, lambda future: callback(future.result()))

        try:
            result = func(*args, **kwargs)  # A normal function returns a value; a generator function returns a generator.
        except (Return, StopIteration) as e:
            result = _value_from_stopiteration(e)
        except Exception:
            future.set_exc_info(sys.exc_info())
            return future
        else:
            if isinstance(result, GeneratorType):  # When it is a generator.
                # Inline the first iteration of Runner.run. This lets us
                # avoid the cost of creating a Runner when the coroutine
                # never actually yields, which in turn allows us to
                # use "optional" coroutines in critical path code without
                # performance penalty for the synchronous case.
                try:
                    orig_stack_contexts = stack_context._state.contexts
                    yielded = next(result)  # The generator runs to its first yield and a Future object comes back.
                    if stack_context._state.contexts is not orig_stack_contexts:
                        yielded = TracebackFuture()
                        yielded.set_exception(
                            stack_context.StackContextInconsistentError(
                                'stack_context inconsistency (probably caused '
                                'by yield within a "with StackContext" block)'))
                except (StopIteration, Return) as e:
                    future.set_result(_value_from_stopiteration(e))
                except Exception:
                    future.set_exc_info(sys.exc_info())
                else:
                    Runner(result, future, yielded)
                try:
                    return future
                finally:
                    # Subtle memory optimization: if next() raised an exception,
                    # the future's exc_info contains a traceback which
                    # includes this stack frame. This creates a cycle,
                    # which will be collected at the next full GC but has
                    # been shown to greatly increase memory usage of
                    # benchmarks (relative to the refcount-based scheme
                    # used in the absence of cycles). We can avoid the
                    # cycle by clearing the local variable after we return it.
                    future = None
        future.set_result(result)
        return future
    return wrapper

You can see the key lines:

result = func(*args, **kwargs)
yielded = next(result)
Runner(result, future, yielded)

Simply put, the wrapper calls the decorated function, captures the generator object it returns, and passes it to a Runner.
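To see the shape of this without Tornado, here is a toy coroutine decorator of my own (a sketch, not Tornado's real code: there is no IOLoop, and the yielded future is assumed to be already resolved), with concurrent.futures.Future standing in for TracebackFuture:

```python
from types import GeneratorType
from concurrent.futures import Future

def coroutine(func):
    def wrapper(*args, **kwargs):
        future = Future()
        result = func(*args, **kwargs)
        if isinstance(result, GeneratorType):
            try:
                yielded = next(result)      # run to the first yield: a Future
                # A real Runner would wait on `yielded` via the IOLoop;
                # here we assume it is already done and send its result back.
                result.send(yielded.result())
            except StopIteration as e:
                future.set_result(getattr(e, "value", None))
        else:
            future.set_result(result)       # plain function: wrap its value
        return future
    return wrapper

@coroutine
def add_one():
    fut = Future()
    fut.set_result(41)
    x = yield fut
    return x + 1        # the return value rides out on StopIteration.value

print(add_one().result())   # 42
```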

The Runner's code, simplified, looks like this:

def __init__(self, gen, result_future, first_yielded):
    self.gen = gen
    self.future = first_yielded
    self.io_loop.add_future(
        self.future, lambda f: self.run())

def run(self):
    while True:
        if not self.future.done():
            return
        try:
            value = self.future.result()
            yielded = self.gen.send(value)
        except (StopIteration, Return) as e:
            self.finished = True
            return
        except Exception:
            self.finished = True
            return
        if not self.handle_yield(yielded):
            return

This Runner registers the Future object with the io_loop; in our earlier example, you can say the asynchronous fetch function is registered with the ioloop. When the fetch completes, it invokes one of its own callbacks (we are discussing the case where no callback is passed to fetch; see the definition of AsyncHTTPClient for details) and sets the value on the Future object. The io_loop then calls the callback lambda f: self.run(), which assigns future.result() to value. That value is sent into the generator, which then runs to its next yield point.

The Runner drives the generator iteratively, obtaining the yielded values one by one; after processing each one, it uses handle_yield to determine whether there is a next Future object and callback. (Iteration is used instead of recursion because recursion in Python is slow.) This continues until all futures are done.
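The drive loop itself can be sketched in a few lines (my own toy runner; Tornado's real Runner waits on each future through the IOLoop instead of assuming it is already resolved):

```python
from concurrent.futures import Future

def run(gen):
    # Drive the generator iteratively: send each yielded future's
    # result back in, until the generator is exhausted.
    try:
        future = next(gen)              # first yield
        while True:
            value = future.result()     # Tornado would wait via add_future
            future = gen.send(value)    # resume; get the next yielded future
    except StopIteration as e:
        return getattr(e, "value", None)

def ready(v):
    f = Future()
    f.set_result(v)
    return f

def task():
    a = yield ready(1)
    b = yield ready(2)
    return a + b

print(run(task()))   # 3
```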

Then we turn our attention to the handle_yield function.

def handle_yield(self, yielded):
    # Lists containing YieldPoints require stack contexts;
    # other lists are handled in convert_yielded.
    if _contains_yieldpoint(yielded):
        yielded = multi(yielded)

    if isinstance(yielded, YieldPoint):
        # YieldPoints are too closely coupled to the Runner to go
        # through the generic convert_yielded mechanism.
        self.future = TracebackFuture()

        def start_yield_point():
            try:
                yielded.start(self)
                if yielded.is_ready():
                    self.future.set_result(
                        yielded.get_result())
                else:
                    self.yield_point = yielded
            except Exception:
                self.future = TracebackFuture()
                self.future.set_exc_info(sys.exc_info())

        if self.stack_context_deactivate is None:
            # Start a stack context if this is the first
            # YieldPoint we've seen.
            with stack_context.ExceptionStackContext(
                    self.handle_exception) as deactivate:
                self.stack_context_deactivate = deactivate

                def cb():
                    start_yield_point()
                    self.run()
                self.io_loop.add_callback(cb)
                return False
        else:
            start_yield_point()
    else:
        try:
            self.future = convert_yielded(yielded)
        except BadYieldError:
            self.future = TracebackFuture()
            self.future.set_exc_info(sys.exc_info())

    if not self.future.done() or self.future is moment:
        self.io_loop.add_future(
            self.future, lambda f: self.run())
        return False
    return True


The key part is:

if not self.future.done() or self.future is moment:
    self.io_loop.add_future(
        self.future, lambda f: self.run())

So this is how the loop advances to the next yield. Note also that the YieldPoint mentioned in the code has been deprecated; Tornado 4.0 and later recommend using the Future type instead.

Note again the convert_yielded function.

def convert_yielded(yielded):
    """Convert a yielded object into a `.Future`.

    The default implementation accepts lists, dictionaries, and Futures.

    If the `~functools.singledispatch` library is available, this function
    may be extended to support additional types. For example::

        @convert_yielded.register(asyncio.Future)
        def _(asyncio_future):
            return tornado.platform.asyncio.to_tornado_future(asyncio_future)

    .. versionadded:: 4.1
    """
    # Lists and dicts containing YieldPoints were handled earlier.
    if isinstance(yielded, (list, dict)):
        return multi(yielded)
    elif is_future(yielded):
        return yielded
    elif isawaitable(yielded):
        return _wrap_awaitable(yielded)
    else:
        raise BadYieldError("yielded unknown object %r" % (yielded,))


We notice the call to multi(yielded), and by following it we finally find the code that answers our question:

def multi(children, quiet_exceptions=()):
    """Runs multiple asynchronous operations in parallel.

    ``children`` may either be a list or a dict whose values are
    yieldable objects. ``multi()`` returns a new yieldable
    object that resolves to a parallel structure containing their
    results. If ``children`` is a list, the result is a list of
    results in the same order; if it is a dict, the result is a dict
    with the same keys.

    That is, ``results = yield multi(list_of_futures)`` is equivalent
    to::

        results = []
        for future in list_of_futures:
            results.append(yield future)

    If any children raise exceptions, ``multi()`` will raise the first
    one. All others will be logged, unless they are of types
    contained in the ``quiet_exceptions`` argument.

    If any of the inputs are `YieldPoints <YieldPoint>`, the returned
    yieldable object is a `YieldPoint`. Otherwise, returns a `.Future`.
    This means that the result of `multi` can be used in a native
    coroutine if and only if all of its children can be.

    In a ``yield``-based coroutine, it is not normally necessary to
    call this function directly, since the coroutine runner will
    do it automatically when a list or dict is yielded. However,
    it is necessary in ``await``-based coroutines, or to pass
    the ``quiet_exceptions`` argument.

    This function is available under the names ``multi()`` and ``Multi()``
    for historical reasons.

    .. versionchanged:: 4.2
       If multiple yieldables fail, any exceptions after the first
       (which is raised) will be logged. Added the ``quiet_exceptions``
       argument to suppress this logging for selected exception types.

    .. versionchanged:: 4.3
       Replaced the class ``Multi`` and the function ``multi_future``
       with a unified function ``multi``. Added support for yieldables
       other than `YieldPoint` and `.Future`.
    """
    if _contains_yieldpoint(children):
        return MultiYieldPoint(children, quiet_exceptions=quiet_exceptions)
    else:
        return multi_future(children, quiet_exceptions=quiet_exceptions)


After that is the definition of multi_future:

def multi_future(children, quiet_exceptions=()):
    """Wait for multiple asynchronous futures in parallel.

    This function is similar to `multi`, but does not support
    `YieldPoints <YieldPoint>`.

    .. versionadded:: 4.0

    .. versionchanged:: 4.2
       If multiple ``Futures`` fail, any exceptions after the first (which is
       raised) will be logged. Added the ``quiet_exceptions``
       argument to suppress this logging for selected exception types.

    .. deprecated:: 4.3
       Use `multi` instead.
    """
    if isinstance(children, dict):
        keys = list(children.keys())
        children = children.values()
    else:
        keys = None
    children = list(map(convert_yielded, children))
    assert all(is_future(i) for i in children)
    unfinished_children = set(children)

    future = Future()
    if not children:
        future.set_result({} if keys is not None else [])

    def callback(f):
        unfinished_children.remove(f)
        if not unfinished_children:
            result_list = []
            for f in children:
                try:
                    result_list.append(f.result())
                except Exception as e:
                    if future.done():
                        if not isinstance(e, quiet_exceptions):
                            app_log.error("Multiple exceptions in yield list",
                                          exc_info=True)
                    else:
                        future.set_exc_info(sys.exc_info())
            if not future.done():
                if keys is not None:
                    future.set_result(dict(zip(keys, result_list)))
                else:
                    future.set_result(result_list)

    listening = set()
    for f in children:
        if f not in listening:
            listening.add(f)
            f.add_done_callback(callback)
    return future


This is the key code supporting parallel asynchrony. You can see that the wrapping Future maintains multiple child futures, tracked via the listening set. Each time a child future completes, callback is invoked and removes it from unfinished_children; only when the callbacks of all child futures have run does the wrapping Future's set_result actually get called.
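The bookkeeping can be reproduced with stdlib futures in a dozen lines (my own simplified multi_future: list-only, no exception handling):

```python
from concurrent.futures import Future

def multi_future(children):
    # Resolve `combined` to the list of child results once every child is done.
    unfinished = set(children)
    combined = Future()

    def callback(f):
        unfinished.remove(f)
        if not unfinished:           # the last child just finished
            combined.set_result([c.result() for c in children])

    for f in children:
        f.add_done_callback(callback)
    return combined

f1, f2 = Future(), Future()
combo = multi_future([f1, f2])
f1.set_result("a")
assert not combo.done()              # still waiting on f2
f2.set_result("b")
print(combo.result())                # ['a', 'b']
```

Note that the result list follows the order of children, not the order of completion, just as in Tornado.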


