Python Coroutines and Generators in Detail

Contents

I. Introduction to processes

II. Differences between processes and threads

III. Threads

IV. Generators

V. Coroutines

VI. Coroutines and generators

VII. Coroutines and multithreading

 

When Python coroutines come up, processes and threads naturally come up too, and the topic is inseparable from generators.

I. Introduction to processes

An executing instance of a program is a process. Each process provides all the resources needed to run the program. (A process is, in essence, a collection of resources.)

A process has a virtual address space, executable code, operating-system interfaces, a security context (the user and permissions that started the process, etc.), a unique process ID, environment variables, a priority class, minimum and maximum working-set sizes (memory space), and at least one thread.

When a process starts, it first creates one thread, the main thread; the main thread can then create other child threads.

The resources tied to a process include:

    A. Memory pages (all threads in the same process share the same memory space)

    B. File descriptors (e.g. open sockets)

    C. Security credentials (e.g. the user ID that started the process)

II. Differences between processes and threads

    A. Threads in the same process share the same memory space; processes are independent of one another.
    B. Data is shared among all the threads in a process (inter-thread communication); data between processes is independent.
    C. Modifying the main thread may affect the behavior of other threads, but modifying the parent process (short of deleting it) does not affect its child processes.
    D. A thread is the context in which instructions execute, while a process is the cluster of resources tied to a running program.
    E. Threads of the same process can communicate directly, but processes must exchange data through an intermediary (e.g. pipes or queues).
    F. Creating a new thread is cheap, but creating a new process requires making a copy of the parent process.
    G. A thread can operate on other threads in the same process, but a process can only operate on its child processes.
    H. Threads start fast and processes start slowly (though the two speeds are not directly comparable).

III. Threads

A. Definition

A thread is the smallest unit of execution that an operating system can schedule. It is contained within a process and is the actual unit of execution in the process. A thread is a single sequential flow of control within a process; one process can run multiple threads concurrently, each performing a different task. A thread is an execution context, i.e. the series of instructions the CPU needs in order to execute.

B. How threads work

Suppose you are reading a book and want to take a break, but you also want to resume later exactly where you left off. One way is to write down the page number, the line number, and the word number; these three values are your execution context. If your roommate reads the same book while you rest, using the same method, then the two of you only need to write down those three numbers to read the book in alternating turns.

Threads work the same way. The CPU gives you the illusion of doing many things at once, but in fact it spends only a sliver of time on each operation; at any instant the CPU is really doing just one thing. It can do this because it keeps an execution context for every operation. Just as you and your friend can share one book, multiple tasks can share one CPU.

C. Common methods

start(): readies the thread, which then waits for CPU scheduling
setName(): sets the thread's name
getName(): gets the thread's name
setDaemon(True): marks the thread as a daemon thread
join(): waits for the thread to finish before execution continues
run(): executed automatically once the CPU schedules the thread object; to customize a thread class, simply override run()
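The methods above can be sketched as follows. Note that in modern Python, setName()/getName()/setDaemon() are usually replaced by the name and daemon attributes; worker and results are names invented for this illustration.

```python
import threading
import time

def worker(results, i):
    time.sleep(0.1)               # simulate some work
    results.append(i)

results = []
threads = []
for i in range(3):
    t = threading.Thread(target=worker, args=(results, i))
    t.name = "worker-%d" % i      # attribute form of setName()
    t.daemon = False              # attribute form of setDaemon()
    t.start()                     # ready; waits for CPU scheduling
    threads.append(t)

for t in threads:
    t.join()                      # wait for each thread to finish

print(sorted(results))            # [0, 1, 2]
```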

 

D. How Python multithreading executes

When Python uses multithreading, it calls the operating system's native C-level threads.

  1. Fetch the shared data.
  2. Acquire the GIL.
  3. The Python interpreter calls an OS native thread.
  4. The OS has the CPU perform the computation.
  5. When the thread's execution time is up, it must release the GIL, whether or not the computation has finished.
  6. Another thread then repeats the process above.
  7. When execution later switches back to a thread, it continues from the context it recorded earlier.
    In short, each thread runs its own computation and is switched out when its time is up (a context switch).

E. The GIL

Outside Python, a single core can only run one task at a time, while multiple cores can run multiple threads simultaneously. In Python, however, no matter how many cores there are, only one thread executes at a time. The reason is the GIL.

GIL stands for Global Interpreter Lock. It dates back to the beginning of Python's design, a decision made for the sake of data safety. Before a thread may execute, it must first acquire the GIL; think of the GIL as a "pass", and within one Python process there is only one. A thread without the pass is not allowed onto the CPU. The GIL is a CPython implementation detail: CPython drives native C threads and cannot operate the CPU directly, so it uses the GIL to guarantee that only one thread touches the data at a time. Jython, for example, has no GIL.

GIL differences across Python versions:

1. In Python 2.x, the GIL is released when the current thread hits an I/O operation or its tick count reaches 100. (Ticks can be seen as Python's own counter, used solely for the GIL; it resets to zero after each release and can be adjusted with sys.setcheckinterval.) Every release of the GIL triggers lock contention and a thread switch, which consumes resources. Because of the GIL, a Python process can only ever execute one thread at a time (the thread holding the GIL), which is why Python multithreading is inefficient on multi-core CPUs.
2. In Python 3.x, the GIL no longer uses tick counting; instead it uses a timer (the current thread releases the GIL once the execution-time threshold is reached). This is friendlier to CPU-intensive programs, but it still does not solve the problem that the GIL allows only one thread to execute at a time, so efficiency remains unsatisfactory.
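The Python 3 timer can be inspected at runtime. A small sketch (sys.setswitchinterval replaced the old sys.setcheckinterval; the 5 ms default is the usual CPython value):

```python
import sys

# Python 3 replaced the tick-based check interval with a time-based
# switch interval; the default is 5 ms.
print(sys.getswitchinterval())    # typically 0.005

# Ask the interpreter to consider switching threads more often.
sys.setswitchinterval(0.001)
print(sys.getswitchinterval())
```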

F. Efficiency

Python's execution efficiency differs with the type of code:

1. CPU-intensive code (loops, calculations, and so on). With heavy computation the tick count quickly reaches the threshold, which repeatedly triggers release of and contention for the GIL (and the thread switching that follows consumes resources), so Python multithreading is unfriendly to CPU-intensive code.
2. I/O-intensive code (file handling, web crawlers, and other work that reads and writes files or sockets). Here multiple threads effectively improve efficiency: a single thread would sit idle waiting on I/O and waste time, whereas with several threads, while thread A waits the interpreter automatically switches to thread B, so no CPU time is wasted and the program runs faster. Python multithreading is therefore friendlier to I/O-intensive code.
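The I/O case can be demonstrated with time.sleep standing in for a blocking call (a simplified sketch; real I/O behaves similarly because the GIL is released while a thread waits):

```python
import threading
import time

def io_task():
    time.sleep(0.2)   # stand-in for a blocking I/O call (network, disk, ...)

start = time.time()
threads = [threading.Thread(target=io_task) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# The four 0.2 s waits overlap: the total is roughly 0.2 s, not 0.8 s,
# because each thread releases the GIL while it waits.
print("elapsed: %.2f s" % elapsed)
```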

Suggestion:

To take full advantage of a multi-core CPU in Python, use multiple processes. Each process has its own independent GIL and they do not interfere with one another, so they can execute in parallel in the true sense. In Python, multiprocessing is more efficient than multithreading (on multi-core CPUs, that is).

G. Thread locks

Because threads are scheduled at random, and a thread may be switched out after executing only n instructions, dirty data can appear when multiple threads modify the same data at once. Hence thread locks: at any moment only one thread may perform the operation. A thread lock locks a resource; you can define multiple locks, and when you need exclusive access to a resource, any one of those locks can lock it, just as different keys can all be fitted to the same door.

Because threads are scheduled at random, if multiple threads operate on one object at the same time and the object is not properly protected, the program can produce unexpected results; we call this "thread unsafe".
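A sketch of a thread lock protecting shared data; the lost updates it prevents are exactly the "dirty data" described above.

```python
import threading

counter = 0
lock = threading.Lock()

def add_many():
    global counter
    for _ in range(100000):
        with lock:            # only one thread may enter at a time
            counter += 1

threads = [threading.Thread(target=add_many) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With the lock the result is always 400000; without it, updates can be
# lost when threads interleave the read-modify-write of `counter`.
print(counter)  # 400000
```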

IV. Generators

What is a generator

With a list comprehension we can create a list directly. But memory is limited, so a list's capacity is necessarily finite. Moreover, a list holding a million elements not only takes a lot of storage; if we only ever access the first few elements, the space occupied by the vast majority of the rest is wasted.

So, if the elements can be derived by some algorithm, can we compute the subsequent elements on the fly as we iterate? That would remove the need to build the complete list and save a great deal of space. In Python, this mechanism of computing while iterating is called a generator.

How to create a generator

Method one is very simple: change the [] of a list comprehension to (), and you have created a generator:

>>> L = [x * x for x in range(10)]
>>> L
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> g = (x * x for x in range(10))
>>> g
<generator object <genexpr> at 0x1022ef630>

The only difference between creating L and g is the outermost [] versus (): L is a list, and g is a generator.
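The difference is easy to observe: the list allocates everything up front, while the generator produces values on demand (object sizes vary by platform, hence only a comparison is printed):

```python
import sys

L = [x * x for x in range(1000000)]    # one million values built up front
g = (x * x for x in range(1000000))    # values produced one at a time

# The generator object stays tiny no matter how long the sequence is.
print(sys.getsizeof(L) > sys.getsizeof(g))   # True
print(next(g), next(g), next(g))             # 0 1 4
```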

Method two

Using the yield keyword in an ordinary function gives a simple generator: the function becomes a generator function, and calling it creates a generator object. yield returns a value just as return does, except that after return the function's state is terminated, while yield saves the function's current execution state; when resumed, the function continues from the state it held before returning.

When a generator function contains one or more yield statements, calling it returns a generator object, but the function body does not execute immediately. The __iter__() and __next__() methods are supplied automatically, so we can iterate the object with next(). Each time execution reaches a yield, the function pauses and control returns to the caller; the local variables and their state are saved until the next call. When the function finally terminates, StopIteration is raised automatically.

How does a generator save resources? It only remembers the current position: a generator holds just one value at a time, and once next() moves past a value, it is gone.

It has a single iteration method:

__next__()

How it works:

1. Iteration works by repeatedly calling the next() method until the exception is caught.

2. next() pulls values from the generator object; there are two spellings: t.__next__() or next(t).

A for loop can also fetch the values (each pass takes one value from the generator).

(In practice next() is rarely used to fetch values; direct iteration with a for loop is preferred.)

3. yield is equivalent to a return in that it returns a value, but it also remembers the return position; on the next iteration, the code resumes from the statement right after the yield.

4. send() is like next() in that it advances the generator one step (running until it hits the next yield), but send() can also pass in a value, and that value becomes the result of the whole yield expression.

In other words, send can forcibly set the value of the yield expression. Take an assignment from yield, a = yield 5: the first iteration returns 5, and a is not yet assigned. On the second iteration, calling .send(10) forcibly replaces the value of the yield 5 expression (originally 5) with 10, so a = 10.
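Point 4 in code (demo is a two-step generator invented for the example):

```python
def demo():
    a = yield 5    # the value of the `yield 5` expression comes from send()
    yield a

g = demo()
print(next(g))     # 5: runs to `yield 5`; a is not yet assigned
print(g.send(10))  # 10: send(10) becomes the value of `yield 5`, so a = 10
```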

The famous Fibonacci sequence

In the Fibonacci sequence, every number except the first two is the sum of the two numbers before it:

1,1,2,3,5,8,13,21,34,...

The sequence cannot be written as a list comprehension, but printing it with a function is easy:

def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        print(b)
        a, b = b, a + b
        n = n + 1
    return 'done'

f = fib(10)

Output:

1 1 2 3 5 8 13 21 34 55

To turn it into a generator, just change print(b) to yield b:

def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1
    return 'done'

With the generator version, calling fib(10) no longer prints anything; the values are pulled out by iterating, for example with a for loop:

f = fib(10)
for i in f:
    print(i)

If next() is used to keep fetching values after the generator is exhausted, a StopIteration exception is raised, and the return value is carried out inside the exception:

Traceback (most recent call last):
  File "D:\python\index.py", line 80, in <module>
    print(f.__next__())
StopIteration: done

The exception can be handled by catching it:

g = fib(6)

# exception-handling code
while True:
    try:
        x = next(g)
        print('g:', x)
    except StopIteration as e:
        print('Generator return value:', e.value)
        break

Final output:

g: 1

g: 1

g: 2

g: 3

g: 5

g: 8

Generator return value: done


Why generators

They are easier to use, need less code, and use memory more efficiently. A list, for example, has all its memory allocated when it is built, whereas a generator allocates only when needed; it is more like a record representing a potentially infinite stream. If we want to read and process more content than fits in memory, yet need all of it over the course of the stream, a generator is a good choice. A generator can also report the current processing state: because it preserves its own state, the next call can pick up directly where it left off. Generators also chain naturally into pipelines.
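A sketch of such a generator pipeline (the stage names are invented); each stage pulls one item at a time, so nothing is materialised in memory:

```python
def read_numbers():
    # stand-in for an unbounded or file-backed stream
    for n in range(10):
        yield n

def squares(nums):
    for n in nums:
        yield n * n

def evens(nums):
    for n in nums:
        if n % 2 == 0:
            yield n

# Each stage processes one item at a time as the final consumer pulls.
pipeline = evens(squares(read_numbers()))
print(list(pipeline))  # [0, 4, 16, 36, 64]
```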

V. Coroutines

A. Definition

Operations on threads and processes are triggered by the program through system interfaces, and the operating system is the ultimate executor; by nature they are operating-system features. A coroutine's operations, in contrast, are specified by the programmer (via yield in Python), achieving concurrency-style processing by hand.

The point of coroutines: in a multithreaded application the CPU switches between threads by time slicing, and every thread switch costs time. A coroutine uses only a single thread, decomposing it into multiple "micro-threads" and executing blocks of code within that thread in a programmer-defined order.

Typical scenario for coroutines: programs with a large amount of I/O, where the CPU is not heavily needed.
The common third-party modules are gevent and greenlet, introduced below. (Essentially gevent is a high-level wrapper around greenlet, so gevent is what is generally used; it is a reasonably efficient module.)

B. greenlet and gevent

greenlet

from greenlet import greenlet

def test1():
    print(12)
    gr2.switch()
    print(34)
    gr2.switch()

def test2():
    print(56)
    gr1.switch()
    print(78)

gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch()
As the code shows (it prints 12, 56, 34, 78), greenlet switches between tasks with the switch method.

Using gevent

from gevent import monkey; monkey.patch_all()
import gevent
import requests

def f(url):
    print('GET: %s' % url)
    resp = requests.get(url)
    data = resp.text
    print('%d bytes received from %s.' % (len(data), url))

gevent.joinall([
    gevent.spawn(f, 'https://www.python.org/'),
    gevent.spawn(f, 'https://www.yahoo.com/'),
    gevent.spawn(f, 'https://github.com/'),
])
The task f and its arguments are handed to joinall together, implementing coroutines on a single thread. The library is highly encapsulated; in practice you only need to know a few of its main methods.

gevent is a coroutine-based Python networking library that uses greenlet on top of the libev event loop to provide a high-level concurrent API.

    Features:

    (1) A fast event loop based on libev (the epoll mechanism on Linux).

    (2) Lightweight execution units based on greenlet.

    (3) An API that reuses idioms from the Python standard library.

    (4) Cooperative sockets with SSL support.

    (5) DNS queries performed through a thread pool or c-ares.

    (6) Monkey patching to make third-party modules cooperative.

gevent's coroutine support is, in fact, implemented through greenlet's switching.

    greenlet works like this: when a network or I/O operation blocks, greenlet explicitly switches to another code segment that is not blocked; once the blocking condition clears, it automatically switches back and the original code segment resumes. One could say greenlet is a more sensible arrangement of serial work.

    Because I/O is time-consuming and often leaves the program waiting, gevent's automatic coroutine switching guarantees that some greenlet is always running rather than waiting for I/O to complete. This is why coroutines are more efficient than ordinary multithreading.

    Since the switching on I/O is done automatically, gevent needs to modify some of the blocking modules that ship with Python, such as socket and select, so that they yield to other coroutines at the right points; this is done with a monkey patch.

The following code shows how gevent is used (Python version: 3.6; operating system: Windows 10):

from gevent import monkey
monkey.patch_all()
import gevent
import urllib.request
 
def run_task(url):
    print("Visiting %s " % url)
    try:
        response = urllib.request.urlopen(url)
        url_data = response.read()
        print("%d bytes received from %s " % (len(url_data), url))
    except Exception as e:
        print(e)
 
if __name__ == "__main__":
    urls = ["https://stackoverflow.com/", "http://www.cnblogs.com/", "http://github.com/"]
    greenlets = [gevent.spawn(run_task, url) for url in urls]
    gevent.joinall(greenlets)
Visiting https://stackoverflow.com/ 
Visiting http://www.cnblogs.com/ 
Visiting http://github.com/ 
46412 bytes received from http://www.cnblogs.com/ 
54540 bytes received from http://github.com/ 
251799 bytes received from https://stackoverflow.com/ 

gevent's spawn method can be seen as creating a coroutine, and joinall is equivalent to adding the tasks and starting them. The output shows that the three network requests ran concurrently and finished in a different order than they started, yet only one thread was used.

    gevent also provides a pool. If you need to manage a dynamic number of greenlets running concurrently, you can use a pool to handle large numbers of network requests and I/O operations.

    The following modifies the multi-request example above to use gevent's Pool object:


from gevent import monkey
monkey.patch_all()
from gevent.pool import Pool
import urllib.request
 
def run_task(url):
    print("Visiting %s " % url)
    try:
        response = urllib.request.urlopen(url)
        url_data = response.read()
        print("%d bytes received from %s " % (len(url_data), url))
    except Exception as e:
        print(e)
    
    return ("%s read finished.." % url)
 
if __name__ == "__main__":
    pool = Pool(2)
    urls = ["https://stackoverflow.com/", 
            "http://www.cnblogs.com/", 
            "http://github.com/"]
    results = pool.map(run_task, urls)
    print(results)

Visiting https://stackoverflow.com/ 
Visiting http://www.cnblogs.com/ 
46416 bytes received from http://www.cnblogs.com/ 
Visiting http://github.com/ 
253375 bytes received from https://stackoverflow.com/ 
54540 bytes received from http://github.com/ 
['https://stackoverflow.com/ read finished..', 'http://www.cnblogs.com/ read finished..', 'http://github.com/ read finished..']

The output shows that the Pool managed the number of concurrent coroutines: the first two URLs were visited, and the third request proceeded only after one of those tasks completed.

 

C. Coroutine states

A coroutine has four states, namely:

GEN_CREATED: waiting to start execution

GEN_RUNNING: currently being executed by the interpreter

GEN_SUSPENDED: paused at a yield expression

GEN_CLOSED: execution has finished

A coroutine's state can be determined with the inspect.getgeneratorstate() function. Consider the following example:

from inspect import getgeneratorstate
from time import sleep
import threading
 
def get_state(coro):
    print("Generator state seen from another thread: %s" % getgeneratorstate(coro))  # <1>

def simple_coroutine():
    for i in range(3):
        sleep(0.5)
        x = yield i + 1

my_coro = simple_coroutine()
print("Initial generator state: %s" % getgeneratorstate(my_coro))  # <2>
first = next(my_coro)
for i in range(5):
    try:
        my_coro.send(i)
        print("Generator state seen from the main thread: %s" % getgeneratorstate(my_coro))  # <3>
        t = threading.Thread(target=get_state, args=(my_coro,))
        t.start()
    except StopIteration:
        print("generator exhausted")
print("Final generator state: %s" % getgeneratorstate(my_coro))  # <4>

Output:

Initial generator state: GEN_CREATED
Generator state seen from the main thread: GEN_SUSPENDED
Generator state seen from another thread: GEN_SUSPENDED
Generator state seen from the main thread: GEN_SUSPENDED
Generator state seen from another thread: GEN_SUSPENDED
generator exhausted
generator exhausted
generator exhausted
Final generator state: GEN_CLOSED

Before a coroutine is activated, its state is GEN_CREATED. After calling next(), and between send() calls, I observed the coroutine from both the caller (the main thread) and a separate thread; both saw GEN_SUSPENDED, i.e. the coroutine was paused. I had originally hoped to use another thread to capture the coroutine mid-execution, but even from another thread the captured state was GEN_SUSPENDED. This shows that a coroutine is in the GEN_RUNNING state only while the interpreter is actually running it. Finally comes GEN_CLOSED: once we finish pulling values from the coroutine, its state becomes "execution finished".

Example: a coroutine that computes a running average

We can write a coroutine, keep sending it values, and have it add each value to the running total and compute the average, like this:

from functools import wraps
def coroutine(func):
    @wraps(func)
    def primer(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)
        return gen
    return primer


@coroutine  # <1>
def averager():
    total = .0
    count = 0
    average = None
    while True:
        term = yield average
        total += term
        count += 1
        average = total / count
try:
    coro_avg = averager()
    print(coro_avg.send(10))
    print(coro_avg.send(20))
    print(coro_avg.send(30))
    coro_avg.close()  # <2>
    print(coro_avg.send(40))
except StopIteration:
    print("coroutine has ended")

Output:

10.0
15.0
20.0
coroutine has ended

The first thing yield from item does with the item expression is call iter(item) to obtain an iterator, so item can be any iterable object. In some cases yield from can replace a for loop and make the code more concise, but its main function is delegating to a sub-generator: like yield, it opens a channel, but a two-way one, connecting the outermost caller with the innermost actual generator so that both can send and receive values directly.
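A minimal sketch of that two-way channel (inner and outer are invented names): values sent to the delegating generator reach the inner generator directly, and the inner generator's return value comes back through yield from.

```python
def inner():
    total = 0
    while True:
        x = yield          # receives values sent to the OUTER generator
        if x is None:
            break
        total += x
    return total           # travels back as the value of `yield from`

def outer(results):
    # yield from connects the caller directly to inner()
    result = yield from inner()
    results.append(result)

results = []
coro = outer(results)
next(coro)                 # prime the delegating generator
coro.send(1)
coro.send(2)
coro.send(3)
try:
    coro.send(None)        # tells inner() to finish
except StopIteration:
    pass
print(results)             # [6]
```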

VI. Coroutines and generators

A generator is not a coroutine; strictly speaking, a generator is a semi-coroutine. Semi-coroutines have much in common with coroutines: both can hand over control, both can yield many times, and both can be re-entered many times. But a semi-coroutine cannot decide who receives control when it yields.

Take the producer-consumer model as an example to see the difference. First, a coroutine implementation:


var q := new queue
coroutine produce
    loop
        while q is not full
            create some new items
            add the items to q
        yield to consume
 
coroutine consume
    loop
        while q is not empty
            remove some items from q
            use the items
        yield to produce

 

As the pseudocode shows, the produce coroutine can yield control to the consume coroutine, and vice versa; that is, a coroutine can decide for itself who receives control.

A semi-coroutine, by contrast, can only hand control back to its caller (the subroutine). Semi-coroutines can still implement the producer-consumer model above, but they need the help of an auxiliary dispatcher coroutine:
var q := new queue
generator produce
    loop
        while q is not full
            create some new items
            add the items to q
        yield consume
generator consume
    loop
        while q is not empty
            remove some items from q
            use the items
        yield produce
subroutine dispatcher
    var d := new dictionary(generator → iterator)
    d[produce] := start produce
    d[consume] := start consume
    var current := produce
    loop
        current := next d[current]


With that understood, here it is in Python:

import random

queue = []
limit = 9
def producer():
    while True:
        cap = limit - len(queue)
        while cap > 0:
            queue.append(random.randint(0, 100))
            cap -= 1
        print('producer yield back')
        yield 'consumer'
def consumer():
    while True:
        cap = len(queue)
        while cap > 0:
            item = queue.pop(0)
            print(item)
            cap -= 1
        print('consumer yield back')
        yield 'producer'
dic = {
    'producer': producer(),
    'consumer': consumer()
}
current = dic['producer']
times = 8
while times > 0:
    current = dic[next(current)]
    times -= 1

VII. Coroutines and multithreading

A coroutine, also called a micro-thread or fiber, is a kind of user-level lightweight thread. A coroutine has its own register context and stack. When the schedule switches it out, the register context and stack are saved elsewhere; when it is switched back in, the previously saved register context and stack are restored and it continues where it left off. In concurrent programming, coroutines are similar to threads: each coroutine represents one execution unit with its own local data, sharing global data and other resource pools with the other coroutines.

Coroutines require the programmer to write the scheduling logic. To the CPU a coroutine is just a single thread, so the CPU does not need to handle scheduling or context switching, which removes that overhead; this is why, to a certain extent, coroutines beat multithreading.

 

References

https://www.cnblogs.com/whatisfantasy/p/6440585.html

https://www.cnblogs.com/liangmingshen/p/9706181.html

https://www.cnblogs.com/beiluowuzheng/p/9064152.html

Python Advanced Tutorial, 3rd edition

 



Origin blog.csdn.net/u013380694/article/details/90051730