Process
A process is the smallest unit of resource allocation. Each process has its own independent memory space, including register state, heap, stack, data segment, code segment, virtual memory, file handles, IO status, signal information, and so on. Switching between processes is therefore relatively expensive, but processes are relatively independent and stable, and one process is usually not affected by others.
Inter-process communication mechanisms include pipes, message queues, semaphores, shared memory, sockets, and so on.
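As a minimal sketch of one of these mechanisms, two processes can exchange a message over a pipe using Python's multiprocessing module (the `worker` function name here is illustrative):

```python
from multiprocessing import Process, Pipe

def worker(conn):
    # Child process: send one message through its end of the pipe, then close it.
    conn.send("hello from child")
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    print(parent_conn.recv())  # prints "hello from child"
    p.join()
```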
Thread
A thread is the smallest unit of system scheduling; it only needs to maintain its own stack, register state, and a small amount of other context. A process must contain at least one thread. Switching between threads is much cheaper than switching between processes, but threads are less independent and stable: a thread is easily affected by its process and by other threads.
Because all threads in a process share the same memory, threads communicate directly through shared memory, i.e. through globally defined variables. In addition, threads usually need locks to achieve synchronization, mutual exclusion, and similar guarantees.
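For example, a minimal sketch of threads communicating through a shared global variable, with a lock guarding each read-modify-write (the names `counter` and `increment` are illustrative):

```python
import threading

counter = 0                # shared global state, visible to all threads
lock = threading.Lock()    # protects counter from concurrent updates

def increment(times):
    global counter
    for _ in range(times):
        with lock:         # mutual exclusion around the read-modify-write
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000: with the lock, no update is lost
```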
Coroutine
Both processes and threads are scheduled by the operating system. Although thread switching is cheaper than process switching, frequent switching still seriously hurts performance.
The operating system usually triggers a switch in three situations:
- The program has used up its time slice (has run too long)
- A higher-priority program preempts it
- The program blocks
Many network applications accept a large number of requests at the same time. The computation per request is small; most of the time is spent on IO, above all network IO. This leads to frequent IO blocking and thread switching, which seriously hurts performance.
Coroutines exist to solve this performance problem for IO-bound programs in high-concurrency scenarios.
Multiple coroutines can run in one thread. When a coroutine issues a call that would block on IO, it uses asynchronous IO instead, avoiding an operating-system switch, and execution continues with another coroutine. Because the switch happens entirely within one thread, its overhead is very small and performance improves greatly.
Note that the benefit of coroutines is most obvious under a large volume of concurrent IO, because only then is some asynchronous IO always ready, so there is always a coroutine that can run. If IO volume is small, say one request every 10 minutes, then after issuing the asynchronous operation the program still has to wait for the IO to become ready, and a thread switch still happens.
Note that a coroutine switches in only one situation: an IO call.
This capability has to be implemented by a program framework: it is transparent to the operating system, and ideally transparent to the application program as well, so that it adds little development work. In the Go language this capability is native: the language itself implements it and supports it at the syntax level. In Python, it is provided by the gevent package.
The following mainly discusses Python coroutines.
yield
yield is used in generators, as in the following code:
    def f(max):
        n = 1
        while n <= max:
            yield n * n
            n = n + 1

    for i in f(5):
        print(i)
Without yield, the function f would need to return a list; if max were very large, a large amount of memory would be needed to hold that list. With yield, the function becomes a generator: f(5) returns an iterator. Each time the for statement takes a value, it advances the iterator. When the iterator reaches the yield statement, it returns n * n and suspends execution; the next time the for loop takes a value, the iterator resumes from n = n + 1. As a result, no matter how large max is, memory usage stays constant.
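The constant-memory claim can be checked directly with sys.getsizeof: the generator object stays small no matter how large max is, while an equivalent list grows with its contents (a quick check, reusing the f defined above):

```python
import sys

def f(max):
    n = 1
    while n <= max:
        yield n * n
        n = n + 1

gen = f(1_000_000)                          # generator object: constant size
lst = [n * n for n in range(1, 1_000_001)]  # list: grows with max
print(sys.getsizeof(gen))   # a few hundred bytes at most
print(sys.getsizeof(lst))   # several megabytes
```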
Another example:
    def f():
        n = 1
        print("f function with yield inside")
        while True:
            msg = yield n
            print("msg: ", msg)
            n = n + 1

    iter = f()
    print("before invoke next")
    print("receive: ", next(iter))
    print("after invoke next")
    print("receive: ", next(iter))
The output is:
before invoke next
f function with yield inside
receive:  1
after invoke next
msg:  None
receive:  2
Notice that when iter = f() is called, nothing is printed: the body of f() is not actually executed; instead an iterator is returned. When next(iter) is executed (next is a Python built-in), the body of f() runs, but only up to yield n, where it suspends and returns n as the result. At this point even the assignment to msg has not happened yet (more on this below). The next call to next resumes execution at the assignment to msg and runs until yield is reached again. If the iterator is exhausted, next raises a StopIteration exception.
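The StopIteration behavior can be seen with a small generator that yields only two values (an illustrative sketch):

```python
def g():
    yield 1
    yield 2

it = g()
print(next(it))  # 1
print(next(it))  # 2
try:
    next(it)     # the generator is exhausted
except StopIteration:
    print("StopIteration raised")
```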
On to the next example:
    def f():
        n = 1
        print("f function with yield inside")
        while True:
            msg = yield n
            print("msg: ", msg)
            n = n + 1

    iter = f()
    print("before invoke next")
    print("receive: ", next(iter))
    print("after invoke next")
    print("receive: ", iter.send("from outside"))
Here the second next is replaced with a call to the iterator's send method.
The output is:
before invoke next
f function with yield inside
receive:  1
after invoke next
msg:  from outside
receive:  2
The only difference from the previous example is that the printed msg is not None but the argument of the send call. Like next, send resumes the iterator, but it also assigns its argument to msg as the result of the yield expression.
The following uses yield to simulate coroutines
    def f_0():
        n = 5
        while n >= 0:
            print('[f_0] ' + str(n))
            yield
            n = n - 1

    def f_1():
        m = 3
        while m >= 0:
            print('[f_1] ' + str(m))
            yield
            m = m - 1

    iter_list = [f_0(), f_1()]
    while True:
        for it in list(iter_list):  # iterate over a copy so removal is safe
            try:
                next(it)
            except StopIteration:
                iter_list.remove(it)
        if len(iter_list) == 0:
            break
The result is
[f_0] 5
[f_1] 3
[f_0] 4
[f_1] 2
[f_0] 3
[f_1] 1
[f_0] 2
[f_1] 0
[f_0] 1
[f_0] 0
This shows that execution alternates between the two functions, but the code is cumbersome to write.
greenlet
greenlet is a C extension library that implements coroutines at a low level.
    from greenlet import greenlet

    def f_0():
        n = 5
        while n >= 0:
            print('[f_0] ' + str(n))
            parent_greenlet.switch()
            n = n - 1

    def f_1():
        m = 3
        while m >= 0:
            print('[f_1] ' + str(m))
            parent_greenlet.switch()
            m = m - 1

    def parent():
        while True:
            for task in list(greenlet_list):
                task.switch()
                if task.dead:
                    greenlet_list.remove(task)
            if len(greenlet_list) == 0:
                break

    parent_greenlet = greenlet(parent)
    greenlet_list = [greenlet(f_0, parent_greenlet), greenlet(f_1, parent_greenlet)]
    parent_greenlet.switch()
The output is:
[f_0] 5
[f_1] 3
[f_0] 4
[f_1] 2
[f_0] 3
[f_1] 1
[f_0] 2
[f_1] 0
[f_0] 1
[f_0] 0
switch can also pass values: depending on the running state of the target greenlet, the value is delivered either as the function's arguments (on the first switch into it) or as the return value of the switch call at which it is suspended.
    def test1(x, y):
        z = gr2.switch(x + y)
        print(z)

    def test2(u):
        print(u)
        gr1.switch(42)
        print("end")

    gr1 = greenlet(test1)
    gr2 = greenlet(test2)
    gr1.switch("hello", " world")
The output is:
hello world
42
Notice that end is not printed. Because no parent was specified, each greenlet's parent defaults to the main greenlet, so when test1 finishes, control returns to main and the rest of test2 never runs. If a parent is specified, control returns to that parent when the greenlet finishes.
gevent
greenlet code is still fairly cumbersome to write. Moreover, greenlet only implements coroutines; it does not capture IO operations and switch on them. In fact, ordinary computation gains nothing from coroutine switching; only when a program can switch away during highly concurrent IO operations does performance improve greatly.
gevent is built on greenlet and applies many optimizations, including Linux's epoll event-notification mechanism, to improve performance under highly concurrent IO. For example, when a greenlet needs to do a network IO operation, gevent registers it for asynchronous monitoring and switches to another greenlet; when the IO completes, it switches back at an appropriate time and continues execution. Under heavy IO this keeps the program running instead of spending time waiting on IO, while also avoiding the overhead of thread switching.
    import gevent

    def f_0(param):
        n = param
        while n >= 0:
            print('[f_0] ' + str(n))
            gevent.sleep(0.1)
            n = n - 1

    def f_1(param):
        m = param
        while m >= 0:
            print('[f_1] ' + str(m))
            gevent.sleep(0.1)
            m = m - 1

    g1 = gevent.spawn(f_0, 5)
    g2 = gevent.spawn(f_1, 3)
    gevent.joinall([g1, g2])
The output is:
[f_0] 5
[f_1] 3
[f_0] 4
[f_1] 2
[f_0] 3
[f_1] 1
[f_0] 2
[f_1] 0
[f_0] 1
[f_0] 0
The code is concise and clear: compared with a normal program, the only change is replacing time.sleep() with gevent.sleep(), so that gevent can switch coroutines wherever the program would otherwise block.
It can be made even simpler:
    import time
    import gevent
    from gevent import monkey

    monkey.patch_all()

    def f_0(param):
        n = param
        while n >= 0:
            print('[f_0] ' + str(n))
            time.sleep(0.1)
            n = n - 1

    def f_1(param):
        m = param
        while m >= 0:
            print('[f_1] ' + str(m))
            time.sleep(0.1)
            m = m - 1

    g1 = gevent.spawn(f_0, 5)
    g2 = gevent.spawn(f_1, 3)
    gevent.joinall([g1, g2])
Patching with monkey.patch_all() intercepts a large number of IO operations, such as time.sleep and HTTP requests, executes them asynchronously, and switches coroutines. With this approach the original functions can be used directly without modification: for developers, coroutines are transparent; no code changes are needed, and gevent takes care of everything.
asyncio
The asyncio library entered the Python standard library in Python 3.4; the async and await keywords were added in Python 3.5, and as of Python 3.6 asyncio is no longer provisional.
The most important pieces are the async and await keywords.
async declares a function as a coroutine, which can be suspended.
await marks the point where the program may be suspended. It can only be applied to an awaitable: a coroutine or an object with an __await__ method.
    import asyncio

    async def f_0(param):
        n = param
        while n >= 0:
            print('[f_0] ' + str(n))
            await asyncio.sleep(0.1)
            n = n - 1

    async def f_1(param):
        m = param
        while m >= 0:
            print('[f_1] ' + str(m))
            await asyncio.sleep(0.1)
            m = m - 1

    loop = asyncio.get_event_loop()
    tasks = [
        f_0(5),
        f_1(3)
    ]
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
The output is:
[f_0] 5
[f_1] 3
[f_0] 4
[f_1] 2
[f_0] 3
[f_1] 1
[f_0] 2
[f_1] 0
[f_0] 1
[f_0] 0
Another example:
    import asyncio
    import aiohttp

    async def request(session, url):
        async with session.get(url) as response:
            return await response.read()

    async def fetch(url):
        await asyncio.sleep(1)
        async with aiohttp.ClientSession() as session:
            html = await request(session, url)
            print(html)

    url_list = [
        "http://www.qq.com",
        "http://www.jianshu.com",
        "http://www.cnblogs.com"
    ]

    tasks = [fetch(url) for url in url_list]
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
As you can see, functions must be declared async to support asynchronous calls, and await must be used to mark suspension points. If the expression after await cannot actually be suspended, an error occurs, so specific asynchronous methods or classes (such as aiohttp instead of a blocking HTTP client) have to be used.
By comparison, gevent can be transparent to the program: an ordinary synchronous program can be made asynchronous through gevent without any modification.
On the other hand, gevent is a third-party package, while asyncio is part of the Python standard library and is supported at the syntax level.