IO Models in Python Concurrent Programming

First, let's look at the objects and steps involved when an IO operation occurs. Taking read as an example, it goes through two stages:

1) Wait for the data to be ready

2) Copy data from the kernel to the process

 

Blocking IO (blocking IO)

In Linux, all sockets are blocking by default.

The defining feature of blocking IO is that both stages of the IO operation (waiting for the data and copying the data) block the user process.

    Almost every programmer's first exposure to network programming starts with interfaces such as listen(), send(), and recv(). These interfaces make it very convenient to build a server/client model. However, most socket interfaces are blocking.

    PS: a so-called blocking interface is a system call (usually an IO interface) that does not return, keeping the current thread blocked, until the call obtains a result or a timeout error occurs.
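To see this blocking behavior concretely, here is a minimal sketch. It uses a local socketpair instead of a real network connection so it is self-contained; the delay simulates stage one (waiting for the data):

```python
import socket
import threading
import time

# A socketpair gives two connected sockets without any network setup.
a, b = socket.socketpair()

def delayed_send():
    time.sleep(0.2)            # stage 1: the data is not ready yet
    b.sendall(b'hello')        # now the kernel has data for `a`

t = threading.Thread(target=delayed_send)
t.start()

start = time.time()
data = a.recv(1024)            # blocks through both stages: wait + copy
elapsed = time.time() - start
t.join()

print(data)                    # the call returned only once the peer sent
a.close(); b.close()
```

The recv() call does not return until the peer's data arrives, which is exactly what makes a naive single-threaded blocking server unable to serve a second client while the first one is quiet.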

So a simple solution:

# Use multithreading (or multiprocessing) on the server side, giving each connection its own thread (or process), so that blocking in any one connection does not affect the others.
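A minimal sketch of this thread-per-connection idea (the port is chosen by the OS here so the example is self-contained; the uppercase-echo logic mirrors the servers later in this article):

```python
import socket
import threading

def handle(conn):
    """Serve one client: each connection blocks only its own thread."""
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:            # client closed the connection
                break
            conn.sendall(data.upper())

def serve(server):
    while True:
        try:
            conn, addr = server.accept()
        except OSError:             # listening socket closed: stop accepting
            break
        # One thread per connection, so a blocked recv() on one client
        # does not stall the others.
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

server = socket.socket()
server.bind(('127.0.0.1', 0))       # port 0: let the OS pick a free port
server.listen(5)
port = server.getsockname()[1]
threading.Thread(target=serve, args=(server,), daemon=True).start()

# Two clients, each served by its own handler thread.
replies = []
for _ in range(2):
    with socket.create_connection(('127.0.0.1', port)) as c:
        c.sendall(b'hello')
        replies.append(c.recv(1024))
print(replies)                      # [b'HELLO', b'HELLO']
server.close()
```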

The problem with this scheme:

# When hundreds or thousands of connection requests arrive at the same time, both multi-threading and multi-processing seriously occupy system resources and reduce the system's responsiveness; moreover, the threads and processes themselves are prone to hanging.

An improvement proposal:

# Many programmers would consider using a "thread pool" or "connection pool". A "thread pool" reduces the frequency of thread creation and destruction by maintaining a reasonable number of threads and letting idle threads take on new tasks. A "connection pool" maintains a cache of connections, reusing existing connections as much as possible and reducing the frequency of creating and closing them. Both techniques reduce system overhead very well and are widely used in many large systems, such as WebSphere, Tomcat, and various databases.

The improved scheme still has problems:

# "Thread pool" and "connection pool" only alleviate, to a certain extent, the resource cost of frequent IO calls. Moreover, a "pool" always has an upper limit; when requests greatly exceed it, the system's responsiveness is not much better than with no pool at all. So when using a "pool", you must consider the scale of the load it faces and size the "pool" accordingly.

    Against the thousands or even tens of thousands of simultaneous client requests in the example above, a "thread pool" or "connection pool" may relieve some of the pressure, but it cannot solve everything. In short, the multi-threaded model can easily and efficiently handle small-scale service requests, but facing large-scale requests it too hits a bottleneck. Non-blocking interfaces can be used to try to solve this problem.

 

Non-blocking IO (non-blocking IO)

Under Linux, a socket can be set to non-blocking mode. Performing a read operation on a non-blocking socket proceeds as follows:

    When the user process issues a read operation and the data in the kernel is not yet ready, the kernel does not block the user process but returns an error immediately. From the user process's perspective, it does not need to wait after initiating a read; it gets a result at once. When the process sees that the result is an error, it knows the data is not ready, so it can do other things until it issues the read again, or simply retry the read immediately. Once the data in the kernel is ready and a system call from the user process arrives again, the kernel copies the data to user memory (the process is still blocked during this stage) and returns.

    That is, after a non-blocking recvfrom system call, the kernel returns to the process immediately; if the data is not ready, an error is returned. The process can then do something else before issuing recvfrom again. Repeating this cycle of recvfrom calls is usually called polling: the process keeps checking the kernel until the data is ready, and the data is then copied to the process for handling. Note that during the entire copy stage, the process is still blocked.

    So in non-blocking IO, the user process actually has to keep actively asking the kernel whether the data is ready.

 

# Server

from socket import *
import time

s = socket()
s.bind(('127.0.0.1', 8080))
s.listen(5)
s.setblocking(False)   # put the listening socket in non-blocking mode

r_list = []            # connections to read from
w_list = []            # (connection, data) pairs waiting to be sent

while True:
    try:
        conn, addr = s.accept()
        r_list.append(conn)
    except BlockingIOError:
        # No pending connection: use the time to do other work.
        print('can do other work')
        print('r_list:', len(r_list))
        time.sleep(2)  # throttle the polling loop to keep CPU usage down

        del_rlist = []
        for conn in r_list:
            try:
                data = conn.recv(1024)
                if not data:
                    conn.close()
                    del_rlist.append(conn)
                    continue
                w_list.append((conn, data.upper()))
            except BlockingIOError:      # no data ready on this connection yet
                continue
            except ConnectionResetError:
                conn.close()
                del_rlist.append(conn)

        del_wlist = []
        for item in w_list:
            try:
                conn, res = item
                conn.send(res)
                del_wlist.append(item)
            except BlockingIOError:      # send buffer full, retry next round
                continue
            except ConnectionResetError:
                conn.close()
                del_wlist.append(item)

        for conn in del_rlist:
            r_list.remove(conn)

        for item in del_wlist:
            w_list.remove(item)



# Client

from socket import *
import os

client = socket()
client.connect(('127.0.0.1', 8080))

while True:
    data = '%s say hello' % os.getpid()
    client.send(data.encode('utf-8'))
    res = client.recv(1024)
    print(res.decode('utf-8'))

 

But the non-blocking IO model is by no means recommended.

We cannot deny its advantage: the process can do other work while waiting for the task to complete (including submitting other tasks, so that multiple tasks can be running "simultaneously" in the "background").

But it can't hide its shortcomings:

1. Calling recv() in a loop drives CPU usage sharply up, which is why the code keeps a time.sleep(2) in the polling loop; otherwise it can easily hang a low-end host.
2. The response latency for task completion increases, because read is only polled once in a while, and the task may complete at any point between two polls; this lowers overall data throughput.

IO Multiplexing (IO multiplexing)

IO multiplexing is also known as select/epoll. Its advantage is that a single process can handle IO for multiple network connections at the same time.

Its basic principle is that select/epoll continuously polls all the sockets it is responsible for and notifies the user process when data arrives on any of them. The flow is as follows:

When the user process calls select, the whole process is blocked; at the same time, the kernel "monitors" all the sockets that select is responsible for, and select returns as soon as the data in any socket is ready. The user process then calls read to copy the data from the kernel to the user process.

Two points to emphasize:

1. If the number of connections handled is not very high, a web server using select/epoll does not necessarily perform better than one using multi-threading + blocking IO, and its latency may even be higher. The advantage of select/epoll is not that it handles a single connection faster, but that it can handle more connections.

2. In the multiplexing model, each socket is generally set to non-blocking. However, the whole user process is in fact blocked the entire time; it is just blocked by the select call rather than by socket IO.

In conclusion:

The advantage of select is that it can handle multiple connections; it does not help a single connection.

# Server

from socket import *
import select

s = socket()
s.bind(('127.0.0.1', 8080))
s.listen(5)
s.setblocking(False)

r_list = [s]    # sockets watched for readability (including the listener)
w_list = []     # sockets watched for writability
w_data = {}     # socket -> data waiting to be sent

while True:
    print('watching r_list:', len(r_list))
    print('watching w_list:', len(w_list))
    # Blocks until at least one watched socket is ready.
    rl, wl, xl = select.select(r_list, w_list, [])

    for r in rl:
        if r is s:                   # the listener is readable: new connection
            conn, addr = r.accept()
            r_list.append(conn)
        else:
            try:
                data = r.recv(1024)
                if not data:         # client closed the connection
                    r.close()
                    r_list.remove(r)
                    continue

                w_list.append(r)
                w_data[r] = data.upper()
            except ConnectionResetError:
                r.close()
                r_list.remove(r)
                continue

    for w in wl:
        w.send(w_data[w])
        w_list.remove(w)
        w_data.pop(w)




# Client

from socket import *
import os

client = socket()
client.connect(('127.0.0.1', 8080))

while True:
    data = '%s say hello' % os.getpid()
    client.send(data.encode('utf-8'))
    res = client.recv(1024)
    print(res.decode('utf-8'))

Advantages of this model:

Compared with other models, the event-driven model using select() runs in a single thread (process), occupies fewer resources, does not consume too much CPU, and can serve multiple clients at the same time. If you intend to build a simple event-driven server program, this model has some reference value.

Disadvantages of this model:

First of all, the select() interface is not the best choice for "event driven". When the number of handles to monitor is large, select() itself consumes a lot of time polling each handle. Many operating systems provide more efficient interfaces; Linux, for example, provides epoll. If you need to implement a more efficient server program, an epoll-like interface is recommended. Unfortunately, the epoll-like interfaces provided by different operating systems differ greatly, so using them to implement a server with good cross-platform support is relatively difficult.

Second, this model mixes event detection and event response together; once the event-response handler is heavyweight, it is catastrophic for the whole model.
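In Python, the standard selectors module addresses exactly the cross-platform difficulty mentioned above: DefaultSelector picks the most efficient mechanism the platform offers (epoll on Linux, kqueue on BSD/macOS, plain select elsewhere). A minimal sketch, using a socketpair so it is self-contained:

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on macOS, ...
print(type(sel).__name__)           # e.g. 'EpollSelector' on Linux

a, b = socket.socketpair()
a.setblocking(False)
sel.register(a, selectors.EVENT_READ)

b.sendall(b'ping')                  # make `a` readable
events = sel.select(timeout=1)      # returns once `a` has data
for key, mask in events:
    data = key.fileobj.recv(1024)
    print(data)                     # b'ping'

sel.unregister(a)
a.close(); b.close(); sel.close()
```

The select-based server above could be rewritten against this API without changing its structure: register sockets for EVENT_READ/EVENT_WRITE, then loop over sel.select().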

 

 Asynchronous IO (Asynchronous I/O)

Asynchronous IO under Linux is not used much; it was only introduced in kernel version 2.6. Let's look at its flow first:

 

 After the user process initiates the read operation, it can immediately start doing other things. From the kernel's perspective, when it receives an asynchronous read it returns immediately, so it does not block the user process at all. The kernel then waits for the data to be ready and copies it to user memory; when all of this is done, the kernel sends a signal to the user process telling it that the read operation is complete.
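Python's asyncio library offers this same "submit and be notified" programming model (note: it is implemented on top of IO multiplexing in an event loop, not on the kernel's AIO facility). A minimal uppercase-echo sketch, with server and client in one event loop so it is self-contained:

```python
import asyncio

async def handle(reader, writer):
    data = await reader.read(1024)   # yields control instead of blocking
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    # Port 0: let the OS pick a free port.
    server = await asyncio.start_server(handle, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    writer.write(b'hello')
    await writer.drain()
    reply = await reader.read(1024)
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return reply

reply = asyncio.run(main())
print(reply)                         # b'HELLO'
```

While an `await` is pending, the event loop is free to run other coroutines, which is the practical payoff of the asynchronous model described above.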

 
