Using the select module in Python, and the differences between select, poll and epoll
First, the differences between select, poll and epoll (summary)
select, poll and epoll are all implementations of I/O multiplexing. There are three of them because they appeared one after another, each one trying to fix the shortcomings of its predecessor.
After the concept of I/O multiplexing was proposed, select was the first implementation (implemented in BSD around 1983).
select
Once select was in use, a number of problems quickly became apparent:
- select modifies the fd sets passed in as arguments, so the caller has to rebuild them before every call. That is very unfriendly for a function that is called over and over.
- Every call to select copies the fd sets from user space to kernel space. With many fds, this overhead gets large.
- When any socket (I/O stream) has data, select returns, but it only reports how many descriptors are ready. To find out which sockets are ready, you have to check each one yourself (with FD_ISSET). On top of that, every call makes the kernel traverse all of the fds passed in, which is also expensive when there are many fds.
- select can only watch 1024 connections. The limit is hard-coded: on Linux it comes from FD_SETSIZE in the header files. Strictly speaking, select only accepts descriptors numbered below FD_SETSIZE.
- select is not thread-safe. Suppose you have added a socket to select, and another thread suddenly realizes the socket is no longer needed and must be withdrawn. Sorry, select offers no way to do that. If you close the socket anyway, the behavior of select is, well, unspecified.
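In Python, `select.select()` hides part of this list. You pass plain lists and get back new lists containing only the ready objects. The fd_set rebuilding and the scan over every descriptor still happen on each call, inside CPython and the kernel.

The sketch below assumes a Unix-like system where `socket.socketpair()` is available. It watches several socket pairs and makes exactly one of them readable. The helper name `readable_after_write` and its parameters are hypothetical.

```python
import select
import socket


def readable_after_write(n_pairs=5, which=2):
    """Create n_pairs connected socket pairs, write to one of them, and
    return the indices that select() reports as readable."""
    pairs = [socket.socketpair() for _ in range(n_pairs)]
    try:
        readers = [a for a, _ in pairs]
        # Make exactly one reader readable by writing to its peer
        pairs[which][1].send(b"x")
        # select() checks every fd in `readers` on each call; the input list
        # is left untouched and a new list of ready sockets is returned
        readable, _, _ = select.select(readers, [], [], 1.0)
        return [readers.index(s) for s in readable]
    finally:
        for a, b in pairs:
            a.close()
            b.close()


print(readable_after_write())  # [2]
```

The FD_SETSIZE limit also surfaces here. On Linux, CPython's `select.select()` should raise `ValueError` for a socket whose descriptor number is 1024 or higher.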
So in 1997, 14 years later, poll became available on Linux (it originated in System V), and it fixed many of select's problems.
poll
- poll removes the 1024-connection limit, so you can watch as many connections as you like.
- By design, poll no longer modifies the array you pass in, because results go into a separate revents field. This can depend on the platform, though, so it is wise to stay careful.
In fact, the 14-year delay was not about efficiency. The hardware of that era was so weak that a server handling more than 1,000 connections was exceptional, so select was good enough for a long time.
But poll is still not thread-safe. However powerful the server is, one thread can only handle one set of I/O streams. You can of course combine it with multiple processes, but then you inherit all the usual problems of multiprocessing.
So 5 years later, in 2002, Davide Libenzi implemented epoll.
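Here is a rough illustration of the poll interface as Python exposes it through `select.poll()`. You register each fd once with an event mask. Each `poll()` call then returns `(fd, event)` pairs for the ready fds only, and the registered mask is not overwritten.

This is a sketch that assumes a Unix-like platform, since `select.poll` is not available on Windows. The helper name `poll_once` is hypothetical.

```python
import select
import socket


def poll_once():
    """Register two sockets, make one readable, and return the events from a
    single poll() call together with the fd that was made readable."""
    a1, b1 = socket.socketpair()
    a2, b2 = socket.socketpair()
    try:
        poller = select.poll()
        # The interest mask is stored at registration and reused by every poll()
        poller.register(a1, select.POLLIN)
        poller.register(a2, select.POLLIN)
        b2.send(b"hello")
        # poll() takes its timeout in milliseconds
        events = poller.poll(1000)
        return events, a2.fileno()
    finally:
        for s in (a1, b1, a2, b2):
            s.close()


events, ready_fd = poll_once()
print(events, ready_fd)
```

Unlike select, the caller does not rebuild anything between calls. The kernel still walks the whole registered array on every `poll()`, which is the cost epoll sets out to remove.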
epoll
epoll is the most recent of these implementations of I/O multiplexing, and it fixes most of the problems of select and poll:
- Copying fds from user space to kernel space on every call: epoll moves this into epoll_ctl. Each time a new fd is registered with the epoll handle (EPOLL_CTL_ADD in epoll_ctl), that fd is copied into the kernel at that moment, not again on every epoll_wait call. Each fd is therefore copied only once for as long as it stays registered.
- epoll does not have the 1024-connection limit either.
- epoll is thread-safe. For example, another thread may add an fd with epoll_ctl while one thread is blocked in epoll_wait, and if that fd becomes ready, the waiting call returns.
- epoll does not just tell you that some socket in the group has data. It tells you exactly which sockets are ready, so you don't have to search for them yourself.
- select and poll add current (the calling task) to the wait queue of every fd's device on every call. epoll instead hooks into each device's wait queue only once, during epoll_ctl (this step is unavoidable), and registers a callback function for each fd. When a device becomes ready and wakes up the waiters on its wait queue, this callback runs and adds the ready fd to a ready list. The job of epoll_wait is then just to check whether the ready list contains any fds. It uses schedule_timeout() to sleep for a while and then check again, similar to step 7 of the select implementation.
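Two of the claims above can be checked from Python with `select.epoll` (Linux only):

- `epoll.poll()` returns only the ready fds.
- Another thread may register an fd while one thread is blocked waiting, which the epoll_wait man page documents.

This is a minimal sketch, not the kernel mechanism itself. The helper names `epoll_ready_only` and `register_from_other_thread`, the 0.1 s delay and the timeouts are arbitrary choices for illustration.

```python
import select
import socket
import threading
import time


def epoll_ready_only(n_pairs=5, which=3):
    """Register n_pairs sockets with epoll, make one readable, and return
    (events, expected_fd); only the ready fd should be reported."""
    pairs = [socket.socketpair() for _ in range(n_pairs)]
    ep = select.epoll()
    try:
        for a, _ in pairs:
            # epoll_ctl(EPOLL_CTL_ADD): each fd is handed to the kernel once
            ep.register(a.fileno(), select.EPOLLIN)
        pairs[which][1].send(b"x")
        # Only entries from the ready list come back, not all n_pairs fds
        return ep.poll(1.0), pairs[which][0].fileno()
    finally:
        ep.close()
        for a, b in pairs:
            a.close()
            b.close()


def register_from_other_thread():
    """Block in epoll.poll() while another thread registers a socket and
    makes it readable; return (events, expected_fd)."""
    ep = select.epoll()
    a, b = socket.socketpair()

    def late_register():
        time.sleep(0.1)
        ep.register(a.fileno(), select.EPOLLIN)  # added while poll() waits
        b.send(b"ping")

    t = threading.Thread(target=late_register)
    t.start()
    try:
        # Timeout in seconds; should return as soon as `a` becomes readable
        return ep.poll(5.0), a.fileno()
    finally:
        t.join()
        ep.close()
        a.close()
        b.close()
```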
Summary
(1) select and poll have to poll the entire fd set themselves, over and over, until a device is ready, possibly alternating between sleeping and waking several times. epoll also calls epoll_wait repeatedly to poll its ready list, and it too may alternate between sleeping and waking. The difference is that when a device becomes ready, the callback puts the ready fd on the ready list and wakes up the process sleeping in epoll_wait. All three sleep and wake alternately, but on every wake-up select and poll must traverse the whole fd set, while epoll only has to check whether the ready list is empty. That saves a lot of CPU time, and it is the performance gain brought by the callback mechanism.
(2) Each call to select or poll copies the fd set from user space to kernel space once and hangs current on every device wait queue once. epoll copies each fd only once, and hangs current on a wait queue only once per call, at the start of epoll_wait. Note that this wait queue is not a device wait queue but one defined inside epoll. This also saves a lot of overhead.
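One rough way to feel the difference described in (1) and (2) is to time many zero-timeout checks over a large set of idle sockets, only one of which is ready. This is a hedged sketch for Linux only, and it does not measure the kernel in isolation: the `select.select()` figure also includes CPython converting the Python list into an fd_set on every call. Treat the numbers as indicative at best.

The function name `compare_scan_cost` and its default sizes are hypothetical. The defaults are kept well below the 1024-descriptor limit and the usual open-file limit.

```python
import select
import socket
import time


def compare_scan_cost(n_pairs=200, calls=2000):
    """Time `calls` zero-timeout readiness checks over n_pairs sockets (one
    readable) with select.select() and epoll.poll(). Returns
    (select_seconds, epoll_seconds, n_ready_select, n_ready_epoll)."""
    pairs = [socket.socketpair() for _ in range(n_pairs)]
    readers = [a for a, _ in pairs]
    pairs[0][1].send(b"x")  # exactly one socket is ready
    ep = select.epoll()
    try:
        for a in readers:
            ep.register(a.fileno(), select.EPOLLIN)

        start = time.perf_counter()
        for _ in range(calls):
            # The whole set is rebuilt, copied in and scanned on every call
            r, _, _ = select.select(readers, [], [], 0)
        t_select = time.perf_counter() - start

        start = time.perf_counter()
        for _ in range(calls):
            # The interest list already lives in the kernel; only ready fds return
            ev = ep.poll(0)
        t_epoll = time.perf_counter() - start
        return t_select, t_epoll, len(r), len(ev)
    finally:
        ep.close()
        for a, b in pairs:
            a.close()
            b.close()


print(compare_scan_cost())
```

Both calls should report exactly one ready socket. As `n_pairs` grows, the select time is expected to grow roughly with the number of watched fds, while the epoll time should stay roughly flat.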
The following Python examples using select (and poll) are reposted from http://www.cnblogs.com/coser/archive/2012/01/06/2315216.html
select server
```python
import queue
import select
import socket

# Create a non-blocking TCP server socket
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setblocking(False)
# Allow the address to be reused right after a restart
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_address = ('192.168.1.102', 10001)
server.bind(server_address)
server.listen(10)

# Sockets from which we expect to read
inputs = [server]
# Sockets to which we expect to write
outputs = []
# Outgoing message queues (socket -> queue.Queue)
message_queues = {}
# Optional timeout for select, in seconds
timeout = 20

while inputs:
    print("waiting for next event")
    readable, writable, exceptional = select.select(inputs, outputs, inputs, timeout)
    # When the timeout is reached, select returns three empty lists
    if not (readable or writable or exceptional):
        print("Time out!")
        break
    for s in readable:
        if s is server:
            # A readable server socket is ready to accept a connection
            connection, client_address = s.accept()
            print("connection from", client_address)
            connection.setblocking(False)
            inputs.append(connection)
            message_queues[connection] = queue.Queue()
        else:
            data = s.recv(1024)
            if data:
                print("received", data, "from", s.getpeername())
                message_queues[s].put(data)
                # Add an output channel for the response
                if s not in outputs:
                    outputs.append(s)
            else:
                # An empty result means the peer closed the connection
                print("closing", s)
                if s in outputs:
                    outputs.remove(s)
                inputs.remove(s)
                s.close()
                # Remove the message queue
                del message_queues[s]
    for s in writable:
        if s not in message_queues:
            continue  # already closed earlier in this iteration
        try:
            next_msg = message_queues[s].get_nowait()
        except queue.Empty:
            print(s.getpeername(), "queue empty")
            outputs.remove(s)
        else:
            print("sending", next_msg, "to", s.getpeername())
            s.send(next_msg)
    for s in exceptional:
        print("exception condition on", s)
        # Stop listening for input on the connection
        if s in inputs:
            inputs.remove(s)
        if s in outputs:
            outputs.remove(s)
        s.close()
        # Remove the message queue, if there is one
        message_queues.pop(s, None)
```
client
```python
import socket

messages = ["This is the message", "It will be sent", "in parts"]
print("Connect to the server")
server_address = ("192.168.1.102", 10001)

# Create ten TCP/IP sockets and connect them all
socks = [socket.socket(socket.AF_INET, socket.SOCK_STREAM) for _ in range(10)]
for s in socks:
    s.connect(server_address)

counter = 0
for message in messages:
    # Send the message from each socket
    for s in socks:
        counter += 1
        text = message + " version " + str(counter)
        print("%s sending %s" % (s.getpeername(), text))
        s.send(text.encode())
    # Read the response on each socket
    for s in socks:
        data = s.recv(1024)
        print("%s received %s" % (s.getpeername(), data))
        if not data:
            print("server closed socket", s.getpeername())

# Close all sockets when done
for s in socks:
    s.close()
```
poll server
```python
import queue
import select
import socket

# Create a TCP/IP socket, then bind and listen
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setblocking(False)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_address = ("192.168.1.102", 10001)
print("Starting up on %s port %s" % server_address)
server.bind(server_address)
server.listen(5)

message_queues = {}
# poll's timeout is in milliseconds, not seconds
timeout = 1000

# Event masks to watch for
READ_ONLY = select.POLLIN | select.POLLPRI | select.POLLHUP | select.POLLERR
READ_WRITE = READ_ONLY | select.POLLOUT

# Set up the poller
poller = select.poll()
poller.register(server, READ_ONLY)

# Map file descriptors back to socket objects (poll reports fds)
fd_to_socket = {server.fileno(): server}


def close_socket(s):
    # Stop polling the socket, forget it, then close it
    poller.unregister(s)
    del fd_to_socket[s.fileno()]
    message_queues.pop(s, None)
    s.close()


while True:
    print("Waiting for the next event")
    events = poller.poll(timeout)
    print("*" * 20)
    print(len(events))
    print(events)
    print("*" * 20)
    for fd, flag in events:
        s = fd_to_socket[fd]
        if flag & (select.POLLIN | select.POLLPRI):
            if s is server:
                # A readable server socket is ready to accept a connection
                connection, client_address = s.accept()
                print("Connection", client_address)
                connection.setblocking(False)
                fd_to_socket[connection.fileno()] = connection
                poller.register(connection, READ_ONLY)
                # Give the connection a queue for data to send back
                message_queues[connection] = queue.Queue()
            else:
                data = s.recv(1024)
                if data:
                    # A readable client socket has data
                    print("received %s from %s" % (data, s.getpeername()))
                    message_queues[s].put(data)
                    # Also watch for writability so we can reply
                    poller.modify(s, READ_WRITE)
                else:
                    # An empty read means the client closed the connection
                    print("closing", s)
                    close_socket(s)
        elif flag & select.POLLHUP:
            # A client that hung up; close it
            print("Closing", s, "(HUP)")
            close_socket(s)
        elif flag & select.POLLOUT:
            # The socket is ready to send data, if there is any to send
            try:
                next_msg = message_queues[s].get_nowait()
            except queue.Empty:
                # No messages waiting, so stop checking for writability
                print(s.getpeername(), "queue empty")
                poller.modify(s, READ_ONLY)
            else:
                print("sending %s to %s" % (next_msg, s.getpeername()))
                s.send(next_msg)
        elif flag & select.POLLERR:
            # Any event with POLLERR makes the server close the socket
            print("exception on", s)
            close_socket(s)
```
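The same echo pattern can also be written against `select.epoll` (Linux only). This is a minimal sketch and not part of the reposted article. It makes these assumptions:

- It uses level-triggered mode.
- It binds to 127.0.0.1 on a port chosen by the OS, not the address above.
- The names `serve`, `_close` and `run_demo`, and the `stop` event, are hypothetical additions for illustration.

```python
import select
import socket
import threading


def _close(ep, conns, pending, fd):
    """Unregister fd from epoll (EPOLL_CTL_DEL), forget it and close it."""
    ep.unregister(fd)
    pending.pop(fd, None)
    conns.pop(fd).close()


def serve(server, stop):
    """Echo back everything received on each connection until `stop`
    (a threading.Event) is set. Level-triggered epoll, Linux only."""
    ep = select.epoll()
    server_fd = server.fileno()
    ep.register(server_fd, select.EPOLLIN)
    conns = {}    # fd -> connected socket
    pending = {}  # fd -> bytes still to be echoed back
    try:
        while not stop.is_set():
            for fd, ev in ep.poll(0.1):  # timeout in seconds
                if fd == server_fd:
                    conn, _ = server.accept()
                    conn.setblocking(False)
                    conns[conn.fileno()] = conn
                    pending[conn.fileno()] = b""
                    ep.register(conn.fileno(), select.EPOLLIN)
                    continue
                conn = conns[fd]
                if ev & select.EPOLLIN:
                    try:
                        data = conn.recv(1024)
                    except ConnectionError:
                        data = b""
                    if not data:  # peer closed (or reset) the connection
                        _close(ep, conns, pending, fd)
                        continue
                    pending[fd] += data
                    # Also ask to be told when the reply can be written
                    ep.modify(fd, select.EPOLLIN | select.EPOLLOUT)
                elif ev & (select.EPOLLHUP | select.EPOLLERR):
                    _close(ep, conns, pending, fd)
                    continue
                if ev & select.EPOLLOUT and pending[fd]:
                    try:
                        sent = conn.send(pending[fd])
                    except ConnectionError:
                        _close(ep, conns, pending, fd)
                        continue
                    pending[fd] = pending[fd][sent:]
                    if not pending[fd]:
                        # Nothing left to send: stop watching for EPOLLOUT
                        ep.modify(fd, select.EPOLLIN)
    finally:
        ep.close()
        for c in conns.values():
            c.close()


def run_demo(payload=b"hello epoll"):
    """Run serve() in a thread on an ephemeral localhost port, send one
    message, and return the echoed bytes."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 0))
    server.listen(5)
    stop = threading.Event()
    t = threading.Thread(target=serve, args=(server, stop))
    t.start()
    try:
        with socket.create_connection(server.getsockname(), timeout=5) as c:
            c.sendall(payload)
            received = b""
            while len(received) < len(payload):
                chunk = c.recv(1024)
                if not chunk:
                    break
                received += chunk
        return received
    finally:
        stop.set()
        t.join()
        server.close()
```

Compared with the poll server, the main differences are:

- `ep.register`/`ep.modify`/`ep.unregister` map onto epoll_ctl.
- The loop only ever sees fds that are actually ready.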