Principles of listen() and accept() in TCP three-way handshake

Table of contents

listen() queue analysis

accept() function

Blocking vs non-blocking I/O

Synchronous and asynchronous I/O

listen() queue analysis

  • listen(): marks a socket as a passive (listening) socket, that is, one that will accept incoming connection requests. It is used on the server side of a TCP connection.
  • listen() function prototype:
int listen(int sockfd, int backlog);
  • To understand the backlog parameter, we first need to look at the "listening socket queue".
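As a minimal sketch of where listen() sits in a server's start-up sequence, the helper below creates a TCP socket, binds it, and puts it into the LISTEN state. The name make_listener, the choice of the loopback address, and the SO_REUSEADDR option are illustrative assumptions, not part of the original article.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket, bind it to the loopback address
 * on `port` (0 lets the kernel pick a free port), and put it in the LISTEN
 * state. Returns the listening descriptor, or -1 on error. */
int make_listener(unsigned short port, int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Allow quick restarts of the server on the same port. */
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    /* bind() names the socket; listen() makes it passive and sets backlog. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With the values used later in this article, a call might look like make_listener(9000, 300).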

[Listening socket queue]

  • For a socket that has called listen(), the operating system maintains two queues.

[Incomplete connection queue (holds half-open connections)]

  • When the client sends the first packet of the TCP three-way handshake [the SYN packet] to the server, the server creates an entry for that SYN in the incomplete queue.
  • We can regard this entry as a half-open connection [because the connection has not been established yet]. Its state changes from LISTEN to SYN_RCVD, and the server returns the second handshake packet [SYN/ACK] to the client. From this point the server is waiting for the third handshake packet.

[Completed connection queue (holds established connections)]

  • When the third handshake packet arrives, the connection becomes ESTABLISHED, and each client that has completed the three-way handshake is placed in this queue as an entry.
  • The meaning of backlog: in the traditional definition, the sum of the entries in the completed queue and the incomplete queue cannot exceed backlog. Implementations differ, however: on Linux since kernel 2.2, backlog limits only the completed queue.
  • When does the client's connect() return? It returns after receiving the second packet of the three-way handshake (that is, the SYN/ACK sent back by the server).
  • RTT (round-trip time) determines how long an entry stays in the incomplete queue. It depends on the network path between the client and the server.
    • For the client, the RTT is the time taken by the first and second handshake packets together.
    • For the server, the RTT is the time taken by the second and third handshake packets together.
  • If the three handshake packets are delivered quickly, the connection can be established in about 187 milliseconds (a typical figure; the real value depends on the network path between the two hosts). By a computer's standards this is slow, so establishing a TCP connection is fairly expensive.
  • If a malicious client never sends the third packet of the three-way handshake, the connection cannot be established, and the entry in SYN_RCVD stays in the server's incomplete queue until it times out, after about 75 seconds (the exact value is implementation-dependent). The operating system then discards the entry.

accept() function

  • The accept() function takes one entry [each entry is a TCP connection that has completed the three-way handshake] from the head of the completed connection queue and returns it to the process.
  • What if the completed connection queue is empty?
    • Then accept() blocks [the process sleeps] and is not woken up until an entry appears in the completed queue.
  • From a programming point of view, we should call accept() promptly so that connections are removed from the completed queue as soon as possible.
  • accept() returns a socket that represents a TCP connection whose three-way handshake is already complete, because accept() takes its result from the completed queue.
  • The server program must strictly distinguish between two kinds of socket:
    • a) The socket listening on the server port (9000 in this example). This is called the "listening socket [listenfd]". It should exist for as long as the server program is running.
    • b) When a client connects, the operating system creates a socket for each client that successfully completes the three-way handshake [a connected socket], and accept() returns this kind of socket. It is the entry taken from the completed connection queue. From then on, the server uses the socket returned by accept() to communicate with that client.
  • Questions to think about:
  • (1) If the sum of the two queues [completed connection queue and incomplete connection queue] reaches the second parameter passed to listen(), that is, the queue is full, and another client sends a SYN, how does the server respond?
    • The server ignores the SYN and sends no reply. When the client sees that its SYN has not been answered, it retransmits the SYN after a while.
  • (2) There is a delay between the moment a connection is placed in the completed queue and the moment accept() takes it out. If the client sends data before accept() has removed the connection from the completed queue, what happens to that data?
    • The data is stored in the receive buffer of the connected socket. The size of that buffer is the maximum amount of data that can be received in this way.
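Both the distinction between the listening socket and the connected socket, and the answer to question (2), can be illustrated in a few lines. The following is a sketch, not a real server: the function name read_data_sent_before_accept is hypothetical, client and server share one process, and the loopback address with a kernel-chosen port stands in for port 9000.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: the client sends "hello" BEFORE the server calls
 * accept(). Returns the number of bytes the server then reads, or -1 on
 * error. */
int read_data_sent_before_accept(char *buf, size_t cap)
{
    int listenfd = -1, clientfd = -1, connfd = -1, result = -1;
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks */

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd < 0 ||
        bind(listenfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(listenfd, 8) < 0 ||
        getsockname(listenfd, (struct sockaddr *)&addr, &len) < 0)
        goto done;

    /* Client side: connect and send while the server has not accepted yet. */
    clientfd = socket(AF_INET, SOCK_STREAM, 0);
    if (clientfd < 0 ||
        connect(clientfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        send(clientfd, "hello", 5, 0) != 5)
        goto done;

    /* accept() takes the connection from the head of the completed queue and
     * returns a NEW descriptor; listenfd itself keeps listening. */
    connfd = accept(listenfd, NULL, NULL);
    if (connfd < 0)
        goto done;

    /* The bytes sent earlier were waiting in connfd's receive buffer. */
    result = (int)recv(connfd, buf, cap, 0);

done:
    if (connfd >= 0) close(connfd);
    if (clientfd >= 0) close(clientfd);
    if (listenfd >= 0) close(listenfd);
    return result;
}
```

A real server would keep listenfd open and call accept() in a loop, closing only each connfd when its client is finished.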

[syn attack (syn flood)]

  • A typical attack that exploits a weakness in the TCP/IP protocol: the attacker sends many SYN packets and never completes the handshakes, so the incomplete queue fills up with half-open connections.
  • It is a form of denial-of-service attack (DoS/DDoS).
  • backlog, defined more precisely: the maximum number of completed connections that the kernel will queue on a given socket [the maximum number of entries in the completed connection queue].
  • When you write code, call accept() promptly to remove connections from the completed queue, so that room is freed for later connections that complete the three-way handshake. The completed queue will then rarely be full. A backlog value of around 300 is a common choice.

Blocking vs non-blocking I/O

  • Blocking and non-blocking describe whether calling a given system function can put our process into a sleeping state [stuck there, waiting].

[Blocking I/O]

  • You call a function and it does not return: the program flow stops there [the process sleeps] while the function waits for some event, and execution continues only after that event has happened.
  • For example, accept() and recvfrom() block by default.
  • This kind of blocking is inefficient for a server, because a thread that is asleep in one call cannot serve any other client. Server programs are generally not written with plain blocking calls.

[Non-blocking I/O]

  • The call does not get stuck, so the process can make full use of its time slices and generally performs better.
  • Two distinctive features of non-blocking mode:
    • (1) You call accept() or recvfrom() repeatedly to check whether anything has arrived. If nothing has, the function returns immediately with an error, and errno is set to EWOULDBLOCK or EAGAIN. This gives you a chance to do other work, but you must keep calling accept() or recvfrom() to check again, which is tedious.
    • (2) Once the data has arrived, the call still blocks while the data is copied from the kernel buffer to the user buffer. The copy phase is always completed synchronously.

Synchronous and asynchronous I/O

 

[Asynchronous I/O]

  • When calling an asynchronous I/O function, you must give it a receive buffer and a callback function.
  • An asynchronous I/O function returns immediately after it is called. The rest is handed over to the operating system, which determines whether the data has arrived. When it has, the operating system copies the data into the buffer you provided and then calls the callback function you specified to notify you.
  • The difference between non-blocking and asynchronous I/O is easy to state:
    • (1) Non-blocking I/O has to call the I/O function repeatedly to check whether data has arrived. When data does arrive, the call blocks inside the I/O function while the data is copied from the kernel buffer to the user buffer, and returns only after the copy is finished.
    • (2) Asynchronous I/O does not need to call the I/O function repeatedly. You call it once and then go on to do other things. The kernel detects the arrival of the data, copies it into the buffer you provided, and calls your callback function to notify you, so you never block.

[Synchronous I/O]

  • Examples: select(), poll(), epoll.
  • 1) Call select() to check whether any data is available. If there is, execution continues; if not, select() blocks until there is.
  • 2) After select() returns, call recvfrom() to fetch the data. This call still blocks while the data is being copied.
  • Synchronous I/O looks more cumbersome, because two function calls are needed to get the data. Its advantage over blocking I/O is the ability known as I/O multiplexing.

[I/O multiplexing]

  • I/O multiplexing means that multiple sockets [multiple TCP connections] are grouped together, and a single synchronous I/O function such as select() waits for data on all of them at once.
  • select() waits until any one of those TCP connections has data; you then call a function such as recvfrom() on the connection that is ready.
  • This ability to use one function call to find out whether any of a group of TCP connections has data is what is called I/O multiplexing.
  • Many references put blocking I/O, non-blocking I/O, and synchronous I/O into one category, because each of them blocks at some stage. Some references simply describe blocking and non-blocking I/O as variants of the synchronous I/O model, which is also reasonable.
  • Asynchronous I/O is classified separately, because it is the only model in which the caller does not block at any stage.

Origin blog.csdn.net/mrqiuwen/article/details/129624495