An In-Depth Look at the Queues Underlying the TCP/IP Implementation

Ever since studying TCP/IP congestion control algorithms last time, I have wanted to dig deeper into some of the underlying principles of TCP/IP. I searched a lot of material online and read Tao Hui's excellent column on high-performance network programming, and learned a great deal. Today I will summarize it and add some thoughts of my own.

I know the Java language best, and my understanding of network programming in Java does not go much beyond using the Netty framework. Norman Maurer, a core Netty contributor, has a well-known piece of advice for Netty network developers: "Never block the event loop, reduce context-switching." That is, never block the I/O threads, and minimize thread switching.

Why can't we block the I/O thread that reads network data? To answer that, we have to start from the classic C10K problem: how does a server support 10,000 concurrent requests? The root cause of C10K is the network I/O model. Traditionally, Linux handled network I/O in a synchronous, blocking fashion, assigning each request its own process or thread. To support 10,000 concurrent requests, would we really need 10,000 threads? The scheduling, context switching, and memory consumed by those 10,000 threads would become the bottleneck. The common solution to C10K is I/O multiplexing, and that is exactly what Netty uses.
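As a concrete illustration of I/O multiplexing, here is a minimal single-threaded echo server built on Python's selectors module (which uses epoll on Linux): one thread watches many sockets and dispatches callbacks, instead of dedicating a thread per connection. The function names (accept_conn, handle_client, serve_once) are my own, not from any framework.

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def accept_conn(server_sock):
    # New connection is ready: accept it and watch it for readability.
    conn, _addr = server_sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle_client)

def handle_client(conn):
    data = conn.recv(4096)
    if data:
        conn.sendall(data)            # echo the bytes back
    else:                             # peer closed the connection
        sel.unregister(conn)
        conn.close()

def serve_once():
    # One pass of the event loop: wait for ready sockets, run their callbacks.
    for key, _mask in sel.select(timeout=1):
        key.data(key.fileobj)
```

A single thread running serve_once() in a loop can serve thousands of connections, because it only touches sockets the kernel has already reported as ready.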

(Figure: Netty's threading model)

Netty has a thread group (mainReactor) responsible for listening for and creating connections on the server, an I/O thread group (subReactor) responsible for read and write operations on established connections, and a dedicated Worker thread group (ThreadPool) for processing business logic.

Keeping the three independent of each other brings many benefits. First, having a dedicated thread group listen for and handle new network connections prevents the TCP/IP half-connection (syn) queue and full-connection (accept) queue from filling up. Second, separating the I/O thread group from the Worker thread group lets network I/O and business logic run in parallel, so I/O threads are never blocked and the TCP/IP receive queues do not fill up. Of course, if the business logic is lightweight, i.e. an I/O-intensive, computation-light workload, it can run directly on the I/O threads to avoid a thread switch; that is the second half of Norman Maurer's advice.

Why does TCP/IP have so many queues? Today we will take a closer look at several of them: the half-connection (syn) queue and full-connection (accept) queue used when establishing a connection, and the receive, out_of_order, prequeue, and backlog queues used when receiving packets.

Queues used when establishing a connection

(Figure: the syn queue and accept queue during the three-way handshake)

As shown above, there are two queues: the syns queue (half-connection queue) and the accept queue (full-connection queue). During the three-way handshake, after the server receives the client's SYN packet, it puts the relevant information into the half-connection queue and replies to the client with SYN+ACK.

In the third step, when the server receives the client's ACK, if the full-connection queue is not full, it moves the connection's information from the half-connection queue into the full-connection queue; otherwise, according to the value of tcp_abort_on_overflow, it either drops the connection directly or leaves it to be retried after a while.
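The accept queue is visible from user space: its length is capped by the backlog argument to listen() (further limited by net.core.somaxconn on Linux). A small sketch: once the handshake completes, the connection sits in the accept queue until the application calls accept().

```python
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(5)                      # accept queue of roughly 5 connections

cli = socket.create_connection(srv.getsockname())
# At this point the three-way handshake has completed in the kernel, even
# though accept() has not been called: the connection is waiting in the
# full-connection (accept) queue.
conn, addr = srv.accept()          # dequeue it from the accept queue
```

If clients kept connecting while the server never called accept(), the accept queue would eventually fill and tcp_abort_on_overflow would decide the fate of further handshakes.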

Queues used when receiving packets

Compared with establishing a connection, TCP's processing logic when receiving packets is more complex, involving more queues and configuration parameters.

The application program receiving a TCP packet and the operating system receiving one from the network are two independent processes. Both manipulate the same socket, and lock contention determines which of them controls it at any moment, giving rise to many different scenarios. For example, what happens when the application is in the middle of receiving a packet while the operating system receives another one through the network card? If the application never calls recv or read, what does the operating system do with the packets it receives?

Below, three figures describe three scenarios of receiving TCP packets and introduce the four receive-related queues.

Receive scenario 1

(Figure: receive scenario 1)

The figure above is a schematic of the first receive scenario: the operating system first stores the received packets in the socket's receive queue, and the user process then calls recv to read them.

1) When the NIC receives a packet and identifies it as TCP, after layers of calls the kernel's tcp_v4_rcv function is eventually invoked. Since the next packet TCP expects to receive is exactly S1, tcp_v4_rcv adds it directly to the receive queue. The receive queue holds TCP packets that have already been received, stripped of their TCP headers, and placed in order, so the user process can read them sequentially. Because the socket is not in any user process's context (no user process is reading it), and the packet we need is the one with sequence S1, which is exactly what arrived, it goes into the receive queue.

2) Packet S3 arrives. Since the next sequence number TCP expects is S2, S3 is added to the out_of_order queue, where all out-of-order packets are kept.

3) Next, the expected packet S2 arrives and goes straight into the receive queue. Since the out_of_order queue is now non-empty, it must be checked.

4) Every time a packet is inserted into the receive queue, the out_of_order queue is checked. After S2 is received, the expected sequence number becomes S3, so packet S3 in the out_of_order queue is moved into the receive queue.
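The interplay between the receive and out_of_order queues in steps 1-4 can be modeled in a few lines. This is a toy illustration in Python of the bookkeeping described above, not kernel code; ToyReceiver and its fields are invented names, and sequence numbers here count whole segments rather than bytes.

```python
from collections import deque

class ToyReceiver:
    def __init__(self, next_seq=1):
        self.next_seq = next_seq
        self.receive = deque()     # in-order, ready for the application
        self.out_of_order = {}     # seq -> payload, waiting for a gap to fill

    def on_segment(self, seq, payload):
        if seq != self.next_seq:
            # Step 2: not the expected sequence number -- park it.
            self.out_of_order[seq] = payload
            return
        # Steps 1/3: exactly the expected segment -- straight to receive.
        self.receive.append(payload)
        self.next_seq += 1
        # Step 4: drain any out-of-order segments the new one unblocked.
        while self.next_seq in self.out_of_order:
            self.receive.append(self.out_of_order.pop(self.next_seq))
            self.next_seq += 1
```

Feeding it S1, S3, S2 in that order reproduces the scenario: S3 waits in out_of_order until S2 arrives, then both move to the receive queue at once.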

5) The user process starts reading the socket: it first allocates a block of memory in the process, then calls read or recv. The socket has a series of configuration attributes with default values; for example, it is blocking by default, and its SO_RCVLOWAT attribute defaults to 1. Methods like recv also take a flags parameter, which can be set to MSG_WAITALL, MSG_PEEK, MSG_TRUNC, and so on; here we assume the most common value, 0. The process calls recv.

6) The tcp_recvmsg function is called.

7) tcp_recvmsg first locks the socket. A socket can be used by multiple threads, and the operating system uses it too, so concurrency must be handled: to manipulate the socket, the lock must be acquired first.

8) At this point the receive queue already holds three packets. The first packet is copied to user-space memory; since the flags in step 5 did not include MSG_PEEK, the packet is removed from the queue and freed in kernel space. Conversely, the MSG_PEEK flag would prevent the receive queue from deleting the packet, which is why MSG_PEEK is mainly used when multiple processes read the same socket.
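The MSG_PEEK behavior in step 8 is easy to observe from user space: peeking copies data out of the receive queue without removing it, so a subsequent normal recv() sees the same bytes. A small sketch using a local socket pair:

```python
import socket

a, b = socket.socketpair()
a.sendall(b"hello")

peeked = b.recv(5, socket.MSG_PEEK)   # copy, but leave in the receive queue
read = b.recv(5)                      # a normal read removes the data
# Both calls return b"hello": the peek did not consume the packet.
```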

9) The second packet is copied. Before each copy, of course, the kernel checks whether the remaining user-space memory is large enough to hold the current packet; if not, it returns the number of bytes already copied.

10) The third packet is copied.

11) The receive queue is now empty, so the SO_RCVLOWAT minimum threshold is checked. If the number of bytes copied so far is below it, the process sleeps and waits for more packets. The default SO_RCVLOWAT is 1, meaning the call can return as soon as any data has been read.
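SO_RCVLOWAT is an ordinary socket option and can be inspected and changed from user space (on Linux, via setsockopt; note that Linux historically ignored it for some code paths, so treat this as a sketch of the option itself rather than a guarantee about every kernel version):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# The low-water mark defaults to 1: recv() may return after a single byte.
default_lowat = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVLOWAT)

# Raise it so a blocking recv() waits for at least 128 bytes.
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVLOWAT, 128)
new_lowat = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVLOWAT)
```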

12) The backlog queue is checked. The backlog queue is where packets received by the NIC go while a user process is busy copying data. If it contains data at this point, it is processed along the way. Here the backlog queue is empty, so the lock is released in preparation for returning to user space.

13) User-process code resumes; the value returned by recv (and similar calls) is the number of bytes copied from the kernel.

Receive scenario 2

The second figure shows the second scenario, which involves the prequeue. When the user process calls recv, the socket's queues contain no packets, and since the socket is blocking, the process goes to sleep. Then the operating system receives a packet, and this is where the prequeue comes into play. In this scenario tcp_low_latency is at its default of 0, the socket's SO_RCVLOWAT is at its default of 1, and the socket is still blocking, as shown below.

(Figure: receive scenario 2)

Steps 1, 2, and 3 are handled as before, so we start directly from step 4.

4) Since the receive, prequeue, and backlog queues are all empty, not a single byte has been copied to user memory. The socket's configuration requires copying at least SO_RCVLOWAT (i.e., 1) bytes, so the process enters the blocking socket's wait path, waiting at most the time specified by SO_RCVTIMEO. Before waiting, the process releases the socket lock, which is why in step 5 a newly arriving packet is no longer forced into the backlog queue.
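The bounded wait in step 4 is what a blocking recv() with a receive timeout looks like from user space. Python's settimeout() provides equivalent behavior (internally CPython uses non-blocking I/O plus polling rather than the SO_RCVTIMEO option itself, but the observable effect is the same):

```python
import socket

a, b = socket.socketpair()
b.settimeout(0.2)                 # roughly: SO_RCVTIMEO = 200 ms

try:
    b.recv(1)                     # queues are empty -> the call sleeps,
    timed_out = False             # then gives up when the timeout expires
except socket.timeout:
    timed_out = True
```

Had the peer sent a byte within 200 ms, the sleeping call would have been woken and returned it, which is exactly the wake-up path steps 5-7 below describe.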

5) Packet S1 arrives and is added to the prequeue.

6) After the packet is inserted into the prequeue, the process sleeping on the socket is woken.

7) Once awakened, the user process re-acquires the socket lock; from this point on, newly received packets can only go into the backlog queue.

8) The process first checks the receive queue, which of course is still empty; it then checks the prequeue and finds packet S1, exactly the sequence number it is waiting for, so it copies the packet from the prequeue directly to user memory and frees it in the kernel.

9) One byte has now been copied to user memory. The kernel checks whether this length exceeds the minimum threshold, i.e., the smaller of len and SO_RCVLOWAT.

10) Since SO_RCVLOWAT has its default value of 1, the number of bytes copied exceeds the minimum threshold. The kernel prepares to return to user space, checking along the way whether the backlog queue has data; it does not, so the socket lock is released and the call returns.

11) The number of bytes copied is returned to the user.

Receive scenario 3

In the third scenario, the system parameter tcp_low_latency is 1, and the SO_RCVLOWAT attribute is set on the socket. The server first receives packet S1, but its length is smaller than SO_RCVLOWAT. The user process calls recv to read; although it gets part of the data, the minimum threshold has not been reached, so the process sleeps. Meanwhile, the out-of-order packet S3 received before the sleep goes directly into the backlog queue. Then packet S2 arrives; since the prequeue is not used (because tcp_low_latency is set) and S2's starting sequence number is exactly the next value to be copied, it is copied directly to user memory, and the total bytes copied now satisfy the SO_RCVLOWAT requirement. Finally, before returning to the user, packet S3 in the backlog queue is also copied to the user.

(Figure: receive scenario 3)

1) Packet S1 arrives, and it is exactly the expected sequence number, so it is added directly to the ordered receive queue.

2) The system parameter tcp_low_latency is set to 1, indicating that the server wants the program to receive TCP packets promptly. The user calls recv on the blocking socket; the socket's SO_RCVLOWAT value is larger than the first packet, and the user has allocated a sufficiently large buffer of length len.

3) tcp_recvmsg is called to do the receiving; it locks the socket first.

4) The kernel prepares to process the packets in its various receive queues.

5) The receive queue holds a packet that can be copied directly; its size is less than len, so it is copied straight to user memory.

6) While step 5 is in progress, the kernel receives packet S3. The socket is locked at this point, so the packet goes straight into the backlog queue. Note that this packet is out of order.

7) In step 5, packet S1 was copied to user memory, but its size is less than the SO_RCVLOWAT value. Because the socket is blocking, the user process goes to sleep; before entering sleep, it always processes the backlog queue first. Since packet S3 is out of order, it is moved into the out_of_order queue.

8) The process sleeps until it times out or the receive queue becomes non-empty.

9) The kernel receives packet S2. Note that because the tcp_low_latency flag is on, the packet does not enter the prequeue to wait for the process to handle it.

10) Since S2 is the expected packet, and a user process is sleeping while waiting for exactly that packet, S2 is copied directly to user memory.

11) Whenever an in-order packet has been processed, whether copied into the receive queue or directly to user memory, the out_of_order queue is checked to see whether any packets can now be processed. Packet S3 is copied to user memory, and then the user process is woken.

12) The user process wakes up.

13) It now checks that the number of bytes copied exceeds SO_RCVLOWAT and that the backlog queue is empty. Both conditions hold, so it prepares to return.

To summarize the roles of the four queues:

  • receive is the true receive queue: TCP packets received by the operating system are checked, processed, and saved here.
  • backlog is the "standby queue": when the socket is in a user process's context (the user is making a system call on the socket, such as recv), the operating system saves newly received packets to the backlog queue and returns immediately.
  • prequeue is the "pre-receive queue": when no user process is operating on the socket, that is, a user process has called read or recv but is sleeping, the operating system stores received packets directly in the prequeue and returns.
  • out_of_order is the "out-of-order queue": it stores out-of-order packets. When a received packet is not the next sequence number TCP is prepared to receive, the operating system puts it in the out_of_order queue to await later processing.


Origin www.cnblogs.com/CQqf2019/p/11095350.html