How Linux implements sending data packets over the network

When an application on Linux sends a data packet, how does the packet travel from the application into the Linux kernel and finally out through the network card?

This article walks through how Linux implements sending data packets over the network.

Packet sending process

Assume that the network card has already been started (RingBuffers allocated and initialized) and that the server and client have established sockets.

Note that the network card driver allocates two RingBuffers during startup:

  • igb_tx_buffer array: used by the kernel to store the description information of the data packets to be sent. It is allocated with vzalloc.
  • e1000_adv_tx_desc array: used by the network card hardware to describe the data packets to be sent. The network card hardware can access this memory directly through DMA. It is allocated with dma_alloc_coherent.

Each element in the igb_tx_buffer array has a pointer to an e1000_adv_tx_desc entry.

In this way, the kernel can fill the e1000_adv_tx_desc array with the information about the data to be sent.

The network card hardware then reads from the e1000_adv_tx_desc array directly and sends the data to the network.
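The relationship between the two arrays can be sketched in user-space C. This is a simplified model and not the igb driver code: the struct names `tx_buffer`, `tx_desc` and `tx_ring`, their fields, and the functions `tx_ring_init` and `tx_ring_free` are hypothetical, and plain `calloc` stands in for `vzalloc` and `dma_alloc_coherent`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hardware descriptor (e1000_adv_tx_desc). */
struct tx_desc {
    uint64_t buffer_addr; /* address of the packet data the NIC should read */
    uint32_t length;      /* number of bytes to send */
};

/* Hypothetical stand-in for the kernel-side entry (igb_tx_buffer). */
struct tx_buffer {
    struct tx_desc *desc; /* points at the matching hardware descriptor */
    void *skb;            /* packet this slot currently describes, or NULL */
};

struct tx_ring {
    size_t count;
    struct tx_buffer *buffers; /* kernel-only array (vzalloc in the driver) */
    struct tx_desc *descs;     /* NIC-visible array (dma_alloc_coherent in the driver) */
};

/* Allocate both arrays and link slot i of one to slot i of the other.
 * Returns 0 on success, -1 on allocation failure. */
int tx_ring_init(struct tx_ring *ring, size_t count)
{
    assert(ring != NULL && count > 0);
    ring->buffers = calloc(count, sizeof(*ring->buffers));
    ring->descs = calloc(count, sizeof(*ring->descs));
    if (!ring->buffers || !ring->descs) {
        free(ring->buffers);
        free(ring->descs);
        ring->buffers = NULL;
        ring->descs = NULL;
        ring->count = 0;
        return -1;
    }
    ring->count = count;
    for (size_t i = 0; i < count; i++)
        ring->buffers[i].desc = &ring->descs[i];
    return 0;
}

/* Release both arrays and reset the ring to an empty state. */
void tx_ring_free(struct tx_ring *ring)
{
    free(ring->buffers);
    free(ring->descs);
    ring->buffers = NULL;
    ring->descs = NULL;
    ring->count = 0;
}
```

Writing through `ring.buffers[i].desc` updates `ring.descs[i]`, which is the memory the hardware would read in the real driver.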

Copy to kernel

  • The socket system call copies data to the kernel

The application first makes a system call through the interface provided by the socket.

The send and sendto functions we use in user mode are both implemented by the sendto system call.

send and sendto are simply wrappers provided for user convenience, packaged in a form that is easier to call.
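As a small user-space illustration, on a connected socket `send(fd, buf, len, flags)` behaves the same as `sendto(fd, buf, len, flags, NULL, 0)`. The helper name `send_both_ways` below is made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send the same bytes once with send() and once with sendto() without a
 * destination address. Returns the total number of bytes the kernel
 * accepted, or -1 if either call fails. */
ssize_t send_both_ways(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
    ssize_t a = send(fd, buf, len, 0);
    if (a < 0)
        return -1;
    ssize_t b = sendto(fd, buf, len, 0, NULL, 0);
    if (b < 0)
        return -1;
    return a + b;
}
```

One way to try it without a network is to create a connected pair with `socketpair(AF_UNIX, SOCK_STREAM, 0, sv)`, call `send_both_ways(sv[0], "ping", 4)`, and read the eight bytes back from `sv[1]`.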

/* sendto system call (some code omitted) */
SYSCALL_DEFINE6(sendto, int, fd, void __user *, buff, size_t, len,
		unsigned int, flags, struct sockaddr __user *, addr,
		int, addr_len)
{
	...
	sock = sockfd_lookup_light(fd, &err, &fput_needed);
	...
	err = sock_sendmsg(sock, &msg, len);
	...	
}

Inside the sendto system call, the sockfd_lookup_light function first looks up the socket associated with the given file descriptor (fd).

It then calls the sock_sendmsg function (sock_sendmsg ==> __sock_sendmsg ==> __sock_sendmsg_nosec).

There, sock->ops->sendmsg actually executes inet_sendmsg, the function provided by the protocol stack.

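/*
__sock_sendmsg_nosec function

iocb: points to the kiocb structure associated with the I/O operation
sock: points to the socket structure on which the send is performed
msg:  points to the msghdr message header structure holding the data to send
size: size of the data to send

*/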
static inline int __sock_sendmsg_nosec(struct kiocb *iocb, struct socket *sock,
				       struct msghdr *msg, size_t size)
{
	...
	return sock->ops->sendmsg(iocb, sock, msg, size);
}

At this point, the kernel looks up the protocol-specific sending function on the socket.

Taking TCP as an example, the protocol-specific sending function is tcp_sendmsg.
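This kind of dispatch through a table of function pointers can be sketched as follows. All names ending in `_model` are hypothetical, and the sketch collapses the kernel's two levels (the socket ops, then the protocol) into a single table for brevity.

```c
#include <assert.h>
#include <stddef.h>

struct sock_model;

/* Hypothetical, reduced version of a protocol operations table. */
struct proto_ops_model {
    int (*sendmsg)(struct sock_model *sk, const void *data, size_t len);
};

struct sock_model {
    const struct proto_ops_model *ops; /* chosen when the socket is created */
    size_t bytes_queued;               /* data waiting in the send queue */
    const char *last_handler;          /* which handler ran last, for inspection */
};

/* TCP-like handler: data is queued for later segmentation. */
static int tcp_sendmsg_model(struct sock_model *sk, const void *data, size_t len)
{
    (void)data;
    sk->bytes_queued += len;
    sk->last_handler = "tcp";
    return (int)len;
}

/* UDP-like handler: in this model nothing stays queued. */
static int udp_sendmsg_model(struct sock_model *sk, const void *data, size_t len)
{
    (void)data;
    sk->last_handler = "udp";
    return (int)len;
}

const struct proto_ops_model tcp_ops_model = { tcp_sendmsg_model };
const struct proto_ops_model udp_ops_model = { udp_sendmsg_model };

/* Generic entry point: it knows nothing about TCP or UDP and simply
 * calls whatever handler the socket carries. */
int sock_sendmsg_model(struct sock_model *sk, const void *data, size_t len)
{
    assert(sk != NULL && sk->ops != NULL && sk->ops->sendmsg != NULL);
    return sk->ops->sendmsg(sk, data, len);
}
```

The generic layer stays protocol-agnostic, so supporting another protocol only means supplying another table.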

tcp_sendmsg allocates a block of kernel memory, an skb (sk_buff), and appends it to the send queue (the send queue is a linked list of skbs).

It then copies the user's data into the skb. Once the copy is done, the [Send] operation is triggered.

"Send" here means that, in the current context, the data is handed from the socket layer to the transport layer.

Note that actual transmission does not necessarily start at this point, because some conditions are checked first (for example, whether the data in the send queue has exceeded half of the window size).

The data is sent only when the conditions are met. If they are not met, the system call may simply return.
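The queue-then-check pattern can be modelled roughly like this. It is a user-space sketch under simplifying assumptions: `skb_model`, `send_queue` and `queue_and_check_push` are invented names, and the single "more than half the window" test stands in for the several conditions the kernel really evaluates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, minimal packet buffer: a list node plus the copied bytes. */
struct skb_model {
    struct skb_model *next;
    size_t len;
    unsigned char data[];
};

struct send_queue {
    struct skb_model *head, *tail;
    size_t unsent_bytes; /* bytes queued but not yet handed to the transport layer */
    size_t max_window;   /* window size used by the push check */
};

/* Copy user data into a new buffer and append it to the queue.
 * Returns 1 if transmission should start now, 0 if the data is only
 * queued, and -1 on allocation failure. */
int queue_and_check_push(struct send_queue *q, const void *user_data, size_t len)
{
    assert(q != NULL && user_data != NULL);
    struct skb_model *skb = malloc(sizeof(*skb) + len);
    if (!skb)
        return -1;
    skb->next = NULL;
    skb->len = len;
    memcpy(skb->data, user_data, len);
    if (q->tail)
        q->tail->next = skb;
    else
        q->head = skb;
    q->tail = skb;
    q->unsent_bytes += len;
    return q->unsent_bytes > q->max_window / 2 ? 1 : 0;
}

/* Free every buffer still on the queue. */
void send_queue_free(struct send_queue *q)
{
    struct skb_model *skb = q->head;
    while (skb) {
        struct skb_model *next = skb->next;
        free(skb);
        skb = next;
    }
    q->head = q->tail = NULL;
    q->unsent_bytes = 0;
}
```

With a 1000-byte window, queuing 3 bytes returns 0 (nothing is pushed yet), while queuing a further 600 bytes returns 1 because 603 exceeds half the window.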

Network protocol stack processing

  • Transport layer processing

The data then arrives at the transport layer.

The key function in the transport layer is tcp_write_xmit. It handles the transport layer's congestion control and sliding window work.

Based on factors such as the send window and the maximum segment size, this function calculates how much data can be sent this time, then encapsulates the data into TCP segments and sends them out.

If the window requirements are met, the TCP header is set and the data is passed down to the network layer for processing.
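To make the size calculation concrete, here is a much simplified sketch. It considers only the send window and the maximum segment size, ignores the congestion window and Nagle-style rules, and the function names are hypothetical rather than taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* How many bytes may go into the next segment, given the bytes still
 * unsent, the maximum segment size, the send window and the bytes
 * already in flight. Returns 0 when nothing can be sent right now. */
size_t next_segment_len(size_t unsent, size_t mss, size_t send_window,
                        size_t in_flight)
{
    assert(mss > 0);
    if (in_flight >= send_window)
        return 0; /* window is full */
    size_t room = send_window - in_flight;
    size_t len = unsent < mss ? unsent : mss;
    return len < room ? len : room;
}

/* Count how many segments can be sent back to back before the window
 * fills up or the data runs out. */
size_t segments_sendable(size_t unsent, size_t mss, size_t send_window,
                         size_t in_flight)
{
    size_t n = 0;
    for (;;) {
        size_t len = next_segment_len(unsent, mss, send_window, in_flight);
        if (len == 0)
            break;
        unsent -= len;
        in_flight += len;
        n++;
    }
    return n;
}
```

With 5000 unsent bytes, an MSS of 1460, a 4000-byte window and nothing in flight, `segments_sendable` returns 3 (1460 + 1460 + 1080 bytes), and the remaining 1000 bytes wait until the window opens again.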

In the transport layer, the kernel mainly does two things:

  • Make a copy of the data (skb)

Why make a copy? Because the skb is released once the network card has finished sending it, but TCP supports retransmission of lost data.

Therefore, until the ACK from the other side arrives, a backup of the skb must be kept in case it has to be retransmitted.

In fact, what gets sent in the first place is a copy of the skb. The system deletes the original skb only after receiving the ACK from the other side.
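In the kernel this copy is made with skb_clone, which duplicates only the sk_buff descriptor and shares the packet data through a reference count. The following user-space sketch shows that idea; the `skb_model_*` functions and `pkt_data` struct are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Shared packet bytes with a reference count. */
struct pkt_data {
    int refcount;
    size_t len;
    unsigned char bytes[];
};

/* Lightweight descriptor; several descriptors may share one pkt_data. */
struct skb_model {
    struct pkt_data *data;
};

/* Allocate a descriptor and copy the payload into new shared data. */
struct skb_model *skb_model_alloc(const void *payload, size_t len)
{
    assert(payload != NULL);
    struct skb_model *skb = malloc(sizeof(*skb));
    struct pkt_data *d = malloc(sizeof(*d) + len);
    if (!skb || !d) {
        free(skb);
        free(d);
        return NULL;
    }
    d->refcount = 1;
    d->len = len;
    memcpy(d->bytes, payload, len);
    skb->data = d;
    return skb;
}

/* Create a second descriptor that shares the same bytes. */
struct skb_model *skb_model_clone(const struct skb_model *orig)
{
    assert(orig != NULL && orig->data != NULL);
    struct skb_model *c = malloc(sizeof(*c));
    if (!c)
        return NULL;
    c->data = orig->data;
    c->data->refcount++;
    return c;
}

/* Drop one descriptor. Returns 1 if this was the last reference and the
 * packet bytes were freed, 0 if another descriptor still holds them. */
int skb_model_free(struct skb_model *skb)
{
    assert(skb != NULL && skb->data != NULL);
    int last = (--skb->data->refcount == 0);
    if (last)
        free(skb->data);
    free(skb);
    return last;
}
```

Freeing the clone (the driver is done with it) leaves the bytes available for retransmission; only freeing the original (the ACK has arrived) releases them.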

  • Encapsulate the TCP header

The system adds a TCP header and, depending on the situation, encapsulates the data into a TCP segment.

What you need to know here is that each skb has room for all of the network protocol headers, such as the MAC header, IP header, and TCP/UDP header.

When setting these headers, the kernel fills in the corresponding fields by adjusting a pointer's position instead of repeatedly allocating and copying memory.

For example, when setting the TCP header, the kernel just points the pointer at the appropriate location in the skb. When setting the IP header later, it just moves the pointer again.

This approach relies on the space reserved at the front of the skb's buffer. It avoids the performance overhead of memory allocation and data copying, thereby improving the efficiency of data transmission.
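The pointer-moving trick can be shown with a small user-space model. The names `skb_model_put` and `skb_model_push` echo the kernel's skb_put and skb_push, but the struct and the fixed sizes below are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 256
#define HEADROOM 64 /* space reserved in front of the payload for headers */

struct skb_model {
    unsigned char buf[BUF_SIZE];
    unsigned char *data; /* start of the valid bytes */
    unsigned char *tail; /* one past the end of the valid bytes */
};

/* Start with an empty packet positioned after the reserved headroom. */
void skb_model_init(struct skb_model *skb)
{
    assert(skb != NULL);
    skb->data = skb->tail = skb->buf + HEADROOM;
}

/* Append payload bytes at the tail. Returns where they were written. */
unsigned char *skb_model_put(struct skb_model *skb, const void *src, size_t len)
{
    assert((size_t)(skb->buf + BUF_SIZE - skb->tail) >= len);
    unsigned char *p = skb->tail;
    memcpy(p, src, len);
    skb->tail += len;
    return p;
}

/* Prepend a header by moving the data pointer backwards into the
 * headroom. The payload itself is never moved or copied. */
unsigned char *skb_model_push(struct skb_model *skb, const void *hdr, size_t len)
{
    assert((size_t)(skb->data - skb->buf) >= len);
    skb->data -= len;
    memcpy(skb->data, hdr, len);
    return skb->data;
}

/* Number of valid bytes currently in the packet. */
size_t skb_model_len(const struct skb_model *skb)
{
    return (size_t)(skb->tail - skb->data);
}
```

Each layer calls `skb_model_push` in turn (TCP, then IP, then MAC), so the headers end up in front of the payload in wire order while the payload stays where it was first written.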

  • Network layer processing

After the data leaves the transport layer, it arrives at the network layer.

The network layer mainly does the following things:

  • Route entry lookup:

Search the routing table based on the destination IP address to determine the next hop of the data packet (the ip_queue_xmit function).

  • IP header settings:

Based on the result of the routing table lookup, set the source and destination IP addresses, TTL (time to live), protocol, and other fields in the IP header.
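As an illustration of the header-filling step, the sketch below builds a minimal 20-byte IPv4 header (no options) in a byte array and computes the standard Internet checksum over it. The function names are hypothetical, and fields such as the identification and flags are simply left at zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum (one's complement sum of 16-bit big-endian words).
 * len must be even in this sketch. */
uint16_t inet_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Fill a 20-byte IPv4 header with the fields named in the text. */
void build_ipv4_header(uint8_t hdr[20], const uint8_t src[4],
                       const uint8_t dst[4], uint8_t ttl, uint8_t protocol,
                       uint16_t payload_len)
{
    uint16_t total = (uint16_t)(20 + payload_len);

    memset(hdr, 0, 20);
    hdr[0] = 0x45;                    /* version 4, header length 5 words */
    hdr[2] = (uint8_t)(total >> 8);   /* total length, big-endian */
    hdr[3] = (uint8_t)(total & 0xff);
    hdr[8] = ttl;
    hdr[9] = protocol;                /* 6 = TCP, 17 = UDP */
    memcpy(hdr + 12, src, 4);
    memcpy(hdr + 16, dst, 4);

    /* The checksum field (bytes 10 and 11) is zero while summing. */
    uint16_t csum = inet_checksum(hdr, 20);
    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)(csum & 0xff);
}
```

A receiver can validate such a header by running `inet_checksum` over all 20 bytes, which yields 0 when the header is intact.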

  • netfilter filtering:

netfilter is a framework in the Linux kernel for filtering and modifying data packets.

At the network layer, netfilter can be used for packet filtering, NAT (Network Address Translation), and so on.

  • skb fragmentation:

If the size of the data packet exceeds the MTU (Maximum Transmission Unit), the packet must be divided into multiple fragments so that it can be transmitted, and each fragment is encapsulated in a separate skb.
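The arithmetic behind fragmentation can be sketched as follows, assuming a 20-byte IPv4 header without options. Every fragment except the last must carry a multiple of 8 payload bytes, because the IP header stores the fragment offset in 8-byte units. `plan_fragments` and `struct frag` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define IPV4_HDR_LEN 20

struct frag {
    size_t offset;      /* byte offset of this fragment within the payload */
    size_t len;         /* payload bytes carried by this fragment */
    int more_fragments; /* 1 if further fragments follow, 0 for the last */
};

/* Split payload_len bytes into fragments that fit the given MTU.
 * Returns the number of fragments written to out, or 0 if the payload
 * is empty or out is too small. */
size_t plan_fragments(size_t payload_len, size_t mtu, struct frag *out,
                      size_t max_out)
{
    assert(out != NULL);
    assert(mtu >= IPV4_HDR_LEN + 8);
    size_t per_frag = ((mtu - IPV4_HDR_LEN) / 8) * 8;
    size_t n = 0, off = 0;

    while (off < payload_len) {
        if (n == max_out)
            return 0;
        size_t len = payload_len - off;
        if (len > per_frag)
            len = per_frag;
        out[n].offset = off;
        out[n].len = len;
        out[n].more_fragments = (off + len < payload_len);
        off += len;
        n++;
    }
    return n;
}
```

For a 4000-byte payload and an MTU of 1500, this yields three fragments at offsets 0, 1480 and 2960, carrying 1480, 1480 and 1040 bytes.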

  • Data link layer processing

When the data reaches the data link layer, two subsystems work together to ensure that the data is correctly encapsulated, parsed, and transmitted during sending and receiving.

  • Neighbor subsystem

It manages and maintains neighbor relationships between the host or router and other devices.

The neighbor subsystem sends an ARP request to find a neighbor and then stores the result, the MAC address of the target host, in the neighbor cache table.

When a data packet needs to be sent to a target host, the data link layer first queries the neighbor cache table to obtain the target's MAC address so that it can correctly encapsulate the packet (add the MAC header).
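A toy version of this lookup-then-encapsulate step might look like the following. The fixed-size table, the `neigh_*` function names and the return conventions are assumptions for illustration. The real neighbor cache is a hash table with entry states and timers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NEIGH_MAX 16

struct neigh_entry {
    uint32_t ip;    /* IPv4 address of the neighbor */
    uint8_t mac[6]; /* its resolved MAC address */
    int valid;
};

struct neigh_table {
    struct neigh_entry entries[NEIGH_MAX];
};

/* Return the cached MAC for ip, or NULL if it is not resolved yet. */
const uint8_t *neigh_lookup(const struct neigh_table *t, uint32_t ip)
{
    for (size_t i = 0; i < NEIGH_MAX; i++)
        if (t->entries[i].valid && t->entries[i].ip == ip)
            return t->entries[i].mac;
    return NULL;
}

/* Record an ARP result. Returns 0 on success, -1 if the table is full. */
int neigh_update(struct neigh_table *t, uint32_t ip, const uint8_t mac[6])
{
    struct neigh_entry *free_slot = NULL;
    for (size_t i = 0; i < NEIGH_MAX; i++) {
        struct neigh_entry *e = &t->entries[i];
        if (e->valid && e->ip == ip) {
            memcpy(e->mac, mac, 6);
            return 0;
        }
        if (!e->valid && !free_slot)
            free_slot = e;
    }
    if (!free_slot)
        return -1;
    free_slot->ip = ip;
    memcpy(free_slot->mac, mac, 6);
    free_slot->valid = 1;
    return 0;
}

/* Build a 14-byte Ethernet header for an IPv4 packet. Returns 0 on
 * success, or -1 when the next hop is unresolved (the point at which
 * an ARP request would be sent). */
int build_eth_header(const struct neigh_table *t, uint32_t next_hop,
                     const uint8_t src_mac[6], uint8_t out[14])
{
    assert(t != NULL && src_mac != NULL && out != NULL);
    const uint8_t *dst = neigh_lookup(t, next_hop);
    if (!dst)
        return -1;
    memcpy(out, dst, 6);         /* destination MAC */
    memcpy(out + 6, src_mac, 6); /* source MAC */
    out[12] = 0x08;              /* EtherType 0x0800 = IPv4 */
    out[13] = 0x00;
    return 0;
}
```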

  • Network device subsystem

The network device subsystem handles operations related to the physical network interface, including encapsulating and sending data packets as well as receiving and parsing data packets from the physical interface.

It handles format conversion of data packets, such as adding the Ethernet frame header and trailer, and also extracts data from received frames.

It is also responsible for hardware-related operations, such as clock synchronization for sending and receiving data packets and physical layer error detection.

  • Arrive at the network card send queue

The network device subsystem then selects an appropriate network card send queue and adds the skb to it (without going through the soft interrupt handler).

Next, the kernel calls dev_hard_start_xmit, the entry function into the network card driver, to trigger the sending of the data packet.

In some cases, the network device subsystem instead adds the skb to the soft interrupt queue (softnet_data) and raises the soft interrupt NET_TX_SOFTIRQ.

This hands the skb over to the soft interrupt handler for further processing, and the softirq handler then takes care of the actual sending.

This is one of the reasons why, in /proc/softirqs on a typical server, NET_RX is generally much larger than NET_TX.

That is, receiving packets always goes through the NET_RX soft interrupt, whereas sending packets triggers the NET_TX soft interrupt only under certain circumstances.
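To compare the two counters on a running machine, the per-CPU columns of the NET_TX and NET_RX rows in /proc/softirqs can be summed. The helpers below are an illustrative sketch with made-up names, and the 16 KiB line buffer is an assumption that may be too small on machines with very many CPUs.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sum the per-CPU counters on one /proc/softirqs line whose label
 * matches name (for example "NET_TX"). Returns -1 if the label differs. */
long long sum_softirq_line(const char *line, const char *name)
{
    assert(line != NULL && name != NULL);
    while (isspace((unsigned char)*line))
        line++;
    size_t n = strlen(name);
    if (strncmp(line, name, n) != 0 || line[n] != ':')
        return -1;

    const char *p = line + n + 1;
    long long total = 0;
    for (;;) {
        char *end;
        long long v = strtoll(p, &end, 10);
        if (end == p) /* no more numbers on this line */
            break;
        total += v;
        p = end;
    }
    return total;
}

/* Scan the file at path and return the total for the named softirq,
 * or -1 if the file cannot be read or the row is missing. */
long long read_softirq_total(const char *path, const char *name)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    static char line[16384];
    long long total = -1;
    while (fgets(line, sizeof(line), f)) {
        long long v = sum_softirq_line(line, name);
        if (v >= 0) {
            total = v;
            break;
        }
    }
    fclose(f);
    return total;
}
```

Calling `read_softirq_total("/proc/softirqs", "NET_RX")` and the same for `"NET_TX"` gives the two totals to compare.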

Network card driver sending

The driver reads the description information of the skb from the send queue and attaches it to the RingBuffer (the igb_tx_buffer array mentioned earlier).

It then maps the description information of the skb into the DMA memory area accessible by the network card (the e1000_adv_tx_desc array mentioned earlier).

Based on this description information, the network card reads the actual data directly and sends it to the network. This completes the sending of the data packet.

Finishing up

When the data transmission is completed, the network card triggers a hardware interrupt, whose handler in turn raises the NET_RX_SOFTIRQ soft interrupt. This hard interrupt is usually called the "send completion interrupt" or "send queue clearing interrupt".

Its main purpose is to perform cleanup after sending: releasing the memory previously allocated for the data packet, that is, freeing the skb and reclaiming the RingBuffer entries it occupied.
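The post-then-clean life cycle of a ring slot can be modelled in user space as shown below. This is a simplified sketch: the `done` flag stands in for the completion status the hardware writes back, `free` stands in for releasing the skb, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 4

struct tx_slot {
    void *skb; /* packet owned by this slot, or NULL if the slot is free */
    int done;  /* set once the "hardware" has transmitted the packet */
};

struct tx_ring_model {
    struct tx_slot slots[RING_SIZE];
    size_t next_to_use;   /* where the driver places the next packet */
    size_t next_to_clean; /* oldest slot not yet reclaimed */
};

/* Driver side: place a packet in the next slot. Returns the slot index,
 * or -1 if the ring is full. */
int tx_ring_post(struct tx_ring_model *r, void *skb)
{
    assert(r != NULL && skb != NULL);
    struct tx_slot *s = &r->slots[r->next_to_use];
    if (s->skb != NULL)
        return -1;
    s->skb = skb;
    s->done = 0;
    int idx = (int)r->next_to_use;
    r->next_to_use = (r->next_to_use + 1) % RING_SIZE;
    return idx;
}

/* Completion side: starting at the oldest slot, free every packet that
 * has been transmitted. Returns the number of slots reclaimed. */
size_t tx_ring_clean(struct tx_ring_model *r)
{
    assert(r != NULL);
    size_t cleaned = 0;
    while (cleaned < RING_SIZE) {
        struct tx_slot *s = &r->slots[r->next_to_clean];
        if (s->skb == NULL || !s->done)
            break;
        free(s->skb); /* release the packet memory */
        s->skb = NULL; /* the slot can be reused */
        s->done = 0;
        r->next_to_clean = (r->next_to_clean + 1) % RING_SIZE;
        cleaned++;
    }
    return cleaned;
}
```

Cleaning stops at the first slot that has not been transmitted yet, so slots are always reclaimed in the order they were posted.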

Finally, when the ACK for this TCP segment is received, the transport layer releases the original skb (as mentioned earlier, what is actually sent is a copy of the skb).

As can be seen, the driver is notified through a hard interrupt that transmission has completed, and the soft interrupt raised from it is of type NET_RX_SOFTIRQ.

We mentioned earlier that when the network card receives a network packet, NET_RX_SOFTIRQ is raised to tell the CPU that there is data to process.

In other words, whether the network card has received a network packet or finished sending one, the soft interrupt that gets triggered is NET_RX_SOFTIRQ.

Summary

Finally, let’s summarize the process of sending network data packets in Linux systems:

  • The application makes a system call through the interface provided by the socket, and the data is copied from user space to the socket buffer in kernel space.
  • The network protocol stack takes the data from the socket buffer and processes it layer by layer, from top to bottom, following the TCP/IP protocol stack.
    • Transport layer processing: taking TCP as an example, a copy of the data is made in the transport layer (for retransmission of lost data), and then the TCP header is added to the data.
    • Network layer processing: selecting a route (determining the next-hop IP), filling in the IP header, netfilter filtering, fragmenting packets that exceed the MTU, etc.
    • Neighbor subsystem and network device subsystem processing: here the data is further processed and encapsulated, and then added to the send queue of the network card.
  • The driver reads the description information of the skb from the send queue and attaches it to the RingBuffer, then maps the description information of the skb into the DMA memory area accessible by the network card.
  • The network card sends the data to the network.
  • When the data transmission is completed, a hard interrupt is triggered so that the skb memory can be released and the RingBuffer entries reclaimed.

Origin blog.csdn.net/qq_41221596/article/details/132920604