TSO/GRO/GSO for IPv6, and how Linux implements it improperly

One thing is clear: IPv6 does not allow intermediate devices to fragment packets. The design goal was simplicity and efficiency, which is also why the IPv6 header is so much more concise.

But TSO does not seem to violate the original intent of removing fragmentation from IPv6. The hardware does the splitting on its own, and from the point of view of the routing software layer it is as if nothing ever happened.

Let me briefly explain the difference between TSO and IP fragmentation:
[Figure: TSO compared with IP fragmentation]
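
The difference can also be sketched in a few lines of C. This is a minimal model, not kernel code: struct piece, tso_split and frag_split are names made up for illustration, and the sketch assumes a 40-byte IPv6 header, a 20-byte TCP header without options, an 8-byte Fragment header, and an MTU larger than those headers. TSO cuts the TCP payload and gives every piece its own TCP and IPv6 headers, so each piece is a complete packet. IPv6 fragmentation (done by the source only) cuts the TCP header plus payload as opaque bytes, so only the first fragment carries a TCP header and the receiver has to reassemble.

```c
#include <stdbool.h>
#include <stddef.h>

enum { IPV6_HDR = 40, IPV6_FRAG_HDR = 8, TCP_HDR = 20 };

/* One packet as it appears on the wire (illustrative type, not a kernel one). */
struct piece {
    unsigned wire_len;    /* length of the IPv6 packet, headers included */
    bool     has_tcp_hdr; /* does this piece start with its own TCP header? */
    bool     needs_reasm; /* must the receiver's IP layer reassemble it? */
};

/* TSO/GSO: cut the TCP payload into MSS-sized segments, each with full headers. */
static size_t tso_split(unsigned payload, unsigned mtu, struct piece *out, size_t max)
{
    unsigned mss = mtu - IPV6_HDR - TCP_HDR;
    size_t n = 0;

    while (payload > 0 && n < max) {
        unsigned chunk = payload < mss ? payload : mss;

        out[n].wire_len    = IPV6_HDR + TCP_HDR + chunk;
        out[n].has_tcp_hdr = true;   /* every segment is a complete TCP packet */
        out[n].needs_reasm = false;
        payload -= chunk;
        n++;
    }
    return n;
}

/* IPv6 fragmentation: cut "TCP header + payload" as opaque bytes. */
static size_t frag_split(unsigned payload, unsigned mtu, struct piece *out, size_t max)
{
    unsigned left = TCP_HDR + payload;
    /* Every fragment except the last must carry a multiple of 8 bytes. */
    unsigned per_frag = (mtu - IPV6_HDR - IPV6_FRAG_HDR) & ~7u;
    size_t n = 0;

    while (left > 0 && n < max) {
        unsigned chunk = left < per_frag ? left : per_frag;

        out[n].wire_len    = IPV6_HDR + IPV6_FRAG_HDR + chunk;
        out[n].has_tcp_hdr = (n == 0); /* only the first fragment holds the TCP header */
        out[n].needs_reasm = true;
        left -= chunk;
        n++;
    }
    return n;
}
```

With 4000 bytes of TCP payload and an MTU of 1500, both functions produce three pieces, but only the TSO pieces can each be handled by the receiver on their own.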

Let's look at a simple experiment: use IPv6 to pull a large file from a server. The packet captures on the server and the client look as follows:
[Figure: packet captures on the server and on the client]
From the client's perspective there is no IP fragmentation, so no fragment reassembly is needed. The client appears to receive one complete 7140-byte packet, as if the server had really sent it. But the server obviously never put such a large packet on the wire. Obviously, it is the client that aggregated several small packets into this one.
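
As a quick sanity check on the 7140 figure, here is a small C sketch. The capture itself does not state these values, so they are assumptions: that 7140 is the TCP payload length, that the path MTU is 1500, and that each segment carries 12 bytes of TCP options (the usual padded timestamp option). The helper name gro_segment_count is made up for illustration.

```c
/*
 * If `aggregated` bytes of TCP payload are a whole multiple of the MSS
 * implied by `mtu` and `tcp_opt_len`, return how many full-sized segments
 * were merged; otherwise return 0.
 */
static unsigned gro_segment_count(unsigned aggregated, unsigned mtu,
                                  unsigned tcp_opt_len)
{
    /* 40-byte IPv6 header + 20-byte base TCP header + TCP options */
    unsigned mss = mtu - 40 - 20 - tcp_opt_len;

    if (mss == 0 || aggregated % mss != 0)
        return 0;
    return aggregated / mss;
}
```

Under those assumptions the MSS is 1428 bytes and gro_segment_count(7140, 1500, 12) gives 5, which fits the picture of five full-sized segments merged into one large packet by the client.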

Therefore, the following conclusion is reasonable:

  • If a forwarding device finds that an IPv6 packet is larger than the egress MTU, and the payload is a TCP segment, it is not wrong to let TSO segment the packet instead of dropping it and sending an ICMPv6 Packet Too Big message.

IPv6 only stipulates that intermediate devices must not fragment packets; it does not stipulate that packets must be forwarded exactly as they arrived.

However, is this implemented in the Linux kernel? Strictly speaking, Linux implements only half of what is described above.

What the Linux kernel does do:

  • Even on a forwarding device, as long as LRO/GRO is enabled, it collects incoming packets, aggregates as many small packets as possible into one large packet, and rewrites the IPv6 header accordingly.
  • Even on a forwarding device, an IPv6 packet larger than the MTU can still be forwarded as long as the gso_size in its metadata is not greater than the MTU. GSO/TSO then splits it back into independent small packets on transmit.

However, this does not go far enough.

gso_size can be thought of as the size of the small packets before aggregation. In my opinion, even if it is larger than the egress MTU, the TSO/GSO mechanism could still segment the packet! Linux simply does not implement that.

Look specifically at this code:

int ip6_forward(struct sk_buff *skb)
{
	...
	if (ip6_pkt_too_big(skb, mtu)) {
		/* Again, force OUTPUT device used as source address */
		skb->dev = dst->dev;
		icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
		__IP6_INC_STATS(net, idev, IPSTATS_MIB_INTOOBIGERRORS);
		__IP6_INC_STATS(net, ip6_dst_idev(dst),
				IPSTATS_MIB_FRAGFAILS);
		kfree_skb(skb);
		return -EMSGSIZE;
	}
	...
}

When you look at the code of ip6_pkt_too_big, you will find this inappropriate logic inside.
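
To make the complaint concrete, here is a user-space model in C of the decision as this article describes it. It is a sketch, not the kernel source: struct pkt_model and pkt_too_big_model are hypothetical names, the fields are simplified stand-ins for what lives in struct sk_buff, and the real ip6_pkt_too_big handles further cases that are not modelled here.

```c
#include <stdbool.h>

/* Simplified stand-in for the relevant sk_buff state (illustrative only). */
struct pkt_model {
    unsigned len;      /* length of the (possibly GRO-aggregated) packet */
    unsigned gso_size; /* per-segment TCP payload set on ingress; 0 = not GSO */
    unsigned hdr_len;  /* IPv6 + TCP header bytes carried by every segment */
};

/* Returns true when the packet would be dropped with "Packet Too Big". */
static bool pkt_too_big_model(const struct pkt_model *p, unsigned mtu)
{
    /* A packet that already fits is forwarded as it is. */
    if (p->len <= mtu)
        return false;

    /*
     * An aggregated packet passes only if the segments implied by the
     * ingress gso_size fit the egress MTU. The egress MTU is never used
     * to pick a smaller segment size.
     */
    if (p->gso_size != 0 && p->hdr_len + p->gso_size <= mtu)
        return false;

    return true;
}
```

For a packet aggregated from five 1428-byte segments with 72 bytes of headers each, the model lets it through when the egress MTU is 1500 and rejects it as soon as the egress MTU is 1499, even though the payload is an ordinary TCP stream that could be cut into smaller segments.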

No need to nitpick and tell me that this is simply how GSO is implemented. My point is that once the egress MTU is known, whatever the TCP packet looks like, the underlying TSO/GSO can cut it into small segments sized to that MTU, and each small segment is an independent, complete IPv6 packet. What does that have to do with the gso_size set by GRO on ingress?
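
What I am asking for can be sketched like this. It is only an illustration of the idea, not something Linux does: the helper names proposed_seg_payload and proposed_seg_count are made up, and hdr_len stands for the IPv6 plus TCP header bytes that every output segment has to carry.

```c
/*
 * Proposed behaviour: choose the segment payload from the egress MTU,
 * using the ingress gso_size only as an upper bound.
 * Returns 0 if the MTU cannot even hold the headers.
 */
static unsigned proposed_seg_payload(unsigned gso_size, unsigned hdr_len,
                                     unsigned mtu)
{
    unsigned fit;

    if (mtu <= hdr_len)
        return 0;
    fit = mtu - hdr_len; /* largest payload the egress link can carry */
    return gso_size < fit ? gso_size : fit;
}

/* Number of independent packets needed for `payload` bytes of TCP data. */
static unsigned proposed_seg_count(unsigned payload, unsigned gso_size,
                                   unsigned hdr_len, unsigned mtu)
{
    unsigned seg = proposed_seg_payload(gso_size, hdr_len, mtu);

    if (seg == 0)
        return 0;
    return (payload + seg - 1) / seg; /* round up */
}
```

With an ingress gso_size of 1428, 72 bytes of headers and an egress MTU of 1400, the segment payload becomes 1328 bytes, and 7140 bytes of data leave as six complete packets instead of being dropped.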

But in practice the Linux kernel's implementation really is this crude! I first confirmed the conclusion on a live system with systemtap and bpftrace, and only then read the code.


The leather shoes in Wenzhou, Zhejiang are wet, so they won’t get fat in the rain.

Origin blog.csdn.net/dog250/article/details/111399794