The MSS TCP option in the Linux kernel protocol stack

Table of Contents

1 MSS overview

2 Three-way handshake of the client

2.1 Send SYN segment MSS option value

2.1.1 tcp_advertise_mss()

2.1.2 Initialization of tp->advmss

2.2 Receiving SYN+ACK segment

3 Three-way handshake on the server side

3.1 Receiving SYN segment

3.2 Receive ACK message

3.2.1 Initialize advmss

3.3 Send SYN+ACK segment

4 Summary of the three-way handshake

5 Get MSS during sending

5.1 tcp_current_mss()

5.2 tcp_sync_mss()

5.2.1 tcp_mtu_to_mss()

6 Summary


1 MSS overview

MSS (Maximum Segment Size) is the largest segment that the TCP layer can receive. The value counts only the data portion of a TCP segment; it does not include the TCP header or its options.

In addition, the TCP header can carry an MSS option. During the three-way handshake, each TCP endpoint uses this option to tell the other side the maximum segment size it can accept. The option appears only in segments with the SYN flag set, that is, in the first two segments of the three-way handshake (SYN and SYN+ACK).

Because MSS is carried as an option, it is optional in principle. In practice, almost all TCP connections today carry it, so it is arguably the most important TCP option.
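For reference, the MSS option occupies four bytes on the wire: a kind byte (2), a length byte (4), and the 16-bit MSS value in network byte order. The following is a minimal user-space sketch of that layout, not kernel code; the helper name mss_option_encode() is made up for this illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Kind and total length of the MSS option as defined by the TCP specification. */
#define OPT_KIND_MSS 2
#define OPT_LEN_MSS  4

/* Hypothetical helper: write a 4-byte MSS option into buf.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
static size_t mss_option_encode(uint8_t *buf, size_t buflen, uint16_t mss)
{
	if (buflen < OPT_LEN_MSS)
		return 0;
	buf[0] = OPT_KIND_MSS;
	buf[1] = OPT_LEN_MSS;
	buf[2] = (uint8_t)(mss >> 8);    /* network byte order: high byte first */
	buf[3] = (uint8_t)(mss & 0xff);
	return OPT_LEN_MSS;
}
```

Encoding an MSS of 1460 (0x05B4) this way yields the bytes 02 04 05 B4.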

The handling of MSS involves two main questions:

  1. How is the MSS option carried in the SYN and SYN+ACK segments determined (some materials call this RMSS, the receive MSS)?
  2. How does MSS affect packet transmission once the connection is established (some materials call this SMSS, the send MSS)?

The three-way handshake between client and server shows how the RMSS is determined. The data transmission path in the established state shows how the SMSS is used.

2 Three-way handshake of the client

The client handles the MSS at two points:

  1. When sending the SYN segment, it tells the server the MSS that the local end can receive;
  2. On receiving the SYN+ACK, it records the MSS advertised by the server.

2.1 Send SYN segment MSS option value

The SYN segment is also sent through tcp_transmit_skb(), which calls tcp_syn_build_options() to construct the options carried in the SYN segment. The code is as follows:

static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
			    gfp_t gfp_mask)
{
...
	//If a SYN segment is being sent, call tcp_syn_build_options() to build the TCP options
	//it carries; the MSS option value is the return value of tcp_advertise_mss()
	if (unlikely(tcb->flags & TCPCB_FLAG_SYN)) {
		tcp_syn_build_options((__be32 *)(th + 1),
				      tcp_advertise_mss(sk),
				      (sysctl_flags & SYSCTL_FLAG_TSTAMPS),
				      (sysctl_flags & SYSCTL_FLAG_SACK),
				      (sysctl_flags & SYSCTL_FLAG_WSCALE),
				      tp->rx_opt.rcv_wscale,
				      tcb->when,
				      tp->rx_opt.ts_recent,
#ifdef CONFIG_TCP_MD5SIG
				      md5 ? &md5_hash_location :
#endif
				      NULL);
	}
...
}

2.1.1 tcp_advertise_mss()

"Advertise" here means announcing a value to the peer. This function computes the MSS to announce to the other end; in terms of the TCB, it computes tp->advmss from the MTU of the local device.

static __u16 tcp_advertise_mss(struct sock *sk)
{
	struct tcp_sock *tp = tcp_sk(sk);
	//Get the route
	struct dst_entry *dst = __sk_dst_get(sk);
	//Initialize the temporary variable mss from tp->advmss
	int mss = tp->advmss;

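	//If the route is valid and its MSS is smaller than the current value, update tp->advmss
	//with the route's MSS. The route's MSS is derived from the network device's MTU and must
	//be honored; it can be regarded as min(65535-40, MTU-40), where 65535 is the maximum length of an IP packet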
	if (dst && dst_metric(dst, RTAX_ADVMSS) < mss) {
		mss = dst_metric(dst, RTAX_ADVMSS);
		tp->advmss = mss;
	}
	//Return the MSS that was decided on
	return (__u16)mss;
}

As shown above, tcp_advertise_mss() takes the smaller of the current tp->advmss and the MSS in the routing entry, the latter being the local network device's MTU minus 40. Next, we look at how the initial value of tp->advmss is determined.

2.1.2 Initialization of tp->advmss

Searching the code shows that this field is initialized in tcp_connect_init(), which is called by tcp_connect(); in other words, the value is initialized while the SYN segment is being sent. The code is as follows:

static void tcp_connect_init(struct sock *sk)
{
...
	//That is, it also comes from the local MTU
	tp->advmss = dst_metric(dst, RTAX_ADVMSS);
...
}

To summarize: the MSS option value carried in the SYN segment is the MTU of the local network device minus 40.
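The two steps above can be sketched in user space as follows. This is only an illustration under simplifying assumptions: the helper names are invented, the 40 bytes stand for an IPv4 header plus a TCP header without options, and any lower bound the kernel may apply to the route's value is ignored.

```c
#include <stdint.h>

#define IP_MAX_PACKET 65535u  /* maximum length of an IP packet */
#define IP_TCP_HDRS   40u     /* 20-byte IPv4 header + 20-byte TCP header, no options */

/* Hypothetical helper: route MSS derived from a device MTU,
 * following the min(65535-40, MTU-40) rule. */
static uint32_t route_advmss_from_mtu(uint32_t mtu)
{
	uint32_t advmss = mtu > IP_TCP_HDRS ? mtu - IP_TCP_HDRS : 0;

	if (advmss > IP_MAX_PACKET - IP_TCP_HDRS)
		advmss = IP_MAX_PACKET - IP_TCP_HDRS;
	return advmss;
}

/* Mirrors the logic of tcp_advertise_mss(): lower the cached value if the
 * route's value is smaller. have_route == 0 models a missing dst entry. */
static uint16_t advertise_mss(uint32_t *cached_advmss, int have_route,
			      uint32_t route_advmss)
{
	if (have_route && route_advmss < *cached_advmss)
		*cached_advmss = route_advmss;
	return (uint16_t)*cached_advmss;
}
```

For a standard Ethernet MTU of 1500, route_advmss_from_mtu() gives 1460, which is the value most often seen in SYN segments on Ethernet.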

2.2 Receiving SYN+ACK segment

A received SYN+ACK segment is processed in tcp_rcv_synsent_state_process(), which calls tcp_parse_options() to parse the options carried in the SYN+ACK segment. The code related to the MSS option is as follows:

void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
		       int estab)
{
...
	switch (opcode) {
	case TCPOPT_MSS:
		if (opsize == TCPOLEN_MSS && th->syn && !estab) {
			//Read the MSS value carried in the option into the temporary variable in_mss
			u16 in_mss = ntohs(get_unaligned((__be16 *)ptr));
			if (in_mss) {
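				//user_mss is set by the application through the TCP_MAXSEG socket option; it defaults to 0 if unset.
				//If user_mss is set and is smaller than the value advertised by the peer, lower in_mss to the smaller of the two.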
				if (opt_rx->user_mss &&
					opt_rx->user_mss < in_mss)
					in_mss = opt_rx->user_mss;
				//Record min(user_mss, in_mss) in mss_clamp
				opt_rx->mss_clamp = in_mss;
			}
		}
		break;
	}
	...
}

Summary: after receiving the MSS advertised by the server, the client compares it with the MSS set by the application through TCP_MAXSEG and saves the smaller of the two in tp->rx_opt.mss_clamp. This value later affects how the client determines its sending MSS, as described below.
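The clamp rule can be restated as a small stand-alone function. This is a sketch that follows the logic of the tcp_parse_options() excerpt above; the function name update_mss_clamp() is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: return the new mss_clamp.
 *   old_clamp: value held before the option was parsed
 *   user_mss:  value set through TCP_MAXSEG, 0 if the application did not set it
 *   in_mss:    MSS carried in the peer's option, 0 is ignored */
static uint16_t update_mss_clamp(uint16_t old_clamp, uint16_t user_mss,
				 uint16_t in_mss)
{
	if (in_mss == 0)
		return old_clamp;           /* a zero MSS leaves the clamp untouched */
	if (user_mss != 0 && user_mss < in_mss)
		in_mss = user_mss;          /* the application's limit wins when smaller */
	return in_mss;
}
```

Note that TCP_MAXSEG can only lower the clamp: a user_mss larger than the peer's advertised MSS has no effect.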

3 Three-way handshake on the server side

The server handles the MSS at two points:

  1. After receiving the SYN segment, it processes the MSS advertised by the client;
  2. When sending the SYN+ACK, it determines the MSS option value to carry.

3.1 Receiving SYN segment

The core processing of the SYN segment is completed in tcp_v4_conn_request(), and the content related to MSS option parsing is as follows:

int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
{
...
	//The options carried in the SYN segment are first parsed into the temporary variable tmp_opt
	struct tcp_options_received tmp_opt;
	struct request_sock *req;
...

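	//Clear the temporary variable first
	tcp_clear_options(&tmp_opt);
	//Initialize mss_clamp in the temporary variable to 536
	tmp_opt.mss_clamp = 536;
	//Set user_mss to the value the user configured through the TCP_MAXSEG socket option
	tmp_opt.user_mss  = tcp_sk(sk)->rx_opt.user_mss;
	//This function was shown above: it compares the MSS carried in the SYN segment with user_mss
	//and saves the smaller of the two in tmp_opt.mss_clamp
	tcp_parse_options(skb, &tmp_opt, 0);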
...
	//Initialize the connection request block
	tcp_openreq_init(req, &tmp_opt, skb);
...
}

static inline void tcp_openreq_init(struct request_sock *req,
				    struct tcp_options_received *rx_opt,
				    struct sk_buff *skb)
{
...
	//The MSS finally decided on is recorded in the mss field of the connection request block
	req->mss = rx_opt->mss_clamp;
...
}

3.2 Receive ACK message

The MSS recorded in req is used once the client's ACK completes the three-way handshake: when the server creates the TCB of the child socket, it assigns the value to tp->rx_opt.mss_clamp. The code is as follows:

struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req, struct sk_buff *skb)
{
...
	newtp->rx_opt.mss_clamp = req->mss;
...
}

Summary: judging from the code, the server handles the MSS option of a received SYN in the same way the client handles the MSS option of a received SYN+ACK.
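One detail worth illustrating is the default of 536: if the SYN carries no MSS option, mss_clamp keeps that initial value. The sketch below walks a raw option area in user space to show this behavior. It is a simplified illustration, not the kernel parser; the function name syn_mss_clamp() and the bounds checks are assumptions made for this example.

```c
#include <stdint.h>
#include <stddef.h>

#define OPT_EOL            0    /* end of option list */
#define OPT_NOP            1    /* single-byte padding */
#define OPT_MSS            2
#define OPT_MSS_LEN        4
#define DEFAULT_MSS_CLAMP  536  /* initial value used in tcp_v4_conn_request() */

/* Hypothetical helper: derive mss_clamp from the option bytes of a SYN.
 * user_mss is the TCP_MAXSEG value, 0 if not set. */
static uint16_t syn_mss_clamp(const uint8_t *opts, size_t len, uint16_t user_mss)
{
	uint16_t clamp = DEFAULT_MSS_CLAMP;
	size_t i = 0;

	while (i < len) {
		uint8_t kind = opts[i];
		uint8_t optlen;

		if (kind == OPT_EOL)
			break;
		if (kind == OPT_NOP) {
			i++;
			continue;
		}
		if (i + 1 >= len)
			break;                  /* truncated: no length byte */
		optlen = opts[i + 1];
		if (optlen < 2 || i + optlen > len)
			break;                  /* malformed length */
		if (kind == OPT_MSS && optlen == OPT_MSS_LEN) {
			uint16_t in_mss = (uint16_t)((opts[i + 2] << 8) | opts[i + 3]);

			if (in_mss != 0) {
				if (user_mss != 0 && user_mss < in_mss)
					in_mss = user_mss;
				clamp = in_mss;
			}
		}
		i += optlen;
	}
	return clamp;
}
```

With the option bytes 02 04 05 B4 the result is 1460; with an option area that contains no MSS option the result stays at 536.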

3.2.1 Initialize advmss

So far we have not seen where tp->advmss is initialized on the server side. It is done after the third segment of the handshake, the ACK, has been received. The code is as follows:

struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
				  struct request_sock *req,
				  struct dst_entry *dst)
{
...
	//The value is simply the MSS stored in the route
	newtp->advmss = dst_metric(dst, RTAX_ADVMSS);
...
}

3.3 Send SYN+ACK segment

The sending process of the SYN+ACK segment is mainly completed by tcp_v4_send_synack(), and the content related to the MSS option is as follows:

static int tcp_v4_send_synack(struct sock *sk, struct request_sock *req,
			      struct dst_entry *dst)
{
...
	//Build the SYN+ACK segment
	skb = tcp_make_synack(sk, dst, req);
...
}

struct sk_buff *tcp_make_synack(struct sock *sk, struct dst_entry *dst,
				struct request_sock *req)
{
...
	//Build the TCP options for the SYN+ACK segment; the second argument is the MSS to send to the client, taken from the route
	tcp_syn_build_options((__be32 *)(th + 1), dst_metric(dst, RTAX_ADVMSS), ireq->tstamp_ok,
			      ireq->sack_ok, ireq->wscale_ok, ireq->rcv_wscale,
			      TCP_SKB_CB(skb)->when,
			      req->ts_recent,
			      (
#ifdef CONFIG_TCP_MD5SIG
			       md5 ? &md5_hash_location :
#endif
			       NULL)
			      );
...
}

Summary: judging from the code, the server chooses the MSS option for the SYN+ACK segment in the same way the client chooses the MSS option for the SYN segment.

4 Summary of the three-way handshake

As the code above shows, the server and the client choose the MSS they advertise to the peer in the same way: it is the local network device's MTU minus 40, and the value is recorded in tp->advmss. After receiving the MSS advertised by the peer, each side compares it with tp->rx_opt.user_mss set by the application and saves the smaller of the two in tp->rx_opt.mss_clamp. mss_clamp then affects the choice of the sending MSS during transmission.

5 Get MSS during sending

The core logic of tcp_sendmsg(), the TCP data transmission function, is to split the data to be sent into skbs according to the MSS. In doing so it calls tcp_current_mss() to determine the current sending MSS. As described earlier in this note, the MSS advertised by the peer ends up, after adjustment, in tp->rx_opt.mss_clamp. It would seem reasonable to use that value directly as the sending MSS, but it is not that simple:

int tcp_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
		size_t size)
{
...
	//Get the currently effective MSS; if there is no out-of-band data, larger segments may be used (TSO-related)
	mss_now = tcp_current_mss(sk, !(flags&MSG_OOB));
	size_goal = tp->xmit_size_goal;
...
}

5.1 tcp_current_mss()

This function returns the currently effective MSS and also sets tp->xmit_size_goal according to the MTU. That variable is used later when building skbs; its value depends on features such as TSO and GSO, which are outside the scope of this note and are ignored here.

/* Compute the current effective MSS, taking SACKs and IP options,
 * and even PMTU discovery events into account.
 *
 * LARGESEND note: !urg_mode is overkill, only frames up to snd_up
 * cannot be large. However, taking into account rare use of URG, this
 * is not a big flaw.
 */
unsigned int tcp_current_mss(struct sock *sk, int large_allowed)
{
	struct tcp_sock *tp = tcp_sk(sk);
	struct dst_entry *dst = __sk_dst_get(sk);
	//The effective sending MSS decided on below is stored in this variable
	u32 mss_now;
	u16 xmit_size_goal;
	int doing_tso = 0;

	//mss_now starts from tp->mss_cache, a field we have not encountered so far (see below)
	mss_now = tp->mss_cache;

	//Check whether TSO is allowed; ignored here
	if (large_allowed && sk_can_gso(sk) && !tp->urg_mode)
		doing_tso = 1;

	if (dst) {
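		//Get the PMTU value stored in the route
		u32 mtu = dst_mtu(dst);
		//icsk_pmtu_cookie is the last cached PMTU value; its initial value is the local MTU.
		//If the two differ, the PMTU has changed and tcp_sync_mss() must be called to update the MSS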
		if (mtu != inet_csk(sk)->icsk_pmtu_cookie)
			mss_now = tcp_sync_mss(sk, mtu);
	}
	//If SACK blocks are to be sent, subtract their overhead from the MSS
	if (tp->rx_opt.eff_sacks)
		mss_now -= (TCPOLEN_SACK_BASE_ALIGNED + (tp->rx_opt.eff_sacks * TCPOLEN_SACK_PERBLOCK));

#ifdef CONFIG_TCP_MD5SIG
	if (tp->af_specific->md5_lookup(sk, sk))
		mss_now -= TCPOLEN_MD5SIG_ALIGNED;
#endif

	//The code below determines xmit_size_goal, which is TSO-related; ignored for now
	xmit_size_goal = mss_now;
	if (doing_tso) {
		xmit_size_goal = (65535 -
				  inet_csk(sk)->icsk_af_ops->net_header_len -
				  inet_csk(sk)->icsk_ext_hdr_len -
				  tp->tcp_header_len);

		xmit_size_goal = tcp_bound_to_half_wnd(tp, xmit_size_goal);
		xmit_size_goal -= (xmit_size_goal % mss_now);
	}
	tp->xmit_size_goal = xmit_size_goal;
	//Return the currently effective MSS
	return mss_now;
}
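The two pieces of arithmetic in tcp_current_mss() can be tried in isolation. The sketch below assumes the usual option sizes (4 bytes for the aligned SACK header and 8 bytes per SACK block) and leaves out the half-window bound; the function names are invented for this example.

```c
#include <stdint.h>

#define SACK_BASE_ALIGNED 4  /* assumed: 2-byte SACK option header padded to 4 bytes */
#define SACK_PERBLOCK     8  /* assumed: each SACK block holds two 32-bit sequence numbers */

/* Hypothetical helper: subtract the SACK option overhead from the MSS
 * when sack_blocks blocks will be sent with each segment. */
static uint32_t mss_after_sack(uint32_t mss, unsigned int sack_blocks)
{
	if (sack_blocks)
		mss -= SACK_BASE_ALIGNED + sack_blocks * SACK_PERBLOCK;
	return mss;
}

/* Hypothetical helper: round a TSO size goal down to a whole number of
 * MSS-sized segments, as the xmit_size_goal computation does. */
static uint32_t size_goal_round(uint32_t goal, uint32_t mss)
{
	return goal - (goal % mss);
}
```

For example, with an MSS of 1448 and two SACK blocks the usable payload drops to 1428 bytes. With a 20-byte IP header and a 32-byte TCP header, the raw goal 65535 - 20 - 32 = 65483 rounds down to 65160, which is 45 segments of 1448 bytes.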

5.2 tcp_sync_mss()

This function updates the PMTU-related fields using its pmtu parameter; icsk->icsk_pmtu_cookie holds the most recently cached PMTU value. After the MSS is computed from the PMTU, the result is stored in tp->mss_cache, which is the current sending MSS.

/* This function synchronize snd mss to current pmtu/exthdr set.

   tp->rx_opt.user_mss is mss set by user by TCP_MAXSEG. It does NOT counts
   for TCP options, but includes only bare TCP header.

   tp->rx_opt.mss_clamp is mss negotiated at connection setup.
   It is minimum of user_mss and mss received with SYN.
   It also does not include TCP options.

   inet_csk(sk)->icsk_pmtu_cookie is last pmtu, seen by this function.

   tp->mss_cache is current effective sending mss, including
   all tcp options except for SACKs. It is evaluated,
   taking into account current pmtu, but never exceeds
   tp->rx_opt.mss_clamp.

   NOTE1. rfc1122 clearly states that advertised MSS
   DOES NOT include either tcp or ip options.

   NOTE2. inet_csk(sk)->icsk_pmtu_cookie and tp->mss_cache
   are READ ONLY outside this function.		--ANK (980731)
 */
//The comment above is important; read it carefully
unsigned int tcp_sync_mss(struct sock *sk, u32 pmtu)
{
	struct tcp_sock *tp = tcp_sk(sk);
	struct inet_connection_sock *icsk = inet_csk(sk);
	int mss_now;

	if (icsk->icsk_mtup.search_high > pmtu)
		icsk->icsk_mtup.search_high = pmtu;
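	//Compute the MSS from the PMTU; besides the standard IP and TCP headers, IP options and TCP options are also taken into account
	mss_now = tcp_mtu_to_mss(sk, pmtu);
	//Cap the MSS at half of the largest window the peer has advertised
	mss_now = tcp_bound_to_half_wnd(tp, mss_now);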

	/* And store cached results */
	//Cache the PMTU in icsk_pmtu_cookie
	icsk->icsk_pmtu_cookie = pmtu;
	if (icsk->icsk_mtup.enabled)
		mss_now = min(mss_now, tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low));
	//The MSS finally decided on is saved in mss_cache
	tp->mss_cache = mss_now;

	return mss_now;
}

Note: several fields closely tied to the PMTU discovery mechanism (the icsk_mtup members) are not explained here yet; they will be covered once they have been worked out.
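The interplay between tcp_current_mss() and tcp_sync_mss() is essentially a cache keyed on the PMTU. The user-space sketch below models only that part. It is an illustration under stated assumptions: the struct and function names are invented, the MSS is simplified to PMTU minus 40, the MTU probing fields are left out, and the half-window bound is assumed to use the largest window the peer has advertised.

```c
#include <stdint.h>

/* Hypothetical per-connection state for this sketch. */
struct mss_state {
	uint32_t pmtu_cookie;    /* last path MTU seen (icsk_pmtu_cookie) */
	uint32_t mss_cache;      /* sending MSS computed from pmtu_cookie */
	uint32_t max_window;     /* largest window advertised by the peer, 0 = unknown */
	unsigned int recomputes; /* counts recalculations, for illustration only */
};

/* Cap size at half of the peer's largest advertised window, if known. */
static uint32_t bound_to_half_wnd(const struct mss_state *s, uint32_t size)
{
	if (s->max_window && size > (s->max_window >> 1))
		return s->max_window >> 1;
	return size;
}

/* Recompute and cache the MSS for a new path MTU. */
static uint32_t sync_mss(struct mss_state *s, uint32_t pmtu)
{
	uint32_t mss = pmtu - 40;   /* simplified: plain IPv4 + TCP headers only */

	mss = bound_to_half_wnd(s, mss);
	s->pmtu_cookie = pmtu;
	s->mss_cache = mss;
	s->recomputes++;
	return mss;
}

/* Return the cached MSS unless the route's MTU differs from the cookie. */
static uint32_t current_mss(struct mss_state *s, uint32_t route_mtu)
{
	if (route_mtu != s->pmtu_cookie)
		return sync_mss(s, route_mtu);
	return s->mss_cache;
}
```

As long as the route's MTU matches the cookie, every call returns the cached value; only a PMTU change triggers a recomputation.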

5.2.1 tcp_mtu_to_mss()

This function converts an MTU into an MSS value, taking IP options and TCP options into account.

/* Not accounting for SACKs here. */
int tcp_mtu_to_mss(struct sock *sk, int pmtu)
{
	struct tcp_sock *tp = tcp_sk(sk);
	struct inet_connection_sock *icsk = inet_csk(sk);
	int mss_now;

	/* Calculate base mss without TCP options:
	   It is MMS_S - sizeof(tcphdr) of rfc1122
	 */
	//Subtract the standard TCP header and the IP header from the MTU
	mss_now = pmtu - icsk->icsk_af_ops->net_header_len - sizeof(struct tcphdr);

	/* Clamp it (mss_clamp does not include tcp options) */
	//The MSS must not exceed the MSS advertised by the peer
	if (mss_now > tp->rx_opt.mss_clamp)
		mss_now = tp->rx_opt.mss_clamp;

	/* Now subtract optional transport overhead */
	//Subtract the extension header length; extension headers are present when, for example, IPsec is in use
	mss_now -= icsk->icsk_ext_hdr_len;

	/* Then reserve room for full set of TCP options and 8 bytes of data */
	//The MSS must not be smaller than 48 bytes at this point
	if (mss_now < 48)
		mss_now = 48;

	/* Now subtract TCP options size, not including SACKs */
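	//Subtract the length of the TCP options (excluding SACK). tp->tcp_header_len is the largest TCP
	//header length this connection may use; it is determined during the three-way handshake from the options both sides support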
	mss_now -= tp->tcp_header_len - sizeof(struct tcphdr);

	return mss_now;
}
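To make the order of the steps concrete, here is a user-space sketch of the same calculation with the kernel structures replaced by a plain parameter struct. The struct, its field names, and the function name mtu_to_mss() are assumptions made for this illustration.

```c
#include <stdint.h>

#define TCP_BASE_HDR 20  /* sizeof(struct tcphdr): TCP header without options */

/* Hypothetical stand-in for the fields read from the socket. */
struct mss_params {
	int net_header_len;  /* IP header length, 20 for IPv4 without options */
	int ext_hdr_len;     /* extra network-layer header bytes, e.g. IP options */
	int tcp_header_len;  /* TCP header including the options sent on every segment */
	int mss_clamp;       /* min(peer MSS, TCP_MAXSEG) from the handshake */
};

static int mtu_to_mss(const struct mss_params *p, int pmtu)
{
	/* Base MSS: MTU minus the IP header and the bare TCP header. */
	int mss = pmtu - p->net_header_len - TCP_BASE_HDR;

	/* Never exceed the value negotiated during the handshake. */
	if (mss > p->mss_clamp)
		mss = p->mss_clamp;

	/* Subtract extension header overhead. */
	mss -= p->ext_hdr_len;

	/* Keep room for a full set of TCP options plus 8 bytes of data. */
	if (mss < 48)
		mss = 48;

	/* Subtract the TCP options carried on every segment (SACK not included). */
	mss -= p->tcp_header_len - TCP_BASE_HDR;
	return mss;
}
```

With an MTU of 1500, a clamp of 1460, and a 32-byte TCP header (timestamps enabled), the result is 1460 - 12 = 1448, the payload size commonly observed on Ethernet when timestamps are in use.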

Summary: tcp_sync_mss() is in fact also called several times during the three-way handshake to update the MSS, but the principle is the same: the MSS is recomputed from the device MTU or the latest PMTU, so those call sites are not listed here.

6 Summary

As mentioned at the beginning, MSS handling can be divided into two parts: the sending MSS and the receiving (advertised) MSS. The code paths for both have been analyzed in detail above. Two figures summarize the relationship between them:

[Figure 30-9: update rule for the sending MSS. Figure 30-10: rule for determining the advertised MSS.]

Figure 30-9 shows the update rule for the sending MSS. A PMTU update is written to the route's metrics[RTAX_MTU], and every transmission rechecks whether the sending MSS needs to be updated. The check compares the MTU in the route with the last PMTU value cached in icsk_pmtu_cookie. If the two differ, the MTU changed between the two transmissions, and the route's MTU is used to update icsk_pmtu_cookie and recompute the MSS. The final sending MSS is also capped by mss_clamp, the value negotiated during the three-way handshake, which in turn is limited by the TCP_MAXSEG socket option.

Figure 30-10 shows how the advertised MSS is determined. It is simply derived from the MTU of the network device.

Origin blog.csdn.net/wangquan1992/article/details/109028964