In-depth understanding of the TCP protocol: Detailed three-way handshake

1. What is a three-way handshake?

To establish a connection, TCP needs to exchange three segments:

First: the client asks the server to establish a connection.

Second: the server receives the client's request and sends a response.

Third: the client receives the response, considers the connection established, and sends an acknowledgment back to the server.

Detailed process:

Glossary:

SYN: a flag that is 1 only in the first and second segments; it is 0 in the third segment and in all other cases.

ACK: a flag that is 0 only in the first segment; it is 1 in the second and third segments and in every segment once the connection is established.

Sequence Number: the sequence number; its initial value is a random number.

Acknowledgment Number: the acknowledgment number, that is, the sequence number of the next data the receiver expects.

The first segment:

Client >> Server

SYN = 1

ACK = 0

Sequence Number = X (random number)

The second segment:

Client << Server

SYN = 1

ACK = 1

Sequence Number = Y (random number)

Acknowledgment Number = X+1

The third segment:

Client >> Server

SYN = 0

ACK = 1

Sequence Number = X+1

Acknowledgment Number = Y+1
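The three segments described above can be written down as a small C sketch. The struct hs_segment and the function build_handshake are hypothetical names introduced only for illustration; a real TCP header carries many more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the header fields that matter for the handshake (simplified). */
struct hs_segment {
    int syn;          /* SYN flag */
    int ack;          /* ACK flag */
    uint32_t seq;     /* Sequence Number */
    uint32_t ack_no;  /* Acknowledgment Number (meaningful only if ack == 1) */
};

/* Fill out[0..2] with the three handshake segments for a client initial
 * sequence number x and a server initial sequence number y.
 * uint32_t arithmetic wraps modulo 2^32, as TCP sequence numbers do. */
void build_handshake(uint32_t x, uint32_t y, struct hs_segment out[3])
{
    assert(out != NULL);
    out[0] = (struct hs_segment){ .syn = 1, .ack = 0, .seq = x,     .ack_no = 0 };
    out[1] = (struct hs_segment){ .syn = 1, .ack = 1, .seq = y,     .ack_no = x + 1 };
    out[2] = (struct hs_segment){ .syn = 0, .ack = 1, .seq = x + 1, .ack_no = y + 1 };
}
```

Calling build_handshake(100, 200, segs) yields the acknowledgment numbers 101 and 201 in the second and third segments.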

2. Why a three-way handshake?

Let us first look at why a one-way or two-way handshake is not enough.

Consider two links:

Link 1: Client >> Server

Link 2: Client << Server

One-way handshake:

If link 1 fails, the client still considers the connection successful, while the server does not even know that a connection was attempted.

If link 2 fails, both the server and the client consider the connection successful, even though the server cannot reach the client.

Two-way handshake:

If link 2 fails, the server considers the connection successful, although the client never receives the response.

Next, consider the three-way handshake:

If link 1 fails:

The server does not receive the first handshake segment, so it sees no connection request, does not consider a connection established, and does not send the second segment.

The client cannot receive a second segment (the server never sends one), so it does not consider the connection established.

Neither side mistakenly believes that the connection succeeded.

If link 2 fails:

The client cannot receive the second handshake segment (the server sent it, but it was lost because of the link failure), so it does not consider the connection established and does not send the third segment.

The server does not receive a third segment (the client never sends one), so it does not consider the connection established.

Neither side mistakenly believes that the connection succeeded.
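The case analysis above can be checked with a small simulation. This is only a sketch under the simplified model used in this section: each link is either fully up or fully down, and a side believes the connection is up once it has handled the last message it takes part in. The names struct belief and simulate_handshake are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct belief {
    bool client_connected;  /* client believes the connection is up */
    bool server_connected;  /* server believes the connection is up */
};

/* Simulate an n-way handshake (n = 1, 2 or 3).
 * link1_up: the client -> server direction works
 * link2_up: the server -> client direction works */
struct belief simulate_handshake(int n, bool link1_up, bool link2_up)
{
    struct belief b = { false, false };
    assert(n >= 1 && n <= 3);

    /* Message 1: client -> server over link 1. */
    bool msg1_received = link1_up;
    if (n == 1) {
        b.client_connected = true;           /* client sends and assumes success */
        b.server_connected = msg1_received;
        return b;
    }

    /* Message 2: server -> client over link 2, sent only if message 1 arrived. */
    bool msg2_received = msg1_received && link2_up;
    b.client_connected = msg2_received;
    if (n == 2) {
        b.server_connected = msg1_received;  /* server replies and assumes success */
        return b;
    }

    /* Message 3: client -> server over link 1, sent only if message 2 arrived. */
    bool msg3_received = msg2_received && link1_up;
    b.server_connected = msg3_received;
    return b;
}
```

With n = 3, a failure of either link leaves both fields false, whereas with n = 2 and link 2 down the server ends up believing in a connection that the client never saw.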

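3. Capturing a typical TCP three-way handshake:

We use Wireshark.

First, run ping www.baidu.com in cmd.

This determines the target IP address, which makes it easy to set the display filter:

ip.addr==180.101.49.12

Then open www.baidu.com.

Wireshark captures three packets; these are the TCP three-way handshake packets (192.168.3.89 is the local IP).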

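One question arises here: why is the initial value of Seq 0 rather than a random number?

This is a feature of Wireshark itself: what it displays is a relative value, not the actual value.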
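As a rough illustration of what a relative value means, the displayed number can be thought of as the raw sequence number minus the initial sequence number of that direction. The function name relative_seq is hypothetical, and this is a sketch of the idea rather than Wireshark's own code.

```c
#include <stdint.h>

/* Relative sequence number: the offset of raw from the initial sequence
 * number isn. The result is reduced modulo 2^32, so it stays correct even
 * when the raw value has wrapped around past 0xFFFFFFFF. */
uint32_t relative_seq(uint32_t raw, uint32_t isn)
{
    return (uint32_t)(raw - isn);
}
```

The SYN itself therefore shows up as Seq=0, and the segment that follows it as Seq=1.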

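The actual Sequence Number and Acknowledgment Number values of the three packets (in hexadecimal):

First segment (the first eight hex digits are the Sequence Number, the last eight the Acknowledgment Number):

Second segment:

Third segment: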

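We can see that the values follow this pattern:

Segment   Sequence Number   Acknowledgment Number
First     X                 0
Second    Y                 X+1
Third     X+1               Y+1

This matches our expectation.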
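The same check can be done programmatically on the raw bytes. In a TCP header the Sequence Number occupies bytes 4 to 7 and the Acknowledgment Number bytes 8 to 11, both in network byte order (big-endian). The helper names below are hypothetical, and each pointer is assumed to point at the start of a TCP header of at least 12 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian (network byte order) value. */
uint32_t read_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Sequence Number: bytes 4..7 of the TCP header. */
uint32_t tcp_seq(const unsigned char *tcp_hdr) { return read_be32(tcp_hdr + 4); }

/* Acknowledgment Number: bytes 8..11 of the TCP header. */
uint32_t tcp_ack(const unsigned char *tcp_hdr) { return read_be32(tcp_hdr + 8); }

/* Return 1 if the three headers follow the pattern
 * (X, any), (Y, X+1), (X+1, Y+1), otherwise 0. */
int matches_handshake(const unsigned char *h1, const unsigned char *h2,
                      const unsigned char *h3)
{
    uint32_t x = tcp_seq(h1);
    uint32_t y = tcp_seq(h2);

    return tcp_ack(h2) == (uint32_t)(x + 1) &&
           tcp_seq(h3) == (uint32_t)(x + 1) &&
           tcp_ack(h3) == (uint32_t)(y + 1);
}
```

The casts to uint32_t keep the comparison correct when X or Y is 0xFFFFFFFF and the incremented value wraps to 0.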

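The flag bits of the three packets:

First segment:

Second segment:

Third segment:

As you can see, Wireshark has very thoughtfully annotated the flags for us.

They also match our expectations.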
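For reference, the flag combinations can also be recognised directly from the flags byte of the TCP header, where SYN is bit 0x02 and ACK is bit 0x10. The macro and function names in this sketch are hypothetical; it looks only at SYN and ACK and ignores every other flag.

```c
#include <stdint.h>

/* Bit positions in the flags byte of the TCP header. */
#define TOY_TCP_FIN 0x01
#define TOY_TCP_SYN 0x02
#define TOY_TCP_RST 0x04
#define TOY_TCP_PSH 0x08
#define TOY_TCP_ACK 0x10

/* Return 1, 2 or 3 if the SYN/ACK combination is that of the first,
 * second or third handshake segment, or 0 if it matches none of them. */
int handshake_step(uint8_t flags)
{
    int syn = (flags & TOY_TCP_SYN) != 0;
    int ack = (flags & TOY_TCP_ACK) != 0;

    if (syn && !ack)
        return 1;   /* SYN */
    if (syn && ack)
        return 2;   /* SYN, ACK */
    if (ack)
        return 3;   /* ACK only; ordinary later segments look the same */
    return 0;
}
```

The three handshake packets carry the flag bytes 0x02, 0x12 and 0x10 respectively.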

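4. Code tracing and analysis:

From the previous experiment we know that the function that issues a TCP connection request is __sys_connect.

Let us analyze the source code of this function: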

int __sys_connect(int fd, struct sockaddr __user *uservaddr, int addrlen)
{
    struct socket *sock;
    struct sockaddr_storage address;
    int err, fput_needed;

    /* Look up the socket that belongs to the file descriptor. */
    sock = sockfd_lookup_light(fd, &err, &fput_needed);
    if (!sock)
        goto out;
    /* Copy the destination address from user space into the kernel. */
    err = move_addr_to_kernel(uservaddr, addrlen, &address);
    if (err < 0)
        goto out_put;
    err =
        security_socket_connect(sock, (struct sockaddr *)&address, addrlen);
    if (err)
        goto out_put;

    /* Dispatch to the protocol-specific connect implementation. */
    err = sock->ops->connect(sock, (struct sockaddr *)&address, addrlen,
                 sock->file->f_flags);
out_put:
    fput_light(sock->file, fput_needed);
out:
    return err;
}
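For orientation, __sys_connect is what the connect() system call runs inside the kernel. A minimal user-space sketch that triggers it might look like the following; the helper names make_ipv4_addr and tcp_connect_to are hypothetical, and a POSIX system with IPv4 is assumed.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill *out with an IPv4 address and port.
 * Returns 0 on success, -1 if ip is not a valid dotted-quad string. */
int make_ipv4_addr(const char *ip, uint16_t port, struct sockaddr_in *out)
{
    assert(ip != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}

/* Open a TCP connection. Returns the connected descriptor, or -1 on error. */
int tcp_connect_to(const char *ip, uint16_t port)
{
    struct sockaddr_in addr;
    int fd;

    if (make_ipv4_addr(ip, port, &addr) != 0)
        return -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* On a blocking socket the three-way handshake completes (or fails)
     * before this call returns. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The fd, address pointer and address length passed to connect() here correspond to the fd, uservaddr and addrlen parameters of __sys_connect.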

The main step in __sys_connect is:

err = sock->ops->connect(sock, (struct sockaddr *)&address, addrlen,
                    sock->file->f_flags);

This is a call through a function pointer. Using gdb, we find that it points to inet_stream_connect.
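The dispatch pattern itself is easy to reproduce in user space. The sketch below uses hypothetical toy_ names throughout and is not kernel code; it only shows how a generic entry point can call an implementation it does not know at compile time, through a table of function pointers.

```c
#include <stddef.h>

struct toy_socket;

/* Table of operations, similar in spirit to the table behind sock->ops. */
struct toy_ops {
    int (*connect)(struct toy_socket *sock, int flags);
};

struct toy_socket {
    const struct toy_ops *ops;  /* chosen when the socket is created */
    int connected;
};

/* One concrete implementation; a stand-in for inet_stream_connect. */
int toy_stream_connect(struct toy_socket *sock, int flags)
{
    (void)flags;
    sock->connected = 1;
    return 0;
}

const struct toy_ops toy_stream_ops = { .connect = toy_stream_connect };

/* Generic entry point; a stand-in for __sys_connect. */
int toy_sys_connect(struct toy_socket *sock, int flags)
{
    if (sock == NULL || sock->ops == NULL || sock->ops->connect == NULL)
        return -1;
    return sock->ops->connect(sock, flags);
}
```

Because the target is only known at run time, reading the source is not enough to see where the call goes, which is why a debugger such as gdb is useful here.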

Source code:

int inet_stream_connect(struct socket *sock, struct sockaddr *uaddr,
            int addr_len, int flags)
{
    int err;

    lock_sock(sock->sk);
    err = __inet_stream_connect(sock, uaddr, addr_len, flags, 0);
    release_sock(sock->sk);
    return err;
}

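It turns out to be a wrapper around __inet_stream_connect; the lock_sock and release_sock calls around it are presumably there for concurrency control.

Let us continue tracing the source code: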

int __inet_stream_connect(struct socket *sock, struct sockaddr *uaddr,
              int addr_len, int flags, int is_sendmsg)
{
    struct sock *sk = sock->sk;
    int err;
    long timeo;

    if (addr_len < sizeof(uaddr->sa_family))
        return -EINVAL;

    if (uaddr->sa_family == AF_UNSPEC) {
        err = sk->sk_prot->disconnect(sk, flags);
        sock->state = err ? SS_DISCONNECTING : SS_UNCONNECTED;
        goto out;
    }
    switch (sock->state) {
    default:
        err = -EINVAL;
        goto out;
    case SS_CONNECTED:
        err = -EISCONN;
        goto out;
    case SS_CONNECTING:
        err = -EALREADY;
        break;
    case SS_UNCONNECTED:
        err = -EISCONN;
        if (sk->sk_state != TCP_CLOSE)
            goto out;
        /* Hand over to the transport protocol's connect routine. */
        err = sk->sk_prot->connect(sk, uaddr, addr_len);
... (the function is long; the rest is omitted for now)
 
 
The key line is:

err = sk->sk_prot->connect(sk, uaddr, addr_len);

As we can see, this function again does its work through a function pointer.

Tracing this function pointer, we find that it ultimately points to tcp_v4_connect.

Source code:

int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
{
    struct sockaddr_in *usin = (struct sockaddr_in *)uaddr;
    struct inet_sock *inet = inet_sk(sk);
    struct tcp_sock *tp = tcp_sk(sk);
    __be16 orig_sport, orig_dport;
    __be32 daddr, nexthop;
    struct flowi4 *fl4;
    struct rtable *rt;
    int err;
    struct ip_options_rcu *inet_opt;
    struct inet_timewait_death_row *tcp_death_row = &sock_net(sk)->ipv4.tcp_death_row;

    if (addr_len < sizeof(struct sockaddr_in))
        return -EINVAL;

    if (usin->sin_family != AF_INET)
        return -EAFNOSUPPORT;

    nexthop = daddr = usin->sin_addr.s_addr;
    inet_opt = rcu_dereference_protected(inet->inet_opt,
                         lockdep_sock_is_held(sk));
    if (inet_opt && inet_opt->opt.srr) {
        if (!daddr)
            return -EINVAL;
        nexthop = inet_opt->opt.faddr;
    }

    /* Find a route to the destination. */
    orig_sport = inet->inet_sport;
    orig_dport = usin->sin_port;
    fl4 = &inet->cork.fl.u.ip4;
    rt = ip_route_connect(fl4, nexthop, inet->inet_saddr,
                  RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,
                  IPPROTO_TCP,
                  orig_sport, orig_dport, sk);
    if (IS_ERR(rt)) {
        err = PTR_ERR(rt);
        if (err == -ENETUNREACH)
            IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
        return err;
    }

    if (rt->rt_flags & (RTCF_MULTICAST | RTCF_BROADCAST)) {
        ip_rt_put(rt);
        return -ENETUNREACH;
    }

    if (!inet_opt || !inet_opt->opt.srr)
        daddr = fl4->daddr;

    if (!inet->inet_saddr)
        inet->inet_saddr = fl4->saddr;
    sk_rcv_saddr_set(sk, inet->inet_saddr);

    if (tp->rx_opt.ts_recent_stamp && inet->inet_daddr != daddr) {
        /* Reset inherited state */
        tp->rx_opt.ts_recent       = 0;
        tp->rx_opt.ts_recent_stamp = 0;
        if (likely(!tp->repair))
            tp->write_seq       = 0;
    }

    inet->inet_dport = usin->sin_port;
    sk_daddr_set(sk, daddr);

    inet_csk(sk)->icsk_ext_hdr_len = 0;
    if (inet_opt)
        inet_csk(sk)->icsk_ext_hdr_len = inet_opt->opt.optlen;

    tp->rx_opt.mss_clamp = TCP_MSS_DEFAULT;

    /* Enter SYN_SENT before the SYN is actually sent. */
    tcp_set_state(sk, TCP_SYN_SENT);
    err = inet_hash_connect(tcp_death_row, sk);
    if (err)
        goto failure;

    sk_set_txhash(sk);

    rt = ip_route_newports(fl4, rt, orig_sport, orig_dport,
                   inet->inet_sport, inet->inet_dport, sk);
    if (IS_ERR(rt)) {
        err = PTR_ERR(rt);
        rt = NULL;
        goto failure;
    }

    sk->sk_gso_type = SKB_GSO_TCPV4;
    sk_setup_caps(sk, &rt->dst);
    rt = NULL;

    /* Choose the initial sequence number if none has been set yet. */
    if (likely(!tp->repair)) {
        if (!tp->write_seq)
            tp->write_seq = secure_tcp_seq(inet->inet_saddr,
                               inet->inet_daddr,
                               inet->inet_sport,
                               usin->sin_port);
        tp->tsoffset = secure_tcp_ts_off(sock_net(sk),
                         inet->inet_saddr,
                         inet->inet_daddr);
    }

    inet->inet_id = tp->write_seq ^ jiffies;

    if (tcp_fastopen_defer_connect(sk, &err))
        return err;
    if (err)
        goto failure;

    /* Build and send the SYN segment. */
    err = tcp_connect(sk);

    if (err)
        goto failure;

    return 0;

failure:

    /* Undo the state change made above. */
    tcp_set_state(sk, TCP_CLOSE);
    ip_rt_put(rt);
    sk->sk_route_caps = 0;
    inet->inet_dport = 0;
    return err;
}

The key lines are:

tcp_set_state(sk, TCP_SYN_SENT);
err = tcp_connect(sk);
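Seen from the client, these two calls are the first transition of a small state machine: the socket moves from CLOSE to SYN_SENT when the SYN is about to go out, and the failure path of tcp_v4_connect moves it back to CLOSE. The sketch below models only that client-side path; the toy_ and EV_ names are hypothetical, and the transition to ESTABLISHED on receiving the SYN+ACK is taken from the TCP state diagram, not from the code shown here.

```c
enum toy_state { TOY_CLOSE, TOY_SYN_SENT, TOY_ESTABLISHED };
enum toy_event { EV_CONNECT, EV_RECV_SYNACK, EV_FAILURE };

/* Return the state that follows s when event e occurs.
 * Events that do not apply in a state leave it unchanged. */
enum toy_state toy_next_state(enum toy_state s, enum toy_event e)
{
    switch (s) {
    case TOY_CLOSE:
        return e == EV_CONNECT ? TOY_SYN_SENT : TOY_CLOSE;
    case TOY_SYN_SENT:
        if (e == EV_RECV_SYNACK)
            return TOY_ESTABLISHED;  /* second handshake segment arrived */
        if (e == EV_FAILURE)
            return TOY_CLOSE;        /* mirrors the failure: label */
        return TOY_SYN_SENT;
    case TOY_ESTABLISHED:
    default:
        return s;
    }
}
```

A successful connection therefore corresponds to the event sequence EV_CONNECT followed by EV_RECV_SYNACK.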

Let us continue the analysis.

Source code of tcp_set_state:

void tcp_set_state(struct sock *sk, int state)
{
    int oldstate = sk->sk_state;

    /* We defined a new enum for TCP states that are exported in BPF
     * so as not force the internal TCP states to be frozen. The
     * following checks will detect if an internal state value ever
     * differs from the BPF value. If this ever happens, then we will
     * need to remap the internal value to the BPF value before calling
     * tcp_call_bpf_2arg.
     */
    BUILD_BUG_ON((int)BPF_TCP_ESTABLISHED != (int)TCP_ESTABLISHED);
    BUILD_BUG_ON((int)BPF_TCP_SYN_SENT != (int)TCP_SYN_SENT);
    BUILD_BUG_ON((int)BPF_TCP_SYN_RECV != (int)TCP_SYN_RECV);
    BUILD_BUG_ON((int)BPF_TCP_FIN_WAIT1 != (int)TCP_FIN_WAIT1);
    BUILD_BUG_ON((int)BPF_TCP_FIN_WAIT2 != (int)TCP_FIN_WAIT2);
    BUILD_BUG_ON((int)BPF_TCP_TIME_WAIT != (int)TCP_TIME_WAIT);
    BUILD_BUG_ON((int)BPF_TCP_CLOSE != (int)TCP_CLOSE);
    BUILD_BUG_ON((int)BPF_TCP_CLOSE_WAIT != (int)TCP_CLOSE_WAIT);
    BUILD_BUG_ON((int)BPF_TCP_LAST_ACK != (int)TCP_LAST_ACK);
    BUILD_BUG_ON((int)BPF_TCP_LISTEN != (int)TCP_LISTEN);
    BUILD_BUG_ON((int)BPF_TCP_CLOSING != (int)TCP_CLOSING);
    BUILD_BUG_ON((int)BPF_TCP_NEW_SYN_RECV != (int)TCP_NEW_SYN_RECV);
    BUILD_BUG_ON((int)BPF_TCP_MAX_STATES != (int)TCP_MAX_STATES);

    if (BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk), BPF_SOCK_OPS_STATE_CB_FLAG))
        tcp_call_bpf_2arg(sk, BPF_SOCK_OPS_STATE_CB, oldstate, state);

    switch (state) {
    case TCP_ESTABLISHED:
        if (oldstate != TCP_ESTABLISHED)
            TCP_INC_STATS(sock_net(sk), TCP_MIB_CURRESTAB);
        break;

    case TCP_CLOSE:
        if (oldstate == TCP_CLOSE_WAIT || oldstate == TCP_ESTABLISHED)
            TCP_INC_STATS(sock_net(sk), TCP_MIB_ESTABRESETS);

        sk->sk_prot->unhash(sk);
        if (inet_csk(sk)->icsk_bind_hash &&
            !(sk->sk_userlocks & SOCK_BINDPORT_LOCK))
            inet_put_port(sk);
        /* fall through */
    default:
        if (oldstate == TCP_ESTABLISHED)
            TCP_DEC_STATS(sock_net(sk), TCP_MIB_CURRESTAB);
    }

    /* Change state AFTER socket is unhashed to avoid closed
     * socket sitting in hash tables.
     */
    inet_sk_state_store(sk, state);
}

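The first code comment means the following:

A separate enum was defined for the TCP states exported to BPF, so that the internal TCP states do not have to be frozen. The BUILD_BUG_ON checks detect whether an internal state value ever differs from the BPF value. If that happens, the internal value has to be remapped to the BPF value before tcp_call_bpf_2arg is called.

The job of tcp_connect is to construct a SYN segment and send it: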

int tcp_connect(struct sock *sk)
{
    struct tcp_sock *tp = tcp_sk(sk);
    struct sk_buff *buff;
    int err;

    tcp_call_bpf(sk, BPF_SOCK_OPS_TCP_CONNECT_CB, 0, NULL);

    if (inet_csk(sk)->icsk_af_ops->rebuild_header(sk))
        return -EHOSTUNREACH; /* Routing failure or similar. */

    tcp_connect_init(sk);

    if (unlikely(tp->repair)) {
        tcp_finish_connect(sk, NULL);
        return 0;
    }

    buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation, true);
    if (unlikely(!buff))
        return -ENOBUFS;

    tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
    tcp_mstamp_refresh(tp);
    tp->retrans_stamp = tcp_time_stamp(tp);
    tcp_connect_queue_skb(sk, buff);
    tcp_ecn_send_syn(sk, buff);
    tcp_rbtree_insert(&sk->tcp_rtx_queue, buff);

    /* Send off SYN; include data in Fast Open. */
    err = tp->fastopen_req ? tcp_send_syn_data(sk, buff) :
          tcp_transmit_skb(sk, buff, 1, sk->sk_allocation);
    if (err == -ECONNREFUSED)
        return err;

    /* We change tp->snd_nxt after the tcp_transmit_skb() call
     * in order to make this packet get counted in tcpOutSegs.
     */
    tp->snd_nxt = tp->write_seq;
    tp->pushed_seq = tp->write_seq;
    buff = tcp_send_head(sk);
    if (unlikely(buff)) {
        tp->snd_nxt    = TCP_SKB_CB(buff)->seq;
        tp->pushed_seq    = TCP_SKB_CB(buff)->seq;
    }
    TCP_INC_STATS(sock_net(sk), TCP_MIB_ACTIVEOPENS);

    /* Timer for repeating the SYN until an answer. */
    inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
                  inet_csk(sk)->icsk_rto, TCP_RTO_MAX);
    return 0;
}

Let us examine this function.
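One detail in tcp_connect ties back to the X+1 seen in the capture: the SYN is built with tp->write_seq++, so the SYN itself occupies one sequence number, and snd_nxt is then set to the incremented value. The following user-space sketch isolates just that bookkeeping; struct toy_tcp and toy_send_syn are hypothetical names, and everything else the real function does (allocation, queuing, timers) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_tcp {
    uint32_t write_seq;  /* sequence number for the next byte queued to send */
    uint32_t snd_nxt;    /* next sequence number to be sent */
};

/* Model the sequence-number bookkeeping of sending a SYN.
 * Returns the sequence number carried by the SYN (X). */
uint32_t toy_send_syn(struct toy_tcp *tp)
{
    assert(tp != NULL);

    /* The SYN uses the current value and advances write_seq by one. */
    uint32_t syn_seq = tp->write_seq++;

    /* After transmission, the next sequence number to send is X + 1. */
    tp->snd_nxt = tp->write_seq;
    return syn_seq;
}
```

Starting from write_seq = X, the SYN carries X and the client's next segment carries X+1, which is exactly the Sequence Number of the third handshake segment and the Acknowledgment Number the server returns.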


Origin www.cnblogs.com/Miliapus/p/12104428.html