Introduction and principle analysis of TCP/IP network protocol

1. Application layer protocol

At the application layer, the protocol is defined by the developers themselves, who encode and parse the data according to their own format specification.

In principle, designing such a protocol comes down to two points:

① Determine the content exchanged between the client and the server (what the protocol carries)

② Determine how the transmitted data is organized (the format)
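
As a minimal sketch of these two decisions, assume a hypothetical chat protocol whose content is a user name plus a text message, organized as a 4-byte big-endian length prefix followed by a UTF-8 body of the form `name|text`. The format and the function names are our own illustration, not part of any standard.

```python
import struct

def encode_message(name: str, text: str) -> bytes:
    """Organize the agreed content as: 4-byte length prefix + UTF-8 body."""
    body = f"{name}|{text}".encode("utf-8")
    return struct.pack("!I", len(body)) + body

def decode_message(data: bytes) -> tuple[str, str]:
    """Parse one message produced by encode_message."""
    (length,) = struct.unpack("!I", data[:4])
    body = data[4:4 + length].decode("utf-8")
    name, text = body.split("|", 1)  # split only on the first separator
    return name, text

packet = encode_message("alice", "hello")
print(decode_message(packet))
```

Both sides must agree on the same two functions; changing either the content or the layout on one side only breaks the protocol.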

We use some current mainstream application layer protocols and data formats as examples:

①HTTP protocol

Take Baidu's website as an example: the address begins with the fixed scheme prefix (http, or https as here) in front of the domain name:

https://www.baidu.com/?tn=44004473_52_oem_dg

②XML

XML is highly readable, but every piece of content must be wrapped in opening and closing tags. During network transmission these tags make the data relatively redundant, so transmission is less efficient.

③JSON

Compared with XML, JSON trims this markup overhead considerably. It is still highly readable and easy to extend, and because the surrounding markup is much lighter, the same data takes fewer bytes and transmission efficiency is higher.

④Other protocols

Google: protobuf protocol

IBM: MQTT protocol (MQ Telemetry Transport, often expanded as Message Queuing Telemetry Transport), a protocol designed for the Internet of Things. Messages are transmitted in a binary byte format in which individual bytes carry specific meanings, and redundant characters are omitted. This greatly reduces the length of each message and therefore the bandwidth consumed in network transmission, but encoding and parsing are relatively complicated.
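
To make the trade-off between these formats concrete, here is a rough sketch that encodes the same record as XML, as JSON, and as a fixed binary layout. The record and the binary layout (`!BHf`) are hypothetical; the layout is not the MQTT or protobuf wire format, it only illustrates the idea that position in the byte sequence, rather than a textual label, gives each value its meaning.

```python
import json
import struct
import xml.etree.ElementTree as ET

# Hypothetical sensor reading used only for this comparison.
reading = {"type": 1, "sensor": 42, "value": 21.5}

# XML: every field is wrapped in an opening and a closing tag.
root = ET.Element("reading")
for key, value in reading.items():
    ET.SubElement(root, key).text = str(value)
xml_bytes = ET.tostring(root, encoding="unicode").encode("utf-8")

# JSON: each field name appears once, with light punctuation around it.
json_bytes = json.dumps(reading, separators=(",", ":")).encode("utf-8")

# Binary: 1-byte type, 2-byte sensor id, 4-byte float, in network byte order.
BINARY_FORMAT = "!BHf"
binary_bytes = struct.pack(
    BINARY_FORMAT, reading["type"], reading["sensor"], reading["value"]
)

print(len(xml_bytes), len(json_bytes), len(binary_bytes))
print(struct.unpack(BINARY_FORMAT, binary_bytes))
```

The binary form is the shortest, but it cannot be read without knowing the layout, which is the extra encoding and parsing cost mentioned above.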

Transport layer protocols are implemented by the operating system, which provides an API (the socket interface) to the application layer above.

2. UDP protocol

We describe UDP through several of its characteristics:

①Connectionless

With UDP, data can be sent as soon as the destination IP address and port are known. This is similar to sending a text message: we do not need to set up a connection with the other party in advance.

②Unreliable transmission

Again as with text messages, the sender is only responsible for sending. UDP has no acknowledgement or retransmission mechanism, so the sender does not learn whether the datagram was received. If a datagram is lost or discarded because of an error, the UDP protocol layer does not return any error information to the application layer.

③Data transmission in the form of datagrams

UDP sends each message exactly as the application layer hands it over, as one datagram of that size. It neither splits a message into several pieces nor merges several messages into one.

④Limited size

The length field in the UDP header is 16 bits wide, so a single UDP datagram can be at most 64 KB (65,535 bytes), including the UDP header

⑤Buffers

UDP has no real send buffer, only a receive buffer

The sent data is handed directly to the kernel, which passes it to the network layer for the next step of transmission.

The receive buffer has a fixed size. Once it is full, newly arriving datagrams are dropped. The buffer also cannot guarantee that datagrams are received in the same order in which they were sent.

A UDP socket can both send and receive data. This is called full duplex.
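
The characteristics above can be observed with two UDP sockets on the loopback interface. This is only a sketch: the variable names are ours, and on a real network the datagrams could be lost or reordered, which loopback rarely shows.

```python
import socket

# Two UDP sockets on loopback; port 0 lets the OS pick free ports.
a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
a.bind(("127.0.0.1", 0))
b.bind(("127.0.0.1", 0))
a.settimeout(2.0)
b.settimeout(2.0)

# Connectionless: a simply addresses each datagram to b's IP and port.
a.sendto(b"first", b.getsockname())
a.sendto(b"second", b.getsockname())

# Datagram-oriented: each recvfrom returns one whole datagram,
# never a merged b"firstsecond".
msg1, sender = b.recvfrom(65535)
msg2, _ = b.recvfrom(65535)

# Full duplex: the same socket b can also send back to a.
b.sendto(b"reply", sender)
reply, _ = a.recvfrom(65535)

a.close()
b.close()
print(msg1, msg2, reply)
```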

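Format of UDP protocol:

The UDP header holds four 16-bit fields: source port, destination port, UDP length, and UDP checksum. The largest length a 16-bit field can express is 65,535 bytes.

So UDP transmits at most 64 KB of data each time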
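
A short sketch of the header layout and the resulting size limit, using Python's `struct` module. The helper name `build_udp_header` is our own, and the checksum field is simply left at 0 here.

```python
import struct

UDP_HEADER_FORMAT = "!HHHH"  # source port, destination port, length, checksum
UDP_HEADER_SIZE = struct.calcsize(UDP_HEADER_FORMAT)  # four 16-bit fields
MAX_PAYLOAD = 0xFFFF - UDP_HEADER_SIZE  # largest payload the length field allows

def build_udp_header(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Pack a UDP header for the payload; the checksum is left as 0 in this sketch."""
    length = UDP_HEADER_SIZE + len(payload)  # the length field counts the header too
    if length > 0xFFFF:
        raise ValueError("datagram does not fit in the 16-bit length field")
    return struct.pack(UDP_HEADER_FORMAT, src_port, dst_port, length, 0)

header = build_udp_header(5000, 53, b"hello")
print(struct.unpack(UDP_HEADER_FORMAT, header))
print(UDP_HEADER_SIZE, MAX_PAYLOAD)
```

In practice the usable payload is smaller still, because the datagram also has to fit inside an IP packet.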

A note on the 16-bit UDP checksum:

The sender treats the data as a sequence of 16-bit words, adds them up in a fixed way, and stores the result in the checksum field. The receiver repeats the same computation. If the two checksums match, the data is considered correct. If they differ, the data was corrupted during transmission. Note that this is the Internet checksum (a one's-complement sum), which is simpler than a CRC (cyclic redundancy check), and it only detects accidental errors, not deliberate tampering.
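
A minimal sketch of this add-and-compare idea, using the 16-bit one's-complement sum described in RFC 1071 (the Internet checksum that UDP uses). The real UDP checksum also covers a pseudo-header containing the IP addresses; this sketch sums only the bytes it is given, and the function name is our own.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)
print(hex(checksum))  # 0x220d

# Receiver side: summing the data together with the checksum yields zero.
print(internet_checksum(payload + checksum.to_bytes(2, "big")))  # 0
```

If any bit of the payload changes in transit, the receiver's result is usually no longer zero, and the datagram is discarded.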

3. TCP protocol

3.1 Reliable transmission of TCP

①Acknowledgement

Consider this scenario: when we chat, we rely on the other party's reply to know that they received our message.

In a strict question-and-answer exchange this works without problems. But if the network reorders messages, and we send two messages and receive two replies, how do we tell which reply answers which message?

There is a fairly simple solution: add sequence numbers to both requests and responses.

So how does TCP ensure that large amounts of data are transmitted correctly?

TCP transmits data as a byte stream: all data is converted into bytes for transmission, and TCP numbers every byte so that multiple segments can be put back together correctly. The details are as follows:

For example, in the first transmission the sender sends the bytes numbered 1-1000, recording the number of the first byte in the 32-bit sequence number field. The receiver then replies with acknowledgement number 1001 (with the ACK flag set to 1), which tells the sender to continue from byte 1001. This repeats until the communication ends.

These numbers are carried in the 32-bit sequence number and 32-bit acknowledgement number fields of the TCP header.
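
The numbering in this example can be sketched in a few lines. The function name `acknowledge` and the fixed 1000-byte segments are assumptions made for illustration.

```python
def acknowledge(seq: int, payload_len: int) -> int:
    """Acknowledgement number: the number of the next byte the receiver expects."""
    return seq + payload_len

seq = 1
for _ in range(3):
    ack = acknowledge(seq, 1000)
    print(f"sent bytes {seq}-{seq + 999}, received ack {ack}")
    seq = ack  # the next segment starts at the acknowledged byte
```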

② Timeout retransmission

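Data passes through many network devices on its way across the network, and each device has limited storage capacity. Once a device reaches that limit, it discards newly received data. When the expected data or acknowledgement does not arrive within a certain time, we call it a transmission timeout.

A transmission timeout mainly arises in two cases: the data itself is lost on the way to the receiver, or the data arrives but the acknowledgement is lost on the way back.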

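TCP's timeout retransmission mechanism exists to solve this data loss problem: if the sender gets no response within the timeout after sending data, it sends the data again. But if the loss actually happened to the response, the receiver already has the data. Does it then receive the same data twice?

It does not. The receiver keeps a receive buffer (we can think of it as a blocking queue). Before data is placed in the buffer, the receiver checks by sequence number whether it already holds that data; if so, the new copy is discarded, and otherwise it is added. This guarantees that the data handed to the application through the socket API contains no duplicates.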
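
A toy receiver shows how sequence numbers make a retransmission harmless. The class below is a simplified sketch of our own: it accepts only the next expected segment and ignores everything else, whereas a real TCP stack also buffers out-of-order data.

```python
class Receiver:
    """Toy receiver that accepts only the next expected segment."""

    def __init__(self) -> None:
        self.expected_seq = 1          # next byte number we are waiting for
        self.delivered = bytearray()   # data handed to the application

    def on_segment(self, seq: int, payload: bytes) -> int:
        if seq == self.expected_seq:
            self.delivered += payload
            self.expected_seq += len(payload)
        # Any other segment (such as a retransmitted duplicate) is ignored.
        return self.expected_seq       # acknowledgement number sent back

receiver = Receiver()
receiver.on_segment(1, b"abc")
ack = receiver.on_segment(1, b"abc")   # retransmission after a lost ACK
print(ack, bytes(receiver.delivered))
```

The duplicate is acknowledged again, so the sender stops retransmitting, but the application still sees the data only once.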

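So how should the timeout be determined? TCP does not use one fixed value: it estimates the timeout from measured round-trip times and typically doubles it after each consecutive retransmission.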
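
One widely used way to pick the timeout is the estimator specified in RFC 6298, which smooths measured round-trip times and backs off exponentially after a timeout. The sketch below follows those formulas but leaves out the clock-granularity term and the upper bound, and the class name is our own; real operating systems differ in the details.

```python
class RtoEstimator:
    """Retransmission timeout (RTO) estimate based on the RFC 6298 formulas."""

    ALPHA = 1 / 8    # weight of a new sample in the smoothed RTT
    BETA = 1 / 4     # weight of a new sample in the RTT variation
    MIN_RTO = 1.0    # seconds; lower bound recommended by the RFC

    def __init__(self) -> None:
        self.srtt = None     # smoothed round-trip time
        self.rttvar = None   # round-trip time variation
        self.rto = 1.0       # initial timeout before any measurement

    def on_rtt_sample(self, rtt: float) -> float:
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = max(self.MIN_RTO, self.srtt + 4 * self.rttvar)
        return self.rto

    def on_timeout(self) -> float:
        self.rto *= 2  # exponential backoff after each retransmission
        return self.rto

estimator = RtoEstimator()
print(estimator.on_rtt_sample(0.5))  # first measurement
print(estimator.on_timeout())        # timeout doubles after a retransmission
```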

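③Connection management

Purpose: when the two parties first establish a connection, confirm that both are able to send and receive data. In other words, connection management ensures that the subsequent data transmission is stable and effective.

①Three-way handshake

The three-way handshake can be understood as follows: the sender sends a request to the receiver; the receiver acknowledges it and, at the same time, sends its own request to the sender; finally, the sender acknowledges the receiver's request. (The receiver's acknowledgement and its own request are combined into a single segment, which is why there are three steps rather than four.)

The purpose of the three-way handshake is to check that the network can carry data reliably in both directions and to let the two sides agree on the information they need for the connection. If the handshake does not complete, the connection is not established and has to be attempted again.

In the first segment, the sender sets the SYN flag to 1. In its reply, the receiver sets ACK to 1 to acknowledge the request and also sets SYN to 1, because it is making its own synchronization request (SYN+ACK). The sender's final segment has ACK set to 1.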

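State changes during the three-way handshake, in summary: the client goes from CLOSED to SYN_SENT to ESTABLISHED, and the server goes from LISTEN to SYN_RCVD to ESTABLISHED.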
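
The state changes of the handshake can be written down as two small transition tables, one per side. The state names are the standard TCP ones; the event strings and the `run` helper are our own labels for this sketch.

```python
# (current state, event) -> next state, for connection establishment only
CLIENT = {
    ("CLOSED", "send SYN"): "SYN_SENT",
    ("SYN_SENT", "recv SYN+ACK, send ACK"): "ESTABLISHED",
}
SERVER = {
    ("LISTEN", "recv SYN, send SYN+ACK"): "SYN_RCVD",
    ("SYN_RCVD", "recv ACK"): "ESTABLISHED",
}

def run(table: dict, state: str, events: list[str]) -> list[str]:
    """Return every state visited while applying the events in order."""
    visited = [state]
    for event in events:
        state = table[(state, event)]
        visited.append(state)
    return visited

print(run(CLIENT, "CLOSED", ["send SYN", "recv SYN+ACK, send ACK"]))
print(run(SERVER, "LISTEN", ["recv SYN, send SYN+ACK", "recv ACK"]))
```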

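②Four-way wave (connection termination)

It ensures that the sender and the receiver disconnect cleanly.

The sender initiates the disconnection with a FIN and the receiver acknowledges it; the receiver then sends its own FIN, and finally the sender replies with an ACK.

State analysis of the four-way wave on the server side: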

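CLOSE_WAIT: the state a side is in after the first two of the four steps (it has received the peer's FIN and acknowledged it). In this state the connection is waiting for the code to call socket.close() so that the remaining steps can proceed. Under normal circumstances a server should not hold a large number of connections in CLOSE_WAIT. If it does, the code most likely has a bug and close() is not being called.

TIME_WAIT: whichever side sends the first FIN enters TIME_WAIT. Its effect is to give the final ACK a chance to be retransmitted. Suppose A closes first. On the surface, A has nothing left to do after sending the last ACK and could destroy the connection and release its resources. Instead, A stays in TIME_WAIT for a while and only then releases them. The reason is that the last ACK may be lost; if it is, B retransmits its FIN, and A must still be around to answer with another ACK.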

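How long should TIME_WAIT last? 2*MSL

Why?

TIME_WAIT guards against loss of the last ACK. MSL (maximum segment lifetime) is the longest time a segment can survive in the network, so the ACK takes at most one MSL to reach the other side. If the ACK does not arrive in that time, the other side retransmits its FIN, which again takes at most one MSL to travel back. To cover both trips, TIME_WAIT should therefore last 2*MSL.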
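
The states on both sides of the four-way wave, including the 2*MSL wait, can be sketched as two lists of transitions. The state names are the standard TCP ones; the MSL value of 30 seconds is only an assumption for the example, since the real value depends on the implementation.

```python
MSL_SECONDS = 30                       # assumed value for illustration
TIME_WAIT_SECONDS = 2 * MSL_SECONDS    # one MSL for the ACK, one for a resent FIN

# (event, resulting state) for the side that sends the first FIN
ACTIVE_CLOSER = [
    ("send FIN", "FIN_WAIT_1"),
    ("recv ACK", "FIN_WAIT_2"),
    ("recv FIN, send ACK", "TIME_WAIT"),
    (f"wait {TIME_WAIT_SECONDS}s (2*MSL)", "CLOSED"),
]
# (event, resulting state) for the side that receives the first FIN
PASSIVE_CLOSER = [
    ("recv FIN, send ACK", "CLOSE_WAIT"),
    ("application calls close(), send FIN", "LAST_ACK"),
    ("recv ACK", "CLOSED"),
]

def states(transitions, start="ESTABLISHED"):
    """List the states a side passes through, beginning with the start state."""
    return [start] + [state for _event, state in transitions]

print(states(ACTIVE_CLOSER))
print(states(PASSIVE_CLOSER))
```

A connection stuck in CLOSE_WAIT is one where the second transition of the passive side never happens, because the application has not called close().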

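4. Refinements to the TCP protocol

①Sliding window

In the simplest picture of data transmission, the requester sends one piece of data and waits for the acknowledgement before sending the next, and during that wait nothing else can be sent. In a real network, however, data is sent and acknowledged in batches.

The sliding window was introduced to improve transmission efficiency by allowing requests and acknowledgements to be batched.

With a sliding window we can send many groups of data at once without waiting, while recording the highest sequence number sent so far. When an ACK arrives, transmission continues with the next group after that highest number.

Typically the data that has been sent is kept in a data structure. When the ACK for a group arrives, that group is removed from the structure and a new group to send is added. The effect resembles a window sliding along the data, hence the name sliding window.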
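
A minimal sketch of such a data structure, using a queue of in-flight segment numbers. For simplicity it numbers whole segments rather than bytes, and the class and method names are our own.

```python
from collections import deque

class SendWindow:
    """Keeps at most `size` unacknowledged segments in flight."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.in_flight = deque()   # segment numbers awaiting an ACK
        self.next_seq = 1          # number of the next new segment

    def fill(self) -> list:
        """Send new segments until the window is full; return what was sent."""
        sent = []
        while len(self.in_flight) < self.size:
            self.in_flight.append(self.next_seq)
            sent.append(self.next_seq)
            self.next_seq += 1
        return sent

    def on_ack(self, acked_up_to: int) -> None:
        """Drop every segment numbered <= acked_up_to, freeing window slots."""
        while self.in_flight and self.in_flight[0] <= acked_up_to:
            self.in_flight.popleft()

window = SendWindow(size=4)
print(window.fill())   # the first four segments go out together
window.on_ack(2)       # segments 1 and 2 are acknowledged
print(window.fill())   # the window slides forward by two segments
```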

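The sliding window has to handle two kinds of anomalies:

  1. ACK loss: the data reaches the server and the server sends its acknowledgement, but the ACK is lost in transit. How is this handled?

The answer lies in the acknowledgement rule: the receiver never acknowledges later data until all earlier data has been received. An ACK for later data therefore implies that the earlier data also arrived, so even if an earlier ACK is lost, a later ACK covers it and the anomaly disappears.

  2. Loss of the sender's data

For example, suppose bytes 1-1000 are received but bytes 1001-2000 are lost in transit. No matter how much later data arrives, every ACK the receiver returns carries acknowledgement number 1001 (asking for the data starting at byte 1001), while the later data is held in the receive buffer. After receiving three duplicate ACKs, the sender retransmits the missing data, and the receiver then assembles it with the buffered data in the normal order.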

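②Flow control: the sliding window improves transmission efficiency, but can the sender really choose the window size without any limit?

Flow control limits the size of the sender's sliding window. The receiver uses the ACKs it returns to tell the sender how large the window may be, which holds the sender back when the receiver cannot keep up.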
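
A small sketch of the idea, assuming the receiver advertises the free space left in its receive buffer with every ACK. The class, the function, and the buffer size of 4000 bytes are hypothetical.

```python
class ReceiveBuffer:
    """Tracks how much room is left, which is the window the receiver advertises."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0

    def on_data(self, nbytes: int) -> int:
        """Store arriving bytes and return the window to advertise in the ACK."""
        self.used += nbytes
        return self.capacity - self.used

    def on_app_read(self, nbytes: int) -> None:
        """The application consumes data, which frees buffer space."""
        self.used -= min(nbytes, self.used)

def sendable(window: int, pending: int) -> int:
    """The sender may transmit no more than the advertised window."""
    return min(window, pending)

buf = ReceiveBuffer(capacity=4000)
window = buf.on_data(3000)                    # only 1000 bytes of room remain
print(window, sendable(window, pending=5000))
buf.on_app_read(3000)                         # the application drains the buffer
print(buf.on_data(500))                       # the advertised window grows again
```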

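③Congestion control

The size of the send window is controlled according to how congested the network is.

The steps of congestion control are as follows:

  1. At the start, the window is set very small, for example 1. If the receiver's ACK comes back, transmission is working, and the window is enlarged for subsequent rounds

  2. Before the threshold is reached, the window grows exponentially

  3. After the threshold is reached, the window grows by 1 each time (linear growth)

  4. If, as the window keeps growing, packet loss becomes severe, the network is congested. The window is reset to 1 and the threshold is changed to half of the window size at which congestion occurred

  5. Steps 1 to 4 are repeated continuously

  6. Throughout the communication, both the sliding window size and the congestion window size change dynamically.

In summary, flow control governs the size of the sliding window, and so does congestion control. Which one determines the final window size?

The smaller of the two values is used.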
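
The steps above can be simulated round by round. This is a simplified sketch: the window is counted in segments, the initial threshold of 16, the loss in round 7, and the receiver window of 12 are all made-up values, and real TCP implementations use more refined rules.

```python
def next_cwnd(cwnd: int, ssthresh: int, loss: bool) -> tuple[int, int]:
    """Return (cwnd, ssthresh) after one round trip, in units of segments."""
    if loss:
        return 1, max(cwnd // 2, 2)                # restart from 1, halve the threshold
    if cwnd < ssthresh:
        return min(cwnd * 2, ssthresh), ssthresh   # exponential growth below the threshold
    return cwnd + 1, ssthresh                      # linear growth above it

cwnd, ssthresh = 1, 16
history = []
for round_trip in range(10):
    history.append(cwnd)
    cwnd, ssthresh = next_cwnd(cwnd, ssthresh, loss=(round_trip == 7))

rwnd = 12  # hypothetical window advertised by the receiver (flow control)
effective = [min(c, rwnd) for c in history]  # the smaller of the two wins
print(history)
print(effective)
```

The second list shows the last point of this section: whatever the congestion window allows, the sender never exceeds the window advertised by the receiver.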

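④Delayed acknowledgement

The receiver does not acknowledge the sender's data one piece at a time. It usually follows two strategies: ① acknowledge every certain number of segments, for example one ACK for every two groups of data

② acknowledge once every certain interval of time

Under these strategies, if 3 groups of data are sent, the first two are acknowledged together by count and the third is acknowledged when the timer expires.

The exact count and timeout differ between systems. Typically N (the number of groups per ACK) is 2 and the timeout is 200 ms.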
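
The two strategies combine into a simple rule: acknowledge when N segments are pending or when the timer expires, whichever comes first. The function below is a sketch of that rule using the typical values mentioned above (N = 2, 200 ms); times are in milliseconds and the function name is our own.

```python
ACK_EVERY_N = 2       # acknowledge every second segment
ACK_DELAY_MS = 200    # or 200 ms after the first unacknowledged segment

def delayed_ack_times(arrivals_ms: list[int]) -> list[int]:
    """Given sorted segment arrival times, return the times at which ACKs are sent."""
    acks = []
    pending = 0        # segments received but not yet acknowledged
    deadline = None    # time at which the delay timer would fire
    for t in arrivals_ms:
        if pending and t >= deadline:
            acks.append(deadline)   # the timer fired before this segment arrived
            pending = 0
        if pending == 0:
            deadline = t + ACK_DELAY_MS
        pending += 1
        if pending == ACK_EVERY_N:
            acks.append(t)          # count reached: acknowledge immediately
            pending = 0
    if pending:
        acks.append(deadline)       # the last segment is covered by the timer
    return acks

# Three segments arriving 10 ms apart: one ACK by count, one by timer.
print(delayed_ack_times([0, 10, 20]))
```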

⑤Piggybacked acknowledgement

⑥Byte-stream oriented

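⑦TCP exception cases:

  1. Program crash: the operating system reclaims the process's resources, including its file descriptor table. This is essentially equivalent to calling the socket's close() method, which triggers a FIN and starts the four-way wave. It is no different from an ordinary four-way wave.

  2. Normal shutdown: the system forcibly closes all processes and releases their resources. The flow is similar to a program crash.

  3. Host power failure: there are two cases:

①The receiver loses power

The sender does not know that the receiver has lost power, so it keeps sending data and waiting for ACKs. No ACK arrives, which triggers timeout retransmission. After several retransmissions still without an ACK, the sender tries to reset the connection (RST flag); the reset also fails, and the sender then drops the connection.

②The sender loses power

In a typical long-lived connection, the client and server maintain heartbeat packets (for example, the client sends data to the server every second to show it is still alive). If the server has received nothing from the client for 10 s, it considers the client dead and closes the connection. Once the client recovers, it simply reconnects.

  4. Network disconnection

Similar to host power failure, except that the host itself keeps running normally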
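
The heartbeat check described for the power-failure case can be sketched as a small monitor on the server side. It uses the 1-second interval and 10-second timeout from the example above; the class name is our own, and the current time is passed in explicitly so the logic can be followed without real waiting.

```python
HEARTBEAT_INTERVAL = 1    # seconds between client heartbeats
HEARTBEAT_TIMEOUT = 10    # seconds of silence before the client is considered dead

class HeartbeatMonitor:
    """Server-side record of when the client was last heard from."""

    def __init__(self, now: float) -> None:
        self.last_seen = now

    def on_heartbeat(self, now: float) -> None:
        self.last_seen = now

    def is_alive(self, now: float) -> bool:
        return now - self.last_seen < HEARTBEAT_TIMEOUT

monitor = HeartbeatMonitor(now=0)
for t in range(HEARTBEAT_INTERVAL, 6, HEARTBEAT_INTERVAL):
    monitor.on_heartbeat(now=t)     # heartbeats arrive at t = 1..5, then stop

print(monitor.is_alive(now=14))     # 9 s of silence: still considered alive
print(monitor.is_alive(now=15))     # 10 s of silence: close the connection
```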

Origin blog.csdn.net/m0_65431718/article/details/129165061