Computer network【1】TCP/IP

OSI seven-layer model and TCP/IP four-layer model

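First, a brief word on the OSI reference model. OSI divides the network into seven layers; from bottom to top they are the physical, data link, network, transport, session, presentation, and application layers. The TCP/IP architecture divides the network into four layers; from bottom to top they are the network interface, network, transport, and application layers.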

insert image description here
Looking at the picture below, the sender wants to send data to the receiver. First, the application layer prepares the data to be sent and hands it to the transport layer, whose main function is to provide a reliable connection service between the sender and the receiver. The transport layer processes the data and passes it to the network layer. One of the core functions of the network layer is path selection (routing): there are many paths from the sender to the receiver, and the network layer decides which router the data should go to next. Once the path is selected, the data reaches the data link layer, which is responsible for moving data from one router to the next. Finally there is the physical layer, which can be simply understood as the most basic equipment, such as network cables.
insert image description here
TCP/IP provides an end-to-end connection mechanism and standardizes how data should be encapsulated, addressed, transmitted, routed, and received at the destination. It abstracts the communication process into four layers, organized as a protocol stack, with different protocols implemented at each layer. The protocols in the family are assigned to these four layers according to their function, and the result is often regarded as a simplified version of the seven-layer OSI model.

TCP/IP protocol family

The network model in use today is the TCP/IP model, which simplifies the OSI model to four layers. From top to bottom they are the application layer, transport layer, network layer, and link layer (network interface layer). Each layer contains several protocols.

A protocol is an agreement or contract used in network communication. Both parties must abide by it in order to send and receive data correctly, and both must use the same protocol. There are many protocols, such as TCP, UDP, and IP. A protocol is a specification drawn up by a standards organization that fixes many details, for example how to establish a connection and how the two sides identify each other.
A protocol is only a specification; it has to be implemented by software. For example, the IP protocol stipulates how to find the target computer, so every developer must follow it when writing software rather than inventing a scheme of their own.
The TCP/IP model includes hundreds of interrelated protocols such as TCP, IP, UDP, Telnet, FTP, and SMTP. Among them, TCP and IP are the two most commonly used underlying protocols, so the whole set is collectively referred to as the "TCP/IP protocol family".

In other words, the protocols involved in the "TCP/IP model" are called the "TCP/IP protocol family". You may distinguish the two concepts or treat them as equivalent.

TCP

Baidu Encyclopedia definition: **Transmission Control Protocol (TCP) is a connection-oriented, reliable, byte-stream-based transport layer communication protocol.** A connection must be established before data is transmitted and closed after the transmission is complete.

TCP datagram structure and three-way handshake

http://c.biancheng.net/view/2351.html

The client should use the connect() function to establish a connection with the server before sending and receiving data. The purpose of establishing a connection is to ensure that the IP address, port, and physical link are correct, and to open up a channel for data transmission.

When TCP establishes a connection, it needs to transmit three data packets, commonly known as three-way handshake (Three-way Handshaking). It can be vividly compared to the following dialogue:
[Shake 1] Socket A: "Hello, Socket B, I have data to send to you here, let's establish a connection."
[Shake 2] Socket B: "Okay, my side is ready."
[Shake 3] Socket A: "Thank you for accepting my request."
TCP Datagram Structure
Let's first look at the structure of a TCP datagram:

insert image description here

Several shaded fields need to be highlighted:

  1. Sequence number: Seq (Sequence Number) occupies 32 bits and identifies the sequence number of the data packet sent from computer A to computer B. The sender sets it when it sends the data.

  2. Acknowledgment number: Ack (Acknowledge Number) occupies 32 bits. Both the client and the server can send it. During the handshake, Ack = Seq + 1, where Seq is the sequence number of the packet being acknowledged.

  3. Flag bits: each flag occupies 1 bit, and there are 6 flags in total: URG, ACK, PSH, RST, SYN, and FIN. Their meanings are as follows:
    URG: The urgent pointer is valid.
    ACK: The acknowledgment number is valid.
    PSH: The receiver should deliver this message to the application layer as soon as possible.
    RST: Reset the connection.
    SYN: Establish a new connection.
    FIN: Close a connection.
    Summary of English abbreviations: Seq is short for Sequence; Ack (ACK) is short for Acknowledge, meaning confirmation; SYN is short for Synchronous (synchronize), which here means establishing a synchronized connection; FIN is short for Finish, meaning done.
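To make the field layout concrete, here is a minimal sketch that packs and unpacks the fixed 20-byte part of a TCP header with Python's `struct` module. The helper names `build_tcp_header` and `parse_tcp_header` are made up for illustration, the checksum is left at 0, and header options are ignored, so this is not something you could put on the wire as-is.

```python
import struct

# Bit masks for the six flags in the 16-bit "data offset / reserved / flags" field.
FLAG_BITS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}

# Source port, destination port, Seq, Ack, offset+flags, window, checksum, urgent pointer.
# "!" selects network byte order (big-endian); the total size is 20 bytes.
HEADER_FORMAT = "!HHIIHHHH"


def build_tcp_header(src_port, dst_port, seq, ack, flags):
    """Pack a header without options (illustrative helper, checksum not computed)."""
    flag_value = 0
    for name in flags:
        flag_value |= FLAG_BITS[name]
    offset_flags = (5 << 12) | flag_value  # data offset = 5 words = 20 bytes
    return struct.pack(HEADER_FORMAT, src_port, dst_port, seq, ack,
                       offset_flags, 65535, 0, 0)


def parse_tcp_header(header):
    """Decode Seq, Ack and the flag names from the first 20 bytes of a header."""
    (src_port, dst_port, seq, ack,
     offset_flags, _window, _checksum, _urgent) = struct.unpack(HEADER_FORMAT, header[:20])
    flags = [name for name, bit in FLAG_BITS.items() if offset_flags & bit]
    return {"src_port": src_port, "dst_port": dst_port,
            "seq": seq, "ack": ack, "flags": flags}


if __name__ == "__main__":
    raw = build_tcp_header(50000, 80, seq=2000, ack=1001, flags=["SYN", "ACK"])
    print(parse_tcp_header(raw))
```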

Connection establishment (three-way handshake)

When using connect() to establish a connection, the client and the server will send three data packets to each other, see the figure below:
insert image description here

After the client calls the socket() function to create a socket, the socket is in the CLOSED state because no connection is established; after the server calls the listen() function, the socket enters the LISTEN state and begins to listen to client requests.

At this time, the client starts to initiate a request:

  1. When the client calls the connect() function, the TCP protocol builds a data packet and sets the SYN flag, indicating that the packet is used to establish a synchronized connection. At the same time a random number, say 1000, is generated and written to the "Sequence Number (Seq)" field as the sequence number of the packet. The client then sends the packet to the server and enters the SYN_SENT state.

  2. When the server receives the data packet and detects that the SYN flag has been set, it knows that this is the "request packet" sent by the client to establish a connection. The server will also build a data packet and set the SYN and ACK flags. SYN means that the data packet is used to establish a connection, and ACK is used to confirm receipt of the data packet sent by the client just now.

The server generates a random number 2000 and fills the "Seq" field. 2000 has nothing to do with client packets.

The server adds 1 to the sequence number of the client packet (1000) to get 1001, and fills the "Acknowledgment Number (Ack)" field with this number.

The server sends out the data packet and enters the SYN_RECV state.

  3. When the client receives the data packet and detects that the SYN and ACK flags are set, it knows that this is the "acknowledgment packet" sent by the server. The client checks the "Ack" field to see whether its value is 1000+1; if it is, the connection has been established successfully from the client's point of view.

Next, the client builds another data packet and sets the ACK flag, indicating that it has correctly received the "acknowledgment packet" sent by the server. It adds 1 to the sequence number (2000) of the packet the server just sent to get 2001, and writes this number to the "Acknowledgment Number (Ack)" field.

The client sends the data packet and enters the ESTABLISHED state, indicating that the connection has been successfully established.

  4. When the server receives the data packet and detects that the ACK flag is set, it knows that this is the "acknowledgment packet" sent by the client. The server checks the "Ack" field to see whether its value is 2000+1. If it is, the connection is successfully established and the server enters the ESTABLISHED state.

At this point both the client and the server have entered the ESTABLISHED state, the connection is established, and data can be sent and received.
A final note
The key to the three-way handshake is to confirm that the other party has received your data packet, and this is achieved through the "Ack" field. Each computer records the sequence number Seq of the packet it sent. When it receives a packet from the other party, it checks whether Ack = Seq + 1 holds. If it does, the other party has received its data packet correctly.
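The numbering in the handshake can be traced with a small simulation. This is only a sketch of the bookkeeping: it builds dictionaries instead of real packets, and the function name `three_way_handshake` and the dictionary keys are invented for the example. The Seq of the third packet (client Seq + 1) is standard TCP behavior that the walkthrough above does not spell out.

```python
import random

SEQ_SPACE = 2 ** 32  # Seq and Ack are 32-bit fields, so arithmetic wraps around


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three handshake packets as dictionaries (illustrative model)."""
    if client_isn is None:
        client_isn = random.randrange(SEQ_SPACE)
    if server_isn is None:
        server_isn = random.randrange(SEQ_SPACE)

    # Handshake 1: client -> server, SYN set, Seq = client's random number.
    syn = {"flags": {"SYN"}, "seq": client_isn}

    # Handshake 2: server -> client, SYN and ACK set.
    # Seq is the server's own random number; Ack = client Seq + 1.
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": server_isn,
               "ack": (syn["seq"] + 1) % SEQ_SPACE}

    # The client verifies that the server acknowledged its SYN.
    if syn_ack["ack"] != (client_isn + 1) % SEQ_SPACE:
        raise ValueError("server did not acknowledge the client's SYN")

    # Handshake 3: client -> server, ACK set, Ack = server Seq + 1.
    ack = {"flags": {"ACK"}, "seq": syn_ack["ack"],
           "ack": (syn_ack["seq"] + 1) % SEQ_SPACE}

    # The server verifies that the client acknowledged its SYN.
    if ack["ack"] != (server_isn + 1) % SEQ_SPACE:
        raise ValueError("client did not acknowledge the server's SYN")

    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for packet in three_way_handshake(client_isn=1000, server_isn=2000):
        print(packet)
```

With the numbers used in the text (1000 and 2000), the second packet carries Ack 1001 and the third carries Ack 2001.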

TCP data transmission process

insert image description here
The above figure shows the process that host A transmits 200 bytes to host B in 2 times (in 2 packets). First, host A sends 100 bytes of data in one data packet, and the Seq number of the data packet is set to 1200. In order to confirm this, host B sends an ACK packet to host A and sets the Ack number to 1301.
In order to ensure that the data arrives accurately, the target machine must return an ACK packet immediately after receiving the data packet (including SYN packet, FIN packet, ordinary data packet, etc.), so that the sender can confirm that the data transmission is successful.
At this time, the Ack number is 1301 instead of 1201, because the Ack number is incremented by the number of data bytes transmitted. If the number of transmitted bytes were not added to the Ack number, the arrival of the data packet could still be confirmed, but it would be impossible to tell whether all 100 bytes arrived correctly or some were lost, for example if only 80 bytes got through. Therefore, the Ack number is determined by the following formula:

Ack number = Seq number + number of bytes passed + 1

As in the three-way handshake, the 1 added at the end tells the other party the Seq number to be passed next. Note that this follows the simplified numbering used in the figure. In actual TCP the acknowledgment number is the sequence number of the next byte expected, that is, Seq number + number of bytes; only SYN and FIN packets consume one extra sequence number.

Let's analyze data packet loss during transmission, as shown in the figure below:
insert image description here
The above figure shows 100 bytes of data being transmitted to host B in the Seq 1301 data packet, but an error occurred on the way and host B did not receive it. After a period of time, host A still has not received the ACK confirmation for Seq 1301, so it retransmits the data.

In order to complete the retransmission of the data packet, the TCP socket will start a timer every time it sends a data packet. If the ACK packet sent back by the target machine is not received within a certain period of time, the timer will expire and the data packet will be retransmitted.
The above figure shows the loss of a data packet. An ACK packet can also be lost, in which case the data is likewise retransmitted.

If the value of RTO (Retransmission Time Out) is too large, it causes unnecessary waiting; if it is too small, it causes unnecessary retransmission. In theory the best value is close to the network RTT, but the RTT varies with the network distance and with momentary delays, so in practice an adaptive algorithm (such as the Jacobson algorithm and the Karn algorithm) is used to determine the timeout.
Round-trip time (RTT, Round-Trip Time) is the total delay from the moment the sender sends data to the moment it receives the ACK confirmation packet from the receiver (assuming the receiver confirms immediately after receiving the data).
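As an illustration of how such an adaptive timeout can be computed, here is a sketch in the style of RFC 6298, which standardizes Jacobson's smoothed-RTT calculation. The class name `RtoEstimator` and its method names are invented for this example, and the default limits (1 second minimum, 60 second maximum) are assumptions taken from that RFC's recommendations rather than from any particular operating system.

```python
class RtoEstimator:
    """Adaptive retransmission timeout, following the formulas of RFC 6298."""

    ALPHA = 1 / 8  # gain for the smoothed RTT (SRTT)
    BETA = 1 / 4   # gain for the RTT variation (RTTVAR)
    K = 4          # multiplier applied to RTTVAR

    def __init__(self, initial_rto=1.0, min_rto=1.0, max_rto=60.0, granularity=0.001):
        self.srtt = None      # smoothed RTT, unknown until the first sample
        self.rttvar = None    # RTT variation
        self.rto = initial_rto
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.granularity = granularity  # assumed clock granularity in seconds

    def on_rtt_sample(self, rtt):
        """Update the estimate with one RTT measurement in seconds.

        Karn's algorithm: the caller should only pass samples measured on
        packets that were NOT retransmitted, because an ACK for a
        retransmitted packet is ambiguous.
        """
        if self.srtt is None:
            # First measurement.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        self.rto = min(max(rto, self.min_rto), self.max_rto)
        return self.rto

    def on_timeout(self):
        """Back off exponentially when the retransmission timer expires."""
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto


if __name__ == "__main__":
    estimator = RtoEstimator(min_rto=0.2)
    for sample in (0.100, 0.120, 0.300, 0.110):
        print(f"rtt={sample:.3f}s -> rto={estimator.on_rtt_sample(sample):.3f}s")
    print(f"after a timeout -> rto={estimator.on_timeout():.3f}s")
```

The timeout tracks the measured RTT plus a safety margin proportional to how much the RTT fluctuates, which is what keeps it from being either far too large or far too small.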
Number of retransmissions
The number of retransmissions of TCP data packets varies according to different system settings. In some systems, a data packet will only be retransmitted 3 times. If the ACK confirmation of the data packet has not been received after 3 times of retransmission, no retransmission will be attempted. However, some business systems with high requirements will continuously retransmit lost data packets to ensure the normal interaction of business data as much as possible.

Finally, it should be noted that the sender will clear the data in the output buffer only after receiving the ACK confirmation packet from the other party.

TCP four-way handshake for disconnecting

Establishing a connection is very important: it is a prerequisite for the correct transmission of data. Disconnecting is also important: it lets the computer release resources that are no longer in use. If a connection cannot be closed normally, it can cause data transmission errors and leave the socket unable to close, so that it keeps occupying resources. Under high concurrency, the resulting load on the server becomes a concern.

A three-way handshake is required to establish a connection, and a four-way handshake is required to disconnect, which can be vividly compared to the following dialogue:

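[Shake 1] Socket A: "The task is finished, and I would like to disconnect."
[Shake 2] Socket B: "Oh, really? Please wait a moment while I get ready."
After a short wait...
[Shake 3] Socket B: "I am ready; we can disconnect now."
[Shake 4] Socket A: "Okay, thank you for your cooperation."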

The following figure demonstrates the scenario where the client actively disconnects:
insert image description here
After the connection is established, both the client and the server are in the ESTABLISHED state. At this point, the client initiates a request to disconnect:

  1. After the client calls the close() function, it sends a FIN packet to the server and enters the FIN_WAIT_1 state. FIN is short for Finish and indicates that the task is complete and the connection should be closed.

  2. After the server receives the data packet, it detects that the FIN flag is set and knows to disconnect, so it sends a "confirmation packet" to the client and enters the CLOSE_WAIT state.

Note: The server does not disconnect immediately after receiving the request, but sends a "confirmation packet" to the client first, telling it that I know, and I need to prepare before disconnecting.

  3. After receiving the "confirmation packet", the client enters the FIN_WAIT_2 state and waits for the server to send another data packet once it is ready.

  4. After a while, the server is ready to disconnect, so it actively sends a FIN packet to the client to say that it is ready. It then enters the LAST_ACK state.

  5. After the client receives the FIN packet from the server, it sends an ACK packet to the server, telling it to go ahead and disconnect. The client then enters the TIME_WAIT state.

  6. After the server receives the ACK packet from the client, it disconnects, closes the socket, and enters the CLOSED state.
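The state changes on both sides during the close sequence can be written down as a small transition table. This is a sketch for following the steps above, not a full TCP state machine: the event names such as `call_close` and `recv_fin` are invented for the example, and cases like both sides closing at the same time are left out.

```python
# (current state, event) -> next state, for the close sequence described above.
CLOSE_TRANSITIONS = {
    # Side that closes first (the client in the text).
    ("ESTABLISHED", "call_close"): "FIN_WAIT_1",  # sends FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",      # sends the final ACK
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
    # Side that closes second (the server in the text).
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",    # sends ACK
    ("CLOSE_WAIT", "call_close"): "LAST_ACK",     # sends its own FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",
}


def trace_states(start, events):
    """Apply events in order and return every state visited, including the start."""
    states = [start]
    for event in events:
        key = (states[-1], event)
        if key not in CLOSE_TRANSITIONS:
            raise ValueError(f"event {event!r} is not valid in state {states[-1]!r}")
        states.append(CLOSE_TRANSITIONS[key])
    return states


if __name__ == "__main__":
    client = trace_states("ESTABLISHED",
                          ["call_close", "recv_ack", "recv_fin", "timeout_2msl"])
    server = trace_states("ESTABLISHED", ["recv_fin", "call_close", "recv_ack"])
    print("client:", " -> ".join(client))
    print("server:", " -> ".join(server))
```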

Explanation about the TIME_WAIT state
The client enters the TIME_WAIT state after sending the ACK packet for the last time, instead of directly entering the CLOSED state to close the connection. Why?

TCP is a connection-oriented transmission method. It must ensure that data reaches the target machine correctly, without loss or error. However, the network is unstable and data may be damaged or lost at any time. Therefore, every time machine A sends a data packet to machine B, it requires machine B to "confirm" by returning an ACK packet that tells machine A the data has been received, so that machine A knows the transmission succeeded. If machine B does not return an ACK packet, machine A resends until it does.

When the client returns the last ACK packet to the server, the server may not receive it because of network problems, in which case the server sends the FIN packet again. If the client had already closed the connection completely, the server would never receive the ACK packet. The client therefore needs to wait for a while, to be sure the other party has received the ACK packet, before entering the CLOSED state. So, how long should it wait?

A data packet has a limited survival time in the network; if it has not reached the target host within this time it is discarded, and the source host is notified. This time is called the maximum segment lifetime (MSL, Maximum Segment Lifetime). TIME_WAIT lasts 2MSL before the socket enters the CLOSED state: the ACK packet takes at most one MSL to reach the server, and a FIN packet retransmitted by the server takes at most one MSL to come back, so 2MSL is the maximum round-trip time of a data packet. If no retransmitted FIN packet has arrived from the server after 2MSL, the server has received the ACK packet.

Sticky packet problem of TCP protocol

The TCP sticky packet problem is caused by faulty design on the part of application layer protocol developers. They overlook the core mechanism of TCP data transmission: it is based on byte streams, which have no concept of messages or data packets. All data is transmitted as a stream, so the application layer protocol has to define message boundaries itself, which is known as message framing (Message Framing).

For example, A communicates with B over TCP, and A sends a 100-byte data packet followed by a 200-byte data packet. How does B receive them? B may receive 100 bytes first and then 200 bytes; it may receive 50 bytes first and then 250 bytes; or it may receive 100 bytes three times in a row.

Solving the sticky packet problem

  1. Fixed-length packets
  2. End each packet with a specified character (string)
  3. Packet header + packet body format

The following is the basic flowchart of the header + body method:
insert image description here
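The header + body approach can be sketched as follows. The 4-byte big-endian length header is an assumption chosen for the example (the flowchart does not fix a header layout), and the names `pack_message` and `MessageDecoder` are invented here. The decoder keeps whatever bytes arrive in a buffer and only hands back complete messages, so it does not matter how TCP happens to split or merge the stream.

```python
import struct

# Assumed header layout: a single 4-byte unsigned length in network byte order.
HEADER = struct.Struct("!I")


def pack_message(body):
    """Prefix the body with its length so the receiver can find the boundary."""
    return HEADER.pack(len(body)) + body


class MessageDecoder:
    """Reassemble complete messages from arbitrarily split chunks of a byte stream."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk):
        """Add received bytes and return the list of messages completed so far."""
        self._buffer.extend(chunk)
        messages = []
        while True:
            if len(self._buffer) < HEADER.size:
                break  # the header itself is not complete yet
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # the body is not complete yet
            messages.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]  # drop the consumed message from the buffer
        return messages


if __name__ == "__main__":
    # The sender writes a 100-byte and a 200-byte message back to back.
    stream = pack_message(b"a" * 100) + pack_message(b"b" * 200)
    decoder = MessageDecoder()
    # The receiver happens to get the stream in two oddly sized pieces.
    for piece in (stream[:50], stream[50:]):
        for message in decoder.feed(piece):
            print(len(message), "byte message received")
```

In a real program the chunks passed to `feed` would come from successive `recv()` calls on the socket.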

Big-endian and little-endian problems in network transmission

There are two ways for the CPU to save data to memory:

Big-endian (Big Endian): high-order bytes are stored at low addresses (high-order bytes come first)
Little-endian (Little Endian): high-order bytes are stored at high addresses (low-order bytes come first)
insert image description here
Why are there big-endian and little-endian modes?

"Because in computer systems memory is addressed in bytes, each address unit corresponds to one byte, and a byte is 8 bits. But in the C language, in addition to the 8-bit char, there are also the 16-bit short type and the 32-bit long type (depending on the specific compiler). In addition, for processors wider than 8 bits, such as 16-bit or 32-bit processors, the register width is greater than one byte, so there is necessarily the problem of how to arrange multiple bytes. This leads to the big-endian and little-endian storage modes."

Suppose you want to exchange data between machines that use different byte orders. What should you do? (If the same data were interpreted differently by different machines, that would defeat the purpose.) There are two methods. One is to convert everything to text for transmission. The other is for both sides to transmit using one agreed byte order, in which case a machine using the other order has to convert. TCP/IP takes the second approach: network byte order is big-endian.
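A short sketch of the difference, using only the Python standard library. The value `0x12345678` is an arbitrary example. The `"!"` prefix in `struct` format strings selects network byte order, so the sending and receiving sides agree regardless of their CPUs.

```python
import struct
import sys

value = 0x12345678

# The same 32-bit integer laid out in the two byte orders.
big = value.to_bytes(4, "big")        # most significant byte first: 12 34 56 78
little = value.to_bytes(4, "little")  # least significant byte first: 78 56 34 12

# Packing with "!" always produces network byte order, whatever the host CPU uses.
wire = struct.pack("!I", value)
(decoded,) = struct.unpack("!I", wire)

if __name__ == "__main__":
    print("host byte order:", sys.byteorder)
    print("big-endian:     ", big.hex(" "))
    print("little-endian:  ", little.hex(" "))
    print("on the wire:    ", wire.hex(" "))
    print("decoded value:  ", hex(decoded))
```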

IP, MAC and port number - three elements to confirm identity information in network communication

IP

IP address is the abbreviation of Internet Protocol Address, translated as "Internet Protocol Address".

At present, most software uses IPv4 addresses, but IPv6 is also being accepted by people, especially in the education network, which has been widely used.

A computer can have its own IP address, and a LAN can also have a single IP address (from the outside it looks like just one computer). IPv4 addresses are in widespread use but are a very limited resource, so it is unrealistic for every computer to have its own address; often a whole local area network shares one IP address.

When communicating on the Internet, you need to know the IP address of the other party. The IP address is attached to every data packet. After the packet is sent to a router, the router uses the IP address to work out where the other party is and forwards the packet accordingly. Routers use very efficient routing algorithms and find the way to the target computer very quickly.

Special IP addresses

  1. 127.0.0.1: everyone should remember this one. It is a special IP address that refers to the local machine.
  2. 0.0.0.0: strictly speaking, this is not a real IP address. It stands for the set of all unknown hosts and destination networks. "Unknown" here means that the local routing table has no specific entry saying how to reach them. For the local machine it is a catch-all: everything with no known route is sent there. If you set a default gateway in the network settings, Windows automatically generates a default route with the destination address 0.0.0.0.
  3. 255.255.255.255: the limited broadcast address. For the local machine, this address refers to all hosts on the same network segment (the same broadcast domain). In plain language: "Everyone in this room, pay attention!" Routers do not forward packets sent to this address.
  4. 224.0.0.1: a multicast address; note the difference between multicast and broadcast. Multicast addresses run from 224.0.0.0 to 239.255.255.255. 224.0.0.1 refers to all hosts, and 224.0.0.2 refers to all routers. Such addresses are mostly used by specific programs and multimedia applications. If your host has IRDP (ICMP Router Discovery Protocol, which uses multicast) enabled, there should be such a route in its routing table.
  5. 169.254.x.x: if your host obtains its IP address automatically through DHCP, then when the DHCP server fails, or takes longer to respond than the system allows, Windows assigns the host an address in this range. If you find that your host has an address like this, then nine times out of ten your network is not working properly.
  6. 10.x.x.x, 172.16.x.x to 172.31.x.x, 192.168.x.x: private addresses, widely used on internal enterprise networks. Some broadband routers also use 192.168.1.1 as their default address. Because a private network is not directly connected to the outside world, it could in principle use arbitrary IP addresses; these ranges are reserved for private use to avoid address conflicts when the network later accesses the public network. When a private network using private addresses accesses the Internet, network address translation (NAT) translates the private addresses into legal public addresses. Private addresses cannot appear on the Internet itself.

For a host on a network, there are three kinds of destination address it will normally accept: its own IP address, a broadcast address, and a multicast address.
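Python's standard `ipaddress` module already knows these ranges, so the categories above can be checked programmatically. The function name `describe` and the label strings are invented for this sketch. The order of the checks matters because the module also reports loopback and link-local addresses as private.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def describe(address):
    """Classify an IPv4 address into the categories discussed above."""
    ip = ipaddress.IPv4Address(address)
    if ip == LIMITED_BROADCAST:
        return "limited broadcast"
    if ip.is_unspecified:       # 0.0.0.0
        return "unspecified"
    if ip.is_loopback:          # 127.0.0.0/8
        return "loopback"
    if ip.is_link_local:        # 169.254.0.0/16, used when DHCP fails
        return "link-local"
    if ip.is_multicast:         # 224.0.0.0 to 239.255.255.255
        return "multicast"
    if ip.is_private:           # 10/8, 172.16/12, 192.168/16 reach this check
        return "private"
    return "public"


if __name__ == "__main__":
    for example in ("127.0.0.1", "0.0.0.0", "255.255.255.255", "224.0.0.1",
                    "169.254.10.20", "10.1.2.3", "172.16.0.1", "192.168.1.1",
                    "8.8.8.8"):
        print(f"{example:<16} {describe(example)}")
```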

MAC

In reality a whole LAN often shares a single IP address; in other words, an IP address can only locate a LAN, not a specific computer. What can be done? Communication would be impossible this way.

In fact, what truly identifies a computer uniquely is the MAC address: the MAC address of every network card is unique in the world. The MAC address is written into the network card when it leaves the factory (although it can be modified with certain tricks). The router/switch in the LAN keeps track of each computer's MAC address.
MAC address is the abbreviation of Media Access Control Address, literally translated as "Media Access Control Address", also known as LAN Address (LAN Address), Ethernet Address (Ethernet Address) or Physical Address (Physical Address).
In addition to the IP address of the other party, the data packet will also be accompanied by the MAC address of the other party. When the data packet reaches the LAN, the router/switch will find the corresponding computer according to the MAC address in the data packet, and then forward the data packet to it. This completes the data transfer.

The port number

With an IP address and a MAC address, the target computer can be found, but communication is still not possible. A computer can provide multiple network services at the same time, such as a Web service (website), an FTP service (file transfer), and an SMTP service (mail). With only the IP address and MAC address, the computer can receive the data packet correctly, but it does not know which network program should process it, so communication fails.

To distinguish between network programs, the computer assigns a unique port number (Port Number) to each one. For example, the port number of the Web service is 80, the port number of the FTP service is 21, and the port number of the SMTP service is 25.

Port (Port) is a virtual, logical concept. A port can be understood as a door through which data flows in and out. Each door has a different number, which is the port number. As shown below:
insert image description here
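To see ports doing this job, the sketch below starts two listening sockets on the same IP address and shows that data sent to one port reaches only that listener. It assumes the loopback interface is available. Binding to port 0 asks the operating system to pick any free port, so the example does not need the privileged ports 80 or 25, and the variable names `web` and `mail` are only labels for the illustration.

```python
import socket


def start_listener():
    """Create a TCP socket listening on a free port of the loopback address."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the operating system choose
    server.listen(1)
    return server


# Two "services" on the same IP address, told apart only by port number.
web = start_listener()
mail = start_listener()
web_port = web.getsockname()[1]
mail_port = mail.getsockname()[1]

# The client addresses the mail service by (IP address, port).
message = b"hello mail"
client = socket.create_connection(("127.0.0.1", mail_port))
client_port = client.getsockname()[1]  # the client side also gets a port
connection, peer = mail.accept()
client.sendall(message)

# TCP is a byte stream, so keep reading until the whole message has arrived.
received = b""
while len(received) < len(message):
    chunk = connection.recv(1024)
    if not chunk:
        break
    received += chunk

print("web service port: ", web_port)
print("mail service port:", mail_port)
print("mail service got", received, "from client port", peer[1])

for sock in (client, connection, web, mail):
    sock.close()
```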

subnet mask


Origin blog.csdn.net/qq_41224270/article/details/127917883