Network programming (TCP and UDP protocols)

1. Network programming

Java is a language designed for the Internet. It supports network applications at the language level, so programmers can develop common network applications easily.

The network class library provided by Java makes network connections straightforward. The underlying networking details are hidden in Java's native runtime and managed by the JVM. Because Java implements a cross-platform network library, programmers work with a unified network programming environment.

1.1 Software Architecture

  • C/S architecture: The full name is Client/Server structure, which refers to the client and server structure. Common programs include QQ, the Meituan app, 360 Security Guard, and other software.

  • B/S architecture: The full name is Browser/Server structure, which refers to the browser and server structure. Common browsers include IE, Google Chrome, Firefox, etc.

Both architectures have their own advantages, but neither can work without the support of the network. Network programming means writing programs that let two computers communicate under a given protocol.

1.2 Network Basics

  • Computer network:
    Computers and specialized peripheral devices distributed across different geographic regions are interconnected by communication lines to form a large, powerful network system, so that many computers can easily exchange information and share hardware, software, data, and other resources.

  • The purpose of network programming: to exchange data and communicate with other computers directly or indirectly through network protocols.

  • There are three main problems in network programming:

    • Question 1: How to accurately locate one or more hosts on the network
    • Question 2: How to locate a specific application on the host
    • Question 3: After finding the host, how to transmit data reliably and efficiently

2. Network Communication Elements

2.1 How to realize the communication between the hosts in the network

  • Address of both parties
    • IP
    • The port number
  • Certain rules: communication between different hardware and operating systems requires a common set of rules. We call such a set of rules a protocol, that is, a network communication protocol.

2.2 Communication element 1: IP address and domain name

2.2.1 IP address

IP address: short for Internet Protocol address, commonly known as IP. An IP address uniquely numbers a computer device on a network. If we compare a "personal computer" to "a phone", then its "IP address" is equivalent to its "telephone number".

IP address classification method 1:

  • IPv4: a 32-bit binary number, usually divided into 4 bytes and written in dotted-decimal form a.b.c.d, for example 192.168.65.100, where a, b, c, and d are decimal integers between 0 and 255.

  • This scheme can represent at most about 4.2 billion addresses. Of these, roughly 3 billion are in North America, 400 million in Asia, and 290 million in China. The pool of unallocated IPv4 addresses was exhausted in early 2011.

  • IP address = network address + host address

    • Network address: Identifies the network segment where a computer or network device resides
    • Host Address: Identifies a specific host or network device
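
The two ideas above (a dotted-decimal address is just a 32-bit number, and that number splits into a network part and a host part) can be sketched in a few lines of Java. This is an illustrative sketch only: the class name `Ipv4Demo` and the choice of a 24-bit network prefix are assumptions made for the example, since the text does not say where the split falls for 192.168.65.100.

```java
class Ipv4Demo {
    /** Packs a dotted-decimal IPv4 string such as "192.168.65.100" into a 32-bit int. */
    static int toInt(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected a.b.c.d: " + dotted);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            value = (value << 8) | octet; // append 8 more bits
        }
        return value;
    }

    /** Formats a 32-bit int back into dotted-decimal form. */
    static String toDotted(int value) {
        return ((value >>> 24) & 0xFF) + "." + ((value >>> 16) & 0xFF) + "."
                + ((value >>> 8) & 0xFF) + "." + (value & 0xFF);
    }

    /** Builds a mask whose leading prefixLength bits are 1 (the network part). */
    static int mask(int prefixLength) {
        return prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
    }

    public static void main(String[] args) {
        int ip = toInt("192.168.65.100");
        int mask = mask(24); // assumption: a /24 network, chosen only for illustration
        System.out.println("network address: " + toDotted(ip & mask));
        System.out.println("host address:    " + toDotted(ip & ~mask));
    }
}
```

With a different prefix length the same address splits differently, which is why an address alone does not tell you its network part.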

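Under classful addressing, IPv4 addresses are divided into classes A through E; class E is reserved for research and experimental use.
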
  • IPv6: With the rapid growth of the Internet, demand for IP addresses keeps increasing while address resources are limited, making IP allocation increasingly tight.

    To expand the address space, IPv6 redefines it with a 128-bit address length (16 bytes), written as 8 unsigned integers, each represented by four hexadecimal digits and separated by colons (:). For example: ABCD:EF01:2345:6789:ABCD:EF01:2345:6789

    Even by a conservative estimate of the addresses IPv6 can actually allocate, more than 1,000 addresses are available for every square meter of the earth, which solves the shortage of network address resources.

    On June 6, 2012, the Internet Society held World IPv6 Launch Day, on which the global IPv6 network was officially launched. Many well-known websites, such as Google, Facebook, and Yahoo, began to permanently support IPv6 access at 0:00 GMT (8:00 Beijing time) that day. In June 2018, the three major operators and Alibaba Cloud announced that they would provide IPv6 services across the board, with a plan to help China's Internet truly achieve "IPv6 Only" by 2025.

    Besides solving the address shortage once and for all, the design of IPv6 also addresses other problems that IPv4 could not solve well, mainly end-to-end IP connectivity, quality of service (QoS), security, multicast, mobility, and plug and play.
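
To make the 32-bit versus 128-bit difference concrete, the following small sketch asks `java.net.InetAddress` to parse one literal of each kind and reports the address length in bits. The class and method names (`AddressLengthDemo`, `bits`) are made up for this example; for literal addresses, `InetAddress.getByName` only parses the text and does not contact a DNS server.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class AddressLengthDemo {
    /** Returns the length in bits of a literal IPv4 or IPv6 address. */
    static int bits(String literal) {
        try {
            // getAddress() returns the raw bytes: 4 for IPv4, 16 for IPv6.
            return InetAddress.getByName(literal).getAddress().length * 8;
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid address: " + literal, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("IPv4 bits: " + bits("192.168.65.100"));
        System.out.println("IPv6 bits: " + bits("ABCD:EF01:2345:6789:ABCD:EF01:2345:6789"));
    }
}
```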

IP address classification method 2:

Public addresses (used on the Internet) and private addresses (used within local area networks). Addresses starting with 192.168. are private addresses, in the range 192.168.0.0–192.168.255.255, reserved for internal use within an organization. The ranges 10.0.0.0–10.255.255.255 and 172.16.0.0–172.31.255.255 are private as well.
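
In Java, one way to check whether an IPv4 address is private is `InetAddress.isSiteLocalAddress()`. The sketch below is only an illustration: the class name `PrivateAddressDemo` and the sample public address 8.8.8.8 are choices made for this example, not part of the original text.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class PrivateAddressDemo {
    /** Returns true if the literal IPv4 address lies in a private (site-local) range. */
    static boolean isPrivate(String literalIp) {
        try {
            // For a literal IP, getByName only parses the text; no DNS lookup is made.
            return InetAddress.getByName(literalIp).isSiteLocalAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid address: " + literalIp, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("192.168.1.222 private? " + isPrivate("192.168.1.222"));
        System.out.println("8.8.8.8 private?       " + isPrivate("8.8.8.8"));
    }
}
```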

Common commands:

  • To view the IP address of the machine, enter in the console (on Windows):
ipconfig
  • To check whether the network is connected, enter in the console:
ping <IP address>
ping 192.168.1.222

Special IP address:

  • Local loopback address (hostAddress): 127.0.0.1
  • Host name (hostName): localhost

2.2.2 Domain name

Hosts on the Internet can be addressed in two ways: by IP address (hostAddress) or by domain name (hostName).

Domain name resolution: Because numeric IP addresses are hard to remember, domain names were introduced. When you enter the domain name of a host to connect to it, the Domain Name System (DNS) converts the domain name into an IP address so that a connection can be established with the host.

  1. When you enter the domain name www.qq.com in the browser, the operating system first checks whether the local hosts file contains a mapping for it. If so, it uses that IP address mapping and domain name resolution is complete.
  2. If the hosts file has no mapping for the domain name, the operating system checks the local DNS resolver cache. If a mapping exists there, it is returned directly and resolution is complete.
  3. If neither the hosts file nor the local DNS resolver cache has a mapping, the operating system queries the preferred DNS server configured in the TCP/IP settings, which we call the local DNS server here. When this server receives the query, if the domain name falls within a zone that it is configured to serve, it returns the result to the client and resolution is complete. This answer is authoritative.
  4. If the domain name is not in a zone served by the local DNS server but the server has the mapping cached, it returns the cached IP address mapping and resolution is complete. This answer is not authoritative.
  5. If both the zone files and the cache of the local DNS server fail, the query proceeds according to whether a forwarder is configured on the local DNS server. If forwarding is not used, the local DNS server sends the request to one of the 13 root DNS servers. The root server determines who is authorized to manage the top-level domain (.com) and returns the IP address of a server responsible for it. The local DNS server then contacts that .com server. If the .com server cannot resolve the name itself, it returns the address of the lower-level DNS server that manages the qq.com domain. The local DNS server then queries the qq.com server, repeating these steps until it finds the www.qq.com host.
  6. If forwarding is used, the local DNS server forwards the request to an upper-level DNS server, which performs the resolution. If that server cannot resolve the name, it either queries the root DNS servers or forwards the request to its own upper level, and so on. Whether the local DNS server uses forwarding or root hints, the result is finally returned to the local DNS server, which then returns it to the client.
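
From a Java program, the whole lookup chain above is hidden behind a single call to `InetAddress.getByName`, which hands the name to the operating system's resolver. The sketch below is a minimal illustration; the class name `ResolveDemo` and the decision to return `null` on failure are assumptions made for this example. Resolving www.qq.com needs network access, so the printed address depends on your environment.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class ResolveDemo {
    /** Resolves a host name (or literal IP) to a textual IP address, or null on failure. */
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null; // no hosts entry, no cache hit, and no DNS answer
        }
    }

    public static void main(String[] args) {
        // The loopback address never leaves the machine.
        System.out.println("loopback:   " + InetAddress.getLoopbackAddress().getHostAddress());
        // "localhost" is normally answered from the hosts file.
        System.out.println("localhost:  " + resolve("localhost"));
        // A real domain name goes through the DNS steps described above.
        System.out.println("www.qq.com: " + resolve("www.qq.com"));
    }
}
```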

2.3 Communication Element 2: Port Number

Network communication is essentially communication between two processes (applications). Each computer has many processes, so how to distinguish these processes during network communication?

If the IP address can uniquely identify a device in the network, then the port number can uniquely identify the process (application) in the device.

Different processes are assigned different port numbers.

  • Port number: an integer represented by two bytes, with a value range of 0~65535.
    • Well-known ports: 0~1023. Reserved for predefined services, such as HTTP (80), FTP (21), and Telnet (23).
    • Registered ports: 1024~49151. Assigned to user processes or applications, such as Tomcat (8080), MySQL (3306), and Oracle (1521).
    • Dynamic/private ports: 49152~65535.

If the port number is occupied by another service or application, it will cause the current program to fail to start.
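
The port ranges and the "port already occupied" failure can both be tried out with `java.net.ServerSocket`. The following is a rough sketch under stated assumptions: `PortDemo`, `classify`, `canBind`, and `heldPortIsBusy` are names invented for this example, and the exact exception seen when a port is taken (usually `java.net.BindException`) can vary by platform.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

class PortDemo {
    /** Names the range a port number falls into. */
    static String classify(int port) {
        if (port < 0 || port > 65535) {
            throw new IllegalArgumentException("a port must fit in two bytes: " + port);
        }
        if (port <= 1023) return "well-known";
        if (port <= 49151) return "registered";
        return "dynamic/private";
    }

    /** Returns true if a listening TCP socket can be bound to the port right now. */
    static boolean canBind(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false; // typically a BindException: the address is already in use
        }
    }

    /** Holds a port with one server socket and checks that a second bind fails. */
    static boolean heldPortIsBusy() {
        try (ServerSocket holder = new ServerSocket(0)) { // 0 asks the OS for any free port
            return !canBind(holder.getLocalPort());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("80 is " + classify(80));
        System.out.println("8080 is " + classify(8080));
        System.out.println("second bind rejected while port is held? " + heldPortIsBusy());
    }
}
```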

2.4 Communication Element 3: Network Communication Protocol

Multiple computers can be connected through a computer network. Computers in the same network need to abide by certain rules when connecting and communicating, just like a car driving on the road must obey the traffic rules.

  • Network communication protocol: In a computer network, the rules for connection and communication are called network communication protocols. They define the data transmission format, transmission rate, transmission steps, error control, and so on, and both parties must follow them to complete a data exchange.

New problem: network protocols cover too much and are too complicated. How can this be solved?

Computer network communication involves many concerns, such as specifying source and destination addresses, encryption and decryption, compression and decompression, error control, flow control, and routing control. How can such a complex protocol be implemented? The answer is the idea of layering communication protocols.

When designing a protocol, complex components are broken down into simpler ones, which are then composed. The most common way of composing them is layering: peers at the same layer can communicate with each other, each layer can call the layer directly below it, and it has no relationship with any layer further down. The layers do not affect one another, which makes the system easier to develop and extend.

There are two reference models:

  • OSI reference model: the model is too idealized and has not been widely promoted on the Internet
  • TCP/IP Reference Model (or TCP/IP Protocol): The de facto international standard.

  • TCP/IP protocol: Transmission Control Protocol/Internet Protocol. TCP/IP is named after its two main protocols, the Transmission Control Protocol (TCP) and the Internet Protocol (IP). It is in fact a suite of multiple interrelated protocols with different functions, and it is the most basic and most widely used protocol suite on the Internet.

Introduction to the four layers in the TCP/IP protocol:

  • Application layer: The application layer determines the communication activities involved in providing application services to users. The main protocols are HTTP, FTP, SNMP (Simple Network Management Protocol), SMTP (Simple Mail Transfer Protocol), and POP3 (Post Office Protocol version 3), among others.
  • Transport layer: Its main job is to let network programs communicate. Network communication can use either TCP or UDP. TCP (Transmission Control Protocol) is a connection-oriented, reliable, byte-stream-based transport layer protocol. UDP (User Datagram Protocol) is a connectionless transport layer protocol that provides a simple, transaction-oriented, unreliable message delivery service.
  • Network layer: The network layer is the core of the whole TCP/IP suite and supports data communication between networks. It mainly groups the transmitted data into packets and sends them to the target computer or network. The most important protocol here is IP (Internet Protocol), whose responsibility is to carry data from source to destination. It transmits units called data packets between the source address and the destination address, and it can also fragment and reassemble data to meet the packet-size requirements of different networks.
  • Physical + data link layer: The link layer defines the physical transmission channel. It is usually a driver protocol for particular network connection devices, such as drivers for optical fibers and network cables.

3. Transport layer protocol: TCP and UDP protocols

Communication protocols are relatively complex, and the classes and interfaces in the java.net package encapsulate the low-level communication details. We can use these classes and interfaces directly and focus on network program development without worrying about those details.

The java.net package supports two common network protocols:

  • UDP: User Datagram Protocol.
  • TCP: Transmission Control Protocol.

3.1 TCP protocol and UDP protocol

TCP protocol:

  • Two application processes communicate over TCP: a client and a server.
  • Before TCP is used, a TCP connection must be established first, forming a byte-stream-based data channel.
  • Before transmission, a "three-way handshake" is performed; communication is point-to-point and reliable.
    • TCP uses a retransmission mechanism. When one communication entity sends a message to another, it needs to receive a confirmation from the other side. If no confirmation arrives, it sends the same message again.
  • Large amounts of data can be transferred over the connection.
  • After the transfer completes, the established connection must be released, so efficiency is lower.
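
As a rough illustration of the TCP points above, the sketch below runs a tiny echo server and a client in the same process using `java.net.ServerSocket` and `java.net.Socket`. The class name `TcpEchoDemo`, the "echo: " reply format, and the use of the loopback address with an OS-chosen port are all assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class TcpEchoDemo {
    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        // autoFlush = true, so println() pushes the line onto the connection immediately
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    /** Sends one line to a local echo server over TCP and returns the server's reply. */
    static String roundTrip(String message) {
        // Port 0 lets the OS choose a free port; the loopback address keeps the demo local.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                // accept() returns once a client connection has been established.
                try (Socket peer = server.accept();
                     BufferedReader in = reader(peer);
                     PrintWriter out = writer(peer)) {
                    out.println("echo: " + in.readLine());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            serverThread.start();

            // Creating the Socket establishes the connection before any data is sent.
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
                 PrintWriter out = writer(client);
                 BufferedReader in = reader(client)) {
                out.println(message);
                String reply = in.readLine();
                serverThread.join();
                return reply;
            } // closing the socket releases the connection
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

Both sides read and write through streams, which reflects the byte-stream channel that TCP provides once the connection exists.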

UDP protocol:

  • Two application processes communicate over UDP: a sender and a receiver.
  • Data, source, and destination are encapsulated into datagrams (the basic unit of transmission); no connection needs to be established.
  • The sender sends regardless of whether the receiver is ready, and the receiver does not acknowledge receipt, so data integrity cannot be guaranteed and the protocol is unreliable.
  • The size of each datagram is limited to 64 KB.
  • No resources need to be released after sending, so overhead is small and communication efficiency is high.
  • Applicable scenarios: transmission of audio, video, and ordinary data, e.g. video conferencing.
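
For comparison, here is a minimal UDP sketch using `java.net.DatagramSocket` and `java.net.DatagramPacket`. The class name `UdpDemo`, the 1024-byte receive buffer, and the 2-second timeout are assumptions chosen for this example. Sender and receiver run in one process over the loopback address, where delivery is very likely but, as with any UDP traffic, not guaranteed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class UdpDemo {
    /** Sends one datagram to a local receiver and returns the text that arrived. */
    static String sendAndReceive(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(new InetSocketAddress(loopback, 0));
             DatagramSocket sender = new DatagramSocket()) {
            // UDP gives no delivery guarantee, so do not wait forever for a packet.
            receiver.setSoTimeout(2000);

            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            // The packet itself carries the destination address and port; no connection is set up.
            sender.send(new DatagramPacket(payload, payload.length, loopback, receiver.getLocalPort()));

            byte[] buffer = new byte[1024];
            DatagramPacket received = new DatagramPacket(buffer, buffer.length);
            receiver.receive(received); // blocks until a datagram arrives or the timeout expires
            return new String(received.getData(), received.getOffset(), received.getLength(),
                    StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAndReceive("hello over UDP"));
    }
}
```

Note that the sender never learns whether the datagram arrived; any acknowledgment would have to be added by the application.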

Real-life analogy for TCP: making a phone call

Real-life analogy for UDP: sending text messages or telegrams

3.2 Three-way handshake

In TCP, before any data is sent, the client and the server exchange three messages to ensure that the connection is reliable.

  • First handshake: the client sends a TCP connection request to the server
  • Second handshake: the server sends an acknowledgment of the client's connection request
  • Third handshake: the client sends an acknowledgment of the server's acknowledgment

1. The client picks a random initial sequence number seq=x and sets SYN=1, marking this as a SYN handshake segment. It sends this SYN segment to the server to initiate a connection and then enters the SYN_SENT state.

2. After receiving the client's SYN segment, the server also picks a random initial sequence number (seq=y) and sets ack=x+1, indicating that it has received everything up to x from the client and expects the client's next data to start at x+1. It sets SYN=1 and ACK=1, marking this as both a SYN handshake segment and an ACK confirmation segment. The server sends this segment, which carries no application-layer data, to the client and then enters the SYN_RCVD state.

3. After receiving the server's segment, the client replies with a final acknowledgment. It sets ACK=1 to mark the segment as an acknowledgment and sets ack=y+1, indicating that it has received everything up to y from the server and expects the server's next data to start at y+1. The client sends this segment, which may carry data, and then enters the ESTABLISHED state. After the server receives the client's acknowledgment, it also enters the ESTABLISHED state.

After the three-way handshake is completed and the connection is established, the client and server can start transmitting data. Because of this connection-oriented design, TCP can guarantee the reliability of the transmitted data, so it is widely used, for example for downloading files and browsing web pages.
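
The sequence and acknowledgment numbers in the three handshake steps follow simple arithmetic, which the sketch below spells out. It is a toy model, not a TCP implementation: the `Segment` class and `handshake` method are invented for this example, the initial sequence numbers are passed in rather than chosen randomly, and wraparound of the 32-bit sequence space is ignored.

```java
class HandshakeDemo {
    /** One TCP segment, reduced to the fields the handshake description uses. */
    static final class Segment {
        final boolean syn;
        final boolean ack;
        final long seq;
        final long ackNum; // only meaningful when ack is true

        Segment(boolean syn, boolean ack, long seq, long ackNum) {
            this.syn = syn;
            this.ack = ack;
            this.seq = seq;
            this.ackNum = ackNum;
        }

        @Override
        public String toString() {
            return "SYN=" + (syn ? 1 : 0) + " ACK=" + (ack ? 1 : 0)
                    + " seq=" + seq + (ack ? " ack=" + ackNum : "");
        }
    }

    /** Builds the three handshake segments for client ISN x and server ISN y. */
    static Segment[] handshake(long x, long y) {
        Segment first = new Segment(true, false, x, 0);         // client -> server: SYN
        Segment second = new Segment(true, true, y, x + 1);     // server -> client: SYN + ACK
        Segment third = new Segment(false, true, x + 1, y + 1); // client -> server: ACK
        return new Segment[] {first, second, third};
    }

    public static void main(String[] args) {
        // 100 and 300 are arbitrary sample values standing in for x and y.
        for (Segment s : handshake(100, 300)) {
            System.out.println(s);
        }
    }
}
```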

3.3 Four waves

In TCP, after the data has been sent, releasing the connection takes four waves.

  • First wave: the client asks the server to end the connection, letting the server make its final preparations. At this point the client is half-closed: it no longer sends data to the server, but it can still receive data.
  • Second wave: after the server receives the client's request to release the connection, it acknowledges the request, sends its last data to the client, and tells the upper-layer application process to stop receiving data.
  • Third wave: after the server has sent its data, it sends the client a segment that releases the connection. Once the client receives it, the client knows that the connection can be formally released.
  • Fourth wave: after the client receives the server's final release segment, it replies with a segment that confirms the complete disconnection. The server fully releases the connection once it receives this reply. After sending this last segment, the client waits for 2MSL (twice the maximum segment lifetime), because the server may not have received it; if the server hears nothing for a long time, it sends its release segment to the client again. If the client receives that segment during the wait, it resends its last segment and restarts the timer. If nothing arrives within 2MSL, the client disconnects completely.

1. The client intends to disconnect and sends a FIN segment to the server (the FIN flag is set to 1), specifying a sequence number seq=u, and then enters the FIN_WAIT_1 state. In other words, the client actively closes the TCP connection and waits for the server's confirmation.

2. After the server receives the FIN segment, it sends an ACK segment to the client whose acknowledgment number is the sequence number of the client's FIN plus one: ack=u+1. The server then enters the CLOSE_WAIT state. At this point the TCP connection is half-closed: the direction from the client to the server has been released. After the client receives the server's ACK segment, it enters the FIN_WAIT_2 state.

3. The server also intends to disconnect and sends a FIN segment to the client, then enters the LAST_ACK state and waits for the client's confirmation. The server's FIN segment has FIN=1, ACK=1, sequence number seq=m, and acknowledgment number ack=u+1.

4. After the client receives the server's FIN segment, it sends an ACK segment to the server. The sequence number of this ACK segment is the acknowledgment number of the server's FIN segment (seq=u+1), and its acknowledgment number is the sequence number of the server's FIN segment plus one (ack=m+1).

After that, the client enters the TIME_WAIT state. Once the server receives the ACK segment, it enters the CLOSED state, and the connection on the server side is closed. While the client is in the TIME_WAIT state, its side of the TCP connection has not yet been released; the client enters the CLOSED state only after waiting for 2MSL.
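
The half-closed state described above can be observed from Java with `Socket.shutdownOutput()`, which ends the client's sending direction while leaving the receiving direction open. The sketch below is an illustration under assumptions: the class name `HalfCloseDemo` and the "got N bytes" reply are invented for this example, and the FIN/ACK segments themselves are sent by the operating system, not by this code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class HalfCloseDemo {
    /** Reads until end-of-stream, which is reported once the peer has finished sending. */
    static byte[] readToEnd(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        int n;
        while ((n = in.read(chunk)) != -1) {
            collected.write(chunk, 0, n);
        }
        return collected.toByteArray();
    }

    /** Sends a request, half-closes the connection, and returns the server's last data. */
    static String run(String request) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    // Returns only after the client has shut down its sending direction.
                    byte[] received = readToEnd(peer.getInputStream());
                    // The server may still send its last data before closing its own side.
                    String reply = "got " + received.length + " bytes";
                    peer.getOutputStream().write(reply.getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            serverThread.start();

            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.getOutputStream().write(request.getBytes(StandardCharsets.UTF_8));
                client.shutdownOutput(); // the client will send no more data
                // Half-closed: the client can still receive until the server closes.
                String reply = new String(readToEnd(client.getInputStream()), StandardCharsets.UTF_8);
                serverThread.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("bye"));
    }
}
```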

Origin blog.csdn.net/weixin_43847283/article/details/130396371