A brief introduction to how website access works and how the GFW works

1. Network Layering - OSI

OSI stands for Open Systems Interconnection. The International Organization for Standardization (ISO) developed the OSI model, which divides the work of network communication into seven layers: the physical layer, data link layer, network layer, transport layer, session layer, presentation layer and application layer. Layers 1 to 4 are considered the lower layers and are closely related to moving data. Layers 5 to 7 are the higher layers and carry application-level data. Each layer is responsible for a specific job and then passes the data on to the next layer.
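
The hand-off between layers can be pictured as wrapping and unwrapping. The following is only a toy sketch: real headers are binary structures rather than text tags, and the function names `encapsulate` and `decapsulate` are made up for illustration.

```python
# Toy model of layered encapsulation. Real protocol headers are binary;
# the bracketed text tags here are purely illustrative.
LAYERS = [
    "application",
    "presentation",
    "session",
    "transport",
    "network",
    "data link",
    "physical",
]


def encapsulate(payload: str) -> str:
    """Wrap the payload once per layer, from layer 7 down to layer 1."""
    frame = payload
    for layer in LAYERS:
        frame = f"[{layer}|{frame}]"
    return frame


def decapsulate(frame: str) -> str:
    """Strip the wrappers in the opposite order, from layer 1 up to layer 7."""
    for layer in reversed(LAYERS):
        prefix = f"[{layer}|"
        if not (frame.startswith(prefix) and frame.endswith("]")):
            raise ValueError(f"malformed {layer} wrapper")
        frame = frame[len(prefix):-1]
    return frame


if __name__ == "__main__":
    wire = encapsulate("GET /")
    print(wire)               # outermost wrapper is the physical layer
    print(decapsulate(wire))  # the receiver recovers the original payload
```

The sender works from the application layer downwards, so the physical layer ends up outermost; the receiver peels the wrappers off in reverse.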

2. Website access process

1. Domain name resolution

Domain name resolution is a service that points a domain name to the IP address of the web host, so that people can reach a website through its registered domain name. An IP address is a numeric address that identifies a site on the network. Because numbers are hard to remember, a domain name is used in place of the IP address to identify the site. Domain name resolution is the process of converting a domain name into an IP address, and it is carried out by DNS servers.
Domain name resolution is also called domain name pointing, server settings, domain name configuration, or reverse IP registration. Put simply, a DNS server resolves a memorable domain name to an IP address, and the host at that IP address then binds a subdirectory to the domain name.
Addresses on the Internet are numeric IP addresses; domain names exist mainly to make them easier to remember.

When the browser is given a domain name (for example when visiting Baidu at https://www.baidu.com/), it first obtains the corresponding IP address (for example 202.108.22.5) through domain name resolution. The domain name and the IP address are linked by a mapping.

Domain name resolution order (by priority):

  1. Browser cache (short term)

Browser caching saves network resources and speeds up browsing. The browser stores recently requested documents on the user's disk, so when the visitor requests the page again it can display the document from the local disk, which speeds up page loading. The browser likewise keeps recently resolved domain names for a short time.

  2. Local DNS cache

The Domain Name System (DNS) is an Internet service. It acts as a distributed database that maps domain names and IP addresses to each other, making it easier for people to access the Internet. DNS mainly uses UDP port 53. Each label of a domain name (the part between two dots) is limited to 63 characters, and the total length of the domain name cannot exceed 253 characters.

  3. hosts file

hosts is a system file without an extension that can be opened with tools such as Notepad. It acts as a small "database" that associates commonly used domain names with their IP addresses. When the user enters a URL, the system first looks for the corresponding IP address in the hosts file. If an entry is found, the system uses that address to open the web page immediately. If not, the system submits the domain name to a DNS server for resolution.
Note that the mappings configured in the hosts file are static.

  4. The DNS server set in the network adapter configuration, and the DNS root name servers

The IP address is requested from the DNS server over UDP (transport layer), and the DNS server returns the correct IP address after resolving the name through the Domain Name System (application layer).
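
The priority order above can be sketched as a chain of lookups that stops at the first hit. This is a simplified model rather than how any particular operating system implements it: the caches are plain dictionaries, the "DNS server" is a stand-in dictionary instead of a network query, and all names (`resolve`, `parse_hosts`, `validate_domain`) and addresses are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Lookup = Callable[[str], Optional[str]]


def validate_domain(name: str) -> None:
    """Enforce the length limits: 63 characters per label, 253 in total."""
    if len(name) > 253:
        raise ValueError("domain name longer than 253 characters")
    for label in name.split("."):
        if not 1 <= len(label) <= 63:
            raise ValueError(f"label {label!r} must be 1 to 63 characters")


def parse_hosts(text: str) -> Dict[str, str]:
    """Parse hosts-file text: 'IP name [alias ...]', '#' starts a comment."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            ip, names = fields[0], fields[1:]
            for name in names:
                table[name.lower()] = ip
    return table


def resolve(name: str, sources: List[Tuple[str, Lookup]]) -> Tuple[str, str]:
    """Ask each source in priority order; return (source label, IP)."""
    validate_domain(name)
    for label, lookup in sources:
        ip = lookup(name.lower())
        if ip is not None:
            return label, ip
    raise LookupError(f"could not resolve {name}")


if __name__ == "__main__":
    browser_cache: Dict[str, str] = {}
    os_cache: Dict[str, str] = {}
    hosts = parse_hosts("127.0.0.1 localhost\n192.0.2.10 intranet.example  # static\n")
    # Stand-in for the configured DNS server; a real lookup goes over the network.
    fake_dns = {"www.example.com": "192.0.2.80"}

    sources = [
        ("browser cache", browser_cache.get),
        ("local DNS cache", os_cache.get),
        ("hosts file", hosts.get),
        ("DNS server", fake_dns.get),
    ]
    print(resolve("intranet.example", sources))
    print(resolve("www.example.com", sources))
```

Because the hosts file is consulted before the DNS server, a static entry there overrides whatever the server would have answered.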

2. Send HTTP request

The Hypertext Transfer Protocol (HTTP) is a simple request-response protocol that typically runs on top of TCP. It specifies what kinds of messages the client may send to the server and what kinds of responses it gets back. The headers of the request and response messages are given in ASCII form; the message content has a MIME-like format. This simple model was instrumental in the early success of the Web because it made development and deployment very straightforward.
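
To make the ASCII header format concrete, here is a minimal sketch that builds a GET request and parses the head of a response by hand. It never touches the network; the helper names `build_get_request` and `parse_response_head` are invented for this example, and real clients handle many more cases (chunked bodies, repeated headers and so on).

```python
from typing import Dict, Tuple


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request; every header line ends with CRLF."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # the empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response_head(raw: bytes) -> Tuple[int, str, Dict[str, str]]:
    """Split a raw response into status code, reason phrase and headers."""
    head, _, _body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    parts = status_line.split(" ", 2)
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, reason, headers


if __name__ == "__main__":
    print(build_get_request("www.example.com", "/index.html").decode("ascii"))
    sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello"
    print(parse_response_head(sample))
```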

TCP and UDP are the core of the TCP/IP protocol suite. TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) both belong to the transport layer. TCP provides reliable data transmission over IP. The services it provides include data streaming, reliability, effective flow control, full-duplex operation and multiplexing, and data is delivered in a connection-oriented, end-to-end and reliable way. In layman's terms, TCP opens a connected channel in advance for the data and only then sends it, whereas UDP provides no reliability, flow control or error recovery on top of IP. Generally speaking, TCP suits applications with high reliability requirements, while UDP suits applications with lower reliability requirements that need lightweight transmission.

Because HTTP usually runs on top of the TCP protocol, a TCP connection needs to be established before sending an HTTP request.

TCP three-way handshake

The three-way handshake is the three-step exchange between client and server in preparation for sending data. First handshake: the client sends a SYN packet (syn=j) to the server and enters the SYN_SENT state, waiting for the server's confirmation. Second handshake: the server receives the SYN packet, acknowledges the client's SYN (ack=j+1) and sends its own SYN (syn=k) in the same packet, the SYN+ACK packet; the server then enters the SYN_RECV state. Third handshake: the client receives the SYN+ACK packet from the server and sends an acknowledgment packet ACK (ack=k+1) back. Once this packet has been sent, the client and the server enter the ESTABLISHED state and the three-way handshake is complete. After the connection is established, the client and server can begin transferring data.

Use C to represent the client and S to represent the server. The three-way handshake is as follows:
C->S
S knows that the sending function of C is normal
S knows that the receiving function of S is normal
S->C
C knows that the sending function of C is normal
C knows that the receiving function of C is normal
C knows that the receiving function of S is normal
C knows that the sending function of S is normal
C->S
S knows that the sending function of S is normal
S knows that the receiving function of C is normal
After these confirmations in both directions, the communication can be established.
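
The sequence and acknowledgment numbers in the handshake can be traced with a small simulation. This is a sketch of the bookkeeping only, with no real sockets involved: the `Segment` and `Endpoint` classes are hypothetical, the initial sequence numbers play the roles of j and k, and the client state uses the conventional name SYN_SENT.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: int = 0
    ack_no: Optional[int] = None


class Endpoint:
    def __init__(self, name: str, isn: int) -> None:
        self.name = name
        self.isn = isn  # initial sequence number (j for the client, k for the server)
        self.state = "CLOSED"


def three_way_handshake(client: Endpoint, server: Endpoint) -> List[Tuple[str, Segment]]:
    """Run the three steps and return the segments exchanged, in order."""
    server.state = "LISTEN"
    log: List[Tuple[str, Segment]] = []

    # First handshake: client -> server, SYN with seq=j.
    syn = Segment(syn=True, seq=client.isn)
    client.state = "SYN_SENT"
    log.append(("C->S", syn))

    # Second handshake: server -> client, SYN+ACK with seq=k and ack=j+1.
    syn_ack = Segment(syn=True, ack=True, seq=server.isn, ack_no=syn.seq + 1)
    server.state = "SYN_RECV"
    log.append(("S->C", syn_ack))

    # The client checks that the server really acknowledged its SYN.
    if syn_ack.ack_no != client.isn + 1:
        raise ConnectionError("client received an unexpected acknowledgment number")

    # Third handshake: client -> server, ACK with ack=k+1.
    ack = Segment(ack=True, seq=client.isn + 1, ack_no=syn_ack.seq + 1)
    client.state = "ESTABLISHED"
    log.append(("C->S", ack))

    # The server checks that the client acknowledged the server's SYN.
    if ack.ack_no != server.isn + 1:
        raise ConnectionError("server received an unexpected acknowledgment number")
    server.state = "ESTABLISHED"
    return log


if __name__ == "__main__":
    c, s = Endpoint("C", isn=100), Endpoint("S", isn=300)
    for direction, segment in three_way_handshake(c, s):
        print(direction, segment)
    print(c.state, s.state)
```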

3. Packet transmission

Internet data transmission service refers to the service provided by authorized operators (the three major ISPs: China Mobile, China Telecom and China Unicom), who build the Internet backbone networks and metropolitan area networks and use the international Internet gateways. Operators without the right to operate domestic communication facilities may not build domestic transmission facilities and must lease them from operators that hold the corresponding operating rights.

An Internet Service Provider (ISP) is an operator that provides the following services to the public: first, access services, which help users connect to the Internet; second, navigation services, which help users find the information they need on the Internet; third, information services, which means building a data service system that collects, processes and stores information, maintains and updates it regularly, and provides information content to users over the network.

The transmission of data packets relies mainly on the huge backbone networks and metropolitan area networks built by several state-owned operators.
There are also data exchange centers between the backbone networks of different ISPs, so that data packets can flow freely from anywhere in the country to anywhere else.
How do data packets avoid getting lost on this vast "online highway"? At every level of network (local area network -> metropolitan area network -> wide area network) there are countless routing nodes. Each backbone network has its own group of routers and nodes, and the whole group is collectively called an autonomous system (AS).

Each autonomous system (AS) managed by a backbone network is assigned a unique identification number by an international organization, the Internet Assigned Numbers Authority (IANA). For example, the AS number of the China Telecom 163 backbone network is AS4134. Each backbone network runs an internal routing protocol, through which its nodes exchange, according to fixed rules, information about the IP addresses they can reach; this information guides data packets on their "journey". The national backbone networks in turn rely on an external routing protocol, typically BGP, to exchange the "maps" of reachable servers that each of them holds.

Border Gateway Protocol (BGP) is a routing protocol between autonomous systems that runs over TCP. BGP is the only protocol designed to handle Internet-sized networks, and the only protocol that can properly handle multiple connections between unrelated routing domains. BGP builds on the experience of EGP. The main function of a BGP system is to exchange network reachability information with other BGP systems. This reachability information includes the list of autonomous systems (AS) that a route passes through. It is enough to construct a topology map of AS interconnections, from which routing loops can be eliminated and policy decisions can be applied at the AS level.
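
The way the AS list prevents loops can be sketched in a few lines. This is a heavily reduced model, assuming that the only selection rule is "shortest AS path wins"; the real BGP decision process has many more steps. The `BgpSpeaker` class is hypothetical, and the AS numbers and prefixes come from ranges reserved for documentation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]  # nearest AS first, originating AS last


class BgpSpeaker:
    def __init__(self, asn: int) -> None:
        self.asn = asn
        self.best: Dict[str, Route] = {}

    def receive(self, route: Route) -> bool:
        """Return True if the route was accepted as the new best route."""
        if self.asn in route.as_path:
            return False  # our own AS is already on the path, so this is a loop
        current = self.best.get(route.prefix)
        if current is None or len(route.as_path) < len(current.as_path):
            self.best[route.prefix] = route
            return True
        return False

    def advertise(self, prefix: str) -> Optional[Route]:
        """Prepend our own AS number before passing the best route on."""
        route = self.best.get(prefix)
        if route is None:
            return None
        return Route(prefix, (self.asn,) + route.as_path)


if __name__ == "__main__":
    speaker = BgpSpeaker(64500)
    print(speaker.receive(Route("203.0.113.0/24", (64501, 64502))))         # accepted
    print(speaker.receive(Route("203.0.113.0/24", (64503, 64500, 64502))))  # loop, rejected
    print(speaker.advertise("203.0.113.0/24"))
```

Because every AS prepends its own number when it passes a route on, a route that comes back to an AS it already visited is recognisable and can be discarded.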

4. Transfer data, close the connection

3. GFW principle

The Great Firewall (commonly abbreviated GFW, also known as the National Firewall of China, the Wall, the Great Wall of the Internet, Kung Fu Network, etc.) is China's Internet border censorship system (including the related administrative censorship systems). The system started in 1998.

1. Domain name resolution service hijacking/DNS cache pollution based on UDP protocol

The GFW runs intrusion detection system (IDS) inspection on all UDP-based DNS query requests that pass through the backbone egress routers. Once it finds a query that matches a blacklisted keyword, the Great Firewall, acting as an intermediate device, answers the query itself with a false result.
As a result, the browser cannot obtain the correct IP address for the domain name. Its data packets are sent to the fake IP address, so there is no response and the site cannot be accessed.
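
The reason this kind of inspection is possible is that a UDP DNS query carries the domain name in plaintext. The sketch below encodes a query by hand and reads the name back out of the raw bytes, which is all an on-path device needs in order to compare it with a keyword list. The function names and the keywords are hypothetical; the actual matching rules are not public.

```python
import struct
from typing import Iterable


def build_query(name: str, txid: int = 0x1234) -> bytes:
    """Encode a standard DNS query for an A record (type 1, class IN)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)


def extract_qname(packet: bytes) -> str:
    """Read the queried name from the question section of a DNS query."""
    labels = []
    pos = 12  # the fixed DNS header is 12 bytes long
    while True:
        length = packet[pos]
        if length == 0:
            break
        if length > 63:
            raise ValueError("compressed or invalid label in question section")
        labels.append(packet[pos + 1 : pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return ".".join(labels)


def matches_blacklist(packet: bytes, keywords: Iterable[str]) -> bool:
    """Return True if any keyword appears in the queried name."""
    name = extract_qname(packet).lower()
    return any(keyword in name for keyword in keywords)


if __name__ == "__main__":
    query = build_query("www.example.com")
    print(extract_qname(query))
    print(matches_blacklist(query, ["example"]))  # hypothetical keyword
```

The transaction ID at the start of the packet is equally visible, so a forged answer can copy it and look like a legitimate reply to the client.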

2. Manual blocking of IP addresses or transport layer ports - BGP route hijacking/"routing black hole"

BGP hijacking forges the routing table of a routing node on the backbone: the node imports IP addresses that it is not, or cannot be, connected to into its routing table, and tricks neighboring nodes into believing that it has a path to those addresses. The GFW manually maintains a block list of specific IP addresses and performs this route spoofing for them. Such a node is called a "routing black hole".
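
One way to picture a routing black hole is through longest-prefix matching: routers prefer the most specific route that covers a destination, so a more specific entry for a blocked address wins over the legitimate, broader route. The sketch below uses Python's standard `ipaddress` module; the table contents, the next-hop labels and the `best_route` helper are all made up for illustration and say nothing about how the real block list is distributed.

```python
import ipaddress
from typing import List, Optional, Tuple

RouteEntry = Tuple[ipaddress.IPv4Network, str]


def best_route(table: List[RouteEntry], destination: str) -> Optional[str]:
    """Return the next hop of the most specific route covering the destination."""
    address = ipaddress.ip_address(destination)
    candidates = [(net, hop) for net, hop in table if address in net]
    if not candidates:
        return None
    _net, hop = max(candidates, key=lambda entry: entry[0].prefixlen)
    return hop


if __name__ == "__main__":
    table = [
        (ipaddress.ip_network("0.0.0.0/0"), "upstream"),
        (ipaddress.ip_network("203.0.113.0/24"), "peer-a"),
        # A more specific entry for one blocked address leads nowhere.
        (ipaddress.ip_network("203.0.113.7/32"), "blackhole"),
    ]
    print(best_route(table, "203.0.113.7"))  # the /32 entry wins
    print(best_route(table, "203.0.113.8"))  # still reaches peer-a
```

Packets for the blocked address are handed to a next hop that simply drops them, while neighbouring addresses in the same network are unaffected.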

3. TCP RST reset

Bypass (out-of-path) monitoring generally works by mirroring the traffic of the main switch to a control system, which can capture the packets with libpcap. In this setup, to block a TCP connection the control system only needs to impersonate the server: when it sees the first handshake, it forges the second handshake response. Because the control system is on the internal network, its forged message reaches the client sooner than the server's real one, so the client answers the forged message with the third handshake and no longer processes the real message when it arrives. When the client then requests data from the server, the seq and ack numbers are wrong, so the server does not accept the client's request.
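
The race described above can be reduced to a few lines of arithmetic. This is a sketch of the reasoning only, with no packets sent or captured: arrival times are plain numbers in milliseconds, and the function names are invented for this example.

```python
from typing import List, Tuple

# (arrival time in ms, sender label, sequence number carried by the SYN+ACK)
SynAck = Tuple[float, str, int]


def client_accepts(replies: List[SynAck]) -> SynAck:
    """The client acts on whichever SYN+ACK arrives first and ignores the rest."""
    return min(replies, key=lambda reply: reply[0])


def server_accepts(server_isn: int, client_ack_no: int) -> bool:
    """The real server only accepts an ACK that acknowledges its own SYN (k+1)."""
    return client_ack_no == server_isn + 1


def simulate(server_isn: int, forged_isn: int,
             server_delay: float, forged_delay: float) -> bool:
    """Return True if the connection to the real server can be established."""
    replies = [
        (server_delay, "server", server_isn),
        (forged_delay, "forged", forged_isn),
    ]
    _time, _sender, accepted_seq = client_accepts(replies)
    # The client's third handshake acknowledges the sequence number it accepted.
    return server_accepts(server_isn, accepted_seq + 1)


if __name__ == "__main__":
    # Forged reply arrives first: the client's ACK no longer matches the server.
    print(simulate(server_isn=5000, forged_isn=9999, server_delay=40.0, forged_delay=2.0))
    # Forged reply arrives late: the genuine handshake completes.
    print(simulate(server_isn=5000, forged_isn=9999, server_delay=40.0, forged_delay=80.0))
```

The outcome depends only on which reply reaches the client first, which is why a device closer to the client has the advantage.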

4. Protocol detection → unpacking according to traffic protocol → keyword matching → blocking

HTTP has very distinctive characteristics, so the GFW can easily detect and identify it. The GFW then unpacks the data packets according to the rules of the HTTP protocol. Because the content is plaintext, it can be matched against keywords directly. For example, the GFW extracts the requested URL from an HTTP GET request and checks it against its keywords, such as whether "Twitter" appears in the URL. In practice, the keyword matching is done with efficient regular expressions.
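
The detect, unpack and match steps can be sketched as follows. This only illustrates why plaintext HTTP is easy to inspect: the regular expressions, the keyword list and the helper names `requested_url` and `should_block` are assumptions for the example, not the actual rules, which are not public.

```python
import re
from typing import Optional

# Hypothetical keyword list compiled into a single regular expression.
BLACKLIST = re.compile(r"twitter|blocked-keyword", re.IGNORECASE)

# A plaintext HTTP request starts with a recognisable request line.
REQUEST_LINE = re.compile(r"^(GET|POST|HEAD|PUT|DELETE|OPTIONS) (\S+) HTTP/1\.[01]$")


def requested_url(raw_request: bytes) -> Optional[str]:
    """Rebuild host + path from a plaintext HTTP request, or None if it is not HTTP."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("ascii", errors="replace")
    lines = head.split("\r\n")
    match = REQUEST_LINE.match(lines[0])
    if match is None:
        return None  # protocol detection failed: not an HTTP request line
    path = match.group(2)
    host = ""
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()
            break
    return host + path


def should_block(raw_request: bytes) -> bool:
    """Return True if the reconstructed URL matches the keyword list."""
    url = requested_url(raw_request)
    return url is not None and BLACKLIST.search(url) is not None


if __name__ == "__main__":
    request = b"GET /home HTTP/1.1\r\nHost: twitter.com\r\n\r\n"
    print(requested_url(request))
    print(should_block(request))
```

Traffic that does not look like an HTTP request line, such as an encrypted TLS record, yields no URL at this stage, which is why the later sections deal with traffic that cannot be read directly.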

5. Deep packet inspection (machine learning to identify traffic over the wall → direct blocking)

For obfuscated traffic and non-standard encryption protocols, the GFW uses "artificial intelligence" techniques to tell this hard-to-identify traffic apart from the regular cross-border traffic of government and enterprises.

6. DDoS attack

A distributed denial-of-service (DDoS) attack is one in which multiple attackers in different locations attack one or several targets at the same time, or in which a single attacker controls multiple machines in different locations and uses them to attack the victim simultaneously. Because the attack originates from many different places, it is called a distributed denial-of-service attack, and there can be more than one attacker.

Copyright Instructions
For term explanations, please refer to Baidu Encyclopedia.
Part of the content refers to Weibo blogger Lili-Storm Spirits.

Origin blog.csdn.net/qq_50216270/article/details/121211597