Linux · What happens from URL input to page display?

What happens behind the scenes between entering a URL in the browser and the web page appearing on screen? What steps does the request go through? Here is an overall flow chart first; see the breakdown below for the specific steps.

From URL input to page display

Generally speaking, it is divided into the following processes:

  • DNS resolution: resolve domain names into IP addresses
  • TCP connection: TCP three-way handshake
  • HTTP request: the browser sends the HTTP request
  • The server processes the request and returns an HTTP message
  • The browser parses and renders the page
  • Disconnect: TCP four-way wave (connection teardown)

1. What is a URL?

A URL (Uniform Resource Locator) is used to locate resources on the Internet and is commonly known as a web address.
For example, http://www.w3school.com.cn/ht... follows these syntax rules:


scheme://host.domain:port/path/filename

The components are explained as follows:
scheme - defines the type of Internet service. Common schemes are http, https, ftp, and file; http is the most common, and https is used for encrypted network transmission.
host - defines the host within the domain (www is the conventional host name for http)
domain - defines the Internet domain name, such as w3school.com.cn
port - defines the port number on the host (the default port number for http is 80)
path - defines the path on the server (if omitted, the document must be located in the root directory of the site)
filename - defines the name of the document/resource
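As a rough illustration of these components, the sketch below splits a URL with Python's standard urllib.parse module. The URL www.example.com:8080/docs/index.html and the helper name split_url are made up for this example, and the default-port table is an assumption covering only a few well-known schemes.

```python
from urllib.parse import urlsplit

# Well-known default ports; a URL that omits the port relies on these.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def split_url(url):
    """Split a URL into the scheme/host/port/path/filename pieces described above."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    # Everything before the last "/" is the path, the rest is the filename.
    path, _, filename = parts.path.rpartition("/")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": path or "/",
        "filename": filename,
    }


# Hypothetical URL used only for illustration.
print(split_url("http://www.example.com:8080/docs/index.html"))
# {'scheme': 'http', 'host': 'www.example.com', 'port': 8080, 'path': '/docs', 'filename': 'index.html'}
```

When the port is left out, as in http://www.example.com/, the helper falls back to port 80 for http.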

2. Domain name resolution (DNS)

After the URL is entered, the browser must first perform domain name resolution, because it cannot locate the server by domain name directly; it needs the IP address. You may have a question here: a computer can be given an IP address as well as a host name and domain name, for example www.hackr.jp. So why not just use the IP address from the start and skip resolution altogether? Let's first look at what an IP address is.

1. IP address

An IP address is an Internet Protocol address. It is a unified address format provided by the IP protocol, which assigns a logical address to each network and each host on the Internet to hide the differences in physical addresses. An IPv4 address is a 32-bit binary number; for example, 127.0.0.1 is the loopback address of the local machine.
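To make the "32-bit binary number" concrete, here is a small sketch using Python's standard ipaddress module. It only converts the dotted form to an integer and to bits; the variable names are illustrative.

```python
import ipaddress

addr = ipaddress.IPv4Address("127.0.0.1")

# The dotted form is just a readable spelling of one 32-bit integer.
as_int = int(addr)
as_bits = format(as_int, "032b")
octets = [as_bits[i:i + 8] for i in range(0, 32, 8)]

print(as_int)            # 2130706433
print(".".join(octets))  # 01111111.00000000.00000000.00000001
print(addr.is_loopback)  # True
```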
A domain name is, in effect, an IP address wearing a mask. Its role is to make a server's address easier to remember and share. Users typically reach other computers by host name or domain name rather than directly by IP address, because a name made of letters and digits fits human memory better than a string of pure numbers. Computers, however, find names relatively hard to work with and are better at handling long strings of numbers. The DNS service was created to bridge this gap.

2. What is domain name resolution

The DNS protocol provides the service of looking up an IP address from a domain name, or reverse-looking up a domain name from an IP address. DNS is served by network servers, and setting up domain name resolution simply means adding a record on a DNS server.

For example: baidu.com -> 220.114.23.56 (the server's public IP address). The server's port number (80 for HTTP by default) is not part of the DNS address record; it comes from the URL.

3. How does the browser find the IP address for the domain name in the URL?

  • Browser cache: browsers cache DNS records for a certain period of time.
  • Operating system cache: if the desired DNS record cannot be found in the browser cache, the operating system's cache is checked.
  • Router cache: routers also have DNS caches.
  • ISP's DNS server: the ISP (Internet Service Provider) has a dedicated DNS server that responds to DNS queries.
  • Root server: if the ISP's DNS server still cannot find the record, it resolves the name on the client's behalf (a recursive query), starting at the root server: it first asks the root name server for the address of the .com name server, then asks the .com name server for the baidu.com name server, and so on.

DNS resolution process
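The lookup order above can be sketched as a chain of caches with an upstream query as the last resort. This is a simplified model rather than how any particular browser or resolver is implemented: the function resolve, the AUTHORITATIVE table, and the host www.example.com are all hypothetical, and 192.0.2.10 is an address reserved for documentation.

```python
def resolve(domain, caches, query_upstream):
    """Return (ip, source): check each cache in order, then ask upstream."""
    for name, cache in caches:
        if domain in cache:
            return cache[domain], name
    ip = query_upstream(domain)
    # Store the answer so that later lookups are served from the caches.
    for _, cache in caches:
        cache[domain] = ip
    return ip, "upstream DNS"


# Hypothetical stand-in for the ISP/root/authoritative servers.
AUTHORITATIVE = {"www.example.com": "192.0.2.10"}

caches = [
    ("browser cache", {}),
    ("operating system cache", {}),
    ("router cache", {}),
]

first = resolve("www.example.com", caches, AUTHORITATIVE.__getitem__)
second = resolve("www.example.com", caches, AUTHORITATIVE.__getitem__)
print(first)   # ('192.0.2.10', 'upstream DNS')
print(second)  # ('192.0.2.10', 'browser cache')
```

The second lookup never leaves the machine, which is the point of checking the nearest cache first.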

4. Summary

The browser sends the domain name to the DNS server, which looks up the corresponding IP address and returns it to the browser. The browser then uses that IP address as the destination of its request, which also carries the request parameters, and sends it to the corresponding server. The next stage is sending the HTTP request to the server, which has three parts: the TCP three-way handshake, the HTTP request and response messages, and closing the TCP connection.

3. TCP three-way handshake

Before the client sends data, it initiates a TCP three-way handshake to synchronize the sequence numbers and acknowledgment numbers of the client and server, and to exchange TCP window size information.

1. The process of TCP three-way handshake is as follows:

  • The client sends a segment with SYN=1, Seq=X to the server port (the first handshake, initiated by the browser, tells the server: I am about to send a request).
  • The server replies with a segment with SYN=1, ACK=X+1, Seq=Y to confirm (the second handshake, initiated by the server, tells the browser: I am ready to receive, go ahead and send).
  • The client replies with a segment with ACK=Y+1, Seq=X+1, which completes the handshake (the third handshake, sent by the browser, tells the server: I am about to send, get ready to receive).
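The arithmetic of the three segments can be checked with a toy model. The sketch below does not open a real socket; the function three_way_handshake and the initial sequence numbers 100 and 300 are invented for illustration. It shows that each acknowledgment number is the peer's sequence number plus one, and that the client's third segment carries its own initial sequence number plus one.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as plain dictionaries."""
    # 1st handshake: client -> server
    syn = {"flags": "SYN", "seq": client_isn}
    # 2nd handshake: server -> client, acknowledging the client's SYN
    syn_ack = {"flags": "SYN,ACK", "seq": server_isn, "ack": syn["seq"] + 1}
    # 3rd handshake: client -> server, acknowledging the server's SYN
    ack = {"flags": "ACK", "seq": syn_ack["ack"], "ack": syn_ack["seq"] + 1}
    return [syn, syn_ack, ack]


# Hypothetical initial sequence numbers X=100 and Y=300.
for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
# {'flags': 'SYN', 'seq': 100}
# {'flags': 'SYN,ACK', 'seq': 300, 'ack': 101}
# {'flags': 'ACK', 'seq': 101, 'ack': 301}
```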

2. Why do you need a three-way handshake?

According to Xie Xiren's "Computer Network", the purpose of the three-way handshake is "to prevent an invalidated connection request segment from suddenly reaching the server and causing errors."

4. Send HTTP request

After the TCP three-way handshake completes, the client begins sending the HTTP request message.
The request message consists of three parts: the request line, the request headers, and the request body, as shown in the following figure:

1. The request line contains the request method, URL, and protocol version

  • Common request methods include GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS, and TRACE.
  • The URL is the request address, in the form <protocol>://<host>:<port>/<path>?<parameters>
  • The protocol version is the HTTP version number

POST /chapter17/user.html HTTP/1.1

In the line above, "POST" is the request method, "/chapter17/user.html" is the URL, and "HTTP/1.1" is the protocol and its version. HTTP/1.1 is still very widely used.
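A request line is easy to take apart because its three fields are separated by spaces. The sketch below is a minimal parser written for illustration; the function name parse_request_line is hypothetical and no validation is attempted.

```python
def parse_request_line(line):
    """Split an HTTP request line into method, target, and protocol version."""
    method, target, version = line.split()
    return {"method": method, "target": target, "version": version}


print(parse_request_line("POST /chapter17/user.html HTTP/1.1"))
# {'method': 'POST', 'target': '/chapter17/user.html', 'version': 'HTTP/1.1'}
```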

2. The request headers carry additional information about the request as keyword/value pairs, one pair per line, with the keyword and value separated by a colon ":".

Request headers tell the server about the client's request. They contain a lot of useful information about the client environment and the request body. For example: Host gives the host name, which lets the server select a virtual host; Connection, added in HTTP/1.1, requests a persistent connection with keep-alive so that one connection can carry multiple requests; User-Agent identifies the software sending the request, which servers use for compatibility handling and customization.

3. The request body carries the request data, which may contain multiple request parameters. It follows the blank line (carriage return and line feed) that ends the headers. Not all requests have a body.

name=tom&password=1234&realName=tomson

The above code carries three request parameters: name, password, and realName.
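Putting the three parts together, the sketch below assembles a complete request message as text and then reads the parameters back out of the body with the standard urllib.parse module. It never touches the network. The helper build_request, the host www.example.com, and the choice of Content-Type are assumptions made for this example.

```python
from urllib.parse import parse_qs, urlencode


def build_request(method, target, headers, params):
    """Assemble request line + headers + blank line + form-encoded body."""
    body = urlencode(params)
    all_headers = dict(headers)
    all_headers["Content-Type"] = "application/x-www-form-urlencoded"
    all_headers["Content-Length"] = str(len(body.encode("ascii")))
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{key}: {value}" for key, value in all_headers.items()]
    # Lines end with CRLF; an empty line separates the headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


request = build_request(
    "POST",
    "/chapter17/user.html",
    {"Host": "www.example.com", "Connection": "keep-alive"},  # hypothetical host
    {"name": "tom", "password": "1234", "realName": "tomson"},
)

head, _, body = request.partition("\r\n\r\n")
params = parse_qs(body)

print(head.splitlines()[0])  # POST /chapter17/user.html HTTP/1.1
print(body)                  # name=tom&password=1234&realName=tomson
print(params["realName"])    # ['tomson']
```

parse_qs returns a list for every key because a parameter name may legitimately appear more than once in a body.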

5. The server processes the request and returns an HTTP message

1. Server

A server is a high-performance computer in a network environment. It listens for service requests submitted by other computers (clients) on the network and provides the corresponding services, such as web pages, file downloads, mail, and video. A client, by contrast, is mainly used to browse the web, watch videos, listen to music, and so on; the two play completely different roles. Each server runs an application that handles requests: the web server. Common web server products include Apache, Nginx, IIS, and Lighttpd.
The web server plays a management and control role. Based on its configuration files, it hands each user's request to the program on the server that handles that kind of request (such as CGI scripts, JSP scripts, servlets, ASP scripts, server-side JavaScript, or some other server-side technology), and then returns the result of that back-end processing as the response.

Difference between server and client

2. MVC back-end processing stage

There are many frameworks for back-end development, but most of them are built on the MVC design pattern.
MVC is a design pattern that divides an application into three core components: model, view, and controller. Each handles its own tasks, which separates input, processing, and output.

MVC architecture

1. View

The view is the user-facing interface, the outer shell of the program.

2. Model

The model is mainly responsible for data interaction. Of the three components of MVC, the model has the most processing tasks. A model can provide data for multiple views.

3. Controller

The controller selects data from the model layer according to the instructions the user enters in the view layer, and performs the corresponding operations on it to produce the final result. It acts as a manager: it receives the request from the view, decides which model component to call to process it, and then decides which view to use to display the data the model returns.
The three layers are closely related yet independent of each other, so a change inside one layer does not affect the others. Each layer exposes an interface for the layer above to call.
What happens at this stage? In short, the request sent by the browser first reaches the controller, which performs logic processing and request dispatch and then calls the model. The model fetches data from stores such as Redis and MySQL, the page is rendered with that data, and the result is returned to the client as a response message. Finally, the browser presents the web page to the user through its rendering engine.
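The division of labor can be shown in a few lines. This is a deliberately tiny sketch, not the structure of any real framework: UserModel, user_view, not_found_view, UserController, and the /users/<id> path are all invented for the example, and an in-memory dictionary stands in for Redis or MySQL.

```python
class UserModel:
    """Model: owns the data (a dict stands in for Redis/MySQL here)."""

    def __init__(self):
        self._users = {1: "tom"}

    def find(self, user_id):
        return self._users.get(user_id)


def user_view(name):
    """View: turns model data into the HTML returned to the browser."""
    return f"<h1>Hello, {name}</h1>"


def not_found_view():
    return "<h1>User not found</h1>"


class UserController:
    """Controller: receives the request, calls the model, picks a view."""

    def __init__(self, model):
        self.model = model

    def handle(self, path):
        user_id = int(path.rsplit("/", 1)[-1])
        name = self.model.find(user_id)
        if name is None:
            return 404, not_found_view()
        return 200, user_view(name)


controller = UserController(UserModel())
print(controller.handle("/users/1"))  # (200, '<h1>Hello, tom</h1>')
print(controller.handle("/users/2"))  # (404, '<h1>User not found</h1>')
```

Swapping the dictionary for a real database would change only UserModel, which is the independence between layers that the pattern aims for.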

3. HTTP response message

The response message consists of three parts: the response line (also called the status line), the response headers, and the response body. As shown below:

(1) The response line contains: protocol version, status code, status code description

The status code rules are as follows:
1xx: Informational - the request has been received and processing continues.
2xx: Success - the request has been successfully received, understood, and accepted.
3xx: Redirection - further action is necessary to complete the request.
4xx: Client Error - the request has a syntax error or cannot be fulfilled.
5xx: Server Error - the server failed to fulfill a valid request.
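Because the class of a status code is determined by its first digit, it can be derived with integer division. The sketch below parses a response line and labels it; the names STATUS_CLASSES and parse_status_line are hypothetical.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line):
    """Split 'HTTP/1.1 404 Not Found' into its parts and classify the code."""
    # The description may contain spaces, so split at most twice.
    version, code, reason = line.split(" ", 2)
    code = int(code)
    return {
        "version": version,
        "code": code,
        "reason": reason,
        "class": STATUS_CLASSES[code // 100],
    }


print(parse_status_line("HTTP/1.1 200 OK")["class"])         # Success
print(parse_status_line("HTTP/1.1 404 Not Found")["class"])  # Client Error
```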

(2) The response header contains additional information of the response message, consisting of name/value pairs

(3) The response body contains the data returned by the response and follows the blank line (carriage return and line feed) that ends the headers. Not all response messages have a body.

6. The browser parses and renders the page

Once the browser has received the HTML in the response body, it parses and renders it. The browser rendering mechanism is introduced below.

The browser parses and renders the page in five steps:

  • Parse the HTML into a DOM tree
  • Parse the CSS into a CSS rule tree
  • Combine the DOM tree and CSS rule tree to generate the rendering tree
  • Calculate the layout information of each node from the rendering tree
  • Draw the page based on the calculated information

1. Parse the DOM tree according to HTML

  • The tags in the HTML content are parsed into a DOM tree that mirrors the document structure. DOM tree construction is a depth-first traversal: all child nodes of the current node are built first, and then the next sibling node is built.
  • While the browser reads the HTML document and builds the DOM tree, encountering a script tag suspends DOM construction until the script has finished executing.
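The depth-first construction order can be observed with Python's standard html.parser module. This is only a sketch of the idea, far simpler than a real browser parser: the Node and TreeBuilder classes are invented for the example, and the input is assumed to contain only properly closed tags (an unclosed void element such as a bare br would confuse it).

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Build a tree of tags and record the order in which nodes are created."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root
        self.order = []

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.order.append(tag)
        self.current = node  # descend: children are built before siblings

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent  # climb back to the parent


builder = TreeBuilder()
builder.feed("<html><body><div><p></p></div><span></span></body></html>")
print(builder.order)  # ['html', 'body', 'div', 'p', 'span']
```

The p inside div is created before the sibling span, which is the depth-first order described above.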

2. Generate a CSS rule tree based on CSS analysis

  • JavaScript execution is paused while the CSS rule tree is being built, until the tree is ready.
  • The browser does not render until the CSS rule tree has been generated.

3. Combine DOM tree and CSS rule tree to generate rendering tree

  • Once both the DOM tree and the CSS rule tree are ready, the browser starts building the rendering tree.
  • Simplifying the CSS speeds up construction of the CSS rule tree, and therefore speeds up page response.

4. Calculate the information (layout) of each node according to the rendering tree

  • Layout: calculate the position and size of each render object from the information in the rendering tree.
  • Reflow: if, after layout is complete, some part changes in a way that affects the layout, the browser has to go back and lay out the affected part again.

5. Draw the page according to the calculated information

  • In the drawing phase, the browser traverses the rendering tree and calls each render object's "paint" method to display its contents on the screen.
  • Repaint: changes to properties such as an element's background color or text color do not affect the layout around or inside the element, so they only cause the browser to repaint.
  • Reflow: if the size of an element changes, the layout must be recalculated and the affected part of the page rendered again.

7. Disconnect

When data transmission is complete, the TCP connection needs to be closed. At this point, TCP performs the four-way wave.

  • The initiator sends a segment with FIN, Ack, Seq to the passive party, indicating that it has no more data to send, and enters the FIN_WAIT_1 state. (The first wave: initiated by the browser, telling the server: my request message has been sent, get ready to close.)
  • The passive party sends a segment with Ack, Seq, acknowledging the close request. On receiving it, the initiator enters the FIN_WAIT_2 state. (The second wave: initiated by the server, telling the browser: I have finished receiving the request message and am getting ready to close, so you should too.)
  • The passive party sends a segment with FIN, Ack, Seq to the initiator, requesting to close the connection, and enters the LAST_ACK state. (The third wave: initiated by the server, telling the browser: my response message has been sent, get ready to close.)
  • The initiator sends a segment with Ack, Seq to the passive party and enters the TIME_WAIT state. The passive party closes the connection once it receives this segment. If the initiator receives nothing further during the waiting period, it closes normally as well. (The fourth wave: initiated by the browser, telling the server: I have finished receiving the response message and am getting ready to close, so you should too.)
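The state changes during the four waves can be traced with a small table-driven model. It is a simplification that ignores timers, retransmission, and simultaneous close; the names WAVES and close_connection are hypothetical. CLOSE_WAIT, the state the passive party is in between the second and third waves, is part of standard TCP although the list above does not name it.

```python
# (segment, sender, sender's state after sending, receiver's state after receiving)
WAVES = [
    ("FIN", "initiator", "FIN_WAIT_1", "CLOSE_WAIT"),
    ("ACK", "passive", "CLOSE_WAIT", "FIN_WAIT_2"),
    ("FIN", "passive", "LAST_ACK", "FIN_WAIT_2"),
    ("ACK", "initiator", "TIME_WAIT", "CLOSED"),
]


def close_connection():
    """Return (segment, sender, initiator_state, passive_state) after each step."""
    state = {"initiator": "ESTABLISHED", "passive": "ESTABLISHED"}
    trace = []
    for segment, sender, sender_state, receiver_state in WAVES:
        receiver = "passive" if sender == "initiator" else "initiator"
        state[sender] = sender_state
        state[receiver] = receiver_state
        trace.append((segment, sender, state["initiator"], state["passive"]))
    # The initiator leaves TIME_WAIT only after its waiting period expires.
    state["initiator"] = "CLOSED"
    trace.append(("timeout", "initiator", state["initiator"], state["passive"]))
    return trace


for step in close_connection():
    print(step)
# ('FIN', 'initiator', 'FIN_WAIT_1', 'CLOSE_WAIT')
# ('ACK', 'passive', 'FIN_WAIT_2', 'CLOSE_WAIT')
# ('FIN', 'passive', 'FIN_WAIT_2', 'LAST_ACK')
# ('ACK', 'initiator', 'TIME_WAIT', 'CLOSED')
# ('timeout', 'initiator', 'CLOSED', 'CLOSED')
```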

Origin blog.csdn.net/m0_64560763/article/details/131792974