What happens between entering a URL in the browser and seeing the content of the web page?

The steps in this article assume a simple HTTP request: no HTTPS, no HTTP/2, the simplest possible DNS, no proxy, and a server with no problems, even though this is unrealistic in practice.

First, we type characters one by one on the keyboard, and they appear in the browser's input box on the screen. How is this achieved?
Let's take a look at the hardware architecture around the CPU:
[Figure: hardware architecture diagram of the CPU]

The memory interface in the CPU communicates directly with the system bus, which is connected to an I/O bridge. One side of the I/O bridge connects to the memory bus, allowing the CPU to communicate with memory. The other side connects to an I/O bus, to which I/O devices such as keyboards and monitors are attached.
When the user presses a key, the keyboard controller generates scan code data and buffers it in one of its registers, then sends an interrupt request to the CPU over the bus.
After the CPU receives the interrupt request, the operating system saves the CPU context of the interrupted process and then calls the keyboard's interrupt handler.
The keyboard interrupt handler is registered when the keyboard driver is initialized. Its job is to read the scan code from the keyboard controller's register and use it to determine which character the user typed. If the character is a printable character, the scan code is translated into that character's ASCII code. For example, if the user types the letter A, which is printable, the scan code is translated into the ASCII code for A.
Once it has the ASCII code, the handler puts it into the "read buffer queue". The next step is to show the character on the screen: the display device's driver periodically moves data from the "read buffer queue" into the "write buffer queue", then writes the data in the "write buffer queue" one item at a time into the data buffer register of the display device's controller, and the character finally appears on the screen.
After the result is displayed, the context of the interrupted process is restored.
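The keyboard path described above can be simulated in a few lines of Python. This is only a toy sketch, not how a real driver is written: the two queues are plain deques, the function names are made up for illustration, and the scan-code table is a tiny excerpt in the style of PC scan code set 1.

```python
from collections import deque

# Illustrative excerpt of a scan-code table (set 1 style make codes).
SCAN_CODE_TO_CHAR = {0x1E: "a", 0x30: "b", 0x2E: "c", 0x1C: "\n"}

read_buffer = deque()   # filled by the simulated keyboard interrupt handler
write_buffer = deque()  # drained by the simulated display driver


def keyboard_interrupt_handler(scan_code: int) -> None:
    """Translate a scan code and queue its ASCII code if the character is printable."""
    char = SCAN_CODE_TO_CHAR.get(scan_code)
    if char is not None and char.isprintable():
        read_buffer.append(ord(char))


def display_driver_tick() -> str:
    """Move pending codes to the write buffer, then return what would be drawn."""
    while read_buffer:
        write_buffer.append(read_buffer.popleft())
    shown = "".join(chr(code) for code in write_buffer)
    write_buffer.clear()
    return shown
```

Feeding the handler the codes 0x1E, 0x30 and 0x1C queues only the two printable characters; the Enter key is not a printable character, so it never reaches the display queue in this sketch.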

After typing the URL, we press the Enter key to start the HTTP access; this keypress, too, reaches the CPU as an interrupt request, which is how the access request gets processed.

First, the browser parses the URL:

  1. It determines whether the input is a legal URL (or a keyword to search for), performs operations such as auto-completion and character encoding based on the input, and works out the transfer protocol to use and the path of the requested resource. Illegal characters are escaped.
  2. For security reasons, HSTS may force the client to use HTTPS to access the page. For details, see: HSTS you don't know.
  3. The browser may also perform additional operations, such as security checks and access restrictions (for example, some Chinese browsers previously restricted access to 996.icu).
  4. If there is a valid cache, the browser uses it directly; otherwise it initiates a new request to the server.
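As a rough illustration of step 1, the sketch below uses Python's standard `urllib.parse` to classify address-bar input and split it into protocol, host, port and path. The function name `analyze_input` and the heuristics in it (assume `http://` when no scheme is typed, treat input containing a space or a dotless host as a search keyword) are assumptions made for this example; real browsers use far more elaborate rules.

```python
from urllib.parse import quote, urlsplit


def analyze_input(text: str) -> dict:
    """Classify address-bar input as a URL or a search keyword (simplified)."""
    typed = text.strip()
    if not typed or " " in typed:
        return {"kind": "search", "query": typed}

    # Assumed auto-completion rule: prepend http:// when no scheme was typed.
    candidate = typed if "://" in typed else "http://" + typed
    parts = urlsplit(candidate)
    host = parts.hostname or ""
    if "." not in host and host != "localhost":
        return {"kind": "search", "query": typed}

    return {
        "kind": "url",
        "scheme": parts.scheme,
        "host": host,
        "port": parts.port or 80,
        # Percent-encode characters that are not allowed in a path.
        "path": quote(parts.path or "/", safe="/%"),
    }
```

For example, `analyze_input("example.com/a|b")` is treated as a URL whose path becomes `/a%7Cb`, while `analyze_input("how dns works")` is treated as a search.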

After parsing the URL, if the host is not a plain IP address, DNS resolution is performed. Hosts on the network are located by IP address, so the first step in accessing a URL is to obtain the server's IP address. This is the job of DNS (Domain Name System): it finds the IP address that corresponds to the domain name in the URL. The basic steps are:

  1. First, check the browser's DNS cache. If the browser has a matching record, it is used directly and resolution is complete.
  2. If the browser has no cached record, query the operating system's cache. If a record is found, the IP address is returned directly and resolution is complete.
  3. If the operating system has no DNS cache entry, it checks the local hosts file. On Windows, the hosts file is generally located at "C:\Windows\System32\drivers\etc\hosts". If the hosts file has a record, it is used directly.
  4. If the local hosts file has no matching record, the local DNS server is queried. The local DNS server is generally provided by the local network service provider, such as China Mobile or China Telecom. Normally its address is assigned automatically by DHCP, but it can also be configured manually. Well-known public DNS servers include Google's 8.8.8.8 and, in China, 114.114.114.114.
  5. If the local DNS server has no matching record either, it queries a root name server. There are currently 13 sets of root name servers worldwide (not 13 physical servers but 13 addresses, lettered a through m), an arrangement meant to handle resolution requests for all the world's domain names more efficiently. A root name server does not resolve the domain name itself; it hands the query on to the lower-level servers responsible for it (top-level domain servers, then authoritative servers), which complete the resolution.
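The lookup order in steps 1 to 4 can be sketched as a chain of tables that are tried one after another. Everything here is illustrative: the caches are plain dictionaries, `query_local_dns` is a caller-supplied stand-in for the real network query, and the function names are invented for this example.

```python
from typing import Callable, Optional


def parse_hosts(text: str) -> dict[str, str]:
    """Parse hosts-file text into {hostname: ip}; '#' starts a comment."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            ip, *names = fields
            for name in names:
                table.setdefault(name.lower(), ip)  # first entry wins
    return table


def resolve(
    name: str,
    browser_cache: dict[str, str],
    os_cache: dict[str, str],
    hosts: dict[str, str],
    query_local_dns: Callable[[str], Optional[str]],
) -> tuple[Optional[str], str]:
    """Return (ip, source), trying each layer in the order described above."""
    layers = (
        ("browser cache", browser_cache),
        ("OS cache", os_cache),
        ("hosts file", hosts),
    )
    for source, table in layers:
        if name in table:
            return table[name], source
    ip = query_local_dns(name)
    if ip is not None:
        browser_cache[name] = ip  # remember the answer for the next lookup
        return ip, "local DNS server"
    return None, "unresolved"
```

Calling `resolve` twice for the same name shows the effect of caching: the first call falls through to the stand-in DNS query, and the second is answered from the browser cache.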

Once the IP address is obtained, the client initiates a TCP connection request from a random port (1024-65535) to port 80 of the web server program. On the server, this request enters the kernel's TCP/IP protocol stack (which identifies the connection request and unpacks it, peeling off the headers layer by layer), may pass through the Netfilter firewall (a kernel module), and finally reaches the web program, at which point the TCP/IP connection is established. This connection setup between client and server is the TCP three-way handshake.
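From a program's point of view, the three-way handshake is carried out by the kernel when the client calls connect. The sketch below demonstrates this on the loopback interface with a throwaway local server instead of a real web server on port 80; the function name `demo_tcp_connect` is made up for this example.

```python
import socket
import threading


def demo_tcp_connect() -> tuple[int, int]:
    """Connect to a throwaway local server; return (client_port, server_port)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port (stand-in for 80)
    server.listen(1)
    server_port = server.getsockname()[1]

    def accept_once() -> None:
        conn, _addr = server.accept()
        conn.close()

    worker = threading.Thread(target=accept_once)
    worker.start()

    # create_connection() returns once the kernel has completed the handshake.
    client = socket.create_connection(("127.0.0.1", server_port), timeout=5)
    client_port = client.getsockname()[1]  # source port chosen by the OS
    client.close()
    worker.join()
    server.close()
    return client_port, server_port
```

The client never chooses its own source port here; the operating system assigns one, which corresponds to the "random port" mentioned above.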
After the TCP connection is established, the HTTP message is assembled and the HTTP request is sent (in general, the third handshake segment can already carry the request data). The message is then transmitted through layer-by-layer encapsulation on the sending side and decapsulation on the receiving side, the same way the TCP handshake segments above are transferred.
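To make "assembling the HTTP message" concrete, here is a minimal sketch that builds the bytes of a plain HTTP/1.1 GET request. The function name and the User-Agent value are invented for illustration, and a real browser sends many more headers.

```python
def build_http_request(host: str, path: str = "/", keep_alive: bool = True) -> bytes:
    """Assemble a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",           # request line: method, target, version
        f"Host: {host}",                  # required in HTTP/1.1
        "User-Agent: demo-browser/0.1",   # hypothetical client name
        "Accept: text/html",
        f"Connection: {'keep-alive' if keep_alive else 'close'}",
    ]
    # Header lines end with CRLF; an empty line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

These bytes are what gets handed to the TCP layer, which then wraps them in TCP segments, IP packets and link-layer frames on the way out.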

After receiving the request, the web server parses it, runs the corresponding processing logic, and returns the response data to the browser.
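The server side of this exchange can be sketched as two small functions: one that parses the raw request, and one that builds a response. This is a simplified illustration with made-up function names and a trivial "logic" step, not how a production web server is implemented.

```python
import html


def parse_request(raw: bytes) -> tuple[str, str, str, dict[str, str]]:
    """Split a raw HTTP request into method, target, version and headers."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers: dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


def build_response(body: str, status: str = "200 OK") -> bytes:
    """Assemble a minimal HTTP/1.1 response carrying an HTML body."""
    payload = body.encode("utf-8")
    head = (
        f"HTTP/1.1 {status}\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + payload


def handle(raw: bytes) -> bytes:
    """Toy request handler: echo the requested path back as HTML."""
    method, target, _version, _headers = parse_request(raw)
    if method != "GET":
        return build_response("<h1>Method Not Allowed</h1>", "405 Method Not Allowed")
    return build_response(f"<h1>You asked for {html.escape(target)}</h1>")
```

Passing `b"GET /hello HTTP/1.1\r\nHost: example.com\r\n\r\n"` to `handle` yields a 200 response whose body names the requested path.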

Once the browser has the response data, either side may initiate closing the TCP connection, based on the value of Connection in the request header or on other criteria, so that neither the server nor the client wastes resources. This is the TCP four-way wave (connection termination).
If the value of Connection is close, the server actively closes the TCP connection, and the client closes passively and releases the connection. If the value is keep-alive, the connection is kept open for a period of time, during which further requests can be processed on it.
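The decision described above can be written as a small predicate. The sketch assumes the headers have already been parsed into a dictionary with lowercase names (a convention chosen for this example), and it also reflects the fact that HTTP/1.1 connections are persistent by default while HTTP/1.0 clients must ask for keep-alive explicitly.

```python
def should_keep_alive(version: str, headers: dict[str, str]) -> bool:
    """Decide whether the connection stays open after the response."""
    raw = headers.get("connection", "")
    tokens = {token.strip().lower() for token in raw.split(",") if token.strip()}
    if "close" in tokens:
        return False          # explicit request to close
    if version == "HTTP/1.1":
        return True           # persistent unless told otherwise
    return "keep-alive" in tokens  # HTTP/1.0 has to opt in
```

A server loop would call this after sending each response and either wait for the next request on the same socket or start closing the connection.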

If the response data is not an HTML file, the browser wraps it in one, that is, it nests a layer of HTML around the data. Then it starts rendering the page, which is a process of parsing and rendering:

  1. Parse the HTML file to form the DOM tree.
  2. Parse the CSS files and combine the resulting style rules with the DOM tree to form the render tree.
  3. Once the render tree is built, the browser lays it out and paints it to the screen.
  4. JS is parsed and executed by the browser's JS engine. JS runs on a single thread and may modify the DOM structure, so anything the browser processes before a script finishes executing may turn out to be wasted work. For this reason, a script blocks subsequent parsing and resource downloads until it has run.
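Step 1 can be illustrated with Python's standard `html.parser` module, which reports start tags, end tags and text as it reads the markup. The sketch below turns those events into a simple tree. It is a teaching aid under simplifying assumptions (a hand-picked list of void elements, no error recovery of the kind real browsers perform), and the `Node`, `DomBuilder`, `build_dom` and `outline` names are invented for this example.

```python
from html.parser import HTMLParser


class Node:
    """A very small stand-in for a DOM element."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class DomBuilder(HTMLParser):
    VOID = {"br", "hr", "img", "input", "link", "meta"}  # elements without end tags

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag, then step out of it.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text += data.strip()


def build_dom(html_text):
    builder = DomBuilder()
    builder.feed(html_text)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, one per element."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines
```

Printing `"\n".join(outline(build_dom(page)))` for a small page shows the nesting the browser would work with when it goes on to attach styles and compute layout.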


Origin blog.csdn.net/h295928126/article/details/126961900