What happens between entering a URL and the page being displayed

 

1. Enter the address

    As soon as we start typing a URL, the browser begins matching it against likely candidates. It searches history, bookmarks, and similar sources for URLs that correspond to the typed string and offers suggestions so that the address can be completed quickly. Google Chrome may even load the page from its cache in advance, which means the page can appear before you press Enter.

2. The browser looks up the IP address of the domain name  

  1. Once the request is initiated, the first thing the browser has to do is resolve the domain name. Generally, it first checks the hosts file on the local disk for a rule that matches the domain name. If one exists, the IP address in the hosts file is used directly.

  2. If no matching IP address is found in the local hosts file, the browser sends a DNS request to the local DNS server. The local DNS server is usually provided by your Internet service provider, such as China Telecom or China Mobile.

  3. When the DNS request for the domain you entered reaches the local DNS server, that server first checks its cache. If the record is cached, the result is returned directly. From the client's point of view this is a recursive query. If the record is not cached, the local DNS server queries a root DNS server.

  4. The root DNS server does not store the mapping between specific domain names and IP addresses. Instead, it tells the local DNS server which domain server to ask next (the server for the top-level domain) and gives that server's address. This is an iterative process.

  5. The local DNS server then sends the request to the domain server; in this example, that is the .com domain server. The .com server does not return the mapping between the domain name and the IP address either. It tells the local DNS server the address of the resolution server (the authoritative name server) for your domain name.

  6. Finally, the local DNS server sends the request to the domain's resolution server, which returns the mapping between the domain name and the IP address. The local DNS server returns the IP address to the user's computer and also stores the mapping in its cache, so that the next query, from this or another user, can be answered directly and network access is faster.
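
The lookup order in steps 1 to 6 can be sketched with in-memory tables standing in for real servers. Everything in this sketch is hypothetical: the table names (`HOSTS`, `ROOT`, `TLD_SERVERS`, `AUTHORITATIVE`), the domain `www.example.com`, and the addresses are made up for illustration, and no real DNS traffic is sent.

```python
# Toy model of the lookup order:
# hosts file -> local DNS cache -> root -> domain (TLD) server -> resolution server.
# All tables and addresses below are invented for illustration.

HOSTS = {"dev.local": "127.0.0.1"}                         # stands in for the hosts file
ROOT = {"com": "tld-com"}                                  # root only knows who handles each TLD
TLD_SERVERS = {"tld-com": {"example.com": "ns-example"}}   # TLD server knows the zone's name server
AUTHORITATIVE = {"ns-example": {"www.example.com": "192.0.2.10"}}

local_dns_cache = {}   # cache kept by the local DNS server


def local_dns_lookup(name):
    """Return (ip, trace) as the local DNS server would, caching the answer."""
    if name in local_dns_cache:
        return local_dns_cache[name], ["cache"]
    labels = name.split(".")
    tld = labels[-1]
    zone = ".".join(labels[-2:])
    tld_server = ROOT[tld]                        # step 4: root refers us to the TLD server
    name_server = TLD_SERVERS[tld_server][zone]   # step 5: TLD server refers us onward
    ip = AUTHORITATIVE[name_server][name]         # step 6: the final answer
    local_dns_cache[name] = ip
    return ip, ["root", tld_server, name_server]


def resolve(name):
    """Resolve as the client would: hosts file first, then the local DNS server."""
    if name in HOSTS:
        return HOSTS[name], ["hosts"]
    return local_dns_lookup(name)


first = resolve("www.example.com")    # walks root -> TLD -> resolution server
second = resolve("www.example.com")   # answered from the local DNS server's cache
```

The second call never reaches the root server, which is the speed-up described in step 6.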

 

The following picture perfectly explains the process:

Knowledge expansion:

1) What is DNS?

  DNS (Domain Name System) is a distributed database on the Internet that maps domain names and IP addresses to each other. It lets users access the Internet more conveniently, without having to remember the numeric IP addresses that machines read directly. The process of obtaining the IP address that corresponds to a host name is called domain name resolution (or host name resolution).

  In layman's terms, we are more used to remembering the name of a website, such as www.baidu.com, than its IP address, such as 167.23.10.2, while computers work more easily with the IP address than with a name like www.baidu.com. DNS is the equivalent of a phone book: if you want to reach the domain name www.baidu.com, I look it up in my phone book and find that its phone number (IP address) is 167.23.10.2.

 

2) Two ways of DNS query: recursive query and iterative query

1. Recursive resolution

    When the local DNS server cannot answer the client's DNS query by itself, it needs to query other DNS servers. There are two ways to do this; the figure shows the recursive way. The local DNS server itself takes responsibility for querying the other DNS servers, generally starting with a root name server and then working downward level by level. The final result is returned to the local DNS server, which then returns it to the client.

2. Iterative resolution

  When the local DNS server cannot answer the client's DNS query by itself, the query can also be resolved iteratively, as shown in the figure. The local DNS server does not query the other DNS servers itself. Instead, it returns to the client's DNS program the IP addresses of other DNS servers that can resolve the domain name, and the client's DNS program keeps querying those servers until it obtains the result. In other words, iterative resolution only points you to the relevant server; it does not do the search for you. For example: "The server for baidu.com is at 192.168.4.5. You can ask it yourself; I am busy, so this is as far as I can help."

 

3) How the DNS Domain Namespace is Organized

 We mentioned the root DNS server and the domain DNS server earlier. They are part of how the DNS domain namespace is organized. The five categories used to describe DNS domain names by their function in the namespace are described in the table below, along with an example of each name type.


 

4) DNS load balancing

  When a website has enough users and every requested resource lives on the same machine, that machine may crash at any time. The solution is DNS load balancing. The principle is to configure multiple IP addresses for the same host name on the DNS server. When answering queries, the DNS server returns the IP addresses recorded for that host in a different order each time, steering clients to different machines so that different clients reach different servers and the load is balanced. The answer can also be chosen by other criteria, for example the load on each machine or the geographic distance between the machine and the user.
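
A minimal sketch of the rotation idea, assuming a toy server class (`RoundRobinDNS` is a made-up name, and the host name and addresses are placeholders). Real DNS servers implement this in their own configuration; this only shows the principle of answering in a different order each time.

```python
from collections import deque


class RoundRobinDNS:
    """Toy DNS server that rotates the address list on every query."""

    def __init__(self, records):
        # records: host name -> list of IP addresses configured for that name
        self._records = {name: deque(ips) for name, ips in records.items()}

    def query(self, name):
        ips = self._records[name]
        answer = list(ips)   # the current order is what this client receives
        ips.rotate(-1)       # the next client sees the next address first
        return answer


dns = RoundRobinDNS({"www.example.com": ["192.0.2.1", "192.0.2.2", "192.0.2.3"]})

# Clients usually connect to the first address in the answer.
first_choices = [dns.query("www.example.com")[0] for _ in range(4)]
```

Four consecutive clients end up on the first, second, third, and then the first machine again.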

3. The browser sends an HTTP request to the web server

  After obtaining the IP address for the domain name, the browser opens a TCP connection from a random local port (1024 < port < 65535) to port 80 of the server's web program (commonly httpd, nginx, and so on). The connection request travels through the routing devices along the way (unless the server is on the same local area network), arrives at the server's network card, and enters the kernel's TCP/IP protocol stack, which identifies the connection request and decapsulates it layer by layer. It may also be filtered by the Netfilter firewall (a kernel module). Finally it reaches the web program, and the TCP/IP connection is established.

The TCP connection is shown in the figure:

  After the TCP connection is established, the HTTP request is sent. A typical HTTP request must state the request method, such as GET or POST. The PUT, DELETE, HEAD, OPTIONS, and TRACE methods are used less often. A plain HTML form can only submit GET or POST requests; the other methods are normally issued from scripts.

  When the client sends an HTTP request to the server, the request information consists of three parts:

  (1) Request line: method, URI, protocol/version

  (2) Request headers

  (3) Request body

Here is an example of a complete HTTP request:

GET /sample.jsp HTTP/1.1
Accept: image/gif, image/jpeg, */*
Accept-Language: zh-cn
Connection: Keep-Alive
Host: localhost
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Accept-Encoding: gzip, deflate

username=jinqiao&password=1234

 Note: The last request header is followed by a blank line, that is, a carriage return and a line feed, which tells the server that no more request headers follow.

(1) The first line of the request is "method URL protocol/version": GET /sample.jsp HTTP/1.1
(2) Request Header
   The request header contains a lot of useful information about the client environment and the request body. For example, request headers can declare the language used by the browser, the length of the request body, etc.

Accept: image/gif, image/jpeg, */*
Accept-Language: zh-cn
Connection: Keep-Alive
Host: localhost
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Accept-Encoding: gzip, deflate

(3) Request body
    There is a blank line between the request headers and the request body. This line is very important: it indicates that the request headers have ended and that the request body follows. The request body can contain query string information submitted by the client:

username=jinqiao&password=1234
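
To make the three parts concrete, here is a minimal sketch of how a server might split such a request at the blank line. The function name `parse_http_request` is made up for illustration, and a real server handles many more cases (repeated headers, chunked bodies, malformed input) than this does.

```python
from urllib.parse import parse_qs


def parse_http_request(raw: bytes):
    """Split a raw HTTP/1.x request into request line, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")   # the blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # header names are case-insensitive
    return method, target, version, headers, body


raw_request = (
    b"GET /sample.jsp HTTP/1.1\r\n"
    b"Host: localhost\r\n"
    b"Accept-Language: zh-cn\r\n"
    b"Connection: Keep-Alive\r\n"
    b"\r\n"
    b"username=jinqiao&password=1234"
)

method, target, version, headers, body = parse_http_request(raw_request)
form = parse_qs(body.decode("ascii"))   # decode the query-string style body
```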

 

Knowledge expansion:

1) TCP three-way handshake

The first handshake: Client A sets the SYN flag to 1, randomly generates an initial sequence number seq=J (for example, J=1234567), and sends the packet to server B. Client A enters the SYN_SENT state and waits for server B to confirm.

The second handshake: When server B receives the packet, the flag SYN=1 tells it that client A is requesting a connection. Server B sets both the SYN and ACK flags to 1, sets ack=J+1, randomly generates its own sequence number seq=K, and sends the packet to client A to confirm the connection request. Server B enters the SYN_RCVD state.

The third handshake: When client A receives the confirmation, it checks that ack is J+1 and that the ACK flag is 1. If so, it sets the ACK flag to 1, sets ack=K+1, and sends the packet to server B. Server B checks that ack is K+1 and that the ACK flag is 1. If so, the connection is established: client A and server B both enter the ESTABLISHED state, the three-way handshake is complete, and data can then be transferred between client A and server B.
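
The exchange of J, K, and the two acknowledgments can be traced in a small simulation. This is only a sketch: the function `three_way_handshake` and the dictionary layout of a segment are invented for illustration, and no real packets are sent.

```python
import random

MOD = 2**32   # TCP sequence numbers are 32 bits wide and wrap around


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three segments exchanged and the final state of each side."""
    j = random.randrange(MOD) if client_isn is None else client_isn
    k = random.randrange(MOD) if server_isn is None else server_isn

    # First handshake: client -> server
    syn = {"flags": {"SYN"}, "seq": j}
    client_state = "SYN_SENT"

    # Second handshake: server -> client, acknowledging J+1
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": k, "ack": (syn["seq"] + 1) % MOD}
    server_state = "SYN_RCVD"

    # Client checks the acknowledgment before replying.
    assert "ACK" in syn_ack["flags"] and syn_ack["ack"] == (j + 1) % MOD

    # Third handshake: client -> server, acknowledging K+1
    ack = {"flags": {"ACK"}, "seq": (j + 1) % MOD, "ack": (syn_ack["seq"] + 1) % MOD}
    client_state = "ESTABLISHED"

    # Server checks the acknowledgment; the connection is now open on both sides.
    assert "ACK" in ack["flags"] and ack["ack"] == (k + 1) % MOD
    server_state = "ESTABLISHED"

    return [syn, syn_ack, ack], client_state, server_state


segments, client_state, server_state = three_way_handshake(1234567, 7654321)
```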

as the picture shows:

 

 

2) Why is a three-way handshake needed?

  According to the fourth edition of "Computer Network", the purpose of the three-way handshake is "to prevent an invalid connection request segment from suddenly being delivered to the server and causing an error".

   The book's example goes like this. An "invalid connection request segment" arises when the first connection request segment sent by the client is not lost but is held up at some network node for a long time, so that it reaches the server only after the connection has already been released. By then the segment is long out of date. However, when the server receives it, the server mistakes it for a new connection request from the client, so it sends an acknowledgment segment to the client and agrees to establish a connection.

  Suppose the three-way handshake were not used, so that a new connection is established as soon as the server sends its acknowledgment. Since the client has not actually requested a connection, it ignores the server's acknowledgment and sends no data. The server, however, believes a new transport connection has been established and keeps waiting for the client to send data, so server resources are wasted. The three-way handshake prevents this. In the case just described, the client does not acknowledge the server's acknowledgment, and since the server receives no acknowledgment, it knows that the client did not request a connection. The main purpose is to keep the server from waiting indefinitely and wasting resources.

 

3) TCP four-way wave

The first wave: The client sends a FIN to close data transfer from the client to the server, and the client enters the FIN_WAIT_1 state.
The second wave: When the server receives the FIN, it sends an ACK to the client whose acknowledgment number is the received sequence number + 1 (like a SYN, a FIN consumes one sequence number), and the server enters the CLOSE_WAIT state.
The third wave: The server sends a FIN to close data transfer from the server to the client, and the server enters the LAST_ACK state.
The fourth wave: When the client receives the FIN, it enters the TIME_WAIT state and sends an ACK to the server whose acknowledgment number is the received sequence number + 1. The server enters the CLOSED state, and the four-way wave is complete.
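
The state changes on each side can be written down as a small transition table. This is a sketch with made-up event names; the FIN_WAIT_2 state and the TIME_WAIT timeout are not mentioned in the text above and are taken from the standard TCP state diagram.

```python
# (state, event) -> next state for the side that closes first (the client here)
CLIENT_TRANSITIONS = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",   # first wave
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",    # second wave received
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",     # third wave received, final ACK sent
    ("TIME_WAIT", "timeout"): "CLOSED",
}

# (state, event) -> next state for the side that closes second (the server here)
SERVER_TRANSITIONS = {
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",   # replies with ACK (second wave)
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",      # third wave
    ("LAST_ACK", "recv ACK"): "CLOSED",          # fourth wave received
}


def run(transitions, events, state="ESTABLISHED"):
    """Apply the events in order and return every state visited."""
    visited = [state]
    for event in events:
        state = transitions[(state, event)]
        visited.append(state)
    return visited


client_states = run(CLIENT_TRANSITIONS, ["send FIN", "recv ACK", "recv FIN", "timeout"])
server_states = run(SERVER_TRANSITIONS, ["recv FIN", "send FIN", "recv ACK"])
```

The server sits in CLOSE_WAIT between its ACK and its own FIN, which is where it can still send any remaining data.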

 

4) Why does establishing a connection take a three-way handshake, while closing it takes a four-way wave?

  When a server in the LISTEN state receives the SYN for a connection request, it can send its ACK and its own SYN to the client in a single segment. When closing a connection, receiving the other side's FIN only means that the other side will send no more data; it can still receive data, and you may not yet have sent all of your own data. You can therefore close immediately, or first send the remaining data and only then send your FIN to signal that you agree to close the connection. For this reason your ACK and FIN are generally sent separately.

 

4. The server's permanent redirect response

   The server answers the browser with a 301 Moved Permanently response, so that the browser visits "http://www.google.com/" instead of "http://google.com/".

  Why does the server redirect instead of directly sending the page the user wants to see? One reason has to do with search engine rankings. If a page has two addresses, such as http://www.yy.com/ and http://yy.com/, search engines treat them as two websites, so each receives fewer inbound links and ranks lower. Search engines understand what a 301 permanent redirect means, so they attribute visits to the addresses with and without www to the same website's ranking. Using several addresses is also less cache friendly: when a page has several names, it may appear several times in the cache.

Knowledge expansion:

1) The difference between 301 and 302.

  Both 301 and 302 indicate a redirect: after receiving the status code from the server, the browser automatically jumps to a new URL, which it takes from the Location header of the response. (What the user sees is that the address A they entered instantly becomes another address B.) This is what the two codes have in common.

  They differ as follows. 301 indicates that the resource at the old address A has been permanently moved (it is no longer accessible there); when a search engine crawls the new content, it also replaces the old URL with the redirect target.

  302 indicates that the resource at the old address A is still available there. The redirect is only a temporary jump from address A to address B, so a search engine crawls the new content but keeps the old URL. In terms of SEO, 302 is said to be better than 301.

 

2) Reasons for redirecting:

(1) The website is restructured (for example, the directory structure of its pages changes);
(2) A web page is moved to a new address;
(3) The extension of a web page changes (for example, the application needs to change .php to .html or .shtml).
        In these cases, without a redirect, old addresses stored in users' bookmarks or in search engine databases lead visitors only to a 404 error page, and that traffic is lost for nothing. In addition, websites that have registered several domain names need redirects so that users who visit any of those domains are automatically sent to the main site.
 

3) When should a 301 or a 302 redirect be used?

        When a website or web page moves to a new location temporarily, for 24 to 48 hours, use a 302 redirect. Use a 301 redirect when the previous site has to be taken down for some reason and all future access should permanently go to the new address.
To be clear, typical scenarios for a 301 redirect are:
1. The domain name is about to expire and you do not want to renew it (or you have found a domain name that suits the site better), so you want to change domain names.
2. The domain name without www appears in search engine results, but the domain name with www is not indexed. A 301 redirect tells the search engine which domain name is the intended one.
3. The hosting server is unstable and you are moving to different hosting.
 

5. The browser follows the redirect

   Now the browser knows that "http://www.google.com/" is the correct address to visit, so it sends another HTTP request. There is nothing more to add here.
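
Following a redirect amounts to a small loop: read the status code, and if it is 301 or 302, request the URL in the Location header instead. The sketch below uses a hypothetical in-memory table (`FAKE_WEB`) in place of real HTTP traffic, and the response body is a placeholder.

```python
# Hypothetical responses keyed by URL: (status code, Location header or None, body)
FAKE_WEB = {
    "http://google.com/": (301, "http://www.google.com/", ""),
    "http://www.google.com/": (200, None, "<html>placeholder page</html>"),
}


def fetch(url, max_redirects=5):
    """Follow 301/302 responses until a non-redirect status is returned."""
    chain = [url]
    for _ in range(max_redirects + 1):
        status, location, body = FAKE_WEB[url]
        if status not in (301, 302):
            return status, body, chain
        url = location            # the new address comes from the Location header
        chain.append(url)
    raise RuntimeError("too many redirects")


status, body, chain = fetch("http://google.com/")
```

The `max_redirects` limit is an assumption of this sketch; it guards against two addresses that redirect to each other forever.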

 

6. The server processes the request

  After the previous steps, our HTTP request has finally been sent to the server. (In fact, the earlier request that was redirected had already reached the server.) So how does the server process our request?

  When the backend receives a TCP segment on its fixed port, it handles the TCP connection, parses the HTTP protocol, and wraps the message, according to its format, into an HTTP Request object for the upper layers to use.

  Some larger websites send your requests to a reverse proxy server. When site traffic is very heavy, the site becomes slower and slower and one server is no longer enough, so the same application is deployed on several servers and the requests of the many users are distributed among them. In this setup the client does not access the site's application server directly over HTTP. It sends the request to Nginx, Nginx forwards it to an application server and then returns the result to the client; here Nginx acts as the reverse proxy server. This brings another advantage: if one of the servers goes down, users are not affected as long as other servers are still running normally.

as the picture shows:

Through the Nginx reverse proxy we reach the web server, where the server-side script processes our request, accesses the database, fetches the required content, and so on. This involves many complicated operations in the back-end code. Since I am not familiar with that area, this is as much as I can say about it.

 

Further reading:

1) What is a reverse proxy?

Normally the client can access a website's application server directly over HTTP. The site administrator can instead place Nginx in between: the client sends the request to Nginx, Nginx sends it to the application server and then returns the result to the client. In this arrangement Nginx is a reverse proxy server.
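
The idea can be sketched in a few lines. This is not how Nginx is implemented or configured; `ReverseProxy`, the backend names, and the round-robin policy are assumptions chosen to show requests being spread over several servers and a failed server being skipped.

```python
import itertools


class ReverseProxy:
    """Toy reverse proxy: round-robin over backends, skipping ones that are down."""

    def __init__(self, backends):
        self.backends = backends                      # name -> handler callable
        self.down = set()                             # names of backends marked as failed
        self._order = itertools.cycle(list(backends))

    def handle(self, request):
        for _ in range(len(self.backends)):           # try each backend at most once
            name = next(self._order)
            if name not in self.down:
                return name, self.backends[name](request)
        raise RuntimeError("no backend available")


proxy = ReverseProxy({
    "app1": lambda req: f"app1 served {req}",
    "app2": lambda req: f"app2 served {req}",
})

served_by = [proxy.handle("/index")[0] for _ in range(3)]       # requests alternate
proxy.down.add("app1")                                          # simulate a crashed server
after_failure = [proxy.handle("/index")[0] for _ in range(2)]   # app2 takes everything
```

The client only ever talks to `proxy`, so it does not notice which application server answered or that one of them went down.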

 

7. The server returns an HTTP response 

  After the previous six steps, the server has received and processed our request. In this step it returns the result of that processing, that is, an HTTP response.

The HTTP response is similar to the HTTP request. It also consists of three parts:

(1) Status line

(2) Response headers

(3) Response body

HTTP/1.1 200 OK
Date: Sat, 31 Dec 2005 23:59:59 GMT
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 122

<html>
<head>
<title>http</title>
</head>
<body>
<!-- body goes here -->
</body>
</html>

Status line:

The status line consists of the protocol version, a numeric status code, and the corresponding status description, separated by spaces.

Format: HTTP-Version Status-Code Reason-Phrase CRLF

Example: HTTP/1.1 200 OK\r\n

--Protocol version: whether HTTP/1.0 or another version is used

--Status description: a brief textual description of the status code. For example, when the status code is 200, the description is OK

--Status code: consists of three digits. The first digit defines the category of the response and has five possible values, as follows

 

1xx: Informational status codes, indicating that the server has received the client's request and the client can continue sending the request.

    100 Continue

    101 Switching Protocols

2xx: Success status codes, indicating that the server has successfully received and processed the request.

    200 OK: the client's request succeeded

    204 No Content: success, but no entity body is returned

    206 Partial Content: a Range request was carried out successfully

3xx: Redirection status codes, indicating that the server requires the client to redirect.

    301 Moved Permanently: permanent redirection; the Location header of the response should contain the resource's new URL

    302 Found: temporary redirection; the URL in the Location header of the response is used to locate the resource temporarily

    303 See Other: the requested resource is available at another URI, and the client should use the GET method to obtain it

    304 Not Modified: the content on the server has not changed, so the browser cache can be used directly

    307 Temporary Redirect: temporary redirection with the same meaning as 302 Found. The standard does not allow a 302 to turn a POST into a GET, but in practice clients often do so. With 307 the method is not supposed to change, and more browsers follow the standard here, although this still depends on the browser's implementation.

4xx: Client error status codes, indicating that the client's request is invalid.

    400 Bad Request: the client's request has a syntax error and cannot be understood by the server

    401 Unauthorized: the request is not authorized; this status code must be used together with the WWW-Authenticate header field

    403 Forbidden: the server received the request but refuses to serve it; the reason is usually given in the response body

    404 Not Found: the requested resource does not exist, for example because a wrong URL was entered

5xx: Server error status codes, indicating that the server failed to process the client's request properly and an unexpected error occurred.

    500 Internal Server Error: an unexpected error occurred on the server, so the client's request could not be completed

    503 Service Unavailable: the server is currently unable to process the client's request; it may return to normal after a while
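
Because the first digit decides the category, classifying a response takes only an integer division. The sketch below parses a status line in the format given above; `parse_status_line` and `CATEGORIES` are names invented for this example, while `http.HTTPStatus` is the table of standard codes in Python's standard library.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line: str):
    """Split 'HTTP-Version Status-Code Reason-Phrase' and classify the code."""
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    code = int(code)
    return version, code, reason, CATEGORIES[code // 100]   # first digit = category


version, code, reason, category = parse_status_line("HTTP/1.1 200 OK\r\n")
redirect = parse_status_line("HTTP/1.1 301 Moved Permanently\r\n")
standard_phrase = HTTPStatus(404).phrase   # reason phrase from the standard library
```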

 

Response header:

  The response headers consist of keyword/value pairs, one pair per line, with the keyword and value separated by a colon ":". Typical response headers are:

 

Response body:

The response body contains the actual content we need, such as HTML, images, or the data returned by the backend (cookies, by contrast, are set through response headers). Note that there is a blank line between the response headers and the response body, which marks the end of the headers. The following figure is a capture from Fiddler; the red box marks the response body:

8. The browser displays HTML

  The browser starts displaying the page before it has received the entire HTML document. How does the browser render the page on the screen? Different browsers may parse differently; here we only describe WebKit's rendering process, which the following figure illustrates. The process consists of:

Parse HTML to build the DOM tree -> build the render tree -> lay out the render tree -> paint the render tree

  The browser loads an HTML file from top to bottom and parses and renders it as it loads. When the parser encounters a reference to an external resource, such as an image, an externally linked CSS file, or an icon font, the request is made asynchronously and does not hold up the loading of the HTML document.

  During parsing, the browser first parses the HTML to build the DOM tree and then parses the CSS to build the render tree. Once the render tree is built, the browser lays it out and paints it to the screen. This part is more complicated and involves two concepts: reflow and repaint.

  Every element in the DOM exists as a box (the box model), and the browser has to compute its position, size, and so on; this process is called reflow. Once the position, size, and other properties of the boxes, such as color and font, have been determined, the browser starts to draw the content, a process called repaint.

  A page is bound to go through reflow and repaint when it first loads. Both are expensive, especially on mobile devices, where they can ruin the user experience and sometimes cause the page to freeze. So we should keep reflow and repaint to a minimum.

  

  When a JS file is encountered while the document is loading, rendering of the HTML document is suspended (loading, parsing, and rendering happen synchronously). Because JS may modify the DOM, most classically through document.write, the download of all subsequent resources might turn out to be unnecessary until the JS has finished executing. This is the fundamental reason why JS blocks subsequent resource downloads. That is why, in my own code, I usually place JS at the end of the HTML document.

  JS is parsed by the browser's JS engine, such as Google's V8. JS is single-threaded: only one thing can be done at a time, all tasks have to queue, and the next task can start only when the previous one has finished. However, some tasks are time-consuming, such as IO reads and writes, so a mechanism is needed that lets later tasks run first. Hence the distinction between synchronous tasks and asynchronous tasks.

  The JS execution mechanism can be viewed as a main thread plus a task queue. Synchronous tasks run on the main thread, where they form an execution stack; asynchronous tasks go through the task queue. When an asynchronous task has produced its result, an event is placed in the task queue. When a script runs, the execution stack is run first; then events are taken from the task queue and the corresponding tasks are run. This process repeats, which is why it is called the event loop.
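
The "main thread plus task queue" model can be imitated in a few lines of Python. This is a simplified sketch rather than how a JS engine works internally: `set_timeout` is a made-up stand-in for an asynchronous browser API, and it queues the callback immediately instead of waiting for a timer.

```python
from collections import deque

task_queue = deque()   # callbacks of asynchronous work wait here
log = []


def set_timeout(callback):
    """Stand-in for an async API: the callback runs later, from the task queue."""
    task_queue.append(callback)


def main_script():
    log.append("sync 1")
    set_timeout(lambda: log.append("async callback"))
    log.append("sync 2")


def event_loop(script):
    script()                        # run the synchronous code to completion first
    while task_queue:               # then take queued tasks one at a time
        task = task_queue.popleft()
        task()


event_loop(main_script)
```

Although the callback is registered between the two synchronous statements, it runs only after both of them, because the queue is consulted only once the execution stack is empty.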

9. The browser sends a request to obtain resources embedded in HTML (such as pictures, audio, video, CSS, JS, etc.)

  In fact, this step can run in parallel with step 8. While the browser displays the HTML, it notices tags whose content has to be fetched from other addresses and sends GET requests to retrieve those files, for example external images, CSS, and JS files with links like the following:

Image: http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif

CSS style sheet: http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css

JavaScript file: http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js

  These addresses go through a process similar to the one for the HTML page: the browser looks up the domains in DNS, sends requests, follows redirects, and so on.

Unlike dynamic pages, static files can be cached by the browser. Some files can be read directly from the cache without contacting the server at all, and static files can also be served from a CDN.
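
A rough sketch of "read from the cache instead of asking the server". The freshness rule used here (reuse an entry until a fixed `max_age` in seconds has passed) is an assumption for illustration, and `BrowserCache`, `get`, `fake_fetch`, and the URL are hypothetical names; real browsers follow the caching headers sent by the server.

```python
class BrowserCache:
    """Toy cache: entries are reused until their max_age (in seconds) has passed."""

    def __init__(self):
        self._entries = {}   # url -> (body, stored_at, max_age)

    def store(self, url, body, max_age, now):
        self._entries[url] = (body, now, max_age)

    def lookup(self, url, now):
        entry = self._entries.get(url)
        if entry is None:
            return None
        body, stored_at, max_age = entry
        return body if now - stored_at < max_age else None


def get(url, cache, fetch, now, max_age=3600):
    """Return (body, source); only go to the network on a cache miss."""
    body = cache.lookup(url, now)
    if body is not None:
        return body, "cache"
    body = fetch(url)
    cache.store(url, body, max_age, now)
    return body, "network"


network_calls = []


def fake_fetch(url):
    """Hypothetical network fetch that records each call."""
    network_calls.append(url)
    return "body { margin: 0 }"


cache = BrowserCache()
url = "http://static.example.com/site.css"
first = get(url, cache, fake_fetch, now=0)       # miss: goes to the network
second = get(url, cache, fake_fetch, now=10)     # still fresh: served from the cache
third = get(url, cache, fake_fetch, now=4000)    # older than max_age: fetched again
```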

Original link: https://www.cnblogs.com/xianyulaodi/p/6547807.html#_label6
