What happens after you enter a URL in the browser and press Enter

### 1. What happens after you enter a URL in the browser and press Enter

Reference: What really happens when you navigate to a URL

As a software developer, you should understand the overall workflow of a web application and the technologies involved: the browser, HTTP, HTML, the web server, requests, and so on.

1) First, you enter an address in the browser, as follows:

[Image: http://igoro.com/wordpress/wp-content/uploads/2010/02/image4.png]

2) The browser looks up the IP address that corresponds to the domain name:

[Image: http://igoro.com/wordpress/wp-content/uploads/2010/02/image13.png]

To speed up resolution, a DNS lookup passes through several layers of cache:

[1] Browser cache: the browser caches DNS records for a fixed period. This period is not controlled by the operating system but is built into the browser, usually between 2 and 30 minutes.

[2] OS cache: when the browser cache has no matching DNS record, the browser makes a system call to obtain the IP address, and the operating system has its own cache. The operating system also consults the hosts file, located at C:\Windows\System32\drivers\etc\hosts on Windows and /etc/hosts on Linux. If the file contains an entry like the one below, gethostbyname("www.facebook.com") returns 10.110.110.120.

10.110.110.120 www.facebook.com
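The hosts-file lookup can be sketched in a few lines of Python. This is only an illustration of the idea, not how an operating system resolver is implemented: the helper names `parse_hosts` and `resolve` are made up for this example, and the hosts content is passed in as text rather than read from disk.

```python
import socket

def parse_hosts(text):
    """Map each host name in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # the first entry wins
    return table

def resolve(name, hosts_text):
    """Check the hosts table first, then fall back to the system resolver."""
    table = parse_hosts(hosts_text)
    if name.lower() in table:
        return table[name.lower()]
    return socket.gethostbyname(name)  # may use the OS cache or the network

# The sample entry from the text.
SAMPLE_HOSTS = "10.110.110.120 www.facebook.com  # example entry\n"
print(resolve("www.facebook.com", SAMPLE_HOSTS))  # prints 10.110.110.120
```

Because the name is found in the sample table, the call returns without touching the network.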

[3] Router cache: machines on a LAN usually reach the Internet through a router, and the router keeps its own DNS cache. The specific caching method and duration are not covered here.

[4] ISP (Internet Service Provider) cache: the next place to look is the ISP's caching DNS server, which acts like a secondary DNS server. The record can usually be found here.

[5] Recursive search: if the ISP's DNS server has no matching record, it starts a recursive search. The search begins at the root name servers, continues through the .com top-level domain name servers, and ends at Facebook's name servers. (As I understand it, the ISP's name server relates to these name servers as a child to a parent: when the child cannot answer, it asks the parent. Each level is also usually a cluster of servers rather than a single machine, which is another reason the search has to proceed step by step.)

The recursive search process is as follows:

[Image: http://igoro.com/wordpress/wp-content/uploads/2010/02/500pxAn_example_of_theoretical_DNS_recursion_svg.png]
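The walk from the top of the hierarchy down to the authoritative server can be imitated with a toy model. Everything in the sketch below is an assumption made for illustration: the server names, the zone data in `SERVERS`, and the in-memory "referral" mechanism stand in for real DNS queries, which a real resolver sends over the network.

```python
# Hypothetical zone data: each server either knows the answer or refers
# the query to a server that is closer to the name being looked up.
SERVERS = {
    "root": {"referrals": {"com": "tld-com"}},
    "tld-com": {"referrals": {"facebook.com": "ns-facebook"}},
    "ns-facebook": {"answers": {"www.facebook.com": "10.110.110.120"}},
}

def resolve(name, server="root", path=None):
    """Follow referrals from the root until some server answers."""
    path = (path or []) + [server]
    zone = SERVERS[server]
    if name in zone.get("answers", {}):
        return zone["answers"][name], path
    for suffix, next_server in zone.get("referrals", {}).items():
        if name == suffix or name.endswith("." + suffix):
            return resolve(name, next_server, path)
    raise LookupError(f"{name} not found (asked {path})")

address, path = resolve("www.facebook.com")
print(address, "via", " -> ".join(path))
# prints: 10.110.110.120 via root -> tld-com -> ns-facebook
```

Each step only needs to know who is responsible for the next, more specific part of the name.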

One thing to note is that an entire domain such as facebook.com appears to map to a single IP address. The following methods are generally used to relieve that bottleneck:

Round Robin DNS: the DNS lookup returns different IP addresses in rotation. Suppose Facebook has five servers with identical content and identical hardware. A client that wants Facebook's resources can visit any one of them, so a very simple scheme works: successive lookups of the domain name resolve to a different IP address in turn, so that each server receives a balanced number of visits. Only the number of visits is balanced, of course; the resources requested will certainly differ.

Load balancer: builds on the method above by adding weights, so that requests are directed to server IP addresses in proportion to each server's processing capacity.

Geographic DNS: resolves a domain name to a nearby or better-suited server according to the client's geographic location. This works mainly for static content; dynamic content raises synchronization issues, so the benefit is less obvious.

Anycast: a routing technique that maps a single IP address to multiple hosts. It does not fit well with TCP and is rarely used for that purpose, but most DNS servers themselves use anycast to achieve efficient, low-latency DNS lookups.
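The first two methods can be sketched as simple rotations. This is a simplified illustration with made-up addresses and weights; real DNS servers and load balancers use more elaborate policies, and the weighted variant here simply repeats each address according to its weight.

```python
from itertools import cycle, islice

def round_robin(addresses):
    """Hand out the addresses in turn, forever."""
    return cycle(addresses)

def weighted_round_robin(weights):
    """Repeat each address in the rotation according to its weight."""
    expanded = [addr for addr, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)

# Hypothetical server addresses.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
print(list(islice(round_robin(servers), 4)))
# ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']

# The first server is assumed to have twice the capacity of the second.
print(list(islice(weighted_round_robin({"10.0.0.1": 2, "10.0.0.2": 1}), 6)))
# ['10.0.0.1', '10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.1', '10.0.0.2']
```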

3) The browser sends an HTTP request to the web server

[Image: http://igoro.com/wordpress/wp-content/uploads/2010/02/image22.png]

A dynamic page like the Facebook homepage expires from the browser cache very quickly, or immediately, so the browser has to send a new request to the Facebook server. The request has three main parts:

(1) The GET request line, which names the URL to fetch: "http://facebook.com/".

(2) The browser's identification of itself (User-Agent).

(3) The types of response it is willing to accept (Accept and Accept-Encoding).

GET http://facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: facebook.com
Cookie: datr=1265876274-[...]; locale=en_US; lsd=WW[...]; c_user=2101[...]

The Connection: Keep-Alive header asks the server to keep the TCP connection, which was established by the three-way handshake, open after the response so that it can be reused for later requests. The request also carries the cookies that the browser holds for this domain name. Cookies record and track state across requests to a website, such as the name of the logged-in user, an authentication token, and some user settings. They are stored on the client and sent along with every request to the same domain name.
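To make the structure of such a request concrete, here is a small sketch that assembles one by hand. The function name `build_get_request` and the default header set are choices made for this example. Note one difference from the trace in the text: the trace shows the full URL on the request line, which is the form used when a request goes through a proxy, while a request sent directly to the server normally carries only the path, as below.

```python
def build_get_request(host, path="/", extra_headers=None):
    """Assemble the bytes of a minimal HTTP/1.1 GET request."""
    headers = {
        "Host": host,
        "Connection": "keep-alive",
        "Accept-Encoding": "gzip, deflate",
    }
    headers.update(extra_headers or {})  # e.g. User-Agent or Cookie
    lines = [f"GET {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line (CRLF CRLF) marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_get_request("facebook.com", extra_headers={"Cookie": "locale=en_US"})
print(request.decode("ascii"))
```

Sending these bytes over a TCP connection to port 80 of the server is what the browser does at this step.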

There are many tools for inspecting requests, such as Fiddler, HttpWatch, Firebug, and Wireshark. Some of them can also construct arbitrary HTTP requests, including requests for JS files and requests with custom cookies.

Of course, anyone who does software development should also be familiar with POST requests, which are used to submit forms.

A small tip concerns the trailing slash. Compare http://facebook.com/ with http://example.com/folderOrFile: the latter can cost one extra request. Without a trailing slash it is not clear whether folderOrFile is a file or a folder, so the URL is first requested as it stands, and if it turns out to be a folder, the server redirects the browser to the URL with the slash.

4) The Facebook server responds with a permanent redirect

[Image: http://igoro.com/wordpress/wp-content/uploads/2010/02/image8.png]

The following is the response sent by the Facebook server to the browser:

HTTP/1.1 301 Moved Permanently
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
      pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Location: http://www.facebook.com/
P3P: CP="DSP LAW"
Pragma: no-cache
Set-Cookie: made_write_conn=deleted; expires=Thu, 12-Feb-2009 05:09:50 GMT;
      path=/; domain=.facebook.com; httponly
Content-Type: text/html; charset=utf-8
X-Cnection: close
Date: Fri, 12 Feb 2010 05:09:51 GMT
Content-Length: 0

The purpose of this response is to make the browser visit http://www.facebook.com/ instead of http://facebook.com/. Why do this? Why not send the content directly to the user? The explanation is as follows:

[1] Search engine ranking. A search engine such as Baidu treats http://www.facebook.com/ and http://facebook.com/ as two different sites. To keep things consistent, a company standardizes on one of the two for ranking. With the same total number of visits, two domain names rank worse than one: 1,000 visits that all land on one domain name carry more weight than 500 visits on each of two domain names.

[2] Caching. Multiple URLs for the same content are not cache-friendly: content that is reachable under two names may be stored twice in caches.
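How a client reacts to such a redirect can be sketched as follows. The helper names are invented for this example, the parser ignores details such as folded header lines, and the sample response is a shortened version of the one shown above.

```python
from urllib.parse import urljoin

def parse_response_head(raw):
    """Return (status code, headers) from the head of a raw HTTP response."""
    head = raw.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    status = int(status_line.split()[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers

def next_url(raw, current_url):
    """Return the URL to request next: the redirect target, if there is one."""
    status, headers = parse_response_head(raw)
    if status in (301, 302, 307, 308) and "location" in headers:
        return urljoin(current_url, headers["location"])
    return current_url

REDIRECT = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: http://www.facebook.com/\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
print(next_url(REDIRECT, "http://facebook.com/"))  # prints http://www.facebook.com/
```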

5) The browser sends the real request to the server

At this point the browser knows that the correct address is http://www.facebook.com/, so it follows the server's redirect and sends the following request:

GET http://www.facebook.com/ HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, [...]
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; [...]
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Cookie: lsd=XW[...]; c_user=21[...]; x-referer=[...]
Host: www.facebook.com

Apart from the request URL, the headers have the same meaning as in the first request.

6) The server handles the request

Web server software

After the web server receives the request, it decides which request handler should process this GET request. The handler is a program that generates the HTML to send to the client. In the example above the URL is http://www.facebook.com/. Depending on the server configuration or program, it is mapped to a specific page, such as http://www.facebook.com/index.html or http://www.facebook.com/index without a suffix. As a URI, it is eventually matched to a specific file on the server according to the server's own suffix-matching rules; in the second case the server appends .html.

Request handler

The request handler reads the request, its parameters, and its cookies. It may read or update data as required, and it generates the HTML response. For a dynamic website this usually involves a database, which may be distributed across several locations and may be reached through RPC calls.
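The URI-to-file mapping described under "Web server software" might look like the sketch below. The rules (a default page of `index.html` and an appended `.html` suffix) are assumptions chosen to match the example in the text; every real server has its own configurable rules.

```python
import posixpath

def map_uri_to_file(uri_path, default_page="index.html", suffix=".html"):
    """Map a request path to a file path relative to the document root."""
    # Normalizing from "/" keeps ".." segments from escaping the root.
    path = posixpath.normpath("/" + uri_path.lstrip("/"))
    if uri_path.endswith("/") or path == "/":
        path = posixpath.join(path, default_page)  # folder: serve its default page
    elif not posixpath.splitext(path)[1]:
        path += suffix  # no suffix: try the .html file
    return path.lstrip("/")

for uri in ("/", "/index", "/index.html", "/photos/"):
    print(uri, "->", map_uri_to_file(uri))
# / -> index.html
# /index -> index.html
# /index.html -> index.html
# /photos/ -> photos/index.html
```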

7) The server sends back an HTML response

[Image: http://igoro.com/wordpress/wp-content/uploads/2010/02/image10.png]

Here is the response:

HTTP/1.1 200 OK
Cache-Control: private, no-store, no-cache, must-revalidate, post-check=0,
    pre-check=0
Expires: Sat, 01 Jan 2000 00:00:00 GMT
P3P: CP="DSP LAW"
Pragma: no-cache
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
X-Cnection: close
Transfer-Encoding: chunked
Date: Fri, 12 Feb 2010 09:05:55 GMT

The response is 35 KB in size, most of it in the binary blob that forms the body (not shown here). The Content-Encoding header says that the body is compressed with gzip, one of the encodings the client listed in Accept-Encoding. After decompression, you can see the following HTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"   
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" 
      lang="en" id="facebook" class=" no_js">
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-language" content="en" />
...

In addition to the compression method, the headers specify how the page may be cached (Cache-Control) and when it expires (Expires). Note also Content-Type: text/html; charset=utf-8, which matches http-equiv="Content-type" content="text/html; charset=utf-8" in the HTML. It tells the browser to render the body as HTML rather than, for example, download it as a file.
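The way Content-Encoding and Content-Type guide the decoding of the body can be sketched like this. The function names are made up for the example, and the sketch assumes the chunked transfer encoding has already been removed, so `body` holds the complete compressed bytes.

```python
import gzip

def charset_from_content_type(value, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value."""
    for part in value.split(";")[1:]:
        name, _, val = part.strip().partition("=")
        if name.lower() == "charset" and val:
            return val.strip('"')
    return default

def decode_body(body, headers):
    """Decompress the body if needed, then decode it to text."""
    if headers.get("Content-Encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    charset = charset_from_content_type(headers.get("Content-Type", ""))
    return body.decode(charset)

# Simulate what the server does, then reverse it on the client side.
page = '<html><head><meta charset="utf-8" /></head><body>héllo</body></html>'
headers = {"Content-Encoding": "gzip", "Content-Type": "text/html; charset=utf-8"}
compressed = gzip.compress(page.encode("utf-8"))
print(decode_body(compressed, headers) == page)  # prints True
```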

8) The browser starts to display HTML

The browser starts rendering before it has received and parsed the whole HTML document. That is why you often see a page that is only half displayed.

[Image: http://igoro.com/wordpress/wp-content/uploads/2010/02/image6.png]

9) The browser sends requests for the objects embedded in the HTML

For example, if the HTML references CSS files, images, or JS code, the browser requests those resources as it continues parsing, as follows:

  • Images
    http://static.ak.fbcdn.net/rsrc.php/z12E0/hash/8q2anwu7.gif
    http://static.ak.fbcdn.net/rsrc.php/zBS5C/hash/7hwy7at6.gif

  • CSS style sheets
    http://static.ak.fbcdn.net/rsrc.php/z448Z/hash/2plh8s4n.css
    http://static.ak.fbcdn.net/rsrc.php/zANE1/hash/cvtutcee.css

  • JavaScript files
    http://static.ak.fbcdn.net/rsrc.php/zEMOA/hash/c8yzb6ub.js
    http://static.ak.fbcdn.net/rsrc.php/z6R9L/hash/cq2lgbs8.js
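Finding those embedded resources amounts to scanning the HTML for tags that point at other URLs. The sketch below uses Python's standard `html.parser`; the class name, the sample page, and its `/static/...` paths are invented for the example, and a real browser recognizes many more kinds of references.

```python
from html.parser import HTMLParser

class ResourceCollector(HTMLParser):
    """Collect the URLs of images, scripts, and style sheets."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.resources.append((tag, attrs["src"]))
        elif tag == "link" and (attrs.get("rel") or "").lower() == "stylesheet":
            if attrs.get("href"):
                self.resources.append((tag, attrs["href"]))

def find_resources(html):
    collector = ResourceCollector()
    collector.feed(html)
    return collector.resources

# Hypothetical page with one resource of each kind.
PAGE = """<html><head>
<link rel="stylesheet" href="/static/site.css" />
<script src="/static/app.js"></script>
</head><body><img src="/static/logo.gif" /></body></html>"""

for tag, url in find_resources(PAGE):
    print(tag, url)
```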

The browser sends a separate request for each URL, just as before. Most of these resources are static, however, so the browser caches them after the first visit, and on later visits nearly all of them come from the cache. The server's response includes the retention period for these static files, which tells the browser how long it may cache them.
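The freshness decision can be sketched from the Cache-Control header alone. This is a deliberate simplification: real browsers also consider Expires, validators such as ETag, and heuristics, and no-cache strictly means "revalidate before reuse" rather than "never store". The function names are made up for this example.

```python
def max_age_seconds(cache_control):
    """Return the cache lifetime in seconds, 0 if caching is ruled out,
    or None if the header gives no explicit lifetime."""
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives or "no-cache" in directives:
        return 0
    for directive in directives:
        name, _, value = directive.partition("=")
        if name == "max-age" and value.isdigit():
            return int(value)
    return None

def is_fresh(age_seconds, cache_control):
    """True if a cached copy of this age may be used without a new request."""
    max_age = max_age_seconds(cache_control)
    return max_age is not None and age_seconds < max_age

print(is_fresh(60, "public, max-age=3600"))                         # True
print(is_fresh(60, "private, no-store, no-cache, must-revalidate"))  # False
```

The second call uses the directives from the homepage response above, which is why the dynamic page is fetched again on every visit while static files are not.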

Another point is that Facebook uses a CDN (content delivery network) to distribute these static files. A CDN keeps copies of the content in many of its data centers. Static content is also often served by separate servers, such as dedicated image servers, or by providers such as Alibaba Cloud and Qiniu.

10) The browser sends an AJAX request

In Web 2.0, the client can keep communicating with the server after it has parsed the HTML. JS code constructs AJAX requests for purposes such as partial page refreshes, fetching fragments of content, and real-time monitoring.

[Image: http://igoro.com/wordpress/wp-content/uploads/2010/02/image12.png]

If you are interested in how a web server works, you can take a look at my simple implementation of one. The principle is: accept HTTP requests and construct the responses on the server. Each response must strictly follow the HTTP protocol (header content and so on) and ends with the generated HTML that is sent to the client. The code is in the myWebServer project under hulichao_framework.
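The principle can be illustrated with a few lines of Python. To be clear, this is not the code of the myWebServer project; it is an independent minimal sketch with invented names (`build_response`, `serve`) that handles only simple GET requests and answers one request per connection.

```python
import socket
from html import escape

def build_response(request_text):
    """Turn raw request text into a complete HTTP/1.1 response."""
    request_line = request_text.split("\r\n", 1)[0]
    parts = request_line.split()
    if len(parts) != 3 or parts[0] != "GET":
        status, body = "400 Bad Request", "<h1>Bad request</h1>"
    else:
        status, body = "200 OK", f"<h1>You asked for {escape(parts[1])}</h1>"
    payload = body.encode("utf-8")
    head = (
        f"HTTP/1.1 {status}\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + payload

def serve(port=8000):
    """Answer one request per connection until interrupted."""
    with socket.create_server(("127.0.0.1", port)) as listener:
        while True:
            conn, _ = listener.accept()
            with conn:
                request = conn.recv(65536).decode("iso-8859-1")
                conn.sendall(build_response(request))
```

Calling `serve()` and opening http://127.0.0.1:8000/index in a browser should display the generated page; `build_response` can also be called on its own to inspect the bytes.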

Origin blog.csdn.net/hu_lichao/article/details/79191070