In-depth web request process

Foreword

With the advent of Web 2.0, Internet application architecture has shifted from the traditional C/S (client/server) architecture to the more convenient and efficient B/S (browser/server) architecture. The B/S architecture makes network applications much easier to use and improves the user experience.

The B/S architecture brings the following two benefits:

  • The client is a unified browser (Browser). Because browsers are uniform, no special configuration or network setup is required. The interactive nature of the browser also makes it easy to use, and user habits carry over well: once users have learned to browse the web, that experience applies to any other Internet service, whichever application they use.
  • The server (Server) is built on the unified HTTP protocol. Unlike the traditional C/S architecture, which uses custom application-layer protocols, a unified HTTP simplifies the development model. Many HTTP-based servers, such as Apache, Nginx, and Tomcat, can be used directly, and so can general-purpose service frameworks such as Spring, Spring MVC, and MyBatis, with no need to develop them separately. This lets us focus on business logic and simplifies development work.

Overview of the B/S network architecture

B/S applications exchange data over the unified application-layer protocol HTTP. Unlike most traditional C/S Internet applications, which interact over long-lived connections, HTTP usually communicates over stateless, short-lived connections. Normally one request completes one data exchange, and the connection is then closed. This approach lets the server respond to more user requests.

When you enter the URL antoniopeng.com in the browser and press Enter, many operations take place:

Conceptual diagram of B / S architecture network request

  1. First, DNS is asked to resolve the domain name to the corresponding IP address.
  2. The browser then uses this IP address to locate the server on the Internet and sends it a request (GET / POST / ...). The server returns the default data resource to the user, possibly after running very complex business logic on the server side.
    • There may be many servers, with a load-balancing device (such as Nginx) distributing user requests evenly among them.
    • The requested data may be stored in a cache, in a static file, or in a database.
  3. Finally, when the data returns to the browser and the browser finds static resources (such as CSS, JS, and IMG) while parsing it, it issues additional HTTP requests. These requests are likely to target a CDN, in which case the CDN servers handle them.
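
Before step 1 can start, the URL has to be split into the host name that goes to DNS and the port the request is sent to. The following is a minimal sketch of that split using the JDK's java.net.URI. The class name UrlParts and the helper effectivePort are hypothetical names chosen for illustration; 80 and 443 are the standard default ports for http and https.

```java
import java.net.URI;

final class UrlParts {
    // Returns the explicit port, or the scheme's default when the URL omits it.
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    public static void main(String[] args) {
        URI uri = URI.create("https://antoniopeng.com/");
        System.out.println("host to resolve via DNS: " + uri.getHost());
        System.out.println("port to connect to: " + effectivePort(uri));
        System.out.println("path to request: " + uri.getPath());
    }
}
```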

How to make a request

This question is both simple and complex. It is simple in that when we enter a URL in the browser and press Enter, the HTTP request is issued and its result comes back almost immediately. It is complex in that a request can also be issued without the help of a browser.

An HTTP connection is essentially a Socket connection, so we can fully simulate a browser issuing an HTTP request. Apache HttpClient is an open source toolkit for issuing HTTP requests from a program.
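
To make the "HTTP over a Socket" point concrete, here is a minimal sketch that writes a plain-text GET request to a TCP socket using only the JDK. The class RawHttpGet and its methods are hypothetical names for illustration. It assumes plain HTTP on port 80 (no TLS) and sends Connection: close so the server ends the stream after one response.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class RawHttpGet {
    // Builds the text of a minimal HTTP/1.1 GET request.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    // Sends the request over a plain TCP socket and returns the raw response text.
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            // The server closes the connection after responding, which ends the read.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.print(buildRequest("www.baidu.com", "/"));
        // Pass any argument to actually send the request (requires network access).
        if (args.length > 0) {
            System.out.println(fetch("www.baidu.com", 80, "/"));
        }
    }
}
```

A toolkit such as HttpClient builds this same kind of message and adds connection management, redirects, and response parsing on top.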

The following is a call example based on HttpClient:

Introduce dependencies

Add the org.apache.httpcomponents:httpclient dependency to pom.xml:

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.5</version>
</dependency>

Create an HTTP GET request

The implementation code is as follows:

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.IOException;

public class MyTest {
    public static void main(String[] args) {
        get();
    }

    private static void get() {
        // Create the HttpGet request
        HttpGet httpGet = new HttpGet("http://www.baidu.com");
        // Ask for a persistent (keep-alive) connection
        httpGet.setHeader("Connection", "keep-alive");
        // Set the User-Agent to mimic a browser version
        httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36");
        // Set the Cookie header
        httpGet.setHeader("Cookie", "UM_distinctid=34342706a09352-0376059833914f-3c604504-1fa400-16442706a0b345; CNZZDATA1262458286=1603637673-1530123020-%7C1530123020; JSESSIONID=805587506F1594AE02DC45845A7216A4");

        // try-with-resources closes the response and then the client in every case
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse httpResponse = httpClient.execute(httpGet)) {
            // Read the response entity
            HttpEntity httpEntity = httpResponse.getEntity();
            // Print the response body
            System.out.println(EntityUtils.toString(httpEntity));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Besides the widely used HttpClient toolkit in Java, there is also the curl command on the command line: curl followed by a URL is enough to issue a simple HTTP request.

  • Enter the command
curl https://www.baidu.com
  • The HTML data returned
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/bdorz/baidu.min.css><title>百度一下，你就知道</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus=autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=百度一下 class="bg s_btn" autofocus></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>新闻</a> <a href=https://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>地图</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>视频</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>贴吧</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>登录</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" : "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">登录</a>');
                </script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">更多产品</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>关于百度</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>&copy;2017&nbsp;Baidu&nbsp;<a href=http://www.baidu.com/duty/>使用百度前必读</a>&nbsp; <a href=http://jianyi.baidu.com/ class=cp-feedback>意见反馈</a>&nbsp;京ICP证030173号&nbsp; <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>

HTTP parsing

The most important part of understanding HTTP is becoming familiar with the HTTP headers, which control how data is transmitted. Above all, they control the browser's rendering behavior and the server's execution logic. For example, when the server does not have the data the user requested, it returns a 404 status code to tell the browser that the requested data does not exist, and the browser usually displays the unwelcome "This page does not exist" error message.

Common HTTP request headers

Request header     Explanation
Accept-Charset     Character sets the client accepts
Accept-Encoding    Content encodings the client accepts (e.g. Accept-Encoding: gzip, deflate)
Accept-Language    Natural languages the client accepts (e.g. Accept-Language: zh-cn)
Host               Host and port number of the requested resource (e.g. Host: www.baidu.com)
User-Agent         Tells the server the client's operating system, browser, and other attributes
Connection         Whether the current connection should be kept open (e.g. Connection: Keep-Alive)

Common HTTP response headers

Response header    Explanation
Server             Server name (e.g. Server: nginx/1.17.6)
Content-Type       Type of the entity sent to the recipient (e.g. Content-Type: text/html; charset=GBK)
Content-Encoding   Counterpart of Accept-Encoding: the encoding the server applied
Content-Language   Counterpart of Accept-Language: the natural language of the resource
Content-Length     Length of the body
Keep-Alive         How long the connection is kept open (e.g. Keep-Alive: timeout=5)

Common HTTP status codes

Status code        Explanation
200                The request succeeded
302                Temporary redirect
400                The client request has a syntax error that the server cannot understand
403                The server received the request but refuses to serve it, i.e. no permission
404                The requested resource does not exist
500                An unexpected error occurred on the server
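
As an illustration of how the status code and headers appear on the wire, the sketch below parses the head of a raw HTTP response (the status line plus the header lines before the blank line) and groups the status code by its first digit. ResponseHead and category are hypothetical names, and the sample response text is made up for the example. Header names are case-insensitive in HTTP; this sketch simply keeps them as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ResponseHead {
    final int statusCode;
    final Map<String, String> headers = new LinkedHashMap<>();

    // Parses the status line and the header lines that precede the blank line.
    ResponseHead(String rawHead) {
        String[] lines = rawHead.split("\r\n");
        // The status line looks like: HTTP/1.1 404 Not Found
        statusCode = Integer.parseInt(lines[0].split(" ")[1]);
        for (int i = 1; i < lines.length && !lines[i].isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                String name = lines[i].substring(0, colon).trim();
                headers.put(name, lines[i].substring(colon + 1).trim());
            }
        }
    }

    // Groups a status code by its first digit.
    static String category(int code) {
        switch (code / 100) {
            case 2: return "success";
            case 3: return "redirect";
            case 4: return "client error";
            case 5: return "server error";
            default: return "other";
        }
    }

    public static void main(String[] args) {
        ResponseHead head = new ResponseHead(
                "HTTP/1.1 404 Not Found\r\nServer: nginx/1.17.6\r\nContent-Length: 0\r\n\r\n");
        System.out.println(head.statusCode + " is a " + category(head.statusCode));
        System.out.println(head.headers);
    }
}
```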

View HTTP information

To view the request headers and response headers of an HTTP request, open the browser's debugging tools with the F12 shortcut key. For example, while visiting www.baidu.com, press F12 and open the Network panel to see the HTTP header content.

HTTP Header information

Browser caching mechanism

When a web page does not look as expected, the usual suspicion is that the browser is showing a cached copy, so the common practice is to press Ctrl + F5 to request the page again and make sure the latest version is fetched. Pressing Ctrl + F5 sends a request directly to the target URL instead of using the browser's cached data.

As shown in the figure, this request never reached the server; the browser used its cached data.

HTTP request header returns cached data

After you refresh the page with Ctrl + F5, the HTTP request headers usually contain two additional fields, Cache-Control: no-cache and Pragma: no-cache, whose purpose is to keep the requested content from being served out of the cache.

After pressing Ctrl + F5 to refresh the page, the HTTP request header returns the latest data
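
A program can attach the same two headers itself. The following sketch prepares, without sending, a GET request that carries Cache-Control: no-cache and Pragma: no-cache, using the JDK's HttpURLConnection. NoCacheRequest and prepare are hypothetical names for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class NoCacheRequest {
    // Prepares (but does not send) a GET request carrying the two no-cache headers.
    static HttpURLConnection prepare(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Cache-Control", "no-cache");
            conn.setRequestProperty("Pragma", "no-cache");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = prepare("https://www.baidu.com");
        // Nothing is sent until the response is read, e.g. via conn.getResponseCode().
        System.out.println(conn.getRequestProperties());
    }
}
```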

DNS domain name resolution

Resources on the Internet are published and requested by URL (Uniform Resource Locator), and the domain name in a URL must be resolved to an IP address before a connection to the remote host can be established. How a domain name is resolved to an IP address is the job of DNS resolution.

When the user enters www.baidu.com in the browser, the working steps of DNS resolution are as follows

DNS resolution process

  1. First, the browser checks whether its cache holds a resolved IP address for this domain name. If it does, the resolution ends here. How long a domain name stays cached can be set through its TTL attribute.
  2. If the browser cache has no entry, the browser checks whether the operating system has a resolution result for the domain name. On Windows this can be configured in the C:\Windows\System32\drivers\etc\hosts file, and on Linux the file is /etc/hosts; editing this file also lets you set the IP address a domain name resolves to.
  3. If the steps above cannot resolve the domain name, a real request is sent to a domain name server. The operating system first sends the domain name to the Local DNS Server, the domain name server for your area. For example, if you are on a campus network at school, the local domain name server is almost certainly at your school; if you access the Internet from a residential community, the Local DNS Server is supplied by your Internet service provider (China Telecom, China Mobile, or China Unicom) and is usually somewhere in your city, not very far away.
  4. If the Local DNS Server still has no hit, it asks the Root DNS Server (root domain name server) to resolve the name.
  5. The root domain name server returns to the local domain name server the address of the gTLD Server (top-level domain name server) responsible for the queried domain. Here gTLD Server denotes the top-level domain name servers, such as those for .com and .cn.
  6. The Local DNS Server (local domain name server) then sends a request to the gTLD Server whose address was just returned.
  7. The gTLD Server that receives the request looks up and returns the address of the Name Server for this domain name. This Name Server usually belongs to your domain name registrar (for example, Alibaba Cloud's Wanwang).
  8. The Name Server then looks up its table mapping domain names to IP addresses and, under normal circumstances, returns the IP record for the domain name, together with a TTL value, to the Local DNS Server (local domain name server).
  9. The Local DNS Server caches the mapping between the domain name and the IP address for the time given by the TTL value, and the resolution result is finally returned to the user.
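
The lookup order in the steps above (cache, then hosts entries, then the name servers) can be sketched as a small in-memory model. This is only an illustration under stated assumptions: DnsLookupChain is a hypothetical class, the upstream function stands in for the whole Local DNS Server exchange in steps 3 to 9, a single fixed TTL is used although real answers carry their own, and 192.0.2.10 is a documentation address rather than a real answer. The current time is passed in explicitly so that expiry is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class DnsLookupChain {
    private static final class Entry {
        final String ip;
        final long expiresAtMillis;

        Entry(String ip, long expiresAtMillis) {
            this.ip = ip;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Map<String, String> hosts;          // stands in for the hosts file
    private final Function<String, String> upstream;  // stands in for the Local DNS Server
    private final long ttlMillis;                     // assumed fixed for this sketch

    DnsLookupChain(Map<String, String> hosts, Function<String, String> upstream, long ttlMillis) {
        this.hosts = hosts;
        this.upstream = upstream;
        this.ttlMillis = ttlMillis;
    }

    String resolve(String domain, long nowMillis) {
        Entry cached = cache.get(domain);
        if (cached != null && nowMillis < cached.expiresAtMillis) {
            return cached.ip;                         // step 1: unexpired cache entry
        }
        String fromHosts = hosts.get(domain);
        if (fromHosts != null) {
            return fromHosts;                         // step 2: hosts file entry
        }
        String ip = upstream.apply(domain);           // steps 3 to 9: ask the name servers
        cache.put(domain, new Entry(ip, nowMillis + ttlMillis));
        return ip;
    }

    public static void main(String[] args) {
        DnsLookupChain chain = new DnsLookupChain(
                Map.of("local.test", "127.0.0.1"),
                domain -> "192.0.2.10",
                5_000);
        long now = System.currentTimeMillis();
        System.out.println(chain.resolve("local.test", now));
        System.out.println(chain.resolve("www.baidu.com", now));
    }
}
```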

Domain name resolution

Domain name resolution records are mainly divided into A records, MX records, CNAME records, NS records, and TXT records.

  • A record: Specifies the IP address that a domain name corresponds to (multiple domain names can resolve to the same IP address).
  • MX record: Specifies the mail server that receives mail for the domain name, so that mail addressed to the domain is delivered to your own mail server.
  • CNAME record: Points one domain name to another domain name.
  • NS record: Specifies the DNS server that resolves the domain name.
  • TXT record: Attaches descriptive text to a host name or domain name.
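
To show how A and CNAME records work together, the sketch below keeps both kinds of record in memory and follows CNAME records until it reaches an A record. RecordTable is a hypothetical class, and the domain names and the address 192.0.2.20 are reserved example values, not real records.

```java
import java.util.HashMap;
import java.util.Map;

final class RecordTable {
    private final Map<String, String> aRecords = new HashMap<>();     // domain -> IP address
    private final Map<String, String> cnameRecords = new HashMap<>(); // alias -> target domain

    void addA(String domain, String ip) {
        aRecords.put(domain, ip);
    }

    void addCname(String alias, String target) {
        cnameRecords.put(alias, target);
    }

    // Follows CNAME records until an A record is found; returns null if there is none.
    String resolve(String domain) {
        String current = domain;
        for (int hops = 0; hops < 8; hops++) { // the hop limit guards against CNAME loops
            String ip = aRecords.get(current);
            if (ip != null) {
                return ip;
            }
            String target = cnameRecords.get(current);
            if (target == null) {
                return null;
            }
            current = target;
        }
        return null;
    }

    public static void main(String[] args) {
        RecordTable table = new RecordTable();
        table.addA("cdn.example.net", "192.0.2.20");
        table.addCname("static.example.com", "cdn.example.net");
        System.out.println(table.resolve("static.example.com"));
    }
}
```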

CDN working mechanism

A CDN, or content distribution network, mainly caches a website's static data, such as CSS, JS, and IMG files. The user requests dynamic content from the origin server and then downloads the static data from the CDN, which speeds up the download of the page's content.

In general, a CDN aims to achieve several goals: scalability, security, and reliability. Its working steps are as follows:

CDN Workflow

  • First, the user sends a request to the Local DNS Server (local DNS server), which usually resolves the name iteratively until it reaches the DNS server of the domain name registrar.
  • That DNS server usually resolves the domain name again, through a CNAME record, to another domain name, which ultimately points to the global DNS load-balancing server (GTM) in the CDN. The GTM then returns, based on the user's address, the CDN node closest to the user.
  • Once it has the CDN resolution result, the user accesses the static files directly on that CDN node. If the requested file does not exist on the node, the node goes back to the source station to fetch the file and then returns it to the user.
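
The last step, serving from the node and going back to the source station only on a miss, can be sketched as follows. EdgeNode is a hypothetical class; the origin function stands in for the source station, and the sketch ignores expiry and eviction, which a real CDN node would need.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class EdgeNode {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> origin; // stands in for the source station
    int originFetches = 0;                         // counts back-to-source requests

    EdgeNode(Function<String, String> origin) {
        this.origin = origin;
    }

    // Serves from the node's cache, going back to the source station only on a miss.
    String get(String path) {
        String body = cache.get(path);
        if (body == null) {
            body = origin.apply(path);
            originFetches++;
            cache.put(path, body);
        }
        return body;
    }

    public static void main(String[] args) {
        EdgeNode node = new EdgeNode(path -> "content of " + path);
        node.get("/css/site.css"); // miss: fetched from the source station
        node.get("/css/site.css"); // hit: served from the node
        System.out.println("back-to-source requests: " + node.originFetches);
    }
}
```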

Load balancing

Load balancing (Load Balance) balances and distributes work across multiple operation units, which perform the tasks together.

It improves server response speed and utilization, avoids single points of failure, and relieves network congestion.

There are usually three load balancing architectures:

  • Link load balancing: The advantage is that requests do not pass through any other proxy server, so access is usually fast. The disadvantage is that domain name resolution results are cached, so they are hard to update promptly.
  • Cluster load balancing
    • Hardware load balancing: The advantage is very good performance. The disadvantage is that it is very expensive and cannot be expanded dynamically.
    • Software load balancing: The advantage is very low cost. The disadvantage is that a single request generally passes through several proxy servers, which increases network latency.
  • Operating system load balancing: Uses operating-system-level soft or hard interrupts to balance load, for example by setting up multiple network cards.
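
As one simple way a software load balancer can spread requests evenly, here is a round-robin sketch. It is only one possible strategy, not a description of how any particular product such as Nginx is implemented; RoundRobinBalancer and the server addresses are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = List.copyOf(servers);
    }

    // Picks servers in turn so that requests are spread evenly.
    String pick() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        RoundRobinBalancer balancer =
                new RoundRobinBalancer(List.of("10.0.0.1", "10.0.0.2", "10.0.0.3"));
        for (int i = 0; i < 4; i++) {
            System.out.println("request " + i + " goes to " + balancer.pick());
        }
    }
}
```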

CDN dynamic acceleration

Technical principle: The CDN's DNS resolution probes the links back to the source station to find the best path, and then uses DNS scheduling to route all requests over the selected back-to-source path, which speeds up user access.

Link detection: Each CDN node downloads a file of a fixed size from the source station to see which link takes the shortest total time. The results form a link list, which is then bound to DNS resolution and used to update the Local DNS Server.
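
The selection step of link detection comes down to picking the link with the shortest measured time. A minimal sketch, assuming the download times have already been measured and collected into a map; LinkProbe, fastest, and the link names are hypothetical:

```java
import java.util.Map;

final class LinkProbe {
    // Returns the link with the smallest measured download time, or null if none were measured.
    static String fastest(Map<String, Long> elapsedMillisByLink) {
        String best = null;
        long bestTime = Long.MAX_VALUE;
        for (Map.Entry<String, Long> entry : elapsedMillisByLink.entrySet()) {
            if (entry.getValue() < bestTime) {
                bestTime = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up measurements in milliseconds for three candidate links.
        Map<String, Long> measured = Map.of("link-a", 120L, "link-b", 45L, "link-c", 80L);
        System.out.println("best back-to-source link: " + fastest(measured));
    }
}
```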

  • Author: Chao Peng
  • This article first appeared on a personal blog: antoniopeng.com/2020/04/07/…
  • Copyright notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated. When reproducing, please credit Chao Peng | Blog.


Origin juejin.im/post/5e9471826fb9a03c930572dd