Full analysis of the web page request process

1. Understand the entire process of domain name resolution

Resolution process:
(1) The client sends a query for the domain name www.baidu.com to the ISP DNS server, or to the DNS server configured in the local computer's network settings.
(2) When the ISP DNS server receives the request, it checks its cache for an address record matching the domain name. If a record exists, the IP address is returned directly, and the answer is marked as a reply from a non-authoritative server.
(3) If the DNS server has no record of this domain name, the ISP DNS server reads the addresses of the 13 root name servers ([a-m].root-servers.net) from its configuration file.
(4) Based on location and load, the ISP DNS server sends a request to a mirror of one of the root name servers.
(5) When the root name server receives the request, it sees that the top-level domain of the request is com. and returns the NS records for the com domain, generally 13 host names and their IP addresses.
(6) After receiving the result, the ISP DNS server sends another request to one of those servers.
(7) The server for the com domain sees that the second-level domain of the request is baidu.com, looks up the NS records for that domain, and returns the result.
(8) After receiving the result, the ISP DNS server sends a request to the authoritative server for the baidu.com domain.
(9) The authoritative server for the baidu.com domain finds that it has a host named www and returns that host's IP address to the ISP DNS server.
(10) After receiving the result, the ISP DNS server returns it to the client and stores the record in its cache.

[Figure: illustration of the resolution process]
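
The resolution steps above can be sketched as a toy resolver that walks from the root down to the authoritative server. This is only an in-memory illustration: the server names (`tld-com`, `auth-baidu`), the dictionaries, and the address 192.0.2.10 (a documentation-only IP) are assumptions made up for the example, and no real DNS traffic is sent.

```python
# Toy model of iterative resolution. All records below are made-up
# illustration data, not real DNS answers.
ROOT = {"com.": "tld-com"}                      # root: who serves com.
TLD_COM = {"baidu.com.": "auth-baidu"}          # com.: who serves baidu.com.
AUTH_BAIDU = {"www.baidu.com.": "192.0.2.10"}   # authoritative A record
SERVERS = {"root": ROOT, "tld-com": TLD_COM, "auth-baidu": AUTH_BAIDU}

cache = {}  # plays the role of the ISP DNS server's cache


def resolve(name):
    """Return (ip, authoritative) for a fully qualified name."""
    if name in cache:
        # Step (2): answered from cache, so the reply is non-authoritative.
        return cache[name], False

    labels = name.rstrip(".").split(".")
    server = "root"
    # Steps (4)-(9): ask for com., then baidu.com., then www.baidu.com.
    for i in range(len(labels) - 1, -1, -1):
        zone = ".".join(labels[i:]) + "."
        answer = SERVERS[server][zone]
        if i == 0:
            cache[name] = answer      # step (10): store the record
            return answer, True
        server = answer               # follow the referral to the next server


if __name__ == "__main__":
    print(resolve("www.baidu.com."))  # first lookup walks the hierarchy
    print(resolve("www.baidu.com."))  # second lookup is served from cache
```

A real resolver also handles TTLs, missing names, and multiple NS records per zone, all of which this sketch leaves out.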

2. Understand the entire process of a web page request and draw a flowchart (the 11 phases handled by Nginx)

Request process:
(1) DNS resolution

  1. Check the browser cache

  2. Check the hosts file

  3. Check the router cache

  4. Check the ISP DNS server cache

  5. Send a recursive query request through the ISP DNS server

  6. The client obtains the IP address corresponding to the domain name in the URL

(2) Establish a TCP socket and initiate a TCP connection

  1. Three-way handshake

(3) The server responds with a permanent redirect (301)
(4) The browser follows the redirect address and sends an HTTP request to the web server
(5) The server accepts the request

(6) The server processes the request (Nginx)

  1. NGX_HTTP_POST_READ_PHASE: Runs right after the complete HTTP request header has been received

  2. NGX_HTTP_SERVER_REWRITE_PHASE: Rewrites the URI at the server level, before a location block is matched; used for redirection

  3. NGX_HTTP_FIND_CONFIG_PHASE: Finds the matching location block based on the URI

  4. NGX_HTTP_REWRITE_PHASE: Rewrites the URI using the location block found in the previous phase

  5. NGX_HTTP_POST_REWRITE_PHASE: Handles the result of the rewrite and prevents endless loops caused by repeated URI rewriting

  6. NGX_HTTP_PREACCESS_PHASE: Preparation before the access phase

  7. NGX_HTTP_ACCESS_PHASE: Lets HTTP modules decide whether this request is allowed to access the Nginx server

  8. NGX_HTTP_POST_ACCESS_PHASE: Sends an error code to the user when access was denied in the previous phase

  9. NGX_HTTP_TRY_FILES_PHASE: Set up for accessing static file resources (the try_files directive)

  10. NGX_HTTP_CONTENT_PHASE: Processes the HTTP request and generates the response content

  11. NGX_HTTP_LOG_PHASE: Logs the request after processing has finished
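
The eleven phases can be pictured as an ordered pipeline in which each phase may hold a handler. The sketch below is a simplified model under stated assumptions and is not how Nginx itself is implemented: the `Phase` enum, `run_phases`, and the handler convention (return a status code to finish early, or `None` to continue) are hypothetical names invented for this example.

```python
from enum import IntEnum


class Phase(IntEnum):
    # Same order as the list above, with the NGX_HTTP_ prefix dropped.
    POST_READ = 0
    SERVER_REWRITE = 1
    FIND_CONFIG = 2
    REWRITE = 3
    POST_REWRITE = 4
    PREACCESS = 5
    ACCESS = 6
    POST_ACCESS = 7
    TRY_FILES = 8
    CONTENT = 9
    LOG = 10


def run_phases(request, handlers):
    """Run handlers in phase order and return (status, visited phase names)."""
    status = None
    trace = []
    for phase in Phase:
        if phase is Phase.LOG:
            break
        trace.append(phase.name)
        handler = handlers.get(phase)
        if handler is not None:
            status = handler(request)
            if status is not None:
                break  # request answered or rejected: skip remaining phases
    # In this model the log phase always runs last.
    trace.append(Phase.LOG.name)
    log_handler = handlers.get(Phase.LOG)
    if log_handler is not None:
        log_handler(request)
    return status, trace


if __name__ == "__main__":
    handlers = {
        # Hypothetical access rule: deny everything under /admin.
        Phase.ACCESS: lambda r: 403 if r["uri"].startswith("/admin") else None,
        Phase.CONTENT: lambda r: 200,
    }
    print(run_phases({"uri": "/index.html"}, handlers))
    print(run_phases({"uri": "/admin"}, handlers))
```

In this model a request denied in the access phase never reaches the content phase, but it is still logged.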

(7) The server returns an HTML response
(8) The browser starts parsing and rendering after receiving the response

  1. Build the DOM tree, using depth-first traversal

  2. Parse the CSS into the CSS rule tree (CSSOM)

  3. Build the render tree from the DOM tree and the CSSOM

  4. Lay out the render tree

  5. Paint the render tree
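
As a rough illustration of the first rendering step, the sketch below builds a small element tree with Python's standard `html.parser` and walks it depth-first. It is not how a browser engine works internally: `Node`, `TreeBuilder`, `build_dom`, and `depth_first` are hypothetical names for this example, and the input is assumed to be well-formed HTML in which every start tag has a matching end tag (no void elements such as a bare `<br>`).

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a tree of Node objects from start and end tags only."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]          # path from the root to the open element

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:           # never pop the document root
            self.stack.pop()


def build_dom(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def depth_first(node):
    """Yield tag names in depth-first (pre-order) order."""
    yield node.tag
    for child in node.children:
        yield from depth_first(child)


if __name__ == "__main__":
    dom = build_dom("<html><body><p></p><div></div></body></html>")
    print(list(depth_first(dom)))
```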

3. Study the fields of the HTTP protocol and their meanings

HTTP request / response message structure:
Request message:
· Request line
· Request headers
· Blank line
· Request body
Response message:
· Response line (status line)
· Response headers
· Blank line
· Response body
[Figure: example of a request message]
[Figure: example of a response message]
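
To make the structure concrete, here is a minimal sketch that splits a raw request message into its request line, headers, and body. The sample message, the host `www.example.com`, the form fields, and the function name `parse_request` are assumptions for illustration only. Real parsers also handle repeated headers, line folding, and malformed input.

```python
def parse_request(raw):
    """Split a raw HTTP/1.1 request into request line parts, headers, and body."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # Request line: method, request target, protocol version.
    method, target, version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive

    return {
        "method": method,
        "target": target,
        "version": version,
        "headers": headers,
        "body": body,
    }


# Hypothetical example message, made up for this sketch.
EXAMPLE_REQUEST = (
    "POST /chapter17/user.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 21\r\n"
    "\r\n"
    "name=tom&password=123"
)

if __name__ == "__main__":
    print(parse_request(EXAMPLE_REQUEST))
```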
(1) Request method: POST, GET, PUT, DELETE
(2) Request URL: http://chapter17/user.html
(3) HTTP protocol and version: HTTP/1.1
(4) Accept: indicates the media types the client expects the server to return. Because the server may not have every expected resource type, Accept usually lists several types and assigns each a priority.
(5) Accept-Charset: indicates the character encoding the client expects for the returned content. Multiple encodings can be specified.
(6) Accept-Language: indicates the content language that the client expects to return.
(7) Content-Type: indicates the media type and encoding of the body. HTML pages returned for GET requests usually use text/html. Form data submitted with POST usually uses application/x-www-form-urlencoded.
(8) Content-Language: This field is the response to Accept-Language. Through this field, the server informs the client of the language of the body information returned
(9) Content-Length: indicates the length of the transmitted request / response body in bytes. GET requests usually omit it because they have no body. When the body is too large and is sent with chunked transfer encoding, this field is not required.
(10) Content-Location: indicates that the server informs the client of other optional addresses of the requested resource
(11) Content-MD5: This field is used to verify Body content.
(12) Date: If the response is not served from a cache, it indicates the time at which the response message was generated. If the response is served from a cache, it indicates the time at which the response content was cached.
(13) Age: indicates the time that the resource has been cached, in seconds
(14) Expires: indicates that the server informs the client when the resource is invalid.
(15) ETag: a tag that identifies a specific version of a resource. It is commonly used to check whether a cached copy of the resource is still valid.
(16) Allow: Indicates the HTTP Method type that the resource supports to access.
(17) Connection: indicates the properties of the client and server negotiating the connection. The common value is close, telling the other party to close the connection after the current request ends.
(18) Expect: used to ask the server for permission before sending the request body, for example Expect: 100-continue.
(19) From: Generally used to mark the email address of the request originator, which is equivalent to assigning a responsible person to the request.
(20) Host: The HTTP/1.1 specification requires every request to carry a Host header; if the request target has no host, the header is still sent with an empty value. It indicates the host name (and optionally the port) of the server the request is addressed to.
(21) Last-Modified: Mark the last modification time of the resource.
(22) If-Modified-Since: When the browser requests a static resource that it already has in its local cache, it sends this header with the resource's Last-Modified time as the value, asking the server whether the resource has been modified since then.
(23) Range: indicates the request byte range specified when the client requests a part of the resource.
(24) Content-Range: indicates the byte range of the transmitted Body data in the overall resource block when the server responds to the request.
(25) Location: indicates the target URL the client should go to when the server sends a redirect response such as 302.
(26) Max-Forwards: limits the number of gateways or proxies that may forward the request, that is, the maximum number of forwards.
(27) Referer: Commonly used in origin-based restriction policies. It carries the URI the request came from, that is, the parent page of the current resource. By tracing Referer values, complex jump chains between resource pages can be reconstructed.
(28) Server: Used to return server-related software information to inform the client that the current HTTP service is provided by XYZ software.
(29) User-Agent: carries the current user agent information, generally including the version and model information of the browser, browser kernel and operating system.
(30) Transfer-Encoding: indicates which transformation has been applied to the body data for transmission, for example chunked.
(31) Upgrade: indicates the protocols the sender is willing to switch to. A client uses it to ask for a protocol switch, and a server can use it to recommend that the client upgrade the transmission protocol.
(32) Vary: This field is used for cache control. The server adds it to a response to list the request headers that affect the response, which tells cache servers to keep separate cache entries for requests that differ in those headers.
(33) Via: This field is used to mark a gateway routing node that a request passes through, indicating gateway information
(34) Warning: Used to add some additional warning information in the response, including error codes and error descriptions.
(35) WWW-Authenticate: It is the header that must be carried when the 401 Unauthorized error code is returned. This header will carry a Challenge to the client, informing the client that the client needs to carry the answer to this question to request the server to continue to access the target resource.
(36) Authorization: For some resources that require special permissions to access, the client needs to provide user name and password authentication information in the request. It is a response to WWW-Authenticate
(37) Cache-Control: This field can be used in both requests and responses.
When the value of the field is no-cache, a cached copy must not be used without first revalidating it with the origin server. For a request, a cache must not answer directly from its stored content. For a response, the client must revalidate the stored response before reusing it.
When the value of the field is no-store, the request / response data must not be persisted anywhere. This kind of information is sensitive and should be kept volatile.
When the value of the field is no-transform, intermediaries must not transform the data.
When the value of the field is only-if-cached, it is only used in the request header, telling the cache to return only a stored response and not to contact the origin server.
When the value of the field is max-stale, it is only used in the request header, indicating that the client accepts cached content that has already expired, optionally limited to a maximum time past expiration.
When the value of the field is min-fresh, it is only used in the request header, indicating that the client wants content that will stay fresh for at least the specified time, which excludes resources that are about to expire.
When the value of the field is public, it is only used in the response header, indicating that the response may be stored by any cache, including caches shared by other users.
When the value of the field is private, it is only used in the response header, indicating that the response may be stored only in the client's own cache and must not be stored by shared caches.
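
The Cache-Control values described above are sent as a comma-separated list of directives, some of which carry an argument (for example max-stale=60). The following is a minimal parsing sketch. The function names `parse_cache_control` and `may_store` are hypothetical, and the parser assumes that no directive argument contains a comma inside quotes.

```python
def parse_cache_control(value):
    """Parse a Cache-Control header value into {directive: argument or True}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directives without "=" (such as no-store) are stored as True.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else True
    return directives


def may_store(directives, shared_cache):
    """Decide whether a cache may store a response with these directives."""
    if "no-store" in directives:
        return False                      # must not be persisted anywhere
    if shared_cache and "private" in directives:
        return False                      # only the client's own cache may store it
    return True


if __name__ == "__main__":
    parsed = parse_cache_control("private, max-age=600")
    print(parsed)
    print(may_store(parsed, shared_cache=True))
    print(may_store(parsed, shared_cache=False))
```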

4. Learn the HTTP request method and the type and meaning of the returned status code

1xx: Informational
100 Continue: The server has received the initial part of the request and has not rejected it; the client should continue sending the rest of the request.
101 Switching Protocols: The server is switching to another protocol, as requested by the client.

2xx: Successful
200 OK: The request succeeded.
201 Created: The request succeeded and a new resource was created.
202 Accepted: The request has been accepted for processing, but processing has not been completed.
203 Non-Authoritative Information: The document was returned normally, but some response headers may be incorrect because a copy of the document was used.
204 No Content: There is no new document; the browser should continue to display the original document. This status code is useful if the user refreshes the page regularly and the server can determine that the user's document is sufficiently up to date.
205 Reset Content: There is no new document, but the browser should reset the content it displays. Used to force the browser to clear form input.
206 Partial Content: The client sent a GET request with a Range header, and the server fulfilled it.

3xx: Redirect
300 Multiple Choices: A list of links; the user can select a link to reach the destination. Up to five addresses are allowed.
301 Moved Permanently: The requested page has been moved to a new URL.
302 Found: The requested page has been temporarily moved to a new URL.
303 See Other: The requested page can be found under another URL.
304 Not Modified: The document has not been modified. The client has a cached document and issued a conditional request (generally with an If-Modified-Since header, indicating that the client only wants documents newer than the specified date). The server tells the client that the cached document can continue to be used.
305 Use Proxy: The document requested by the client should be retrieved through the proxy server specified in the Location header.
306 Unused: This code was used in a previous version of the specification. It is no longer in use, but the code remains reserved.
307 Temporary Redirect: The requested page has been temporarily moved to a new URL.

4xx: Client error
400 Bad Request: The server failed to understand the request.
401 Unauthorized: The requested page requires a username and password.
402 Payment Required: This code is reserved and not yet in use.
403 Forbidden: Access to the requested page is forbidden.
404 Not Found: The server cannot find the requested page.
405 Method Not Allowed: The method specified in the request is not allowed.
406 Not Acceptable: The response generated by the server cannot be accepted by the client.
407 Proxy Authentication Required: The user must first authenticate with the proxy server before the request can be processed.
408 Request Timeout: The request took longer than the server was prepared to wait.
409 Conflict: The request cannot be completed because of a conflict.
410 Gone: The requested page is no longer available.
411 Length Required: "Content-Length" is not defined; the server will not accept the request without it.
412 Precondition Failed: A precondition given in the request was evaluated as false by the server.
413 Request Entity Too Large: The request is not accepted because the request entity is too large.
414 Request-URI Too Long: The request is not accepted because the URL is too long.
415 Unsupported Media Type: The server does not accept the request because the media type is not supported.
416 Requested Range Not Satisfiable: The server cannot satisfy the Range specified by the client in the request.
417 Expectation Failed: The server cannot meet the expectation given in the request's Expect header.

5xx: Server error
500 Internal Server Error: The request was not completed; the server encountered an unexpected condition.
501 Not Implemented: The request was not completed; the server does not support the requested function.
502 Bad Gateway: The request was not completed; the server received an invalid response from the upstream server.
503 Service Unavailable: The request was not completed; the server is temporarily overloaded or down.
504 Gateway Timeout: The gateway timed out.
505 HTTP Version Not Supported: The server does not support the HTTP protocol version specified in the request.
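
The first digit of a status code determines its class, so a code can be classified with integer division. The sketch below uses Python's standard `http.HTTPStatus` enum to look up the reason phrase. The `CLASSES` table and the `describe` function are hypothetical names for this example, and codes that the standard library does not know are labeled "Unknown".

```python
from http import HTTPStatus

# Class names keyed by the first digit of the status code.
CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirect",
    4: "Client error",
    5: "Server error",
}


def describe(code):
    """Return a short description such as '404 Not Found (Client error)'."""
    category = CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"                # code not defined in http.HTTPStatus
    return f"{code} {phrase} ({category})"


if __name__ == "__main__":
    for code in (200, 301, 404, 503):
        print(describe(code))
```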

Origin blog.csdn.net/m0_38103658/article/details/101538556