How a browser runs

The basic process of an HTTP request

  1. After the browser resolves the domain name to an IP address, it sends a request to the URL in the address bar and receives a response.
  2. The returned response content (HTML) contains the URLs of CSS files, JS files, and images, as well as ajax code. The browser sends further requests in the order these appear in the response content and receives the corresponding responses.
  3. Each time the browser receives a response, it adds (loads) the result to what is displayed. JS, CSS, and other content modify the page, and JS can also send new requests and receive their responses.
  4. The whole process, from receiving the first response and displaying it in the browser until all responses have been received and the displayed result has been added to or modified, is called the browser's rendering.
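
Step 2 above can be sketched in a few lines of Python. This is only a rough illustration, not how a real browser is implemented: the sample HTML, the `example.com` addresses, and the names `ResourceCollector` and `find_resources` are all made up for this example. It parses one HTML response and lists, in document order, the URLs a browser would go on to request.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collects the resource URLs referenced by an HTML document."""

    # Tag name -> attribute that holds the resource URL.
    URL_ATTRS = {"link": "href", "script": "src", "img": "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attr_name = self.URL_ATTRS.get(tag)
        if attr_name is None:
            return
        value = dict(attrs).get(attr_name)
        if value:
            # Resolve relative URLs against the page URL, as a browser does.
            self.resources.append(urljoin(self.base_url, value))


# Hypothetical first response for the URL in the address bar.
SAMPLE_HTML = """
<html>
  <head>
    <link rel="stylesheet" href="/static/site.css">
    <script src="/static/app.js"></script>
  </head>
  <body>
    <img src="logo.png">
    <div id="content"></div>
  </body>
</html>
"""


def find_resources(html, base_url):
    """Return the follow-up URLs found in the HTML, in document order."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.resources


resources = find_resources(SAMPLE_HTML, "http://example.com/index.html")
for url in resources:
    print(url)
```

Each printed URL stands for one more request and response. A crawler that fetches only `index.html` receives none of them unless it requests them itself.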

Note:

A crawler, by contrast, requests only the given URL and gets only the response for that URL (the response content can be HTML, CSS, JS, an image, and so on).

The page rendered by the browser is often different from the response a crawler receives, because the crawler cannot render (later on, other tools or packages will be used to help the crawler render the response content).

  • The final result displayed by the browser is built from the multiple responses to the multiple requests sent to multiple URLs.
  • Therefore, in a crawler, data must be extracted from the response that corresponds to the specific URL the request was sent to.
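
The two points above can be illustrated with a small offline simulation. Everything here is an assumption made for the example: the `FAKE_RESPONSES` table stands in for a real server, and the URLs, `fetch`, and `count_list_items` are hypothetical names. The page's HTML contains an empty list that JS would fill in a browser, so a crawler has to request the data URL directly.

```python
import json
from html.parser import HTMLParser

# Hypothetical responses keyed by URL, standing in for a real server.
FAKE_RESPONSES = {
    "http://example.com/items": (
        '<html><body><ul id="items"></ul>'
        '<script src="/static/load_items.js"></script></body></html>'
    ),
    "http://example.com/api/items": '{"items": ["apple", "banana"]}',
}


def fetch(url):
    """Return the response body for exactly one URL, as a crawler would."""
    return FAKE_RESPONSES[url]


class ListItemCounter(HTMLParser):
    """Counts <li> elements present in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.count += 1


def count_list_items(html):
    counter = ListItemCounter()
    counter.feed(html)
    return counter.count


# The page URL alone: the list is empty because no JS has run.
page_html = fetch("http://example.com/items")
items_in_html = count_list_items(page_html)

# The data lives in the response of a different URL.
items_from_api = json.loads(fetch("http://example.com/api/items"))["items"]

print(items_in_html)
print(items_from_api)
```

In a browser, the script would request the data URL and insert the items into the page. The crawler sees only what each individual response contains, so it extracts the data from the response that actually carries it.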

Origin blog.csdn.net/m0_67268191/article/details/131712225