Lecture 12: The principle and analysis of Ajax

When we fetch a page with requests, the result may differ from what we see in the browser: data that displays normally in the browser is missing from what requests returns. This is because requests retrieves only the original HTML document, while the page in the browser is the result after JavaScript has processed the data. That data can come from several sources: it may be loaded through Ajax, embedded in the HTML document, or computed by JavaScript using specific algorithms.

In the first case, the data is loaded asynchronously. The original page does not contain the data; only after the page has loaded does it request an interface on the server to obtain it, and the data is then processed and displayed on the page. This process is, in effect, sending an Ajax request to a server interface.

As the Web evolves, more and more pages take this form: the original HTML document contains no data, and the data is loaded uniformly through Ajax and then displayed. This allows the front end and back end to be developed separately and reduces the load caused by the server rendering pages directly.

So if you encounter such a page, fetching the original page directly with a library such as requests will not yield valid data. Instead, we need to analyze the Ajax requests that the page sends in the background to the server interface. If we can simulate those Ajax requests with requests, we can scrape the data successfully.
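
As a rough sketch of what simulating an Ajax request might look like in Python, the snippet below builds the URL and headers for such a request and, separately, sends it with requests. The endpoint https://example.com/api/list, the page parameter, and the helper names build_ajax_request and fetch_json are all hypothetical and chosen only for illustration; a real interface has to be found by analyzing the page, as described later in this lesson.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
API_URL = "https://example.com/api/list"


def build_ajax_request(base_url, params):
    """Return the full URL and the headers for a simulated Ajax GET request."""
    headers = {
        # Conventional header that marks a request as an Ajax request.
        "X-Requested-With": "XMLHttpRequest",
        "Accept": "application/json",
    }
    url = f"{base_url}?{urlencode(params)}"
    return url, headers


def fetch_json(base_url, params):
    """Send the simulated Ajax request and return the decoded JSON body."""
    # requests is a third-party package; it is imported here so that
    # build_ajax_request can be used even when requests is not installed.
    import requests

    url, headers = build_ajax_request(base_url, params)
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return response.json()


# Build (but do not send) the request for the first page.
url, headers = build_ajax_request(API_URL, {"page": 1})
print(url)
```

Keeping the request construction separate from the network call makes it easy to check the URL and headers against what the browser sends before issuing any real request.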

So, in this lesson, we will understand what Ajax is and how to analyze and capture Ajax requests.

What is Ajax

Ajax stands for Asynchronous JavaScript and XML. It is not a programming language but a technique that uses JavaScript to exchange data with the server and update part of a web page without refreshing the page or changing its URL.

With a traditional web page, you must refresh the entire page to update its content. With Ajax, the content can be updated without a full refresh. In this process, the page exchanges data with the server in the background, and once the data arrives, JavaScript modifies the page so that its content is updated.

You can go to W3School to experience a few demos to get a feel for it: http://www.w3school.com.cn/ajax/ajax_xmlhttprequest_send.asp .

An example

When browsing the web, we find that many pages let you scroll down to load more content. Take my Weibo homepage as an example: https://m.weibo.cn/u/2830678474 . On this page, after scrolling down past a few posts, the content below does not appear immediately; a loading animation appears first, and once loading completes, new posts continue to appear below. This is the Ajax loading process, as shown in the figure:

Notice that the page was not fully refreshed: the link of the page did not change, yet new content appeared on the page, namely the Weibo posts loaded afterward. This is the process of obtaining new data through Ajax and displaying it.

Basic principles

After this first look at Ajax, let's examine its basic principles. The process from sending an Ajax request to updating the web page can be divided into the following 3 steps:

Send request
Parse content
Render webpage

We will introduce these processes in detail below.

Send request

We know that JavaScript can implement all kinds of interactive features on a page, and Ajax is no exception: it too is implemented in JavaScript. The following code is what actually runs:

var xmlhttp;
if (window.XMLHttpRequest) {
    // code for IE7+, Firefox, Chrome, Opera, Safari
    xmlhttp=new XMLHttpRequest();
} else {
    // code for IE6, IE5
    xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
}
xmlhttp.onreadystatechange=function() {
    if (xmlhttp.readyState==4 && xmlhttp.status==200) {
        document.getElementById("myDiv").innerHTML=xmlhttp.responseText;
    }
};
xmlhttp.open("POST","/ajax/",true);
xmlhttp.send();

This is JavaScript's lowest-level implementation of Ajax. It creates a new XMLHttpRequest object, assigns a listener to its onreadystatechange property, and finally calls the open() and send() methods to send a request to a link (that is, to the server).

When we send a request from Python, we get the response back directly. Here, however, the request is sent by JavaScript. Because a listener has been set, the function assigned to onreadystatechange is triggered when the server returns a response, and we can parse the response content inside that function.

Parse content

After the response arrives, the function assigned to the onreadystatechange property is triggered, and the response content can be read from the responseText property of xmlhttp. This is similar to using requests in Python to send a request to the server and then get the response.

The returned content may be HTML or JSON, and we only need to process it further with JavaScript inside that function. For example, if the returned content is JSON, we can parse it and convert it into the form we need.
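
When the same request is simulated from Python rather than run in the browser, the parsing step might look like the sketch below. The response body and its field names (data, items, text) are made up for illustration; the real structure depends on the interface being called.

```python
import json

# Hypothetical JSON response body; real field names depend on the interface.
response_text = (
    '{"ok": 1, "data": {"items": ['
    '{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]}}'
)


def parse_items(body):
    """Decode a JSON body and return its list of items (empty if absent)."""
    payload = json.loads(body)
    return payload.get("data", {}).get("items", [])


items = parse_items(response_text)
texts = [item["text"] for item in items]
print(texts)
```

Using .get() with defaults keeps the function from raising when a response lacks the expected keys, which is a common situation when an interface returns an error payload.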

Render webpage

JavaScript can change the content of a web page. After parsing the response, it can update the page based on the parsed content. For example, an operation such as document.getElementById().innerHTML changes the source code inside an element, so that the content displayed on the page changes. Operations like this that modify or delete parts of the web page document are called DOM manipulation.

In the example above, document.getElementById("myDiv").innerHTML=xmlhttp.responseText replaces the HTML inside the node whose ID is myDiv with the content returned by the server. The new data returned by the server then appears inside the myDiv element, and that part of the page is updated.

As you can see, the three steps of sending the request, parsing the content, and rendering the web page are actually completed by JavaScript.

Ajax analysis

Take the Weibo page from earlier as an example. We know that the content loaded by scrolling down is loaded through Ajax and that the URL of the page does not change. So where can we inspect these Ajax requests?

Here we again need the browser's developer tools. The following uses Chrome as an example.

First, open the Weibo link https://m.weibo.cn/u/2830678474 in Chrome, right-click on the page, and select "Inspect" from the context menu. The developer tools will open, as shown:

As mentioned earlier, the Network panel records every request sent and every response received between the browser and the server while the page loads.

Ajax requests have their own request type, called xhr. In the figure, we can find a request whose name starts with getIndex and whose Type is xhr; this is an Ajax request. Click it to view its details.

The Request Headers, URL, and Response Headers are shown on the right. Among the request headers is X-Requested-With: XMLHttpRequest, which marks the request as an Ajax request, as shown in the figure:
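
To make the role of this header concrete, here is a small sketch of how a program could check for it. The function name is_ajax_request is hypothetical. Note that the header is a convention set by the page's JavaScript, so not every Ajax request necessarily carries it.

```python
def is_ajax_request(headers):
    """Return True if the headers carry X-Requested-With: XMLHttpRequest."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("x-requested-with", "").strip() == "XMLHttpRequest"


ajax_headers = {"X-Requested-With": "XMLHttpRequest", "Accept": "application/json"}
page_headers = {"Accept": "text/html"}

print(is_ajax_request(ajax_headers))
print(is_ajax_request(page_headers))
```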

Next, click Preview to see the response content, which is in JSON format. Chrome parses it automatically for us; click the arrows to expand and collapse the corresponding content.

We can see that the returned result is my personal information, including nickname, profile, avatar, and so on. This is the data used to render the personal homepage. After JavaScript receives the data, it executes the corresponding rendering method, and the whole page is rendered.

In addition, we can switch to the Response tab to see the raw returned data, as shown:

Next, switch back to the first request and look at its Response, as shown below:

This is the result returned by the original link https://m.weibo.cn/u/2830678474 . It has fewer than 50 lines of code and a very simple structure; it does little more than execute some JavaScript.

Therefore, the real data on the Weibo page we see is not returned by the original page. Instead, after the JavaScript executes, it sends further Ajax requests to the backend, and the browser renders the data it gets back.
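
The following sketch illustrates why parsing only the original document yields no data. The RAW_HTML string is a made-up stand-in for such a page (an empty container plus a script tag), not the actual Weibo source, and BodyTextCollector is a hypothetical helper built on the standard library's html.parser.

```python
from html.parser import HTMLParser

# Hypothetical original HTML: a near-empty shell plus a script tag.
RAW_HTML = """<!DOCTYPE html>
<html>
<head><title>Profile</title></head>
<body>
<div id="app"></div>
<script src="/static/app.js"></script>
</body>
</html>"""


class BodyTextCollector(HTMLParser):
    """Collect visible text inside <body>, skipping <script> contents."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Keep only non-blank text that appears in the body outside scripts.
        if self.in_body and not self.in_script and data.strip():
            self.chunks.append(data.strip())


collector = BodyTextCollector()
collector.feed(RAW_HTML)
print(collector.chunks)  # prints [] because the shell holds no visible text
```

The empty result is the point: the content a user sees only exists after the script runs and the Ajax responses arrive.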

Filter request

Next, we use the filtering function of Chrome Developer Tools to show only Ajax requests. There is a filter bar above the request list; click XHR, and all the requests displayed below are Ajax requests, as shown in the figure:

Next, keep scrolling the page. You can see new Weibo posts appear at the bottom of the page, while new Ajax requests keep appearing in the developer tools. In this way we can capture all the Ajax requests.

Click any entry to see its Request URL, Request Headers, Response Headers, Response Body, and other details. With this information, simulating the request and extracting the data becomes straightforward.

The content shown in the figure below is the list data for one page of my Weibo posts:

So far, we have been able to analyze the details of Ajax requests. Next, we only need to simulate these Ajax requests with a program to extract the information we need.
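
As a closing sketch, the loop below shows one way such a program could walk through successive pages of an Ajax interface, mirroring the requests that appear as the page is scrolled. Everything specific here is an assumption: the page numbering, the items field, and the canned FAKE_RESPONSES that stand in for real network calls. In practice, fetch would wrap a real HTTP request built from the URL and headers observed in the developer tools.

```python
def iter_items(fetch, max_pages):
    """Yield items page by page until a page comes back empty."""
    for page in range(1, max_pages + 1):
        payload = fetch(page)
        items = payload.get("items", [])
        if not items:
            # An empty page is treated as the end of the list.
            break
        yield from items


# Stand-in for real network calls: canned responses keyed by page number.
FAKE_RESPONSES = {
    1: {"items": [{"id": 1}, {"id": 2}]},
    2: {"items": [{"id": 3}]},
    3: {"items": []},
}


def fake_fetch(page):
    """Return the canned response for a page, or an empty page if unknown."""
    return FAKE_RESPONSES.get(page, {"items": []})


ids = [item["id"] for item in iter_items(fake_fetch, max_pages=10)]
print(ids)
```

Passing the fetch function in as a parameter keeps the paging logic testable without any network access.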