Analyze Ajax requests and scrape Toutiao (Today's Headlines)

Preparation

requests, Beautiful Soup, MongoDB

Crawl analysis

Before crawling, first work out the crawling logic. Open the Toutiao home page, https://www.toutiao.com/, as shown in the figure.

There is a search box in the upper right corner. Since we want to scrape street photos, enter the keyword "street shooting" and search. The results are shown below:

At this point, open the browser's developer tools to view all network requests. Start with the first request, whose URL is the current page link: https://www.toutiao.com/search/?keyword=Street Shooting

Refresh the page and check the response of this request. The search results shown on the page do not appear in it, which suggests they are loaded afterwards through Ajax:

Switch to the XHR view, where we find the information we need.

In the response, article_url is the link to the detail page of each result.
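As a minimal sketch of reading that field, the function below collects every article_url from a JSON response body. It assumes the results sit in a list under a top-level "data" key, which is an assumption for illustration and should be checked against the response shown in the XHR view. The sample payload is made up.

```python
import json


def extract_article_urls(response_text):
    """Return the article_url of every result in a JSON response body."""
    payload = json.loads(response_text)
    # Assumption: the results are listed under a top-level "data" key.
    items = payload.get("data") or []
    # Entries without an article_url are skipped.
    return [
        item["article_url"]
        for item in items
        if isinstance(item, dict) and item.get("article_url")
    ]


# Made-up sample standing in for the real XHR response.
sample = json.dumps({"data": [
    {"title": "a", "article_url": "https://example.com/a"},
    {"title": "no link"},
    {"title": "b", "article_url": "https://example.com/b"},
]})
print(extract_article_urls(sample))
```

When the response is fetched with requests, the same payload can be obtained with `response.json()` instead of `json.loads`.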

Look at the Headers tab again. It shows the request parameters we need to construct.
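Constructing those parameters might look roughly like the sketch below, which builds the query string with the standard library. The endpoint path and the parameter names (`offset`, `format`, `keyword`, `autoload`, `count`) are assumptions for illustration only. Copy the actual request URL and parameters from the Headers tab of the XHR request.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the request URL shown in the Headers tab.
SEARCH_API = "https://www.toutiao.com/search_content/"


def build_search_url(keyword, offset=0, count=20):
    """Return the Ajax URL for one page of search results."""
    # Assumed parameter names; take the real ones from the Headers tab.
    params = {
        "offset": offset,
        "format": "json",
        "keyword": keyword,
        "autoload": "true",
        "count": count,
    }
    return SEARCH_API + "?" + urlencode(params)


print(build_search_url("street shooting", offset=20))
```

With requests, passing the same dictionary as `requests.get(SEARCH_API, params=params)` encodes the query string for you. Increasing `offset` in steps of `count` would then walk through successive pages, assuming the API paginates that way.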

Open a detail page and view its response. In the Doc view, we find the link to each image on the page:
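A rough sketch of pulling image links out of a detail page follows. To stay dependency-free it uses the standard-library `html.parser` rather than Beautiful Soup, and it assumes the images appear as plain `<img>` tags with a `src` attribute. The real page may embed the links differently (for example inside script data), so treat this as an illustration. The sample HTML is made up.

```python
from html.parser import HTMLParser


class ImageLinkParser(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img .../> tags are also routed here by HTMLParser.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.links.append(src)


def extract_image_links(html):
    """Return all image URLs found in the given HTML text."""
    parser = ImageLinkParser()
    parser.feed(html)
    return parser.links


# Made-up sample standing in for a detail page response.
sample_html = (
    '<div><img src="https://example.com/1.jpg"><p>text</p>'
    '<img src="https://example.com/2.jpg"/></div>'
)
print(extract_image_links(sample_html))
```

With Beautiful Soup, the equivalent would be `[img["src"] for img in BeautifulSoup(html, "html.parser").find_all("img", src=True)]`.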

Hands-on practice
