Preparation
requests, Beautiful Soup, MongoDB
Crawl analysis
Before crawling, work out the crawling logic. Open the Toutiao ("Today's Headlines") home page, https://www.toutiao.com/, as shown in the figure.
There is a search box in the upper right corner. We want to collect street photos, so enter the keyword "street shooting" (street photography) and search. The results are shown below:
Now open the developer tools and look at the network requests. Start with the first request, whose URL is the address of the current page: https://www.toutiao.com/search/?keyword=Street Shooting
Refresh the page and check the response of that request. The content displayed on the page does not appear in it, as shown below:
This means the results are loaded separately. Switch to the XHR view, where the information we need can be found:
In the response, article_url is the link to each article's detail page.
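As a rough sketch of how these links could be pulled out of the XHR response, the function below parses the response body as JSON and collects every article_url it finds. Only the article_url field comes from the analysis above. The top-level "data" list, the function name extract_article_urls, and the sample payload are assumptions made for illustration, so check them against the real response in the developer tools.

```python
import json


def extract_article_urls(response_text):
    """Collect the article_url of every result in a JSON response body."""
    payload = json.loads(response_text)
    # Assumption: results sit in a top-level "data" list.
    items = payload.get("data") or []
    # Some entries may not be articles, so skip any without an article_url.
    return [
        item["article_url"]
        for item in items
        if isinstance(item, dict) and item.get("article_url")
    ]


# Made-up sample payload, used only to show the call.
sample = json.dumps(
    {"data": [{"article_url": "https://example.com/article/1"}, {"title": "no link"}]}
)
print(extract_article_urls(sample))
```

Skipping entries that lack the field keeps the crawler from failing on items that are not articles.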
Next, look at the Headers tab of the same request. It lists the request parameters we need to construct.
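One way to construct such a request is to keep the parameters in a dictionary and let the standard library encode them. This is a minimal sketch under stated assumptions: only keyword is known from the search URL above. The offset, count, and format parameters, the helper name build_search_url, and the base URL passed in are placeholders that should be replaced with the exact values shown in the Headers tab.

```python
from urllib.parse import urlencode


def build_search_url(base_url, keyword, offset=0, count=20):
    """Build a search request URL from a dictionary of query parameters."""
    params = {
        "keyword": keyword,  # the search term, as seen in the page URL
        "offset": offset,    # assumed paging parameter
        "count": count,      # assumed page size
        "format": "json",    # assumed response format switch
    }
    # urlencode escapes spaces and non-ASCII characters in the values.
    return base_url + "?" + urlencode(params)


# Placeholder base URL; copy the real request URL from the Headers tab.
url = build_search_url("https://www.toutiao.com/search/", "Street Shooting", offset=20)
print(url)
```

With requests, the same dictionary can also be passed directly as `requests.get(base_url, params=params)`, which performs the encoding itself.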
Open an article's detail page and inspect its responses. The link to each image can be found in the Doc response:
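The sketch below shows one way to collect image links from a detail page with Beautiful Soup. It assumes the images appear as ordinary img tags with a src attribute, which the analysis above does not confirm. If the links are instead embedded in a script block, a different extraction step would be needed. The function name extract_image_links and the sample HTML are made up for illustration.

```python
from bs4 import BeautifulSoup


def extract_image_links(html):
    """Return the src of every <img> tag in the given HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: each image is a plain <img> tag; tags without src are skipped.
    return [img["src"] for img in soup.find_all("img") if img.get("src")]


# Made-up HTML fragment, used only to show the call.
sample_html = (
    '<div><img src="https://example.com/1.jpg">'
    '<img alt="no src">'
    '<img src="https://example.com/2.jpg"></div>'
)
print(extract_image_links(sample_html))
```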