Learn to write a Python crawler and download online picture material for free

Preface

The text and pictures in this article come from the Internet and are for learning and exchange only, not for commercial use. The copyright belongs to the original authors. If you have any concerns, please contact us and we will handle them.


1. Project background

Finding a suitable picture on a material website normally means scrolling through the results page by page. Once you know some Python, you can let a program save all the pictures first and then pick the right one at your own pace.

 

2. Project goals

1. Obtain the source code of the web page according to the given URL.

2. Use XPath to extract the image addresses from the source code.

3. Download the material picture from the filtered picture address.

 

3. Libraries and websites involved

1. The target website is:

https://www.51miz.com

 

2. Libraries involved:

requests, lxml

4. Project analysis

First, we need to work out how to request the next page of results. Click the "next page" button and watch how the URL changes:

https://www.51miz.com/so-sucai/1789243.html

https://www.51miz.com/so-sucai/1789243/p_2/

https://www.51miz.com/so-sucai/1789243/p_3/

The result pages follow the pattern 1789243/p_{}/, where the number in the curly brackets is the page number. Only the first page uses a different form, 1789243.html.
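The URL pattern above can be sketched as a small helper. This is only an illustration: the function name page_url is hypothetical, and the first page is special-cased because the only first-page URL observed above ends in .html.

```python
# Base of the search-result URLs observed above (Year of the Rat search).
BASE_URL = "https://www.51miz.com/so-sucai/1789243"


def page_url(page):
    """Return the search-result URL for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    if page == 1:
        # The first page was observed with the .html form.
        return BASE_URL + ".html"
    # Later pages follow the p_{} pattern.
    return "{}/p_{}/".format(BASE_URL, page)


for n in range(1, 4):
    print(page_url(n))
```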

5. Project implementation

1. Open Mizhi (https://www.51miz.com) and search for the picture material you want (this article uses Year of the Rat pictures as the example).

 

2. Following the URL analysis in the previous step, we define a class called ImageSpider with four functions: an initialization function, a function that sends a request and returns the response data, a parsing function, and a main function. Start with the initialization function, which prepares the URL address and the headers; the code is shown in the figure below.
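Since the original code is only available as a picture, here is a minimal sketch of what such a class skeleton might look like. Everything except the class name ImageSpider is an assumption: the method names, the search_id parameter, and the User-Agent string are placeholders. Whether /p_1/ also resolves to the first page is not confirmed by the URLs listed earlier.

```python
class ImageSpider:
    """Skeleton of the spider described above (method names are assumptions)."""

    def __init__(self, search_id="1789243"):
        # URL template; the braces are filled in with the page number.
        self.url = "https://www.51miz.com/so-sucai/" + search_id + "/p_{}/"
        # Browser-like User-Agent so the request resembles a normal visit.
        # The exact string is an assumption, not taken from the article.
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
        }

    def get_html(self, url):
        """Send a request and return the response data (placeholder)."""
        raise NotImplementedError

    def parse_html(self, html):
        """Extract image addresses and save the pictures (placeholder)."""
        raise NotImplementedError

    def main(self):
        """Ask for a page count and crawl each page (placeholder)."""
        raise NotImplementedError


spider = ImageSpider()
print(spider.url.format(2))  # https://www.51miz.com/so-sucai/1789243/p_2/
```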

3. Write the function that sends a request and returns the response data.
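A request function of this kind could look roughly like the following, written as a standalone function for brevity. The name get_html, the timeout value, and the User-Agent string are assumptions; only the use of the requests library comes from the article.

```python
import requests

# Assumed browser-like header; the article's actual headers are not shown.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def get_html(url, headers=HEADERS, timeout=10):
    """Fetch url and return the raw response body as bytes."""
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of
    # silently returning an error page.
    response.raise_for_status()
    return response.content
```

Returning response.content (bytes) rather than response.text leaves character-set detection to the HTML parser, and the same function can then be reused to download the image files themselves.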

 

4. Parse the data: use XPath to get the links to the secondary pages, and finally save the pictures to a folder. Open the developer tools in Google Chrome (or press F12) and you will find that the image address we need is the src attribute of an img tag, so extract that attribute with XPath and download the picture with Python requests.
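The parsing and saving steps might be sketched as below. The function names extract_image_urls and save_image are hypothetical, and the XPath //img/@src is deliberately generic: the selectors for the real page, including the secondary-page links mentioned above, have to be read from the developer tools and are not known here.

```python
from pathlib import Path
from urllib.parse import urljoin

from lxml import etree


def extract_image_urls(html, base_url="https://www.51miz.com/"):
    """Return absolute URLs for every img src found in the HTML."""
    tree = etree.HTML(html)
    srcs = tree.xpath("//img/@src")  # generic selector, an assumption
    # src values are often relative or scheme-relative (//host/path),
    # so resolve them against the page URL.
    return [urljoin(base_url, src) for src in srcs]


def save_image(data, folder, filename):
    """Write image bytes to folder/filename, creating the folder if needed."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / filename
    path.write_bytes(data)
    return path


sample = '<html><body><img src="//img.example.com/rat.jpg"></body></html>'
print(extract_image_urls(sample))  # ['https://img.example.com/rat.jpg']
```

In a full crawler, the bytes passed to save_image would come from a requests call on each extracted address.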

 

5. The main function, the code is shown in the figure below.
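As a rough sketch of such a main function: it loops over the requested number of pages and hands each page URL to a per-page routine. The names main, page_url, and crawl_page are hypothetical, and the pause between requests is an added courtesy rather than something the article describes.

```python
import time

BASE_URL = "https://www.51miz.com/so-sucai/1789243"


def page_url(page):
    """First page uses the .html form, later pages the p_{} form."""
    if page == 1:
        return BASE_URL + ".html"
    return "{}/p_{}/".format(BASE_URL, page)


def main(pages, crawl_page, delay=1.0):
    """Call crawl_page(url) for pages 1..pages, pausing between requests."""
    for page in range(1, pages + 1):
        url = page_url(page)
        crawl_page(url)
        print("Page {} done: {}".format(page, url))
        if page < pages:
            time.sleep(delay)  # avoid hammering the server


# Example usage (crawl_page would fetch, parse and save one result page):
# pages = int(input("Number of pages to crawl: "))
# main(pages, crawl_page=print)
```

Passing the per-page routine in as an argument keeps the loop easy to try out without touching the network.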

 

6. Results

1. Run the program and enter the number of pages you want to crawl in the console, as shown in the figure below.

2. The downloaded pictures then appear in the local folder, as shown in the figure below.

 


Origin blog.csdn.net/qq_38887171/article/details/109129543