When browsing the web, we often come across pictures we like and can't help wanting to save them. Saving them one by one is tedious, though, so let's bring in Python.
As we all know, batch downloading is one of Python's strengths, so let's try it together today.
Life is too short, I use python
1. If you want to do good work, you must first sharpen your tools
Knowledge points:
- 1. Systematic analysis of the target page
- 2. Parsing data out of HTML tags
- 3. Saving image data in bulk with one click
Software:
python 3.8
PyCharm 2021 Professional Edition
If you don't have the software yet, you can download it from the official website.
Modules that need to be installed:
requests  # third-party module for sending network requests
parsel    # for parsing the returned data
Press Win+R, type cmd, and press Enter to open a command prompt window, then run pip install requests. The second module is installed the same way (pip install parsel).
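Before running the crawler, you can quickly check that both modules are importable. This is a small sketch of my own (not part of the original article), using only the standard library:

```python
# Check whether the required third-party modules are installed.
import importlib.util

def check_modules(names):
    """Return the subset of module names that are NOT importable."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = check_modules(["requests", "parsel"])
if missing:
    for name in missing:
        print("Please run: pip install " + name)
else:
    print("All required modules are installed.")
```

If anything is reported missing, install it with pip as described above before continuing.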
2. The approach of this article
1. Analyze the website
① Determine the content to be crawled
The data lives on the kanxiaojiejie site; the address is abbreviated here and in the code below, so please fill in the full URL yourself.
② Analyze the data by viewing the page source code
2. The crawler workflow
Send request - get data - parse data - save data
- Send a network request to the target website (kanxiaojiejie)
- Get the data (page source code)
- Parse the data (extract the album detail-page addresses and titles)
- Send a network request to each detail page
- Get the data (page source code)
- Parse the data (extract the image links)
- Send a request to each image link
- Save the data
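The steps above can be sketched end to end with a stubbed fetch function. Everything in this snippet is illustrative (the HTML fragments and the fetch stub are made up; only the overall shape matches the real crawler below):

```python
import re

# Made-up page fragments shaped like a list page and a detail page.
LIST_HTML = '<h2 class="entry-title"><a href="/album/1">Album 1</a></h2>'
DETAIL_HTML = '<p><img src="/img/a.jpg"></p>'

def fetch(url):
    # Stand-in for requests.get(url, headers=...).text
    return LIST_HTML if "page" in url else DETAIL_HTML

def crawl(list_url):
    saved = []
    # Step 1-3: request the list page and extract detail-page links
    for link in re.findall(r'class="entry-title"><a href="([^"]+)"', fetch(list_url)):
        # Step 4-6: request each detail page and extract image links
        for img in re.findall(r'<img src="([^"]+)"', fetch(link)):
            saved.append(img)  # step 7-8: a real run would download the image here
    return saved

print(crawl("/page/1"))  # -> ['/img/a.jpg']
```

The real code below follows the same structure, but uses requests for fetching and parsel's CSS selectors instead of regular expressions.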
3. Code display
Module import

import requests  # send network requests
import parsel    # parse the returned HTML
Send request

def get_response(html_url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'
    }
    # requests.get() is used here because these pages are fetched with GET requests
    response = requests.get(url=html_url, headers=headers)
    return response
Save data

def save(img_url):
    img_data = requests.get(img_url).content
    img_name = img_url.split('/')[-1]
    print("Downloading: " + img_name)
    # note: the img folder must already exist in the working directory
    with open("img\\" + img_name, mode='wb') as f:
        f.write(img_data)
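One caveat with `img_url.split('/')[-1]`: if an image URL carries a query string, the query ends up in the filename. A hedged standard-library alternative (my own sketch, not part of the original code; the example URL is invented) strips it first:

```python
from urllib.parse import urlparse
import posixpath

def clean_name(img_url):
    """Extract a filename from an image URL, ignoring any query string."""
    return posixpath.basename(urlparse(img_url).path)

print(clean_name("https://example.com/photos/girl.jpg?size=large"))  # -> girl.jpg
```

You could swap this in for the split('/') line if the site you crawl appends parameters to its image links.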
Parse data

Extract the album detail-page addresses from the list page:

def parse_1(html_data):
    selector = parsel.Selector(html_data)
    link_list = selector.css('.entry-title a::attr(href)').getall()
    return link_list
Extract the image URL addresses from the detail page:

def parse_2(html_data):
    selector_1 = parsel.Selector(html_data)
    img_list = selector_1.css('.entry.themeform p img::attr(src)').getall()
    return img_list
Main function

def run(url):
    data_html = get_response(url).text
    link_list = parse_1(data_html)
    for link in link_list:
        data_html_1 = get_response(link).text
        img_list = parse_2(data_html_1)
        for img in img_list:
            save(img)
Call the main function to run:

for page in range(1, 112):
    url = f'kanxiaojiejie/page/{page}'
    run(url)
I won't show the running result here; give it a try yourself~