Preface
The text and images in this article come from the Internet and are intended for learning and discussion only, not for commercial use. If you have any concerns, please contact us.
Previous articles in this series
Python crawler beginner tutorial (1): crawling Douban movie rankings
Python crawler beginner tutorial (2): crawling novels
Python crawler beginner tutorial (3): crawling Lianjia second-hand housing data
Python crawler beginner tutorial (4): crawling 51job.com job listings
Python crawler beginner tutorial (5): crawling Bilibili video danmaku (bullet comments)
Python crawler beginner tutorial (6): making word clouds
Python crawler beginner tutorial (7): crawling Tencent Video danmaku
Python crawler beginner tutorial (8): crawling forum articles and saving them as PDF
Python crawler beginner tutorial (9): a multi-threaded crawler case study
Python crawler beginner tutorial (10): crawling 4K ultra-HD wallpapers from the Other Shore wallpaper site
Python crawler beginner tutorial (11): crawling the latest Honor of Kings skins
Python crawler beginner tutorial (12): crawling the latest League of Legends skins
Basic development environment
- Python 3.6
- PyCharm
Modules used
import requests
import re
import os
Install Python and add it to your PATH environment variable, then install the required third-party module with pip (requests; re and os are part of the standard library).
1. Define the goal
Crawl the HD wallpapers from the wallpaper site (wallpaper.wispx.cn).
2. Web page data analysis
On a wallpaper's detail page, clicking the button to download the original image saves the full-size wallpaper automatically.
So the crawler only needs the link behind that button to fetch the wallpaper image.
Back on the list page, the wallpapers load in a waterfall (infinite-scroll) layout: more data appears only as you scroll down. Open the browser's developer tools before scrolling, and the newly loaded data package will show up there as the page scrolls.
Comparing that data package with the page shows that it contains the addresses needed to download the wallpaper images.
Note that this data is requested with POST, not GET.
The form data submitted with the request is the page number.
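As a rough illustration of the request described above, the sketch below builds (but does not send) the POST request for one page so its method and body can be inspected. It uses the standard library's urllib only to stay network-free; the article's own code uses requests. The helper name build_list_request is hypothetical, and the URL and header are the ones used in the code later in this article.

```python
from urllib.parse import urlencode
from urllib.request import Request

# List endpoint taken from this article's code
LIST_URL = 'https://wallpaper.wispx.cn/cat/%E5%8A%A8%E6%BC%AB'

def build_list_request(page):
    """Build (but do not send) the POST request for one page of the list."""
    # The page number travels in the form-encoded request body, not in the URL
    body = urlencode({'page': page}).encode('ascii')
    return Request(
        LIST_URL,
        data=body,
        headers={'x-requested-with': 'XMLHttpRequest'},
        method='POST',
    )

req = build_list_request(3)
print(req.get_method(), req.data)
```

The point of the sketch is only that the page number belongs in the request body; a GET to the same URL would not carry it.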
3. Code implementation
1. Get the image ID
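for page in range(1, 11):
    # List endpoint for the anime (动漫) category; each page is fetched with POST
    url = 'https://wallpaper.wispx.cn/cat/%E5%8A%A8%E6%BC%AB'
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36',
        'x-requested-with': 'XMLHttpRequest',
    }
    data = {
        'page': page
    }
    # Pass data so the page number is actually submitted with the request
    response = requests.post(url=url, headers=headers, data=data)
    # Capture the escaped path fragment that follows "detail" in each link
    result = re.findall('detail(.*?)target=', response.text)
    for index in result:
        # Strip the escaping backslashes and the trailing quote and space
        image_id = index.replace('\\', '').replace('" ', '')
        page_url = f'https://wallpaper.wispx.cn/detail{image_id}'
        # Download the wallpaper with main() from step 2 (define it before running this loop)
        main(page_url)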
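To see what the regular expression and the two replace() calls do, here is a small offline sketch. The sample string is hypothetical: it is modeled on the escaped markup that the cleanup code implies, and the real response may look different. The function name extract_detail_urls is also an assumption made for illustration.

```python
import re

# Hypothetical fragment of an escaped response; the real payload may differ
sample = r'<a href=\"https:\/\/wallpaper.wispx.cn\/detail\/1206\" target=\"_blank\">'

def extract_detail_urls(text):
    """Return the detail-page URLs found in an escaped HTML fragment."""
    urls = []
    for match in re.findall('detail(.*?)target=', text):
        # Drop the escaping backslashes, then the closing quote and space
        image_id = match.replace('\\', '').replace('" ', '')
        urls.append(f'https://wallpaper.wispx.cn/detail{image_id}')
    return urls

print(extract_detail_urls(sample))
```

For this sample the captured text is the escaped "/1206" plus a quote and a space, which the cleanup reduces to /1206 before it is appended to the detail URL.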
2. Get the wallpaper URL and save the image
def main(page_url):
    html_data = get_response(page_url).text
    # The first matching link on the detail page is the original-image download URL
    image_url = re.findall('<a class="mdui-ripple mdui-ripple-white" href="(.*?)">', html_data)[0]
    # Use the part of the page title before " - " as the file name
    image_title = re.findall('<title>(.*?)</title>', html_data)[0].split(' - ')[0]
    image_content = get_response(image_url).content
    path = 'images'
    # Create the output folder if it does not exist yet
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, image_title + '.jpg'), mode='wb') as f:
        f.write(image_content)
    print('Saving:', image_title)
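One thing the code above does not handle: a page title may contain characters that Windows does not allow in file names (for example / or ?), in which case open() would fail. A minimal sketch of a cleanup step follows, assuming a hypothetical helper named safe_filename that is not part of the original code.

```python
import re

def safe_filename(title, replacement='_'):
    """Replace characters that Windows forbids in file names (hypothetical helper)."""
    cleaned = re.sub(r'[\\/:*?"<>|]', replacement, title).strip()
    # Fall back to a fixed name if nothing usable is left
    return cleaned or 'untitled'

print(safe_filename('Sword Art Online: Alicization / 4K?'))
```

It could be applied to image_title before the file path is built.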
Points to note:
The request headers must include a referer to get past the site's hotlink protection; otherwise the image will not download.
def get_response(html_url):
    header = {
        'referer': 'https://wallpaper.wispx.cn/detail/1206',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
    }
    resp = requests.get(url=html_url, headers=header)
    return resp
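get_response always sends the same hard-coded referer. Which referer values the site actually accepts is not confirmed here. As a hedged sketch, the headers could instead be built per request so that the referer matches the detail page the image was found on. build_headers is a hypothetical helper, not part of the original code.

```python
# Same browser User-Agent string as in the article's code
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36')

def build_headers(referer):
    """Return request headers whose referer is the given page (hypothetical helper)."""
    return {'referer': referer, 'User-Agent': USER_AGENT}

headers = build_headers('https://wallpaper.wispx.cn/detail/1206')
print(headers['referer'])
```

With requests, the result could be passed as requests.get(image_url, headers=build_headers(page_url)).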
4. Results