Nowadays, young people are almost embarrassed to call themselves young if they chat without emojis. Emojis have become an indispensable part of everyday chat.
Throw a few emojis at a friend you just met and you are on good terms within minutes. When my girlfriend is sulking, a couple of emojis cheer her up. Emojis can also defuse an awkward moment, and when I don't have time to type, I just send a couple of emojis.
Life is short, I use Python
1. Preparation
Preparation is very important. First, we need to know what we are going to do, what we will do it with, and how to do it. Then we go through it step by step and keep things steady.
Development environment configuration
Python 3.6
PyCharm
Open your browser and search for the name of the software you want to install.
Python
Pick the result that is marked as the official website. If a result under the name is labeled as an advertisement, don't click on it. Trust the label, it really is an advertisement.
Just click Python 3.10.2 below to download the latest version; there is no need to click Download.
PyCharm
Just click Download.
Either the Professional Edition or the Community Edition is fine.
The installation steps are too long to go through one by one here; you can scan the code at the bottom of the article for a video.
Module installation configuration
requests
parsel
re
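Press Win+R, type cmd, and press Enter to open a command prompt. Then type pip install followed by the name of the module to install, and press Enter. Note that re is part of Python's standard library, so only requests and parsel actually need to be installed.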
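As a quick sanity check after installing, something like the sketch below can report which modules are still missing. The helper name `missing_modules` is my own invention for this example, not part of any library.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    # Modules used in this article; re and time ship with Python itself.
    needed = ['requests', 'parsel', 're', 'time']
    missing = missing_modules(needed)
    if missing:
        print('Still missing, install with pip:', ', '.join(missing))
    else:
        print('All modules are available.')
```

If anything is listed as missing, run pip install for that name and try again.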
2. Code
Goal: the emoji pictures on fabiaoqing. Please complete the front and back of the fabiaoqing address yourself, including in the code further down; that should be no problem.
Import the modules
import requests
import parsel
import re
import time
Request URL
url = f'fabiaoqing/biaoqing/lists/page/{page}.html'
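The `{page}` placeholder in the URL is filled in once per list page. A minimal sketch of how the page addresses could be generated, assuming the pages are simply numbered from 1 upward (the function name `page_urls` is hypothetical, and the address is kept exactly as written above):

```python
def page_urls(first_page, last_page):
    """Build the list-page addresses for first_page..last_page, inclusive."""
    # The address is kept exactly as written in the article; complete it
    # yourself before sending real requests.
    template = 'fabiaoqing/biaoqing/lists/page/{page}.html'
    return [template.format(page=page) for page in range(first_page, last_page + 1)]


for url in page_urls(1, 3):
    print(url)
```

Remember that `range` stops one short of its end value, which is why the sketch adds 1 to `last_page`.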
Request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
}
Get the source code of the webpage
response = requests.get(url=url, headers=headers)
Parse the data
selector = parsel.Selector(response.text)  # convert response.text into a Selector object
The first extraction gets all the matching div tags
divs = selector.css('#container div.tagbqppdiv')  # extract content with a CSS selector
For each div in divs, extract the image URL from the tag content
img_url = div.css('img::attr(data-original)').get()
Extract the title
title = div.css('img::attr(title)').get()
Get the file extension of the image
name = img_url.split('.')[-1]
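Splitting on the last dot works when the address ends in something like `.gif`, but it gives odd results if the address carries a query string or has no extension at all. A slightly more careful sketch, using only the standard library; the name `image_extension` and the `'jpg'` fallback are my own assumptions:

```python
import os
from urllib.parse import urlparse


def image_extension(img_url, default='jpg'):
    """Return the file extension of an image URL, without the dot."""
    path = urlparse(img_url).path      # drops any ?query or #fragment
    ext = os.path.splitext(path)[1]    # e.g. '.gif', or '' if there is none
    return ext[1:].lower() if ext else default


print(image_extension('http://example.com/pic/a.GIF?size=big'))
print(image_extension('http://example.com/pic/noext'))
```

For the second address, `img_url.split('.')[-1]` would return `com/pic/noext`, which is not a usable extension.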
Clean up the title so it can be used as a file name
new_title = change_title(title)
Request the emoji image URL to get its binary data
img_content = requests.get(url=img_url, headers=headers).content
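Save the data
def save(title, img_url, name):
    # get_response() is the request helper defined in the full script below
    img_content = get_response(img_url).content
    try:
        with open('img\\' + title + '.' + name, mode='wb') as f:
            # write the image's binary data
            f.write(img_content)
            print('Saving:', title)
    except OSError as err:
        # e.g. the img folder does not exist or the file name is invalid
        print('Failed to save:', title, err)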
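One thing to watch out for: `open()` fails if the target folder does not exist. The sketch below creates the folder first and can be tried without downloading anything. The name `save_bytes` and the fake image bytes are assumptions made for the demo.

```python
import os
import tempfile


def save_bytes(folder, title, ext, data):
    """Write `data` to folder/title.ext and return the full path."""
    os.makedirs(folder, exist_ok=True)   # create the folder if it is missing
    path = os.path.join(folder, f'{title}.{ext}')
    with open(path, mode='wb') as f:
        f.write(data)
    return path


# Demo with fake bytes in a temporary folder, so nothing is downloaded.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_bytes(os.path.join(tmp, 'img'), 'smile', 'gif', b'GIF89a')
    print('saved to', saved, 'size', os.path.getsize(saved))
```

`os.path.join` picks the right path separator for the operating system, so the same code also works outside Windows.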
Replace special characters in the title
The titles are not known in advance and may contain special characters that are not allowed in file names, so we replace those characters using a regular expression.
def change_title(title):
    mode = re.compile(r'[\\\/\:\*\?\"\<\>\|]')
    new_title = re.sub(mode, "_", title)
    return new_title
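Here is a small worked example of the replacement. The pattern is written without the extra backslashes, which as far as I can tell matches the same nine characters, and the sample title is made up:

```python
import re

# Characters that Windows does not allow in file names.
ILLEGAL_CHARS = re.compile(r'[\\/:*?"<>|]')


def change_title(title):
    """Replace every illegal file-name character with an underscore."""
    return ILLEGAL_CHARS.sub('_', title)


print(change_title('what?! <ok> a/b'))  # prints: what_! _ok_ a_b
```

Characters outside the set, such as `!` and spaces, are left alone.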
Record the elapsed time
# time_1 = time.time() is recorded before the downloads start
time_2 = time.time()
use_time = int(time_2) - int(time_1)
print(f'Total time: {use_time} seconds')
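Converting both timestamps to int before subtracting can be off by up to a second. If you want a finer reading, a sketch with `time.perf_counter()` could look like this; the helper name `timed` is hypothetical, and `time.sleep` stands in for the real download work:

```python
import time


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload: sleeping instead of downloading.
_, seconds = timed(time.sleep, 0.1)
print(f'Total time: {seconds:.2f} seconds')
```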
Brothers, that was the single-threaded version. Below is the multithreaded version; I will go straight to the code.
import os
import re
import time
import concurrent.futures

import requests
import parsel


def change_title(title):
    # replace characters that are not allowed in file names with "_"
    mode = re.compile(r'[\\\/\:\*\?\"\<\>\|]')
    new_title = re.sub(mode, "_", title)
    return new_title


def get_response(html_url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
    }
    response = requests.get(url=html_url, headers=headers)
    return response


def save(title, img_url, name):
    img_content = get_response(img_url).content
    try:
        with open('img\\' + title + '.' + name, mode='wb') as f:
            f.write(img_content)
            print('Saving:', title)
    except OSError as err:
        # report the failure instead of silently ignoring it
        print('Failed to save:', title, err)


def main(html_url):
    html_data = get_response(html_url).text
    selector = parsel.Selector(html_data)
    divs = selector.css('#container div.tagbqppdiv')
    for div in divs:
        img_url = div.css('img::attr(data-original)').get()
        title = div.css('img::attr(title)').get()
        if not img_url or not title:
            # skip entries without an image address or a title
            continue
        name = img_url.split('.')[-1]
        new_title = change_title(title)
        save(new_title, img_url, name)


if __name__ == '__main__':
    time_1 = time.time()
    os.makedirs('img', exist_ok=True)  # save() writes into this folder
    exe = concurrent.futures.ThreadPoolExecutor(max_workers=10)
    for page in range(1, 201):
        # address as written in the article; complete it yourself
        url = f'fabiaoqing/biaoqing/lists/page/{page}.html'
        exe.submit(main, url)
    exe.shutdown()  # wait for all submitted pages to finish
    time_2 = time.time()
    use_time = int(time_2) - int(time_1)
    print(f'Total time: {use_time} seconds')
Brothers, that is more than 1,000 pictures in 18 seconds. It is over a bit too fast.
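One caveat with `exe.submit`: if a page raises an exception inside its thread, nothing is printed unless you ask the returned future for its result. A minimal sketch of how failures could be collected, using a fake job so that no network is needed; `fake_page_job` and `run_pages` are hypothetical names, and the "page 3 fails" rule exists only for the demo:

```python
import concurrent.futures


def fake_page_job(page):
    """Stand-in for main(url): pretend page 3 fails, others save 5 images."""
    if page == 3:
        raise ValueError(f'page {page} could not be parsed')
    return 5


def run_pages(pages, job, max_workers=10):
    """Run job(page) for every page in a thread pool; collect results and errors."""
    done, failed = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as exe:
        futures = {exe.submit(job, page): page for page in pages}
        for future in concurrent.futures.as_completed(futures):
            page = futures[future]
            try:
                done[page] = future.result()   # re-raises the job's exception
            except Exception as err:
                failed[page] = err
    return done, failed


done, failed = run_pages(range(1, 6), fake_page_job)
print('images saved:', sum(done.values()))
print('failed pages:', sorted(failed))
```

The `with` block waits for all jobs to finish, much like calling `exe.shutdown()` by hand.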
If you found this useful, please like and save it. Love you all! As you can see, the code runs really fast. So fast it is almost not a good thing~