I wrote this piece of code at the beginning of the month. Unfortunately, around the 20th, I looked at Baidu Images again and found that the paginated ("flip") view had gone back to a waterfall layout. Fortunately, the URL of the paginated view was still in my code,
so I'm sharing it with everyone now.
Language: Python 3.6
Libraries: requests, re, urllib
Only requests needs to be installed with pip; the other two modules ship with Python and can be imported directly.
The original URL: https://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word=小姐姐&pn=0
Copy the URL directly to view the first page of the paginated Baidu Images results for 小姐姐 ("little sister"). The code follows, with detailed comments.
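Before the full script, here is a minimal sketch of how the paginated URL is put together. It assumes, as the script's own comment does, that each page holds 20 images, so `pn` advances 0, 20, 40, and so on. The names `BASE_URL`, `PAGE_SIZE` and `build_page_urls` are hypothetical helpers for illustration and are not part of the original script; `urlencode` is used here only to make the percent-encoding of the keyword explicit.

```python
from urllib.parse import urlencode

# Paginated ("flip") endpoint taken from the URL above
BASE_URL = 'https://image.baidu.com/search/flip'
# Assumption: 20 thumbnails per page, as stated in the script's comment
PAGE_SIZE = 20


def build_page_urls(keyword, pages):
    """Return one search URL per page, with pn = 0, 20, 40, ..."""
    urls = []
    for page in range(pages):
        query = urlencode({
            'tn': 'baiduimage',
            'ie': 'utf-8',
            'word': keyword,          # percent-encoded as UTF-8 by urlencode
            'pn': page * PAGE_SIZE,   # offset of the first image on this page
        })
        urls.append('{}?{}'.format(BASE_URL, query))
    return urls


# Print the first three page URLs for the keyword used in this post
for url in build_page_urls('小姐姐', 3):
    print(url)
```

Pasting any of the printed URLs into a browser should show the corresponding page, as long as Baidu keeps serving the paginated endpoint.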
#!/usr/bin/python3
# -*- coding:utf-8 -*-
# Author:water_chen
import os
import re
from urllib import request

import requests


def get_picture_list(keyword, biggest_pages):
    all_picture_list = []
    for page in range(biggest_pages):
        # Each page holds 20 images, so the pn offset goes 0, 20, 40, 60, ...
        page = page * 20
        url = 'https://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word={}&pn={}'.format(keyword, page)
        # Fetch the page and decode it as UTF-8
        html = requests.get(url).content.decode('utf-8')
        # Use a regular expression to extract the image URLs
        picture_list = re.findall('{"thumbURL":"(.*?)",', html)
        all_picture_list.extend(picture_list)
    # The second page also contains images from the following two pages, so remove duplicates
    all_picture_list = set(all_picture_list)
    download_picture(all_picture_list)


# Download the images
def download_picture(all_picture_list):
    # Create a "picture" folder next to where the script is run, if it does not exist yet
    os.makedirs('picture', exist_ok=True)
    for i, pic_url in enumerate(all_picture_list):
        print(i)
        # urlretrieve saves each image locally as picture/1.jpg, picture/2.jpg, ...
        string = 'picture/{}.jpg'.format(str(i + 1))
        request.urlretrieve(pic_url, string)


# Entry point
def start():
    # The keyword you want to search for
    keyword = '小姐姐'
    # The number of pages you want to fetch
    biggest_pages = 10
    get_picture_list(keyword, biggest_pages)


if __name__ == '__main__':
    start()
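The extraction and de-duplication steps can be tried without touching the network. The sketch below is only an illustration: `sample_html` is a made-up fragment that imitates the `"thumbURL"` entries the regular expression looks for, not real Baidu output, and `extract_thumb_urls` and `dedupe_keep_order` are hypothetical helper names. One difference from the script is that duplicates are removed while keeping first-seen order, whereas `set()` does not guarantee any order, so the file numbering could differ between runs.

```python
import re

# Same pattern as in the script: capture the value of each "thumbURL" key
THUMB_PATTERN = re.compile(r'\{"thumbURL":"(.*?)",')


def extract_thumb_urls(html):
    """Return every thumbURL value found in the page source, in order."""
    return THUMB_PATTERN.findall(html)


def dedupe_keep_order(urls):
    """Drop repeated URLs while keeping the order of first appearance."""
    seen = set()
    unique = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


# Made-up fragment for illustration only; the first URL appears twice
sample_html = (
    '{"thumbURL":"https://example.com/a.jpg","middleURL":"x"},'
    '{"thumbURL":"https://example.com/b.jpg","middleURL":"y"},'
    '{"thumbURL":"https://example.com/a.jpg","middleURL":"z"},'
)

print(dedupe_keep_order(extract_thumb_urls(sample_html)))
# prints ['https://example.com/a.jpg', 'https://example.com/b.jpg']
```

If stable file names matter, the result of `dedupe_keep_order` could be passed to `download_picture` in place of the set.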
Luckily I had saved the URL of the paginated view. Baidu Images now uses a waterfall layout; to crawl that you would need selenium to scroll the page, which is too much trouble, while this code gets the images relatively easily.
If this was helpful, please give it a like, thank you. Blog Park (cnblogs) address: https://www.cnblogs.com/chenyuan404/p/10192758.html