Foreword
A Python 3 crawler that scrapes product images from JD.com (Jingdong) search results and saves the image files to a local folder.
Part 1. Matching the HTML with a regular expression
url="https://search.jd.com/Search?keyword="+key+"&wq="+key+"&page="+str(i*2-1)
'data-lazy-img="(.*?)"'
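As a rough illustration of what this pattern captures, the sketch below runs it against a small hand-written HTML fragment. The fragment, the host names and the image paths are made up for the example; the real JD.com markup may differ and can change over time.

```python
import re

# Hypothetical HTML fragment imitating lazy-loaded product thumbnails.
SAMPLE_HTML = (
    '<img width="220" data-lazy-img="//img10.example.com/n7/jfs/a.jpg" />'
    '<img width="220" data-lazy-img="//img11.example.com/n7/jfs/b.jpg" />'
)

# Same pattern as in the text: a non-greedy capture of the attribute value.
PATTERN = re.compile(r'data-lazy-img="(.*?)"')


def extract_lazy_images(html):
    """Return every data-lazy-img attribute value found in html."""
    return PATTERN.findall(html)


links = extract_lazy_images(SAMPLE_HTML)
print(links)
```

The non-greedy `(.*?)` matters here: a greedy `(.*)` would run to the last double quote on the line and merge several attributes into one match.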
Part 2. The code
1. Import the libraries
import urllib.request
import re
import requests
2. Add a request header
headers = ("User-Agent","Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0")
opener = urllib.request.build_opener()
opener.addheaders = [headers]
urllib.request.install_opener(opener)
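Installing a global opener affects every later urlopen() call. A possible alternative, sketched here as an assumption rather than as the author's method, is to attach the same User-Agent to a single urllib.request.Request object. The helper name build_request is hypothetical.

```python
import urllib.request

USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) "
              "Gecko/20100101 Firefox/80.0")


def build_request(url):
    """Create a Request that carries the browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


req = build_request("https://search.jd.com/Search?keyword=test")
# Request stores header names in capitalized form, hence "User-agent".
print(req.get_header("User-agent"))
```

The request object can then be passed to urllib.request.urlopen(req) in place of a plain URL string.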
3. Set the product keyword
keyname = "洋河"  # enter the product name to search for
key = urllib.request.quote(keyname)
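To make the URL construction easier to follow, here is a minimal sketch that percent-encodes the keyword and applies the same page formula (i*2-1) used in the search URL. The function name build_search_url is hypothetical; quote is imported from urllib.parse, which is where the standard library documents it.

```python
from urllib.parse import quote


def build_search_url(keyword, page_index):
    """Build the search URL for the 1-based page_index of a keyword."""
    key = quote(keyword)        # "洋河" becomes "%E6%B4%8B%E6%B2%B3"
    page = page_index * 2 - 1   # the code uses odd page numbers: 1, 3, 5, ...
    return ("https://search.jd.com/Search?keyword=" + key
            + "&wq=" + key + "&page=" + str(page))


print(build_search_url("洋河", 1))
```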
4. Get the image links and save the images locally
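for i in range(1, 2):  # pages to crawl
    url = "https://search.jd.com/Search?keyword=" + key + "&wq=" + key + "&page=" + str(i * 2 - 1)
    data = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
    print(data)  # dump the raw HTML for inspection
    pat = 'data-lazy-img="(.*?)"'
    imagelist = re.compile(pat).findall(data)
    # enumerate() covers index 0 too, so the first image is not skipped; files are numbered from 1
    for j, src in enumerate(imagelist, start=1):
        b1 = src.replace('/n7', '/n0')  # swap the thumbnail path segment
        newurl = "http:" + b1  # the extracted link carries no scheme
        print(newurl)
        r = requests.get(newurl, stream=True)
        with open('C:/Users/lishu/Desktop/tensorflow/pc/yh/' + "第" + str(i) + "页第" + str(j) + "张" + ".jpg", 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):  # write in 8 KB chunks, not byte by byte
                f.write(chunk)
        print("第" + str(i) + "页第" + str(j) + "张爬取成功")  # "page i, image j downloaded"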
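The link rewriting and file naming in this step can be isolated into two small helpers, sketched below. The names full_size_url and image_filename are hypothetical, and the sample path is made up. That replacing '/n7' with '/n0' selects a larger image is taken from the code above, not verified independently.

```python
def full_size_url(src, scheme="http:"):
    """Turn a scheme-less thumbnail link into a complete image URL."""
    # '/n7' -> '/n0' follows the replacement used in the crawler code.
    return scheme + src.replace("/n7", "/n0")


def image_filename(page, index):
    """Build the Chinese file name used for one image, e.g. 第1页第2张.jpg."""
    return "第" + str(page) + "页第" + str(index) + "张" + ".jpg"


print(full_size_url("//img10.example.com/n7/jfs/a.jpg"))
print(image_filename(1, 2))
```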
5. Complete code
import urllib.request
import re
import requests

keyname = "洋河"  # enter the product name to search for
key = urllib.request.quote(keyname)

# Send a browser-like User-Agent with every urllib request
headers = ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0")
opener = urllib.request.build_opener()
opener.addheaders = [headers]
urllib.request.install_opener(opener)

for i in range(1, 2):  # pages to crawl
    url = "https://search.jd.com/Search?keyword=" + key + "&wq=" + key + "&page=" + str(i * 2 - 1)
    data = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
    pat = 'data-lazy-img="(.*?)"'
    imagelist = re.compile(pat).findall(data)
    # enumerate() covers index 0 too, so the first image is not skipped; files are numbered from 1
    for j, src in enumerate(imagelist, start=1):
        b1 = src.replace('/n7', '/n0')  # swap the thumbnail path segment
        newurl = "http:" + b1  # the extracted link carries no scheme
        print(newurl)
        r = requests.get(newurl, stream=True)
        with open('C:/Users/lishu/Desktop/tensorflow/pc/yh/' + "第" + str(i) + "页第" + str(j) + "张" + ".jpg", 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):  # write in 8 KB chunks, not byte by byte
                f.write(chunk)
        print("第" + str(i) + "页第" + str(j) + "张爬取成功")  # "page i, image j downloaded"
Summary
This approach is mainly for the case where urllib.request.urlretrieve() cannot save to a file path that contains Chinese characters: use requests.get() to download the image and write it to the local file yourself.
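The saving step itself does not depend on the network, so it can be sketched and tried in isolation. The helper save_chunks below is hypothetical: it writes any iterable of byte chunks to a path whose folder and file names contain Chinese characters. The demo uses in-memory bytes and a temporary directory; with requests, the chunks would come from r.iter_content(chunk_size=8192).

```python
import tempfile
from pathlib import Path


def save_chunks(chunks, target):
    """Write an iterable of byte chunks to target, creating parent folders."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
    return target


# Demo with fake image bytes instead of a real download.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_chunks([b"\xff\xd8", b"fake-jpeg-bytes"],
                        Path(tmp) / "图片" / "第1页第1张.jpg")
    saved_size = saved.stat().st_size

print(saved.name, saved_size)
```

Opening the file with open() in binary mode keeps path handling in Python, which is why a Chinese directory or file name is written as-is on a file system that supports it.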