Python 3 crawler: scraping JD product images

 

Foreword

This post uses Python 3 to crawl JD (Jingdong) search results and save the product images to a local folder.


Part one: matching the HTML with a regular expression

url="https://search.jd.com/Search?keyword="+key+"&wq="+key+"&page="+str(i*2-1)
'data-lazy-img="(.*?)"'
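
To make the two lines above concrete, here is a minimal sketch of how the search URL is assembled and how the pattern picks image links out of a page. The helper names build_search_url and extract_image_urls, and the HTML fragment with its img.example.com host, are made up for illustration; they are not part of the original script or of JD's actual markup.

```python
import re
from urllib.parse import quote

def build_search_url(keyword, page_index):
    """Build the search URL for a 1-based results page (hypothetical helper)."""
    key = quote(keyword)
    # The article sends page i as the `page` parameter value 2*i - 1.
    return ("https://search.jd.com/Search?keyword=" + key
            + "&wq=" + key + "&page=" + str(page_index * 2 - 1))

# Invented HTML fragment, for illustration only.
SAMPLE_HTML = (
    '<img width="220" data-lazy-img="//img.example.com/n7/a.jpg" />'
    '<img width="220" data-lazy-img="//img.example.com/n7/b.jpg" />'
)

# Non-greedy group: stop at the first closing quote after data-lazy-img=".
IMG_PATTERN = re.compile(r'data-lazy-img="(.*?)"')

def extract_image_urls(html):
    """Return every data-lazy-img attribute value found in the HTML text."""
    return IMG_PATTERN.findall(html)

print(build_search_url("洋河", 2))
print(extract_image_urls(SAMPLE_HTML))
```

The non-greedy `(.*?)` matters here: a greedy `(.*)` would run on to the last double quote on the line and merge several attributes into one match.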

Part two: the code

1. Import the libraries

import urllib.request
import re
import requests 

2. Add a request header

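# Present the crawler as a desktop Firefox browser
headers = ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0")
opener = urllib.request.build_opener()
opener.addheaders = [headers]
# Install the opener globally so that urllib.request.urlopen() uses it
urllib.request.install_opener(opener)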
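
As an alternative to installing a global opener, the same header can be attached to a single request. This is only a sketch of that variant, not what the script above does; make_request is a hypothetical helper name and the URL is a placeholder.

```python
import urllib.request

USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) "
              "Gecko/20100101 Firefox/80.0")

def make_request(url):
    """Build a Request that carries the User-Agent header (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})

req = make_request("https://search.jd.com/Search?keyword=test")
# Request stores header names capitalized, so the lookup key is "User-agent".
print(req.get_header("User-agent"))
```

The resulting object can be passed to urllib.request.urlopen() in place of the URL string. The global opener is shorter when every request needs the same header; the per-request form avoids changing process-wide state.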

3. Set the product keyword

keyname = "洋河"  # product name to search for
key = urllib.request.quote(keyname)
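
For reference, this is what the quoting step produces for the keyword above. The sketch uses urllib.parse.quote, which is where the function is documented; the urllib.request.quote spelling used in the script appears to work only because urllib.request imports that name.

```python
from urllib.parse import quote, unquote

keyname = "洋河"
# Each Chinese character becomes three UTF-8 bytes, each written as %XX.
key = quote(keyname)
print(key)            # %E6%B4%8B%E6%B2%B3
print(unquote(key))   # 洋河
```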

4. Get the image links and save the images locally

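for i in range(1, 2):  # pages to crawl; range(1, 2) covers page 1 only
    url = "https://search.jd.com/Search?keyword="+key+"&wq="+key+"&page="+str(i*2-1)
    data = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
    # print(data)  # uncomment to inspect the raw HTML
    pat = 'data-lazy-img="(.*?)"'
    imagelist = re.compile(pat).findall(data)
    # Number the images from 1 without skipping the first match
    for j, src in enumerate(imagelist, start=1):
        # Swap /n7 for /n0 in the path (apparently the full-size variant)
        newurl = "http:" + src.replace('/n7', '/n0')
        print(newurl)
        r = requests.get(newurl, stream=True)
        # The file name keeps the Chinese "page i, image j" pattern
        with open('C:/Users/lishu/Desktop/tensorflow/pc/yh/'+"第"+str(i)+"页第"+str(j)+"张"+".jpg", 'wb') as f:
            # Write in 8 KB chunks instead of the default one byte at a time
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
        # Report success only after the file has been written
        print("第"+str(i)+"页第"+str(j)+"张爬取成功")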
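
The two string manipulations inside the loop can be pulled out and checked without touching the network. The sketch below does that; full_size_url and image_path are hypothetical helper names, the sample URL is invented, and "out" stands in for the real output folder.

```python
from pathlib import Path

def full_size_url(lazy_src, scheme="http"):
    """Turn a protocol-relative thumbnail link into an absolute /n0 link."""
    # data-lazy-img values start with "//", so only the scheme is missing.
    return scheme + ":" + lazy_src.replace("/n7", "/n0")

def image_path(out_dir, page, index):
    """Build the Chinese file name used by the script: 第<page>页第<index>张.jpg"""
    return Path(out_dir) / ("第" + str(page) + "页第" + str(index) + "张.jpg")

print(full_size_url("//img.example.com/n7/a.jpg"))
print(image_path("out", 1, 2).name)
```

Passing scheme="https" would request the image over HTTPS instead; whether the image host accepts that is an assumption to verify.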

5. Complete code

import urllib.request
import re
import requests

keyname = "洋河"  # product name to search for
key = urllib.request.quote(keyname)

# Present the crawler as a desktop Firefox browser
headers = ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:80.0) Gecko/20100101 Firefox/80.0")
opener = urllib.request.build_opener()
opener.addheaders = [headers]
urllib.request.install_opener(opener)

for i in range(1, 2):  # pages to crawl; range(1, 2) covers page 1 only
    url = "https://search.jd.com/Search?keyword="+key+"&wq="+key+"&page="+str(i*2-1)
    data = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
    pat = 'data-lazy-img="(.*?)"'
    imagelist = re.compile(pat).findall(data)
    # Number the images from 1 without skipping the first match
    for j, src in enumerate(imagelist, start=1):
        # Swap /n7 for /n0 in the path (apparently the full-size variant)
        newurl = "http:" + src.replace('/n7', '/n0')
        print(newurl)
        r = requests.get(newurl, stream=True)
        # The file name keeps the Chinese "page i, image j" pattern
        with open('C:/Users/lishu/Desktop/tensorflow/pc/yh/'+"第"+str(i)+"页第"+str(j)+"张"+".jpg", 'wb') as f:
            # Write in 8 KB chunks instead of the default one byte at a time
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
        # Report success only after the file has been written
        print("第"+str(i)+"页第"+str(j)+"张爬取成功")

 


Summary

The main point: in my case urllib.request.urlretrieve() could not save files to a path containing Chinese characters, so the images are fetched with requests.get() and written to the local file by hand.
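
The write-by-hand half of that approach can be tried in isolation. The sketch below writes a few bytes to a Chinese-named folder and file inside a temporary directory; save_bytes is a hypothetical helper, and the three bytes stand in for the response body that requests.get() would return. It does not reproduce the urlretrieve() problem, it only shows that open() in binary mode accepts such a path.

```python
import tempfile
from pathlib import Path

def save_bytes(data, directory, filename):
    """Write raw bytes to directory/filename, creating the folder if needed."""
    target = Path(directory) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "wb") as f:
        f.write(data)
    return target

with tempfile.TemporaryDirectory() as tmp:
    # Both the folder and the file name contain Chinese characters.
    saved = save_bytes(b"\xff\xd8\xff", Path(tmp) / "洋河", "第1页第1张.jpg")
    roundtrip = saved.read_bytes()
    print(saved.name, len(roundtrip))
```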

Source: blog.csdn.net/weixin_42748604/article/details/109098616