Figure embarrassing - picture crawling

Figure embarrassing - picture crawling

The main idea

1. The law came home, see home page there is a useful picture of html

2. Write re extracted image path

3. Right picture to see the picture of the specific request path

4. The entire image request path

5. Check the path to the next screen, find the path request interface law

6.work, multi-interface specified picture crawling reptiles

import requests
import re
import os
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"
}

def get_page(page_size):
     for i in range(1,page_size+1):
         url = f"https://www.qiushibaike.com/pic/page/{i}/?s=5222080"
         res=requests.get(url=url,headers=headers)
         #解析图片路径
         pic_list=re.findall('<div class="thumb">[\s\S]*?<img src="(.*?)" alt',res.text,re.S)
         for i in pic_list:
             i='https:'+i
             pic_res=requests.get(url=i,headers=headers).content
             file_name=i.split("/")[-1]
             #图片数据写入本地文件夹
             with open(f'pic/{file_name}',"wb")as fw:
                 fw.write(pic_res)
                 print(file_name+"写入成功")

if __name__ == '__main__':
    if not os.path.exists("./pic"):
        os.mkdir("./pic")
    #自定义爬取界面页数
    get_page(3)

Guess you like

Origin www.cnblogs.com/zx125/p/11404564.html