Python crawler for website images

Goal

Amid the tiring routine of daily life, it is nice to enjoy some fun from other pursuits. This time the goal is a web crawler, and along the way I also learned a bit about how web pages are built. I have organized my notes here so that I can review them later and so that everyone can learn together. Comments and suggestions are welcome.

Website

Image site: https://www.58pic.com/
Content
Selected page: https://m.58pic.com/newpic/44666739.html


Process:

The goal is to crawl these small images on the page.

  1. Right-click on the page and choose Inspect
  2. Click the element-selector (arrow) icon in the DevTools toolbar
  3. Click one of the images on the page, then find its link in the highlighted element (https://preview.qiantucdn.com/auto_machine/2023031…78c1be-f9a1-45bc-88bc-e9efe1269b58.jpg!qt_kuan320)
  4. Right-click the page and choose View Page Source
  5. The links to the images can be found in the page source
  6. Use a regular expression to extract these links, then download the images
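Step 6 can be tried offline before touching the real site. The sketch below runs the regular expression on a small hand-written HTML snippet. The snippet, the `example.com` links, and the `extract_image_links` helper are hypothetical stand-ins for the real page source, assuming (as the crawler code suggests) that the image links sit in `data-original` attributes with protocol-relative values starting with `//`. The capture group `(.*?)` returns only the attribute value, and `urljoin` turns `//host/path` into a full `https://` URL.

```python
import re
from urllib.parse import urljoin

# Hypothetical snippet standing in for the real page source
SAMPLE_HTML = '''
<img class="lazy" data-original="//preview.example.com/a.jpg!qt_kuan320" src="placeholder.gif">
<img class="lazy" data-original="//preview.example.com/b.jpg!qt_kuan320" src="placeholder.gif">
'''

# Non-greedy capture group: grab only the text between the quotes
PATTERN = re.compile(r'data-original="(.*?)"')


def extract_image_links(html: str, page_url: str) -> list[str]:
    """Return absolute image URLs found in data-original attributes."""
    # urljoin resolves protocol-relative links ("//host/...") against the page's scheme
    return [urljoin(page_url, src) for src in PATTERN.findall(html)]


if __name__ == "__main__":
    for link in extract_image_links(SAMPLE_HTML, "https://m.58pic.com/newpic/44666739.html"):
        print(link)
```

Using a capture group avoids having to slice the quotes and `//` off each match by hand.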

Code

import os
import re
import time
from urllib.parse import urljoin

import requests


if __name__ == "__main__":
    url = "https://m.58pic.com/newpic/44666739.html"
    headers = {
        'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Mobile Safari/537.36'
    }

    # Fetch the page
    response = requests.get(url=url, headers=headers, timeout=30)
    response.raise_for_status()
    html = response.text

    # Parse the page: capture only the value inside data-original="..."
    img_links = re.findall(r'data-original="(.*?)"', html)
    print(img_links)

    save_path = "images"
    os.makedirs(save_path, exist_ok=True)

    # Save the images
    for idx, link in enumerate(img_links):
        time.sleep(1)  # rapid-fire requests can put a heavy load on someone else's server
        # The links appear to be protocol-relative ("//preview..."), so resolve them against the page URL
        img_url = urljoin(url, link)
        img_response = requests.get(img_url, headers=headers, timeout=30)
        img_response.raise_for_status()
        # Name the image files however suits you
        with open(os.path.join(save_path, f"{idx}.jpg"), 'wb') as f:
            f.write(img_response.content)
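Regular expressions over raw HTML are brittle: they miss attributes written with single quotes and can break when the markup changes. As an alternative (not what the code above uses), Python's built-in `html.parser.HTMLParser` can collect the same `data-original` attributes from every tag. This is a minimal sketch; the class name `DataOriginalCollector` and the sample markup are hypothetical, and it assumes the images are still marked with `data-original`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DataOriginalCollector(HTMLParser):
    """Collect the data-original attribute of every tag, resolved to an absolute URL."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name == "data-original" and value:
                self.links.append(urljoin(self.page_url, value))


if __name__ == "__main__":
    # Hypothetical markup: one image with a single-quoted data-original, one without
    html = "<img data-original='//preview.example.com/c.jpg'><img src='x.gif'>"
    collector = DataOriginalCollector("https://m.58pic.com/newpic/44666739.html")
    collector.feed(html)
    print(collector.links)
```

After `collector.feed(html)`, the `links` list can stand in for the result of the `re.findall` call, with each entry already a full URL ready to pass to `requests.get`.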

Useful related resources:
http://c.biancheng.net/view/2011.html

https://www.bilibili.com/video/BV1MK4y1n7TT/spm_id_from=333.337.search-card.all.click&vd_source=8bcf27281d52eb1d4b92e7d635cf444d

https://www.bilibili.com/video/BV1qJ411S7F6/?spm_id_from=333.337.search-card.all.click&vd_source=8bcf27281d52eb1d4b92e7d635cf444d


Source: blog.csdn.net/frighting_ing/article/details/129963587