Python Crawler Freebie, Part 2: Scraping the Latest Images from Meizitu

1. Target site: Meizitu (mzitu.com)

    Content to collect: images

    A screenshot is omitted here (mainly because it probably wouldn't pass content review); the site is what you'd expect, so go have a look yourself.

2. Crawling approach:

    The listing pages are simply paginated, and the image links can be read straight out of the HTML. There is no anti-scraping to work around, so there isn't much to break down here. Straight to the code.
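As a minimal, self-contained sketch of that extraction step, the lazy-loaded `<img>` tags can be parsed out of a static HTML snippet (the `class="lazy"` / `data-original` structure is assumed from what the listing pages serve; the real markup around it may differ):

```python
from bs4 import BeautifulSoup

# Sample HTML mimicking the listing page's lazy-loaded thumbnails
# (structure assumed for illustration; the real page has more markup).
html = '''
<ul id="pins">
  <li><a href="#"><img class="lazy" data-original="https://i.mzitu.com/thumb/a.jpg" alt="a"></a></li>
  <li><a href="#"><img class="lazy" data-original="https://i.mzitu.com/thumb/b.jpg" alt="b"></a></li>
</ul>
'''

# html.parser is used here to avoid the extra lxml dependency.
soup = BeautifulSoup(html, "html.parser")
# The real image URL lives in data-original, not src, because of lazy loading.
urls = [img["data-original"] for img in soup.find_all("img", class_="lazy")]
print(urls)
```

Running this prints the two thumbnail URLs, which is exactly the list the full script below iterates over to download each file.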

3. Full code:

# -*- coding: UTF-8 -*-
'''
@Author: Jason
Scrapes Meizitu images into the ./images folder (created automatically).
'''
import os

import requests
from bs4 import BeautifulSoup


def getMeizituImages():
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
        # The site checks the Referer on image requests, so send one.
        "Referer": "https://www.mzitu.com/",
    }

    os.makedirs("images", exist_ok=True)  # no need to create the folder by hand

    total_page = int(input("How many pages to crawl: "))
    for page in range(1, total_page + 1):
        url = "https://www.mzitu.com/page/{}/".format(page)
        res = requests.get(url, headers=headers, timeout=10)
        soup = BeautifulSoup(res.text, "lxml")
        # Thumbnails are lazy-loaded: the real URL is in data-original, not src.
        img_tags = soup.find_all(name="img", attrs={"class": "lazy"})
        for tag in img_tags:
            img_url = tag["data-original"]
            filename = img_url.split("/")[-1]
            try:
                response = requests.get(img_url, headers=headers, timeout=10)
                with open(os.path.join("images", filename), "wb") as f:
                    f.write(response.content)  # raw bytes; no text decoding needed
                # print("{} downloaded".format(filename))
            except (requests.RequestException, OSError):
                print("Failed to save image {}".format(filename))
        print("Page {} saved".format(page))


if __name__ == "__main__":
    getMeizituImages()
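The URL pattern and the filename logic in the script can also be factored out as two tiny pure functions, which makes them easy to test on their own (the function names here are illustrative, not from the original post):

```python
def page_url(page):
    # Listing-page URL pattern used by the crawler.
    return "https://www.mzitu.com/page/{}/".format(page)


def filename_from_url(url):
    # The last path segment of the image URL doubles as the local file name.
    return url.rsplit("/", 1)[-1]


print(page_url(3))                                                # https://www.mzitu.com/page/3/
print(filename_from_url("https://i.mzitu.com/2020/01/example_01.jpg"))  # example_01.jpg
```

Pulling these out of the loop body keeps the download loop focused on I/O and makes the URL scheme easy to change in one place if the site's pagination ever moves.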

4. Results:

No screenshots here (they wouldn't pass content review); run the code and see the results for yourself.


Reposted from blog.csdn.net/qq_36853469/article/details/103799770