Scraping Images from CaoLiu with Python

"""TODO: 只抓热门精华可以提高图片质量"""
import re

import requests
from bs4 import BeautifulSoup


def get_page_urls(page_url):
    """Return the links of all posts on the given listing page."""
    # the site serves GBK-encoded HTML; ignore any undecodable bytes
    text = requests.get(page_url, timeout=10).content.decode('gbk', errors='ignore')
    soup = BeautifulSoup(text, 'lxml')
    # each matching <td> holds the post's title link inside <h3><a>
    css = 'html body div#main div.t table#ajaxtable tbody tr.tr3.t_one.tac td.tal'
    return [cell.h3.a['href'] for cell in soup.select(css)]
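
The TODO at the top suggests restricting the crawl to featured posts. A minimal sketch of that idea, assuming featured posts carry a visible '精华' marker somewhere in their listing row (the real markup may differ, so verify it against the page source first):

def get_featured_urls(page_url, marker='精华'):
    """Hypothetical variant of get_page_urls that keeps only featured posts."""
    text = requests.get(page_url, timeout=10).content.decode('gbk', errors='ignore')
    soup = BeautifulSoup(text, 'lxml')
    urls = []
    for row in soup.select('table#ajaxtable tr.tr3.t_one'):
        cell = row.select_one('td.tal')
        if cell is None or cell.h3 is None or cell.h3.a is None:
            continue
        # assumption: featured posts show the marker somewhere in the row text
        if marker in row.get_text():
            urls.append(cell.h3.a['href'])
    return urls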


def get_page_photourls(page_url):
    """Return the URLs of all images on a post's detail page."""
    text = requests.get(page_url, timeout=10).content.decode('gbk', errors='ignore')
    # images are embedded as <input type='image'> elements; capture their src
    pattern = re.compile('<input src=\'(http:.+?)\' type=\'image\' onclick="window.open')
    return re.findall(pattern, text)
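
Matching HTML with a regular expression is brittle: a change in attribute order or quoting silently breaks the pattern. The same extraction can go through the parser instead; this sketch assumes, as the regex does, that the images are embedded as <input type='image'> elements:

def get_page_photourls_bs(page_url):
    """Parser-based alternative to the regex in get_page_photourls."""
    text = requests.get(page_url, timeout=10).content.decode('gbk', errors='ignore')
    soup = BeautifulSoup(text, 'lxml')
    return [tag['src'] for tag in soup.find_all('input', type='image')
            if tag.get('src', '').startswith('http')]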


def main():
	secret = "网址保密,需要请私聊"
    url = 'https://%s/thread0806.php?fid=16&search=&page=' % secret
    page_number = 173  # number of listing pages to crawl
    n = 0  # running count of downloaded images
    for i in range(page_number):
        try:
            print('Opening listing page %s' % (i + 1))
            page_url = url + str(i + 1)  # URL of listing page i + 1
            article_urls = get_page_urls(page_url)  # all post links on this page
            for article in article_urls:
                print('Parsing post: https://%s/%s' % (secret, article))
                photos = get_page_photourls('https://%s/%s' % (secret, article))
                for photo in photos:
                    n += 1
                    filename = photo.split('/')[-1]
                    with open(filename, 'wb') as f:
                        f.write(requests.get(photo).content)
                        print('Downloaded image %s' % n)

        except Exception as e:  # keep crawling even if one listing page fails
            print(e)
            continue


if __name__ == '__main__':
    main()
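
The inner loop above writes every image into the working directory and fetches it again on every run. A hedged sketch of a more defensive download helper; the photos/ output directory, the shared session, and the 10-second timeout are assumptions, not part of the original script:

import os

session = requests.Session()  # reuse one connection pool across downloads

def download_photo(photo_url, out_dir='photos'):
    """Download one image; skip it if an identically named file already exists."""
    os.makedirs(out_dir, exist_ok=True)
    filename = os.path.join(out_dir, photo_url.split('/')[-1])
    if os.path.exists(filename):
        return False  # already fetched on an earlier run
    resp = session.get(photo_url, timeout=10, stream=True)
    resp.raise_for_status()  # surface HTTP errors instead of saving an error page
    with open(filename, 'wb') as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
    return True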
