Scraping Baidu Images with Python + requests - 代码天地

Other 2019-01-26 20:28:36

This script uses Python and the requests library to download images from Baidu Images.

#!/usr/bin/python
# -*- coding:utf-8 -*-
import json
import os
import re

import requests


class BaiduImage(object):

    def __init__(self):
        super(BaiduImage, self).__init__()

        self.page = 60  # result offset sent as the pn parameter (60 results per request)
        if not os.path.exists(r'./image'):
            os.mkdir(r'./image')

    def request(self):
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0'}
        try:
            while True:
                request_url = ('http://image.baidu.com/search/avatarjson?tn=resultjsonavatarnew&ie=utf-8'
                               '&word=%E7%BE%8E%E5%A5%B3&cg=girl&rn=60&pn=' + str(self.page))

                response = requests.get(request_url, headers=headers, timeout=10)
                try:
                    if response.status_code != 200:
                        break  # stop on an error response instead of looping forever
                    decode = json.loads(response.text)  # parse the JSON body into a dict
                finally:
                    response.close()  # release the connection for this request

                imgs = decode.get('imgs')
                if not imgs:  # no more results, so stop paging
                    break
                self.download(imgs)

                self.page += 60

        except Exception as e:
            print(e)

    def download(self, data):
        # Capture the file name that precedes '.jpg' in the URL
        pattern = re.compile(r'.*/(.*?)\.jpg', re.S)
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"}

        for d in data:

            url = d['objURL']

            item = re.findall(pattern, url)
            if not item:  # skip URLs that do not contain a .jpg file name
                continue

            response = requests.get(url, headers=headers, stream=True, timeout=10)
            file_name = os.path.join('image', item[0] + '.jpg')

            # Write the image to disk in small chunks
            with open(file_name, "wb") as op:
                for chunk in response.iter_content(128):
                    op.write(chunk)
            response.close()


if __name__ == '__main__':
    bi = BaiduImage()
    bi.request()
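
The query string in request() is hard-coded, with the search keyword already percent-encoded. As a minimal sketch, the same URL can be assembled with urllib.parse.urlencode from the standard library, so that the keyword and the offset become ordinary arguments. The helper name build_search_url is hypothetical and not part of the original script. The parameter names (tn, ie, word, cg, rn, pn) are copied from the URL in the script, and whether the endpoint still answers to them is not verified here.

```python
from urllib.parse import urlencode

BASE_URL = 'http://image.baidu.com/search/avatarjson'


def build_search_url(word, offset, per_request=60):
    """Hypothetical helper: assemble the search URL used by the script.

    The parameter names are taken from the hard-coded URL in request().
    """
    params = {
        'tn': 'resultjsonavatarnew',
        'ie': 'utf-8',
        'word': word,        # urlencode percent-encodes this as UTF-8
        'cg': 'girl',
        'rn': per_request,   # number of results per request
        'pn': offset,        # offset of the first result
    }
    return BASE_URL + '?' + urlencode(params)


if __name__ == '__main__':
    # Reproduces the URL that the script requests first (pn=60)
    print(build_search_url('美女', 60))
```

Building the URL this way keeps the keyword readable in the source and makes it easy to search for something else without encoding it by hand.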

I have tested this myself and it works. If anything is unclear, feel free to ask.
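
download() derives the local file name from each image URL with a regular expression, and a URL without a .jpg segment produces no match at all. The sketch below isolates that step so it can be tried without network access. The function name file_name_from_url is hypothetical, the pattern is the one used in the script, and the example URLs in the usage note are made up.

```python
import os
import re

# Same pattern as in download(): capture the text that sits between
# a '/' and the following '.jpg'.
JPG_NAME = re.compile(r'.*/(.*?)\.jpg', re.S)


def file_name_from_url(url, folder='image'):
    """Hypothetical helper: return the local path for a .jpg URL, or None."""
    match = JPG_NAME.match(url)
    if match is None:
        return None  # not a .jpg URL, so the caller can skip it
    return os.path.join(folder, match.group(1) + '.jpg')
```

For example, file_name_from_url('http://example.com/a/b/cat.jpg') gives the path of cat.jpg inside the image folder, while a URL ending in dog.png returns None and can be skipped instead of raising an IndexError.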

Reposted from blog.csdn.net/q1454739828/article/details/60593920
