Python crawler for doutu (meme) images

Category: Other | Posted: 2018-06-13 21:27:51

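I tinkered with this for three hours. It still has a few small bugs, so pointers from more experienced people are welcome.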

Without further ado, here is the code:

#!/usr/bin/python3
# -*- coding:UTF-8 -*-
import os,re,requests
from urllib import request,parse

class Doutu_api(object):
    def __init__(self):
        self.api_html = r'http://www.doutula.com/search?keyword=%s'
        self.headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                                      '(KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'}
        self.path = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'temp')

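    def make_path(self, path=''):  # Returns False if the folder already exists; otherwise creates it and returns True
        self.path = os.path.join(self.path, path)
        if os.path.exists(self.path):  # Check whether the folder exists
            return False
        else:
            os.makedirs(self.path)  # Create the folder, including a missing parent 'temp' directory
            return True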

    def get_img_html(self,html):
        self.make_path(path=html)
        html = self.api_html%parse.quote(html)
        pattern = re.compile(u'<a.*?class="col-xs-6 col-md-2".*?href="(.*?)".*?style="padding:5px;">.*?</a>',re.S)
        pattern_img = re.compile(u'<td>.*?<img.*?src="(.*?)".*?alt="(.*?)".*?onerror=".*?">.*?</td>',re.S)
        try:
            req = request.Request(html, headers=self.headers)
            imgs = request.urlopen(req)
            imgs = imgs.read().decode('utf-8')
            imgs = re.findall(pattern, imgs)  # Links to each search result's detail page
            for img in imgs:
                req = request.Request(img, headers=self.headers)
                imgurl = request.urlopen(req).read().decode('utf-8')
                imgurl = re.findall(pattern_img, imgurl)
                if not imgurl:  # No image matched on this page; skip it instead of raising IndexError
                    continue
                response = requests.get(imgurl[0][0], headers=self.headers).content  # Download the image
                with open(os.path.join(self.path, '{}.png'.format(imgurl[0][1].replace('/','-'))), 'wb') as file:
                    file.write(response)  # Write the image to disk
            print('Download complete, images saved to:',self.path)
        except Exception as e:
            print(e)
        return None

doutu = Doutu_api()
doutu.get_img_html(input('Meme keyword: '))

Tested successfully.
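One likely source of the small bugs mentioned above is the file name: the script builds it from the image's alt text, replaces only '/', and always appends .png. The following is a minimal sketch of one way to harden that step, not part of the original script. The helper name safe_filename, its default_ext parameter, and the example.com URL are all hypothetical.

```python
import os
import re
from urllib.parse import urlparse

# Characters that Windows does not allow in file names.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_filename(alt_text, img_url, default_ext='.png'):
    """Hypothetical helper: build a file name from the alt text and image URL."""
    # Replace every forbidden character with '-', then trim surrounding whitespace.
    stem = _INVALID_CHARS.sub('-', alt_text).strip() or 'untitled'
    # Take the extension from the URL path, ignoring any query string.
    ext = os.path.splitext(urlparse(img_url).path)[1].lower()
    # Fall back to the default extension when the URL has none.
    return stem + (ext or default_ext)


# Placeholder URL for illustration only.
print(safe_filename('what?/why', 'http://example.com/a/b.GIF?x=1'))
```

In the script, the result could be passed to os.path.join(self.path, ...) in place of the '{}.png'.format(...) expression, assuming the alt text and image URL come from the same regular-expression match as before.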

Reposted from www.cnblogs.com/canmeng/p/9180046.html
