[Python] A Baidu Tieba Image Crawler (Work Hard, Then Work Harder)

Category: Other, 2018-07-26 12:09:59

As soon as I learned how to scrape images, the first place I went was the Zhang Yixing Tieba forum, hahaha.

I just had to include one of the photos I scraped, hmph.

import re
import urllib.request  # a bare "import urllib" does not guarantee that urllib.request is loaded

import requests

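class Baidutieba():

    def __init__(self):
        # Thread URL; {} is filled in with the page number
        self.url = "http://tieba.baidu.com/p/4876047826?pn={}"
        self.headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36"}
        # Running image counter shared across pages, so that files saved from
        # page 2 do not overwrite 1.jpg, 2.jpg, ... saved from page 1
        self.count = 0

    def parse_url(self, url):
        # Send the request and return the page HTML as text
        response = requests.get(url, headers=self.headers)
        return response.content.decode("utf-8")

    def get_image(self, html_str):
        # Collect every http://img... .jpg URL found in a src="..." attribute
        imgre = re.compile(r'src="(http://img.*?\.jpg)"')
        image_list = imgre.findall(html_str)
        print(image_list)
        return image_list

    def download(self, image_list):
        for each in image_list:
            self.count += 1
            print(self.count)
            print(each)
            urllib.request.urlretrieve(each, "%s.jpg" % self.count)

    def run(self):
        # 1. Build the list of page URLs
        url_list = [self.url.format(i) for i in range(1, 3)]  # pages 1 and 2
        for url in url_list:
            # 2. Send the request and get the response
            html_str = self.parse_url(url)
            # 3. Extract the image URLs
            image_list = self.get_image(html_str)
            # 4. Download the images to the current directory
            self.download(image_list)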

if __name__ == '__main__':
    tieba = Baidutieba()
    tieba.run()
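
To see what the extraction step does without hitting the network, here is a minimal sketch that runs the same regular expression over a small HTML fragment. The fragment, its host names, and the helper name `extract_image_urls` are made up for illustration; they are not taken from a real Tieba page.

```python
import re

# Same pattern as in get_image(): capture the URL inside src="..."
# when it starts with http://img and ends in .jpg.
IMG_PATTERN = re.compile(r'src="(http://img.*?\.jpg)"')

# Hypothetical HTML fragment, invented for this example.
sample_html = (
    '<img src="http://img.example.com/a/1.jpg" width="560">'
    '<img src="http://static.example.com/logo.png">'
    '<img src="http://img.example.com/b/2.jpg">'
)


def extract_image_urls(html_str):
    """Return every URL matched by IMG_PATTERN, in page order."""
    return IMG_PATTERN.findall(html_str)


urls = extract_image_urls(sample_html)
print(urls)
# ['http://img.example.com/a/1.jpg', 'http://img.example.com/b/2.jpg']
```

The `.png` logo is skipped because its URL does not start with `http://img`. One caveat: since `.*?` matches any character except a newline, an `http://img` URL that does not end in `.jpg` could cause a match to run on into a later tag on the same line, so the pattern is best treated as a quick heuristic rather than a real HTML parser.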

Reference pages (both of these appear to target Python 2):

https://blog.csdn.net/qq_24421591/article/details/52596076

https://blog.csdn.net/z49434574/article/details/51552088
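
One difference that matters when adapting Python 2 examples like these: in Python 2 the download function is `urllib.urlretrieve`, while in Python 3 it lives at `urllib.request.urlretrieve`. The sketch below shows one way to import it under either version; `save_image` is a hypothetical helper name used only for this illustration.

```python
try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2


def save_image(url, index):
    """Download url to '<index>.jpg' in the current directory (hypothetical helper)."""
    filename = "%s.jpg" % index
    urlretrieve(url, filename)
    return filename
```

Calling `save_image(image_url, 1)` would write `1.jpg` next to the script, mirroring the naming scheme used by the crawler above.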

Reposted from blog.csdn.net/csdn___csdn/article/details/81208974
