[Share] 50 lines of code! Batch-scrape a large number of images! - 代码天地

Other 2019-03-08 22:21:10 Views: 0

# -*- coding: utf-8 -*-
# Requires Python 3.

import os
import re
import urllib.parse
import urllib.request

# Baidu image search results page for the keyword "python".
url = "http://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=index&fr=&sf=1&fmq=&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word=python&oq=python&rsp=-1"
# Output location; point it at a drive or folder that exists on your machine.
outpath = "t:\\"


def getHtml(url):
    """Fetch the page at url and return its body as text."""
    webfile = urllib.request.urlopen(url)
    outhtml = webfile.read().decode('utf-8', errors='ignore')
    webfile.close()
    return outhtml


def getImageList(html):
    """Return every http(s) URL in html that ends in a known image extension."""
    # One pattern covers both schemes and the jpg, jpeg, png, gif and bmp
    # extensions. A URL ends at the first whitespace, comma or double quote.
    htmlurl = re.compile(r'https?://[^\s,"]*\.(?:jpg|jpeg|png|gif|bmp)')
    imgList = htmlurl.findall(html)
    print(imgList)
    return imgList


def download(imgList, page):
    """Save each image as pic_<page>_<index><ext> under outpath."""
    for x, imgurl in enumerate(imgList, 1):
        # Take the extension from the last path segment of the decoded URL.
        ext = os.path.splitext(urllib.parse.unquote(imgurl).split('/')[-1])[1]
        filepathname = (outpath + 'pic_%09d_%010d' % (page, x) + ext).lower()
        print('[Debug] Download file :' + imgurl + ' >> ' + filepathname)
        urllib.request.urlretrieve(imgurl, filepathname)


def downImageNum(pagenum):
    """Run the fetch, extract and download cycle pagenum times."""
    # url does not change between iterations, so every pass fetches the same
    # results page; only the page number in the file names differs.
    for page in range(1, pagenum + 1):
        html = getHtml(url)             # fetch the HTML that url points to
        imageList = getImageList(html)  # collect all image URLs as a list
        download(imageList, page)       # download every image


if __name__ == '__main__':
    downImageNum(1)
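The two steps that do not need the network, picking image URLs out of the page text and building the output file name, can be tried offline. The following is a minimal Python 3 sketch, not the author's code: the names `IMAGE_URL_RE`, `extract_image_urls`, `build_filename` and the sample markup are all made up for illustration, and the single pattern is an assumed consolidation of the script's per-extension alternatives.

```python
import os
import re
from urllib.parse import unquote

# Assumed consolidation: one pattern for both schemes and five extensions.
# A URL ends at the first whitespace, comma or double quote.
IMAGE_URL_RE = re.compile(r'https?://[^\s,"]*\.(?:jpg|jpeg|png|gif|bmp)')


def extract_image_urls(html):
    """Return the image URLs found in html, in order of appearance."""
    return IMAGE_URL_RE.findall(html)


def build_filename(outdir, page, index, imgurl):
    """Build pic_<page>_<index><ext>, taking the extension from the URL."""
    last_segment = unquote(imgurl).split('/')[-1]
    ext = os.path.splitext(last_segment)[1].lower()
    return os.path.join(outdir, 'pic_%09d_%010d%s' % (page, index, ext))


# Hypothetical sample markup, used only to exercise the functions.
sample_html = (
    '<img src="http://example.com/a/cat.jpg">'
    '{"thumb":"https://example.com/b/dog.jpeg","x":1}'
    '<a href="http://example.com/page.html">not an image</a>'
)

if __name__ == '__main__':
    for i, link in enumerate(extract_image_urls(sample_html), 1):
        print(link, '>>', build_filename('out', 1, i, link))
```

The pattern is case-sensitive, so a link ending in `.JPG` would be skipped; passing `re.IGNORECASE` to `re.compile` is one way to relax that if the target pages need it.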

Reposted from blog.csdn.net/qq_39363022/article/details/84995776
