A Python Web Crawler

Programming Languages · 2018-07-07 01:05:15

# -*- coding: utf-8 -*-
# The simplest possible crawler, copied from the web; it batch-downloads images.
#
import urllib.request
import re

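# Fetch the HTML content of a page.
# Uses urllib.request.urlopen.
def getHtml(url):
    # The with block closes the connection once the body has been read.
    with urllib.request.urlopen(url) as page:
        # In Python 3, read() returns bytes. Decode it to str, otherwise the
        # str regex in getImg fails with a TypeError on bytes input.
        html = page.read().decode("utf-8")
    return html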

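def getImg(html):
    # reg is the regular expression used for matching. It was written around how
    # images appear in the HTML of Tieba posts, so it only works for Tieba posts.
    # Detailed regex tutorial: http://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html
    reg = r'src="(.+?\.jpg)" pic_ext'
    # compile can be thought of as building a funnel shaped by reg:
    # only text that fits the rule passes through.
    image = re.compile(reg)
    imgList = image.findall(html)
    for x, imgurl in enumerate(imgList):
        # print("for test %s" % x)  # debug print to check that the loop is reached
        # urlretrieve() downloads the remote data straight to a local file.
        # The D:/google directory must already exist.
        urllib.request.urlretrieve(imgurl, 'D:/google/%s.jpg' % x)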

# html holds the content of the page to read; here it is a post on Tieba.
html = getHtml("https://tieba.baidu.com/p/2460150866?red_tag=1993932268")
#html = getHtml("https://www..com")
# Run the download
getImg(html)

print("all over!")
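
The pattern in getImg uses `.+?`, and `.` also matches the quote character, so when several tags sit on one line a match can start in one tag and end in a later one. The following is a minimal sketch of that behavior, not part of the original script: the sample markup, the example.com URLs, and the names LOOSE, STRICT and SAMPLE_HTML are all made up for illustration. It compares the original pattern with a variant that refuses to cross a quote.

```python
import re

# The pattern used in getImg above.
LOOSE = re.compile(r'src="(.+?\.jpg)" pic_ext')
# Hypothetical stricter variant: [^"] cannot run past the closing quote.
STRICT = re.compile(r'src="([^"]+?\.jpg)" pic_ext')

# Made-up markup on a single line, for illustration only.
SAMPLE_HTML = (
    '<img class="BDE_Image" src="https://example.com/a.jpg" pic_ext="jpeg">'
    '<img src="https://example.com/logo.png">'
    '<img class="BDE_Image" src="https://example.com/b.jpg" pic_ext="jpeg">'
)

loose_urls = LOOSE.findall(SAMPLE_HTML)
strict_urls = STRICT.findall(SAMPLE_HTML)

# The loose pattern starts its second match at the logo tag and runs on
# into the next tag; the strict pattern returns only the two .jpg URLs.
for index, url in enumerate(strict_urls):
    print(index, url)
print("loose second match:", loose_urls[1])
```

Whether this matters on a real Tieba page depends on how the markup is laid out, so the stricter variant is only a suggestion to test against the actual page.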

Reposted from blog.csdn.net/iwilldoitx/article/details/80946569
