Python Crawler (1): Scraping Images from Baidu Tieba

Category: Other. Posted 2018-05-11 09:54:45.

The target URL is https://tieba.baidu.com/f?ie=utf-8&kw=%E6%B5%B7%E8%B4%BC%E7%8E%8B&fr=search (the kw parameter is the percent-encoded forum name 海贼王).
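As a small illustration of how that query string comes about, the sketch below rebuilds the URL with urllib.parse.urlencode. The function name build_tieba_url is hypothetical and not part of the original post; the parameter names ie, kw and fr are taken from the URL above.

```python
from urllib.parse import urlencode

# Hypothetical helper: assembles the forum list URL from a forum name.
def build_tieba_url(forum_name):
    # urlencode percent-encodes the UTF-8 bytes of the forum name
    query = urlencode({'ie': 'utf-8', 'kw': forum_name, 'fr': 'search'})
    return 'https://tieba.baidu.com/f?' + query

tieba_url = build_tieba_url('海贼王')
print(tieba_url)
```

Passing a different forum name would give the list page of another forum, assuming the site keeps this URL layout.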

The basic idea: download the whole page, use a regular expression to match the content to download, and then save it to local disk.

1. Download the whole page

Define a downloader:

from urllib import request

# First define a downloader that fetches a web page
def load_page(my_url):
    # Send a browser User-Agent header (this is a request header, not a proxy IP)
    user_agent = 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36'
    headers = {'User-Agent': user_agent}
    req = request.Request(my_url, headers=headers)
    results = request.urlopen(req)
    html = results.read().decode('utf-8')
    return html

my_url = 'https://tieba.baidu.com/f?ie=utf-8&kw=%E6%B5%B7%E8%B4%BC%E7%8E%8B&fr=search'
html = load_page(my_url)
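The downloader above raises an exception if the request fails. A slightly more defensive variant might look like the sketch below; the names build_request and load_page_safe, the 10-second timeout, and the choice to return None on failure are assumptions made for illustration, not part of the original post.

```python
from urllib import request, error

USER_AGENT = ('Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36')

def build_request(url):
    # Attach the browser User-Agent header to the request
    return request.Request(url, headers={'User-Agent': USER_AGENT})

def load_page_safe(url, timeout=10):
    """Return the decoded page, or None if the request fails."""
    try:
        with request.urlopen(build_request(url), timeout=timeout) as resp:
            # errors='replace' avoids a crash on stray non-UTF-8 bytes
            return resp.read().decode('utf-8', errors='replace')
    except error.URLError as exc:  # HTTPError is a subclass of URLError
        print('download failed:', exc)
        return None

# Building the request does not touch the network
req = build_request('https://tieba.baidu.com/f?ie=utf-8&kw=%E6%B5%B7%E8%B4%BC%E7%8E%8B&fr=search')
print(req.get_header('User-agent'))
```

Note that urllib stores header names in capitalized form, which is why the lookup uses 'User-agent'.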

Next, match the image paths with a regular expression:

import re

# Thumbnail paths of the form /wh%3D200%2C90%3B/sign=<hash>/<name>.jpg
img_inf = r'/wh%3D200%2C90%3B/sign=\w+/\w+\.jpg'
img = re.compile(img_inf)
img_list = img.findall(html)

findall returns a list of the matched path strings.
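To see what the pattern actually captures, the sketch below runs it against a small hand-written HTML fragment. The fragment and the helper name extract_image_paths are made up for illustration, and removing duplicates is an addition that the original code does not perform.

```python
import re

# Same pattern as in the post
IMG_PATTERN = re.compile(r'/wh%3D200%2C90%3B/sign=\w+/\w+\.jpg')

def extract_image_paths(html):
    """Return the matched paths without duplicates, in first-seen order."""
    seen = set()
    paths = []
    for path in IMG_PATTERN.findall(html):
        if path not in seen:
            seen.add(path)
            paths.append(path)
    return paths

# Hypothetical fragment: one repeated thumbnail, one other thumbnail, one non-match
sample_html = (
    '<img src="https://imgsa.baidu.com/forum/wh%3D200%2C90%3B/sign=abc123/0123abcd.jpg">'
    '<img src="https://imgsa.baidu.com/forum/wh%3D200%2C90%3B/sign=abc123/0123abcd.jpg">'
    '<img src="https://imgsa.baidu.com/forum/wh%3D200%2C90%3B/sign=def456/89ef.jpg">'
    '<img src="https://imgsa.baidu.com/forum/pic/item/other.png">'
)
sample_paths = extract_image_paths(sample_html)
print(sample_paths)
```

Each match starts at /wh%3D and does not include the host, which is why the host prefix has to be added back before downloading.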

Finally, save the images to local disk:

# The target folder must already exist
for i in img_list:
    img_name = i.split('/')[-1]
    img_url = 'https://imgsa.baidu.com/forum%s' % i
    file_path = 'f:\\wd.python\\tieba_haizeiwang\\%s' % img_name
    # The with block closes the file automatically, so no f.close() is needed
    with open(file_path, 'wb') as f:
        f.write(request.urlopen(img_url).read())
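The saving step can also be wrapped in a function that creates the output folder when it is missing. This is only a sketch: save_images, fetch_bytes and the injectable fetch parameter are hypothetical names introduced here so that the logic can be tried without network access, and the demo writes fake bytes into a temporary folder.

```python
import os
import tempfile
from urllib import request

BASE_URL = 'https://imgsa.baidu.com/forum'  # host prefix used in the post

def fetch_bytes(url):
    """Download url and return the raw bytes."""
    with request.urlopen(url, timeout=10) as resp:
        return resp.read()

def save_images(img_list, out_dir, fetch=fetch_bytes):
    """Save every matched image into out_dir and return the written paths."""
    os.makedirs(out_dir, exist_ok=True)  # create the folder if it is missing
    saved = []
    for match in img_list:
        img_name = match.split('/')[-1]  # last path segment is the file name
        file_path = os.path.join(out_dir, img_name)
        with open(file_path, 'wb') as f:
            f.write(fetch(BASE_URL + match))
        saved.append(file_path)
    return saved

# Offline demo: a fake fetcher stands in for the real download
demo_dir = os.path.join(tempfile.mkdtemp(), 'tieba_haizeiwang')
demo_saved = save_images(
    ['/wh%3D200%2C90%3B/sign=abc123/0123abcd.jpg'],
    demo_dir,
    fetch=lambda url: b'fake image bytes',
)
print(demo_saved)
```

With the default fetch argument, calling save_images(img_list, 'f:\\wd.python\\tieba_haizeiwang') would download the real images.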

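Result: each matched image is saved under f:\wd.python\tieba_haizeiwang, named after the last segment of its path.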


Reposted from blog.csdn.net/wojiaodabai/article/details/79324494
