Learning Python, Web Scraping 07: Scraping Jokes from Qiushibaike - 代码天地

Category: Programming Languages | 2019-01-27 16:01:18 | Views: 0

Overview:

A consolidation exercise.

# Qiushibaike joke scraper
import urllib.request
import re
headers=("User-Agent","Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.22 Safari/537.36 SE 2.X MetaSr 1.0")
opener=urllib.request.build_opener()
opener.addheaders=[headers]
# Install the opener globally
urllib.request.install_opener(opener)
for i in range(0,35):
    thisurl="http://www.qiushibaike.com/8hr/page/"+str(i+1)+"/?s=4948859"
    data=urllib.request.urlopen(thisurl).read().decode("utf-8","ignore")
    pat='<div class="content">.*?<span>(.*?)</span>.*?</div>'
    rst=re.compile(pat,re.S).findall(data)
    for j in range(0,len(rst)):
        print(rst[j])
        print("-------")

# Second version: appends the jokes from pages 1 to 9 to a text file
import urllib.request
import re
# Add the User-Agent header and install the opener globally
headers=("User-Agent","Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3493.3 Safari/537.36")
opener=urllib.request.build_opener()
opener.addheaders=[headers]
urllib.request.install_opener(opener)
pat='<div class="content">.*?<span>(.*?)</span>.*?</div>'
fh=open("G:\\Python_Test\\qiushibaike\\qiushibaike.txt","a+",encoding="utf-8")
for i in range(1,10):
    url="https://www.qiushibaike.com/text/page/"+str(i)+"/"
    data=urllib.request.urlopen(url).read().decode("utf-8","ignore")
    #print(data)
    rst=re.compile(pat,re.S).findall(data)  # re.S is a pattern modifier that lets "." match newlines too, so multi-line content can be matched
    for j in range(0,len(rst)):
        fh.write(rst[j])
        #fh.write("\n")
        #print(rst[j])
        #print("----------------")
    print("Page "+str(i)+" scraped successfully!")
fh.close()
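
To see what the pattern and the re.S flag do without touching the network, here is a minimal sketch that runs the same regular expression against a small inline HTML fragment. The fragment, the SAMPLE_HTML name, and the extract_jokes helper are made up for illustration; the real page markup may differ.

```python
import re

# Same pattern string as in the scraper code above.
PAT = r'<div class="content">.*?<span>(.*?)</span>.*?</div>'

# Hypothetical HTML fragment, invented for this example only.
SAMPLE_HTML = """
<div class="content">
<span>
First joke,
second line.
</span>
</div>
<div class="content"><span>Second joke.</span></div>
"""


def extract_jokes(html):
    """Return the text of each content <span>, stripped of surrounding whitespace."""
    return [text.strip() for text in re.compile(PAT, re.S).findall(html)]


jokes = extract_jokes(SAMPLE_HTML)
for joke in jokes:
    print(joke)
    print("-------")

# Without re.S, "." does not match newlines, so only the block that sits
# entirely on one line is found.
no_dotall = re.findall(PAT, SAMPLE_HTML)
print(no_dotall)
```

The comparison at the end shows why the flag matters here: a joke whose markup spans several lines is only matched when "." is allowed to cross line breaks.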


Reposted from blog.csdn.net/xxydzyr/article/details/86665146
