Python scraper example - 代码天地


Programming languages  2019-02-28 09:10:49

A beginner's scraper that downloads a novel chapter by chapter from the web. Experienced readers can skip this one.
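# -*- coding: utf-8 -*-
import re
from time import sleep
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Python 3 strings are Unicode by default, so the Python 2
# reload(sys) / sys.setdefaultencoding('utf-8') workaround is not needed.

# Put your own browser's request headers here (at least the User-Agent).
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
}


def xs2(url, path='E:/Desktop/img/cc.txt', max_chapters=10000):
    # Open the output file once, in append mode and as UTF-8; Python 3 accepts
    # paths containing Chinese characters directly, so no unicode() call is needed.
    with open(path, 'a', encoding='utf-8') as fn:
        # Loop instead of calling xs2() recursively, so a long novel cannot hit
        # Python's recursion limit.
        for _ in range(max_chapters):
            resp = requests.get(url, headers=headers, timeout=10)
            # Guess the encoding from the page body in case the server omits the charset.
            resp.encoding = resp.apparent_encoding
            # BeautifulSoup makes it easy to match tags by name or class.
            soup = BeautifulSoup(resp.text, 'html.parser')
            # Match the title (the element with class 'title_txtbox').
            title_txtbox = soup.find_all(class_='title_txtbox')
            if title_txtbox:
                fn.write(title_txtbox[0].get_text() + '\n')
            # Zongheng puts the chapter body in <p> tags, so match every <p>.
            for p in soup.find_all('p'):
                text = p.get_text()
                fn.write(text)
                print('Writing: ' + text)
            fn.write('\n')  # newline after each chapter
            # Get the next-chapter link; (.*?) captures the href value.
            nextchapter = soup.find_all(class_='nextchapter')
            ree = re.findall(r'href="(.*?)"', str(nextchapter))
            if not ree:
                break  # no next-chapter link: assume this was the last chapter
            # Take the first match from the list (instead of stripping "['" and "']"
            # off str(ree)) and resolve it against the current page's URL.
            url = urljoin(url, ree[0])
            # Sleep 2 s between requests; going too fast can get your IP banned by
            # anti-scraping measures. If that happens, switch to different headers.
            sleep(2)


if __name__ == '__main__':
    url = 'http://book.zongheng.com/chapter/769917/43006084.html'  # Zongheng chapter URL
    xs2(url)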
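The scraper finds the next chapter by running `href="(.*?)"` over `str()` of the `nextchapter` elements. Turning the match list back into a string and stripping `['` and `']` off it breaks when there is no match (the URL becomes empty and `requests` raises an error) or more than one. A minimal sketch of the extraction step on its own, assuming the link may be relative; `next_chapter_url` and the sample hrefs in the usage are hypothetical, and it uses only the standard library so it can be tried without touching the site:

```python
import re
from urllib.parse import urljoin

# Same pattern as the scraper: (.*?) lazily captures the href value.
HREF_RE = re.compile(r'href="(.*?)"')


def next_chapter_url(fragment_html, current_url):
    """Return the absolute URL of the first href in fragment_html, or None.

    fragment_html is assumed to be str() of the element(s) with class
    'nextchapter'; relative links are resolved against current_url.
    """
    matches = HREF_RE.findall(fragment_html)
    if not matches:
        return None  # no link found: treat as the last chapter
    # findall returns a list, so take its first element directly rather
    # than stripping brackets and quotes off str(list).
    return urljoin(current_url, matches[0])


base = 'http://book.zongheng.com/chapter/769917/43006084.html'
print(next_chapter_url('<a class="nextchapter" href="next.html">Next</a>', base))
```

Absolute hrefs pass through `urljoin` unchanged, so the same function works whichever form the site uses.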

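The advice to sleep between requests and to switch headers if you get blocked can be sketched as a crawl loop that rotates through a pool of headers, pauses between requests, and stops when there is no next link, when a link points back to a page already visited, or after a page cap. `crawl`, `HEADER_POOL`, and the `fetch(url, headers)` callable are hypothetical: `fetch` stands in for the `requests.get` + BeautifulSoup step, which keeps the loop itself testable offline.

```python
import itertools
import time

# Hypothetical pool of request headers to rotate through; replace the
# values with headers copied from real browsers.
HEADER_POOL = [
    {'User-Agent': 'browser-A'},
    {'User-Agent': 'browser-B'},
]


def crawl(start_url, fetch, delay=2.0, max_pages=10000):
    """Follow next-chapter links iteratively and return the chapter texts.

    fetch(url, headers) is assumed to return (chapter_text, next_url_or_None).
    """
    headers_cycle = itertools.cycle(HEADER_POOL)
    chapters = []
    seen = set()
    url = start_url
    while url and url not in seen and len(chapters) < max_pages:
        seen.add(url)  # guard against a "next" link that loops back
        text, url = fetch(url, next(headers_cycle))
        chapters.append(text)
        if url:
            time.sleep(delay)  # pause between requests to stay polite
    return chapters
```

Passing a fake `fetch` backed by a dict (with `delay=0`) exercises the loop without network access; in the real scraper `fetch` would return the joined `<p>` text and the next-chapter URL.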

Reposted from blog.csdn.net/weixin_42789202/article/details/88012096

