Douban Book Rankings: Extraction with BeautifulSoup - 代码天地

Other 2018-05-27 01:53:55 Views: 0

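import json
import re

import requests
from bs4 import BeautifulSoup  # imported by the original post; the parsing below uses re


class A:
    # Send the request and return the response body as text
    @staticmethod
    def get_html(url):
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # fail loudly if the request was rejected
        html = response.content.decode("utf-8")
        return html

    # Parse the response and extract the data
    @staticmethod
    def parse_html(html):
        # Regular expression capturing (rank, book title)
        pattern = re.compile('class="hd".*?class="pos">(.*?)</span>.*?class="title".*?target="_blank">(.*?)</a>', re.S)
        items = re.findall(pattern, html)
        for item in items:
            yield {
                'rank': item[0],
                'name': item[1].strip()
            }


# Run the scraper
if __name__ == '__main__':

    # Fetch the first 3 pages; the start offset advances by 25 per page
    for perpage in range(0, 3):
        with open('book_rank.txt', 'a', encoding="utf-8") as br:
            url = 'https://www.douban.com/doulist/1264675/?start=' + str(perpage*25) + '&sort=seq&sub_type='
            html = A.get_html(url)
            # Write one JSON object per line, keeping Chinese titles readable
            for item in A.parse_html(html):
                br.write(json.dumps(item, ensure_ascii=False) + '\n')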
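
The script above imports BeautifulSoup but does the extraction with a regular expression. For comparison, here is a minimal sketch of how the same two fields might be pulled out with BeautifulSoup. The function name parse_html_bs and the SAMPLE_HTML fragment are hypothetical: the fragment is shaped only after the class names that appear in the regex (hd, pos, title), so the real Douban markup may differ and the selectors may need adjusting.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the class names in the regex above;
# this is an assumption, not a copy of the real Douban page.
SAMPLE_HTML = """
<div class="hd"><span class="pos">1</span></div>
<div class="title"><a href="https://example.com/book/1" target="_blank">
    First Book
</a></div>
<div class="hd"><span class="pos">2</span></div>
<div class="title"><a href="https://example.com/book/2" target="_blank">Second Book</a></div>
"""


def parse_html_bs(html):
    """Yield {'rank', 'name'} dicts by walking the parsed tree instead of using a regex."""
    soup = BeautifulSoup(html, "html.parser")  # parser from the standard library
    for pos in soup.find_all("span", class_="pos"):
        # Like the regex, take the first "title" element that follows the rank
        title = pos.find_next(class_="title")
        link = title.find("a") if title else None
        if link is None:
            continue  # skip entries without a title link
        yield {
            "rank": pos.get_text(strip=True),
            "name": link.get_text(strip=True),
        }


if __name__ == "__main__":
    for item in parse_html_bs(SAMPLE_HTML):
        print(item)
```

Because it yields the same dictionaries, a generator like this could stand in for A.parse_html in the loop that writes book_rank.txt. Walking the tree tends to hold up better than a regex when attributes are reordered or whitespace changes.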

Reposted from blog.csdn.net/weixin_41512727/article/details/79548496
