requests bs4 datetime re json - 代码天地

Other | 2018-05-20 18:16:44 | Views: 2

# An earlier example, from 2016
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import re
import json

# Comment API URL template; {} is filled with the news id
commenturl = (
    'http://comment5.news.sina.com.cn/page/info?version=1&format=js&channel=gn'
    '&newsid=comos-{}&group=&compress=0&ie=utf-8&oe=utf-8&page=1&page_size=20'
)

 
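def getCommentCounts(newsurl):
    # Extract the news id from a URL such as .../doc-ifxytqax6457791.shtml
    # (raw string, with the dot before "shtml" escaped)
    m = re.search(r'doc-i(.+)\.shtml', newsurl)
    newsid = m.group(1)
    comments = requests.get(commenturl.format(newsid))
    print(commenturl.format(newsid))
    # The response is expected to look like "var data={...}".
    # str.strip('var data=') removes a set of characters, not a prefix,
    # so the prefix is cut off explicitly before parsing the JSON.
    text = comments.text.strip()
    prefix = 'var data='
    if text.startswith(prefix):
        text = text[len(prefix):]
    jd = json.loads(text)
    return jd['result']['count']['total']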


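def getNewsDetail(newsurl):
    result = {}
    res = requests.get(newsurl)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'html.parser')
    result['title'] = soup.select('#artibodyTitle')[0].text
    result['newssource'] = soup.select('.time-source span a')[0].text
    # The first child of .time-source is the publication time text
    timesource = soup.select('.time-source')[0].contents[0].strip()
    result['dt'] = datetime.strptime(timesource, '%Y年%m月%d日%H:%M')
    # Join all body paragraphs except the last one, using '@' as the separator
    result['article'] = '@'.join([p.text.strip() for p in soup.select('#artibody p')[:-1]])
    # Remove the "责任编辑：" label as a whole; str.strip('责任编辑：') would strip
    # any of those characters from both ends and could eat into the editor's name
    editor = soup.select('.article-editor')[0].text
    result['editor'] = editor.replace('责任编辑：', '', 1).strip()
    result['comments'] = getCommentCounts(newsurl)
    return result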

# To scrape a different article, only the newsurl assignment below needs to change
newsurl = 'http://news.sina.com.cn/c/nd/2016-12-18/doc-ifxytqax6457791.shtml'
print(getNewsDetail(newsurl))
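
Since the script above depends on live pages that may have changed since 2016, the parsing steps can be tried offline first. The following is a minimal sketch using only the standard library; the helper names (extract_news_id, parse_comment_payload, parse_timestamp), the sample response body, and the sample timestamp are made up for illustration and do not come from the real site.

```python
import json
import re
from datetime import datetime


def extract_news_id(newsurl):
    """Return the text between 'doc-i' and '.shtml', or None if the URL does not match."""
    m = re.search(r'doc-i(.+)\.shtml', newsurl)
    return m.group(1) if m else None


def parse_comment_payload(text, prefix='var data='):
    """Parse a body of the assumed form 'var data={...}' into a Python object."""
    text = text.strip()
    # Remove the prefix as a whole string; str.strip(prefix) would treat it
    # as a set of characters instead.
    if text.startswith(prefix):
        text = text[len(prefix):]
    return json.loads(text)


def parse_timestamp(raw):
    """Parse a timestamp written like '2016年12月18日09:30'."""
    return datetime.strptime(raw.strip(), '%Y年%m月%d日%H:%M')


# Hypothetical inputs: the URL is the one used above, the rest is invented.
sample_body = 'var data={"result": {"count": {"total": 42}}}'
news_id = extract_news_id('http://news.sina.com.cn/c/nd/2016-12-18/doc-ifxytqax6457791.shtml')
total = parse_comment_payload(sample_body)['result']['count']['total']
dt = parse_timestamp('2016年12月18日09:30')
print(news_id, total, dt)
```

Returning None for a non-matching URL, instead of letting m.group(1) raise AttributeError, lets the caller decide how to handle URLs that do not follow the doc-i....shtml pattern.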

Reposted from www.cnblogs.com/leolaosao/p/9064013.html
