Scraping the ZOL Sony Camera Ranking - 代码天地

Category: Other  2020-08-05 18:27:23

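import requests
import re
import json
from bs4 import BeautifulSoup  # imported in the original post but not used below

def get_one_page(url):
    # Identify as a desktop browser through the User-Agent header
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'
    headers = {'User-Agent': user_agent}
    # headers must be passed by keyword: the second positional argument of requests.get is params
    response = requests.get(url, headers=headers)
    return response.text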

Fetch the page content.
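For readers without `requests` installed, the same idea can be sketched with the standard library. This swaps in `urllib.request` and is not the method used in the post; `build_request`, `fetch`, the shortened `USER_AGENT` value, the 10-second timeout, and the UTF-8 default are assumptions, and the page's real charset is not stated in the source.

```python
import urllib.request

# Shortened, illustrative User-Agent value (assumption, not the one from the post).
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'

def build_request(url):
    """Build a GET request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': USER_AGENT})

def fetch(url, timeout=10, encoding='utf-8'):
    """Send the request and decode the body.

    urlopen raises urllib.error.URLError (or its subclass HTTPError) on failure.
    The encoding default is an assumption; pass the page's real charset if it differs.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode(encoding, errors='replace')

# Inspect the request without sending it; urllib stores header names capitalized.
req = build_request('https://top.zol.com.cn/compositor/15/manu_167.html')
print(req.get_method(), req.get_header('User-agent'))
```

Only `build_request` runs here; calling `fetch(url)` would perform the actual network request.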

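def get_information(html_text):
    # Three capture groups per entry: the link text (up to 16 characters),
    # the rank__price text (up to 8 characters), and the contents of the next <span>.
    # re.S lets '.' match newlines, so one entry may span several lines.
    pattern = re.compile('shtml">(.{1,16})</a></div>.*?"rank__price">(.{1,8})</div>.*?<span>(.*?)</span>', re.S)
    items = re.findall(pattern, html_text)
    for item in items:
        # Turn each matched tuple into a dict, one at a time
        yield {
            'index': item[0],
            'price': item[1],
            'score': item[2]
        }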

A regular expression does the matching.
yield assembles each match into a dict.
findall returns the matches as a list of tuples.
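To make the tuple-to-dict step concrete, here is a minimal sketch that runs the same pattern offline. The `SAMPLE_HTML` markup and its values are hypothetical, shaped only so that the pattern matches; the real ZOL page may be laid out differently.

```python
import re

# Hypothetical markup with made-up values; not copied from the real page.
SAMPLE_HTML = '''
<div class="rank__name"><a href="/cam/1.shtml">Camera A</a></div>
<div class="rank__price">¥1000</div>
<div class="rank__score"><span>9.0</span></div>
<div class="rank__name"><a href="/cam/2.shtml">Camera B</a></div>
<div class="rank__price">¥2000</div>
<div class="rank__score"><span>8.5</span></div>
'''

# Same pattern as in get_information; re.S lets '.' cross line breaks.
PATTERN = re.compile(
    'shtml">(.{1,16})</a></div>.*?"rank__price">(.{1,8})</div>.*?<span>(.*?)</span>',
    re.S,
)

# findall returns one tuple per match, one element per capture group.
rows = PATTERN.findall(SAMPLE_HTML)

# Same dict layout that get_information yields.
items = [{'index': name, 'price': price, 'score': score}
         for name, price, score in rows]

for item in items:
    print(item)
```

Because the quantifiers `{1,16}` and `{1,8}` cap the captured length, an entry whose link text or price is longer than that is skipped rather than truncated.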

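def recording(information):
    # Append one JSON object per line; ensure_ascii=False keeps Chinese text readable.
    # The output file name is kept exactly as in the original post.
    with open('豆瓣Top250.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(information, ensure_ascii=False) + '\n')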

Write the scraped information to a file.
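Since each record is written as one JSON object per line, the file can be read back line by line. The sketch below shows that round trip in a temporary directory; `append_record`, `read_records`, the file name `ranking.txt`, and the sample values are all hypothetical names chosen for illustration.

```python
import json
import os
import tempfile

def append_record(path, record):
    """Append one record as a single JSON line, keeping non-ASCII text readable."""
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps(record, ensure_ascii=False) + '\n')

def read_records(path):
    """Parse every non-empty line of the file back into a dict."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]

# Work in a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'ranking.txt')
    append_record(path, {'index': '相机 A', 'price': '¥1000', 'score': '9.0'})
    append_record(path, {'index': '相机 B', 'price': '¥2000', 'score': '8.5'})
    with open(path, encoding='utf-8') as f:
        raw_text = f.read()
    records = read_records(path)

print(raw_text)
print(records)
```

Opening the file in mode `'a'` means repeated runs keep appending, so the same entries accumulate unless the file is removed between runs.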

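def main():
    # range(0, 1) runs a single iteration; the URL is fixed, so one page is fetched
    for i in range(0, 1):
        html_text = get_one_page('https://top.zol.com.cn/compositor/15/manu_167.html')
        items = get_information(html_text)
        for item in items:
            recording(item)
        print('Scraping page ' + str(i))
    print('Scraping finished!')

if __name__ == '__main__':
    main()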

Reposted from blog.csdn.net/weixin_39025679/article/details/106175025
