python+requests+re: scraping released-movie information from Maoyan with regex matching

Category: Other. Posted: 2019-06-15 16:04:45

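This script uses Python with requests to fetch Maoyan's board of released movies, then uses an re regular expression to extract each movie's rank, image URL, title, starring cast, release date, and score.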
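
Before the full script, here is a minimal offline sketch of the extraction step, so the regex can be tried without any network access. The `SAMPLE_HTML` fragment, the `extract_movies` helper, and the English key names are assumptions made up for illustration; the real page markup may differ and can change at any time. The pattern itself is the one used in the script, written with raw strings.

```python
import re

# Hypothetical fragment imitating one <dd> entry of the board page.
# All values (title, cast, date, score, URLs) are invented placeholders.
SAMPLE_HTML = """
<dd>
  <i class="board-index board-index-1">1</i>
  <a href="/films/0" title="示例电影" class="image-link">
    <img data-src="https://example.com/poster.jpg" alt="示例电影" class="board-img" />
  </a>
  <p class="name"><a href="/films/0" title="示例电影">示例电影</a></p>
  <p class="star">
      主演：演员甲,演员乙
  </p>
  <p class="releasetime">上映时间：2000-01-01</p>
  <p class="score"><i class="integer">9.</i><i class="fraction">0</i></p>
</dd>
"""

# re.S lets "." match newlines, so each lazy ".*?" can skip across lines
# until the next literal anchor (class name, attribute, closing tag).
PATTERN = re.compile(
    r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a'
    r'.*?>(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
    r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>',
    re.S,
)


def strip_label(text):
    """Remove a leading 'label：' prefix, if any, and surrounding whitespace."""
    # Splitting on the full-width colon avoids hard-coding the label length.
    return text.strip().split("：", 1)[-1]


def extract_movies(html):
    """Return one dict per <dd> entry matched by PATTERN."""
    return [
        {
            "rank": rank,
            "image_url": image_url,
            "title": title,
            "starring": strip_label(star),
            "release_date": strip_label(release),
            "score": integer + fraction,  # e.g. "9." + "0"
        }
        for rank, image_url, title, star, release, integer, fraction
        in PATTERN.findall(html)
    ]


records = extract_movies(SAMPLE_HTML)
print(records)
```

Each capture group sits between two literal anchors, and the lazy `.*?` keeps every group as short as possible, which is what stops one entry from swallowing the next.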

import json
import re

import requests


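def get_html(url):
    """
    Fetch the HTML source of a web page.
    :param url: address of the page to download
    :return: the response body as text
    """
    # Browser information, sent so the request looks like it comes from a browser
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " \
                 "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    headers = {
        "User-Agent": user_agent
    }
    # A timeout keeps the script from hanging on an unresponsive server
    r = requests.get(url, headers=headers, timeout=10)
    # Raise requests.HTTPError on a 4xx/5xx status instead of parsing an error page
    r.raise_for_status()
    html = r.text
    return html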


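def parse_one_page(html):
    """
    Extract the required fields with a regular expression.
    :param html: HTML source of one board page
    :return: generator yielding one dict per movie
    """
    # Rank + image URL + title + starring + release date + score (integer and fraction parts)
    # Raw strings keep \d from being treated as an invalid string escape
    pattern = re.compile(r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)".*?name"><a'
                         + r'.*?>(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
                         + r'.*?integer">(.*?)</i>.*?fraction">(.*?)</i>.*?</dd>', re.S)

    items = re.findall(pattern, html)

    for item in items:
        yield {
            "rank": item[0],
            "image_url": item[1],
            "title": item[2],
            # Drop the 3-character label "主演：" (starring)
            "starring": item[3].strip()[3:],
            # Drop the label "上映时间：" (release date), which is 5 characters
            # long including the full-width colon
            "release_date": item[4].strip()[5:],
            "score": item[5] + item[6]
        }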


# Data storage: append each record to result.txt as one JSON line

def write_file(content):
    with open("result.txt", 'a+', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + "\n")


def main():
    """
    Main function: fetch the board page, then print and save each movie.
    :return:
    """
    url = "http://maoyan.com/board/4"
    html = get_html(url)
    for item in parse_one_page(html):
        print(item)
        write_file(item)


if __name__ == '__main__':
    main()
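
Because write_file stores one JSON object per line, the saved results can be loaded back line by line. The following is a minimal sketch of that round trip; `write_record`, `read_records`, the temporary file path, and the two sample records are assumptions made up for illustration, not part of the original script.

```python
import json
import os
import tempfile


def write_record(path, record):
    """Append one record as a single JSON line (same idea as write_file)."""
    with open(path, "a", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese text readable in the file
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(path):
    """Load every non-empty line of a JSON-lines file into a list of dicts."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Round-trip two invented records through a file in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "result.txt")
write_record(demo_path, {"rank": "1", "title": "示例电影", "score": "9.0"})
write_record(demo_path, {"rank": "2", "title": "Sample Film", "score": "8.5"})
loaded = read_records(demo_path)
print(loaded)
```

Note that the script opens result.txt in append mode, so running it twice stores every movie twice; delete the file between runs if duplicates are unwanted.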

Reposted from www.cnblogs.com/CesareZhang/p/11027772.html
