Scraping Maoyan's "Most Anticipated" Movie Chart with the requests Library (Web Crawler)

Category: Other | Posted: 2019-04-23 18:51:20

Target site: https://maoyan.com/board/6

# coding:utf8
import json
import re

import requests
from requests.exceptions import RequestException


# from multiprocessing import Pool

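# Fetch one page and return its HTML, or None on failure
def get_one_page(url):
    # Assumption: the site may reject the default python-requests User-Agent,
    # so send a browser-like one; the exact string here is only an example.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
    try:
        # A timeout keeps the script from hanging on an unresponsive server
        resp = requests.get(url, headers=headers, timeout=10)
        if resp.status_code == requests.codes.ok:
            return resp.text
        return None
    except RequestException:
        return None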


# Parse one page of HTML and yield one dict per movie
def parse_one_page(html):
    # Raw strings keep the regex escapes readable; re.S lets '.' match newlines
    pattern = re.compile(r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)"'
                         r'.*?name"><a.*?">(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
                         r'.*?</dd>', re.S)
    items = re.findall(pattern, html)  # a list of 5-tuples, one per movie
    for item in items:
        # Build a dict for each movie
        yield {
            'index': item[0],
            'img_url': item[1],
            'title': item[2],
            'stars': item[3][3:],  # drop the 3-character "主演：" ("Starring:") label
            'releasetime': item[4],
        }


# Append one scraped record to the output file
def write_file(content):
    with open('content.txt', 'a', encoding='utf-8') as f:
        str_content = json.dumps(content, ensure_ascii=False)  # serialize to a JSON string
        f.write(str_content + '\n')  # the with block closes the file automatically


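# Main function: fetch, parse, and save one page of the chart
def main(offset):
    url = "https://maoyan.com/board/6/?offset=" + str(offset)
    html = get_one_page(url)
    if html is None:  # the request failed, so there is nothing to parse
        print('Failed to fetch', url)
        return
    for item in parse_one_page(html):
        write_file(item)
        print(item)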


if __name__ == "__main__":
    # Request 5 pages (offsets 0, 10, 20, 30, 40)
    for i in range(5):
        main(i * 10)
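
The parsing and saving steps can be tried without touching the network. The sketch below runs the same regular expression against a hand-written sample and round-trips the result through a file. Note the assumptions: `SAMPLE_HTML` is hypothetical markup written only to match what the pattern expects, not a copy of the live page, and `parse_page`, `save_records`, `load_records`, and `dry_run` are illustrative names that do not appear in the script above.

```python
import json
import re
import tempfile
from pathlib import Path

# Hypothetical markup: a hand-written stand-in for one <dd> entry,
# not copied from the live site.
SAMPLE_HTML = '''
<dd>
  <i class="board-index board-index-1">1</i>
  <a href="/films/0" class="image-link">
    <img data-src="https://example.com/poster.jpg" class="board-img" />
  </a>
  <p class="name"><a href="/films/0" title="Example Movie">Example Movie</a></p>
  <p class="star">主演：Actor A,Actor B</p>
  <p class="releasetime">上映时间：2019-05-01</p>
</dd>
'''

# Same pattern as parse_one_page, written with raw strings.
PATTERN = re.compile(
    r'<dd>.*?board-index.*?>(\d+)</i>.*?data-src="(.*?)"'
    r'.*?name"><a.*?">(.*?)</a>.*?star">(.*?)</p>.*?releasetime">(.*?)</p>'
    r'.*?</dd>', re.S)


def parse_page(html):
    """Yield one dict per <dd> entry, with the same fields as parse_one_page."""
    for index, img_url, title, stars, releasetime in PATTERN.findall(html):
        yield {
            'index': index,
            'img_url': img_url,
            'title': title,
            'stars': stars[3:],  # drop the 3-character "主演：" label
            'releasetime': releasetime,
        }


def save_records(records, path):
    """Append records to path, one JSON object per line."""
    with open(path, 'a', encoding='utf-8') as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + '\n')


def load_records(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


def dry_run():
    """Parse the sample, write it to a temporary file, and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / 'content.txt'
        save_records(parse_page(SAMPLE_HTML), path)
        return load_records(path)


if __name__ == '__main__':
    for record in dry_run():
        print(record)
```

Because each record sits on its own line, the output file can be appended to across pages and read back one line at a time. If the site changes its markup, the pattern will silently match nothing, so a dry run against a saved copy of a real page is a quick way to notice.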


Reposted from www.cnblogs.com/qikeyishu/p/10758081.html
