Scraping the Maoyan Top 100 Movies with requests

Programming languages · 2018-05-09 15:35:31

import json
import re
from multiprocessing import Pool

import requests
from requests.exceptions import RequestException


def page_one_html(url):
    """Fetch one board page and return its HTML, or None if the request fails."""
    try:
        response = requests.get(url)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None


def parse_page_html(content):
    """Yield one dict per movie found in the page HTML."""
    # Raw strings keep \d from being treated as a string escape.
    pattern = re.compile(
        r'<dd>.*?board-index.*?>(\d+)</i>.*?title="(.*?)".*?data-src="(.*?)".*?'
        r'star">(.*?)</p>.*?releasetime">(.*?)</p>.*?integer">(.*?)</i>.*?fraction">(.*?)</i>'
        r'.*?</dd>', re.S)
    for item in re.findall(pattern, content):
        yield {
            "index": item[0],
            "title": item[1],
            "image": item[2],
            "actor": item[3].strip()[3:],       # drop the "主演：" label (3 characters)
            "createTime": item[4].strip()[5:],  # drop the "上映时间：" label (5 characters)
            "score": item[5] + item[6],         # integer part + fraction part
        }


def write_text(item):
    """Append one movie to result.txt as a single JSON line."""
    with open("result.txt", "a", encoding="utf-8") as f:
        f.write(json.dumps(item, ensure_ascii=False) + "\n")


def main(offset):
    url = "http://maoyan.com/board/4?offset=" + str(offset)
    html = page_one_html(url)
    if html is None:  # the request failed, so there is nothing to parse
        return
    for item in parse_page_html(html):
        write_text(item)


if __name__ == "__main__":
    # Each page lists 10 movies, so offsets 0, 10, ..., 90 cover the top 100.
    with Pool() as pool:
        pool.map(main, [i * 10 for i in range(10)])
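To see what the regular expression in parse_page_html captures, the sketch below runs the same pattern on a hand-written snippet. The snippet is an assumption that only imitates one <dd> entry of the board page (the real markup may differ, and the URL and score are illustrative values). The helper names SAMPLE_HTML, strip_label and parse are made up for this example. Instead of fixed slice offsets, it removes the "主演：" and "上映时间：" labels by splitting on the full-width colon.

```python
import re

# Hypothetical markup imitating a single board entry; not copied from the site.
SAMPLE_HTML = '''
<dd>
  <i class="board-index board-index-1">1</i>
  <a href="/films/1203" title="霸王别姬" class="image-link">
    <img data-src="http://example.com/poster.jpg" alt="霸王别姬" class="board-img" />
  </a>
  <p class="star">
      主演：张国荣,张丰毅,巩俐
  </p>
  <p class="releasetime">上映时间：1993-01-01</p>
  <p class="score"><i class="integer">9.</i><i class="fraction">6</i></p>
</dd>
'''

# Same pattern as in the script; re.S lets "." match newlines too.
PATTERN = re.compile(
    r'<dd>.*?board-index.*?>(\d+)</i>.*?title="(.*?)".*?data-src="(.*?)".*?'
    r'star">(.*?)</p>.*?releasetime">(.*?)</p>.*?integer">(.*?)</i>'
    r'.*?fraction">(.*?)</i>.*?</dd>',
    re.S,
)


def strip_label(text):
    """Remove a leading 'label：' prefix by splitting on the full-width colon."""
    return text.strip().split("：", 1)[-1]


def parse(html):
    """Return a list of dicts, one per <dd> entry matched by PATTERN."""
    movies = []
    for index, title, image, star, release, integer, fraction in PATTERN.findall(html):
        movies.append({
            "index": index,
            "title": title,
            "image": image,
            "actor": strip_label(star),
            "createTime": strip_label(release),
            "score": integer + fraction,
        })
    return movies


movies = parse(SAMPLE_HTML)
print(movies)
```

Each of the seven capture groups becomes one element of the tuple returned by findall, which is why the script can index item[0] through item[6].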

from requests.exceptions import RequestException: exception handling is important. Without it, a single failed request would crash the worker.
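One way to go a step beyond returning None on failure is to retry a few times first. The following is a minimal sketch, not the author's code: fetch_with_retries and make_flaky_fetch are hypothetical names, and the fetch function is a stand-in so the example runs without network access. It catches OSError, which also covers requests' RequestException because that class derives from IOError.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, delay=0.0):
    """Call fetch(url) up to `attempts` times; return None if every call fails.

    OSError is caught because requests.exceptions.RequestException derives
    from IOError (an alias of OSError), so errors raised by requests are covered.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError as exc:
            print(f"attempt {attempt} for {url} failed: {exc}")
            if attempt < attempts:
                time.sleep(delay)  # wait before the next try
    return None


def make_flaky_fetch(failures):
    """Return a stand-in fetch function that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def fetch(url):
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("simulated network failure")
        return "<html>ok</html>"

    return fetch


# Two simulated failures, then success on the third attempt.
print(fetch_with_retries(make_flaky_fetch(2), "http://example.com"))
```

With the real library, the fetch argument could be something like `lambda u: requests.get(u, timeout=10).text`, so that a hung connection also ends in an exception rather than blocking forever.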

import re

import json
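The script uses json to write one record per line. As a small sketch of why ensure_ascii=False matters for Chinese titles, the example below writes a record to a temporary file and reads it back. The names append_record and read_records and the temporary path are assumptions for illustration only.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one record as a single JSON line, keeping non-ASCII text readable."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(path):
    """Read the file back, parsing one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


record = {"index": "1", "title": "霸王别姬"}

# By default json.dumps escapes non-ASCII characters as \uXXXX sequences;
# ensure_ascii=False writes the characters themselves.
escaped = json.dumps(record)
readable = json.dumps(record, ensure_ascii=False)
print(escaped)
print(readable)

# Round trip through a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "result.txt")
append_record(path, record)
append_record(path, {"index": "2", "title": "example"})
loaded = read_records(path)
print(loaded)
```

Both forms load back to the same data. The difference is only in how readable result.txt is when opened in a text editor.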

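from multiprocessing import Pool: the pool (a pool of worker processes, strictly speaking, not threads) fetches the pages in parallel, so the download just flies.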
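Since downloading pages is mostly waiting on the network, a thread pool is a common alternative to a process pool. The sketch below swaps in concurrent.futures.ThreadPoolExecutor from the standard library; it is not what the script above uses. The names page_urls, fake_fetch and fetch_all are hypothetical, and fake_fetch stands in for a real download so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "http://maoyan.com/board/4?offset="


def page_urls(pages=10, page_size=10):
    """Build the list of page URLs: offsets 0, 10, ..., 90 by default."""
    return [BASE_URL + str(i * page_size) for i in range(pages)]


def fake_fetch(url):
    """Stand-in for a real download; it only labels the URL it was given."""
    return f"fetched {url}"


def fetch_all(urls, fetch, workers=4):
    """Run fetch over all URLs in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(fetch, urls))


urls = page_urls()
results = fetch_all(urls, fake_fetch)
for line in results:
    print(line)
```

executor.map returns results in the order of the input URLs even though the calls may finish in a different order, which matches how pool.map behaves in the script.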

Reposted from 394498036.iteye.com/blog/2409569
