爬虫实践--豆瓣电影当前上映电影信息爬取

版权声明:个人博客网站:https://cunyu1943.github.io/, 欢迎访问留言交流,转载请注明出处! https://blog.csdn.net/github_39655029/article/details/84100458

目标

爬取当前时间段豆瓣电影中正在上映的电影的相关信息,如电影名、导演、演员表、上映时间、制作方等信息,然后再通过字典的方式,将其保存在本地文件当中,以便我们查询;

Code

#!/usr/bin/python3
# -*- coding:utf-8 -*-
# @Time    : 2018-11-15 8:24
# @Author  : Manu
# @Site    : 
# @File    : doubanMovie.py
# @Software: PyCharm

import pprint
import requests
from lxml import etree


'''
爬取豆瓣电影上当前正在上映的电影信息,并保存到文件中
'''

HEADERS = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                 'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36',
    'Refer':'https://movie.douban.com/'
}

url = 'https://movie.douban.com/cinema/nowplaying/'
response = requests.get(url, headers=HEADERS)

print(response.text)
text = response.text

html = etree.HTML(text)
ul = html.xpath("//ul[@class='lists']")[0]
lis = ul.xpath('./li')
movies = []
for li in lis:
    title = li.xpath('@data-title')[0]
    score = li.xpath('@data-score')[0]
    release = li.xpath('@data-release')[0]
    duration = li.xpath('@data-duration')[0]
    region = li.xpath('@data-region')[0]
    director = li.xpath('@data-director')[0]
    actors = li.xpath('@data-actors')[0]
    thumbnail = li.xpath('.//img/@src')[0]
    movie = {
        '电影名':title,
        '评分':score,
        "上映时间":release,
        '片长':duration,
        '制片国家':region,
        '导演':director,
        '演员表':actors,
        '海报':thumbnail
    }
    movies.append(movie)
pprint.pprint(movies)
with open('豆瓣正在上映.txt', 'w', encoding='utf-8') as movie_file:
    for movie in movies:
        movie_file.write('电影名:' + movie['电影名'] + '\n')
        movie_file.write('评分:' + movie['评分'] + '\n')
        movie_file.write('上映时间:' + movie['上映时间'] + '\n')
        movie_file.write('片长:' + movie['片长'] + '\n')
        movie_file.write('制片国家:' + movie['制片国家'] + '\n')
        movie_file.write('导演:' + movie['导演'] + '\n')
        movie_file.write('演员表:' + movie['演员表'] + '\n')
        movie_file.write('海报:' + movie['海报'] + '\n')
        movie_file.write('\n')

结果

猜你喜欢

转载自blog.csdn.net/github_39655029/article/details/84100458