Python Crawler - Scraping Movie Information from the 4567 Movie Site

Requirements

Scrape the synopsis of every movie, for example the synopsis of the movie 绝地狙杀.

 

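First, analyze the page behind the URL: on the listing page each movie sits in its own "li" tag, and on the detail page each movie's synopsis sits in a span tag. Both are then fetched with the Scrapy framework.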
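The listing structure described above can be tried out without running a spider. The following is a minimal sketch using only the standard library's ElementTree (not Scrapy's selectors) on a hypothetical, simplified fragment. The sample markup, the `href` paths, and the function name `list_movies` are assumptions for illustration and are not taken from the real site.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a listing page (not the real site's HTML).
SAMPLE_LISTING = """
<body>
  <ul class="stui-vodlist clearfix">
    <li><div><a href="/detail/1.html" title="Movie A">poster</a></div></li>
    <li><div><a href="/detail/2.html" title="Movie B">poster</a></div></li>
  </ul>
</body>
"""


def list_movies(markup, base="https://www.4567tv.tv"):
    """Return the title and absolute detail URL of every movie in the listing."""
    root = ET.fromstring(markup)
    movies = []
    # Same idea as the spider's XPath: every <li> under the movie list <ul>.
    for li in root.findall(".//ul[@class='stui-vodlist clearfix']/li"):
        link = li.find("./div/a")
        if link is None or link.get("href") is None:
            continue  # skip entries without a usable link
        movies.append({
            "title": link.get("title"),
            "detail_url": base + link.get("href"),
        })
    return movies


if __name__ == "__main__":
    for movie in list_movies(SAMPLE_LISTING):
        print(movie["title"], movie["detail_url"])
```

ElementTree only accepts well-formed markup, so real pages would normally still be parsed with Scrapy's own selectors, as in the spider that follows.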

# -*- coding: utf-8 -*-
import scrapy
from moviePro.items import MovieproItem

class MovieSpider(scrapy.Spider):
    name = 'movie'
    allowed_domains = ['www.4567tv.tv']
    start_urls = ['http://www.4567tv.tv/index.php/vod/show/id/5.html']
    url = 'https://www.4567tv.tv/index.php/vod/show/id/5/page/%d.html'
    pageNum = 2


    def parse(self, response):
        print('############', 'Starting test!')
        li_list = response.xpath('//ul[@class="stui-vodlist clearfix"]/li')
        for li in li_list:
            item = MovieproItem()
            item['title'] = li.xpath('./div/a/@title').extract_first()
            detail_url = 'https://www.4567tv.tv' + li.xpath('./div/a/@href').extract_first()
            # Send a request to the detail page URL.
            # meta: this dict is handed to the callback and is available there as response.meta.
            yield scrapy.Request(
                url=detail_url,
                callback=self.parse_detail,
                meta={'item': item},
            )

        # Follow listing pages 2 through 4.
        if self.pageNum < 5:
            new_url = self.url % self.pageNum
            self.pageNum = self.pageNum + 1
            yield scrapy.Request(url=new_url, callback=self.parse)
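
    # Parses the data on the detail page.
    def parse_detail(self, response):
        # Retrieve the item passed along through meta.
        item = response.meta['item']
        # Take the span's text; without /text() and extract_first(),
        # a SelectorList object would be stored instead of a string.
        item['desc'] = response.xpath('/html/body/div[1]/div/div/div/div[2]/p[5]/span[2]/text()').extract_first()

        yield item
        print('Current item:', item)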
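
The pagination in the spider can be checked in isolation. As a rough sketch, the helper below reproduces which listing URLs the spider would request after the start page, given the `url` template and the `pageNum < 5` condition from the code above. The function name `listing_page_urls` and its parameters are hypothetical and are not part of the original spider.

```python
# URL template copied from the spider's `url` attribute.
URL_TEMPLATE = 'https://www.4567tv.tv/index.php/vod/show/id/5/page/%d.html'


def listing_page_urls(first_page=2, stop_before=5):
    """Return the follow-up listing URLs, mirroring pageNum = 2 and pageNum < 5."""
    return [URL_TEMPLATE % page for page in range(first_page, stop_before)]


if __name__ == "__main__":
    for page_url in listing_page_urls():
        print(page_url)
```

With the defaults this yields pages 2, 3 and 4, so together with the start URL the spider covers four listing pages.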

Running the spider produces the following result:

 

If you need the complete code, please like this post and contact me privately.

 

 

Origin blog.csdn.net/sl01224318/article/details/118655779