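Requirements
Scrape the synopsis of every movie, for example the synopsis of the movie 绝地狙杀.
First, analyze the URL and the page structure: on the list page each movie sits under an "li" tag, and on a movie's detail page the synopsis sits under a span tag. We then fetch both with the Scrapy framework.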
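To make the page structure concrete, here is a minimal sketch that runs the same kind of lookup on a hand-written fragment. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors so it runs without Scrapy installed. The markup, the movie titles, and the detail paths are all made up for illustration; the real page is larger and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified list-page markup (not copied from the real site).
SAMPLE = """
<html><body>
<ul class="stui-vodlist clearfix">
  <li><div><a href="/index.php/vod/detail/id/1.html" title="Movie A">poster</a></div></li>
  <li><div><a href="/index.php/vod/detail/id/2.html" title="Movie B">poster</a></div></li>
</ul>
</body></html>
"""

BASE = 'https://www.4567tv.tv'


def list_movies(markup):
    """Return (title, absolute detail URL) pairs, one per <li> in the list."""
    root = ET.fromstring(markup.strip())
    movies = []
    # Every movie is an <li> directly under the ul with this class attribute.
    for li in root.findall(".//ul[@class='stui-vodlist clearfix']/li"):
        link = li.find('./div/a')
        # The href is site-relative, so prefix the site root.
        movies.append((link.get('title'), BASE + link.get('href')))
    return movies


for title, url in list_movies(SAMPLE):
    print(title, url)
```

In the spider itself the same two values come from li.xpath('./div/a/@title') and li.xpath('./div/a/@href').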
# -*- coding: utf-8 -*-
import scrapy
from moviePro.items import MovieproItem


class MovieSpider(scrapy.Spider):
    name = 'movie'
    allowed_domains = ['www.4567tv.tv']
    start_urls = ['http://www.4567tv.tv/index.php/vod/show/id/5.html']
    # URL template for the list pages that follow the first one
    url = 'https://www.4567tv.tv/index.php/vod/show/id/5/page/%d.html'
    pageNum = 2

    def parse(self, response):
        print('############', 'Starting the test!')
        li_list = response.xpath('//ul[@class="stui-vodlist clearfix"]/li')
        for li in li_list:
            item = MovieproItem()
            item['title'] = li.xpath('./div/a/@title').extract_first()
            detail_url = 'https://www.4567tv.tv' + li.xpath('./div/a/@href').extract_first()
            # Send a request to the detail page URL.
            # meta: this dict is handed on to the callback via response.meta.
            yield scrapy.Request(
                url=detail_url,
                callback=self.parse_detail,
                meta={'item': item},
            )
        # Once the current list page is processed, queue the next one (pages 2 to 4).
        if self.pageNum < 5:
            new_url = self.url % self.pageNum
            self.pageNum = self.pageNum + 1
            yield scrapy.Request(url=new_url, callback=self.parse)

    # Parses the data on the detail page.
    def parse_detail(self, response):
        # Receive the item passed along in meta.
        item = response.meta['item']
        # Extract the span's text instead of storing the selector object.
        item['desc'] = response.xpath(
            '/html/body/div[1]/div/div/div/div[2]/p[5]/span[2]/text()'
        ).extract_first()
        print('Current item:', item)
        yield item
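The pagination step in the spider can be checked on its own. The sketch below reuses the spider's URL template and its pageNum < 5 limit to list the extra list pages that get requested after the start URL. The helper name next_page_urls and its parameters are assumptions made for illustration, not part of the original project.

```python
# Same template as the spider's `url` attribute.
URL_TEMPLATE = 'https://www.4567tv.tv/index.php/vod/show/id/5/page/%d.html'


def next_page_urls(first_page=2, stop_page=5):
    """Build the list-page URLs from first_page up to, but not including, stop_page."""
    return [URL_TEMPLATE % page for page in range(first_page, stop_page)]


for page_url in next_page_urls():
    print(page_url)
```

With the defaults this yields pages 2, 3, and 4, matching a spider that starts at pageNum = 2 and stops once pageNum reaches 5.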
Running the spider produces the following result:
If you need the complete code, please like this post and contact me privately.