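Problem: While writing a web crawler with Scrapy, processing the scraped results in pipelines and saving them to the database raised the error TypeError: 'LuboavSpider' object is not iterable. The pipelines file at that point: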
# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
from scrapy.exceptions import DropItem
import pymongo


class DemoPipeline(object):
    limit = 8

    def process_item(self, item, spider):
        if item:
            if len(item["tittle"]) > self.limit:
                item["tittle"] = item["tittle"][:self.limit].rstrip() + "..."
            else:
                pass
            return item
        else:
            return DropItem("item missing")


class MongoPipeling(object):
    def __init__(self, mongouri, mondb):
        self.mongouri = mongouri
        self.mongodb = mondb

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongouri=crawler.settings.get('MONGOURI'),
            mondb=crawler.settings.get('MONGODB')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongouri)
        self.db = self.client[self.mongodb]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, spider, item):
        self.db["luboav_scrapy"].insert(dict(item))
        return item
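Cause analysis:
1. The pipelines file uses two classes to process each item: one truncates a field to a length limit, and the other saves the data to MongoDB. Each class's process_item method must therefore return the item, so that it is passed on to the next pipeline's process_item or handed back to Scrapy.
2. In MongoPipeling.process_item, the second parameter is spider and the third is item. Scrapy passes the arguments positionally as (item, spider), so the parameter named spider actually receives the item from the previous class, and the parameter named item receives the LuboavSpider object. dict(item) then tries to iterate over the spider, which raises the TypeError.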
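The effect of the swapped parameters can be reproduced without Scrapy or MongoDB. The sketch below is only an illustration: LuboavSpider here is a plain stand-in class, and SwappedPipeline and CorrectPipeline are hypothetical names that mimic the two method signatures.

```python
class LuboavSpider:
    """Stand-in for the real spider; a plain object, not a Scrapy class."""
    name = "luboav"


class SwappedPipeline:
    # Parameters in the wrong order, as in the failing process_item.
    def process_item(self, spider, item):
        # `item` is really the spider here, so dict() cannot iterate it.
        return dict(item)


class CorrectPipeline:
    # Parameters in the order Scrapy uses: item first, then spider.
    def process_item(self, item, spider):
        return dict(item)


item = {"tittle": "example"}
spider = LuboavSpider()

# Mimic the positional call (item, spider).
try:
    SwappedPipeline().process_item(item, spider)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

result = CorrectPipeline().process_item(item, spider)
print(result)
```

The first call fails with the same message as in the crawl log, while the second returns the item as a plain dict.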
Corrected code:
# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
from scrapy.exceptions import DropItem
import pymongo


class DemoPipeline(object):
    limit = 8

    def process_item(self, item, spider):
        if item:
            # Truncate long titles to `limit` characters and add an ellipsis.
            if len(item["tittle"]) > self.limit:
                item["tittle"] = item["tittle"][:self.limit].rstrip() + "..."
            return item
        else:
            # DropItem must be raised, not returned; a returned exception
            # object would be passed to the next pipeline as if it were an item.
            raise DropItem("item missing")


class MongoPipeling(object):
    def __init__(self, mongouri, mondb):
        self.mongouri = mongouri
        self.mongodb = mondb

    @classmethod
    def from_crawler(cls, crawler):
        # Read the connection settings from the project's settings.py.
        return cls(
            mongouri=crawler.settings.get('MONGOURI'),
            mondb=crawler.settings.get('MONGODB')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongouri)
        self.db = self.client[self.mongodb]

    def close_spider(self, spider):
        self.client.close()

    # The fix: item comes first, then spider.
    def process_item(self, item, spider):
        # insert() is deprecated since PyMongo 3.0 and removed in 4.0;
        # insert_one() is its replacement for a single document.
        self.db["luboav_scrapy"].insert_one(dict(item))
        return item
Running the program again after this change succeeds.
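For reference, both pipelines only run if they are registered in ITEM_PIPELINES, and from_crawler expects MONGOURI and MONGODB to be present in the settings. A minimal settings.py sketch might look like the following; the module path demo.pipelines, the priority numbers, and the connection values are assumptions for illustration and are not taken from the original project.

```python
# settings.py (fragment)

# Pipelines run in ascending order of these values (conventionally 0-1000),
# so the title is truncated before the item is stored.
ITEM_PIPELINES = {
    "demo.pipelines.DemoPipeline": 300,   # hypothetical module path
    "demo.pipelines.MongoPipeling": 400,  # class name as spelled in pipelines.py
}

# Custom keys read by from_crawler(); the values are placeholders.
MONGOURI = "mongodb://localhost:27017"
MONGODB = "scrapy_demo"
```

Because DemoPipeline has the lower number, whatever its process_item returns is what the MongoDB pipeline receives next.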