python_scrapy_TypeError: 'LuboavSpider' object is not iterable (problem and solution)

Problem description: while running a Scrapy crawl, the item pipeline raised TypeError: 'LuboavSpider' object is not iterable when processing scraped items and saving them to the database.

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
from scrapy.exceptions import DropItem
import pymongo


class DemoPipeline(object):
    limit = 8

    def process_item(self, item, spider):
        if item:
            # Truncate over-long titles to the first `limit` characters.
            if len(item["tittle"]) > self.limit:
                item["tittle"] = item["tittle"][:self.limit].rstrip() + "..."
            return item
        else:
            # DropItem must be raised, not returned, to discard an item.
            raise DropItem("item missing")


class MongoPipeline(object):
    def __init__(self, mongouri, mongodb):
        self.mongouri = mongouri
        self.mongodb = mongodb

    @classmethod
    def from_crawler(cls, crawler):
        # Pull the connection settings from the project's settings.py.
        return cls(
            mongouri=crawler.settings.get('MONGOURI'),
            mongodb=crawler.settings.get('MONGODB')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongouri)
        self.db = self.client[self.mongodb]

    def close_spider(self, spider):
        self.client.close()

    # BUG: the parameters are swapped. Scrapy calls process_item(item, spider),
    # so `spider` here is bound to the item and `item` to the spider object.
    def process_item(self, spider, item):
        self.db["luboav_scrapy"].insert(dict(item))
        return item

Cause analysis:

1. The pipeline file uses two classes to process each item the engine passes in: one limits the length of a field, the other saves the data to MongoDB. Both classes' process_item methods must therefore return the item, so that the result is handed on to the next pipeline in the chain (which class runs first is set by ITEM_PIPELINES; see the sketch after this list).

2. In MongoPipeline's process_item method the parameters are declared in the wrong order: the second is spider and the third is item. Scrapy calls the method positionally as process_item(item, spider), so the parameter named spider is actually bound to the item passed on from the previous class, while the parameter named item is bound to the spider object. dict(item) therefore tries to turn a LuboavSpider instance into a dict, which raises the TypeError above; a minimal reproduction follows this list.
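The error message itself has nothing to do with MongoDB: it is simply what dict() raises when handed an object that is neither a mapping nor an iterable of key/value pairs. A minimal reproduction, independent of Scrapy (the empty class below merely stands in for the real spider):

class LuboavSpider:
    pass

# dict() expects a mapping or an iterable of key/value pairs;
# a plain object is neither, so the conversion fails.
dict(LuboavSpider())
# TypeError: 'LuboavSpider' object is not iterable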
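For reference, the order in which the two classes run is decided by the ITEM_PIPELINES setting: lower numbers run closer to the spider. A sketch of the corresponding settings.py entry; the module path demo.pipelines and the priority values are assumptions for illustration, not taken from the original project:

# settings.py (sketch)
ITEM_PIPELINES = {
    'demo.pipelines.DemoPipeline': 300,   # runs first and truncates the title
    'demo.pipelines.MongoPipeline': 400,  # runs second with DemoPipeline's return value
}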

Fixed code:

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html
from scrapy.exceptions import DropItem
import pymongo


class DemoPipeline(object):
    limit = 8

    def process_item(self, item, spider):
        if item:
            # Truncate over-long titles to the first `limit` characters.
            if len(item["tittle"]) > self.limit:
                item["tittle"] = item["tittle"][:self.limit].rstrip() + "..."
            return item
        else:
            # DropItem must be raised, not returned, to discard an item.
            raise DropItem("item missing")


class MongoPipeline(object):
    def __init__(self, mongouri, mongodb):
        self.mongouri = mongouri
        self.mongodb = mongodb

    @classmethod
    def from_crawler(cls, crawler):
        # Pull the connection settings from the project's settings.py.
        return cls(
            mongouri=crawler.settings.get('MONGOURI'),
            mongodb=crawler.settings.get('MONGODB')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongouri)
        self.db = self.client[self.mongodb]

    def close_spider(self, spider):
        self.client.close()

    # Fixed: parameters now match the order Scrapy uses, process_item(item, spider).
    def process_item(self, item, spider):
        self.db["luboav_scrapy"].insert(dict(item))
        return item

Run the program again; this time it completes successfully.
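A side note unrelated to the original bug: Collection.insert() was deprecated in PyMongo 3.0 and removed in PyMongo 4.0, so on a current PyMongo the save line should use insert_one() instead. A sketch of the same method in MongoPipeline updated accordingly:

    def process_item(self, item, spider):
        # insert_one() replaces the deprecated (and later removed) insert().
        self.db["luboav_scrapy"].insert_one(dict(item))
        return item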

Reposted from blog.csdn.net/jss19940414/article/details/85226767