Storing spider data in MongoDB, MySQL, and Redis

1. First install the Redis, MongoDB, and MySQL clients (set up the environment yourself if it needs any configuration).

2. Create the spider project and set the corresponding properties in settings (adjust the values to your own environment).

# Configuration for storing data in MySQL
MYSQL_HOST = '192.168.1.123'
MYSQL_DBNAME = 'spiderData'
MYSQL_USER = 'root'
MYSQL_PASSWD = '123456'
MYSQL_PROT = 3306
MYSQL_charset = 'utf8'
MYSQL_use_unicode = True

# MongoDB
Mogodb_url='mongodb://localhost:27017'
Mogodb_DB='mogSpiderdb'
MONGO_COLL ='mogSpiderdb'
# Redis
Redis_host='127.0.0.1'
Redis_indenx_DB=0

3. Write the corresponding pipelines in pipelines

Store the scraped data in MySQL (create the table that will hold the data first)
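import pymysql
from . import settings  # assumes pipelines.py and settings.py sit in the same project package

class SpiderdemoPipeline:
    def __init__(self):
        # Open one MySQL connection and cursor for the whole crawl
        self.connectdb = pymysql.connect(
            host=settings.MYSQL_HOST,
            port=settings.MYSQL_PROT,
            database=settings.MYSQL_DBNAME,
            user=settings.MYSQL_USER,
            password=settings.MYSQL_PASSWD,
            use_unicode=settings.MYSQL_use_unicode,
            charset=settings.MYSQL_charset,
        )
        self.cursor = self.connectdb.cursor()

    def process_item(self, item, spider):
        # Parameterized insert: pymysql escapes the values
        self.cursor.execute(
            "insert into baiduTable (name, nameurl) values (%s, %s)",
            (item['goodName'], item['goodimg']),
        )
        self.connectdb.commit()
        return item

    def close_spider(self, spider):
        # Scrapy calls close_spider when the crawl ends; a method named close_connectdb is never invoked
        self.cursor.close()
        self.connectdb.close()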
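
The insert statement above expects a table named baiduTable with the columns name and nameurl to exist already. As a rough sketch, the table could be created once before the crawl starts. The column types and the ensure_table helper below are assumptions for illustration, not taken from the original project.

```python
# Column types are assumptions; adjust them to the data actually scraped.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS baiduTable (
    name VARCHAR(255),
    nameurl VARCHAR(1024)
)
"""


def ensure_table(connection):
    """Create baiduTable if it is missing (hypothetical helper).

    `connection` is any DB-API connection, for example the pymysql
    connection opened in the pipeline's __init__.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(CREATE_TABLE_SQL)
        connection.commit()
    finally:
        cursor.close()
```

One possible place to call it is at the end of the pipeline's __init__, as ensure_table(self.connectdb).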
Store the data in MongoDB
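import pymongo
from . import settings  # assumes pipelines.py and settings.py sit in the same project package

class SpiderdemoPipeline_mogodb:
    def __init__(self):
        self.client = pymongo.MongoClient(host=settings.Mogodb_url)
        self.db = self.client[settings.Mogodb_DB]  # database handle
        self.coll = self.db[settings.MONGO_COLL]  # collection handle
        # If the server requires a username and password, include them in the
        # connection URL; Database.authenticate was removed in PyMongo 4.

    def process_item(self, item, spider):
        post_item = dict(item)  # convert the item to a plain dict
        self.coll.insert_one(post_item)  # insert one document (Collection.insert was removed in PyMongo 4)
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        self.client.close()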
Write the data to Redis
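import redis
from . import settings  # assumes pipelines.py and settings.py sit in the same project package

class SpiderdemoPipeline_Redis:
    def __init__(self):
        self.con_redis = redis.StrictRedis(host=settings.Redis_host, db=settings.Redis_indenx_DB)

    def process_item(self, item, spider):
        # Append the image URL to the Redis list stored under the key 'url'
        self.con_redis.rpush('url', item['goodimg'])
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        # Scrapy calls close_spider when the crawl ends; a method named close_con_redis is never invoked
        self.con_redis.connection_pool.disconnect()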
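
The pipelines above import the settings module directly. Scrapy can also hand the settings to a pipeline through a from_crawler classmethod. One caveat: Scrapy only loads names from settings.py that are entirely upper-case, so mixed-case names such as Redis_host would not be visible this way. The sketch below therefore assumes hypothetical upper-case names REDIS_HOST and REDIS_DB, and the class name RedisUrlPipeline is also made up for illustration.

```python
class RedisUrlPipeline:
    """Hypothetical variant of the Redis pipeline that reads crawler settings."""

    def __init__(self, host, db):
        self.host = host
        self.db = db
        self.con_redis = None

    @classmethod
    def from_crawler(cls, crawler):
        # REDIS_HOST and REDIS_DB are assumed upper-case setting names
        return cls(
            host=crawler.settings.get("REDIS_HOST", "127.0.0.1"),
            db=crawler.settings.get("REDIS_DB", 0),
        )

    def open_spider(self, spider):
        # Imported here so the class can be loaded without redis installed
        import redis
        self.con_redis = redis.StrictRedis(host=self.host, db=self.db)

    def process_item(self, item, spider):
        # Append the image URL to the list stored under the key "url"
        self.con_redis.rpush("url", item["goodimg"])
        return item

    def close_spider(self, spider):
        self.con_redis.connection_pool.disconnect()
```

Opening the connection in open_spider rather than __init__ keeps construction cheap and ties the connection's lifetime to the crawl.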

4. Register the pipelines in settings
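
Registration goes through the ITEM_PIPELINES dict in settings.py. Each key is the import path of a pipeline class and each value is an integer, conventionally between 0 and 1000, where lower numbers run first. The package name spiderdemo in the paths below is an assumption; replace it with the actual project package. The priority values are likewise only an example.

```python
# settings.py
# "spiderdemo" is an assumed package name; use your own project's package.
ITEM_PIPELINES = {
    "spiderdemo.pipelines.SpiderdemoPipeline": 300,         # MySQL
    "spiderdemo.pipelines.SpiderdemoPipeline_mogodb": 400,  # MongoDB
    "spiderdemo.pipelines.SpiderdemoPipeline_Redis": 500,   # Redis
}
```

With these values each item passes through the MySQL pipeline first, then MongoDB, then Redis, which is why every process_item returns the item.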

5. The data is stored successfully

MySQL

MongoDB

Redis

Origin blog.csdn.net/testManger/article/details/109995032