Little Crawler 5: Introduction to Scrapy, Part 3: Persistent Storage

I. Two ways of doing persistent storage

1. Persistent storage based on a terminal command:

Feature: persistent storage via a terminal command can only store the return value of the parse method in a file on disk.

So we need the author and content from the previous article to be the return value of parse. We can put all the data into one list,

where each dictionary stores one author's name and the matching content. Define the list before the loop and return it at the end.
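The shape of that return value can be sketched as follows. This is only an illustration, not the spider from the article: the helper name build_all_data and the sample strings are made up, and in a real spider the pairs would come from the extraction code written in the previous article.

```python
def build_all_data(pairs):
    """Turn (author, content) pairs into the list of dicts that parse() returns."""
    all_data = []  # defined once, before the loop
    for author, content in pairs:
        # one dictionary per post: the author's name and the post text
        all_data.append({"author": author.strip(), "content": content.strip()})
    return all_data  # parse() has to return this list for the export to see it


# Hypothetical sample input standing in for the data extracted from the page
sample = [(" user_a\n", "first post "), ("user_b", "second post")]
records = build_all_data(sample)
```

Inside a spider, parse would end with a return of this list instead of printing the values.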

 

Next, we run the command scrapy crawl qiubai -o qiubai.csv in the terminal.

Right-click the whole crawler project in the IDE and choose the synchronize option, so that the newly generated data file shows up.

We now have a qiubai.csv file containing the scraped content.

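Question: can the output be saved with a .txt suffix? No. Only a fixed set of file formats is supported (in Scrapy releases of that period: json, jsonlines, jl, csv, xml, marshal, pickle), and txt is not among them.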

Terminal-command approach: scrapy crawl qiubai -o qiubai.csv

Pros: convenient.

Cons: quite limited (the data can only be written to a local file, and only with one of the specific supported file extensions).
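The extension limit can be pictured with a small check. This is a sketch for illustration only, not Scrapy's own code: the function name can_export_to is made up, and the format list is the one Scrapy reported around the time of the article, so newer releases may accept more.

```python
from pathlib import Path

# Formats recognised by Scrapy's built-in feed exporters around the time of the
# article (an assumption about the version; newer releases may accept more).
SUPPORTED_FORMATS = {"json", "jsonlines", "jl", "csv", "xml", "marshal", "pickle"}


def can_export_to(filename):
    """Return True if the file suffix is one of the formats listed above."""
    suffix = Path(filename).suffix.lstrip(".").lower()
    return suffix in SUPPORTED_FORMATS
```

With this list, qiubai.csv passes the check and qiubai.txt does not, which is why a txt file needs the pipeline approach.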

2. Persistent storage based on pipelines

Pipeline approach:

  All persistent-storage operations must be written in the pipeline file, pipelines.py.

File I/O with open and database access (MySQL, Redis, etc., through libraries such as pymysql) are all persistent-storage operations.

 

Open the pipelines.py file: it contains one class.

This pipeline class has a process_item method with an item parameter, so it can only process objects of the item type.

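# Only data wrapped in an item object can be sent to the pipeline for persistent storage.

Because the pipeline works on item objects, we declare the fields for our data in the items.py file first.

 Here we define our own fields, following the commented example that the generated file already contains.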

 

Here we import the item class into the spider file. A red underline in the IDE does not necessarily mean an error, as this import shows.

 

Now we can instantiate an object of this item class and store the scraped values in it.

However many times the for loop above runs, process_item below is called that many times.
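A pipeline that writes each item to a text file could look roughly like this. It is a sketch rather than the article's exact code: the class name follows Scrapy's usual template naming, and the file name qiubai.txt and the author:content line format are assumptions. open_spider, process_item and close_spider are the hook names Scrapy calls on a pipeline.

```python
class QiubaiproPipeline:
    """Writes every item to a text file; the file is opened once per crawl."""

    path = "qiubai.txt"  # assumed output file name
    fp = None

    def open_spider(self, spider):
        # called once, when the spider starts
        self.fp = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # called once for every item the spider yields
        self.fp.write(f"{item['author']}:{item['content']}\n")
        return item  # hand the item on to any later pipeline

    def close_spider(self, spider):
        # called once, when the spider finishes
        self.fp.close()
```

Opening the file in open_spider rather than in process_item avoids reopening it for every single item.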

 

Once the method is written, we need to enable the pipeline in the settings.py configuration file.
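Enabling a pipeline means listing it in the ITEM_PIPELINES setting. The fragment below is a sketch: qiubaiPro is a hypothetical project name and QiubaiproPipeline a hypothetical class name, so both need to match the actual project.

```python
# settings.py (fragment)
# Key: import path of the pipeline class; value: priority (lower numbers run first).
# "qiubaiPro" and "QiubaiproPipeline" are hypothetical names for this example.
ITEM_PIPELINES = {
    "qiubaiPro.pipelines.QiubaiproPipeline": 300,
}
```

In a freshly generated project this setting is usually present but commented out, so the pipeline does nothing until it is uncommented.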

When would we define more than one pipeline? And when would a field in items.py be defined as a string type?

Then run the spider from the terminal.

 

In the output we can see the start and the end of the crawl.

This time, a txt file has been generated as our persistent storage.

Note: why does items.py offer no string type or JSON type? Because Field is a universal data type that can store anything.

Question: when should multiple pipelines be defined?

When one copy of the data goes to Redis, one to local storage, and one to MySQL, we need multiple pipeline classes.
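How several pipeline classes cooperate can be sketched like this. The three classes are stubs that only record that they ran, the project name qiubaiPro is hypothetical, and run_through is a made-up helper that imitates the hand-over; Scrapy does the real dispatching itself, in the order given by the priorities in ITEM_PIPELINES.

```python
calls = []  # records which stub pipeline ran, in order


class LocalFilePipeline:
    def process_item(self, item, spider):
        calls.append("file")   # stand-in for writing to a local file
        return item            # the next pipeline only sees what is returned here


class MysqlPipeline:
    def process_item(self, item, spider):
        calls.append("mysql")  # stand-in for an INSERT
        return item


class RedisPipeline:
    def process_item(self, item, spider):
        calls.append("redis")  # stand-in for a push to Redis
        return item


# settings.py fragment: lower priority numbers run first
ITEM_PIPELINES = {
    "qiubaiPro.pipelines.LocalFilePipeline": 300,
    "qiubaiPro.pipelines.MysqlPipeline": 301,
    "qiubaiPro.pipelines.RedisPipeline": 302,
}


def run_through(item, pipelines):
    """Imitate the hand-over: each stage receives the previous stage's return value."""
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider=None)
    return item
```

The important detail is the return item at the end of every process_item: a pipeline that forgets it passes None to the next one.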

 Import pymysql.

Log in to the database.

 

 

 

 

 

 Next, we create a new database.

Then we create a table.
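The two steps might look like the statements below. The database name spider, the table name qiubai and the column types are assumptions chosen for this sketch, since the original screenshots are not available. The helper create_schema is also made up; it runs the statements on any open connection whose cursors work as context managers, as pymysql's do.

```python
# Hypothetical names: database "spider", table "qiubai"
CREATE_DATABASE = "CREATE DATABASE IF NOT EXISTS spider CHARACTER SET utf8mb4"
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS qiubai ("
    "author VARCHAR(100), "
    "content TEXT)"
)


def create_schema(conn):
    """Create the database and the table on an already open connection."""
    with conn.cursor() as cursor:
        cursor.execute(CREATE_DATABASE)
        cursor.execute("USE spider")  # switch to the new database
        cursor.execute(CREATE_TABLE)
    conn.commit()
```

The same statements can equally be typed into the mysql command-line client after logging in.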

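Next, we connect to the database.

 Connect to the database in the open method (open_spider), so the connection is made once when the spider starts.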
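A MySQL pipeline along these lines could be sketched as follows, assuming pymysql is installed. The user, password, database name spider and table name qiubai are placeholders, not values from the article; only port 3306 is the standard MySQL port.

```python
class MysqlPipeline:
    """Inserts every item into a MySQL table; one connection per crawl."""

    conn = None

    def open_spider(self, spider):
        # third-party driver, imported here so it is only needed when the crawl runs
        import pymysql

        self.conn = pymysql.connect(
            host="127.0.0.1",
            port=3306,
            user="root",                # placeholder credentials
            password="your_password",   # placeholder credentials
            database="spider",          # hypothetical database name
            charset="utf8mb4",
        )

    def process_item(self, item, spider):
        sql = "INSERT INTO qiubai (author, content) VALUES (%s, %s)"
        try:
            with self.conn.cursor() as cursor:
                # parameters are passed separately so the driver escapes them
                cursor.execute(sql, (item["author"], item["content"]))
            self.conn.commit()
        except Exception:
            self.conn.rollback()  # undo the failed insert, then report the error
            raise
        return item

    def close_spider(self, spider):
        if self.conn is not None:
            self.conn.close()
```

Passing the values as a parameter tuple instead of formatting them into the SQL string keeps quotes in the scraped text from breaking the statement.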

MySQL port: 3306

 Redis port: 6379
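For the Redis copy, a pipeline might be sketched like this, assuming the redis-py client is installed. The list key qiubai and the choice to store each item as a JSON string are assumptions made for this example; only port 6379 is the standard Redis port.

```python
import json


class RedisPipeline:
    """Pushes every item onto a Redis list as a JSON string."""

    conn = None

    def open_spider(self, spider):
        # third-party redis-py client, imported here so it is only needed when the crawl runs
        import redis

        self.conn = redis.Redis(host="127.0.0.1", port=6379)

    def process_item(self, item, spider):
        # Redis cannot store a dict directly, so serialise the item first
        self.conn.lpush("qiubai", json.dumps(dict(item), ensure_ascii=False))
        return item
```

Serialising with ensure_ascii=False keeps non-ASCII text readable when the list is inspected from the Redis client.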

 


Origin www.cnblogs.com/studybrother/p/10969239.html