Two ways to do persistent storage
1. Persistent storage based on a terminal command:
Characteristics: the terminal command can only persist the return value of the spider's parse method to a local disk file.
So we need to make the author and content of each post the return value: we put all the records into a list, where
each dictionary stores one author name and one piece of content, and it is best to return that list from parse.
Next, we run the following command in the terminal (shown in the figure).
Then we right-click the whole spider project and click the synchronize option, so the generated data file shows up in the project tree.
We get the following qiubai.csv content:
Thinking: can we save with a .txt suffix? No: only the following file formats are supported ('json', 'jsonlines', 'jl', 'csv', 'xml', 'marshal', 'pickle').
Terminal command: scrapy crawl qiubai -o qiubai.csv
Pros: simple and convenient.
Cons: strong limitations (the data can only be written to local files, and the file extension must be one of the formats listed above).
2. Persistent storage based on pipelines
Based on pipelines:
All persistent storage operations must be written in the pipeline classes in the pipelines.py file.
Both file I/O (open/write) and database storage (MySQL via pymysql, Redis, etc.) are persistence operations implemented there.
Open the pipelines.py file; it contains a pipeline class.
This pipeline class has a process_item method that takes an item parameter, so it can only process item-type objects.
# item-type data objects can be handed over for persistent storage
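A minimal pipelines.py sketch along these lines (the file name qiubai.txt and the output format are assumptions):

```python
class QiubaiPipeline:
    fp = None

    def open_spider(self, spider):
        # called exactly once when the spider starts, so the file is
        # opened a single time rather than once per item
        self.fp = open("./qiubai.txt", "w", encoding="utf-8")

    def process_item(self, item, spider):
        # called once for every item the spider submits
        self.fp.write(item["author"] + ":" + item["content"] + "\n")
        return item  # pass the item on to the next pipeline, if any

    def close_spider(self, spider):
        # called exactly once when the spider finishes
        self.fp.close()
```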
To hand over an item-type object, we define the data fields in the items.py file.
Here we customize the item class, following the given template.
Then we import this class in the spider file; if the IDE underlines the import in red, that does not necessarily indicate a mistake, and it can be ignored here.
Now we can instantiate an object of this item class.
However many times the for loop above executes, process_item below executes the same number of times.
Writing the functions is not the end:
after finishing them, we still need to enable the pipeline in the settings.py configuration file.
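Enabling it means registering the pipeline class in the ITEM_PIPELINES setting; a sketch, assuming the project package is named qiubaipro:

```python
# settings.py (sketch)
ITEM_PIPELINES = {
    # the number is a priority: lower values run earlier (range 0-1000)
    "qiubaipro.pipelines.QiubaiPipeline": 300,
}
```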
Thinking: when do we need to define more than one pipeline? And why are the fields in items.py not defined with a concrete type such as string?
Run the following command:
In the log we can see the beginning and the end of the crawl.
This time, a persistent txt file has been generated.
Note (see the figure below): why is the field not given a string type or a json type? Because Field is a universal data type, it can store anything.
Thinking: when do we define multiple pipelines?
When we want one copy of the data in Redis, one in local storage, and one in MySQL, we need multiple pipeline classes.
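In that case settings.py registers all the pipeline classes, and the priorities decide the order they run in; a sketch with assumed class and module names:

```python
# settings.py (sketch): three pipeline classes, one copy of the data each
ITEM_PIPELINES = {
    "qiubaipro.pipelines.QiubaiPipeline": 300,  # local file, runs first
    "qiubaipro.pipelines.MysqlPipeline": 301,   # MySQL copy
    "qiubaipro.pipelines.RedisPipeline": 302,   # Redis copy
}
```

For this to work, every process_item must return item, otherwise the pipelines after it never receive the data.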
Import pymysql.
Log in to the database.
Next, we create a new database.
Then we create a table.
Then we connect to the database.
We connect to the database inside open_spider.
MySQL port: 3306
Redis port: 6379