In this article I want to explain Scrapy's CSVFeedSpider template, which is arguably the simplest of Scrapy's spider templates, so this article will be short. CSVFeedSpider is designed for parsing CSV files: it iterates over the file row by row, calling the parse_row() method once for each line. The template uses the following attributes:
- delimiter: the field separator; defaults to a comma;
- quotechar: the enclosure character for each field. If a field itself contains the delimiter (for example, a comma inside a value), that field must be wrapped in this character; defaults to a double quote (");
- headers: a list of the column names in the CSV file.
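To see how delimiter and quotechar interact, here is a minimal sketch using Python's standard csv module, which applies the same separator/enclosure rules that CSVFeedSpider relies on (the sample data is hypothetical):

```python
import csv
import io

# The first field contains a comma, so it is wrapped in the
# quotechar (double quotes) to keep it from being split.
raw = '"Zhang San, PhD",Agriculture,Consulting\n'

reader = csv.reader(io.StringIO(raw), delimiter=',', quotechar='"')
row = next(reader)
print(row)  # the quoted comma stays inside the first field
```

Without the quotes around the first field, the same line would parse into four fields instead of three.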
Example
Let's look at an example that uses CSVFeedSpider to crawl CSV data about science and technology commissioners in Guizhou Province.
# -*- coding: utf-8 -*-
from scrapy.spiders import CSVFeedSpider

from ..items import CsvfeedspiderItem


class CsvdataSpider(CSVFeedSpider):
    name = 'csvdata'
    allowed_domains = ['gzdata.gov.cn']
    start_urls = ['http://gzopen.oss-cn-guizhou-a.aliyuncs.com/科技特派员.csv']
    headers = ['name', 'SearchField', 'Service', 'Specialty']
    delimiter = ','
    quotechar = '"'

    def parse_row(self, response, row):
        # Each CSV line arrives as a dict keyed by the headers list.
        i = CsvfeedspiderItem()
        i["name"] = row["name"]
        i["searchField"] = row["SearchField"]
        i["service"] = row["Service"]
        i["specialty"] = row["Specialty"]
        return i

    def adapt_response(self, response):
        # The remote file is not UTF-8; decode it before the CSV iterator runs.
        return response.body.decode('gb18030')
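The adapt_response() override exists because the remote file is encoded in GB18030, a Chinese-government-data encoding, rather than UTF-8. A minimal stdlib sketch of that decoding step (the sample string stands in for the real response body):

```python
# Simulate a response body encoded in GB18030 and decode it the
# same way adapt_response() does: bytes in, text out.
body = '科技特派员'.encode('gb18030')  # simulated response.body
text = body.decode('gb18030')
print(text)
```

If the bytes were decoded as UTF-8 instead, the Chinese column values would come out garbled or raise a UnicodeDecodeError.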
The CsvfeedspiderItem class used above lives in items.py:

import scrapy


class CsvfeedspiderItem(scrapy.Item):
    name = scrapy.Field()
    searchField = scrapy.Field()
    service = scrapy.Field()
    specialty = scrapy.Field()
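Under the hood, CSVFeedSpider hands each line to parse_row() as a dict keyed by the headers list. A rough stdlib equivalent of that mapping, using csv.DictReader with explicit fieldnames (the sample row values are hypothetical):

```python
import csv
import io

headers = ['name', 'SearchField', 'Service', 'Specialty']
csv_text = '李四,种植业,技术指导,果树栽培\n'

# DictReader with explicit fieldnames mirrors how the spider's
# `headers` attribute maps CSV columns to keys in `row`.
for row in csv.DictReader(io.StringIO(csv_text), fieldnames=headers):
    item = {
        'name': row['name'],
        'searchField': row['SearchField'],
        'service': row['Service'],
        'specialty': row['Specialty'],
    }
    print(item)
```

This is the same key-renaming step parse_row() performs when it copies row fields into the CsvfeedspiderItem.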