Scrapy Spider Templates: CSVFeedSpider

In this article I cover CSVFeedSpider, one of Scrapy's spider templates. It is arguably the simplest of them, so this article is short. CSVFeedSpider is used to parse CSV files: it iterates over the file row by row and calls the parse_row() method once for each row. The template uses the following attributes:

  1. delimiter: the field separator; defaults to a comma;
  2. quotechar: the character used to enclose fields. A CSV field that contains line breaks, quotes, or commas must be enclosed in quotes; this attribute sets the enclosing character and defaults to a double quote;
  3. headers: the column names of the CSV file, given as a list.
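
To make the three attributes concrete, here is a minimal sketch that uses only Python's standard csv module, with no Scrapy involved. The sample data, the column names, and the helper rows_as_dicts() are all made up for illustration. It shows how delimiter, quotechar, and headers together turn each CSV line into the kind of dict that parse_row() receives as its row argument:

```python
import csv
import io

# Hypothetical sample data: the second field of the second line contains
# the delimiter, so it is wrapped in the quote character.
SAMPLE = 'Alice,rice breeding\n"Bob","tea, fruit trees"\n'


def rows_as_dicts(text, headers, delimiter=",", quotechar='"'):
    """Yield one dict per CSV line, keyed by the supplied headers."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    for values in reader:
        yield dict(zip(headers, values))


rows = list(rows_as_dicts(SAMPLE, headers=["name", "specialty"]))
for row in rows:
    print(row)
```

Because the quoted field is read as a single value, the comma inside "tea, fruit trees" does not split it into two columns.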

0. Example

As an example of how CSVFeedSpider is used, the spider below crawls the CSV data on Guizhou Province's science and technology correspondents.

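# -*- coding: utf-8 -*-
from scrapy.spiders import CSVFeedSpider
from ..items import CsvfeedspiderItem


class CsvdataSpider(CSVFeedSpider):
    name = 'csvdata'
    allowed_domains = ['gzdata.gov.cn']
    start_urls = ['http://gzopen.oss-cn-guizhou-a.aliyuncs.com/科技特派员.csv']
    # Column names used as the keys of each row dict
    headers = ['name', 'SearchField', 'Service', 'Specialty']
    delimiter = ','
    # Fields containing commas, quotes, or line breaks are wrapped in double quotes
    quotechar = '"'

    def parse_row(self, response, row):
        # Called once per CSV row; row maps each header to that row's value
        item = CsvfeedspiderItem()
        item["name"] = row["name"]
        item["searchField"] = row["SearchField"]
        item["service"] = row["Service"]
        item["specialty"] = row["Specialty"]
        return item

    def adapt_response(self, response):
        # Runs before the rows are parsed: the file is GB18030-encoded,
        # so decode the raw bytes explicitly instead of assuming UTF-8
        return response.body.decode('gb18030')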
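
# items.py: the item class imported by the spider above
import scrapy


class CsvfeedspiderItem(scrapy.Item):
    # One field per CSV column extracted in parse_row()
    name = scrapy.Field()
    searchField = scrapy.Field()
    service = scrapy.Field()
    specialty = scrapy.Field()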
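
The adapt_response() method in the spider is there because the file is served in GB18030 rather than UTF-8, so the bytes have to be decoded before the rows are split. The following offline sketch of that step needs neither Scrapy nor a network connection. The sample bytes and the helper decode_and_parse() are hypothetical stand-ins, not part of the original project:

```python
import csv
import io

HEADERS = ["name", "SearchField", "Service", "Specialty"]

# Hypothetical stand-in for response.body: one CSV line encoded as GB18030.
raw_body = "张三,水稻,技术指导,育种\n".encode("gb18030")


def decode_and_parse(body, encoding="gb18030"):
    """Decode raw bytes, then map each CSV line onto HEADERS."""
    text = body.decode(encoding)
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(HEADERS, values)) for values in reader]


parsed = decode_and_parse(raw_body)
print(parsed[0]["name"])
```

If the bytes were decoded with the wrong encoding, the row values would either fail to decode or come out garbled, which is why the decoding happens before any row reaches parse_row().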

Origin blog.csdn.net/gangzhucoll/article/details/103849376