[Learning] Scrapy, a Python web crawling framework

To learn Scrapy, you can refer to the Scrapy 1.5 Chinese documentation: http://www.scrapyd.cn/doc/

1) Create a project

  1. To create a project in a specific folder, open cmd, change into that folder, and run: scrapy startproject <project name>

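      After the project is created successfully, its directory structure looks like this (the typical layout generated by Scrapy 1.5, with <project name> standing in for the name you chose):

      <project name>/
          scrapy.cfg
          <project name>/
              __init__.py
              items.py
              middlewares.py
              pipelines.py
              settings.py
              spiders/
                  __init__.py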

2) Write your first spider. Reference: http://www.scrapyd.cn/doc/140.html

import scrapy


class mingyan(scrapy.Spider):  # a spider must inherit from scrapy.Spider

    name = "mingyan2"  # spider name (the name passed to `scrapy crawl`)

    start_urls = ['http://lab.scrapyd.cn']

    def parse(self, response):
        mingyan = response.css('div.quote')

        for v in mingyan:  # loop over each quote block and pull out its text, author and tags

            text = v.css('.text::text').extract_first()  # quote text
            author = v.css('.author::text').extract_first()  # author
            tags = v.css('.tags .tag::text').extract()  # list of tags
            tags = ','.join(tags)  # join the list into a single string

            # Save: each author's quotes go to a separate file named "<author>-语录.txt"
            # (语录 means "quotations"); "a+" opens the file in append mode.
            fileName = '%s-语录.txt' % author
            with open(fileName, "a+", encoding="utf-8") as f:
                f.write(text)
                f.write('\n')  # '\n' is a line break
                f.write('标签:' + tags)  # 标签 means "tags"
                f.write('\n-------\n')
                # no f.close() needed: the with block closes the file
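
The saving step of the spider above can also be tried on its own, without running a crawl. The following is a minimal sketch, not the author's code: the helper name save_quote, the directory parameter, and the "unknown" fallback are all assumptions made for illustration. The fallback is there because extract_first() returns None when a selector matches nothing.

```python
import os
import tempfile


def save_quote(text, author, tags, directory="."):
    """Append one quote to '<author>-语录.txt' inside `directory`.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    # extract_first() returns None when nothing matches, so fall back
    # to placeholder values (an assumption of this sketch).
    author = author or "unknown"
    text = text or ""
    path = os.path.join(directory, "%s-语录.txt" % author)
    # "a" appends to the file; an explicit encoding avoids depending on
    # the platform default when writing Chinese text.
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")
        f.write("标签:" + ",".join(tags))
        f.write("\n-------\n")
    return path


if __name__ == "__main__":
    # Write two made-up quotes for the same author into a temporary folder
    # and print the resulting file to show that the second call appends.
    with tempfile.TemporaryDirectory() as tmp:
        p = save_quote("Example quote", "Example Author", ["a", "b"], tmp)
        save_quote("Second quote", "Example Author", [], tmp)
        with open(p, encoding="utf-8") as f:
            print(f.read())
```

Inside parse, the call would look like save_quote(text, author, v.css('.tags .tag::text').extract()).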

3) Run a Scrapy crawler project from PyCharm. Reference: https://www.cnblogs.com/llssx/p/8378832.html

     Define a .py file (typically in the project root, next to scrapy.cfg) as follows, and run it from PyCharm:

from scrapy import cmdline

# the third argument is the spider's name (its `name` attribute)
cmdline.execute(['scrapy', 'crawl', 'mingyan2'])

4) Extracting data with Scrapy:

      1. CSS selectors
      2. XPath selectors

5) Scrapy commands
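
As a rough reference, the sketch below lists a few commonly used Scrapy commands and builds the argument list that cmdline.execute expects, in the same form as the runner script in section 3. The COMMON_COMMANDS table and the scrapy_argv helper are made up for this example and are not part of Scrapy. Run scrapy -h for the authoritative list in your installed version.

```python
# A few built-in Scrapy commands and what they do (not an exhaustive list).
COMMON_COMMANDS = {
    "startproject": "create a new project",
    "genspider": "generate a spider from a template",
    "crawl": "run a spider by name (inside a project)",
    "list": "list the spiders in the current project",
    "shell": "open an interactive console for trying selectors on a URL",
    "fetch": "download a URL with the Scrapy downloader and print it",
    "view": "open a URL in the browser as Scrapy sees it",
    "version": "print the Scrapy version",
}


def scrapy_argv(command, *args):
    """Build an argv list such as ['scrapy', 'crawl', 'mingyan2'].

    Hypothetical helper for illustration; it only builds the list and
    does not run anything.
    """
    if command not in COMMON_COMMANDS:
        raise ValueError("not in this sketch's command table: %s" % command)
    return ["scrapy", command] + list(args)


if __name__ == "__main__":
    for name, description in COMMON_COMMANDS.items():
        print("scrapy %-12s %s" % (name, description))
    print(scrapy_argv("crawl", "mingyan2"))
```

The returned list can be passed to cmdline.execute, or the same words can be typed directly in cmd, for example scrapy crawl mingyan2.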

 

 


Origin blog.csdn.net/qq_43285577/article/details/103762089