scrapy01: Creating and starting a Scrapy project

1. Installing Scrapy

These steps install the Scrapy framework on Windows. Because Scrapy has many dependencies, install the following packages in order:

pip install pypiwin32
pip install wheel  # optional
pip install Twisted
pip install scrapy
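
To confirm that the packages above ended up in the active Python environment, a small check such as the following may help. It is only a sketch: the helper name missing_modules is made up for illustration, and the import names in REQUIRED_MODULES are assumptions (in particular, that pypiwin32 exposes the win32api module).

```python
import importlib.util

# Import names assumed for the packages installed above
# (pypiwin32 is assumed to provide the win32api module).
REQUIRED_MODULES = ["win32api", "twisted", "scrapy"]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", missing if missing else "none")
```

If the script reports a missing module, rerunning the corresponding pip command is usually the first thing to try.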

2. Creating a project and a spider

  • Create a project
scrapy startproject [project name]
  • Create a spider
scrapy genspider [spider name] [domain name]
  • Write the spider. After the commands above have run, a spider file is generated in the project's spiders folder; write the crawling logic inside it:
import scrapy
# from scrapy.selector.unified import SelectorList
from scrapy1_test.items import Scrapy1TestItem


class DuanziSpider(scrapy.Spider):
    name = 'duanzi'
    allowed_domains = ['ishuo.cn']
    start_urls = ['http://ishuo.cn/']

    def parse(self, response):
        contents = []  # only needed for the list-returning variant below
        content_lis = response.xpath("//div[@id='list']/ul/li")
        for li in content_lis:
            # Unlike lxml's etree, text() here yields a selector object;
            # call get() to extract the string inside it.
            content = li.xpath('./div[1]/text()').get()
            info = li.xpath('./div[2]/a/text()').get()
            item = Scrapy1TestItem(content=content, info=info)
            # Yield items one at a time; alternatively, collect them
            # in a list and return the whole list at the end.
            yield item
            # contents.append(item)
        # return contents
        # yield scrapy.Request(...)  # follow-up requests (e.g. the next page) would be yielded here
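
To see what the XPath expressions in parse select, the same paths can be tried on a small fragment without running a crawl. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors, so it is not the spider's actual API: ElementTree has no text() step, and the element's .text attribute plays the role of text() followed by get(). The markup in SAMPLE and the function name extract_items are invented for illustration and are not taken from ishuo.cn.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed fragment that mimics the structure the spider expects.
SAMPLE = """
<html><body>
<div id="list"><ul>
<li><div>first joke</div><div><a href="/1">title one</a></div></li>
<li><div>second joke</div><div><a href="/2">title two</a></div></li>
</ul></div>
</body></html>
"""


def extract_items(markup):
    """Collect one dict per <li>, using the same relative paths as the spider."""
    root = ET.fromstring(markup.strip())
    items = []
    for li in root.findall(".//div[@id='list']/ul/li"):
        content = li.find("./div[1]").text   # first <div> inside the <li>
        info = li.find("./div[2]/a").text    # link text inside the second <div>
        items.append({"content": content, "info": info})
    return items


if __name__ == "__main__":
    for item in extract_items(SAMPLE):
        print(item)
```

In the spider itself, response.xpath returns selector objects rather than elements, which is why the extra get() call is needed there.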

  


Origin www.cnblogs.com/gzwzx/p/12003106.html