1. Installing Scrapy
These notes cover installing the Scrapy framework on Windows. Because Scrapy has many dependencies, pay attention to the following points during installation:
pip install pypiwin32
pip install Twisted  # if pip cannot install it, install it from a wheel (.whl) file instead
pip install scrapy
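After running the commands above, it can help to confirm that the dependencies are actually importable. The following is a minimal sketch using only the standard library. The helper name missing_modules and the list REQUIRED_MODULES are hypothetical, and checking pypiwin32 through its win32api module is an assumption about how that package is exposed on Windows.

```python
import importlib.util

# Import names, not pip package names. Assumption: pypiwin32 is checked
# through the win32api module it provides on Windows.
REQUIRED_MODULES = ["win32api", "twisted", "scrapy"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

If a module is reported as not installed, rerun the corresponding pip command before creating a project.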
2. Creating a project and a spider
- Create a project
scrapy startproject [project name]
- Create a spider
scrapy genspider [spider name] [domain name]
- Write the spider file. After the commands above are run, a spider file is generated in the project's spiders folder; write the crawling logic inside it.
    import scrapy
    from scrapy.selector.unified import SelectorList
    from scrapy1_test.items import Scrapy1TestItem


    class DuanziSpider(scrapy.Spider):
        name = 'duanzi'
        allowed_domains = ['ishuo.cn']
        start_urls = ['http://ishuo.cn/']

        def parse(self, response):
            contents = []
            # Select every <li> entry inside the div with id "list"
            content_lis = response.xpath("//div[@id='list']/ul/li")
            for li in content_lis:
                # Unlike etree's text(), xpath() here returns selector objects;
                # get() is needed to extract the text content itself
                content = li.xpath('./div[1]/text()').get()
                info = li.xpath('./div[2]/a/text()').get()
                item = Scrapy1TestItem(content=content, info=info)
                # This yields one item at a time; alternatively, collect the
                # items in a list and return them all at once
                yield item
                # contents.append(item)
            # return contents
            yield scrapy.Request