Create a project:
    scrapy startproject myproject
    cd myproject

Create a spider:
    scrapy genspider spidername spiderurl.com

See all commands:
    scrapy -h

Global commands (work without a project): startproject, genspider, settings, runspider, shell, fetch, view, version
Project-only commands: crawl, check, list, edit, parse, bench

List the available spider templates (basic, crawl, csvfeed, xmlfeed):
    scrapy genspider -l

Create a spider in the current project from a template (default: -t basic):
    scrapy genspider [-t template] <spiderName> <spiderUrl>

Run a spider:
    scrapy crawl myspidername

Run a spider and save the scraped items to a file (.json, .xml, .jl, ...):
    scrapy crawl myspider -o fileName.json

Run the spider contract checks (-l only lists the contracts without running them):
    scrapy check [-l] [spider]

Download a URL with the Scrapy downloader and print the returned content:
    scrapy fetch <url>

Open a URL in the browser as Scrapy sees it:
    scrapy view <url>

Open the interactive Scrapy shell for a URL:
    scrapy shell <url>

Fetch a URL and parse it with the spider that handles it:
    scrapy parse <url> [options]

View settings:
    scrapy settings --get BOT_NAME
    scrapy settings --get DOWNLOAD_DELAY

Run a spider from a standalone Python file, without a project:
    scrapy runspider myspider.py

Selector usage:

Get the text of the title tag (first match only):
    response.selector.xpath('//title/text()').extract_first()
    response.css('title::text').extract_first()

Get the text of the title tag (all matches):
    response.selector.xpath('//title/text()').extract()

Get the text of a child tag, given <div id="images"> <a></a> </div>:
    response.xpath('//div[@id="images"]/a/text()').extract_first()

Get an attribute, here the href attribute of the base tag:
    response.xpath('//base/@href').extract()
    response.css('base::attr(href)').extract()

Get the href of every a tag whose href contains "image":
    response.css('a[href*=image]::attr(href)').extract()
    response.xpath('//a[contains(@href,"image")]/@href').extract()

Get the src attribute of the img tags nested inside a tags whose href contains "image":
    response.xpath('//a[contains(@href,"image")]/img/@src').extract()
    response.css('a[href*="image"] img::attr(src)').extract()

Regular expressions on a selector:
    response.xpath(...).re('Name:(.*)')
.re() returns all matches as a list; .re_first() returns only the first one.
extract_first() and re_first() return None if there is no match.
extract_first() also accepts a custom default: .extract_first('custom return')