Scrapy crawler, day 2: simple operations

Configure settings.py

Disable Robots Exclusion Protocol (robots.txt) compliance

ROBOTSTXT_OBEY = False
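To make it clearer what this setting switches off, here is a rough sketch of the kind of check a robots.txt-aware crawler performs. It uses Python's standard urllib.robotparser rather than Scrapy's own implementation, and the robots.txt content and example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A crawler that obeys robots.txt would skip URLs for which this returns False.
allowed = parser.can_fetch("*", "https://example.com/book/1.html")
blocked = parser.can_fetch("*", "https://example.com/private/1.html")
print(allowed, blocked)
```

With ROBOTSTXT_OBEY = False, Scrapy does not apply this kind of filtering, so it is worth checking the target site's terms before turning it off.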

 

Set User-Agent

DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3294.99 Safari/537.36',
}
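Scrapy also provides a dedicated USER_AGENT setting. As an alternative sketch, the same browser string could be set there instead of inside the headers dictionary; which of the two fits better depends on the project.

```python
# settings.py (alternative sketch): set the User-Agent through the
# dedicated USER_AGENT setting instead of DEFAULT_REQUEST_HEADERS.
USER_AGENT = (
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/67.0.3294.99 Safari/537.36'
)
```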

 

Add start.py

To run the crawler directly from the IDE, create a start.py launcher file in the same directory as settings.py

from scrapy import cmdline
cmdline.execute("scrapy crawl wx_spider".split())
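The .split() call is there because cmdline.execute expects an argument list, like sys.argv, rather than a single string. The sketch below only shows the resulting lists. The -o option and the output file name in the second command are added here for illustration and are not part of the original start.py.

```python
import shlex

# The command string from start.py becomes an argv-style list.
argv = "scrapy crawl wx_spider".split()
print(argv)

# str.split() breaks on every space, so a quoted argument containing
# spaces needs shlex.split() instead. The file name is hypothetical.
argv_with_output = shlex.split('scrapy crawl wx_spider -o "my output.json"')
print(argv_with_output)
```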

 

Directory tree

E:.
│  scrapy.cfg
└─BookSpider
    │  items.py
    │  middlewares.py
    │  pipelines.py
    │  settings.py
    │  start.py
    │  __init__.py
    ├─spiders
    │  │  biqubao_spider.py
    │  │  __init__.py
    │  │
    │  └─__pycache__
    │          biqubao_spider.cpython-36.pyc
    │          __init__.cpython-36.pyc
    └─__pycache__
            settings.cpython-36.pyc
            __init__.cpython-36.pyc

 

Add the following code to the spider to print out the page content

# biqubao_spider.py
def parse(self, response):
    print("*" * 50)
    print(response.text)
    print("*" * 50)

 


Origin www.cnblogs.com/luocodes/p/11794113.html