Easy Scrapy Installation on Windows


Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python, used to crawl websites and extract structured data from their pages. It has a wide range of uses, including data mining, monitoring, and automated testing.
Scrapy's appeal lies in being a framework: anyone can adapt it to their needs. It also provides base classes for several spider types, such as BaseSpider and sitemap spiders, and recent versions add support for crawling web 2.0 sites.

A note before we start: this article uses Python 3.7; readers may choose a different version. The operating system is 64-bit Windows.

Step 1: Check the environment requirements

Install Python yourself, and add the Python directory, as well as the Scripts directory inside it, to the system PATH environment variable.

Note: if you checked the option to add Python to the PATH during installation, you can skip this step.
Prepare the two required files:

- **Twisted-18.7.0-cp37-cp37m-win_amd64.whl**
- **lxml-4.2.3-cp37-cp37m-win_amd64.whl**

Download link --> https://pan.baidu.com/s/1TC2q_oC5h6Z4ymRpmpSxsA (includes builds for Python 3.5 and 3.7)
You can also download them yourself from the unofficial Windows Python extension packages site --> https://www.lfd.uci.edu/~gohlke/pythonlibs/
Note: Scrapy is built on the Twisted framework and uses lxml to parse HTML. These two components often fail to build during a normal pip install on Windows, so we install them separately from prebuilt wheels.
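Before installing, it is worth confirming that the wheel filenames match your interpreter: the cp37 and win_amd64 parts of the name must agree with your Python version and architecture, or pip will reject the wheel as "not supported on this platform". A quick check, using only the standard library:

```python
import platform
import sys

# Build the CPython tag (e.g. "cp37") that must appear in the wheel filename.
tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
print(tag)

# 64-bit builds need wheels tagged "win_amd64"; 32-bit builds need "win32".
print(platform.architecture()[0])
```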

Step 2: Install Scrapy

Change into the directory containing Twisted-18.7.0-cp37-cp37m-win_amd64.whl and lxml-4.2.3-cp37-cp37m-win_amd64.whl, and install them with pip:

C:\Users\WU\Downloads\scrapyFile>pip install lxml-4.2.3-cp37-cp37m-win_amd64.whl

C:\Users\WU\Downloads\scrapyFile>pip install Twisted-18.7.0-cp37-cp37m-win_amd64.whl

Note: in this example the two files are stored in C:\Users\WU\Downloads\scrapyFile.
Once lxml and Twisted have installed successfully, run the following commands to install Scrapy:

pip install pywin32
pip install scrapy

Note: pywin32 is installed because Python will later need to access the Windows system API.
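A quick way to confirm that all three packages landed correctly is to check whether Python can locate them, without fully importing anything. The sketch below uses only the standard library; `win32api` is the module that the pywin32 package provides:

```python
import importlib.util

def missing(modules):
    """Return the subset of module names that Python cannot find."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

# After a successful install, this should print an empty list.
print(missing(["lxml", "twisted", "win32api"]))
```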

Step 3: Verify that Scrapy installed successfully

Run `scrapy` in cmd; you should see output like the following:

C:\Users\WU\Downloads\scrapyFile>scrapy
Scrapy 1.5.1 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command

Step 4: Create a Scrapy project

Change into your working directory and run the following command to generate a new project:

D:\pythonplace\scrapy>scrapy startproject helloworld
New Scrapy project 'helloworld', using template directory 'd:\\software\\python3.7\\lib\\site-packages\\scrapy\\templates\\project', created in:
    D:\pythonplace\scrapy\helloworld

You can start your first spider with:
    cd helloworld
    scrapy genspider example example.com
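The startproject command generates the standard project skeleton; for Scrapy 1.5 it looks roughly like this:

```
helloworld/
    scrapy.cfg            # deploy configuration file
    helloworld/           # the project's Python module
        __init__.py
        items.py          # item definitions
        middlewares.py    # spider/downloader middlewares
        pipelines.py      # item pipelines
        settings.py       # project settings
        spiders/          # directory where your spiders live
            __init__.py
```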

Appendix:
1. If you get the following error when starting a spider with `scrapy crawl xxx`:

Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 150, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 90, in _run_print_help
    func(*a, **kw)
  File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 157, in _run_command
    cmd.run(args, opts)
  File "/usr/local/lib/python3.7/site-packages/scrapy/commands/crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 170, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 198, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 203, in _create_crawler
    return Crawler(spidercls, self.settings)
  File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 55, in __init__
    self.extensions = ExtensionManager.from_crawler(self)
  File "/usr/local/lib/python3.7/site-packages/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python3.7/site-packages/scrapy/middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "/usr/local/lib/python3.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.7/site-packages/scrapy/extensions/telnet.py", line 12, in <module>
    from twisted.conch import manhole, telnet
  File "/usr/local/lib/python3.7/site-packages/twisted/conch/manhole.py", line 154
    def write(self, data, async=False):
                              ^
SyntaxError: invalid syntax

Open the file Lib/site-packages/twisted/conch/manhole.py under your Python directory and rename `async` on lines 154, 155, 240, 241, and 247,
as follows:

154    def write(self, data, async1=False):
155        self.handler.addOutput(data, async1)

       ........

240    def addOutput(self, data, async1=False):
241        if async1:
       ........

247        if async1:
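The root cause is that `async` (and `await`) became reserved keywords in Python 3.7, so older Twisted releases that use `async` as a parameter name no longer even parse. A quick way to see this:

```python
import keyword

# "async" is a hard keyword from Python 3.7 onward, so a parameter named
# "async" (as in old Twisted's manhole.py) is a SyntaxError at import time.
print(keyword.iskeyword("async"))  # True on Python 3.7+
```

Upgrading to a Twisted release that supports Python 3.7 avoids the manual edit entirely.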


Reposted from blog.csdn.net/belonghuang157405/article/details/81207212