Project in Practice: Analyzing a Crawl Error on Key Fields of a Bidding Website

Project scenario:

Key fields need to be crawled from a bidding (tendering) website.

Problem description:

Running the spider raises KeyError: 'form_data'. The log output:

2020-11-22 15:59:26 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://ss.ebnew.com/tradingSearch/index.htm> (referer: None)
2020-11-22 15:59:28 [scrapy.core.scraper] ERROR: Error downloading <POST https://ss.ebnew.com/tradingSearch/index.htm>
Traceback (most recent call last):
  File "D:\python3.8.6\lib\site-packages\twisted\internet\defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "D:\python3.8.6\lib\site-packages\twisted\python\failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\middleware.py", line 45, in process_request
    return (yield download_func(request=request, spider=spider))
  File "D:\python3.8.6\lib\site-packages\scrapy\utils\defer.py", line 55, in mustbe_deferred
    result = f(*args, **kw)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 75, in download_request
    return handler.download_request(request, spider)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 88, in download_request
    return agent.download_request(request)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 342, in download_request
    agent = self._get_agent(request, timeout)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 301, in _get_agent
    _, _, proxyHost, proxyPort, proxyParams = _parse(proxy)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\webclient.py", line 36, in _parse
    return _parsed_url_args(parsed)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\webclient.py", line 19, in _parsed_url_args
    host = to_bytes(parsed.hostname, encoding="ascii")
  File "D:\python3.8.6\lib\site-packages\scrapy\utils\python.py", line 106, in to_bytes
    raise TypeError('to_bytes must receive a str or bytes '
TypeError: to_bytes must receive a str or bytes object, got NoneType
2020-11-22 15:59:30 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://ss.ebnew.com/tradingSearch/index.htm> (referer: https://ss.ebnew.com/tradingSearch/index.htm)
2020-11-22 15:59:31 [scrapy.core.scraper] ERROR: Spider error processing <POST https://ss.ebnew.com/tradingSearch/index.htm> (referer: https://ss.ebnew.com/tradingSearch/index.htm)
Traceback (most recent call last):
  File "D:\python3.8.6\lib\site-packages\scrapy\utils\defer.py", line 120, in iter_errback
    yield next(it)
  File "D:\python3.8.6\lib\site-packages\scrapy\utils\python.py", line 353, in __next__
    return next(self.data)
  File "D:\python3.8.6\lib\site-packages\scrapy\utils\python.py", line 353, in __next__
    return next(self.data)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "D:\python3.8.6\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "D:\python3.8.6\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 340, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "D:\python3.8.6\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "D:\python3.8.6\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "D:\爬虫\pythonProject\CSDN热门爬虫\myspider\myspider\spiders\bilian.py", line 96, in parse_page1
    form_data=response.meta['form_data']
KeyError: 'form_data'
2020-11-22 15:59:31 [scrapy.core.engine] INFO: Closing spider (finished)

Cause analysis:

(Note in passing: the first traceback in the log, "TypeError: to_bytes must receive a str or bytes object, got NoneType", is a separate problem; it is raised while the downloader handler parses the proxy URL, which suggests a malformed or empty proxy setting. The error analyzed here is the KeyError.)

Going by the error message, the failing code is:

File "D:\爬虫\pythonProject\CSDN热门爬虫\myspider\myspider\spiders\bilian.py", line 96, in parse_page1
    form_data=response.meta['form_data']
KeyError: 'form_data'
    def parse_page1(self, response):
        form_data = response.meta['form_data']
        keyword = form_data.get('key')
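
To make the failure mode concrete, here is a minimal sketch of how a spider can end up in this state. It is a hypothetical reconstruction, not the original bilian.py: only the URL is taken from the log, while the spider name, form fields, and search keyword are assumptions. The key point is that response.meta is just a shortcut for response.request.meta, so the lookup can only succeed if the key was placed on the request before it was yielded.

import scrapy

class BilianSpider(scrapy.Spider):
    # Hypothetical reconstruction of the failing flow, not the original
    # bilian.py: only the URL comes from the log; the form fields and
    # keyword are invented for illustration.
    name = 'bilian'

    def start_requests(self):
        form_data = {'key': 'excavator', 'currentPage': '1'}  # assumed fields
        # BUG: form_data is sent as the POST body, but it is never stored
        # on the request's meta, so the callback cannot read it back.
        yield scrapy.FormRequest(
            url='https://ss.ebnew.com/tradingSearch/index.htm',
            formdata=form_data,
            callback=self.parse_page1,
        )

    def parse_page1(self, response):
        # response.meta is a shortcut for response.request.meta; the key
        # was never set on that request, hence KeyError: 'form_data'.
        form_data = response.meta['form_data']
        keyword = form_data.get('key')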

After consulting a senior colleague on the business side, the cause became clear: the request object that is yielded and the request object whose meta carries the form data must be one and the same. In other words, form_data has to be attached to the request's meta before the request is yielded, so that the callback can read it back from response.meta. See the solution below.

Solution:

        # Attach the form data to the request before yielding it
        request.meta['form_data'] = form_data
        yield request

    def parse_page1(self, response):
        form_data = response.meta['form_data']
        keyword = form_data.get('key')
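
Putting the fix into a complete, runnable sketch (same assumptions as the snippet above: only the URL is taken from the log, the form fields and keyword are illustrative):

import scrapy

class BilianSpider(scrapy.Spider):
    # Corrected sketch of the flow; form fields are still illustrative.
    name = 'bilian'

    def start_requests(self):
        form_data = {'key': 'excavator', 'currentPage': '1'}  # assumed fields
        request = scrapy.FormRequest(
            url='https://ss.ebnew.com/tradingSearch/index.htm',
            formdata=form_data,
            callback=self.parse_page1,
        )
        # Attach the form data to the very request object that is yielded,
        # so the callback can recover it via response.meta.
        request.meta['form_data'] = form_data
        yield request

    def parse_page1(self, response):
        form_data = response.meta['form_data']
        keyword = form_data.get('key')
        self.logger.info('search keyword: %s', keyword)

Equivalently, the meta dict can be passed when the request is constructed, e.g. scrapy.FormRequest(..., meta={'form_data': form_data}). Using response.meta.get('form_data') instead of the bracket lookup would also degrade gracefully (returning None) rather than crashing if the key is ever missing again.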

Readers who need the source code can find it here:
Consulting-firm bidding information collection platform

Reposted from: blog.csdn.net/weixin_42961082/article/details/109953836