Project in Practice: Analyzing a KeyError When Crawling Key Fields from a Bidding Website

Project scenario:

The task is to crawl key fields from a bidding website.

Problem Description:

Running the spider fails with KeyError: 'form_data':

2020-11-22 15:59:26 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://ss.ebnew.com/tradingSearch/index.htm> (referer: None)
2020-11-22 15:59:28 [scrapy.core.scraper] ERROR: Error downloading <POST https://ss.ebnew.com/tradingSearch/index.htm>
Traceback (most recent call last):
  File "D:\python3.8.6\lib\site-packages\twisted\internet\defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "D:\python3.8.6\lib\site-packages\twisted\python\failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\middleware.py", line 45, in process_request
    return (yield download_func(request=request, spider=spider))
  File "D:\python3.8.6\lib\site-packages\scrapy\utils\defer.py", line 55, in mustbe_deferred
    result = f(*args, **kw)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 75, in download_request
    return handler.download_request(request, spider)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 88, in download_request
    return agent.download_request(request)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 342, in download_request
    agent = self._get_agent(request, timeout)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 301, in _get_agent
    _, _, proxyHost, proxyPort, proxyParams = _parse(proxy)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\webclient.py", line 36, in _parse
    return _parsed_url_args(parsed)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\downloader\webclient.py", line 19, in _parsed_url_args
    host = to_bytes(parsed.hostname, encoding="ascii")
  File "D:\python3.8.6\lib\site-packages\scrapy\utils\python.py", line 106, in to_bytes
    raise TypeError('to_bytes must receive a str or bytes '
TypeError: to_bytes must receive a str or bytes object, got NoneType
2020-11-22 15:59:30 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://ss.ebnew.com/tradingSearch/index.htm> (referer: https://ss.ebnew.com/tradingSearch/index.htm)
2020-11-22 15:59:31 [scrapy.core.scraper] ERROR: Spider error processing <POST https://ss.ebnew.com/tradingSearch/index.htm> (referer: https://ss.ebnew.com/tradingSearch/index.htm)
Traceback (most recent call last):
  File "D:\python3.8.6\lib\site-packages\scrapy\utils\defer.py", line 120, in iter_errback
    yield next(it)
  File "D:\python3.8.6\lib\site-packages\scrapy\utils\python.py", line 353, in __next__
    return next(self.data)
  File "D:\python3.8.6\lib\site-packages\scrapy\utils\python.py", line 353, in __next__
    return next(self.data)
  File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "D:\python3.8.6\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "D:\python3.8.6\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 340, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "D:\python3.8.6\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "D:\python3.8.6\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "D:\python3.8.6\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "D:\爬虫\pythonProject\CSDN热门爬虫\myspider\myspider\spiders\bilian.py", line 96, in parse_page1
    form_data=response.meta['form_data']
KeyError: 'form_data'
2020-11-22 15:59:31 [scrapy.core.engine] INFO: Closing spider (finished)

Cause Analysis:

From the traceback, the failure can be located at line 96 of bilian.py:

File "D:\爬虫\pythonProject\CSDN热门爬虫\myspider\myspider\spiders\bilian.py", line 96, in parse_page1
    form_data=response.meta['form_data']
KeyError: 'form_data'

The corresponding spider code is:

    def parse_page1(self, response):
        form_data = response.meta['form_data']
        keyword = form_data.get('key')

Consulting more experienced colleagues clarified the cause: response.meta only carries what was attached to the same Request object that produced the response. The request that triggers parse_page1 was yielded without 'form_data' ever being put into its meta, so the lookup fails. The fix is to set the key on that request before yielding it; see the solution below.
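
To see the mechanism, note that meta is per-request: a callback's response.meta contains only what was attached to the exact Request that produced that response. A minimal sketch of the failing pattern, with an illustrative spider name and search keyword (both hypothetical, not from the original project):

import scrapy

class BrokenSpider(scrapy.Spider):
    # Minimal reproduction of the bug; name and keyword are hypothetical
    name = 'broken_demo'
    start_urls = ['https://ss.ebnew.com/tradingSearch/index.htm']

    def parse(self, response):
        form_data = {'key': 'excavator'}  # hypothetical search keyword
        # form_data is sent as the POST body, but never stored in request.meta
        yield scrapy.FormRequest(response.url, formdata=form_data,
                                 callback=self.parse_page1)

    def parse_page1(self, response):
        # response.meta has no 'form_data' key, so this raises the KeyError
        form_data = response.meta['form_data']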

Solution:

        # Attach the form data to the exact request that is about to be yielded
        request.meta['form_data'] = form_data
        yield request

    def parse_page1(self, response):
        form_data = response.meta['form_data']  # now present in the callback
        keyword = form_data.get('key')
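
For reference, the same data can also be supplied when the request is constructed, via the meta argument of FormRequest, and the callback can read it defensively with .get(). A sketch of the corrected flow, reusing the same hypothetical names as above:

import scrapy

class BidSpider(scrapy.Spider):
    # Corrected flow; the spider name and keyword are illustrative
    name = 'bid_demo'

    def start_requests(self):
        form_data = {'key': 'excavator'}  # hypothetical search keyword
        yield scrapy.FormRequest(
            'https://ss.ebnew.com/tradingSearch/index.htm',
            formdata=form_data,
            meta={'form_data': form_data},  # same effect as request.meta[...] = form_data
            callback=self.parse_page1,
        )

    def parse_page1(self, response):
        # .get() returns a default instead of raising KeyError if the key
        # was ever omitted from the request's meta
        form_data = response.meta.get('form_data', {})
        keyword = form_data.get('key')
        self.logger.info('searched keyword: %s', keyword)

Scrapy 1.7 and later also offer cb_kwargs, which injects the value directly into the callback's signature (def parse_page1(self, response, form_data)) without going through response.meta at all.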

If you need the source code, visit: consulting company bidding information collection platform

Origin blog.csdn.net/weixin_42961082/article/details/109953836