How to display scraped items across multiple lines in output.log?

Chk :

When I run Scrapy with the command scrapy crawl my-spider --logfile=output.log, the items and their log entries are written without any problems. But each item is printed on a single line, which is quite displeasing to my eyes.

What I get:

...
2020-02-26 16:23:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://some-url.com>
{'key_1': 'value_1', 'key_2': 'value_2', 'key_3': 'value_3'}
...

What I want:

...
2020-02-26 16:23:32 [scrapy.core.scraper] DEBUG: Scraped from <200 https://some-url.com>
{'key_1': 'value_1',
 'key_2': 'value_2',
 'key_3': 'value_3'}
...

How can I do that?

Smuuf :

You can configure Scrapy to use your own log formatter, which will extend the standard formatter and format the payload before dumping it into the log.

As an example, let's take the quotesbot project mentioned in the Scrapy docs.

The Scrapy docs describe a setting we can use:

LOG_FORMATTER

Default: scrapy.logformatter.LogFormatter

The class to use for formatting log messages for different actions.

So, in settings.py, add a line that tells Scrapy which log formatter to use.

BOT_NAME = 'quotesbot'

SPIDER_MODULES = ['quotesbot.spiders']
NEWSPIDER_MODULE = 'quotesbot.spiders'
...
...
...
LOG_FORMATTER = 'quotesbot.my_pretty_formatter.MyPrettyFormatter'  # Add this line.

Then create the new "pretty formatter" at ./quotesbot/my_pretty_formatter.py:

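from pprint import pformat

from scrapy.logformatter import LogFormatter


class MyPrettyFormatter(LogFormatter):
    """Log formatter that pretty-prints scraped items."""

    def scraped(self, item, response, spider):
        # pformat() returns a string, wrapped across lines when the item is long.
        pretty_item = pformat(item)
        # The parent class builds the usual log entry, now with the prettified item.
        return super().scraped(pretty_item, response, spider)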

And that's it.
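Why passing a string instead of the item works can be sketched without Scrapy installed. The snippet below is only a simplified stand-in: SCRAPED_MSG and render() are hypothetical names made up for illustration, on the assumption that the log message is built with %-style interpolation of the item. It is not Scrapy's actual source.

```python
from pprint import pformat

# Simplified stand-in for the "scraped" log message (an assumption for
# illustration, not copied from Scrapy).
SCRAPED_MSG = "Scraped from %(src)s\n%(item)s"


def render(item, src="<200 https://some-url.com>"):
    """Interpolate the item into the message, as %-style logging would."""
    return SCRAPED_MSG % {"src": src, "item": item}


# Hypothetical item, shaped like the ones the quotesbot spider yields.
item = {
    "author": "George R.R. Martin",
    "tags": ["books", "mind"],
    "text": "... a mind needs books as a sword needs a whetstone, "
            "if it is to keep its edge.",
}

print(render(item))           # the dict is converted with str(): one long line
print(render(pformat(item)))  # the pre-formatted string is inserted as-is
```

A dict goes through str() during interpolation and ends up on one line, while a string produced by pformat() keeps the line breaks it already contains.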

Output of the standard log formatter:

2020-02-26 19:57:06 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/10/>
{'text': '“... a mind needs books as a sword needs a whetstone, if it is to keep its edge.”', 'author': 'George R.R. Martin', 'tags': ['books', 'mind']}

Output of our new pretty formatter (note that pformat also sorts the keys):

2020-02-26 19:55:43 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/10/>
{'author': 'George R.R. Martin',
 'tags': ['books', 'mind'],
 'text': '“... a mind needs books as a sword needs a whetstone, if it is to '
         'keep its edge.”'}
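One caveat: pformat() only wraps when the item does not fit in its width, which defaults to 80 characters. A short item like the one in the question would therefore still come out on one line. A possible tweak, sketched here with the standard library only, is to pass a smaller width. The value width=1 is just an assumption about the desired layout, not something the original answer prescribes.

```python
from pprint import pformat

# The short item from the question.
item = {'key_1': 'value_1', 'key_2': 'value_2', 'key_3': 'value_3'}

default = pformat(item)          # fits within 80 columns, so it stays on one line
narrow = pformat(item, width=1)  # nothing fits, so every key gets its own line

print(default)
print(narrow)
```

In MyPrettyFormatter this would mean calling pformat(item, width=1) instead of pformat(item). The trade-off is that with a very small width, long string values containing spaces are also split into many short pieces, so a moderate width such as 40 may read better for text-heavy items.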
