Scrapy: Scraping a website that returns its data as JSON

Some websites load their data through AJAX requests, or expose APIs that return JSON.

For example, for the following data:

[
        {
            "url": "http://www.techbrood.com/news/1",
            "author": "iefreer",
            "title": "techbrood Co. test 1"
        },
        {
            "url": "http://www.techbrood.com/news/2",
            "author": "ryan.chen",
            "title": "techbrood Co. test 2"
        }
]
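As a quick check outside Scrapy, here is a minimal sketch of how such a payload decodes with the standard json module. It assumes the payload is a JSON array of objects, which is what a loop over the decoded result expects; the names SAMPLE and urls are only illustrative.

```python
import json

# Sample payload modeled on the data above: a JSON array of objects.
SAMPLE = """
[
    {"url": "http://www.techbrood.com/news/1", "author": "iefreer", "title": "techbrood Co. test 1"},
    {"url": "http://www.techbrood.com/news/2", "author": "ryan.chen", "title": "techbrood Co. test 2"}
]
"""

sites = json.loads(SAMPLE)               # decodes to a Python list of dicts
urls = [site["url"] for site in sites]   # each entry is addressed by key
print(type(sites).__name__, len(sites))
print(urls)
```

Each element of the decoded list is an ordinary dict, so fields such as url, author and title are read by key.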

In Scrapy, you only need to change the spider's parse function:

import json

def parse(self, response):
    # Decode the body to a Unicode string before parsing it as JSON.
    sites = json.loads(response.body_as_unicode())
    for site in sites:
        print(site['url'])

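body_as_unicode() returns the response body decoded to a Unicode string, so json.loads receives text and non-ASCII data is handled correctly. Newer Scrapy versions expose the same decoded text as response.text.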
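Printing is fine for a quick test, but a Scrapy callback normally yields its results so they can be exported. The sketch below is one possible way to do that, not the original author's code: iter_items is a hypothetical helper, and the body string stands in for the decoded response text so the example runs without Scrapy.

```python
import json

def iter_items(text):
    """Yield one dict per entry of a JSON array given as text (hypothetical helper)."""
    for site in json.loads(text):
        # .get() tolerates entries that lack a field. This is an assumption;
        # the sample payload always carries all three keys.
        yield {
            "url": site.get("url"),
            "author": site.get("author"),
            "title": site.get("title"),
        }

# Stand-in for the decoded response text; inside a spider this would be
# the string returned by the response object.
body = '[{"url": "http://www.techbrood.com/news/1", "author": "iefreer", "title": "techbrood Co. test 1"}]'
items = list(iter_items(body))
print(items[0]["url"])
```

Inside a spider, parse could then be reduced to a single line such as yield from iter_items(response.text), assuming a Scrapy version that provides response.text.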

Origin blog.csdn.net/Candyys/article/details/109806026