10 episodes of selected Python crawlers (data extraction: jsonpath module)

Data extraction-jsonpath module

  • Knowledge points

    • Understand the usage scenarios of the jsonpath module
    • Master the use of the jsonpath module

1. Usage scenarios of the jsonpath module

When a dictionary is complex and nested several levels deep, extracting values in batches by key and index becomes tedious. The jsonpath module addresses this pain point, so that is what we will learn next.

jsonpath extracts data in batches from Python dictionaries according to their keys.
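To make the pain point concrete, here is a minimal sketch in plain Python, with no third-party module. The `sample` data and the `find_all` helper are hypothetical, written only for illustration; they are not how the jsonpath package is implemented. The helper shows roughly the kind of recursive search that a single expression such as `$..price` saves you from writing by hand.

```python
def find_all(obj, key):
    """Collect every value stored under `key`, at any nesting depth."""
    found = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                found.append(v)
            # Keep searching inside the value, whatever the key was.
            found.extend(find_all(v, key))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_all(item, key))
    return found


# Hypothetical sample data, invented for this sketch.
sample = {
    "store": {
        "book": [{"title": "A", "price": 8.95}, {"title": "B", "price": 12.99}],
        "bicycle": {"price": 19.95},
    }
}

# Direct indexing works, but every path has to be spelled out by hand.
manual = [book["price"] for book in sample["store"]["book"]]
manual.append(sample["store"]["bicycle"]["price"])

print(manual)                     # [8.95, 12.99, 19.95]
print(find_all(sample, "price"))  # [8.95, 12.99, 19.95]
```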

  • Knowledge points: understand the usage scenarios of the jsonpath module

2. How to use the jsonpath module

2.1 Installation of jsonpath module

jsonpath is a third-party module and must be installed separately:

pip install jsonpath

2.2 Extracting data with the jsonpath module

from jsonpath import jsonpath
# a is the Python dict or list to search; the second argument is a jsonpath expression
ret = jsonpath(a, 'jsonpath syntax rule string')
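As the example in 2.4 notes, the call returns a list of matches, or False when nothing is found. Looping directly over False raises a TypeError, so it can help to normalize the result first. The sketch below uses a hypothetical helper, `matches_or_empty`, and hard-coded stand-ins for real return values, so it runs without the module installed.

```python
def matches_or_empty(result):
    """Turn the False 'no match' value into an empty list."""
    return [] if result is False else result


# Stand-ins for real jsonpath() return values (assumed for illustration).
hit = ["Nigel Rees", "Evelyn Waugh"]
miss = False

for author in matches_or_empty(hit):
    print(author)

# An empty list can be iterated or measured safely.
print(len(matches_or_empty(miss)))  # 0
```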

2.3 jsonpath syntax rules

[Figure: table of jsonpath syntax rules]
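As a rough orientation, a few commonly used JSONPath operators are `$` (the root), `.` (a child key), `[n]` (a list index) and `*` (a wildcard). The sketch below pairs some path strings with the plain-Python lookups they correspond to. The `data` dictionary is hypothetical, and the code deliberately does not call the module. One difference to keep in mind: jsonpath returns its matches in a list, so a path that matches a single value would come back as a one-element list rather than the bare value.

```python
# Hypothetical sample data, invented for this sketch.
data = {
    "store": {
        "book": [
            {"title": "A", "price": 8.95},
            {"title": "B", "price": 12.99},
        ],
        "bicycle": {"color": "red", "price": 19.95},
    }
}

# $.store.bicycle.color  -> child access, starting from the root
color = data["store"]["bicycle"]["color"]

# $.store.book[0].title  -> index into a list
first_title = data["store"]["book"][0]["title"]

# $.store.book[*].title  -> wildcard over every element of the list
all_titles = [book["title"] for book in data["store"]["book"]]

print(color)        # red
print(first_title)  # A
print(all_titles)   # ['A', 'B']
```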

2.4 jsonpath usage example

book_dict = { 
  "store": {
    "book": [ 
      { "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      { "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      { "category": "fiction",
        "author": "Herman Melville",
        "title": "Moby Dick",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      { "category": "fiction",
        "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  }
}

from jsonpath import jsonpath

print(jsonpath(book_dict, '$..author'))  # returns a list; returns False if nothing matches

3. jsonpath exercise

Let’s take the Lagou city JSON file http://www.lagou.com/lbs/getAllCitySearchLabels.json as an example: obtain a list of the names of all cities and write it to a file.

Reference Code:

import json

import jsonpath
import requests

# Fetch the Lagou city JSON string
url = 'http://www.lagou.com/lbs/getAllCitySearchLabels.json'
headers = {
    "User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"
}
response = requests.get(url, headers=headers)
html_str = response.content.decode()

# Convert the JSON string into a Python object
jsonobj = json.loads(html_str)

# Starting from the root node, collect every value whose key is "name"
# (the result is False if nothing matches)
citylist = jsonpath.jsonpath(jsonobj, '$..name')

# Write the result to a file; UTF-8 keeps the Chinese city names intact
with open('city_name.txt', 'w', encoding='utf-8') as f:
    content = json.dumps(citylist, ensure_ascii=False)
    f.write(content)
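The reference code passes ensure_ascii=False to json.dumps. A minimal sketch of why this matters for city names, using only the standard library: the `cities` list and the temporary file path are assumptions for illustration, not data returned by the Lagou endpoint.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the extracted city list.
cities = ["北京", "上海"]

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(cities)
# With ensure_ascii=False the characters are kept as-is.
readable = json.dumps(cities, ensure_ascii=False)

print(escaped)  # ["\u5317\u4eac", "\u4e0a\u6d77"]

# Write with an explicit UTF-8 encoding, then read the file back.
path = os.path.join(tempfile.mkdtemp(), "city_name.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(readable)

with open(path, encoding="utf-8") as f:
    restored = json.loads(f.read())

print(restored == cities)  # True
```

Both forms load back to the same list; the difference is only whether the file is readable by a person opening it in a text editor.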

Origin blog.csdn.net/weixin_38640052/article/details/108302478