You may not believe how many ways Python has to manipulate JSON~


Author: Peter
Source: Python programming time

 

In everyday work, especially when transferring data over the web, we often run into json data. It is less straightforward than plain text or numeric data, and it looks so much like a Python dictionary that many people confuse the two.

This article uses concrete examples to show in detail how to process json data with Python and pandas (a third-party Python library). The main topics are:

  • Introduction to json data

  • Commonly used json data conversion websites

  • Conversion between json data and Python data

  • Processing json data with pandas


 

1. A brief introduction to JSON

1.1 What is json data

First, let's look at an explanation of json from Wikipedia:

JSON (JavaScript Object Notation) is a lightweight data-interchange format conceived and designed by Douglas Crockford. It is an easy-to-read, text-based format used to transmit data objects composed of attribute-value pairs and ordered lists of values.

The JSON data format is language-independent. Even though it is derived from JavaScript, many programming languages now support generating and parsing JSON data. The file extension is .json.

From the introduction above, we can summarize three points:

  • JSON is a text-based, ultra-lightweight data exchange format

  • JSON data is easy to read

  • It is derived from JavaScript, but other languages can parse JSON data as well

1.2 json data type

JSON's syntax is derived from JavaScript. It has only 6 data types, which can be nested in any combination:

  • number: consistent with the number in JavaScript

  • boolean: true or false in JavaScript

  • string: string in JavaScript

  • null: null in JavaScript

  • array: written as [] in JavaScript

  • object: written as {…} in JavaScript

1.3 Two rules

1. JSON text is expected to be encoded in UTF-8

2. So that JSON can be parsed uniformly, strings (including object keys) must be enclosed in double quotes ""
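
The double-quote rule is easy to check with Python's built-in json module. The following is a small sketch; the sample strings are made up for illustration:

```python
import json

valid_text = '{"name": "Ming", "age": 18}'    # double-quoted keys and strings
invalid_text = "{'name': 'Ming', 'age': 18}"  # single quotes: looks like a Python dict, but is not JSON

parsed = json.loads(valid_text)

try:
    json.loads(invalid_text)
    single_quotes_accepted = True
except json.JSONDecodeError:
    # The parser rejects the single-quoted text
    single_quotes_accepted = False

print(parsed)
print(single_quotes_accepted)
```

This is why a string that merely looks like a Python dictionary cannot be passed to json.loads as-is.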

2. Commonly used json data conversion websites

1. json.cn: https://www.json.cn/

2. Runoob json tool: https://c.runoob.com/front-end/53

3. Sojson: https://www.sojson.com/, a very complete json processing website

4. kjson: https://www.kjson.com/

5. Programming Lion json check tool: https://www.w3cschool.cn/tools/index?name=jsoncheck

6. JSONViewer: http://jsonviewer.stack.hu/, an online tool for checking whether json is correctly formatted

3. JSON and Dict type conversion

This section explains the conversion between json data and Python data types.

Converting between json and Python objects such as dictionaries mainly uses the built-in json package, whose use is described in detail below. For full reference material, see the official documentation: https://docs.python.org/3/library/json.html

Import the package directly when you first use it:

import json

The json package has 4 functions for converting to and from Python's built-in data types:

Function       Effect
json.dumps()   Encode a Python object into a json string: dictionary to json
json.loads()   Decode a json string into a Python object: json to dictionary
json.dump()    Convert a Python object into json and write it to a file
json.load()    Read json from a file and convert it into a Python object

Note: json.dump and json.load differ from json.dumps and json.loads only in the extra step of writing to or reading from a file.
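
To make the difference between the two pairs concrete, here is a minimal sketch. The sample dictionary is invented, and an in-memory io.StringIO object stands in for a real file:

```python
import io
import json

person = {"name": "Ming", "age": 18}

# String-based pair: dumps / loads
text = json.dumps(person)
restored_from_text = json.loads(text)

# File-based pair: dump / load (StringIO behaves like an opened text file)
buffer = io.StringIO()
json.dump(person, buffer)
buffer.seek(0)  # rewind before reading back
restored_from_file = json.load(buffer)

print(type(text), restored_from_text, restored_from_file)
```

Both routes produce the same json text and restore the same dictionary; only the destination differs.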

json.dumps

Both dump functions convert Python data types into json types. The conversion table is as follows:

Python        JSON
dict          object
list, tuple   array
str           string
int, float    number
True          true
False         false
None          null

The json.dumps function converts Python data, such as a dictionary, into a json-formatted string. Its parameters are as follows:

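json.dumps(obj,   # the object to convert
           skipkeys=False,  # default False: a dict key that is not a basic type (str, int, float, bool, None) raises TypeError; set to True to skip such keys instead
           ensure_ascii=True,  # default True: non-ASCII characters are escaped; set to False to output Chinese characters as-is
           check_circular=True,  # if False, skip the circular-reference check for container types
           allow_nan=True,  # if False, out-of-range float values (nan, inf, -inf) raise ValueError in strict compliance with the JSON specification, instead of using the JavaScript equivalents (NaN, Infinity, -Infinity)
           cls=None,  # custom JSONEncoder subclass, if any
           indent=None,  # pretty-print, using this many spaces per indentation level
           separators=None,  # (item_separator, key_separator) tuple: the separator between dict items and the separator between key and value
           # Python 2 also accepted encoding="utf-8"; Python 3 has no such parameter
           default=None,  # a function that should return a serializable version of obj or raise TypeError; by default TypeError is raised
           sort_keys=False,  # if False, dictionary keys are not sorted; if True, output is sorted by key (a to z)
           **kw)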
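
Some of these parameters are easier to understand in action. The sketch below exercises skipkeys, default and allow_nan on made-up data; using str as the default function is just one possible choice for illustration:

```python
import json
from datetime import date

# skipkeys=True silently drops keys that are not str, int, float, bool or None
mixed_keys = {("x", "y"): 1, "a": 2}
skipped = json.dumps(mixed_keys, skipkeys=True)

# default= is called for values the encoder cannot serialize on its own
event = {"title": "exam", "when": date(2021, 3, 4)}
with_default = json.dumps(event, default=str)

# allow_nan=False rejects nan/inf instead of emitting the non-standard NaN/Infinity
try:
    json.dumps(float("nan"), allow_nan=False)
    nan_rejected = False
except ValueError:
    nan_rejected = True

print(skipped)
print(with_default)
print(nan_rejected)
```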

The following examples illustrate the role of the common parameters above.

1. When the Python data contains Chinese characters

information1 = {
    'name': '小明',
    'age': 18,
    'address': 'shenzhen'
}
# Convert the dictionary to json data
information2 = json.dumps(information1)

print(type(information1))
print(type(information2))
print(information2)

 

Add the ensure_ascii=False parameter to display Chinese:

# Convert the dictionary to json data
information3 = json.dumps(information1,ensure_ascii=False)
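
The effect of ensure_ascii can be compared side by side. This is a small self-contained sketch with its own sample dictionary:

```python
import json

information = {"name": "小明", "age": 18}

escaped = json.dumps(information)                       # default: ensure_ascii=True
readable = json.dumps(information, ensure_ascii=False)  # keep the characters as-is

print(escaped)   # Chinese characters appear as \uXXXX escape sequences
print(readable)  # Chinese characters appear directly

# Both strings decode back to the same dictionary
same_data = json.loads(escaped) == json.loads(readable)
```

The escaped form is still valid json; ensure_ascii only affects how the text looks, not what it means.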

 

⚠️ The results show that all quotes in the json data have become double quotes, whereas the original dictionary used single quotes. Here is another example of the quote change:

>>> import json
>>> print(json.dumps({'4': 5, '6': 7}, sort_keys=True, indent=4))  # the keys are Python strings written with single quotes

# Output
{
    "4": 5,  # now in double quotes
    "6": 7
}

2. Pretty-print json data with indentation, using the indent parameter

information4 = {
    'name': '小明',
    'age': 18,
    'skills': 'python',
    'english': 'CET6',
    'major': '会计',
    'address': '深圳'
}

information5 = json.dumps(information4, ensure_ascii=False)   # no indentation
information6 = json.dumps(information4, ensure_ascii=False, indent=2)  # indent by 2 spaces
information7 = json.dumps(information4, ensure_ascii=False, indent=5)  # indent by 5 spaces


print(information5)
print(information6)
print(information7)

 


3. Sort the keys of the Python data on output

information4 = {
    'name': '小明',
    'age': 18,
    'skills': 'python',
    'english': 'CET6',
    'major': '会计',
    'address': '深圳'
}

information8 = json.dumps(information4, ensure_ascii=False, indent=2)  # keys keep their original order
information9 = json.dumps(information4, ensure_ascii=False, indent=2,sort_keys=True)  # sort the keys

print(information8)
print(information9)

 

With sort_keys=True, the output is sorted by the first letter of each key; when the first letters are the same, the second letters are compared, and so on.
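
A compact way to see indent and sort_keys working together is the following sketch, which uses a small made-up record:

```python
import json

record = {"name": "Ming", "age": 18, "address": "shenzhen"}

one_line = json.dumps(record, sort_keys=True)          # sorted keys, single line
pretty = json.dumps(record, sort_keys=True, indent=2)  # sorted keys, 2-space indentation

print(one_line)
print(pretty)
```

The keys come out as address, age, name in both cases; indent only changes the layout.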

4. Controlling the output separators

Use the separators parameter to set different output separators. By default, the separator between dict items is ", " and the separator between a key and its value is ": ":

information1 = {
    'name': '小明',
    'age': 18,
    'address': 'shenzhen'
}

information2 = json.dumps(information1,ensure_ascii=False)
information10 = json.dumps(information1,ensure_ascii=False,separators=('+','@'))  # change the separators

print(information2)  # default separators
print(information10)  
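
In practice, separators is used most often to produce compact output. The sketch below compares the default, a compact style, and the unusual '+' / '@' pair on a made-up record:

```python
import json

record = {"name": "Ming", "age": 18}

default_style = json.dumps(record)                   # ", " between items, ": " after keys
compact = json.dumps(record, separators=(",", ":"))  # no spaces at all
custom = json.dumps(record, separators=("+", "@"))   # item separator "+", key separator "@"

print(default_style)
print(compact)
print(custom)
```

Note that the last string is no longer valid json, so json.loads cannot read it back; custom separators other than variations on "," and ":" are only useful for display.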

 


json.dump

json.dump works like json.dumps, except that the data is written to a file. Apart from the extra file object, the two take the same parameters.

We try to write the following personal information into the file

information = {
    'name': '小明',
    'age': 18,
    'skills': 'python',
    'english': 'CET6',
    'major': '会计',
    'address': '深圳'
}

1. If the indent parameter is not used, all the information is written on one line

# Use json.dump; json data always uses double quotes

with open("information_1_to_json.json", "w", encoding='utf-8') as f:
    # json.dump(information, f) # writes everything on a single line, no line breaks
    json.dump(information,   # data to write
              f, # file object
              sort_keys=True,  # sort the keys
              ensure_ascii=False)  # keep Chinese characters readable

Take a look at the actual saving effect:


Adding the indent parameter writes the data across multiple lines:

with open("information_2_to_json.json", "w", encoding='utf-8') as f:
    json.dump(information, 
              f, 
              indent=2,  # indent with 2 spaces, written across multiple lines
              sort_keys=True, 
              ensure_ascii=False) 
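
To see exactly what json.dump puts on disk, the file can be read back as plain text. The following sketch writes to a temporary directory so that nothing is left behind; the file name demo.json is hypothetical:

```python
import json
import os
import tempfile

information = {"name": "小明", "age": 18}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        json.dump(information, f, indent=2, sort_keys=True, ensure_ascii=False)
    # Read the raw text back to inspect what was written
    with open(path, encoding="utf-8") as f:
        saved_text = f.read()

print(saved_text)
```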


json.loads

The two load functions convert json data into Python data types. The conversion table is as follows:

JSON            Python
object          dict
array           list
string          str
number (int)    int
number (real)   float
true            True
false           False
null            None

The json.loads function converts a json-formatted string into Python data, such as a dictionary.

information1 = {
    'name': '小明',
    'age': 18,
    'address': 'shenzhen'
}
# Convert the dictionary to json data
information3 = json.dumps(information1,ensure_ascii=False)

information11 = json.loads(information3)  # convert json back to a dictionary
print(information11)
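
The type mapping can be verified directly by inspecting what json.loads returns. This sketch uses an invented json string that covers several json types:

```python
import json

text = '{"name": "Ming", "scores": [90, 85.5], "graduated": false, "nickname": null}'
data = json.loads(text)

# Record the Python type that each json value was converted to
types = {key: type(value).__name__ for key, value in data.items()}
score_types = [type(score).__name__ for score in data["scores"]]

print(types)
print(score_types)
```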

 


json.load

Open the json file and convert it to dictionary data

# Use json.load

with open("information_1_to_json.json",encoding="utf-8") as f:  # the file written in the json.dump section
    json_to_dict = json.load(f)  # convert json to a dictionary

print(json_to_dict)

 

4. Conversion of JSON and non-Dict types

The sections above mainly cover conversion between json data and Python dictionaries. The following shows how json.dumps converts other Python data types into json data:

1. Tuple conversion

2. List conversion

3. Boolean conversion

4. Numerical data conversion
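
The four conversions listed above can be sketched in a few lines. The values are made up; only json.dumps and json.loads from the standard library are used:

```python
import json

from_tuple = json.dumps((1, 2, 3))      # tuple -> json array
from_list = json.dumps([1, "a", None])  # list  -> json array, None -> null
from_bool = json.dumps(True)            # bool  -> true / false
from_int = json.dumps(18)               # int   -> number
from_float = json.dumps(3.14)           # float -> number

# A tuple does not survive a round trip: json arrays come back as lists
round_trip = json.loads(from_tuple)

print(from_tuple, from_list, from_bool, from_int, from_float, round_trip)
```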

5. Use Demjson to parse

Demjson is a third-party Python library that can be used to encode and decode json data:

  • encode: encode a Python object into a JSON string

  • decode: decode the encoded JSON string into a Python object

Install demjson

Install it directly with pip install demjson; once pip reports that the installation succeeded, the package is ready to use.

Use demjson

Import before use:

import demjson   # import the package

1. Encoding function

2. Decoding function

An obvious disadvantage of the demjson package is that it cannot directly parse Chinese data:

If we want to see Chinese data, we can use the eval function:

6. Processing json with pandas

The following describes the processing of json data by the pandas library:

  • read_json: read data from json file

  • to_json: write the data in pandas to the json file

  • json_normalize: normalize json data

https://geek-docs.com/pandas/pandas-read-write/pandas-reading-and-writing-json.html

6.1 read_json

First look at the parameters of read_json as given on the official website:

pandas.read_json(
  path_or_buf=None,  # path to the json file
  orient=None,  # key parameter; one of "split", "records", "index", "columns", "values"
  typ='frame',   # type of object to recover (series or frame), default 'frame'
  dtype=None, # bool or dict
  convert_axes=None, 
  convert_dates=True, 
  keep_default_dates=True, 
  numpy=False, 
  precise_float=False, 
  date_unit=None, 
  encoding=None, 
  lines=False,  # bool, default False; read the file as one json object per line
  chunksize=None,
  compression='infer', 
  nrows=None, 
  storage_options=None)

For detailed parameter analysis, please refer to the article: https://blog.csdn.net/qq_41562377/article/details/90203805

Suppose we now have a copy of json data, as shown in the following figure:

We read in the above data. Since the data is relatively standardized, it can be read directly by filling in the file path:

The orient parameter deserves a closer look:

1. orient='split'

'split' : dict like {index -> [index], columns -> [columns], data -> [values]}

The keys of the json file can only be index, columns and data; one key more or one key fewer will not work. For example:

2. orient='records'

'records' : list like [{column -> value}, … , {column -> value}]

3. orient='index'

'index' : dict like {index -> {column -> value}}

4. orient='columns'

'columns' : dict like {column -> {index -> value}}

Transposing this gives the orient='index' result above.

5. orient='values'

'values' : just the values array
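
To make the five layouts easier to compare, the sketch below builds each of them by hand for the same small table. It deliberately uses only the standard library and does not call pandas; the index labels, column names and values are invented, and each structure simply follows the "dict like" or "list like" description given for that orient value:

```python
import json

index = ["a", "b"]
columns = ["name", "age"]
data = [["Ming", 18], ["Hong", 20]]  # one inner list per row

layouts = {
    # {index -> [index], columns -> [columns], data -> [values]}
    "split": {"index": index, "columns": columns, "data": data},
    # [{column -> value}, ..., {column -> value}]
    "records": [dict(zip(columns, row)) for row in data],
    # {index -> {column -> value}}
    "index": {label: dict(zip(columns, row)) for label, row in zip(index, data)},
    # {column -> {index -> value}}
    "columns": {
        column: {label: row[position] for label, row in zip(index, data)}
        for position, column in enumerate(columns)
    },
    # just the values array
    "values": data,
}

for orient, payload in layouts.items():
    print(orient, json.dumps(payload))
```

A json file whose content matches one of these shapes is what read_json expects when the corresponding orient value is passed.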

 

6.2 to_json

The to_json method saves a DataFrame as a json file:

df.to_json("个人信息.json")   # save directly as a json file

If you save with the code above, the Chinese characters are written as escape sequences instead of being displayed as-is:

Of course, we can read the json file back with json.load to display the Chinese, or we can keep the Chinese readable when saving:

df.to_json("个人信息1.json",force_ascii=False)   # keep Chinese characters readable

6.3 json_normalize

https://www.jianshu.com/p/a84772b994a0

The json data saved and read in the examples above is all in a flat, list-like form. In practice, however, the data in a json file is often nested, so we need to convert the nested dictionary structure into a flat form. This process is called normalization.

The json_normalize() function in pandas converts dictionaries or lists into tables. Import it before use:

from pandas import json_normalize  # pandas 1.0 and later; older versions: from pandas.io.json import json_normalize

We will learn it through both the official examples and a practical example. First, the examples from the official website:

1. A nested dictionary is flattened, with nested keys displayed in attribute (dotted) form:

2. If the max_level parameter is added, different effects will be displayed:

If max_level=0, the nested dictionary will be treated as a whole and displayed in the data frame

If max_level=1, the nested dictionary will be disassembled and the keys inside will be separated out:

3. Read part of the content in the nested level:

4. Read all content
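
The idea behind max_level can be illustrated without pandas. The flatten function below is a hypothetical helper written for this sketch, not pandas' implementation: it joins nested keys with a dot and stops descending once max_level is reached, which mirrors the behavior described in point 2. The sample record is invented:

```python
def flatten(record, max_level=None, sep=".", _level=0, _prefix=""):
    """Flatten nested dicts into one dict with dotted keys (illustrative only)."""
    flat = {}
    for key, value in record.items():
        name = f"{_prefix}{sep}{key}" if _prefix else str(key)
        can_descend = max_level is None or _level < max_level
        if isinstance(value, dict) and can_descend:
            # Go one level deeper, carrying the dotted prefix along
            flat.update(flatten(value, max_level, sep, _level + 1, name))
        else:
            # Leaf value, or nesting deeper than max_level: keep as a whole
            flat[name] = value
    return flat


record = {"id": 1, "name": "Cole", "fitness": {"height": 130, "weight": 60}}

level_0 = flatten(record, max_level=0)  # nested dict kept as a single value
level_1 = flatten(record, max_level=1)  # nested keys split out as fitness.height, fitness.weight

print(level_0)
print(level_1)
```

Each flattened dictionary corresponds to one row of the resulting table, with its keys as the column names.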

7. Summary

json is a data format that we often encounter at work, and a very important one.

This article first gave a brief introduction to the json data format. Then, through a series of practical examples, it showed how to convert between json and various Python data types, dictionaries in particular. Finally, it explained how to read, write and normalize json data with pandas.

I hope the detailed explanations in this article help you get to grips with json data~



Origin blog.csdn.net/aaahtml/article/details/114382845