Author: Peter
Source: Python programming time
In practical work, especially when transmitting data over the web, we often run into JSON data. It is less approachable than plain text or numeric data, and it looks so much like a Python dictionary that many people confuse the two.
This article uses concrete examples to explain how to process JSON data with Python and pandas (a third-party Python library). The main topics are:
- Introduction to JSON data
- Commonly used JSON conversion websites
- Converting between JSON data and Python data
- Handling JSON data with pandas
1. A brief introduction to JSON
1.1 What is json data
First, here is how Wikipedia describes JSON:
JSON (JavaScript Object Notation) is a lightweight data-interchange format conceived and designed by Douglas Crockford. It is text-based and easy to read, and it is used to transmit data objects made up of attribute-value pairs and arrays.
The JSON data format is language-independent. Although it is derived from JavaScript, many programming languages now support generating and parsing JSON data. The file extension is .json.
From the description above, we can summarize 3 points:
- JSON is a text-based, very lightweight data-interchange format
- JSON data is easy to read
- It is derived from JavaScript, but other languages can parse JSON data too
1.2 json data type
JSON is essentially a subset of JavaScript syntax. It has only 6 data types, which can be nested in any combination:
- number: the same as number in JavaScript
- boolean: true or false in JavaScript
- string: string in JavaScript
- null: null in JavaScript
- array: written as [] in JavaScript
- object: written as {…} in JavaScript
1.3 Two rules
1. The JSON specification requires the character set to be UTF-8
2. To keep parsing uniform, JSON strings must be enclosed in double quotes ""
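As a minimal sketch of the double-quote rule, the snippet below feeds a few strings to Python's standard json.loads. The sample strings are made up for illustration; the point is only that text which looks like a Python dictionary is not necessarily valid JSON.

```python
import json

# A valid JSON document: double-quoted strings, lowercase true/null.
valid = '{"name": "Tom", "active": true, "score": null}'
parsed = json.loads(valid)
print(parsed)

# Illustrative strings that resemble Python dicts but break JSON rules.
invalid_samples = [
    "{'name': 'Tom'}",    # single quotes instead of double quotes
    '{"active": True}',   # Python-style True instead of true
    '{"name": "Tom",}',   # trailing comma
]

rejected = []
for text in invalid_samples:
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        rejected.append(text)
        print(f"rejected {text!r}: {exc.msg}")
```

All three invalid samples raise json.JSONDecodeError, which is a convenient exception to catch when validating input.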
2. Commonly used json data conversion websites
1. json.cn: https://www.json.cn/
2. json rookie tool: https://c.runoob.com/front-end/53
3. Sojson: https://www.sojson.com/, a very complete json processing website
4. kjson: https://www.kjson.com/
5. Programming Lion json check tool: https://www.w3cschool.cn/tools/index?name=jsoncheck
6. JSONViewer: http://jsonviewer.stack.hu/, an online tool for checking whether JSON is correctly formatted
3. JSON and Dict type conversion
This section explains how to convert between JSON data and Python data types.
Converting between JSON objects and Python dictionaries mainly relies on the built-in json package, which is described in detail below. For the full documentation, see the official website: https://docs.python.org/3/library/json.html
Import the package before first use:
import json
The json package has 4 functions for converting to and from Python's built-in data types:
method | effect |
---|---|
json.dumps() | Encode a python object into a Json string: dictionary to json |
json.loads() | Decode Json string into python object: json to dictionary |
json.dump() | Convert objects in python into json and store them in a file |
json.load() | Convert the json format in the file into a python object and extract it |
Note: json.dump() and json.load() do the same conversions as json.dumps() and json.loads(); they only add one extra step of writing to or reading from a file.
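The relationship between the four functions can be sketched as follows. To keep the example self-contained, an in-memory io.StringIO buffer stands in for an open text file; that substitution and the sample dictionary are assumptions made for illustration.

```python
import io
import json

person = {"name": "Tom", "age": 18}

# String-based pair: dumps / loads
text = json.dumps(person)        # Python object -> JSON string
restored = json.loads(text)      # JSON string -> Python object

# File-based pair: dump / load (StringIO behaves like a text file)
buffer = io.StringIO()
json.dump(person, buffer)        # same encoding, written to a file object
buffer.seek(0)                   # rewind before reading back
restored_from_file = json.load(buffer)

print(text)
print(buffer.getvalue() == text)  # dump wrote exactly what dumps returned
```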
json.dumps
The two dump-related functions convert Python data types into JSON types. The conversion table is as follows:
Python | JSON |
---|---|
dict | object |
list, tuple | array |
str | string |
int, float | number |
True | true |
False | false |
None | null |
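One consequence of this table is that the conversion is not always reversible. The following sketch, using made-up sample data, shows a tuple coming back as a list and an integer key coming back as a string, and shows that a type missing from the table (here a set) is rejected.

```python
import json

original = {"point": (1, 2), 1: "one", "missing": None}

text = json.dumps(original)
restored = json.loads(text)

print(text)      # {"point": [1, 2], "1": "one", "missing": null}
print(restored)  # {'point': [1, 2], '1': 'one', 'missing': None}

# A set has no JSON counterpart, so json.dumps raises TypeError.
try:
    json.dumps({"tags": {"a", "b"}})
    unsupported = ""
except TypeError as exc:
    unsupported = str(exc)
print(unsupported)
```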
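json.dumps
The json.dumps function converts Python data (such as a dictionary) into a JSON-formatted string. Its parameters are as follows:
json.dumps(obj,  # the object to convert
    skipkeys=False,  # default False: a dict key that is not a basic type (str, int, float, bool, None) raises TypeError; set to True to skip such keys
    ensure_ascii=True,  # default True: non-ASCII characters are escaped; set to False to output Chinese as-is
    check_circular=True,  # if False, skip the circular-reference check for container types
    allow_nan=True,  # if False, serializing out-of-range floats (nan, inf, -inf) raises ValueError in strict compliance with the JSON specification, instead of emitting the JavaScript equivalents (NaN, Infinity, -Infinity)
    cls=None,
    indent=None,  # pretty-print with this many spaces of indentation per level
    separators=None,  # an (item_separator, key_separator) tuple; use (',', ':') to drop the spaces after ',' and ':'
    # encoding="utf-8" existed only in Python 2; json.dumps in Python 3 does not accept it
    default=None,  # a function that returns a serializable version of obj or raises TypeError; by default TypeError is raised
    sort_keys=False,  # if False, dict keys are not sorted; if True, the output is sorted by key (a to z)
    **kw)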
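The less obvious parameters (skipkeys, allow_nan and default) can be illustrated with a short sketch. The sample data and the helper name encode_extra are hypothetical and chosen only for this example.

```python
import json
from datetime import date

# skipkeys: a tuple key is not a basic type, so it fails unless skipped.
record = {("a", "b"): 1, "ok": 2}
try:
    json.dumps(record)
    tuple_key_failed = False
except TypeError:
    tuple_key_failed = True
skipped = json.dumps(record, skipkeys=True)

# allow_nan: NaN is emitted by default but rejected in strict mode.
lenient = json.dumps(float("nan"))
try:
    json.dumps(float("nan"), allow_nan=False)
    strict_failed = False
except ValueError:
    strict_failed = True

# default: a hypothetical helper that turns dates into ISO strings.
def encode_extra(obj):
    if isinstance(obj, date):
        return obj.isoformat()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")

with_date = json.dumps({"day": date(2021, 1, 1)}, default=encode_extra)

print(skipped)    # {"ok": 2}
print(lenient)    # NaN
print(with_date)  # {"day": "2021-01-01"}
```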
The examples below illustrate the common parameters above.
1. When the Python data contains Chinese characters
information1 = {
    'name': '小明',
    'age': 18,
    'address': 'shenzhen'
}
# convert the dictionary to JSON data
information2 = json.dumps(information1)
print(type(information1))  # <class 'dict'>
print(type(information2))  # <class 'str'>
print(information2)
Add the ensure_ascii=False parameter to display Chinese:
# convert the dictionary to JSON data
information3 = json.dumps(information1, ensure_ascii=False)
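For reference, here is a small sketch of what the two calls return. It reuses the same dictionary; the variable names escaped and readable are introduced only for this illustration.

```python
import json

information1 = {"name": "小明", "age": 18, "address": "shenzhen"}

escaped = json.dumps(information1)                       # ensure_ascii=True
readable = json.dumps(information1, ensure_ascii=False)  # Chinese kept as-is

print(escaped)   # {"name": "\u5c0f\u660e", "age": 18, "address": "shenzhen"}
print(readable)  # {"name": "小明", "age": 18, "address": "shenzhen"}

# Both strings describe the same data; only the text representation differs.
print(json.loads(escaped) == json.loads(readable))
```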
⚠️ The results show that everything in the JSON data is wrapped in double quotes, whereas the original dictionary used single quotes. Here is another example of the quotes changing:
>>> import json
>>> print(json.dumps({'4': 5, '6': 7}, sort_keys=True, indent=4))  # the keys are Python strings written with single quotes
# output
{
    "4": 5,  # now in double quotes
    "6": 7
}
2. Pretty-print the JSON data with indentation by using the indent parameter
information4 = {
    'name': '小明',
    'age': 18,
    'skills': 'python',
    'english': 'CET6',
    'major': '会计',
    'address': '深圳'
}
information5 = json.dumps(information4, ensure_ascii=False)  # no indentation
information6 = json.dumps(information4, ensure_ascii=False, indent=2)  # indent by 2 spaces
information7 = json.dumps(information4, ensure_ascii=False, indent=5)  # indent by 5 spaces
print(information5)
print(information6)
print(information7)
3. Sort the keys of the Python data in the output
information4 = {
    'name': '小明',
    'age': 18,
    'skills': 'python',
    'english': 'CET6',
    'major': '会计',
    'address': '深圳'
}
information8 = json.dumps(information4, ensure_ascii=False, indent=2)  # keys keep their original order
information9 = json.dumps(information4, ensure_ascii=False, indent=2, sort_keys=True)  # sort the keys
print(information8)
print(information9)
With sort_keys=True, the output is sorted by the first letter of each key; when the first letters are the same, the second letters are compared, and so on.
4. Controlling the output separators
Use the separators parameter to set different output separators. By default, the separator between dict items is ', ' and the separator between a key and its value is ': '.
information1 = {
    'name': '小明',
    'age': 18,
    'address': 'shenzhen'
}
information2 = json.dumps(information1, ensure_ascii=False)
information10 = json.dumps(information1, ensure_ascii=False, separators=('+', '@'))  # change the separators
print(information2)  # default separators
print(information10)
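A more common practical use of separators is producing compact output. The sketch below compares the default, compact, and custom forms; it also shows that the '+' and '@' output is no longer valid JSON, so that form is useful only for display.

```python
import json

information1 = {"name": "小明", "age": 18, "address": "shenzhen"}

default_text = json.dumps(information1, ensure_ascii=False)
compact_text = json.dumps(information1, ensure_ascii=False, separators=(",", ":"))
custom_text = json.dumps(information1, ensure_ascii=False, separators=("+", "@"))

print(default_text)  # {"name": "小明", "age": 18, "address": "shenzhen"}
print(compact_text)  # {"name":"小明","age":18,"address":"shenzhen"}
print(custom_text)   # {"name"@"小明"+"age"@18+"address"@"shenzhen"}

# The custom form cannot be parsed back as JSON.
try:
    json.loads(custom_text)
    custom_is_valid = True
except json.JSONDecodeError:
    custom_is_valid = False
print(custom_is_valid)
```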
json.dump
The json.dump function is similar to json.dumps, except that the data is written to a file. Apart from the file object, the two take the same parameters.
Let's try writing the following personal information to a file:
information = {
    'name': '小明',
    'age': 18,
    'skills': 'python',
    'english': 'CET6',
    'major': '会计',
    'address': '深圳'
}
1. If the indent parameter is not used, all the information is written on one line
# use json.dump; strings in JSON data are always double-quoted
with open("information_1_to_json.json", "w", encoding='utf-8') as f:
    # json.dump(dic_, f)  # writes everything on one line, with no line breaks
    json.dump(information,  # data to write
              f,  # file object
              sort_keys=True,  # sort the keys
              ensure_ascii=False)  # keep Chinese characters readable
Take a look at the actual saving effect:
Adding the indent parameter writes the data across multiple lines:
with open("information_2_to_json.json", "w", encoding='utf-8') as f:
    json.dump(information,
              f,
              indent=2,  # indent with spaces and write multiple lines
              sort_keys=True,
              ensure_ascii=False)
json.loads
The two load-related functions convert JSON data into Python data types. The conversion table is as follows:
JSON | Python |
---|---|
object | dict |
array | list |
string | str |
number (int) | int |
number (real) | float |
true | True |
false | False |
null | None |
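The decoding table can be checked directly. In this sketch, the JSON text is invented for illustration; note that a number written with a fraction or an exponent is decoded as float, while a plain integer is decoded as int.

```python
import json

text = '{"count": 3, "ratio": 0.5, "big": 1e3, "tags": ["a", "b"], "ok": true, "note": null}'
data = json.loads(text)

# Map each key to the name of the Python type it was decoded into.
types = {key: type(value).__name__ for key, value in data.items()}
print(types)
```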
json.loads
The json.loads function converts JSON-formatted data into Python data such as a dictionary.
information1 = {
    'name': '小明',
    'age': 18,
    'address': 'shenzhen'
}
# convert the dictionary to JSON data
information3 = json.dumps(information1, ensure_ascii=False)
information11 = json.loads(information3)  # convert the JSON back to a dictionary
print(information11)
json.load
Open a JSON file and convert its contents to dictionary data
# use json.load
with open("information_to_json.json", encoding="utf-8") as f:
    json_to_dict = json.load(f)  # convert the JSON to a dictionary
    print(json_to_dict)
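To try the write and read steps together without leaving files behind, the following sketch performs the round trip inside a temporary directory. The file name information_demo.json is a placeholder chosen for this example, not a file from the article.

```python
import json
import tempfile
from pathlib import Path

information = {
    "name": "小明",
    "age": 18,
    "skills": "python",
    "english": "CET6",
    "major": "会计",
    "address": "深圳",
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "information_demo.json"  # placeholder file name

    # Write: json.dump with readable Chinese, indentation and sorted keys.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(information, f, ensure_ascii=False, indent=2, sort_keys=True)

    raw_text = path.read_text(encoding="utf-8")

    # Read: json.load turns the file content back into a dictionary.
    with open(path, encoding="utf-8") as f:
        json_to_dict = json.load(f)

print(raw_text)
print(json_to_dict == information)
```

Using the same encoding="utf-8" for both writing and reading matters here, because ensure_ascii=False puts the Chinese characters into the file unescaped.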
4. Conversion of JSON and non-Dict types
The sections above mainly covered conversion between JSON data and Python dictionaries. The following shows how json.dumps converts other Python data types into JSON data:
1. Tuple conversion
2. List conversion
3. Boolean conversion
4. Numerical data conversion
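The four conversions listed above can be sketched with json.dumps as follows. The sample values are made up for illustration.

```python
import json

tuple_json = json.dumps(("python", "json", 18))  # tuple -> JSON array
list_json = json.dumps(["python", "json", 18])   # list  -> JSON array
bool_json = json.dumps([True, False])            # bool  -> true / false
int_json = json.dumps(18)                        # int   -> number
float_json = json.dumps(3.14)                    # float -> number

print(tuple_json)  # ["python", "json", 18]
print(list_json)   # ["python", "json", 18]
print(bool_json)   # [true, false]
print(int_json)    # 18
print(float_json)  # 3.14
```

A tuple and a list produce identical JSON, so the distinction between them is lost once the data has been encoded.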
5. Use Demjson to parse
Demjson is a third-party Python library that can be used to encode and decode JSON data:
- encode: encode a Python object into a JSON string
- decode: decode an encoded JSON string into a Python object
Install demjson
Install it directly with pip install demjson; seeing the following interface indicates that the installation succeeded.
Use demjson
Import it before use:
import demjson  # import the package
1. Encoding
2. Decoding
An obvious disadvantage of the demjson package is that it cannot directly parse Chinese data:
If we want to see the Chinese data, we can use the eval function:
6. Pandas handles json
The following describes how the pandas library processes JSON data:
- read_json: read data from a JSON file
- to_json: write pandas data to a JSON file
- json_normalize: normalize JSON data
Reference: https://geek-docs.com/pandas/pandas-read-write/pandas-reading-and-writing-json.html
6.1 read_json
First, look at the parameters of read_json from the official website:
pandas.read_json(
    path_or_buf=None,  # path to the JSON file
    orient=None,  # key parameter; one of "split", "records", "index", "columns", "values"
    typ='frame',  # type of object to recover ('series' or 'frame'); default 'frame'
    dtype=None,  # bool or dict; when left as None, dtypes are inferred (as with True) for the orient values listed above
    convert_axes=None,
    convert_dates=True,
    keep_default_dates=True,
    numpy=False,
    precise_float=False,
    date_unit=None,
    encoding=None,
    lines=False,  # bool, default False; read the file as one JSON object per line
    chunksize=None,
    compression='infer',
    nrows=None,
    storage_options=None)
For detailed parameter analysis, please refer to the article: https://blog.csdn.net/qq_41562377/article/details/90203805
Suppose we have a piece of JSON data, as shown in the following figure:
Let's read in the data above. Since the data is fairly well structured, it can be read directly by passing the file path:
The orient parameter deserves a closer look:
1. orient='split'
‘split’ : dict like {index -> [index], columns -> [columns], data -> [values]}
The keys of the JSON file can only be index, columns, and data; one key more or one key fewer will not work. For example:
2. orient='records'
‘records’ : list like [{column -> value}, … , {column -> value}]
3. orient='index'
‘index’ : dict like {index -> {column -> value}}
4. orient='columns'
‘columns’ : dict like {column -> {index -> value}}
Transposing this gives the orient='index' result above
5. orient='values'
‘values’ : just the values array
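The five layouts can be compared side by side with a small sketch. The JSON strings below are invented sample data, and they are wrapped in io.StringIO so that the example runs without any files on disk; the same orient values apply when a file path is passed instead.

```python
from io import StringIO

import pandas as pd

# The same two-row table written in each of the five layouts.
split_text = '{"index": ["a", "b"], "columns": ["name", "age"], "data": [["Tom", 18], ["Ann", 20]]}'
records_text = '[{"name": "Tom", "age": 18}, {"name": "Ann", "age": 20}]'
index_text = '{"a": {"name": "Tom", "age": 18}, "b": {"name": "Ann", "age": 20}}'
columns_text = '{"name": {"a": "Tom", "b": "Ann"}, "age": {"a": 18, "b": 20}}'
values_text = '[["Tom", 18], ["Ann", 20]]'

df_split = pd.read_json(StringIO(split_text), orient="split")
df_records = pd.read_json(StringIO(records_text), orient="records")
df_index = pd.read_json(StringIO(index_text), orient="index")
df_columns = pd.read_json(StringIO(columns_text), orient="columns")
df_values = pd.read_json(StringIO(values_text), orient="values")

print(df_split)    # row labels a/b, columns name/age
print(df_records)  # default integer row labels
print(df_values)   # default integer row and column labels
```

Only 'split', 'index' and 'columns' carry the row labels; 'records' and 'values' fall back to a default integer index.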
6.2 to_json
The to_json method saves a DataFrame as a JSON file:
df.to_json("个人信息.json")  # save directly as a JSON file
If you save with the code above, Chinese characters are not displayed; they are written as escape sequences:
Of course, we can read the JSON file back with json.load to display the Chinese, or we can keep the Chinese readable when saving:
df.to_json("个人信息1.json", force_ascii=False)  # keep Chinese characters readable
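The effect of force_ascii can be seen without writing any files, because to_json returns the JSON text as a string when no path is given. The one-row DataFrame below is an assumed example and not the article's data.

```python
import json

import pandas as pd

df = pd.DataFrame({"name": ["小明"], "age": [18]})

escaped_text = df.to_json()                    # force_ascii=True by default
readable_text = df.to_json(force_ascii=False)  # keep Chinese characters

print(escaped_text)
print(readable_text)

# Both strings decode to the same data.
same_data = json.loads(escaped_text) == json.loads(readable_text)
print(same_data)
```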
6.3 json_normalize
Reference: https://www.jianshu.com/p/a84772b994a0
The JSON data saved and read above was all in list form. However, the data in a JSON file is often not entirely in list form, so we need to convert a dictionary-structured file into list form. This process is called normalization.
The json_normalize() function in pandas converts dictionaries or lists into tables. Import it before use:
from pandas import json_normalize  # pandas 1.0+; the older import path pandas.io.json.json_normalize is deprecated
We will learn it through both the official website and a practical example. First, look at the examples from the official website:
1. A hierarchical (nested) dictionary is displayed in the form of attributes:
2. Adding the max_level parameter gives different results:
If max_level=0, the nested dictionary is treated as a whole and displayed in the data frame as a single value
If max_level=1, the nested dictionary is taken apart and its keys are separated out:
3. Read part of the content in a nested level:
4. Read all of the content
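Since the original screenshots are not available, here is a minimal sketch of these four cases using pandas.json_normalize. The records and orders data are invented for illustration, and the use of record_path and meta for the last two cases is an assumption about what the original examples showed.

```python
import pandas as pd

records = [
    {"id": 1, "name": "Tom",
     "info": {"city": "Shenzhen", "contact": {"email": "tom@example.com"}}},
    {"id": 2, "name": "Ann",
     "info": {"city": "Beijing", "contact": {"email": "ann@example.com"}}},
]

# 1. Fully flattened: nested keys become dotted column names.
flat = pd.json_normalize(records)

# 2. max_level limits how many nesting levels are expanded.
level0 = pd.json_normalize(records, max_level=0)  # "info" stays a dict
level1 = pd.json_normalize(records, max_level=1)  # "info.contact" stays a dict

# 3 and 4. record_path selects a nested list; meta carries outer fields along.
orders = [
    {"customer": "Tom",
     "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]},
]
items_only = pd.json_normalize(orders, record_path="items")
items_with_meta = pd.json_normalize(orders, record_path="items", meta=["customer"])

print(flat.columns.tolist())
print(level0.columns.tolist())
print(level1.columns.tolist())
print(items_with_meta)
```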
7. Summary
JSON is a data format frequently encountered at work, and an important one.
This article first briefly introduced the JSON data format and its concepts; second, through a variety of practical examples, it converted between JSON and various Python data types, especially dictionaries; finally, it covered reading, writing, and normalizing JSON data.
I hope this detailed explanation helps you get the hang of JSON data~