Python learning - serialization and deserialization - json&pickle

I. Overview

Anyone who has played a reasonably large game knows that its save feature lets us quickly return to the state we last exited in (all runtime data, including equipment, level, experience points, and so on). Program development has the same need: in a relatively simple program, objects live only in memory and disappear when the program exits, but a program with more demanding requirements often has to persist its objects so that the next run can resume directly from the previous state.

This process is serialization and deserialization in program development.

II. The concepts of serialization and deserialization

The overview introduced a game save scenario, which is essentially a game program converting its runtime objects into a form that can be persisted and then saving it (for example, to a database). Using that as a starting point, let's look at the concepts of serialization and deserialization (the definitions below are compiled from online material, and I think they explain things well).

  • Serialization
    The process of converting a data structure or object in the program's memory into a sequence of bytes is called serialization; it lets us store the object persistently or send it over a network.
    Please note the following key points:

    1. What gets serialized
      Data structures or objects in memory, that is, everything we manipulate while the program runs (in Python everything is an object, including what many texts simply call variables).
    2. What serialization produces
      A byte sequence. To be written to disk or sent over a network, data must ultimately be bytes. (json produces a text string, which is encoded to bytes when it is written to a file or socket; pickle produces a bytes object directly.)
    3. The purpose of serialization
      Think of how convenient and necessary saving is when you play a game. Some objects must be persisted (plain files, databases, and so on) or transmitted over a network (distributed programs) to meet functional requirements.
  • Deserialization
    Deserialization is the reverse of serialization. Data is stored or transmitted so that it can be used later, and to be used again it must be loaded back into memory; that loading step is deserialization.

Python's standard serialization and deserialization modules include json and pickle. Let's see how to use them.
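
As a minimal sketch of the round trip described above, the snippet below serializes a made-up game state to bytes and restores it. The `game_state` dict and its fields are illustrative assumptions, not taken from any real game; json handles the text step and an explicit UTF-8 encode handles the bytes step.

```python
import json

# Hypothetical runtime state of a game (illustrative values only).
game_state = {"level": 12, "experience": 3450, "equipment": ["sword", "shield"]}

# Serialization: in-memory object -> JSON text -> bytes ready for disk or network.
payload = json.dumps(game_state).encode("utf-8")

# Deserialization: bytes -> JSON text -> in-memory object.
restored = json.loads(payload.decode("utf-8"))

print(type(payload).__name__)
print(restored == game_state)
```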

III. json module

The json module provides four functions: dumps, loads, dump and load, which are explained below:

1. dumps serialization and loads deserialization

dumps and loads come as a pair:
dumps encodes simple Python data types (typically dicts and lists, as well as strings) as JSON and returns a string in standard JSON format;
loads does the opposite: it decodes a JSON string into the corresponding Python data type (note that only simple data types are supported, as discussed further below).

>>> import json
>>> list1=['a','b','c']
>>> print(type(list1))
<class 'list'>

# dumps serializes; think of it as encoding to JSON
>>> print(json.dumps(list1))
["a", "b", "c"]
>>> print(type(json.dumps(list1)))
<class 'str'>   # the list becomes a str after dumps
>>> dict1={'id':'001','name':'Pumpkin'}
>>> print(type(dict1))
<class 'dict'>
>>> print(type(json.dumps(dict1)))
<class 'str'>   # the dict also becomes a str after dumps

# loads deserializes
>>> print(json.loads(json.dumps(list1)))
['a', 'b', 'c']
>>> print(json.loads(json.dumps(dict1)))
{'id': '001', 'name': 'Pumpkin'}
>>> print(type(json.loads(json.dumps(list1))))
<class 'list'>  # loads restores the string produced by dumps to the original data type
>>> print(type(json.loads(json.dumps(dict1))))
<class 'dict'>  # loads restores the string produced by dumps to the original data type

The code above only demonstrates what loads does. In practice, we can take any string that conforms to the JSON format (such as input to the program) and convert it into the corresponding Python data type so that the program can work with it:

>>> str1='["a", "b", "c"]'
>>> print(type(json.loads(str1)))
<class 'list'>  # deserialized to a list
>>> print(json.loads(str1))
['a', 'b', 'c']
>>> str2='{"id":"001","name":"Pumpkin"}'
>>> print(json.loads(str2))
{'id': '001', 'name': 'Pumpkin'}
>>> print(type(json.loads(str2)))
<class 'dict'>  # deserialized to a dict

Note: JSON itself requires double quotation marks around strings and keys (which is why the output of dumps uses double quotation marks). When such a string is written as a Python literal, wrapping it in single quotation marks is simply the most convenient way to avoid escaping the inner double quotation marks; a string whose JSON content uses single quotation marks, such as "{'id': '001'}", is rejected by loads.
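
The quoting rule can be checked directly. The sketch below writes the same JSON text with three different Python quoting styles, then tries a string whose JSON content uses single quotation marks; the variable names are only for illustration.

```python
import json

# The same JSON text written with three different Python quoting styles.
a = json.loads('{"id": "001"}')
b = json.loads("{\"id\": \"001\"}")
c = json.loads("""{"id": "001"}""")
print(a == b == c)

# Single quotes inside the JSON text are not valid JSON.
try:
    json.loads("{'id': '001'}")
    rejected = False
except json.JSONDecodeError as err:
    rejected = True
    print("invalid JSON:", err)
```

What matters is the quotation marks inside the JSON text, not the ones Python uses to delimit the literal.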

The above shows what dumps and loads do, but our goal is to persist runtime objects or send them over a network. The following shows how to save data to a text file after serializing it with dumps, and how to load previously saved data back from that file:

(1) Serialize with dumps and save to a file

import json
dict1 = {'id':'001','name':'Pumpkin'}
with open('dumps.txt', 'w', encoding = 'utf-8') as f:
    f.write(json.dumps(dict1))

The saved text file dumps.txt contains:
{"id": "001", "name": "Pumpkin"}

(2) Read the file and deserialize with loads

>>> import json
>>> with open('dumps.txt','r',encoding='utf-8') as f:
...     content = f.read()
...
>>> print(json.loads(content))
{'id': '001', 'name': 'Pumpkin'}
>>> print(type(json.loads(content)))
<class 'dict'>  # loads restored the original data type, dict
>>> print(json.loads(content).get('name'))
Pumpkin  # all the usual dict operations are available now
>>> print(json.loads(content)['name'])
Pumpkin
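
One caveat about "restoring the original data type": the round trip is exact only for types that JSON itself has. A small sketch, with made-up values, of two common cases where the type changes:

```python
import json

# A tuple is written as a JSON array, so it comes back as a list.
point = json.loads(json.dumps((1, 2)))

# JSON object keys are always strings, so the integer key comes back as "1".
scores = json.loads(json.dumps({1: "first"}))

print(point)    # [1, 2]
print(scores)   # {'1': 'first'}
```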

2. dump serialization and load deserialization

dump and load also come as a pair. dump converts a Python object (most commonly a dict or list) to JSON and writes it directly to a file-like object, so it takes both the unconverted Python object and the file-like object as arguments. load does the opposite: it reads from a file-like object and deserializes the content back into a native Python object.

>>> import json
>>> dict1={'id': '001', 'name': 'Pumpkin'}
>>> with open('dump.txt','w',encoding='utf-8') as f:
...     json.dump(dict1,f)   # dump works directly on the native object and the file handle
...
# load deserializes
>>> with open('dump.txt','r',encoding='utf-8') as f:
...     content = json.load(f)  # keep the loaded content in a variable so it is still available after the file is closed
...
>>> print(content)
{'id': '001', 'name': 'Pumpkin'}
>>> print(type(content))
<class 'dict'>   # successfully deserialized to a dict
>>> print(content['name'])
Pumpkin          # dict operations work as expected

3. (dumps & loads) vs. (dump & load)

Compare dumps & loads and dump & load:

(1) dumps & loads

These only convert between Python objects that the json module can handle and JSON strings; they know nothing about files.

(2) dump & load

These can be thought of as the file-oriented versions. They operate on Python objects that the json module can handle and on file-like objects, and hide the intermediate JSON string.

By comparison, if you want to save runtime objects to a file or load them from one, dump and load are more convenient and need less code; if you only need the JSON string itself, use dumps and loads.
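
The relationship between the two pairs can be sketched with an in-memory text buffer. `io.StringIO` stands in for a real file here, which is an assumption made only to keep the example self-contained.

```python
import io
import json

data = {"id": "001", "name": "Pumpkin"}

# dump writes JSON text straight into a file-like object...
buf_dump = io.StringIO()
json.dump(data, buf_dump)

# ...which gives the same text as calling dumps and writing the string yourself.
buf_dumps = io.StringIO()
buf_dumps.write(json.dumps(data))

print(buf_dump.getvalue() == buf_dumps.getvalue())

# load reads from a file-like object; loads takes the string directly.
buf_dump.seek(0)
loaded_from_file = json.load(buf_dump)
loaded_from_text = json.loads(buf_dumps.getvalue())
print(loaded_from_file == loaded_from_text)
```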

IV. pickle

The pickle module implements binary protocols for serializing and deserializing Python object structures. With pickle, serialization and deserialization are called pickling and unpickling respectively, and unlike json the serialized form is binary bytes rather than a human-readable string:

  • pickling: the process of converting a Python object into a byte stream;
  • unpickling: the process of converting a byte stream (from a binary file or a bytes object) back into a Python object.

1. dumps serialization and loads deserialization

These are very similar to json's dumps and loads; the difference is that the serialized form is binary bytes:

>>> import pickle
>>> dict1={'id':'001','name':'Pumpkin'}
>>> pickle.dumps(dict1)
b'\x80\x03}q\x00(X\x02\x00\x00\x00idq\x01X\x03\x00\x00\x00001q\x02X\x04\x00\x00\x00nameq\x03X\x07\x00\x00\x00Pumpkinq\x04u.' # serialized to binary bytes (the exact bytes depend on the pickle protocol version)
>>> print(type(pickle.dumps(dict1)))
<class 'bytes'>
>>> pickle.loads(pickle.dumps(dict1))        # successfully deserialized
{'id': '001', 'name': 'Pumpkin'}
>>> print(type(pickle.loads(pickle.dumps(dict1))))
<class 'dict'>
>>> pickle.loads(pickle.dumps(dict1))['name']
'Pumpkin'

Because pickle serializes to bytes, files must be opened in wb mode for saving and rb mode for reading:

>>> import pickle
>>> dict1={'id':'001','name':'Pumpkin'}
>>> with open('picklt.txt','wb') as f:   # open the file in wb mode and write the dumps output
...     f.write(pickle.dumps(dict1))
...

>>> with open('picklt.txt','rb') as f:   # open the file in rb mode and read the content
...     data = pickle.loads(f.read())
...
>>> print(data)
{'id': '001', 'name': 'Pumpkin'}

Because dumps wrote binary bytes, picklt.txt looks garbled when opened in a text editor.
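
A pickle round trip also keeps Python-specific types that JSON would change or reject. The sketch below uses a made-up `record` with an integer key, a tuple, a set and a bytes value, and checks that every available protocol version restores it, even though the byte strings themselves differ between protocols.

```python
import pickle

# Illustrative record mixing types that JSON cannot represent faithfully.
record = {1: ("a", "b"), "tags": {"x", "y"}, "raw": b"\x00\x01"}

restored = pickle.loads(pickle.dumps(record))

print(restored == record)
print(type(restored[1]).__name__, type(restored["tags"]).__name__)

# The byte string differs between protocol versions, but every version round-trips.
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    assert pickle.loads(pickle.dumps(record, protocol=protocol)) == record
```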

2. dump serialization and load deserialization

Similarly, pickle's dump serialization and load deserialization are very similar to json's dump and load. Let's look at the same example:

>>> import pickle
>>> dict1={'id': '001', 'name': 'Pumpkin'}
>>> with open('pickle.txt','wb') as f:
...     pickle.dump(dict1,f)
...
>>> with open('pickle.txt','rb') as f:
...     content = pickle.load(f)
...
>>> print(content)
{'id': '001', 'name': 'Pumpkin'}
>>> print(content['name'])
Pumpkin
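
Because dump and load work on a stream, several objects can be written one after another and read back in the same order. A minimal sketch, assuming an in-memory `io.BytesIO` buffer in place of a file opened in wb/rb mode:

```python
import io
import pickle

buf = io.BytesIO()  # in-memory stand-in for a file opened in binary mode

# Each dump call appends one complete pickle to the stream.
pickle.dump({"id": "001"}, buf)
pickle.dump(["a", "b", "c"], buf)

# Each load call reads exactly one pickle and leaves the position after it.
buf.seek(0)
first = pickle.load(buf)
second = pickle.load(buf)
print(first, second)
```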

3. Serializing functions

(1) Serialization

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pickle

def sayhi(name):
    print('Hello:', name)

info = {'name':'Pumpkin', 'func':sayhi}  # the value of func is a function

with open('test.txt', 'wb') as f:
    data = pickle.dumps(info)
    f.write(data)

(2) Deserialization

# -*- coding: utf-8 -*-

import pickle

def sayhi(name):   # the function must be defined here too: pickle stores only a reference to it (module and name), not its code
    print('Hello:',name)

with open('test.txt','rb') as f:
    data = pickle.loads(f.read())

print(data.get('name'))
data.get('func')('Tom')

Output:
Pumpkin
Hello: Tom
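
The reason sayhi has to be defined again on the loading side is that pickle stores a function as a reference (module plus name) rather than as code. The sketch below illustrates this with the built-in `len`, chosen only because it is importable everywhere, and with a lambda, which has no importable name.

```python
import pickle

# A function is pickled as a reference: its module and name, not its code.
data = pickle.dumps(len)
print(b"builtins" in data, b"len" in data)

# Unpickling looks that name up again and returns the very same object.
print(pickle.loads(data) is len)

# A lambda has no importable name, so pickle refuses to serialize it.
try:
    pickle.dumps(lambda name: print("Hello:", name))
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as err:
    lambda_pickled = False
    print("cannot pickle:", err)
```

Both exception types are caught because which one is raised can depend on the Python version and on where the lambda is defined.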

V. json and pickle comparison

  • JSON is a text serialization format (it outputs Unicode text, which is usually then encoded as UTF-8), while pickle is a binary serialization format;

  • JSON is human-readable, because it converts between Python objects and strings, while pickle's binary format is not;

  • JSON is independent of any specific programming language or system and is widely used outside the Python ecosystem, whereas the data format used by pickle is Python-specific;

  • By default, JSON can represent only a subset of Python's built-in types, limited to relatively simple ones such as dict, list and str, and custom types require extra work; pickle can directly represent a very large number of Python types, including custom ones (many of them handled automatically through clever use of Python's introspection facilities; complex cases can be handled by implementing specific object APIs).
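
The last point can be sketched with a type outside JSON's built-in set. The example uses `datetime.date` with an illustrative value; the `default` argument of json.dumps is one way to do the "extra work", and the conversion shown (an ISO date string) is just one possible choice.

```python
import json
import pickle
from datetime import date

saved_on = date(2024, 1, 15)  # illustrative value

# json has no built-in representation for date objects.
try:
    json.dumps({"saved_on": saved_on})
    json_ok = True
except TypeError as err:
    json_ok = False
    print("json:", err)

# Extra work for json: a default hook that turns the date into a string.
text = json.dumps({"saved_on": saved_on}, default=lambda o: o.isoformat())
print(text)  # {"saved_on": "2024-01-15"}

# The type is lost on the way back unless you convert it yourself.
print(json.loads(text))  # {'saved_on': '2024-01-15'}

# pickle round-trips the date object unchanged.
unpickled = pickle.loads(pickle.dumps(saved_on))
print(unpickled == saved_on)
```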

Reference: http://www.cnblogs.com/linupython/p/8256428.html
