Common serialization operations in Python

About
A thoughtful programmer and lifelong learner, currently a team lead in a start-up. Our team's main technology stack is Android, Python, Java, and Go.
GitHub: https://github.com/hylinux1024
WeChat official account: Lifetime developer (angrycode)

0x00 marshal

marshal reads and writes Python objects in a binary format. The format is machine-independent but specific to Python, and it is also tied to the Python version: it may change between Python releases, so marshal data written by one version is not guaranteed to be readable by another.

marshal is usually used to serialize Python's internal objects.

The supported types include:

  • Basic types: booleans, integers, floating point numbers, complex numbers
  • Sequence, set and mapping types: strings, bytes, bytearray, tuple, list, set, frozenset, dictionary (containers only if everything they contain is supported too)
  • Code objects
  • Other singletons: None, Ellipsis, StopIteration
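A quick way to check the list above is to round-trip one sample of each type through marshal.dumps and marshal.loads. This is a minimal sketch with arbitrary sample values; as far as I can tell from CPython's marshal implementation, a bytearray is written as plain bytes, so it comes back as bytes but still compares equal.

```python
import marshal

# One sample value for each group of supported types listed above
samples = [
    True, 42, 3.14, 4 + 6j,                              # basic types
    "text", b"bytes", bytearray(b"buf"),                 # strings and bytes
    (1, 2), [1, 2], {1, 2}, frozenset({3}), {"k": "v"},  # containers
    None, Ellipsis, StopIteration,                       # singletons
]

for value in samples:
    restored = marshal.loads(marshal.dumps(value))
    # Every sample survives the round trip with an equal value
    assert restored == value
    print(type(value).__name__, "->", type(restored).__name__)
```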

The main purpose of marshal is to support reading and writing Python's "compiled" .pyc files. This is also why its format is tied to the Python version. If you need general-purpose serialization/deserialization, use the pickle module instead.
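To illustrate the .pyc use case, the sketch below compiles a small piece of source into a code object, marshals it to bytes and executes the loaded result. A real .pyc file adds a small header in front of the marshalled code object; that header is not reproduced here, and the function name greet is just an example.

```python
import marshal

# Compile source into a code object, as the interpreter does before writing a .pyc
source = "def greet(name):\n    return 'hello ' + name\n"
code = compile(source, "<example>", "exec")

# Serialize the code object to bytes
raw = marshal.dumps(code)

# Load it back and execute it in a fresh namespace
namespace = {}
exec(marshal.loads(raw), namespace)
print(namespace["greet"]("marshal"))  # hello marshal
```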

Common methods

marshal.dump(value, file[, version])

Serializes value and writes it to an open binary file.

marshal.dumps(value[, version])

Serializes value and returns the result as a bytes object.

marshal.load(file)

Reads one value from an open binary file and deserializes it.

marshal.loads(bytes)

Deserializes a value from a bytes-like object.
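A minimal sketch of the four functions together: dumps/loads work on in-memory bytes, while dump/load need a file opened in binary mode. The temporary file name data.marshal is only an example.

```python
import marshal
import os
import tempfile

data = {"name": "jack", "scores": [90, 85.5], "pair": (1, 2)}

# dumps/loads: serialize to an in-memory bytes object and back
raw = marshal.dumps(data)
assert marshal.loads(raw) == data

# dump/load: the file must be opened in binary mode ("wb" / "rb")
path = os.path.join(tempfile.mkdtemp(), "data.marshal")
with open(path, "wb") as f:
    marshal.dump(data, f)
with open(path, "rb") as f:
    restored = marshal.load(f)

print(restored == data)  # True
```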

0x01 pickle

The pickle module also reads and writes Python objects in a binary format. Compared with the basic serialization that marshal provides, pickle can serialize a much wider range of objects.

pickle's serialized data is specific to Python: other languages such as Java cannot read binary data produced by Python's pickle. If the data must be exchanged with other languages, use json instead, which is described below.

Data types that can be pickled:

  • None, True, and False
  • integers, floating point numbers, complex numbers
  • strings, bytes, bytearrays
  • tuples, lists, sets, and dictionaries containing only picklable objects
  • functions defined at the top level of a module (using def, not lambda)
  • built-in functions defined at the top level of a module
  • classes defined at the top level of a module
  • instances of such classes whose __dict__, or the result of calling __getstate__(), is picklable

Trying to pickle an unsupported object raises PicklingError.
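The sketch below contrasts a top-level function and class instance, which pickle handles, with a lambda, which it cannot look up by name. Functions and classes are stored by reference (module plus name), so the example assumes it runs as a script where they live in __main__. The lambda case typically raises PicklingError; AttributeError is also caught here as a precaution for other Python versions.

```python
import pickle

def square(x):
    """Defined with def at module top level, so pickle can refer to it by name."""
    return x * x

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

# The instance is stored with its __dict__; the function only by reference
p = pickle.loads(pickle.dumps(Point(1, 2)))
f = pickle.loads(pickle.dumps(square))
print(p.x, p.y, f(3))  # 1 2 9

# A lambda has no importable name, so pickling it fails
try:
    pickle.dumps(lambda x: x * x)
    lambda_ok = True
except (pickle.PicklingError, AttributeError) as exc:
    lambda_ok = False
    print("cannot pickle lambda:", exc)
```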

Common methods

pickle.dump(obj, file, protocol=None, *, fix_imports=True)

Serializes obj and writes it to the open file object file; equivalent to Pickler(file, protocol).dump(obj).

pickle.dumps(obj, protocol=None, *, fix_imports=True)

Serializes obj and returns the result as a bytes object instead of writing it to a file.

pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict")

Reads a pickled object from the open file object file and deserializes it; equivalent to Unpickler(file).load().

pickle.loads(bytes_object, *, fix_imports=True, encoding="ASCII", errors="strict")

Deserializes an object from the binary data bytes_object.
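The equivalences stated above can be checked with a small sketch. Here an io.BytesIO stands in for a real binary file so that the example needs no file on disk.

```python
import io
import pickle

data = {"a": [1, 2.0, 3, 4 + 6j], "b": ("character string", b"byte string")}

# dumps/loads work entirely in memory with bytes
raw = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
assert pickle.loads(raw) == data

# dump(obj, file) is equivalent to Pickler(file, protocol).dump(obj)
buf = io.BytesIO()
pickle.Pickler(buf, pickle.HIGHEST_PROTOCOL).dump(data)

# load(file) is equivalent to Unpickler(file).load(); rewind the buffer first
buf.seek(0)
restored = pickle.Unpickler(buf).load()
print(restored == data)  # True
```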

Serialization example

import pickle

# A dictionary that contains only picklable objects
data = {
    'a': [1, 2.0, 3, 4 + 6j],
    'b': ("character string", b"byte string"),
    'c': {None, True, False}
}

with open('data.pickle', 'wb') as f:
    # Serialize the object into the file data.pickle,
    # using the highest protocol version available, pickle.HIGHEST_PROTOCOL
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

After running the script, a data.pickle file appears in the folder:

serialization
├── data.pickle
├── pickles.py
└── unpickles.py

Deserialization example

import pickle

with open('data.pickle', 'rb') as f:
    # Deserialize the object from data.pickle.
    # pickle detects the protocol version of the data automatically,
    # so no version argument is needed here
    data = pickle.load(f)

    print(data)

# Output
# {'a': [1, 2.0, 3, (4+6j)], 'b': ('character string', b'byte string'), 'c': {False, True, None}}
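To see the automatic protocol detection mentioned in the comments, the sketch below pickles the same data with the oldest protocol (0) and with pickle.HIGHEST_PROTOCOL, then loads both without passing any version. The exact byte lengths depend on the Python version, so they are printed rather than asserted.

```python
import pickle

data = {"a": [1, 2.0, 3], "b": ("character string", b"byte string")}

# Serialize the same object with protocol 0 and with the highest protocol
blobs = {proto: pickle.dumps(data, protocol=proto)
         for proto in (0, pickle.HIGHEST_PROTOCOL)}

for proto, blob in blobs.items():
    # loads() takes no protocol argument; the unpickler recognises the format itself
    print(proto, len(blob), pickle.loads(blob) == data)
```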

0x02 json

JSON is a language-independent and very common data interchange format. In Python, the json module offers an API similar to those of marshal and pickle.

Common methods

json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

Serializes obj as JSON and writes it to the file-like object fp.

json.dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

Serializes obj to a JSON-formatted str.

json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)

Deserializes an object from a file-like object fp that contains a JSON document.

json.loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)

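Deserializes a str, bytes or bytearray containing a JSON document into a Python object. (The encoding parameter shown above has been ignored since Python 3.1 and was removed in Python 3.9.)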
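A few of the keyword arguments in the signatures above change the output noticeably. This sketch shows ensure_ascii, separators and indent on a small dictionary; the Chinese name is just sample data.

```python
import json

data = {"name": "杰克", "langs": ["Python", "Go"]}

# Default: non-ASCII characters are escaped as \uXXXX
print(json.dumps(data))
# ensure_ascii=False keeps the characters as they are
print(json.dumps(data, ensure_ascii=False))
# separators=(",", ":") drops the spaces for the most compact output
compact = json.dumps(data, ensure_ascii=False, separators=(",", ":"))
print(compact)  # {"name":"杰克","langs":["Python","Go"]}
# indent pretty-prints with the given number of spaces per nesting level
print(json.dumps(data, ensure_ascii=False, indent=2))
```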

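Conversion table from Python objects to JSON (used by dump/dumps). When decoding, a JSON array becomes a list and a JSON number becomes an int or a float.

Python                                    JSON
dict                                      object
list, tuple                               array
str                                       string
int, float, int- & float-derived Enums    number
True                                      true
False                                     false
None                                      null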

For basic types, and for lists, tuples and dicts built from basic types, json handles serialization well. Note that sets are not in the table above, so json cannot serialize them.
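The table also means a JSON round trip is not always lossless. In this sketch a tuple comes back as a list, an integer dictionary key comes back as a string, and a set is rejected with TypeError.

```python
import json

data = {1: ("a", "b"), "flag": True}

text = json.dumps(data)
print(text)              # {"1": ["a", "b"], "flag": true}
print(json.loads(text))  # {'1': ['a', 'b'], 'flag': True}

# A set has no JSON counterpart, so json refuses it
try:
    json.dumps({"tags": {"x", "y"}})
    set_ok = True
except TypeError as exc:
    set_ok = False
    print(exc)           # Object of type set is not JSON serializable
```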

Serialization examples

>>> import json
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
'["foo", {"bar": ["baz", null, 1.0, 2]}]'
>>> print(json.dumps("\"foo\bar"))
"\"foo\bar"
>>> print(json.dumps('\u1234'))
"\u1234"
>>> print(json.dumps('\\'))
"\\"
>>> print(json.dumps({"c": 0, "b": 0, "a": 0}, sort_keys=True))
{"a": 0, "b": 0, "c": 0}
>>> from io import StringIO
>>> io = StringIO()
>>> json.dump(['streaming API'], io)
>>> io.getvalue()
'["streaming API"]'

Deserialization examples

>>> import json
>>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]')
['foo', {'bar': ['baz', None, 1.0, 2]}]
>>> json.loads('"\\"foo\\bar"')
'"foo\x08ar'
>>> from io import StringIO
>>> io = StringIO('["streaming API"]')
>>> json.load(io)
['streaming API']
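Deserialization can also be tuned with the parse_* hooks from the json.loads signature. As a small sketch, parse_float receives the original number text, so passing decimal.Decimal keeps a price exact instead of turning it into a float.

```python
import json
from decimal import Decimal

text = '{"price": 19.99, "qty": 3}'

# Default: JSON numbers become float and int
print(json.loads(text))                       # {'price': 19.99, 'qty': 3}
# parse_float is called with the number's string, e.g. Decimal("19.99")
exact = json.loads(text, parse_float=Decimal)
print(exact)                                  # {'price': Decimal('19.99'), 'qty': 3}
```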

More complex cases with JSON objects

For example, the following JSON document describes a complex number:

complex_data.json

{
  "__complex__": true,
  "real": 42,
  "imaginary": 36
}

To deserialize this JSON document into a Python complex object, we need to define a conversion function:

# coding=utf-8
import json

# Conversion function: turn a JSON object marked with "__complex__" into a complex number
def decode_complex(dct):
    if "__complex__" in dct:
        return complex(dct["real"], dct["imaginary"])
    else:
        return dct

if __name__ == '__main__':
    with open("complex_data.json") as complex_data:
        # object_hook specifies the conversion function
        z = json.load(complex_data, object_hook=decode_complex)
        print(type(z))
        print(z)

# Output
# <class 'complex'>
# (42+36j)

If object_hook is not specified, json turns a JSON object into a dict by default:

# coding=utf-8
import json

if __name__ == '__main__':

    with open("complex_data.json") as complex_data:
        # No object_hook this time
        z2 = json.loads(complex_data.read())
        print(type(z2))
        print(z2)
# Output
# <class 'dict'>
# {'__complex__': True, 'real': 42, 'imaginary': 36}

As we can see, the JSON object has been turned into a dict.
This is usually fine, but when a scenario requires well-defined types, you need an explicit conversion step such as object_hook.
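As a sketch of such a conversion, the object_hook below turns JSON objects with exactly the fields uid and name into instances of a hypothetical User dataclass, and leaves every other object as a dict. The hook is applied to the innermost objects first, so the outer object stays a dict holding User instances.

```python
import json
from dataclasses import dataclass

@dataclass
class User:  # hypothetical target type for this example
    uid: int
    name: str

def decode_user(dct):
    # Only objects with exactly the User fields are converted; others stay dicts
    if dct.keys() == {"uid", "name"}:
        return User(**dct)
    return dct

text = '{"users": [{"uid": 124, "name": "jack"}], "total": 1}'
result = json.loads(text, object_hook=decode_user)
print(result)  # {'users': [User(uid=124, name='jack')], 'total': 1}
```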

For the opposite direction, encoding, you can subclass json.JSONEncoder and override its default() method:

import json

class ComplexEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, complex):
            # Encode a complex number as a [real, imag] list
            return [obj.real, obj.imag]
        # Default handling: the base class raises TypeError
        return json.JSONEncoder.default(self, obj)

if __name__ == '__main__':
    c = json.dumps(2 + 1j, cls=ComplexEncoder)
    print(type(c))
    print(c)

# Output
# <class 'str'>
# [2.0, 1.0]
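Encoding to a plain [real, imag] list cannot be decoded back automatically. A sketch of one way to get a full round trip is to let the encoder emit the same tagged layout as complex_data.json, so that the decode_complex hook from earlier can restore it (both functions are repeated here to keep the example self-contained).

```python
import json

class ComplexEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, complex):
            # Same tagged layout as complex_data.json
            return {"__complex__": True, "real": obj.real, "imaginary": obj.imag}
        return super().default(obj)

def decode_complex(dct):
    if "__complex__" in dct:
        return complex(dct["real"], dct["imaginary"])
    return dct

text = json.dumps({"z": 42 + 36j}, cls=ComplexEncoder)
print(text)  # {"z": {"__complex__": true, "real": 42.0, "imaginary": 36.0}}
back = json.loads(text, object_hook=decode_complex)
print(back)  # {'z': (42+36j)}
```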

The json module cannot serialize every type automatically; for a type it does not support, it raises TypeError directly.

>>> import datetime
>>> d = datetime.datetime.now()
>>> dct = {'birthday':d,'uid':124,'name':'jack'}
>>> dct
{'birthday': datetime.datetime(2019, 6, 14, 11, 16, 17, 434361), 'uid': 124, 'name': 'jack'}
>>> json.dumps(dct)
Traceback (most recent call last):
  File "<pyshell#19>", line 1, in <module>
    json.dumps(dct)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type datetime is not JSON serializable

For types json does not support, such as datetime and user-defined classes, you need to define the conversion logic in a JSONEncoder subclass.

import json
import datetime

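# JSONEncoder for date and datetime values
# (datetime.datetime is checked first because it is a subclass of datetime.date)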
class DatetimeEncoder(json.JSONEncoder):

    def default(self, obj):
        if isinstance(obj, datetime.datetime):
            return obj.strftime('%Y-%m-%d %H:%M:%S')
        elif isinstance(obj, datetime.date):
            return obj.strftime('%Y-%m-%d')
        else:
            return json.JSONEncoder.default(self, obj)

if __name__ == '__main__':
    d = datetime.date.today()
    dct = {"birthday": d, "name": "jack"}
    data = json.dumps(dct, cls=DatetimeEncoder)
    print(data)

# Output
# {"birthday": "2019-06-14", "name": "jack"}
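For a one-off conversion, json.dumps also accepts a plain function through its default parameter instead of a JSONEncoder subclass. This is a sketch of that alternative; note it uses isoformat(), so a datetime would be written with a "T" separator rather than the space used by the strftime format above.

```python
import json
import datetime

def encode_dates(obj):
    # Called by json.dumps only for objects it cannot serialize itself.
    # datetime.datetime is a subclass of datetime.date, so this check covers both.
    if isinstance(obj, datetime.date):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

dct = {"birthday": datetime.date(2019, 6, 14), "name": "jack"}
text = json.dumps(dct, default=encode_dates)
print(text)  # {"birthday": "2019-06-14", "name": "jack"}
```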

Going the other way, to turn the date strings in the JSON document back into datetime.date objects during deserialization, we need json.JSONDecoder:

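# coding=utf-8
import json
import datetime

# JSONEncoder for date values (the same class as in the previous example)
class DatetimeEncoder(json.JSONEncoder):

    def default(self, obj):
        if isinstance(obj, datetime.datetime):
            return obj.strftime('%Y-%m-%d %H:%M:%S')
        elif isinstance(obj, datetime.date):
            return obj.strftime('%Y-%m-%d')
        else:
            return json.JSONEncoder.default(self, obj)

# Decoder that turns "YYYY-MM-DD" strings back into datetime.date objects
class DatetimeDecoder(json.JSONDecoder):

    # Constructor: install dict2obj as the object_hook and pass any
    # other keyword options through to json.JSONDecoder
    def __init__(self, **kwargs):
        super().__init__(object_hook=self.dict2obj, **kwargs)

    def dict2obj(self, d):
        # object_hook is called once for every decoded JSON object (a dict)
        for k, v in d.items():
            if isinstance(v, str):
                # Parse strings that look like dates; leave other strings unchanged
                try:
                    d[k] = datetime.datetime.strptime(v, '%Y-%m-%d').date()
                except ValueError:
                    pass
        return d

if __name__ == '__main__':
    d = datetime.date.today()
    dct = {"birthday": d, "name": "jack"}
    data = json.dumps(dct, cls=DatetimeEncoder)
    print(data)

    obj = json.loads(data, cls=DatetimeDecoder)
    print(type(obj))
    print(obj)

# Output
# {"birthday": "2019-06-14", "name": "jack"}
# <class 'dict'>
# {'birthday': datetime.date(2019, 6, 14), 'name': 'jack'}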
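Parsing every string that happens to look like a date can convert values that were meant to stay text. A more conservative sketch, using a plain object_hook function instead of a JSONDecoder subclass, only converts keys listed in a hypothetical DATE_FIELDS set and relies on datetime.date.fromisoformat (Python 3.7+).

```python
import json
import datetime

DATE_FIELDS = {"birthday"}  # hypothetical: the keys known to hold dates

def decode_dates(dct):
    # Convert only the listed keys; every other value is left untouched
    for key in DATE_FIELDS:
        if isinstance(dct.get(key), str):
            dct[key] = datetime.date.fromisoformat(dct[key])
    return dct

data = '{"birthday": "2019-06-14", "name": "jack", "note": "2019-06-15"}'
obj = json.loads(data, object_hook=decode_dates)
print(obj)
# {'birthday': datetime.date(2019, 6, 14), 'name': 'jack', 'note': '2019-06-15'}
```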

0x03 Summary

The common serialization modules in Python are marshal, pickle and json. marshal is mainly used for Python's .pyc files, its format is tied to the Python version, and it cannot serialize instances of user-defined classes.
pickle is Python's general-purpose object serialization tool. It supports many more types than marshal, and its format stays backward compatible across Python versions as long as a compatible protocol is chosen. json is a language-independent data format, widely used in all kinds of network applications, especially for data exchange with REST API services.
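The difference in coverage can be seen by handing the same instance of a user-defined class to all three modules. This sketch assumes it runs as a script, since pickle stores the class by reference to __main__.Point; the class Point and the helper try_dump are only for illustration.

```python
import json
import marshal
import pickle

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def try_dump(name, dump):
    # Report whether the given dumps() function accepts a Point instance
    try:
        dump(Point(1, 2))
        return f"{name}: ok"
    except (ValueError, TypeError) as exc:
        return f"{name}: {type(exc).__name__}"

results = [try_dump(name, dump) for name, dump in
           (("marshal", marshal.dumps), ("pickle", pickle.dumps), ("json", json.dumps))]
print(results)  # ['marshal: ValueError', 'pickle: ok', 'json: TypeError']
```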

0x04 learning materials


Source: www.cnblogs.com/angrycode/p/11416092.html