Python Object Serialization with pickle

This article is mainly a set of reading notes on the official pickle documentation.

The pickle module implements binary protocols for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” [1] or “flattening”; however, to avoid confusion, the terms used here are “pickling” and “unpickling”.

pickle is part of the Python standard library, so nothing extra needs to be installed.

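pickle serializes and deserializes a Python object structure. In essence it is a way of storing data: Python data structures are saved in a specific format. Note that with the default binary protocols, pickled data is not human-readable.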

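A side note on English naming habits: "pickle" means to preserve food in brine, so "pickling" a Python object is simply a kind of data processing. The exact rules of that processing are not covered further here.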

“Pickling” converts a hierarchical Python object into a byte stream; “unpickling” is the reverse process.
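The round trip described above can be sketched in a few lines. The `config` dictionary is an illustrative value, not something taken from the documentation.

```python
import pickle

# A small nested object hierarchy (illustrative data only).
config = {"name": "demo", "sizes": [1, 2, 3], "nested": {"ok": True, "ratio": 0.5}}

# Pickling: object hierarchy -> byte stream.
payload = pickle.dumps(config)

# Unpickling: byte stream -> a new, equal object hierarchy.
restored = pickle.loads(payload)

print(type(payload))        # the pickled form is a bytes object
print(restored == config)   # same content
print(restored is config)   # but a separate object
```

The restored object compares equal to the original but is a distinct copy, which is why pickling is also a common (if heavy) way to deep-copy data.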

Note: “pickling”, “serialization”, “marshalling” and “flattening” all express the same idea and can simply be read as “serialization”; with the prefix “un” in front, the word means “deserialization”.

Warning：The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

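In other words, pickle offers no protection against erroneous or maliciously constructed data. Unpickling can execute arbitrary code, so never unpickle data that comes from an untrusted or unauthenticated source.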
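To make the warning concrete, here is a deliberately harmless sketch. The `Payload` class and the `restricted_loads` helper are hypothetical names invented for this illustration; the `find_class` override follows the pattern the pickle documentation describes for restricting globals.

```python
import io
import pickle


class Payload:
    """Hypothetical object whose pickle asks the loader to call eval()."""

    def __reduce__(self):
        # On unpickling, pickle calls eval("1 + 1"). An attacker could
        # put any callable and any arguments here instead.
        return (eval, ("1 + 1",))


malicious_bytes = pickle.dumps(Payload())

# Loading does not give back a Payload; it runs the embedded call.
result = pickle.loads(malicious_bytes)
print(result)


class RestrictedUnpickler(pickle.Unpickler):
    """Refuses every global lookup, so no callable can be resolved."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Hypothetical helper: unpickle plain data structures only."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


try:
    restricted_loads(malicious_bytes)
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(blocked)
print(restricted_loads(pickle.dumps([1, 2, "a"])))  # plain data still loads
```

A restricted unpickler like this narrows the attack surface, but the safest policy remains the one in the warning: do not unpickle untrusted data at all.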

Data stream format

The data format used by pickle is Python-specific. This has the advantage that there are no restrictions imposed by external standards such as JSON or XDR (which can’t represent pointer sharing); however it means that non-Python programs may not be able to reconstruct pickled Python objects.

The data format used by pickle is specific to Python. Programs written in other languages may not be able to reconstruct pickled objects.
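The remark about pointer sharing can be checked with a small comparison against the standard `json` module. The variable names are illustrative.

```python
import json
import pickle

# The same list object is referenced twice.
shared = [1, 2]
data = {"a": shared, "b": shared}

from_pickle = pickle.loads(pickle.dumps(data))
from_json = json.loads(json.dumps(data))

# pickle remembers that both keys pointed at one object.
print(from_pickle["a"] is from_pickle["b"])

# JSON writes the list out twice, so the link is lost.
print(from_json["a"] is from_json["b"])
```

After the pickle round trip, appending to `from_pickle["a"]` also changes `from_pickle["b"]`, exactly as in the original structure; after the JSON round trip the two lists are independent.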

By default, the pickle data format uses a relatively compact binary representation. If you need optimal size characteristics, you can efficiently compress pickled data.

By default, pickle uses a relatively compact binary representation. If size matters further, the pickled data can be compressed.
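As a minimal sketch of compressing pickled data, the snippet below uses the standard `zlib` module; `gzip`, `bz2` or `lzma` would work the same way. The sample records are made up and intentionally repetitive, so the saving shown here is larger than real data would usually give.

```python
import pickle
import zlib

# Illustrative, highly repetitive data.
records = ["record-%d" % (i % 10) for i in range(5000)]

raw = pickle.dumps(records)
compressed = zlib.compress(raw)

print(len(raw), len(compressed))

# Decompress first, then unpickle.
restored = pickle.loads(zlib.decompress(compressed))
print(restored == records)
```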

The module pickletools contains tools for analyzing data streams generated by pickle. pickletools source code has extensive comments about opcodes used by pickle protocols.

pickletools contains tools for analyzing data streams produced by pickle.
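A short sketch of what that analysis looks like: `pickletools.genops` iterates over the opcodes in a pickle, and `pickletools.dis` prints a symbolic disassembly. Protocol 2 is chosen here only to keep the output small.

```python
import io
import pickle
import pickletools

data = pickle.dumps([1, 2], protocol=2)

# genops yields (opcode, argument, position) triples.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
print(opcode_names)

# dis writes a readable listing; capture it in a string buffer.
listing = io.StringIO()
pickletools.dis(data, out=listing)
print(listing.getvalue())
```

Disassembling a pickle with `pickletools.dis` does not execute it, which makes it a reasonable way to inspect a stream of unknown origin.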

There are currently 5 different protocols which can be used for pickling. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.

Protocol version 0 is the original “human-readable” protocol and is backwards compatible with earlier versions of Python.
Protocol version 1 is an old binary format which is also compatible with earlier versions of Python.
Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.
Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. This is the default protocol, and the recommended protocol when compatibility with other Python 3 versions is required.
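Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. Refer to PEP 3154 for information about improvements brought by protocol 4.

The list above reflects the documentation for Python 3.4 to 3.7. Starting with Python 3.8 there is also a protocol version 5 (see PEP 574), and protocol 4 became the default.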
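The protocol is selected with the `protocol` argument of `pickle.dumps` and `pickle.dump`. The sketch below pickles one illustrative value with every protocol the running interpreter supports; the reported default and highest versions depend on the Python version in use.

```python
import pickle

sample = {"name": "pickle", "versions": [0, 1, 2]}

print("default:", pickle.DEFAULT_PROTOCOL)
print("highest:", pickle.HIGHEST_PROTOCOL)

sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(sample, protocol=protocol)
    # Every protocol must round-trip to an equal object.
    assert pickle.loads(data) == sample
    sizes[protocol] = len(data)

print(sizes)

# Protocol 0 is the text-based one: for this sample it is plain ASCII.
text_form = pickle.dumps(sample, protocol=0).decode("ascii")
print(text_form)
```

`pickle.loads` needs no protocol argument because the protocol is detected from the stream itself.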

Note：

Serialization is a more primitive notion than persistence; although pickle reads and writes file objects, it does not handle the issue of naming persistent objects, nor the (even more complicated) issue of concurrent access to persistent objects. The pickle module can transform a complex object into a byte stream and it can transform the byte stream into an object with the same internal structure. Perhaps the most obvious thing to do with these byte streams is to write them onto a file, but it is also conceivable to send them across a network or store them in a database. The shelve module provides a simple interface to pickle and unpickle objects on DBM-style database files.
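The two storage options mentioned in the note, a plain file and a shelve database, can be sketched as follows. The file names and the `scores` data are illustrative, and a temporary directory is used so that nothing is left behind.

```python
import os
import pickle
import shelve
import tempfile

scores = {"alice": [90, 85], "bob": [72]}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Plain file: pickle.dump writes to a binary file object.
    file_path = os.path.join(tmp_dir, "scores.pkl")
    with open(file_path, "wb") as f:
        pickle.dump(scores, f)
    with open(file_path, "rb") as f:
        from_file = pickle.load(f)

    # shelve: a dict-like store whose values are pickled for you.
    shelf_path = os.path.join(tmp_dir, "scores_shelf")
    with shelve.open(shelf_path) as shelf:
        shelf["scores"] = scores
    with shelve.open(shelf_path) as shelf:
        from_shelf = shelf["scores"]

print(from_file == scores)
print(from_shelf == scores)
```

Neither option addresses concurrent access, which matches the note: pickle and shelve handle the byte stream, not the coordination between readers and writers.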
