The pickle module in detail
The pickle module implements binary protocols for serializing and deserializing Python object structures. "Pickling" is the process of converting a Python object hierarchy into a byte stream; "unpickling" is the reverse operation, which converts a byte stream (from a binary file or a bytes-like object) back into an object hierarchy. The pickle module is not secure against erroneous or maliciously constructed data, so only unpickle data you trust.
Differences between the pickle protocol and JSON (JavaScript Object Notation):
1. JSON is a text serialization format (it outputs unicode text, although most of the time it is then encoded to utf-8), while pickle is a binary serialization format;
2. JSON is human-readable, while pickle is not;
3. JSON is interoperable and widely used outside the Python ecosystem, while pickle is Python-specific;
4. By default, JSON can represent only a subset of Python's built-in types and no custom classes; pickle can represent an extremely large number of Python types (many of them automatically, through clever use of Python's introspection facilities; complex cases can be handled by implementing specific object APIs).
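The type-coverage difference can be seen with a small experiment. This is only an illustrative sketch: the sample dictionary is made up for the example, and it uses a set and a datetime.date instance as stand-ins for "types JSON does not handle by default".

```python
import datetime
import json
import pickle

# Made-up sample data: a tuple, a set and a date are not JSON types.
sample = {
    'pair': (1, 2),
    'tags': {'a', 'b'},
    'when': datetime.date(2020, 1, 1),
}

# pickle produces bytes and restores every value with its original type.
pickled = pickle.dumps(sample)
restored = pickle.loads(pickled)
print(type(pickled).__name__, restored == sample)

# json produces text, and refuses the set and the date by default.
try:
    json.dumps(sample)
    json_ok = True
except TypeError:
    json_ok = False
print(json_ok)

# Even for supported data, JSON loses the tuple/list distinction.
pair_via_json = json.loads(json.dumps({'pair': (1, 2)}))['pair']
print(type(pair_via_json).__name__)
```

A custom encoder can teach json about extra types, but that has to be written by hand for each type.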
The data format used by pickle is specific to Python. The advantage is that it is free of restrictions imposed by external standards such as JSON or XDR (which cannot represent pointer sharing); the drawback is that non-Python programs may not be able to reconstruct pickled Python objects.
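"Pointer sharing" here means that two references to the same object are still the same object after a round trip. A minimal sketch, with variable names chosen only for this example:

```python
import json
import pickle

shared = [1, 2, 3]
container = {'first': shared, 'second': shared}

via_pickle = pickle.loads(pickle.dumps(container))
via_json = json.loads(json.dumps(container))

# pickle restores one list that is referenced twice.
print(via_pickle['first'] is via_pickle['second'])

# json restores two independent lists with equal contents.
print(via_json['first'] is via_json['second'])
```

With pickle, appending to `via_pickle['first']` is therefore visible through `via_pickle['second']` as well, just as in the original structure.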
By default, the pickle data format uses a relatively compact binary representation. If you need optimal size characteristics, you can efficiently compress pickled data.
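One way to compress pickled data is to pass the bytes through a standard compression module. The sketch below uses zlib; that choice and the sample data are assumptions made for illustration, and gzip, bz2 or lzma would work in the same manner.

```python
import pickle
import zlib

# Made-up sample data with plenty of repetition.
data = {'values': list(range(1000)), 'label': 'example' * 50}

raw = pickle.dumps(data)
packed = zlib.compress(raw)
print(len(raw), len(packed))

# Decompress first, then unpickle.
restored = pickle.loads(zlib.decompress(packed))
print(restored == data)
```

How much space is saved depends entirely on the data; highly repetitive structures shrink the most.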
Module interface
To serialize an object hierarchy, you simply call the dumps() function. Similarly, to deserialize a data stream, you call the loads() function. However, if you want more control over serialization and deserialization, you can create a Pickler or an Unpickler object, respectively.
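Both styles can be compared side by side. This is a minimal sketch that uses an in-memory io.BytesIO buffer in place of a real file; the variable names are illustrative only.

```python
import io
import pickle

record = {'a': 123, 'b': 'ads', 'c': [[1, 2], [3, 4]]}

# One-shot helpers: object -> bytes -> object.
blob = pickle.dumps(record)
copy1 = pickle.loads(blob)

# Explicit Pickler / Unpickler objects wrapped around a binary buffer.
buffer = io.BytesIO()
pickler = pickle.Pickler(buffer, protocol=pickle.HIGHEST_PROTOCOL)
pickler.dump(record)

buffer.seek(0)  # rewind before reading back
copy2 = pickle.Unpickler(buffer).load()

print(copy1 == record, copy2 == record)
```

The explicit objects are mainly useful when you want to subclass them or reuse one pickler for several dump() calls.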
The pickle module provides the following constants:
- pickle.HIGHEST_PROTOCOL
  An integer, the highest protocol version available. This value can be passed as the protocol argument to the functions dump() and dumps() as well as to the Pickler constructor.
- pickle.DEFAULT_PROTOCOL
  An integer, the default protocol version used for pickling. It may be lower than HIGHEST_PROTOCOL. In Python 3.0 through 3.7 the default protocol is 3, a protocol designed for Python 3; later releases use a higher default.
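The actual values depend on the interpreter, so it is easiest to print them. The sketch below also relies on a detail of the stream format: for protocol 2 and above, a pickle starts with the byte 0x80 followed by the protocol number.

```python
import pickle

print(pickle.HIGHEST_PROTOCOL, pickle.DEFAULT_PROTOCOL)

value = ['spam', 42]
default_blob = pickle.dumps(value)
highest_blob = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)

# Byte 0 is the PROTO opcode (0x80), byte 1 is the protocol version.
print(default_blob[1], highest_blob[1])
```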
The pickle module provides the following functions to make the pickling process more convenient:
- pickle.dump(obj, file, protocol=None, *, fix_imports=True)
  Writes the pickled representation of the object obj to the open file object file. This is equivalent to Pickler(file, protocol).dump(obj).
  The optional protocol argument is an integer that tells the pickler which protocol version to use; supported protocols are 0 to HIGHEST_PROTOCOL. If it is not specified, the default is DEFAULT_PROTOCOL. If a negative number is specified, HIGHEST_PROTOCOL is selected.
  The file argument must have a write() method that accepts a single bytes argument. It can therefore be an on-disk file opened for binary writing, an io.BytesIO instance, or any other custom object that meets this interface.
  If fix_imports is true and protocol is less than 3, pickle will try to map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream can be read by Python 2.
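The file requirement is small enough that a hand-written object can satisfy it. In the sketch below, ChunkCollector is a hypothetical class invented for this example, not part of pickle; the same snippet also shows a negative protocol and protocol 0.

```python
import io
import pickle


class ChunkCollector:
    """Hypothetical sink: the only thing pickle needs is write(data)."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)


obj = {'a': 123, 'c': [[1, 2], [3, 4]]}

# A negative protocol selects HIGHEST_PROTOCOL.
sink = ChunkCollector()
pickle.dump(obj, sink, protocol=-1)
payload = b''.join(sink.chunks)

# io.BytesIO works as a ready-made in-memory file; protocol 0 is the oldest one.
buf = io.BytesIO()
pickle.dump(obj, buf, protocol=0)

print(pickle.loads(payload) == obj, pickle.loads(buf.getvalue()) == obj)
```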
- pickle.dumps(obj, protocol=None, *, fix_imports=True)
  Returns the pickled representation of the object as a bytes object instead of writing it to a file.
  The protocol and fix_imports arguments have the same meaning as in dump().
- pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict")
  Reads a pickled object representation from the open file object file and returns the reconstructed object hierarchy. This is equivalent to Unpickler(file).load().
  The pickle protocol version is detected automatically, so no protocol argument is needed. Bytes past the end of the pickled object's representation are ignored.
  The file argument must have two methods: a read() method that takes an integer argument, and a readline() method that takes no arguments. Both methods should return bytes. It can therefore be an on-disk file opened for binary reading, an io.BytesIO object, or any other custom object that meets this interface.
  The optional keyword arguments fix_imports, encoding and errors control compatibility support for pickle streams generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors arguments tell pickle how to decode 8-bit string instances pickled by Python 2; they default to 'ASCII' and 'strict', respectively. The encoding can be 'bytes' to read these 8-bit string instances as bytes objects. Using encoding='latin1' is required for unpickling NumPy arrays and instances of datetime, date and time pickled by Python 2.
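The effect of the encoding argument can be tried without a Python 2 interpreter. The byte string below is a hand-written protocol 0 stream, assumed here to be equivalent to what Python 2 would emit for the 8-bit str '\xe9'; treat it as an illustration rather than a captured Python 2 pickle.

```python
import io
import pickle

# Hand-written protocol 0 stream for the one-byte Python 2 str '\xe9'.
py2_stream = b"S'\\xe9'\n."

# latin1 maps every byte to the code point with the same number.
as_text = pickle.load(io.BytesIO(py2_stream), encoding='latin1')

# 'bytes' leaves the 8-bit string undecoded.
as_bytes = pickle.load(io.BytesIO(py2_stream), encoding='bytes')

# The default ('ASCII', 'strict') cannot decode a byte above 0x7f.
try:
    pickle.load(io.BytesIO(py2_stream))
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(repr(as_text), repr(as_bytes), ascii_ok)
```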
- pickle.loads(bytes_object, *, fix_imports=True, encoding="ASCII", errors="strict")
  Reads a pickled object hierarchy from a bytes object and returns the reconstructed object hierarchy.
  The pickle protocol version is detected automatically, so no protocol argument is needed. Bytes past the end of the pickled object's representation are ignored.
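The remark about extra bytes is easy to check. A minimal sketch with made-up trailing data:

```python
import pickle

blob = pickle.dumps({'a': 123})

# Anything after the end of the pickled object is not looked at.
padded = blob + b'trailing bytes that are not part of the pickle'
result = pickle.loads(padded)

print(result)
```

This also means loads() will not warn you if a buffer contains more than one pickled object; only the first one is returned.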
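A complete example that writes an object to a file and reads it back:

    import pickle

    if __name__ == '__main__':
        path = 'test'
        data = {'a': 123, 'b': 'ads', 'c': [[1, 2], [3, 4]]}

        # Write the object to a file opened in binary mode.
        with open(path, 'wb') as f:
            pickle.dump(data, f)

        # Read it back; the protocol version is detected automatically.
        with open(path, 'rb') as f1:
            data1 = pickle.load(f1)
        print(data1)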
Datasets distributed in Python (pickled) format can be loaded with pickle. The following example reads and loads the CIFAR-10 dataset:
    import pickle

    import matplotlib.pyplot as plt
    import numpy as np

    # Raw strings keep the Windows backslashes literal.
    path1 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_1'
    path2 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_2'
    path3 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_3'
    path4 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_4'
    path5 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_5'
    path6 = r'D:\tmp\cifar10_data\cifar-10-batches-py\test_batch'

    if __name__ == '__main__':
        with open(path1, 'rb') as fo:
            # encoding='bytes' leaves Python 2 strings as bytes,
            # so the dictionary keys are bytes objects as well.
            data = pickle.load(fo, encoding='bytes')

        # Keys: b'batch_label', b'labels', b'data', b'filenames'
        print(data[b'data'].shape)

        # Each row is one flattened image: 3 channels of 32 x 32 pixels.
        images = np.array(data[b'data']).reshape([-1, 3, 32, 32])
        print(images.shape)

        # Take the image at index 5 and move the channel axis last,
        # which is the (height, width, channel) layout imshow expects.
        img = images[5].transpose(1, 2, 0)
        print(img.shape)

        plt.imshow(img)
        plt.axis('off')
        plt.show()
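Running the script prints the array shapes and displays the selected image. After that, the data can be read in for training.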