The Python pickle module in detail

The pickle module implements binary protocols for serializing and deserializing Python object structures. "Pickling" is the process of converting a Python object hierarchy into a byte stream; "unpickling" is the reverse operation, which converts a byte stream (from a binary file or a bytes-like object) back into an object hierarchy. The pickle module is not secure against erroneous or maliciously constructed data, so never unpickle data received from an untrusted source.
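To make the security warning concrete, here is a harmless sketch of why unpickling untrusted data is dangerous: a pickle can name any importable callable together with the arguments to call it with. The Demo class is a hypothetical illustration that is not part of the original article, and len stands in for what could be an arbitrary function in a hostile payload.

```python
import pickle


class Demo:
    """Hypothetical class whose pickle tells the loader to call len('abc')."""

    def __reduce__(self):
        # (callable, args): the unpickler calls len('abc') when loading.
        return (len, ('abc',))


payload = pickle.dumps(Demo())

# Loading does not rebuild a Demo instance; it runs the callable named in the data.
result = pickle.loads(payload)
print(result)
```

The loader never checks whether the callable is safe, which is why the data source has to be trusted.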

Differences between the pickle protocols and JSON (JavaScript Object Notation):

  1. JSON is a text serialization format (it outputs unicode text, although most of the time it is then encoded to utf-8), while pickle is a binary serialization format;

  2. JSON is human-readable, while pickle is not;

  3. JSON is interoperable and widely used outside the Python ecosystem, while pickle is Python-specific;

By default, JSON can represent only a subset of Python's built-in types and no custom classes; pickle can represent an extremely large number of Python types (many of them automatically, through clever use of Python's introspection facilities; complex cases can be handled by implementing specific object APIs).
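The difference in type coverage can be sketched as follows. The example uses fractions.Fraction and a set as stand-ins for types that JSON cannot encode by default; the variable names are illustrative only.

```python
import json
import pickle
from fractions import Fraction

# A Fraction and a set: neither has a default JSON representation.
value = {'ratio': Fraction(1, 3), 'tags': {'a', 'b'}}

try:
    json.dumps(value)
    json_ok = True
except TypeError:
    # json raises TypeError for objects it does not know how to encode.
    json_ok = False

# pickle handles both types without any extra code.
restored = pickle.loads(pickle.dumps(value))
print(json_ok, restored == value)
```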

The pickle data format is specific to Python. The advantage is that no restrictions are imposed by external standards such as JSON or XDR (which cannot represent pointer sharing); the drawback is that non-Python programs may not be able to reconstruct pickled Python objects.
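Pointer sharing means that two references to the same object are still references to one object after the round trip. A minimal sketch, with illustrative variable names, comparing pickle with JSON:

```python
import json
import pickle

shared = [1, 2, 3]
data = {'first': shared, 'second': shared}  # both keys point at one list

via_pickle = pickle.loads(pickle.dumps(data))
via_json = json.loads(json.dumps(data))

# pickle records that the list was shared; JSON writes it out twice.
pickle_shares = via_pickle['first'] is via_pickle['second']
json_shares = via_json['first'] is via_json['second']
print(pickle_shares, json_shares)
```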

By default, the pickle data format uses a relatively compact binary representation. If you need optimal size characteristics, you can efficiently compress pickled data.
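One possible way to compress pickled data is to pass the bytes through zlib from the standard library. This is only a sketch; the choice of zlib and the sample data are assumptions, and gzip, bz2, or lzma would work the same way.

```python
import pickle
import zlib

# Repetitive sample data, chosen so that compression has something to work with.
data = {'readings': [i % 10 for i in range(10_000)]}

raw = pickle.dumps(data)
packed = zlib.compress(raw)

# Decompress first, then unpickle.
restored = pickle.loads(zlib.decompress(packed))
print(len(raw), len(packed))
```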

Module Interface

To serialize an object hierarchy, simply call the dumps() function. Similarly, to deserialize a data stream, call the loads() function. If you want more control over serialization and deserialization, you can create a Pickler or an Unpickler object instead.
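Both routes can be sketched side by side. The sample dictionary is the one used later in the article; the in-memory io.BytesIO buffer is an assumption made to keep the example self-contained.

```python
import io
import pickle

data = {'a': 123, 'b': 'ads', 'c': [[1, 2], [3, 4]]}

# One-shot functions: object -> bytes -> object.
blob = pickle.dumps(data)
copy1 = pickle.loads(blob)

# Explicit Pickler/Unpickler objects bound to a binary stream.
buffer = io.BytesIO()
pickle.Pickler(buffer).dump(data)
buffer.seek(0)  # rewind before reading back
copy2 = pickle.Unpickler(buffer).load()

print(copy1 == data, copy2 == data)
```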

The pickle module provides the following constants:

pickle.HIGHEST_PROTOCOL

An integer, the highest protocol version available. This value can be passed as the protocol argument to dump() and dumps() as well as to the Pickler constructor.

pickle.DEFAULT_PROTOCOL

An integer, the default protocol version used for pickling. It may be lower than HIGHEST_PROTOCOL. In Python 3.0 through 3.7 the default protocol is 3, a protocol designed for Python 3; Python 3.8 raised the default to 4.
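A quick way to check both constants on your own interpreter is sketched below. It relies on the fact that a pickle written with protocol 2 or later starts with the PROTO opcode (byte 0x80) followed by the protocol number; the variable names are illustrative.

```python
import pickle

print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

default_blob = pickle.dumps([1, 2, 3])
highest_blob = pickle.dumps([1, 2, 3], protocol=pickle.HIGHEST_PROTOCOL)

# Byte 0 is the PROTO opcode, byte 1 is the protocol version that was used.
default_version = default_blob[1]
highest_version = highest_blob[1]
print(default_version, highest_version)
```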

The pickle module provides the following functions to make the pickling process more convenient:

pickle.dump(obj, file, protocol=None, *, fix_imports=True)

Writes the pickled representation of the object obj to the open file object file. This is equivalent to Pickler(file, protocol).dump(obj).

The optional protocol argument is an integer that tells the pickler which protocol version to use; the supported protocols are 0 to HIGHEST_PROTOCOL. If it is not specified, the default is DEFAULT_PROTOCOL. If a negative number is specified, HIGHEST_PROTOCOL is selected.

The file argument must have a write() method that accepts a single bytes argument. It can therefore be a disk file opened for binary writing, an io.BytesIO instance, or any other custom object that meets this interface.

If fix_imports is true and protocol is less than 3, pickle will try to map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream can be read with Python 2.
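The file requirement and the negative-protocol rule can be illustrated with a small sketch. ChunkCollector is a hypothetical class written for this example; its only purpose is to show that any object with a suitable write() method is accepted.

```python
import pickle


class ChunkCollector:
    """Hypothetical minimal 'file': it only provides write()."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        # Store a copy of whatever bytes the pickler hands over.
        self.chunks.append(bytes(data))
        return len(data)


obj = {'a': 123, 'c': [[1, 2], [3, 4]]}
sink = ChunkCollector()

# A negative protocol selects HIGHEST_PROTOCOL.
pickle.dump(obj, sink, protocol=-1)

stream = b''.join(sink.chunks)
restored = pickle.loads(stream)
print(stream[1] == pickle.HIGHEST_PROTOCOL, restored == obj)
```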

pickle.dumps(obj, protocol=None, *, fix_imports=True)

Returns the pickled representation of the object as a bytes object, instead of writing it to a file.

The protocol and fix_imports arguments have the same meaning as in dump().
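The effect of fix_imports can be seen by pickling a built-in function with protocol 2, as sketched here. Using len as the sample object is an arbitrary choice; functions are pickled by reference, so the module name appears in the byte stream.

```python
import pickle

# Protocol 2 with the default fix_imports=True writes the Python 2 module name.
py2_style = pickle.dumps(len, protocol=2)

# With fix_imports=False the Python 3 module name is written unchanged.
py3_style = pickle.dumps(len, protocol=2, fix_imports=False)

print(b'__builtin__' in py2_style, b'builtins' in py3_style)

# Python 3 maps the old name back when loading, so both streams still work here.
same_function = pickle.loads(py2_style) is len
```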

pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict")

Reads a pickled object representation from the open file object file and returns the reconstituted object hierarchy specified therein. This is equivalent to Unpickler(file).load().

The protocol version of the pickle is detected automatically, so no protocol argument is needed. Bytes past the pickled representation of the object are ignored.

The file argument must have two methods: a read() method that takes an integer argument and a readline() method that requires no arguments. Both methods should return bytes. Thus file can be a disk file opened for binary reading, an io.BytesIO object, or any other custom object that meets this interface.
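Because load() stops at the end of one pickled object and leaves the rest of the stream alone, several objects can be written to one file and read back in order. A minimal sketch, assuming an in-memory io.BytesIO stream in place of a disk file:

```python
import io
import pickle

stream = io.BytesIO()
pickle.dump('first', stream)
pickle.dump([2, 3], stream)
stream.seek(0)  # rewind before reading

items = []
while True:
    try:
        # Each call consumes exactly one pickled object.
        items.append(pickle.load(stream))
    except EOFError:
        # Raised when no more data is left in the stream.
        break

print(items)
```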

The optional keyword arguments fix_imports, encoding, and errors control compatibility support for pickle streams generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors arguments tell pickle how to decode 8-bit string instances pickled by Python 2; they default to 'ASCII' and 'strict', respectively. The encoding can be 'bytes' to read these 8-bit string instances as bytes objects. Using encoding='latin1' is required for unpickling NumPy arrays and instances of datetime, date, and time pickled by Python 2.
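The encoding argument can be tried out without a Python 2 installation. The byte string below is hand-written to imitate a protocol 0 pickle of the Python 2 string 'caf\xe9'; it is an assumption made for illustration, not output captured from a real Python 2 interpreter.

```python
import pickle

# Protocol 0 STRING opcode ('S'), a quoted 8-bit string, then STOP ('.').
py2_pickle = b"S'caf\\xe9'\n."

# 'bytes' returns the raw 8-bit string untouched.
as_bytes = pickle.loads(py2_pickle, encoding='bytes')

# 'latin1' maps every byte to the code point with the same number.
as_latin1 = pickle.loads(py2_pickle, encoding='latin1')

try:
    pickle.loads(py2_pickle)  # default encoding is ASCII
    ascii_ok = True
except UnicodeDecodeError:
    # Byte 0xe9 is outside the ASCII range.
    ascii_ok = False

print(as_bytes, as_latin1, ascii_ok)
```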

pickle.loads(bytes_object, *, fix_imports=True, encoding="ASCII", errors="strict")

Reads a pickled object hierarchy from a bytes object and returns the reconstituted object hierarchy specified therein.

The protocol version of the pickle is detected automatically, so no protocol argument is needed. Bytes past the pickled representation of the object are ignored.
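The rule about extra bytes can be checked with a short sketch; the trailing text is arbitrary filler added for the example.

```python
import pickle

blob = pickle.dumps({'a': 123}) + b'trailing bytes that are not part of the pickle'

# loads() stops at the end of the first pickled object and ignores the rest.
restored = pickle.loads(blob)
print(restored)
```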

import pickle

if __name__ == '__main__':
    path = 'test'
    data = {'a': 123, 'b': 'ads', 'c': [[1, 2], [3, 4]]}

    # Serialize the dictionary to a file opened for binary writing.
    with open(path, 'wb') as f:
        pickle.dump(data, f)

    # Read it back; the with block closes the file afterwards.
    with open(path, 'rb') as f1:
        data1 = pickle.load(f1)
    print(data1)

Datasets distributed in Python format can be loaded with pickle. The following example reads and loads the CIFAR-10 dataset:

import numpy as np
import pickle
import matplotlib.pyplot as plt

# Raw strings keep the backslashes in the Windows paths literal.
path1 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_1'
path2 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_2'
path3 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_3'
path4 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_4'
path5 = r'D:\tmp\cifar10_data\cifar-10-batches-py\data_batch_5'

path6 = r'D:\tmp\cifar10_data\cifar-10-batches-py\test_batch'

if __name__ == '__main__':
    with open(path1, 'rb') as fo:
        # encoding='bytes' keeps the Python 2 strings as bytes, so the keys are b'...'.
        data = pickle.load(fo, encoding='bytes')

    # Other entries of the batch dictionary:
    # print(data[b'batch_label'])
    # print(data[b'labels'])
    # print(data[b'filenames'])

    print(data[b'data'].shape)

    # Each row stores one image as three 32x32 colour planes.
    images_batch = np.array(data[b'data'])
    images = images_batch.reshape([-1, 3, 32, 32])
    print(images.shape)

    # Take image 5 and move the channel axis last: (3, 32, 32) -> (32, 32, 3).
    img = images[5].transpose(1, 2, 0)
    print(img.shape)

    plt.imshow(img)
    plt.axis('off')
    plt.show()

Running the script prints the array shapes and displays the selected image.

The data can now be read in for training.
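As a starting point for training, the conversion used in the example can be wrapped in a small helper. This is a sketch only: batch_to_arrays is a hypothetical function name, and the synthetic fake_batch stands in for a real unpickled batch so that the code runs without the dataset on disk.

```python
import numpy as np


def batch_to_arrays(batch):
    """Convert one unpickled CIFAR-10 batch dict into (images, labels)."""
    # Each row holds 3072 values: the red plane, then green, then blue.
    rows = np.asarray(batch[b'data'], dtype=np.uint8)
    # (N, 3072) -> (N, 3, 32, 32) -> (N, 32, 32, 3), channels last.
    images = rows.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    labels = np.asarray(batch[b'labels'], dtype=np.int64)
    return images, labels


# Synthetic stand-in for pickle.load(fo, encoding='bytes'): four pure-red images.
fake_batch = {
    b'data': np.zeros((4, 3072), dtype=np.uint8),
    b'labels': [0, 1, 2, 3],
}
fake_batch[b'data'][:, :1024] = 255  # fill the red plane only

images, labels = batch_to_arrays(fake_batch)
print(images.shape, labels.shape)
```

With the real files, the same helper could be applied to each of the five training batches and the results concatenated with np.concatenate.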

 


Origin www.cnblogs.com/baby-lily/p/10990026.html