Note: If an imported package is not installed, install it with pip.
gzip package
If you only need to read a .gz file, the gzip package from the Python standard library is enough.
Example: the current directory contains a file named input.gz. The following code reads it:
import gzip

with gzip.open('input.gz') as file:
    all_content = file.read()
The decompressed contents of input.gz are now in all_content as a bytes object.
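gzip.open also accepts a mode argument, so the same call can return text instead of bytes. The following is a minimal sketch: the file sample.gz and its contents are made up so that the example runs on its own.

```python
import gzip
import os
import tempfile

# Hypothetical sample file, created only so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sample.gz')
with gzip.open(path, 'wb') as f:
    f.write('first line\nsecond line\n'.encode('utf-8'))

# The default mode is 'rb', so read() returns bytes.
with gzip.open(path) as f:
    raw = f.read()

# Mode 'rt' decodes while reading, so read() returns str.
with gzip.open(path, 'rt', encoding='utf-8') as f:
    text = f.read()

print(type(raw).__name__, type(text).__name__)
```

Text mode is convenient for compressed logs or CSV files. Binary data such as the MNIST files should stay in the default bytes mode.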
Reading the MNIST data set with Keras
from tensorflow import keras
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
Reading the MNIST data set locally
Download the data set
Data set download address / Data set interface:
Unzip and read
Method One
import random

from mnist import MNIST  # provided by the python-mnist package

# 'samples' is the directory that holds the MNIST files
mndata = MNIST('samples')

images, labels = mndata.load_training()
# or
images, labels = mndata.load_testing()

index = random.randrange(0, len(images))  # choose a random index
print(mndata.display(images[index]))
Method Two
Unzip the .gz files, then read the uncompressed files:
from mlxtend.data import loadlocal_mnist
import platform

# Unzipped file names usually keep the hyphen on Linux/macOS, while some
# Windows tools write a dot instead; adjust the paths to match your files.
if not platform.system() == 'Windows':
    X, y = loadlocal_mnist(
        images_path='train-images-idx3-ubyte',
        labels_path='train-labels-idx1-ubyte')
else:
    X, y = loadlocal_mnist(
        images_path='train-images.idx3-ubyte',
        labels_path='train-labels.idx1-ubyte')

print('Dimensions: %s x %s' % (X.shape[0], X.shape[1]))
print('\n1st row', X[0])
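Method Two expects uncompressed files on disk. If no unzip tool is at hand, the standard library can do the decompression. This is a minimal sketch: the helper name gunzip and the demo file names are hypothetical, and the demo data is a stand-in rather than real MNIST content.

```python
import gzip
import os
import shutil
import tempfile

def gunzip(src_path, dst_path):
    """Decompress src_path (.gz) into dst_path, streaming in chunks."""
    with gzip.open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)

# Self-contained demo with a hypothetical stand-in file.
workdir = tempfile.mkdtemp()
gz_path = os.path.join(workdir, 'demo-ubyte.gz')
out_path = os.path.join(workdir, 'demo-ubyte')
payload = bytes(range(256))
with gzip.open(gz_path, 'wb') as f:
    f.write(payload)

gunzip(gz_path, out_path)
with open(out_path, 'rb') as f:
    restored = f.read()
print(len(restored))
```

For the MNIST files the call would look like gunzip('train-images-idx3-ubyte.gz', 'train-images-idx3-ubyte'), assuming the archive is in the current directory.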
Reading with the gzip package
import gzip
import numpy as np
import matplotlib.pyplot as plt

with gzip.open('train-images-idx3-ubyte.gz') as f:
    all_img = f.read()  # whole file as bytes

# print(all_img[:4])
# print((len(all_img) - 16) / 784)

# Skip the 16-byte header; each image is 28 * 28 = 784 bytes
img1 = all_img[16:16 + 784]

img = []
for i in range(28):
    for j in range(28):
        img.append(img1[28 * i + j])
# print(img)

img = np.array(img).reshape(28, 28)
print(img.shape)
plt.imshow(img)
plt.show()
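The offsets 16 and 784 used above come from the IDX layout of the image file: a 16-byte header of four big-endian 32-bit integers (magic number 2051, image count, rows, columns), followed by one byte per pixel. The sketch below reads those values from the header instead of hard-coding them. The function name parse_idx_images is hypothetical, and the input is a tiny synthetic file built in memory, not the real data set.

```python
import gzip
import io
import struct

def parse_idx_images(data):
    """Split raw IDX3 bytes into a list of images (each a list of rows)."""
    magic, count, rows, cols = struct.unpack('>IIII', data[:16])
    if magic != 2051:
        raise ValueError('not an IDX3 image file: magic=%d' % magic)
    size = rows * cols
    images = []
    for n in range(count):
        start = 16 + n * size
        pixels = data[start:start + size]
        images.append([list(pixels[r * cols:(r + 1) * cols])
                       for r in range(rows)])
    return images

# Synthetic stand-in: two 2x3 images, gzip-compressed in memory.
header = struct.pack('>IIII', 2051, 2, 2, 3)
body = bytes([0, 1, 2, 3, 4, 5, 250, 251, 252, 253, 254, 255])
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as f:
    f.write(header + body)

data = gzip.decompress(buf.getvalue())
images = parse_idx_images(data)
print(len(images), images[0])
```

Reading the dimensions from the header means the same function works for any IDX3 image file, and the magic-number check catches the common mistake of passing the labels file.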
Reading bytes data
Reference: Stack Overflow, "Convert bytes to a string"
>>> b"abcde"
b'abcde'
# utf-8 is used here because it is a very common encoding, but you
# need to use the encoding your data is actually in.
>>> b"abcde".decode("utf-8")
'abcde'
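Choosing the wrong encoding either raises an error or silently produces the wrong characters. The following minimal sketch shows both outcomes, plus the errors='replace' option of bytes.decode; the sample string is made up for illustration.

```python
# Sample data: the word "cafe" with an accented e, encoded as UTF-8.
data = 'caf\u00e9'.encode('utf-8')   # b'caf\xc3\xa9'

# Correct codec: the original string comes back.
good = data.decode('utf-8')

# Wrong codec in strict mode (the default) raises UnicodeDecodeError.
try:
    data.decode('ascii')
    failed = False
except UnicodeDecodeError:
    failed = True

# Wrong codec that accepts every byte: no error, but wrong characters.
wrong = data.decode('latin-1')

# errors='replace' swaps each undecodable byte for U+FFFD.
lossy = data.decode('ascii', errors='replace')

print(ascii(good), failed, ascii(wrong), ascii(lossy))
```

The latin-1 case is the one to watch for, because it never fails and the damage only shows up later as garbled text.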