Foreword
MNIST is one of the most commonly used datasets for training models, so how do you obtain it? This post summarizes what I learned in practice, in the hope that it helps you get the MNIST dataset working on your own machine.
1 Download the MNIST dataset file
Because the MNIST dataset is hosted overseas, downloading it can be slow, so I uploaded a copy to Baidu Netdisk:
Link: https://pan.baidu.com/s/1V-4FOePbTyBG7qZ7ge_TqQ?pwd=dw2i
Extraction code: dw2i
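After downloading, decompress the .gz archives.
You should end up with 4 files:
train-images.idx3-ubyte: training set images (47040016 bytes)
train-labels.idx1-ubyte: training set labels (60008 bytes)
t10k-images.idx3-ubyte: test set images (7840016 bytes)
t10k-labels.idx1-ubyte: test set labels (10008 bytes)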
Source of the original table: MNIST Dataset_Keep Sensible 802's Blog-CSDN Blog_mnist Dataset
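If you would rather not decompress the archives by hand, the step can be sketched in Python with the standard gzip module. The function name gunzip_all and the folder you call it on are assumptions made for illustration; they are not part of the original download.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(folder):
    """Decompress every *.gz file in `folder` next to the original archive."""
    extracted = []
    for gz_path in sorted(Path(folder).glob('*.gz')):
        out_path = gz_path.with_suffix('')  # drop the trailing .gz
        with gzip.open(gz_path, 'rb') as src, open(out_path, 'wb') as dst:
            shutil.copyfileobj(src, dst)  # stream, so large files stay cheap
        extracted.append(out_path)
    return extracted
```

Calling gunzip_all on the folder that holds the four downloaded archives returns the list of decompressed paths and leaves the .gz files in place.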
2 Parse the idx3-ubyte file
Next, we need to convert the idx3-ubyte files into images.
The training set and the test set are converted separately. I use PyCharm.
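Before converting, it helps to see what the header of an IDX file contains. In the IDX format, the first two bytes are zero, the third byte encodes the element type (0x08 means unsigned byte), the fourth byte gives the number of dimensions, and each dimension size follows as a big-endian 32-bit unsigned integer. The sketch below parses such a header. parse_idx_header is a hypothetical helper name, and the header it runs on is synthetic rather than read from a real MNIST file.

```python
import struct


def parse_idx_header(buf):
    """Return (type_code, dims, data_offset) for an IDX byte buffer."""
    # Bytes 0-1 are always zero, byte 2 is the element type, byte 3 the rank.
    zero, type_code, ndim = struct.unpack_from('>HBB', buf, 0)
    if zero != 0:
        raise ValueError('not an IDX file: first two bytes must be zero')
    # Each dimension size is a big-endian unsigned 32-bit integer.
    dims = struct.unpack_from('>' + 'I' * ndim, buf, 4)
    return type_code, dims, 4 + 4 * ndim


# Synthetic header shaped like an MNIST image file: 2 images of 28 x 28 bytes.
header = struct.pack('>HBBIII', 0, 0x08, 3, 2, 28, 28)
type_code, dims, offset = parse_idx_header(header)
print(type_code, dims, offset)  # 8 (2, 28, 28) 16
```

Reading the first four bytes as a single integer, as the scripts in this post do, yields the magic number 2051 for image files and 2049 for label files.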
2.1 Parsing the training set
train-images.idx3-ubyte and train-labels.idx1-ubyte hold the images and labels of the training set, respectively. Change data_file and label_file to the paths where your local training set is saved.
import os
import struct

import numpy as np
from PIL import Image

# Path to the training images; change it to your local copy.
data_file = r'D:\postgraduate\DUT\tpds\malicious_node\MNIST_data\train-images.idx3-ubyte'
with open(data_file, 'rb') as f:
    data_buf = f.read()

# The 16-byte header holds four big-endian uint32 values.
magic, numImages, numRows, numColumns = struct.unpack_from(
    '>IIII', data_buf, 0)
# The file is 47040016 bytes; skipping the header leaves 47040000 pixel bytes.
datas = np.frombuffer(
    data_buf, dtype=np.uint8, offset=struct.calcsize('>IIII'))
datas = datas.reshape(numImages, 1, numRows, numColumns)

# Path to the training labels; change it to your local copy.
label_file = r'D:\postgraduate\DUT\tpds\malicious_node\MNIST_data\train-labels.idx1-ubyte'
with open(label_file, 'rb') as f:
    label_buf = f.read()

# The 8-byte header holds two big-endian uint32 values.
magic, numLabels = struct.unpack_from('>II', label_buf, 0)
# The file is 60008 bytes; skipping the header leaves 60000 label bytes.
labels = np.frombuffer(
    label_buf, dtype=np.uint8, offset=struct.calcsize('>II'))
labels = labels.astype(np.int64)

# Create one output folder per digit class: mnist_train/0 ... mnist_train/9.
datas_root = 'mnist_train'
for i in range(10):
    os.makedirs(os.path.join(datas_root, str(i)), exist_ok=True)

# Save every image as a PNG inside the folder named after its label.
for ii in range(numLabels):
    img = Image.fromarray(datas[ii, 0])
    label = labels[ii]
    file_name = os.path.join(
        datas_root, str(label), 'mnist_train_' + str(ii) + '.png')
    img.save(file_name)
2.2 Parsing the test set
t10k-images.idx3-ubyte and t10k-labels.idx1-ubyte hold the images and labels of the test set, respectively. Change data_file and label_file to the paths where your local test set is saved.
import os
import struct

import numpy as np
from PIL import Image

# Path to the test images; change it to your local copy.
data_file = r'D:\postgraduate\DUT\tpds\malicious_node\MNIST_data\t10k-images.idx3-ubyte'
with open(data_file, 'rb') as f:
    data_buf = f.read()

# The 16-byte header holds four big-endian uint32 values.
magic, numImages, numRows, numColumns = struct.unpack_from(
    '>IIII', data_buf, 0)
# The file is 7840016 bytes; skipping the header leaves 7840000 pixel bytes.
datas = np.frombuffer(
    data_buf, dtype=np.uint8, offset=struct.calcsize('>IIII'))
datas = datas.reshape(numImages, 1, numRows, numColumns)

# Path to the test labels; change it to your local copy.
label_file = r'D:\postgraduate\DUT\tpds\malicious_node\MNIST_data\t10k-labels.idx1-ubyte'
with open(label_file, 'rb') as f:
    label_buf = f.read()

# The 8-byte header holds two big-endian uint32 values.
magic, numLabels = struct.unpack_from('>II', label_buf, 0)
# The file is 10008 bytes; skipping the header leaves 10000 label bytes.
labels = np.frombuffer(
    label_buf, dtype=np.uint8, offset=struct.calcsize('>II'))
labels = labels.astype(np.int64)

# Create one output folder per digit class: mnist_test/0 ... mnist_test/9.
datas_root = 'mnist_test'
for i in range(10):
    os.makedirs(os.path.join(datas_root, str(i)), exist_ok=True)

# Save every image as a PNG inside the folder named after its label.
for ii in range(numLabels):
    img = Image.fromarray(datas[ii, 0])
    label = labels[ii]
    file_name = os.path.join(
        datas_root, str(label), 'mnist_test_' + str(ii) + '.png')
    img.save(file_name)
3 Run the py files
After running the two py files above, two folders are generated in the root directory of the project: mnist_train and mnist_test.
mnist_train contains 60,000 images and mnist_test contains 10,000 images, each sorted into subfolders named 0 to 9 by label.
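A quick way to check these numbers is to count the PNG files in each class folder. This is only a minimal sketch: count_pngs is a hypothetical helper name, and it assumes the folder layout produced by the two scripts above (one subfolder per digit).

```python
import os


def count_pngs(root):
    """Map each subfolder name of `root` to the number of .png files in it."""
    counts = {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if os.path.isdir(folder):
            counts[name] = sum(
                1 for f in os.listdir(folder) if f.lower().endswith('.png'))
    return counts
```

If both scripts ran to completion, sum(count_pngs('mnist_train').values()) should come to 60000 and the same call on 'mnist_test' to 10000.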
You're done, and you can start training the model!