The MNIST data set is a classic data set in the field of machine learning. It contains 70,000 samples: 60,000 training samples and 10,000 test samples.
Dataset import
Using the keras module of the tensorflow framework, get the MNIST dataset with:
import tensorflow as tf
mnist = tf.keras.datasets.mnist
The load_data() method loads the data in the dataset.
The returned data is a tuple stored in the following format:
(training samples, training labels), (test samples, test labels)
So use the corresponding tuple to receive the data:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
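The nested unpacking above can be tried without downloading anything. The following is a minimal sketch: fake_load_data is a hypothetical stand-in (not part of keras) that returns small zero-filled arrays in the same nested-tuple layout, and the short variable names are chosen only to avoid clashing with the real ones.

```python
import numpy as np

def fake_load_data(n_train=6, n_test=1):
    """Hypothetical stand-in for mnist.load_data(): same nested-tuple layout,
    but tiny zero-filled arrays instead of the real samples."""
    x_train = np.zeros((n_train, 28, 28), dtype=np.uint8)
    y_train = np.zeros((n_train,), dtype=np.uint8)
    x_test = np.zeros((n_test, 28, 28), dtype=np.uint8)
    y_test = np.zeros((n_test,), dtype=np.uint8)
    return (x_train, y_train), (x_test, y_test)

# The left-hand side mirrors the structure of the returned value.
(x_tr, y_tr), (x_te, y_te) = fake_load_data()
print(x_tr.shape, y_tr.shape)
print(x_te.shape, y_te.shape)
```

Each sample array has one label per image, so the first axis of the samples and of the labels has the same length.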
Format of the dataset
Each of the four data sets above is a numpy.ndarray, an N-dimensional array object. You can print the relevant attributes of a data set to inspect it:
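print("Number of dimensions of the training samples:", x_train.ndim)
print("Shape of the training samples:", x_train.shape)
print("Number of elements in the training samples:", x_train.size)
print("Data type of the training samples:", x_train.dtype)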
The result is as follows:
It can be seen from its shape that the training sample data set stores 60,000 images of digits, each 28*28 pixels.
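These attributes are related: ndim is the length of shape, and size is the product of the entries of shape. A minimal sketch, using a zero-filled stand-in array (an assumption, so that it runs without the real data) with the shape described above and an assumed uint8 element type:

```python
import numpy as np

# Stand-in for x_train: same shape as described above, filled with zeros.
stand_in = np.zeros((60000, 28, 28), dtype=np.uint8)

print(stand_in.ndim)   # number of axes, equal to len(stand_in.shape)
print(stand_in.shape)  # (number of images, rows, columns)
print(stand_in.size)   # total number of elements, 60000 * 28 * 28

# Indexing the first axis selects one image as a 2-D array.
first_image = stand_in[0]
print(first_image.shape)
```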
Contents of the dataset
The first image stored in it can be printed and viewed with the following code:
for i in range(0,28):
    for j in range(0,28):
        print("%.1f" % x_train[0][i][j] , end=" ")
    print()
The result is as follows:
It is clearly the digit 5.
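With 28 numbers per row, the printed grid can be hard to read. One alternative, sketched here as an assumption rather than the author's method, is to print one character per pixel. The name render_ascii and its threshold are hypothetical, and the image is a small synthetic one so the example runs without the dataset.

```python
import numpy as np

def render_ascii(image, threshold=128):
    """Return the image as text: '#' where a pixel is >= threshold, '.' elsewhere."""
    rows = []
    for row in image:
        rows.append("".join("#" if pixel >= threshold else "." for pixel in row))
    return "\n".join(rows)

# Synthetic 28x28 image: a bright vertical bar on a dark background.
demo_image = np.zeros((28, 28), dtype=np.uint8)
demo_image[4:24, 13:15] = 255

print(render_ascii(demo_image))
```

With the real data loaded, render_ascii(x_train[0]) would be the equivalent call.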
Since the value of each pixel in the data set is within the range 0-255, we normalize the data and convert it into floating point numbers between 0 and 1:
x_train, x_test = x_train / 255.0, x_test / 255.0
As you can see, the data type has changed after processing:
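A minimal sketch of this change, using a tiny stand-in array (an assumption, in place of the real x_train) with uint8 pixel values: dividing by the float 255.0 makes NumPy return a floating-point array.

```python
import numpy as np

# Stand-in pixel values covering the ends and the middle of the 0-255 range.
pixels = np.array([[0, 51, 255]], dtype=np.uint8)

normalized = pixels / 255.0  # true division by a float yields floats

print(pixels.dtype)      # integer type before normalization
print(normalized.dtype)  # floating-point type after normalization
print(normalized.min(), normalized.max())
```

The original array is left unchanged; the division creates a new array, which is why the source code reassigns x_train and x_test.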
Print out the stored image again, and it can still be vaguely seen that it is the digit 5: