A brief introduction to the MNIST dataset

Table of contents

Dataset import

Format of the dataset

Contents of the dataset

The MNIST dataset is a classic dataset in machine learning. It contains 70,000 images of handwritten digits: 60,000 training samples and 10,000 test samples.

Dataset import

With the TensorFlow framework, the MNIST dataset can be obtained through Keras:

import tensorflow as tf

mnist = tf.keras.datasets.mnist

The data is then loaded with the load_data() method.

load_data() returns the data as a pair of tuples in the following format:

(training samples, training labels), (test samples, test labels)

So we unpack the result into matching tuples:

(x_train, y_train), (x_test, y_test) = mnist.load_data()
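Because load_data() downloads the dataset on first use, the nested-tuple unpacking can also be tried offline with a stand-in. The sketch below assumes a hypothetical function fake_load_data (not part of Keras) that returns small zero-filled NumPy arrays in the same ((x_train, y_train), (x_test, y_test)) layout; only the structure matches the real dataset, not the data.

```python
import numpy as np

def fake_load_data(n_train=6, n_test=2):
    """Hypothetical stand-in for mnist.load_data(): same nested-tuple layout,
    but tiny zero-filled arrays instead of the real 60,000/10,000 images."""
    x_train = np.zeros((n_train, 28, 28), dtype=np.uint8)  # images
    y_train = np.zeros(n_train, dtype=np.uint8)             # one label per image
    x_test = np.zeros((n_test, 28, 28), dtype=np.uint8)
    y_test = np.zeros(n_test, dtype=np.uint8)
    return (x_train, y_train), (x_test, y_test)

# Unpack exactly as with the real dataset
(x_train, y_train), (x_test, y_test) = fake_load_data()

print(x_train.shape, y_train.shape)  # (6, 28, 28) (6,)
print(x_test.shape, y_test.shape)    # (2, 28, 28) (2,)
```

Each images array has one label per image along its first axis, which is why the outer tuple groups images with labels rather than training with test data.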

Format of the dataset

Each of the four arrays is a numpy.ndarray, NumPy's N-dimensional array type, so we can print its attributes to inspect it:

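print("Number of dimensions of the training samples:", x_train.ndim)
print("Shape of the training samples:", x_train.shape)
print("Number of elements in the training samples:", x_train.size)
print("Data type of the training samples:", x_train.dtype)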

The output shows that x_train has 3 dimensions, shape (60000, 28, 28), 47,040,000 elements, and data type uint8.

The shape tells us that the training set stores 60,000 images of handwritten digits, each 28×28 pixels.
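The printed attributes are tied together: ndim is the number of entries in shape, and size is the product of those entries. A minimal sketch of that arithmetic, using the shape reported above plus a small zero-filled stand-in array (not real MNIST data):

```python
import math
import numpy as np

shape = (60000, 28, 28)  # shape reported for x_train

ndim = len(shape)            # number of axes
size = math.prod(shape)      # total number of pixel values
pixels_per_image = 28 * 28   # pixels in one image

print(ndim, size, pixels_per_image)  # 3 47040000 784

# A small stand-in array follows the same rules; x[i] is one 28x28 image
x = np.zeros((5, 28, 28), dtype=np.uint8)
print(x.ndim, x.size, x[0].shape)  # 3 3920 (28, 28)
```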

Contents of the dataset

The first image in the training set can be printed pixel by pixel with the following code:

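# Print the first training image as a 28x28 grid of pixel values
for i in range(0, 28):
    for j in range(0, 28):
        print("%.1f" % x_train[0][i][j], end=" ")
    print()  # start a new line after each row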

The output is a 28×28 grid of pixel values.

The nonzero values clearly trace the shape of the digit 5 (its label, y_train[0], is also 5).
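Printing every value with "%.1f" makes the grid wide; a more compact view prints one character per pixel. The helper render_ascii below is hypothetical (not from the original post or any library), and the image is a synthetic 28×28 vertical stroke rather than an MNIST sample:

```python
import numpy as np

def render_ascii(image, threshold=0):
    """Hypothetical helper: one character per pixel, '#' if the value
    exceeds `threshold`, '.' otherwise. Returns the picture as a string."""
    rows = []
    for row in image:
        rows.append("".join("#" if value > threshold else "." for value in row))
    return "\n".join(rows)

# Synthetic 28x28 uint8 image with a vertical stroke (not real MNIST data)
img = np.zeros((28, 28), dtype=np.uint8)
img[4:24, 13:15] = 255

print(render_ascii(img))
# With the real dataset loaded: print(render_ascii(x_train[0]))
```

After the pixels have been scaled to the 0-1 range, a threshold such as 0.5 plays the same role that 128 would play on the raw 0-255 values.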

Since every pixel value lies in the range 0-255, we normalize the data by dividing by 255.0, which converts each value to a floating-point number between 0 and 1:

x_train, x_test = x_train / 255.0, x_test / 255.0
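The effect of this division on the data type can be checked on a tiny synthetic uint8 array (not MNIST data). The float32 variant at the end is an assumption added for illustration, not part of the original post; it stores each value in 4 bytes instead of float64's 8.

```python
import numpy as np

# Synthetic stand-in for raw pixels (uint8, range 0-255)
raw = np.array([[0, 128, 255]], dtype=np.uint8)

scaled = raw / 255.0  # true division by a float promotes the result to float64
print(raw.dtype, "->", scaled.dtype)  # uint8 -> float64
print(scaled.min(), scaled.max())     # 0.0 1.0

# Optional variant (assumption): cast to float32 first to halve memory use
scaled32 = raw.astype(np.float32) / 255.0
print(scaled32.dtype)  # float32
```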

After this step the data type changes from uint8 to float64.

Printing the image again with the same loop now gives values between 0.0 and 1.0, and the digit 5 can still be made out.


Origin blog.csdn.net/qq_51235856/article/details/130033356