First, obtaining the dataset
TensorFlow wraps the MNIST dataset so that it is more convenient to use.
from tensorflow.examples.tutorials.mnist import input_data
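# Read the dataset. On first use, TensorFlow automatically downloads it to the path below. Labels are returned in one_hot form here.
# By default labels are represented as the digits 0-9 (equivalent to one_hot=False), and read_data_sets flattens (reshapes) the images by default.
# To keep the 2-D structure of the images, pass reshape=False.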
mnist = input_data.read_data_sets('/path/to/MNIST_data', one_hot=True)
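To make the difference between the two label formats concrete, here is a minimal NumPy sketch of one-hot encoding. It does not call TensorFlow; `to_one_hot` is a hypothetical helper written for illustration, and the labels are made up.

```python
import numpy as np

def to_one_hot(labels, num_classes=10):
    """Turn integer labels of shape (n,) into one-hot rows of shape (n, num_classes)."""
    labels = np.asarray(labels, dtype=np.int64)
    one_hot = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # Put a 1 in the column selected by each label.
    one_hot[np.arange(labels.shape[0]), labels] = 1.0
    return one_hot

digits = np.array([7, 0, 3])        # made-up labels, in the one_hot=False style
encoded = to_one_hot(digits)        # the same labels, in the one_hot=True style
decoded = encoded.argmax(axis=1)    # recover the digits from the one-hot rows

print(encoded[0])   # a row with a single 1 at index 7
print(decoded)
```

Going back from one-hot rows to digits only needs `argmax` along the class axis, which is handy when comparing predictions with labels.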
Second, how the dataset is split
1. The dataset is automatically split into 3 subsets: train, validation, and test
# Show the sizes of the default subsets
print("Training data size: ", mnist.train.num_examples)
>>> Training data size: 55000
print("Validating data size: ", mnist.validation.num_examples)
>>> Validating data size: 5000
print("Testing data size: ", mnist.test.num_examples)
>>> Testing data size: 10000
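These sizes add up to the 60,000 training images and 10,000 test images of the original MNIST release, with 5,000 training images held out for validation. The sketch below mimics such a split on an array of indices instead of real images. It assumes the validation subset is simply the first 5,000 training examples; the constant names are invented for this example.

```python
import numpy as np

NUM_TRAIN_ORIGINAL = 60000   # size of the original MNIST training set
VALIDATION_SIZE = 5000       # assumed hold-out size, matching the output above

# Stand-in for the real data: one index per original training example.
all_train = np.arange(NUM_TRAIN_ORIGINAL)

validation = all_train[:VALIDATION_SIZE]   # first 5000 examples are held out
train = all_train[VALIDATION_SIZE:]        # the remaining 55000 are used for training

print("train:", train.shape[0], "validation:", validation.shape[0])
```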
2. Use the images and labels attributes of train, validation and test to obtain the images and class labels
- Both the images and the class labels are of type ndarray
# Show the shapes of the images and class labels in each subset
print("Images shape:", mnist.train.images.shape, "Labels shape:", mnist.train.labels.shape)
>>> Images shape: (55000, 784) Labels shape: (55000, 10)
print("Images shape:", mnist.validation.images.shape, "Labels shape:", mnist.validation.labels.shape)
>>> Images shape: (5000, 784) Labels shape: (5000, 10)
print("Images shape:", mnist.test.images.shape, "Labels shape:", mnist.test.labels.shape)
>>> Images shape: (10000, 784) Labels shape: (10000, 10)
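Each row of 784 values is one 28*28 image laid out row by row, so the 2-D layout can be recovered with a reshape. A minimal sketch on made-up data (the array below only imitates the shape of `mnist.train.images`):

```python
import numpy as np

# Stand-in for mnist.train.images: 5 made-up flattened images of 784 values each.
flat_images = np.arange(5 * 784, dtype=np.float32).reshape(5, 784)

# -1 lets NumPy infer the number of images; each row becomes a 28x28 grid.
grid_images = flat_images.reshape(-1, 28, 28)
print("Flat shape:", flat_images.shape, "Grid shape:", grid_images.shape)

# Row-major layout: pixel (row, col) of image i is element row * 28 + col of flat row i.
i, row, col = 2, 3, 4
print(grid_images[i, row, col] == flat_images[i, row * 28 + col])
```

The reshape returns a view of the same data, so it costs almost nothing to switch between the two layouts.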
3. View the size, class label, and pixel values of one image in the train dataset
# Image size is 28*28. TensorFlow flattens it by default, but this loses the 2-D structure of the image!
print("Example training data0 shape: ", mnist.train.images[0].shape)
>>> Example training data0 shape: (784,)
print("Example training data0 label: ", mnist.train.labels[0])
>>> Example training data0 label: [ 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
print("Example training data0: ", mnist.train.images[0])
- TensorFlow normalizes the image pixel values ($|x - 255| / 255$), so every element of the pixel array lies in [0, 1] and represents the color intensity: 0 is the white background and 1 is the black foreground.
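As a quick check of the normalization described above, the sketch below applies it to a few made-up 8-bit gray levels. It assumes x follows the usual display convention in which 255 is white and 0 is black; that convention is an assumption here, and `normalize` is a hypothetical helper. If the stored values already use 0 for the background, the equivalent operation is simply x / 255.

```python
import numpy as np

def normalize(gray):
    """Map 8-bit gray levels (assumed: 255 = white, 0 = black) to [0, 1]."""
    gray = np.asarray(gray, dtype=np.float32)
    return np.abs(gray - 255.0) / 255.0

levels = np.array([255, 128, 0], dtype=np.uint8)   # white, mid-gray, black
scaled = normalize(levels)
print(scaled)   # white maps to 0, black maps to 1, mid-gray lands in between
```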
4. Use mnist.train.next_batch to randomly draw batch_size images and their class labels
batch_size = 100
xs, ys = mnist.train.next_batch(batch_size)
print("X shape:", xs.shape)
>>> X shape: (100, 784)
print("Y shape:", ys.shape)
>>> Y shape: (100, 10)
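To show roughly what drawing random batches involves, here is a simplified sketch of a shuffled mini-batch iterator. It is not TensorFlow's implementation: `MiniBatcher` is a hypothetical class, the data is made up, and leftover examples at the end of an epoch are simply dropped before reshuffling.

```python
import numpy as np

class MiniBatcher:
    """Hypothetical helper: serves shuffled mini-batches, reshuffling every epoch."""

    def __init__(self, images, labels, seed=0):
        assert images.shape[0] == labels.shape[0]
        self.images, self.labels = images, labels
        self.rng = np.random.default_rng(seed)
        self.order = self.rng.permutation(images.shape[0])
        self.pos = 0

    def next_batch(self, batch_size):
        # Start a new epoch with a fresh shuffle when too few examples remain.
        if self.pos + batch_size > self.order.shape[0]:
            self.order = self.rng.permutation(self.order.shape[0])
            self.pos = 0
        idx = self.order[self.pos:self.pos + batch_size]
        self.pos += batch_size
        # Index images and labels with the same positions so pairs stay aligned.
        return self.images[idx], self.labels[idx]

# Made-up data with MNIST-like shapes: 1000 flattened images and one-hot labels.
images = np.zeros((1000, 784), dtype=np.float32)
labels = np.eye(10, dtype=np.float32)[np.arange(1000) % 10]

batcher = MiniBatcher(images, labels)
xs, ys = batcher.next_batch(100)
print("X shape:", xs.shape, "Y shape:", ys.shape)
```

Shuffling an index array once per epoch, instead of sampling independently for every batch, means each example is visited at most once per pass.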