Analyzing the MNIST dataset with TensorFlow

Table of contents

MNIST dataset analysis and preprocessing

Choice of CPU/GPU

Comparative analysis

Concrete operation

Modeling

Compiling the model

Training the model

Evaluating the model

Saving the model

Results visualization

Use the model to make predictions

AutoEncoder

Ordinary autoencoder

Convolutional autoencoder

MNIST dataset analysis and preprocessing

You can refer to:

A brief introduction to the MNIST dataset - THE WHY Blog - CSDN Blog

Choice of CPU/GPU

Comparative analysis

CPU (Central Processing Unit): suited to general-purpose computations of high complexity, but handles a relatively small volume of calculations at a time;

GPU (Graphics Processing Unit): suited to simple operations of low complexity, but can handle a very large volume of calculations in parallel.

Given the characteristics of the two, analyzing MNIST mainly involves graphics-like matrix operations: the volume of computation is large, but no complex operations are involved, so the GPU is the more appropriate choice;

Concrete operation

Before choosing a GPU, we first need to determine whether our computer has one. To avoid unsupported-version issues, it is best to install and use a recent TensorFlow GPU build; I personally use version 2.11.0.

The code is implemented as follows. First, check whether an available GPU exists:

import tensorflow as tf

print(tf.test.is_gpu_available())

If the return value is False, no GPU is available

You can also print the GPU list for viewing:

print(tf.config.experimental.list_physical_devices('GPU'))

If no GPU is available, we can either install one or use the CPU

No extra configuration is needed to use the CPU, which is the default. You can also list the currently available CPUs:

print(tf.config.experimental.list_physical_devices('CPU'))

If a GPU is available, we can choose an appropriate one to use:

gpu = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpu[0], True)  # which GPU to choose depends on your setup
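
If you want to pin specific computations to a chosen device, TensorFlow's tf.device context manager can be used; a minimal sketch ('/GPU:0' and '/CPU:0' are the standard TensorFlow device names):

with tf.device('/GPU:0'):  # or '/CPU:0' to force the CPU
    a = tf.random.normal((1000, 1000))
    b = tf.matmul(a, a)  # this matrix multiplication runs on the selected device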

Modeling

Build the model with the tf.keras.models.Sequential() method

Sequential is a container that can load layers of linear stacks into the model to create a network structure from the input layer to the output layer;

# Build the model
model = tf.keras.Sequential()
# Add network layers to the model with the add method
model.add(tf.keras.layers.Flatten(input_shape=(28,28))) # each sample has 28*28 pixels; the Flatten layer takes the input
model.add(tf.keras.layers.Dense(128,activation='relu')) # hidden layer with 128 nodes, using the ReLU activation function
model.add(tf.keras.layers.Dense(10,activation='softmax')) # output layer with 10 nodes (digits 0-9), using softmax for normalization

The layers used are Flatten and Dense

Flatten handles the input by flattening the input tensor. From input_shape=(28,28) we can see that the input is a 28*28 two-dimensional array; the Flatten layer converts it into a one-dimensional array so that the data can be passed effectively to each neuron of the model

Dense: fully connected layer. Both the hidden layer and the output layer use the Dense structure, with the parameters set as follows:

Dense(number of neurons, activation="activation function", kernel_regularizer="regularization method")

Optional activation functions: relu, softmax, sigmoid, tanh, etc.

Optional regularization methods: tf.keras.regularizers.l1(), tf.keras.regularizers.l2(), etc.
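
As an illustration, a hypothetical Dense layer combining these options might look like the following (the layer size 64 and the regularization factor 0.01 are arbitrary example values):

layer = tf.keras.layers.Dense(64, activation='relu',
                              kernel_regularizer=tf.keras.regularizers.l2(0.01))  # L2 penalty on the weights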

  • In the hidden layer, 128 nodes are set, and the activation function uses the ReLU function;
    • The sigmoid function: sigmoid(x) = 1 / (1 + e^(-x))

    • The ReLU function: relu(x) = max(0, x)

    • In contrast, the ReLU function simplifies the computation: it avoids the exponential function's effect on gradient descent and reduces the computational cost, so we choose ReLU
  • In the output layer, 10 nodes are set (outputting the digits 0-9), and the activation function uses softmax for normalization;

After the model is built, we can print model-related information:

model.summary()

The output is as follows:

Layer (type): the name (type) of each layer; the name can be specified via the name attribute of tf.keras.layers.Dense()

Output Shape: the output shape of each layer

Param: the number of weight parameters of each layer of the fully connected network; calculation formula: (input_shape + 1) * number of neurons, where the +1 accounts for each neuron's bias

  • For example, dense_1 layer, input_shape=128, the number of neurons is 10, (128+1)*10=1290
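
We can sanity-check this formula in code; a small sketch, assuming the model built above (its third layer, model.layers[2], is the dense_1 output layer):

print(model.layers[2].count_params())  # (128 + 1) * 10 = 1290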

Compiling the model

Compile the model with the compile method:

# Optimizer: Adam, with the learning rate set to 0.1
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
# Loss function: cross-entropy; from_logits=False means the model output is already a probability distribution
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
model.compile(optimizer=optimizer,loss=loss_fn,metrics=['accuracy'])

The method has three parameters:

1. optimizer: the optimizer, which can be given as a function or as a string; we choose the Adam optimizer:

optimizer="adam" or optimizer=tf.keras.optimizers.Adam()

The learning rate can be configured using the function form: tf.keras.optimizers.Adam(learning_rate=); the default learning rate is 0.001

Specific information and other optimizer references: tf.keras.optimizers.Adam | TensorFlow v2.12.0

2. loss: the loss function, which can be given as a function or as a string; we choose the sparse categorical cross-entropy loss:

loss="sparse_categorical_crossentropy" or loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)

Another commonly used loss function is MSE; comparing it with the cross-entropy loss:

MSE: MSE = (1/n) * Σ_i (y_i - ŷ_i)^2

Cross-entropy: H(y, ŷ) = -Σ_i y_i * log(ŷ_i)

Since gradient descent with the cross-entropy loss does not get "stuck" in local optima as easily, it is the preferred choice;

Specific information and other loss function reference: tf.keras.losses.SparseCategoricalCrossentropy | TensorFlow v2.12.0
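
To make the cross-entropy concrete, here is a small sketch with a 3-class toy prediction (the values are purely illustrative):

import tensorflow as tf

y_true = [1]                # the correct class index
y_pred = [[0.1, 0.8, 0.1]]  # the predicted probability distribution
scce = tf.keras.losses.SparseCategoricalCrossentropy()
print(scce(y_true, y_pred).numpy())  # -log(0.8) ≈ 0.223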

3. metrics: the accuracy evaluation metric; common options: 'accuracy', 'sparse_accuracy', 'sparse_categorical_accuracy'

For specific information, please refer to: tf.keras.metrics.Accuracy | TensorFlow v2.12.0
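
Equivalently, the whole compile call can be written in string form; a sketch (note that this uses Adam's default learning rate of 0.001 rather than the 0.1 configured above):

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])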

Training the model

Train the model with the fit method:

history = model.fit(x_train,y_train,batch_size=128,epochs=10,validation_split=0.1,verbose=2)

Parameter analysis:

fit(x=None, y=None, batch_size=None, epochs=1, verbose=1,validation_split=0.0)

x, y: input and output, required

batch_size: The number of samples for each gradient update, the default value is 32

epochs: number of iterations

verbose: log printing format. 0: no log output; 1: show a progress bar; 2: output one line per epoch

validation_split: the fraction of the training data split off as validation data; the rest is used for training;

Return value:

The fit method returns a History object, which records the losses and accuracies of the training and validation sets

loss = history.history['loss']
val_loss = history.history['val_loss']
accuracy = history.history['accuracy']
val_accuracy = history.history['val_accuracy']
print("training loss:", loss)
print("validation loss:", val_loss)
print("training accuracy:", accuracy)
print("validation accuracy:", val_accuracy)
# Note: the key names depend on the chosen loss function and accuracy metric

The printed results are as follows:

You can see that the relevant metrics of each epoch are printed;

For details, refer to: tf.keras.Model | TensorFlow v2.12.0

Evaluating the model

Evaluate the model with the evaluate method:

model.evaluate(x_test,y_test,verbose=2)

Parameter analysis:

model.evaluate(x,y,batch_size,verbose)

x, y: the test dataset and its labels

batch_size: the number of samples used for each evaluation batch

verbose: log printing format. 0: no log output; 1: show a progress bar; 2: output a single summary line

For details, refer to: tf.keras.Model | TensorFlow v2.12.0
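
Since the model was compiled with metrics=['accuracy'], evaluate returns both the loss and the accuracy; a minimal sketch:

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print("test loss:", test_loss, "test accuracy:", test_acc)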

Saving the model

Parameter analysis:

model.save(path): saves the model to the given path

model.save(model_name): if no path is specified, the model is saved to the current working directory by default

Therefore, we can save the entire trained model and load it directly when using it:

model = tf.keras.models.load_model(path)

This model can be used directly for model.predict
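
A minimal save-and-restore sketch (the file name "mnist_model.h5" is an arbitrary example):

model.save("mnist_model.h5")  # save the entire trained model
restored = tf.keras.models.load_model("mnist_model.h5")
restored.predict(tf.reshape(x_test[0], (1, 28, 28)))  # the restored model can predict directly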

For details, refer to: Training Checkpoint | TensorFlow Core

Results visualization

Result visualization uses matplotlib.pyplot for plotting

Official documentation: matplotlib — Matplotlib 3.7.1 documentation

1. plt.figure: customize canvas-related properties

plt.figure(num='first',figsize=(10,3),dpi=75, facecolor='#FFFFFF', edgecolor='#0000FF')

num: the number of the image; figsize: the size of the canvas; dpi: pixels per inch; facecolor: the color of the canvas; edgecolor: the color of the edge of the canvas

Documentation: matplotlib.figure — Matplotlib 3.7.1 documentation

2. plt.subplot(nrows, ncols, index): Used to draw multiple subgraphs at once

The entire plotting area of the chart is divided into nrows rows and ncols columns

Number the subgraphs according to the rule from left to right and from top to bottom, index specifies the position of the subgraph to be drawn in the entire drawing area

Documentation: matplotlib.pyplot.subplot — Matplotlib 3.7.1 documentation

3. plt.plot: Draw a line chart

plt.plot(x, y, color='b', label='label name')

x: x-axis data; y: y-axis data; color: line color (commonly used: b for blue, r for red)

If only one data series is given, the x-axis defaults to [0, 1, 2, 3, ...], with the same length as y

Documentation: matplotlib.pyplot.plot — Matplotlib 3.7.1 documentation

4. plt.xlabel() / plt.ylabel(): set the labels of the x- and y-axes

5. plt.legend(): Automatically detect the elements that should be displayed in the legend and make them appear

6. plt.title(): Set title

According to the above commonly used drawing API, we can draw the change curve of loss and accuracy;

import matplotlib.pyplot as plt

plt.figure(figsize=(10,3))
plt.plot(loss,color='b',label='train')
plt.plot(val_loss,color='r',label='test')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()

plt.figure(figsize=(10,3))
plt.plot(accuracy,color='b',label='train')
plt.plot(val_accuracy,color='r',label='test')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()

Use the model to make predictions

① Randomly select an image from the test dataset:

idx = np.random.randint(0, 10000)

② Reshape it with tf.reshape so that it conforms to the model's input format:

num = tf.reshape(x_test[idx], (1, 28, 28))

③ Make predictions with model.predict:

model.predict(num)

④ Obtain the prediction result:

res = np.argmax(model.predict(num))

np.argmax finds the maximum value in the predicted probability distribution and returns its index as the predicted digit.
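
Putting the four steps together, a minimal end-to-end sketch (this assumes the trained model and the x_test, y_test arrays from earlier):

import numpy as np
import tensorflow as tf

idx = np.random.randint(0, 10000)
num = tf.reshape(x_test[idx], (1, 28, 28))
res = np.argmax(model.predict(num))
print("predicted:", res, "actual:", y_test[idx])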


2023.4.10 update: 

AutoEncoder

Ordinary autoencoder

An autoencoder is an unsupervised method for compressing data dimensionality and learning feature representations. It consists of two parts:

Encoder: compresses the input into a latent-space representation

Decoder: reconstructs the input from the latent-space representation

Simply put, an autoencoder is a neural network that tries to make its output identical to its input. By making the dimension of the latent representation smaller than that of the input data, the autoencoder becomes undercomplete, which forces it to learn the salient features of the input data. In this way it extracts the useful information in the data, and the reconstruction from the latent representation becomes more accurate;
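
One standard way to write this objective down is to minimize the average reconstruction error between each input x_i and its reconstruction g(f(x_i)), where f is the encoder and g is the decoder:

min over f, g of (1/n) * Σ_i || x_i - g(f(x_i)) ||^2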

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

from keras import layers
from keras.models import Model

# Load the data
mnist = tf.keras.datasets.mnist
(x_train, _), (x_test, _) = mnist.load_data()

# Normalize the pixel values of the training and test sets to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Use 10% of the training set (6000 samples)
x_train = x_train[0:6000]

latent_dim = 64 # controls the compression level of the encoder (the number of hidden-layer nodes)

class Autoencoder(Model): # subclass Model to define a custom autoencoder
  def __init__(self, latent_dim):
    super(Autoencoder, self).__init__() # initialize the autoencoder
    self.latent_dim = latent_dim
    # Encoder: compresses the original image into a 64-dimensional latent vector (the hidden layer)
    self.encoder = tf.keras.Sequential([
      layers.Flatten(),
      layers.Dense(latent_dim, activation='relu'),
    ])
    # Decoder: reconstructs the image from the latent space
    self.decoder = tf.keras.Sequential([
      layers.Dense(784, activation='sigmoid'),
      layers.Reshape((28, 28))
    ])

  def call(self, x):
    encoded = self.encoder(x) # encode first
    decoded = self.decoder(encoded) # then decode
    return decoded # return the decoded image

autoencoder = Autoencoder(latent_dim) # create the autoencoder

# Since Autoencoder subclasses Model, the autoencoder is effectively a custom model and can be compiled and trained with the Model methods
# Compile the model: Adam optimizer, cross-entropy loss
autoencoder.compile(optimizer='adam', loss='categorical_crossentropy')
# Train the model
autoencoder.fit(x_train, x_train,
                epochs=10,
                shuffle=True,
                validation_data=(x_test, x_test))
# Save the trained weights
autoencoder.save_weights("AEtest.h5")

# After training, feed the test-set data to the model for prediction
encoded_imgs = autoencoder.encoder(x_test).numpy()
decoded_imgs = autoencoder.decoder(encoded_imgs).numpy()

n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
  # Randomly pick an image from the test set
  num = np.random.randint(1, 10000)

  # Show the original image
  ax = plt.subplot(2, n, i + 1)
  plt.imshow(x_test[num])
  plt.title("original")
  plt.gray()
  ax.get_xaxis().set_visible(False)
  ax.get_yaxis().set_visible(False)

  # Show the image after passing through the autoencoder
  ax = plt.subplot(2, n, i + 1 + n)
  plt.imshow(decoded_imgs[num])
  plt.title("reconstructed")
  plt.gray()
  ax.get_xaxis().set_visible(False)
  ax.get_yaxis().set_visible(False)
plt.show()

The result is as follows:

The upper row is the input test images; the lower row shows the images obtained after encoding and decoding by the autoencoder;

The test above uses 10% of the MNIST dataset; the code implementation is adapted from the example on the official website:

Intro to Autoencoders  |  TensorFlow Core

From the results above, the decoder's reconstructions can only be called mediocre. The information printed during training also shows that both loss and val_loss remain fairly large:

This phenomenon becomes even more obvious when only 1% of the dataset is used: with too few training samples, the autoencoder cannot learn the features of the data well:

As the figure above shows, the result is quite abstract;

So I increased the number of training epochs to 100:

However, the validation loss (val_loss) then starts to rise; after also trying 200 epochs, it is clear that the further decrease in loss is very limited:

The test results are also unsatisfactory:

This suggests that ordinary autoencoders are not good at extracting effective features from small datasets;

Convolutional autoencoder

To address this, I tried a convolutional autoencoder instead:

A convolutional autoencoder, simply put, replaces the fully connected network in the ordinary autoencoder with a convolutional neural network for feature extraction and reconstruction;

The code implementation is as follows (also adapted from the code on the official website):

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

from keras import layers,losses
from keras.models import Model

# Load the data
mnist = tf.keras.datasets.mnist
(x_train, _), (x_test, _) = mnist.load_data()

# Normalize the pixel values of the training and test sets to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Use 1% of the training set (600 samples)
x_train = x_train[0:600]

# Add a channel dimension: the Conv2D layers below expect inputs of shape (28, 28, 1)
x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]

latent_dim = 64 # note: unlike the ordinary autoencoder, this value is not used by the convolutional model below

class Denoise(Model):
  def __init__(self):
    super(Denoise, self).__init__()
    self.encoder = tf.keras.Sequential([
      layers.Input(shape=(28, 28, 1)), # (28, 28, 1) is the image height, width and channel count (MNIST is grayscale, so 1 channel)
      # Each input channel is convolved with each kernel to produce feature maps
      # Each channel can be seen as one abstraction of the original image: the more channels stacked, the more information the network keeps at each layer, and the less of the original image is lost
      # In an autoencoder, the input is downsampled to a smaller latent representation, forcing the network to learn a compressed version of the input
      # Here, 1 input channel with 16 kernels produces 16 feature maps, so the output has 16 channels
      layers.Conv2D(16, (3, 3), activation='relu', padding='same', strides=2),
      # Here, 16 input channels with 8 kernels produce 8 feature maps, so the output has 8 channels
      layers.Conv2D(8, (3, 3), activation='relu', padding='same', strides=2)])

    self.decoder = tf.keras.Sequential([
      # The inverse of the convolution (transposed convolution)
      layers.Conv2DTranspose(8, kernel_size=3, strides=2, activation='relu', padding='same'),
      layers.Conv2DTranspose(16, kernel_size=3, strides=2, activation='relu', padding='same'),
      # With 1 output channel, the resulting feature map is the image after encoding and decoding
      layers.Conv2D(1, kernel_size=(3, 3), activation='sigmoid', padding='same')])

  def call(self, x):
    encoded = self.encoder(x)
    decoded = self.decoder(encoded)
    return decoded

autoencoder = Denoise()


# Compile the model: Adam optimizer, mean squared error loss
autoencoder.compile(optimizer='adam', loss=losses.MeanSquaredError())
# Train the model
autoencoder.fit(x_train, x_train,
                epochs=10,
                shuffle=True,
                validation_data=(x_test, x_test))
# Save the trained weights
# autoencoder.save_weights("AEtest.h5")

# After training, feed the test-set data to the model for prediction
encoded_imgs = autoencoder.encoder(x_test).numpy()
decoded_imgs = autoencoder.decoder(encoded_imgs).numpy()

n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
  # Randomly pick an image from the test set
  num = np.random.randint(1, 10000)

  # Show the original image
  ax = plt.subplot(2, n, i + 1)
  plt.imshow(tf.squeeze(x_test[num]))  # squeeze drops the channel dimension for plotting
  plt.title("original")
  plt.gray()
  ax.get_xaxis().set_visible(False)
  ax.get_yaxis().set_visible(False)

  # Show the image after passing through the autoencoder
  ax = plt.subplot(2, n, i + 1 + n)
  plt.imshow(tf.squeeze(decoded_imgs[num]))
  plt.title("reconstructed")
  plt.gray()
  ax.get_xaxis().set_visible(False)
  ax.get_yaxis().set_visible(False)
plt.show()

For the first test, I set up a two-layer convolutional network with 16 and 8 convolution kernels:

Training for 10 epochs, the training process and test results are as follows:

Since the number of convolution kernels corresponds to the number of channels of the output, and each channel can be seen as one abstraction of the original image, more channels mean the network retains more information at each layer and loses less of the original image. So I ran another test:

This time, a two-layer convolutional network with 32 and 16 convolution kernels:

Training for 10 epochs again, the training process and test results are as follows:

We can see that both the reduction in loss and the fit of the reconstructed images are noticeably better;

Finally, summarize the advantages and disadvantages of CNN autoencoders:

The CNN autoencoder is an autoencoder that uses a convolutional neural network for feature extraction and reconstruction. Its advantages and disadvantages are as follows:

Advantages:

It can extract local features of images: convolutional neural networks have good local perception and can extract local image features, so a CNN autoencoder preserves the local structure of an image well.

It can learn features adaptively: a CNN autoencoder learns image features adaptively without hand-designed feature extractors, so it adapts well to different datasets and tasks.

It can be used for image denoising and artifact removal: a CNN autoencoder learns low-dimensional representations of images, which is useful for denoising and artifact-removal tasks.

Disadvantages:

Long training time: a CNN autoencoder has many parameters to learn, so training takes a long time and requires substantial computing resources.

Prone to overfitting: because a CNN autoencoder has strong learning capacity, it easily overfits the training set, resulting in poor performance on the test set.

Low efficiency on large-scale images: a CNN autoencoder performs convolution operations over the entire image, so processing large images is inefficient.

As the experiments above also show, a CNN autoencoder works better for training on the MNIST dataset, especially when training samples are scarce. CNN autoencoders handle image data better because their convolutional layers extract spatial features: they capture local patterns and structures in the input image and can therefore reconstruct it more faithfully. In addition, a CNN autoencoder can use pooling layers to reduce the spatial size of the image, which reduces the number of model parameters and improves training efficiency;

This is all I have learned so far; to be continued;

(If there are any mistakes in the article, please point them out! The author is a novice just getting started, so some misunderstandings are inevitable~)

Origin blog.csdn.net/qq_51235856/article/details/130043631