Keras Deep Learning - Impact of Input Value Distribution on Neural Network Model Performance


Effect of Input Value Distribution on Model Performance

While we were able to recognize handwritten digits with high accuracy, we have not yet looked at the distribution of values in the MNIST dataset, and different distributions of input values can change the speed of training. In this section, we will see how the weights can be trained faster by modifying the input values, thereby reducing training time. We build exactly the same model architecture as the original neural network, but with a small change to the input dataset:

  • Invert the background and foreground colors. Essentially, paint the background white and the numbers black.

We first analyze, in theory, the impact of pixel values on model performance. Since a black pixel has a value of zero, multiplying this input by any weight produces zero. This means that changing the weights which connect black pixels to the hidden layer has no effect on the loss value, so those weights receive no gradient update. A white pixel, on the other hand, contributes to the hidden node values, and its weights do need to be adjusted.
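As a quick illustration of this point (not part of the original walkthrough), the following NumPy sketch computes the weight gradient of a toy dense layer by hand: for y = x @ W, the gradient of the loss with respect to W is the outer product of the input and the upstream gradient, so every weight attached to a zero-valued pixel receives a zero gradient and is never updated.

import numpy as np

x = np.array([0.0, 0.0, 0.8, 0.3])    # two black pixels, two bright pixels
W = np.random.randn(4, 2)             # toy layer: 4 inputs -> 2 hidden nodes
y = x @ W                              # hidden node values; only the non-zero pixels contribute

upstream = np.array([0.5, -1.2])       # assumed gradient of the loss w.r.t. y
grad_W = np.outer(x, upstream)         # dL/dW for y = x @ W
print(grad_W)                          # the rows for the two zero-valued pixels are all zeros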

  1. Load and scale the input dataset:
from keras.datasets import mnist
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.utils import np_utils
import matplotlib.pyplot as plt

(x_train, y_train), (x_test, y_test) = mnist.load_data()

num_pixels = x_train.shape[1] * x_train.shape[2]
x_train = x_train.reshape(-1, num_pixels).astype('float32')
x_test = x_test.reshape(-1, num_pixels).astype('float32')
x_train = x_train / 255.
x_test = x_test / 255.

y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
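As a quick sanity check (an addition, not part of the original code), you can confirm the shapes and value range produced by the preceding step:

print(x_train.shape, x_test.shape)   # expected: (60000, 784) (10000, 784)
print(x_train.min(), x_train.max())  # expected: 0.0 1.0
print(num_classes)                   # expected: 10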
  2. See the distribution of input values:
x_train.flatten()

The preceding code flattens all inputs into a single array of 28 × 28 × x_train.shape[0] = 47,040,000 values. Plot the distribution of all input values:

plt.hist(x_train.flatten())
plt.title('Histogram of input values')
plt.xlabel('Input values')
plt.ylabel('Frequency of input values')
plt.show()

Since the background of the input images is black, most of the input values are zero (the value of a black pixel).

data value distribution
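If you prefer a single number to the histogram, the following line (an addition to the original code, relying on the NumPy import from step 1) reports the fraction of pixel values that are exactly zero:

print(np.mean(x_train == 0))   # roughly 0.8: about 80% of all pixel values are black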

  3. Use the following code to invert the colors so that the background is white and the numbers are black.
x_train = 1-x_train
x_test = 1-x_test

Plot a few of the inverted images:

plt.subplot(221)
plt.imshow(x_train[0].reshape(28,28), cmap='gray')
plt.subplot(222)
plt.imshow(x_train[1].reshape(28,28), cmap='gray')
plt.subplot(223)
plt.imshow(x_test[0].reshape(28,28), cmap='gray')
plt.subplot(224)
plt.imshow(x_test[1].reshape(28,28), cmap='gray')
plt.show()

The inverted images look as follows:

inverted images

The histogram of the resulting image after inverting the colors looks like this:

data value distribution

As you can see, most of the input values now have a value of 1.
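Repeating the earlier check (again an addition to the original code) confirms this numerically:

print(np.mean(x_train == 1))   # roughly 0.8: the zeros have become ones after inversion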

  4. Use the exact same model architecture as before:
model = Sequential()
model.add(Dense(1000, input_dim=num_pixels, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

history = model.fit(x_train, y_train,
                    validation_data=(x_test, y_test),
                    epochs=50,
                    batch_size=64,
                    verbose=1)

Plot the training and test accuracy and loss over increasing epochs:
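The original post shows only the resulting figure; a minimal plotting sketch using the history object returned by model.fit above could look like the following (depending on the Keras version, the accuracy keys may be 'acc'/'val_acc' or 'accuracy'/'val_accuracy'):

epochs_range = range(1, len(history.history['loss']) + 1)

plt.subplot(211)
plt.plot(epochs_range, history.history['acc'], label='Training accuracy')
plt.plot(epochs_range, history.history['val_acc'], label='Test accuracy')
plt.title('Accuracy over epochs')
plt.legend()

plt.subplot(212)
plt.plot(epochs_range, history.history['loss'], label='Training loss')
plt.plot(epochs_range, history.history['val_loss'], label='Test loss')
plt.title('Loss over epochs')
plt.xlabel('Epochs')
plt.legend()
plt.show()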

Changes in accuracy and loss values

It can be seen that the model accuracy drops to about 97%. In contrast, when the dataset is not inverted (that is, when most of the data values are zero), the same number of epochs, the same batch size, and the same model architecture yield a model with an accuracy of roughly 98%. With the pixel values inverted (far fewer zeros in the dataset), accuracy is about 97% and training proceeds much more slowly than when most of the input pixels are zero. When the majority of pixels are zero, training is easier because the model only needs to make predictions based on the few pixels whose values are greater than zero. When most pixels are non-zero, however, many more weights need to be fine-tuned to reduce the loss value.
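To reproduce this comparison yourself, one option is to reload the data without the 1 - x inversion and train the identical architecture on it. A sketch under that assumption follows; x_train_orig and x_test_orig are hypothetical names for the scaled but non-inverted arrays from step 1.

# x_train_orig / x_test_orig: scaled but NOT inverted data from step 1 (hypothetical names)
model_orig = Sequential()
model_orig.add(Dense(1000, input_dim=num_pixels, activation='relu'))
model_orig.add(Dense(num_classes, activation='softmax'))
model_orig.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
history_orig = model_orig.fit(x_train_orig, y_train,
                              validation_data=(x_test_orig, y_test),
                              epochs=50, batch_size=64, verbose=1)

# Compare the best test accuracy of the two runs
print(max(history.history['val_acc']), max(history_orig.history['val_acc']))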

Related Links

Keras Deep Learning - Training Raw Neural Networks

Keras Deep Learning - Scaling Input Datasets to Improve Neural Network Performance


Source: juejin.im/post/7086047828643938340