Building Deep Neural Networks to Improve Model Accuracy
The neural network we used in the previous model had only one hidden layer between the input and output layers. In this section, we will learn to use multiple hidden layers in a neural network (hence the name deep neural network) to explore the effect of network depth on model performance.
A deep neural network means that there are multiple hidden layers between the input layer and the output layer. Multiple hidden layers ensure that neural networks can learn complex nonlinear relationships between inputs and outputs, a requirement that simple neural networks cannot accomplish. A classic deep neural network architecture looks like this:
To build a deep neural network architecture, we add multiple hidden layers between the input and output layers. The steps are as follows.
- Load and scale the dataset:
```python
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load MNIST and flatten each 28 x 28 image into a 784-dimensional vector
(x_train, y_train), (x_test, y_test) = mnist.load_data()
num_pixels = x_train.shape[1] * x_train.shape[2]
x_train = x_train.reshape(-1, num_pixels).astype('float32')
x_test = x_test.reshape(-1, num_pixels).astype('float32')

# Scale pixel values from [0, 255] to [0, 1]
x_train = x_train / 255.
x_test = x_test / 255.

# One-hot encode the labels
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
num_classes = y_test.shape[1]
```
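To see what the reshape and scaling steps do, here is a minimal sketch using a small synthetic array standing in for two 28 x 28 images (so no dataset download is needed):

```python
import numpy as np

# Two fake 28 x 28 "images" with pixel values in [0, 255],
# standing in for x_train purely for illustration
fake_images = np.random.randint(0, 256, size=(2, 28, 28))

num_pixels = fake_images.shape[1] * fake_images.shape[2]  # 28 * 28 = 784

# Flatten each image to a 784-dimensional row vector and scale to [0, 1]
flat = fake_images.reshape(-1, num_pixels).astype('float32') / 255.

print(flat.shape)  # (2, 784)
```

Each row of `flat` is now one image, ready to be fed into a `Dense` input layer.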
- Build a model with multiple hidden layers between the input and output layers:
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Three hidden layers between the input and the softmax output layer
model = Sequential()
model.add(Dense(512, input_dim=num_pixels, activation='relu'))
model.add(Dense(1024, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
```
The model architecture, as reported by `model.summary()`, is as follows:
```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 512)               401920
_________________________________________________________________
dense_1 (Dense)              (None, 1024)              525312
_________________________________________________________________
dense_2 (Dense)              (None, 64)                65600
_________________________________________________________________
dense_3 (Dense)              (None, 10)                650
=================================================================
Total params: 993,482
Trainable params: 993,482
Non-trainable params: 0
_________________________________________________________________
```
Since there are more hidden layers in the deep neural network architecture, there are also more parameters in the model.
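The per-layer parameter counts in the summary can be verified by hand: a `Dense` layer has `inputs * units` weights plus `units` biases. The layer sizes below are taken from the architecture above:

```python
# Layer widths from the architecture: 784 -> 512 -> 1024 -> 64 -> 10
sizes = [784, 512, 1024, 64, 10]

# Dense layer parameters = n_in * n_out weights + n_out biases
params = [n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:])]

print(params)       # [401920, 525312, 65600, 650]
print(sum(params))  # 993482
```

The total matches the 993,482 parameters reported by `model.summary()`.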
- Once the model is built, it is time to compile and fit the model:
```python
# Compile with categorical cross-entropy loss and the Adam optimizer,
# then train for 50 epochs, validating on the test set after each epoch
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
history = model.fit(x_train, y_train,
                    validation_data=(x_test, y_test),
                    epochs=50,
                    batch_size=64,
                    verbose=1)
```
The trained model reaches an accuracy of approximately 98.9%, only slightly better than that of the previous architecture, because MNIST is a relatively simple dataset. The training and test losses and accuracies are as follows:
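Curves like these can be drawn from the `history.history` dict returned by `model.fit()`. The sketch below uses placeholder per-epoch values standing in for a real training run (the metric keys `loss`/`val_loss`/`acc`/`val_acc` follow the `metrics=['acc']` setting used above):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, writes to file
import matplotlib.pyplot as plt

# Placeholder metrics standing in for history.history after model.fit()
history_dict = {
    'loss': [0.25, 0.10, 0.05], 'val_loss': [0.20, 0.12, 0.09],
    'acc': [0.92, 0.97, 0.99],  'val_acc': [0.94, 0.96, 0.97],
}
epochs = range(1, len(history_dict['loss']) + 1)

# Left panel: loss curves; right panel: accuracy curves
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(epochs, history_dict['loss'], label='training loss')
ax1.plot(epochs, history_dict['val_loss'], label='test loss')
ax1.set_xlabel('epoch'); ax1.set_ylabel('loss'); ax1.legend()
ax2.plot(epochs, history_dict['acc'], label='training accuracy')
ax2.plot(epochs, history_dict['val_acc'], label='test accuracy')
ax2.set_xlabel('epoch'); ax2.set_ylabel('accuracy'); ax2.legend()
fig.savefig('training_curves.png')
```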
As can be seen in the figure above, the accuracy on the training dataset is consistently higher than on the test dataset, which indicates that the deep neural network is overfitting the training data. In later sections, we will learn ways to avoid overfitting the training data.