The effect of batch size on model accuracy
In the original neural network, we used a batch size of 64 for all the models we built. In this section, we study the effect of changing the batch size on accuracy. To do so, let's compare two cases:
- batch size is 4096
- batch size is 64
When the batch size is large, there are few weight updates per epoch. When the batch size is small, there are many weight updates per epoch, because every epoch must traverse all the training data in the dataset: if each batch uses less data to compute the loss, more batches are needed to cover the whole dataset, and therefore more weight updates are performed. As a result, for the same number of epochs, a smaller batch size generally yields better accuracy. However, you should also make sure the batch size is not so small that it causes overfitting.
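To make the traversal concrete, here is a toy mini-batch gradient descent loop (illustrative only, not the post's code) that counts how many weight updates a single epoch performs for each batch size:

```python
import numpy as np

# Toy illustration: one epoch of mini-batch gradient descent on a
# linear least-squares model, counting the weight updates performed
# by a single pass over 60000 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(60000, 10))
true_w = rng.normal(size=10)
y = X @ true_w
w = np.zeros(10)

def run_epoch(batch_size, lr=0.01):
    """Run one epoch and return the number of weight updates."""
    global w
    updates = 0
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        grad = 2 * xb.T @ (xb @ w - yb) / len(xb)  # gradient of MSE loss
        w = w - lr * grad
        updates += 1
    return updates

print(run_epoch(4096))  # 15 updates: ceil(60000 / 4096)
print(run_epoch(64))    # 938 updates: ceil(60000 / 64)
```

With a batch size of 4096 the epoch finishes after only 15 updates, while a batch size of 64 performs 938 updates over the same data.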
Previously, we trained the model with a batch size of 64. In this section, we keep the same model architecture and change only the batch size used for training, to compare the impact of different batch sizes on model performance. Preprocess the dataset and fit the model:
```python
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils

# Load MNIST and flatten each 28x28 image into a 784-element vector
(x_train, y_train), (x_test, y_test) = mnist.load_data()
num_pixels = x_train.shape[1] * x_train.shape[2]
x_train = x_train.reshape(-1, num_pixels).astype('float32')
x_test = x_test.reshape(-1, num_pixels).astype('float32')
x_train = x_train / 255.
x_test = x_test / 255.
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]

model = Sequential()
model.add(Dense(1000, input_dim=num_pixels, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
history = model.fit(x_train, y_train,
                    validation_data=(x_test, y_test),
                    epochs=50,
                    batch_size=4096,
                    verbose=1)
```
The only change to the code is the `batch_size` parameter. Plot the training and test accuracy over epochs (the plotting code is exactly the same as that used for the original neural network):
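For reference, the plotting code might look roughly like the sketch below. In the real script you would read `history.history` from the `model.fit` call above; here a small stand-in dictionary with made-up values is used so the sketch runs on its own:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Stand-in for history.history returned by model.fit; the values
# here are placeholders, not real training results.
history_dict = {
    'acc': [0.10, 0.45, 0.70, 0.85],
    'val_acc': [0.12, 0.44, 0.68, 0.83],
}

epochs = range(1, len(history_dict['acc']) + 1)
plt.plot(epochs, history_dict['acc'], label='training accuracy')
plt.plot(epochs, history_dict['val_acc'], label='test accuracy')
plt.title('Accuracy with batch size 4096')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.savefig('accuracy_batch4096.png')
```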
In the figure above, note that compared with the model trained with a smaller batch size, the model with the larger batch size needs more epochs for its accuracy to reach 98%. In this section's model, accuracy is relatively low in the early stage of training, and only after a considerable number of epochs does it reach a high level. The reason is that with a larger batch size, far fewer weight updates are performed in each epoch.
The total size of the training set is 60000. When we run the model for 500 epochs with a batch size of 4096, the weights are updated ceil(60000/4096) = 15 times per epoch, or about 7,500 times in total. With a batch size of 64, the weights are updated about 938 times per epoch, roughly 469,000 times in total. Therefore, the smaller the batch size, the more weight updates are performed, and for the same number of epochs the accuracy is generally better. At the same time, note that the batch size should not be too small, which can lead to excessively long training times and overfitting.
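These update counts can be reproduced with a couple of lines (assuming Keras's convention of running a partial final batch, hence the ceiling):

```python
import math

# Count weight updates per epoch and in total for each batch size.
n_samples, n_epochs = 60000, 500

for batch_size in (4096, 64):
    per_epoch = math.ceil(n_samples / batch_size)
    total = per_epoch * n_epochs
    print(f"batch_size={batch_size}: {per_epoch} updates/epoch, {total} in total")
```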