Why should a non-linear function be used as an activation function?
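The usual answer: without a non-linearity, stacking layers is pointless, because a composition of linear maps is itself a single linear map. A minimal plain-Python sketch (toy 2×2 weight matrices, chosen only for illustration):

```python
# A stack of linear layers with no activation collapses into a single
# linear map, so extra depth adds no expressive power. Toy example with
# hypothetical weights W1, W2 and a hand-rolled matmul (no libraries).

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

W1 = [[2, 0], [1, 1]]
W2 = [[1, 1], [0, 3]]
x = [[1], [2]]

# Two linear layers applied in sequence...
deep = matmul(W2, matmul(W1, x))
# ...equal one linear layer with the merged weight W2 @ W1.
shallow = matmul(matmul(W2, W1), x)
assert deep == shallow  # the "deep" network is really one layer

# A non-linearity such as ReLU breaks this collapse: ReLU is not
# additive, so it cannot be absorbed into a single weight matrix.
relu = lambda v: max(0.0, v)
assert relu(1 + (-2)) != relu(1) + relu(-2)  # 0 vs. 1
```

With a ReLU (or sigmoid, tanh, ...) between the layers, the two weight matrices can no longer be merged, which is what lets depth add representational power.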
The default stride of a convolutional layer is 1, while the default stride of a pooling layer equals its kernel_size (these are PyTorch's defaults for `Conv2d` and `MaxPool2d`), so pooling windows do not overlap by default.
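The output-size arithmetic behind those defaults can be sketched with the formula floor((n + 2·padding − kernel_size) / stride) + 1 used in the PyTorch `Conv2d`/`MaxPool2d` documentation; the function names below are illustrative, not a real API:

```python
# One-dimension output-size calculators mirroring the defaults:
# convolution uses stride=1, pooling uses stride=kernel_size.

def conv_out(n, kernel_size, stride=1, padding=0):
    # Convolution: stride defaults to 1, so the map shrinks only
    # by (kernel_size - 1) when padding is 0.
    return (n + 2 * padding - kernel_size) // stride + 1

def pool_out(n, kernel_size, stride=None, padding=0):
    # Pooling: stride defaults to kernel_size, giving
    # non-overlapping windows that downsample by that factor.
    if stride is None:
        stride = kernel_size
    return (n + 2 * padding - kernel_size) // stride + 1

assert conv_out(32, 3) == 30   # 3x3 conv, stride 1: barely shrinks
assert pool_out(32, 2) == 16   # 2x2 pool, stride 2: size halves
```

This is why a default 2×2 max-pool halves each spatial dimension, while a default 3×3 convolution only trims the border.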
Convolutional and pooling layers cannot be added after a fully connected layer, because the input to a fully connected layer is flattened into a vector, discarding the (channels, height, width) spatial structure that convolution and pooling operate on.
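The rule can be seen purely as shape bookkeeping; a minimal sketch, where all shapes, sizes, and function names are illustrative:

```python
# Flattening before a fully connected layer drops the (C, H, W)
# layout, so a convolution has no spatial dimensions left to slide over.

def flatten(shape):
    # (C, H, W) -> (C*H*W,), as done before the first FC layer.
    n = 1
    for d in shape:
        n *= d
    return (n,)

def conv2d_shape(shape, out_channels, kernel_size):
    # Valid (no-padding, stride-1) convolution needs a 3-D input.
    if len(shape) != 3:
        raise ValueError("conv2d needs a (C, H, W) input, got %r" % (shape,))
    c, h, w = shape
    return (out_channels, h - kernel_size + 1, w - kernel_size + 1)

feat = (8, 14, 14)                                # feature map from conv layers
assert conv2d_shape(feat, 16, 3) == (16, 12, 12)  # conv still applies here

vec = flatten(feat)          # (1568,) entering the fully connected layer
try:
    conv2d_shape(vec, 16, 3)  # no spatial dims left
    spatial_lost = False
except ValueError:
    spatial_lost = True
assert spatial_lost
```

This is why CNN architectures place all convolution/pooling stages first and the fully connected layers last.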