This article briefly records how to calculate the output size of convolutional and pooling layers, because I can never remember it and it is tedious to look it up on Baidu every time.
Convolutional layer: output size = (input image size - convolution kernel size + 2*padding)/strides + 1, rounded down when the division is not exact.
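A minimal Python sketch of the formula above, for one spatial dimension. The function name `conv_output_size` is hypothetical. It uses integer floor division because frameworks such as PyTorch's `Conv2d` round the result down when (input size - kernel size + 2*padding) is not divisible by the stride; dilation is assumed to be 1.

```python
def conv_output_size(in_size: int, kernel_size: int, padding: int = 0, stride: int = 1) -> int:
    """Output length along one spatial dimension of a convolution.

    Implements (in_size - kernel_size + 2*padding) / stride + 1, rounding
    down when the division is not exact (dilation assumed to be 1).
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    span = in_size - kernel_size + 2 * padding
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    return span // stride + 1


# 256 input, 4*4 kernel, stride 2, padding 1.
print(conv_output_size(256, 4, padding=1, stride=2))  # 128
# A non-exact case: (6 - 3) / 2 + 1 = 2.5, which rounds down to 2.
print(conv_output_size(6, 3, stride=2))  # 2
```

The same function gives the width as well as the height, since square inputs and kernels use the same numbers on both axes.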
For example, in the figure above, the input image is 256*256 with in_size=3 channels, out_size=64, a 4*4 convolution kernel, strides=2 and padding=1. The formula gives (256 - 4 + 2*1)/2 + 1 = 128, so the output is 128*128*64: the number of output channels equals out_size (the number of kernels), not the number of input channels. Loosely speaking, more channels let the layer extract more features of the image, though this intuition may not be entirely accurate.
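The example can be checked with a small sketch that tracks the full height*width*channels shape. The name `conv2d_output_shape` and the (height, width, channels) ordering are assumptions for illustration, as are square kernels with the same stride and padding on both axes.

```python
from typing import Tuple


def conv2d_output_shape(
    in_shape: Tuple[int, int, int],
    out_channels: int,
    kernel_size: int,
    stride: int = 1,
    padding: int = 0,
) -> Tuple[int, int, int]:
    """Return (height, width, channels) after a 2-D convolution.

    in_shape is (height, width, in_channels). The input channel count only
    sets the depth of each kernel; the output depth is out_channels, one
    feature map per kernel.
    """
    height, width, _in_channels = in_shape
    out_h = (height - kernel_size + 2 * padding) // stride + 1
    out_w = (width - kernel_size + 2 * padding) // stride + 1
    return out_h, out_w, out_channels


print(conv2d_output_shape((256, 256, 3), out_channels=64, kernel_size=4, stride=2, padding=1))
# (128, 128, 64)
```

Changing the input from 3 channels to, say, 16 would leave the result unchanged, because only out_channels determines the output depth.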
Pooling layer: the same formula applies, with the pooling window size in place of the convolution kernel size: (input image size - window size + 2*padding)/strides + 1. For example, with a 2*2 window, strides=2, no padding and a 64*64*3 input, the output is (64 - 2)/2 + 1 = 32, i.e. 32*32*3. The pooling layer does not change the number of channels; it only changes the spatial size of the image. With the common 2*2 window and stride 2, the height and width are each halved.
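A sketch of the pooling case, assuming max pooling (the text does not say which kind of pooling is used; average pooling has the same output shape). The names `pool2d_output_shape` and `max_pool2d` are hypothetical. The second function pools one channel at a time, which is why the channel count stays the same.

```python
from typing import List, Tuple


def pool2d_output_shape(
    in_shape: Tuple[int, int, int], kernel_size: int, stride: int, padding: int = 0
) -> Tuple[int, int, int]:
    """Return (height, width, channels) after 2-D pooling; channels are unchanged."""
    height, width, channels = in_shape
    out_h = (height - kernel_size + 2 * padding) // stride + 1
    out_w = (width - kernel_size + 2 * padding) // stride + 1
    return out_h, out_w, channels


def max_pool2d(channel: List[List[int]], kernel_size: int = 2, stride: int = 2) -> List[List[int]]:
    """Max-pool a single 2-D channel (no padding) by taking the max of each window."""
    out_h = (len(channel) - kernel_size) // stride + 1
    out_w = (len(channel[0]) - kernel_size) // stride + 1
    return [
        [
            max(
                channel[i * stride + di][j * stride + dj]
                for di in range(kernel_size)
                for dj in range(kernel_size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


print(pool2d_output_shape((64, 64, 3), kernel_size=2, stride=2))  # (32, 32, 3)

grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(max_pool2d(grid))  # [[6, 8], [14, 16]]
```

The 4*4 grid becomes 2*2, matching the halving described above; a 3-channel input would simply apply `max_pool2d` to each of the three channels.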