Deep Learning - Parameter Count & Model Size & Theoretical Computation Amount

1. Parameter count
The parameter count refers to the number of parameters a model contains. Its unit is M (10 to the 6th power, i.e. millions).

1.1. Calculation of convolutional layer parameters
The quantities that matter for a convolutional layer are conv(kernel_size, in_channel, out_channel), that is, the kernel size and the numbers of input and output channels, plus the bias term.
Calculation formula: param = (k_size * k_size * in_channel + bias) * out_channel
Example:
Image size: 640 * 640 * 3
Convolution kernel size: 3 * 3
Input channel size: 3
Output channel size: 64
The number of parameters is: param = (3 * 3 * 3 + 1) * 64 = 1792
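The formula above can be checked with a minimal sketch (the helper name `conv2d_params` is mine, not from the article). Note that the 640 x 640 image size does not appear in the formula: convolution parameters are independent of the spatial input size.

```python
def conv2d_params(k_size: int, in_channels: int, out_channels: int, bias: bool = True) -> int:
    """Parameter count of a 2D convolution: (k * k * in_channel + bias) * out_channel."""
    return (k_size * k_size * in_channels + (1 if bias else 0)) * out_channels

# The example above: 3x3 kernel, 3 input channels, 64 output channels.
print(conv2d_params(3, 3, 64))  # 1792
```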

1.2. Calculation of fully connected layer parameters
The quantities that matter for a fully connected layer are the number of input neurons M and the number of output neurons N; each output neuron also carries one bias.
Calculation formula: param = (M + 1) * N
Example:
Input neurons: 3
Output neurons: 5
The number of parameters is: param=(3+1)*5=20

During training, batch_size samples are fed in at once; this multiplies the number of feature-map activations (and hence the activation memory) by batch_size, but, as section 1.4 explains, it does not change the parameter count.
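As a quick sketch of the fully connected formula (the helper name `linear_params` is mine):

```python
def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Parameter count of a fully connected layer: (M + bias) * N."""
    return (in_features + (1 if bias else 0)) * out_features

# The example above: 3 input neurons, 5 output neurons.
print(linear_params(3, 5))  # 20
```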

1.3. Output feature map parameter calculation
This refers to the number of elements in a feature map.
If the feature map output by the model has size H x W and C channels, the total element count is C x H x W.

Calculation formula: param = H * W * C
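A tiny helper (the name `feature_map_elements` is illustrative) that counts feature-map elements, including the batch dimension, which scales activations but not weights:

```python
def feature_map_elements(h: int, w: int, c: int, batch_size: int = 1) -> int:
    """Number of values in an output feature map: batch_size * H * W * C."""
    return batch_size * h * w * c

# A single 640x640x64 feature map versus a batch of 8 of them.
print(feature_map_elements(640, 640, 64))     # 26214400
print(feature_map_elements(640, 640, 64, 8))  # 209715200
```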

1.4. Parameter count during training and testing
Although training feeds in batch_size samples at a time, this does not affect the parameter count: the same set of model parameters is applied to all samples simultaneously, the loss values obtained on all samples are summed, and the parameters are then updated by gradient descent.

2. Model size
The model size is the storage space the model occupies. Unit: MB (short for MByte).

In deep learning neural networks, the most common data format is float32, which occupies 4 bytes; float16 occupies 2 bytes. 1024 bytes is 1 KB, and 1024 x 1024 bytes is 1 MB. Storing 10000 parameters therefore requires 10000 x 4 bytes, about 39 KB, and storing 1M (one million) parameters requires about 3.8 MB. Since deep networks usually have millions of parameters, 3.8 MB per million parameters is a convenient rule of thumb.
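The arithmetic above can be sketched as follows (the helper name `model_size_mb` is mine):

```python
def model_size_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Storage for the weights alone: float32 = 4 bytes/param, float16 = 2 bytes/param."""
    return num_params * bytes_per_param / (1024 * 1024)

print(round(model_size_mb(10_000) * 1024, 1))  # 39.1  (KB for 10k float32 params)
print(round(model_size_mb(1_000_000), 2))      # 3.81  (MB for 1M float32 params)
print(round(model_size_mb(1_000_000, 2), 2))   # 1.91  (MB for 1M float16 params)
```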

Note that the model parameters are not the only thing requiring storage: every element of each feature map, and the weight gradients computed by backpropagation during training, need the same kind of storage as well. The number of gradients equals the number of model parameters, because each parameter requires one gradient in order to be updated.

3. Theoretical computation amount
FLOPs (floating point operations, with a lowercase s marking the plural) measures the total number of floating-point operations and is a standard way to assess a network model's computational cost. It should not be confused with FLOPS, floating point operations per second. The units parallel those of the parameter count: large models are usually measured in G (10^9) and small models in M (10^6).
Example:
input: N * H * W (N input channels)
output: M * H * W (M output channels)
filters: K * K
params: N * K * K * M (ignoring bias)
FLOPs (multiplications only): W * H * N * K * K * M
If additions are also counted: ((K * K + 1) * N + (N - 1)) * W * H * M
Here, (K * K + 1) means the bias is added on top of the operations for one kernel window; multiplying by N performs these products across all input channels, giving one group of products; N - 1 is the number of additions needed to sum the N per-channel results. Thus (K * K + 1) * N + (N - 1) is the cost of producing one point of the output feature map, and multiplying by W * H * M covers every point in the final output feature map.
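The two FLOPs formulas above can be sketched directly (the helper name `conv2d_flops` is mine; the `count_adds` variant follows the article's accounting, which folds the bias into each input channel's window):

```python
def conv2d_flops(h: int, w: int, in_c: int, out_c: int, k: int, count_adds: bool = True) -> int:
    """FLOPs of a KxK convolution on an HxW map, per the formulas above."""
    if count_adds:
        # ((K*K + 1) * N + (N - 1)) * W * H * M
        return ((k * k + 1) * in_c + (in_c - 1)) * h * w * out_c
    # Multiplications only: W * H * N * K * K * M
    return h * w * in_c * k * k * out_c

# The running example: 640x640 input, 3 -> 64 channels, 3x3 kernel.
print(conv2d_flops(640, 640, 3, 64, 3, count_adds=False))  # 707788800 (~0.71 GFLOPs)
print(conv2d_flops(640, 640, 3, 64, 3))                    # 838860800 (~0.84 GFLOPs)
```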

Note that backpropagation, the process that updates the weight parameters, also requires computation.


Origin blog.csdn.net/weixin_40826634/article/details/128164063