Number of parameters (Params) and amount of computation (FLOPs) of a neural network

Definition

Number of parameters (Params)

The number of parameters is the total number of parameters that must be trained when training the model. It measures the size of the model (its space complexity).
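For example, in a framework such as PyTorch (an assumption here, not something the text prescribes), the trainable parameters can be counted directly from the model; a minimal sketch with an arbitrary small model:

```python
import torch.nn as nn

# A small, arbitrary model; any nn.Module can be counted the same way.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

# Count only the parameters that are actually trained.
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params}")
```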

Amount of computation (FLOPs)

FLOPs is the number of floating-point operations, i.e. the amount of computation (time complexity). It measures the complexity of an algorithm and is often used as an indirect indicator of the speed of a neural network model. (Recent work has shown that FLOPs alone is an unreliable proxy for speed, because actual speed also depends on factors such as memory throughput, but it is still widely used as a reference metric for model speed.) When counting FLOPs, addition, subtraction, multiplication, division, exponentiation, square root, etc. are each usually counted as a single FLOP.

Formula representation

Convolutional layer

Input dimensions: $W_{in} \times H_{in} \times C_{in}$
Output dimensions: $W_{out} \times H_{out} \times C_{out}$
Convolution kernel: $k_w \times k_h$

Parameters: $k_w \times k_h \times C_{in} \times C_{out}$
Parameters (with bias): $(k_w \times k_h \times C_{in} + 1) \times C_{out}$

FLOPs: $k_w \times k_h \times C_{in} \times W_{out} \times H_{out} \times C_{out}$
FLOPs (counting additions and the bias): $[(k_w \times k_h \times C_{in}) + (k_w \times k_h \times C_{in} - 1) + 1] \times W_{out} \times H_{out} \times C_{out}$

PS: $k_w \times k_h \times C_{in}$ is the number of multiplications, $k_w \times k_h \times C_{in} - 1$ is the number of additions, and the $+1$ accounts for the bias.
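As a sanity check, the formulas above can be evaluated in plain code; the sketch below uses arbitrary layer sizes and, assuming PyTorch is available, verifies the parameter count against nn.Conv2d:

```python
import torch.nn as nn

# Hypothetical convolution: 3x3 kernel, 64 -> 128 channels, 56x56 output map.
c_in, c_out, k_w, k_h = 64, 128, 3, 3
w_out, h_out = 56, 56

# Parameters (with bias): (k_w * k_h * C_in + 1) * C_out
params = (k_w * k_h * c_in + 1) * c_out

# FLOPs, multiplications only: k_w * k_h * C_in * W_out * H_out * C_out
flops_mul = k_w * k_h * c_in * w_out * h_out * c_out

# FLOPs counting additions and the bias:
# [(k_w*k_h*C_in) + (k_w*k_h*C_in - 1) + 1] * W_out * H_out * C_out
flops_full = (2 * k_w * k_h * c_in) * w_out * h_out * c_out

# Cross-check the parameter count against PyTorch.
conv = nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1, bias=True)
assert sum(p.numel() for p in conv.parameters()) == params
print(params, flops_mul, flops_full)
```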

Pooling and activation layers

Activation and pooling layers only apply a transformation to the input and introduce no new parameters. They do take some time, but since they involve no dot products, their cost is a rounding error in the network's total computational complexity.

Fully connected layer

A fully connected layer specifies the number of neurons in the next layer and computes each of their values with the affine transformation $y_i = \vec W_i \cdot \vec x + b_i$. Because every neuron in one layer is connected to every neuron in the next, the layer is called fully connected.
Input dimension: $d_{in}$
Output dimension: $d_{out}$

Parameters: $d_{in} \times d_{out}$
Parameters (with bias): $(d_{in} + 1) \times d_{out}$

FLOPs: $d_{in} \times d_{out}$
FLOPs (counting additions and the bias): $[d_{in} + (d_{in} - 1) + 1] \times d_{out}$
PS: $d_{in}$ is the number of multiplications, $d_{in} - 1$ is the number of additions required by the dot product between a weight row and the input vector, and the $+1$ accounts for the bias. If the input is multidimensional, multiply its dimensions together to obtain $d_{in}$.
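The same kind of check for the fully connected formulas, again with arbitrary sizes and PyTorch's nn.Linear as reference:

```python
import torch.nn as nn

# Hypothetical fully connected layer: 512 -> 1000 neurons.
d_in, d_out = 512, 1000

# Parameters (with bias): (d_in + 1) * d_out
params = (d_in + 1) * d_out

# FLOPs, multiplications only: d_in * d_out
flops_mul = d_in * d_out

# FLOPs counting additions and the bias: [d_in + (d_in - 1) + 1] * d_out
flops_full = 2 * d_in * d_out

# Cross-check the parameter count against PyTorch.
fc = nn.Linear(d_in, d_out, bias=True)
assert sum(p.numel() for p in fc.parameters()) == params
print(params, flops_mul, flops_full)
```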

BN layer

The number of parameters introduced by a BN layer depends on the number of input neurons (channels). If the number of input neurons is $n$, the layer introduces $2n$ parameters (a scale and a shift per neuron). In many cases, however, the contribution of the Batch Normalization layer can be ignored entirely.
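The $2n$ figure can be confirmed with, for instance, PyTorch's nn.BatchNorm2d (the channel count below is arbitrary):

```python
import torch.nn as nn

# BatchNorm over n channels learns a scale (gamma) and a shift (beta) per channel.
n = 64
bn = nn.BatchNorm2d(n)

learnable = sum(p.numel() for p in bn.parameters() if p.requires_grad)
assert learnable == 2 * n  # gamma + beta

# The running mean/variance used at inference are buffers, not trained parameters.
print(learnable, sum(b.numel() for b in bn.buffers()))
```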

Summary: for a neural network model, reducing the number of parameters should target mainly the fully connected layers; when optimizing the amount of computation, the focus should be on the convolutional layers.

Supplement: in written exams, the size of the output feature map is sometimes not given and has to be computed yourself. The feature map size is calculated as follows:

Convolutional layer: $out_{size} = \dfrac{InputSize - KernelSize + 2 \times Padding}{Stride} + 1$

Pooling layer: $out_{size} = \dfrac{InputSize - KernelSize}{Stride} + 1$

PS: when the result is not an integer, round up for the convolutional layer and round down for the pooling layer.
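A small helper that follows the rounding convention stated above (note that many frameworks actually floor in both cases); the sizes in the example are arbitrary:

```python
import math

def conv_out_size(in_size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    # Convolutional layer: round up when the division is not exact.
    return math.ceil((in_size - kernel + 2 * padding) / stride) + 1

def pool_out_size(in_size: int, kernel: int, stride: int) -> int:
    # Pooling layer: round down when the division is not exact.
    return math.floor((in_size - kernel) / stride) + 1

print(conv_out_size(224, 7, stride=2, padding=3))  # 113 with ceil (112 if floored)
print(pool_out_size(112, 3, stride=2))             # 55
```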

Origin blog.csdn.net/weixin_46707326/article/details/128302771