Definition
Number of parameters (Params)
The number of parameters is the total count of trainable parameters in the model. It is used to measure model size (space complexity).
Computation (FLOPs)
FLOPs is the number of floating-point operations, i.e. the amount of computation (time complexity). It can be used to measure the complexity of an algorithm and is often used as an indirect proxy for the speed of a neural network model. (Recent work has shown that FLOPs is an unreliable proxy for actual speed, since a model's speed also depends on factors such as memory throughput, but it is still widely used as a reference metric.) When counting FLOPs, each addition, subtraction, multiplication, division, exponentiation, square root, etc. is usually counted as a single FLOP.
Formula representation
Convolutional layer
Input size: $W_{in} \times H_{in} \times C_{in}$
Output size: $W_{out} \times H_{out} \times C_{out}$
Convolution kernel: $k_w \times k_h$
Parameters: $k_w \times k_h \times C_{in} \times C_{out}$
Parameters (with bias): $(k_w \times k_h \times C_{in} + 1) \times C_{out}$
FLOPs: $k_w \times k_h \times C_{in} \times W_{out} \times H_{out} \times C_{out}$
FLOPs (counting additions and bias): $[(k_w \times k_h \times C_{in}) + (k_w \times k_h \times C_{in} - 1) + 1] \times W_{out} \times H_{out} \times C_{out}$
PS: here $k_w \times k_h \times C_{in}$ is the number of multiplications per output element, $k_w \times k_h \times C_{in} - 1$ is the number of additions, and the $+1$ accounts for the bias.
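As a quick check, the formulas above can be wrapped in a small helper (a minimal sketch; the function name and signature are my own, not from any library):

```python
def conv2d_stats(k_w, k_h, c_in, c_out, w_out, h_out, bias=True):
    """Parameter count and FLOPs of one 2-D convolution layer."""
    mults = k_w * k_h * c_in                      # multiplications per output element
    params = mults * c_out + (c_out if bias else 0)
    # per output element: mults multiplications, mults-1 additions, +1 bias add
    per_elem = mults + (mults - 1) + (1 if bias else 0)
    flops = per_elem * w_out * h_out * c_out
    return params, flops

# Example: a 3x3 conv, 3 -> 64 channels, 32x32 output (with bias)
# params = (3*3*3 + 1) * 64 = 1792
# flops  = (27 + 26 + 1) * 32 * 32 * 64 = 3538944
```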
Pooling and activation layers
Activation and pooling layers only transform the existing feature map and introduce no new parameters. They do take time to compute, but since they involve no dot products, their cost is a rounding error in the network's total computational complexity.
Fully connected layer
A fully connected layer specifies the number of neurons in the next layer and computes each of them with an affine transformation $y_i = \vec{W}_i \cdot \vec{x} + b_i$. Because every neuron in one layer is connected to every neuron in the next, the layer is called fully connected, as shown in the figure.
Input dimension: $d_{in}$
Output dimension: $d_{out}$
Parameters: $d_{in} \times d_{out}$
Parameters (with bias): $(d_{in} + 1) \times d_{out}$
FLOPs: $d_{in} \times d_{out}$
FLOPs (counting additions and bias): $[d_{in} + (d_{in} - 1) + 1] \times d_{out}$
PS: $d_{in} - 1$ is the number of additions needed to sum the products of a weight row with the input vector, and the $+1$ accounts for the bias; a multi-dimensional input is first flattened (its dimensions multiplied together).
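The same bookkeeping for a fully connected layer fits in a few lines (again a sketch with names of my own choosing):

```python
def linear_stats(d_in, d_out, bias=True):
    """Parameter count and FLOPs of one fully connected layer."""
    params = d_in * d_out + (d_out if bias else 0)
    # per output neuron: d_in multiplications, d_in-1 additions, +1 bias add
    per_neuron = d_in + (d_in - 1) + (1 if bias else 0)
    return params, per_neuron * d_out

# Example: 512 -> 10 classifier head (with bias)
# params = (512 + 1) * 10 = 5130
# flops  = (512 + 511 + 1) * 10 = 10240
```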
BN layer
The parameters introduced by a BN layer depend on the number of input neurons (or channels). If the input has n features, the layer introduces 2n parameters (a scale and a shift per feature). At inference time, BN can typically be folded into the preceding convolution, so its contribution is often ignored entirely.
Summary: to reduce a network's parameter count, focus mainly on the fully connected layers; to reduce its computation, focus on the convolutional layers.
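A toy comparison illustrates this summary; the layer sizes below (a VGG-style first conv and a 4096-to-1000 classifier head) are chosen by me for illustration:

```python
# Conv: 3x3 kernel, 3 -> 64 channels, 224x224 output (with bias).
conv_mults = 3 * 3 * 3
conv_params = (conv_mults + 1) * 64
conv_flops = (conv_mults + (conv_mults - 1) + 1) * 224 * 224 * 64
# FC: 4096 -> 1000 (with bias), as in VGG/AlexNet classifier heads.
fc_params = (4096 + 1) * 1000
fc_flops = (4096 + (4096 - 1) + 1) * 1000
print(conv_params, fc_params)  # 1792 vs 4097000: FC dominates parameters
print(conv_flops, fc_flops)    # 173408256 vs 8192000: conv dominates FLOPs
```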
Supplement: in written exams the output feature-map size is sometimes not given and must be computed yourself. The feature-map size is computed as follows:
Convolutional layer: $out_{size} = \dfrac{InputSize - KernelSize + 2 \times Padding}{Stride} + 1$
Pooling layer: $out_{size} = \dfrac{InputSize - KernelSize}{Stride} + 1$
PS: when the result is not an integer, the convolution layer rounds down (floor) and the pooling layer rounds up (ceil), following the Caffe convention; note that some frameworks (e.g. PyTorch) round both down by default.
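These size formulas and rounding rules can be sketched as small helpers (assuming the floor-for-convolution / ceil-for-pooling convention noted above; the pooling helper takes a flag since frameworks differ):

```python
import math

def conv_out_size(in_size, kernel, stride=1, padding=0):
    # Convolution output size: floor((N - K + 2P) / S) + 1
    return (in_size - kernel + 2 * padding) // stride + 1

def pool_out_size(in_size, kernel, stride, ceil_mode=True):
    # Pooling output size: (N - K) / S + 1, ceiled under the Caffe convention
    frac = (in_size - kernel) / stride
    return (math.ceil(frac) if ceil_mode else math.floor(frac)) + 1

# ResNet-style stem: 7x7 conv, stride 2, pad 3 on 224 -> 112;
# then 3x3 max pool, stride 2 -> 56
print(conv_out_size(224, 7, stride=2, padding=3))  # 112
print(pool_out_size(112, 3, stride=2))             # 56
```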