Neural Network Learning Notes 72: Analysis of Computation Metrics such as Parameters (Parameter Count), FLOPs (Floating-Point Operations), and FPS (Frames Per Second)


Preface

Many students think about lightweight optimization schemes while studying, but they often run into a puzzle: why does the model get slower even though the parameter count went down? In this post, I will analyze the computation metrics commonly used for networks.

Computational components of the network

[Figure: runtime breakdown of a network across components, from the ShuffleNetV2 paper]
At present, most lightweight models use FLOPs when comparing model speed. This metric mainly counts the multiplication operations in the convolutional layers.

However, in actual use you will find that networks with the same FLOPs can run at different speeds, so FLOPs alone cannot fully represent the speed of a model.

This is a graph from the ShuffleNetV2 paper, which breaks the network's runtime down into different components. As the figure shows, although convolution takes up most of the time, other operations, including data I/O and element-wise operations (AddTensor, ReLU, etc.), also take a significant share.

Thus, optimizing only the time the network spends on convolutions helps to a degree, but the other components of the runtime also deserve attention.

Which metrics should we focus on?

[Figure: ablation study table from the YOLOX paper]
Let's look at this table, the ablation study from the YOLOX paper. It reports five metrics. When students write their own papers, this many metrics is generally enough; after all, even SOTA papers like YOLOX report only these.

The table contains the following metrics:

Metric        Meaning
AP (%)        The detection accuracy of the object detection algorithm.
Parameters    The parameter count, i.e. how many parameters the model contains.
GFLOPs        FLOPs is the number of floating-point operations, a measure of algorithm/model complexity; 1 GFLOPs = one billion (1e9) floating-point operations.
Latency       The network's forward-propagation time; 1 ms = 1e-3 s, so 10.5 ms = 0.0105 s.
FPS           The number of frames processed per second; FPS = 1/Latency, e.g. 1/0.0105 ≈ 95.2.

1. Parameters

Parameters is the parameter count, i.e. the number of parameters the model contains. For example, every value in the convolution kernels and in the weight matrices of the fully connected layers counts toward the parameter count. Taking the YoloV3 algorithm as an example, the parameter count is 62,001,757, generally abbreviated as 62.00M. Note that the parameter count of a model is not the same as its storage size: storage is measured in MB (or KB), not in M (millions of parameters).
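If you want to check the parameter count yourself, a minimal PyTorch sketch looks like the following. It uses torchvision's resnet50 as a stand-in model, since YoloV3 is not bundled with torchvision; the counting code is the same for any model.

```python
import torchvision

# A minimal sketch: count a model's parameters in PyTorch.
# resnet50 is a stand-in model, not YoloV3.
model = torchvision.models.resnet50()

num_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {num_params:,} (~{num_params / 1e6:.2f}M)")

# Storage size is a different quantity: a float32 parameter takes
# 4 bytes, so the fp32 weight file is roughly num_params * 4 bytes.
print(f"Approx. fp32 size: {num_params * 4 / 1024 ** 2:.1f} MB")
```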

2. FLOPs (floating-point operations)

Now let's look at FLOPs. Note that FLOPS and FLOPs are not the same thing:

FLOPS is a measure of processor performance, short for "Floating Point Operations Per Second".
FLOPs is a measure of algorithm complexity, short for "floating-point operations", where the lowercase s marks the plural.

In many papers, FLOPs is the index used to measure algorithm complexity, but an algorithm's complexity often does not equal its actual computing speed. EfficientDet is a very typical example: its FLOPs are small, yet it runs slowly and uses a lot of GPU memory.
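One common way to measure FLOPs in PyTorch is the third-party thop package (pip install thop); this is just one convenient tool, not necessarily what the papers above used. A minimal sketch, assuming thop is installed; note that thop actually counts MACs (multiply-accumulates), and many papers treat 1 MAC as 2 FLOPs, so conventions differ between tools:

```python
import torch
import torchvision
from thop import profile  # third-party package: pip install thop

# A minimal sketch of measuring MACs/FLOPs with thop.
# resnet50 is a stand-in model.
model = torchvision.models.resnet50()
dummy = torch.randn(1, 3, 224, 224)  # FLOPs depend on the input size

macs, params = profile(model, inputs=(dummy,))
print(f"MACs: {macs / 1e9:.2f}G, Params: {params / 1e6:.2f}M")
```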

3. Latency

Latency is the time the network takes to process one image. Judging from the YOLOX table above, it should not include post-processing (the column is marked "without post processing"); that is, it covers only the forward pass of the network.

4. FPS (frames per second)

FPS is the number of frames processed per second, FPS = 1/Latency. Once the latency above is known, the FPS is easy to obtain: simply take the reciprocal.
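Putting the two together, here is a minimal sketch for measuring latency and FPS in PyTorch. resnet50 and the 224x224 input are placeholders, not the YOLOX setup; the key details are warming the model up first and calling torch.cuda.synchronize(), because CUDA kernels are launched asynchronously.

```python
import time
import torch
import torchvision

# A minimal latency/FPS measurement sketch (model and input size
# are stand-ins). Warm-up and synchronization are the key points.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torchvision.models.resnet50().to(device).eval()
dummy = torch.randn(1, 3, 224, 224, device=device)

with torch.no_grad():
    for _ in range(10):                  # warm-up runs
        model(dummy)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    runs = 100
    for _ in range(runs):
        model(dummy)
    if device == "cuda":
        torch.cuda.synchronize()

latency = (time.time() - start) / runs   # seconds per image
print(f"Latency: {latency * 1e3:.2f} ms, FPS: {1 / latency:.1f}")
```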

Relationships between the metrics

  1. Low Parameters ≈ low FLOPs. (FLOPs are basically positively correlated with Parameters, but FLOPs also depend on the input image size: the larger the input image, the larger the FLOPs.)
  2. Low FLOPs ≠ low Latency. (Low FLOPs ≠ high FPS. The most typical example is EfficientNet, which makes heavy use of operations with low FLOPs but high data read/write volume, namely depthwise separable convolutions; see the sketch after this list. Such memory-heavy operations are limited by GPU bandwidth, so the algorithm wastes a lot of time reading and writing data, and the GPU's compute power is not well utilized.)
  3. Low Parameters ≠ low Latency. (Low Parameters ≠ high FPS; same as with FLOPs, the most typical example is EfficientNet.)
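To make point 2 concrete, here is a small PyTorch sketch (the 128-channel sizes are arbitrary) comparing a standard 3x3 convolution with a depthwise separable one: the parameter count drops by roughly 8x, but the separable version runs as two layers and writes an extra intermediate feature map, which is exactly the high-memory-access, bandwidth-bound behavior described above.

```python
import torch.nn as nn

# A minimal sketch: standard conv vs. depthwise separable conv.
# Far fewer parameters, but two layers and an extra intermediate
# feature map to read and write (higher memory access cost).
cin, cout = 128, 128
standard = nn.Conv2d(cin, cout, kernel_size=3, padding=1)
separable = nn.Sequential(
    nn.Conv2d(cin, cin, kernel_size=3, padding=1, groups=cin),  # depthwise
    nn.Conv2d(cin, cout, kernel_size=1),                        # pointwise
)

def count(m):
    return sum(p.numel() for p in m.parameters())

print(f"standard : {count(standard):,} params")   # 147,584
print(f"separable: {count(separable):,} params")  # 17,792
```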

What does the computing speed of a network depend on?

The computing speed of a network depends on many factors, mainly the following:

  1. Graphics card: most SOTA algorithms are benchmarked on a V100 or A100.
  2. Network structure: fewer parameters does not mean higher speed, and inserting a couple of depthwise separable convolutions does not necessarily make the network faster. The ShuffleNetV2 paper introduces the concept of MAC (Memory Access Cost); depthwise separable convolution is a high-MAC, low-parameter operation. It performs relatively better on CPU, while on some very high-end GPUs it can even be slower than an ordinary convolution.
  3. Parallelism of the network: Inception is a model that keeps increasing the network's width, using convolutions with different kernel sizes for feature extraction, yet it is not particularly fast; the parallel branches are still executed as separate convolutions, and running a convolution several times costs several times the time.
  4. The number of layers in the network: operations such as ReLU and Add have no parameters but still take computing time.
  5. The versions of CUDA, cuDNN, and the deep learning framework: on a machine with a 1660 Ti graphics card, YOLOX-S runs at over 50 FPS under torch 1.7 but only just over 20 FPS under torch 1.2.
