Computation amount, parameter count, and memory access of a deep learning model

1. Computation amount

        The computation amount is the number of operations the model performs, which reflects the model's demand on the hardware's compute units. It is generally expressed in OPs (Operations), i.e., the number of operations. Since the most common data format is float32, it is often written as FLOPs (Floating Point Operations), the number of floating-point operations. PyTorch has many tools for estimating a model's computation amount, but note that these tools may miss some operators and count their computation as 0, so the reported amount can deviate slightly from the actual amount; in most cases this deviation has little effect.

The following is a PyTorch tool that I commonly use:

    # pip install ptflops
    from ptflops.flops_counter import get_model_complexity_info

    # Input shape here is (1, 32000), e.g. one channel of 32000 audio samples;
    # ptflops adds the batch dimension itself.
    flops, params = get_model_complexity_info(model, (1, 32000), print_per_layer_stat=False)
    print("%s %s" % (flops, params))
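For intuition, the FLOPs of a single Conv2d layer can also be computed by hand, counting one multiply and one add per weight per output element. This is a sketch of the common convention; individual tools may count slightly differently (e.g. treating a multiply-add as one operation):

```python
def conv2d_flops(c_in, c_out, kernel_size, h_out, w_out):
    """FLOPs of one Conv2d layer: 2 ops (multiply + add) per weight per output element."""
    return 2 * c_in * kernel_size * kernel_size * c_out * h_out * w_out

# Example: 3x3 conv, 64 -> 64 channels, 56x56 output feature map.
print(conv2d_flops(64, 64, 3, 56, 56))  # 231211008, i.e. ~0.23 GFLOPs
```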

2. Parameter count

        The parameter count is the total number of parameters in the model, and it directly determines the space the model occupies on disk. For a CNN, the parameters consist mainly of the weights of the Conv/FC layers; some other operators also have parameters, but they are generally negligible. The parameters are usually counted as part of the memory access, so the parameter count does not affect inference performance directly. However, it does affect memory usage on the one hand, and program initialization time on the other.

The following is the code I commonly use to count model parameters (the tool above already reports the parameter count as well):

    import numpy as np

    def numParams(net):
        # Count only trainable parameters (requires_grad=True).
        num = 0
        for param in net.parameters():
            if param.requires_grad:
                num += int(np.prod(param.size()))
        return num

    print(numParams(model))
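Since each float32 parameter occupies 4 bytes, the parameter count translates directly into on-disk size. A minimal sketch (the 11.7M figure below is just an illustrative value, roughly the size of ResNet-18):

```python
def model_size_mb(num_params, bytes_per_param=4):
    """Estimate on-disk model size, assuming float32 (4 bytes per parameter)."""
    return num_params * bytes_per_param / (1024 ** 2)

print(f"{model_size_mb(11_700_000):.1f} MB")  # 44.6 MB
```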

3. Memory access

        Memory access refers to the number of bytes of storage that must be read or written during the model's computation, which reflects the model's demand on memory bandwidth. Put differently, it is the total amount of memory traffic generated when the model completes one forward pass on a single input sample, i.e., the model's space complexity. Memory access is generally expressed in Bytes (or KB/MB/GB): how many bytes of data the model reads and writes during computation. Like the computation amount, the model's total memory access equals the sum of the memory accesses of its individual operators. Memory access is crucial to inference speed and deserves attention when designing a model.
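As a rough sketch, the memory access of a single Conv2d layer can be estimated as reading the input and the weights and writing the output, assuming float32 and ignoring caching effects (so this is an upper-bound style estimate):

```python
def conv2d_mem_access_bytes(c_in, c_out, k, h_in, w_in, h_out, w_out, dtype_bytes=4):
    """Rough memory access of one Conv2d: read input + weights, write output."""
    input_bytes = c_in * h_in * w_in * dtype_bytes
    weight_bytes = c_out * c_in * k * k * dtype_bytes
    output_bytes = c_out * h_out * w_out * dtype_bytes
    return input_bytes + weight_bytes + output_bytes

# Example: 3x3 conv, 64 -> 64 channels, 56x56 in and out.
print(conv2d_mem_access_bytes(64, 64, 3, 56, 56, 56, 56))  # 1753088 bytes, ~1.67 MB
```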

4. Computational intensity of the model

        The computational intensity of a model is its computation amount divided by its memory access. It indicates how many floating-point operations the model performs per Byte of memory traffic, with the unit FLOPs/Byte. The higher the model's computational intensity, the more efficiently it uses memory.

There is no strict causal relationship between computation amount and actual inference speed; the computation amount can only serve as a rough reference for a model's inference speed.

The metric we care most about when running a model is the number of floating-point operations per second it can actually achieve on the computing platform (server), a theoretical value measured in FLOPS (i.e., FLOP/s). Note that FLOPS (a rate) is distinct from FLOPs (a count).

5. Roof-line Model

        The Roof-line Model describes how fast a model's floating-point computation can be under the constraints of a given computing platform. More specifically, it answers the question: "What is the theoretical performance upper limit E that a model with computation amount A and memory access amount B can achieve on a computing platform with peak compute C and bandwidth D?"
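The roof-line answer is a simple minimum: attainable performance is capped either by the platform's peak compute (the flat roof) or by bandwidth times the model's computational intensity (the slanted roof). A minimal sketch with a hypothetical platform (10 TFLOPS peak, 500 GB/s bandwidth):

```python
def roofline_attainable_flops(peak_flops, bandwidth_bytes_per_s, intensity_flops_per_byte):
    """Roof-line model: min(compute roof, bandwidth * intensity)."""
    return min(peak_flops, bandwidth_bytes_per_s * intensity_flops_per_byte)

# Ridge point of this platform: 10e12 / 500e9 = 20 FLOPs/Byte.
print(roofline_attainable_flops(10e12, 500e9, 5))   # 2.5e12 -> memory-bound
print(roofline_attainable_flops(10e12, 500e9, 50))  # 1e13   -> compute-bound
```

Below the ridge point (intensity < 20 FLOPs/Byte here) the model is memory-bound and raising intensity helps; above it, only more compute helps.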

This article refers to: Discussion on deep learning model size and model inference speed - Zhihu (zhihu.com)

Origin blog.csdn.net/qq_42019881/article/details/130843952