Tips for using MegEngine: Roofline analysis of Android OpenCL operators with mperf

Foreword

  • Roofline analysis is a simple way to evaluate how well a computing task uses the compute and memory-access capabilities of the current platform. It helps identify an operator's optimization direction and how much optimization potential remains. mperf implements roofline analysis for Android Mali and Adreno GPUs. The following uses the Mali platform as an example to briefly walk through the steps.
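As a reminder of the model itself: at arithmetic intensity AI (FLOPs per byte of memory traffic), attainable throughput is min(peak compute, AI × peak bandwidth), and the two roofs meet at the machine balance point, peak compute / peak bandwidth. The Python sketch below is a minimal illustration; the function names are hypothetical helpers, not part of mperf, and the peak values are the example numbers used in roofline_data.txt later in this article.

```python
def ridge_point(peak_gflops: float, peak_gbps: float) -> float:
    """Machine balance point (FLOP/byte): where the memory roof meets the compute roof."""
    return peak_gflops / peak_gbps


def attainable_gflops(ai: float, peak_gflops: float, peak_gbps: float) -> float:
    """Roofline bound: (FLOP/byte) * (GB/s) = GFLOP/s, capped by the compute peak."""
    return min(peak_gflops, ai * peak_gbps)


if __name__ == "__main__":
    # Example peaks (hypothetical device values, same as the roofline_data.txt example).
    peak_gflops, peak_gbps = 1159.0, 26.3
    print(f"ridge point: {ridge_point(peak_gflops, peak_gbps):.2f} FLOP/byte")
    for ai in (1.0, 15.5, 44.0, 100.0):
        bound = attainable_gflops(ai, peak_gflops, peak_gbps)
        print(f"AI={ai:6.1f} -> at most {bound:7.1f} GFLOP/s")
```

Kernels whose AI falls left of the ridge point can at best reach AI × peak bandwidth, which is why they are called memory bound.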

Compile and integrate

  • Download the repository code

    git clone https://github.com/MegEngine/mperf.git
    cd mperf
    git submodule update --init --recursive
    
  • Compile and install

    ./android_build.sh -g mali
    cmake --build <mperf_build_dir> --target install
    
  • Integrate into your project

    set(mperf_DIR /path/to/directory/containing/mperfConfig.cmake)
    find_package(mperf REQUIRED)
    target_link_libraries(your_target mperf::mperf)
    

    For details on compilation and integration, see the mperf README.

Get roofline data

  • Measure the GFLOPs and GBPs of the OpenCL operator during execution

    // define the measurement set
    mperf::GpuCounterSet gpu_set = {
        mperf::GpuCounter::GFLOPs,
        mperf::GpuCounter::GBPs,
    };
    mperf::XPMU xpmu(gpu_set);
    xpmu.run();
    
    ... // add your opencl kernel calls
    

    For a complete test sample, see mali_gpu_pmu_test.

  • Measure the peak compute throughput and memory bandwidth of the current GPU

    • Copy the gpu_inst_gflops_latency and gpu_spec_dram_bw executables from the build_dir/apps directory produced during compilation to the phone and run them to obtain the GPU's measured peak compute throughput and peak bandwidth.

    For details of how the peak measurements work, see the source of gpu_inst_gflops_latency and gpu_spec_dram_bw.
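Once both the operator measurements and the platform peaks are available, the quantities the roofline needs follow directly: arithmetic intensity is measured GFLOPs divided by measured GBPs (the definition used in roofline_data.txt), and dividing each measurement by its peak gives compute and bandwidth utilization. This is a minimal Python sketch; KernelMeasurement, arithmetic_intensity and utilization are hypothetical helpers, not mperf APIs, and the inputs are illustrative (261 GFLOPs is the example figure plotted in this article, and 16.84 GB/s is simply 261 / 15.5).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class KernelMeasurement:
    gflops: float  # achieved GFLOP/s, as read from GpuCounter::GFLOPs
    gbps: float    # achieved DRAM GB/s, as read from GpuCounter::GBPs


def arithmetic_intensity(m: KernelMeasurement) -> float:
    """FLOP per byte: (GFLOP/s) / (GB/s), i.e. measured_GFLOPs / measured_GBPs."""
    return m.gflops / m.gbps


def utilization(m: KernelMeasurement, peak_gflops: float,
                peak_gbps: float) -> Tuple[float, float]:
    """Fractions of the measured compute peak and bandwidth peak the kernel reaches."""
    return m.gflops / peak_gflops, m.gbps / peak_gbps


if __name__ == "__main__":
    # Illustrative values; peaks match the roofline_data.txt example.
    m = KernelMeasurement(gflops=261.0, gbps=16.84)
    compute_util, bw_util = utilization(m, peak_gflops=1159.0, peak_gbps=26.3)
    print(f"AI = {arithmetic_intensity(m):.2f} FLOP/byte")
    print(f"compute {compute_util:.1%}, bandwidth {bw_util:.1%}")
```

With these numbers the kernel reaches roughly 22.5% of peak compute but about 64% of peak bandwidth, which already hints at a memory-bound operator.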

Draw the roofline

  • The previous steps gave us the OpenCL operator's measured GFLOPs and GBPs, as well as the GPU's measured peak compute throughput and peak bandwidth. We can now draw the roofline plot with mperf's plot_roofline.py script:
    • Edit roofline_data.txt:

      # roof parameters: measured GPU peak bandwidth (GB/s) and peak compute (GFLOPs)
      memroofs 26.3
      mem_roof_names 'DRAM'
      comproofs 1159
      comp_roof_names 'FMA'
      
      # omit the following if only plotting roofs
      # measured data for your OpenCL kernel call; AI is measured_GFLOPs / measured_GBPs
      AI 15.5
      FLOPS 261
      labels 'FMA, DRAM'
      
    • Run the Python script:

      python3 plot_roofline.py ./roofline_data.txt
      

    • For example, in the roofline plot obtained above, the operator's arithmetic intensity (FLOPs per byte of memory traffic) is lower than the machine balance point, i.e. the abscissa of the turning point where the sloped memory roof meets the flat compute roof. We can therefore preliminarily conclude that the operator is memory bound on this platform, and that the platform still has compute capacity to spare for it. In addition, the ratio of the operator's actual bandwidth to the platform's peak bandwidth indicates how much room remains for further memory-access optimization.
    • Note also that the GBPs measurement reflects the operator's actual DDR traffic. Comparing this traffic with the total size of the operator's input and output tensors shows how many repeated accesses are not absorbed by caches and registers and therefore go back to DDR. If the DDR traffic is significantly larger than the input and output footprint, check whether the operator's memory access pattern is cache-unfriendly and whether some repeated accesses could be avoided, for example by adding caching logic.
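The plotting input and the two checks described above can be scripted. The Python sketch below is an illustration only: roofline_data_text, bound_type and ddr_traffic_ratio are hypothetical helpers, not part of mperf. It assumes plot_roofline.py accepts exactly the fields shown in the roofline_data.txt example, and ddr_traffic_ratio assumes you also know the kernel's execution time and the byte size of its inputs and outputs, because total DRAM bytes are estimated as measured GB/s × kernel time.

```python
def roofline_data_text(peak_gbps, peak_gflops, ai, gflops,
                       mem_name="DRAM", comp_name="FMA", label="FMA, DRAM"):
    """Render the roofline_data.txt fields used in the example for plot_roofline.py."""
    lines = [
        f"memroofs {peak_gbps}",
        f"mem_roof_names '{mem_name}'",
        f"comproofs {peak_gflops}",
        f"comp_roof_names '{comp_name}'",
        f"AI {ai}",
        f"FLOPS {gflops}",
        f"labels '{label}'",
    ]
    return "\n".join(lines) + "\n"


def bound_type(ai, peak_gflops, peak_gbps):
    """Left of the machine balance point means memory bound, otherwise compute bound."""
    return "memory" if ai < peak_gflops / peak_gbps else "compute"


def ddr_traffic_ratio(measured_gbps, kernel_seconds, io_bytes):
    """Estimated DRAM bytes moved divided by the input+output footprint.

    A value well above 1 suggests data is re-read from DDR instead of being
    served from caches or registers.
    """
    ddr_bytes = measured_gbps * 1e9 * kernel_seconds
    return ddr_bytes / io_bytes


if __name__ == "__main__":
    print(roofline_data_text(26.3, 1159, 15.5, 261), end="")
    print("bound:", bound_type(15.5, 1159, 26.3))
```

Redirecting the first printed block into roofline_data.txt reproduces the example input for plot_roofline.py.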

Further thoughts

  • The steps above give us roofline data, which tells us whether the operator is compute bound or memory bound on the current platform, and how far it is from peak compute and peak bandwidth. However, roofline analysis alone makes it hard to pinpoint the bottleneck or decide how to relieve it. For example, if the operator is memory bound, which level of the memory hierarchy has low access efficiency? If it is compute bound, is that caused by instruction dependencies or by a shortage of a particular type of ALU resource?
  • To address these questions, mperf also includes work on hardware parameter detection, PMU data processing and analysis, and dynamic and static code analysis of OpenCL kernels (the dynamic and static code analysis is still under internal iterative development and has not yet been pushed to the open-source repo). The goal is to make operator performance analysis and optimization as traceable, and as low in mental burden, as possible.

Attached:

To learn more about MegEngine, you can read the documentation and the GitHub project, or join the MegEngine user QQ group: 1029741705. Contributions to the MegEngine community are welcome: become an Awesome MegEngineer and receive certificates of honor and customized gifts.

Origin my.oschina.net/u/5265910/blog/8787095