[2023 · CANN Training Camp Season 1] MindSpore Model Quick Tuning Guide, Chapter 2: MindSpore Debugging and Tuning

1. Ecosystem migration

[Figure: Example of using the ecosystem migration tool]

[Figure: Ecosystem migration tool technology solution]


Different frameworks express model definitions very differently at the front end (APIs, operator functionality, and model construction methods differ considerably, even for the same operator). For the same model, however, the resulting computational graphs are similar regardless of these front-end differences. A model-based migration scheme is therefore proposed.

ONNX introduction:

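The original introductory slide is not reproduced here. As a minimal illustration of the ONNX concepts listed in the chapter summary (Graph, Node, Value Info, Initializer), the following sketch loads an exported ONNX file with the onnx Python package and prints its structure; the file name resnet50.onnx is a placeholder.

    import onnx

    # Load a previously exported ONNX model (the file name is a placeholder)
    model = onnx.load("resnet50.onnx")

    graph = model.graph                                   # Graph: nodes, inputs/outputs, weights
    print("inputs:", [i.name for i in graph.input])       # Value Info describing graph inputs
    print("outputs:", [o.name for o in graph.output])     # Value Info describing graph outputs
    print("node count:", len(graph.node))                 # Node: one entry per operator
    print("initializer count:", len(graph.initializer))   # Initializer: stored weight tensors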

A migration case using the ecosystem migration tool

Tutorial steps (a rough code sketch of these steps follows the list):

  1. Export the model to ONNX;
  2. Validate the ONNX model;
  3. Migrate the model script and weights with MindConverter;
  4. Validate the resulting MindSpore model.
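A rough sketch of steps 1 and 2, assuming the source network is a PyTorch model; the network, input shape, and file names below are placeholders rather than part of the original tutorial.

    import torch
    import torchvision
    import onnx

    # Step 1: export the source (PyTorch) model to ONNX -- model and shape are placeholders
    net = torchvision.models.resnet50(pretrained=True).eval()
    dummy_input = torch.randn(1, 3, 224, 224)
    torch.onnx.export(net, dummy_input, "resnet50.onnx",
                      input_names=["input"], output_names=["output"])

    # Step 2: validate the exported ONNX model before migration
    onnx_model = onnx.load("resnet50.onnx")
    onnx.checker.check_model(onnx_model)

Step 3 is then performed with the MindConverter command-line tool (passing the ONNX file and input shape; see mindconverter --help for the exact options of the installed version), and step 4 runs inference with the generated MindSpore script and compares its outputs against the original model.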

2. Model accuracy tuning

The MindSpore debugger is a debugging tool for graph-mode training; it can be used to view and analyze the intermediate results of computation graph nodes.
Operation process (a minimal script-side sketch follows this list):
• Start MindInsight in debugger mode and wait for the training connection:
mindinsight start --port {PORT} --enable-debugger True --debugger-port {DEBUGGER_PORT}
• Configure the relevant environment variables and run the training script:
export ENABLE_MS_DEBUGGER=1
export MS_DEBUGGER_PORT={DEBUGGER_PORT}
• After the training connection succeeds, set monitoring points on the MindInsight debugger interface;
• Analyze the training execution status on the MindInsight debugger interface.
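As a minimal script-side sketch of the flow above: MindInsight is assumed to be already running with --enable-debugger, the debugger environment variables can equally be set from inside the training script before the graph is built, and the network, loss function, optimizer, and dataset are placeholders for the user's own definitions (import paths may vary slightly between MindSpore versions).

    import os

    # Must be set before training starts; the port must match --debugger-port
    os.environ["ENABLE_MS_DEBUGGER"] = "1"
    os.environ["MS_DEBUGGER_PORT"] = "50051"

    import mindspore as ms
    from mindspore.train import Model

    # The debugger targets graph-mode training
    ms.set_context(mode=ms.GRAPH_MODE)

    # network, loss, opt, dataset: placeholders for the user's own definitions
    model = Model(network, loss_fn=loss, optimizer=opt)
    # Training connects to MindInsight, where monitoring points are then set
    model.train(1, dataset)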
During MindSpore graph-mode training, users cannot obtain the results of intermediate nodes in the computation graph from the Python layer, which makes debugging difficult. With the MindSpore debugger, users can:
• View the computation graph and the output results of graph nodes on the MindInsight debugger interface;
• Set monitoring points to watch for training exceptions (such as tensor overflow) and trace the cause of an error when an exception occurs;
• View how parameters such as weights change.

• Use the debugger to inspect the training state:
– Configure the "check weight change is too small" monitoring point to check whether weight changes are too small;
– Configure the "check unchanged weight" monitoring point to check whether weights are not being updated;
– Configure the "check gradient disappearance" monitoring point to locate abnormal gradients;
– Configure the "check tensor overflow" monitoring point to locate where NAN/INF values occur;
– Configure the "check excessive tensor" monitoring point to locate operators with excessively large values;
– Configure the "check excessive weight change", "check gradient disappearance", and "check excessive gradient" monitoring points to locate abnormal weights or gradients.

3. Model performance tuning

Introduction to the performance tuning tool Profiler

The Profiler provides MindSpore with performance tuning capabilities, offering easy-to-use and rich debugging functions for operator performance, iteration performance, and data processing performance, and helping users quickly locate performance bottlenecks and improve network performance.
The capabilities the Profiler tool offers can be divided into two parts:
• Within the training script, MindSpore provides the interfaces for starting performance data collection and for analyzing the collected data, and finally generates the performance data files;
• MindInsight provides a visual interface that displays the performance data and statistical analysis results in multiple dimensions.

Instructions (a minimal sketch follows this list):

  1. Initialize the Profiler at the beginning of training to start performance data collection;
  2. Call the analyse method after training to parse the performance data.
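A minimal sketch of these two steps, assuming the network and dataset are built elsewhere in the training script; the Profiler import path and constructor arguments may differ slightly between MindSpore versions.

    import mindspore as ms
    from mindspore import Profiler

    ms.set_context(mode=ms.GRAPH_MODE)

    # Step 1: initialize the Profiler before training; performance collection starts here.
    # (On GPU, only the output_path parameter takes effect.)
    profiler = Profiler(output_path="./profiler_data")

    # ... build the network and run training, e.g. model.train(epoch, dataset) ...

    # Step 2: after training, parse the collected data into the performance
    # files that MindInsight visualizes.
    profiler.analyse()

The generated files can then be viewed by starting MindInsight with --summary-base-dir pointing at the directory that contains the profiling output, as shown below.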

Note:
• Profiler already supports GPU scenarios, and its usage on GPU is the same as on Ascend;
• Only the output_path parameter takes effect when initializing the Profiler on GPU.

  1. Start the MindInsight visual interface:
    mindinsight start --port 9001 --summary-base-dir ./
  2. Access the MindInsight interactive interface through a browser:
    <your server ip address>:9001/
  3. Stop the MindInsight visualization service:
    mindinsight stop --port 9001

Summary of this chapter

• Capabilities and basic usage of the ecosystem migration tool:
mindconverter --help
• Capabilities and basic usage of the accuracy debugger:
mindinsight start --port {PORT} --enable-debugger True --debugger-port {DEBUGGER_PORT}
• Capabilities and basic usage of the performance profiling tool:
Add to the training script: profiler = Profiler(), …, profiler.analyse()
• Basic concepts of the ONNX model:
– Graph, Node, Value Info, Initializer;

Origin blog.csdn.net/qq_45257495/article/details/130882367