1. Ecosystem migration
Examples of using ecosystem migration tools
Ecosystem migration tool: technical solution
The front-end expression of a model definition varies greatly between frameworks: the APIs, operator functionality, and model construction methods can differ considerably, even for the same operator. Regardless of these front-end differences, the computational graphs that finally result are similar. Hence the proposal: a model-based migration scheme.
ONNX introduction:
Ecosystem migration tool: a migration case study
Tutorial steps:
- ONNX model export;
- ONNX model validation;
- MindConverter performs model script and weight migration;
- MindSpore model validation;
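The tutorial does not say how the two validation steps are carried out. One common approach, sketched here under that assumption and not taken from the tool itself, is to feed the same input to the original and the migrated model and compare the outputs element by element. The function names, tolerances, and sample values are hypothetical; in practice the two lists would hold the flattened outputs of ONNX inference and of the MindSpore network.

```python
import math


def outputs_match(reference, migrated, rel_tol=1e-5, abs_tol=1e-5):
    """Return True if two flat sequences of floats agree within tolerance."""
    if len(reference) != len(migrated):
        return False
    return all(
        math.isclose(r, m, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, m in zip(reference, migrated)
    )


def max_abs_error(reference, migrated):
    """Largest element-wise absolute difference between the two outputs."""
    return max(abs(r - m) for r, m in zip(reference, migrated))


# Hypothetical outputs of the two models for the same input.
onnx_out = [0.1200, 0.8800, 0.0000]
mindspore_out = [0.1200001, 0.8799999, 0.0000002]

print("outputs match:", outputs_match(onnx_out, mindspore_out))
print("max abs error:", max_abs_error(onnx_out, mindspore_out))
```

A length mismatch is reported as a failure and does not raise an error, since a wrong output shape is itself a sign that the migration went wrong.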
2. Model accuracy tuning
The MindSpore debugger is a debugging tool for graph-mode training. It is used to view and analyze the intermediate results of computational graph nodes.
Operation process:
• Start MindInsight in debugger mode and wait for the training connection;
mindinsight start --port {PORT} --enable-debugger True --debugger-port {DEBUGGER_PORT}
• Configure relevant environment variables and run the training script;
export ENABLE_MS_DEBUGGER=1
export MS_DEBUGGER_PORT={DEBUGGER_PORT}
• Once the training job has connected, set watchpoints (monitoring points) on the MindInsight debugger interface;
• Analyze the training execution status on the MindInsight debugger interface.
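The first two steps can also be scripted. The sketch below only assembles the command and environment variables shown above; the helper names and the port numbers 8080 and 50051 are placeholders, not values from the source.

```python
import os
import shlex


def debugger_command(port, debugger_port):
    """Build the MindInsight start command for debugger mode."""
    return [
        "mindinsight", "start",
        "--port", str(port),
        "--enable-debugger", "True",
        "--debugger-port", str(debugger_port),
    ]


def training_env(debugger_port, base_env=None):
    """Copy an environment and add the two debugger variables to it."""
    env = dict(os.environ if base_env is None else base_env)
    env["ENABLE_MS_DEBUGGER"] = "1"
    env["MS_DEBUGGER_PORT"] = str(debugger_port)
    return env


# Placeholder ports; the debugger port must be the same in both places.
cmd = debugger_command(8080, 50051)
env = training_env(50051, base_env={})

print(shlex.join(cmd))
print(env)
```

The command list can be passed to `subprocess.run(cmd)`, and the environment to the call that launches the training script. Both take the debugger port as a parameter, which makes it harder for the two values to drift apart.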
During graph-mode training in MindSpore, users cannot obtain the results of intermediate nodes in the computational graph from the Python layer, which makes training hard to debug. With the MindSpore debugger, users can:
• View the outputs of graph nodes alongside the computational graph on the MindInsight debugger interface;
• Set watchpoints to monitor training exceptions (such as tensor overflow) and trace the cause of an error when an exception occurs;
• View changes in parameters such as weights.
• Use the debugger to inspect the training state:
– Configure the "Check weight change is too small" watchpoint to check whether the weights change too little;
– Configure the "Check unchanged weight" watchpoint to check whether the weights are not being updated;
– Configure the "Check gradient disappearance" watchpoint to locate abnormal gradients;
– Configure the "Check tensor overflow" watchpoint to find where NaN/Inf values occur;
– Configure the "Check excessive tensor" watchpoint to locate operators with overly large values;
– Configure the "Check excessive weight change", "Check gradient disappearance", and "Check excessive gradient" watchpoints to locate abnormal weights or gradients.
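To make these conditions concrete, the sketch below shows the kind of check that three of the watchpoints express, written over plain Python lists. It illustrates the idea only and is not how MindInsight implements them; the function names, the relative-change definition, and the threshold are assumptions.

```python
import math


def has_overflow(values):
    """True if any value is NaN or +/-Inf (the tensor overflow condition)."""
    return any(math.isnan(v) or math.isinf(v) for v in values)


def mean_relative_change(before, after, eps=1e-12):
    """Mean of |after - before| / (|before| + eps) over all weights."""
    total = sum(abs(a - b) / (abs(b) + eps) for b, a in zip(before, after))
    return total / len(before)


def weight_change_too_small(before, after, threshold=1e-6):
    """True if the weights barely moved between two steps (assumed threshold)."""
    return mean_relative_change(before, after) < threshold


def weight_unchanged(before, after):
    """True if no weight was updated at all."""
    return all(a == b for b, a in zip(before, after))


# Hypothetical weights captured at two consecutive steps.
step_0 = [0.5, -0.25, 1.0]
step_1_frozen = [0.5, -0.25, 1.0]
step_1_updated = [0.4, -0.2, 0.9]

print(weight_unchanged(step_0, step_1_frozen))
print(weight_change_too_small(step_0, step_1_updated))
print(has_overflow([1.0, float("inf")]))
```

In the debugger the same questions are asked of real tensors at each step, and a triggered watchpoint points to the graph node where the condition first holds.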
3. Model performance tuning
Introduction to the Profiler performance tuning tool
Profiler gives MindSpore its performance tuning capabilities. It offers a rich, easy-to-use set of functions covering operator performance, iteration performance, and data processing performance, helping users quickly locate performance bottlenecks and improve network performance.
The capabilities of the Profiler tool fall into two parts. MindSpore provides the interfaces for starting performance data collection and for analyzing the data from within the training script, and ultimately generates performance data files. MindInsight provides a visual interface that displays the performance data and statistical analysis results along multiple dimensions.
Instructions:
- Initialize Profiler at the beginning of training and start performance collection;
- Call the analyse method to analyze the performance data after training.
Note:
Profiler now also supports GPU, where it is used in the same way as on Ascend;
when Profiler is initialized on GPU, only the output_path parameter takes effect.
- Start the MindInsight visual interface:
mindinsight start --port 9001 --summary-base-dir ./
- Access the MindInsight interactive interface through a browser:
<your server ip address>:9001/
- Stop the MindInsight visualization service:
mindinsight stop --port 9001
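The first two steps above (initialize Profiler, then call analyse) might look like this in a training script. This is a minimal sketch: `train_with_profiling` and its `run_training` argument are hypothetical stand-ins for the user's own training code, and the output directory is a placeholder.

```python
def train_with_profiling(run_training, output_path="./profiler_data"):
    """Collect performance data around one training run.

    run_training is a hypothetical zero-argument callable that builds the
    network and runs the training loop.
    """
    # Imported lazily so this helper can be defined without MindSpore installed.
    from mindspore.profiler import Profiler

    # Initialize at the beginning of training so collection starts immediately.
    profiler = Profiler(output_path=output_path)

    run_training()

    # Parse the collected data into the files that MindInsight displays.
    profiler.analyse()
```

Only `output_path` is passed because, according to the note above, it is the only initialization parameter that takes effect on GPU. The output directory presumably needs to sit under the directory given to `--summary-base-dir` for MindInsight to find the results; check this against the MindInsight version in use.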
Summary of this chapter
• Capabilities and basic use of the ecosystem migration tool:
mindconverter --help
• Capabilities and basic use of the accuracy debugger:
mindinsight start --port {PORT} --enable-debugger True --debugger-port {DEBUGGER_PORT}
• Capabilities and basic use of the performance tuning tool (Profiler):
Add to the training script: profiler = Profiler(), …, profiler.analyse()
• Basic concepts of the ONNX model:
– Graph, Node, Value Info, Initializer.