MegEngine officially supports XLA!

XLA (Accelerated Linear Algebra) is a machine learning compiler developed by Google that can be used to accelerate the training and inference of AI models. Starting with version 1.13.1, MegEngine supports XLA as well: you can choose to enable it when training a model, and different models see speedups ranging from roughly 10% to 80%.

Main target scenarios

MegEngine currently executes eagerly (dynamically): each mge.functional call in Python corresponds to one kernel execution on the underlying GPU. The advantage of this mode is that execution matches the code logic exactly, what you see is what you get, and it is very flexible. The drawback is that such code is difficult to optimize, so performance may not be optimal.

XLA instead adopts static execution: it expresses the model's computation as a static computation graph called HLO (High Level Operations). HLO captures the graph's operations, the data flow between them, and the shapes of the tensors involved. XLA then applies a series of optimizations to the HLO and ultimately generates a better computation graph that completes the same computation faster. The limitation of XLA is flexibility: it cannot easily express dynamic tensor shapes or control flow.

Now that MegEngine supports XLA, we can use it to accelerate the relatively static parts of model training and thereby shorten the whole training process.

Usage and results

When training with MegEngine, you can enable XLA compilation by adding the xla_trace or partial_trace decorator to your existing training function.

When the model is fully static, we can use xla_trace to express the whole network as a single static graph and hand it to XLA for optimization and compilation. Subsequent iterations then execute this optimized graph, which speeds up training.
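Here is a minimal sketch of what this can look like. The import path (megengine.jit.xla_trace) and the capture_as_const keyword are assumptions modeled on MegEngine's existing jit.trace decorator; consult the documentation linked below for the exact signature.

```python
import megengine as mge
import megengine.functional as F
import megengine.module as M
import megengine.optimizer as optim
from megengine.autodiff import GradManager
from megengine.jit import xla_trace  # assumed import path

net = M.Linear(32, 10)
gm = GradManager().attach(net.parameters())
opt = optim.SGD(net.parameters(), lr=0.01)

# capture_as_const is an assumed keyword, mirroring megengine.jit.trace.
# The decorator traces the whole training step into one static graph
# that XLA can optimize and compile.
@xla_trace(capture_as_const=True)
def train_step(data, label):
    with gm:
        loss = F.nn.cross_entropy(net(data), label)
        gm.backward(loss)
    opt.step().clear_grad()
    return loss

# The first call traces and compiles the step; later calls with the
# same tensor shapes run the XLA-optimized graph directly.
for _ in range(3):
    loss = train_step(F.ones((8, 32)), mge.tensor([0] * 8))
```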

If the model has some dynamic behavior, such as tensor shapes that change during training or control flow, we can use partial_trace instead. It traces the static parts of the network into subgraphs and hands those to XLA for compilation and optimization, while the rest of the network keeps executing dynamically, preserving both performance and flexibility.
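A sketch of the partial case follows. The import path and the backend="xla" keyword are assumptions; the idea is that the decorated method is traced and compiled by XLA while the surrounding Python control flow stays eager.

```python
import megengine.functional as F
import megengine.module as M
from megengine.jit import partial_trace  # assumed import path

class Net(M.Module):
    def __init__(self):
        super().__init__()
        self.backbone = M.Sequential(
            M.Linear(32, 64), M.ReLU(), M.Linear(64, 10)
        )

    # Static part: traced into a subgraph and compiled by XLA.
    # backend="xla" is an assumption; see the docs for the exact API.
    @partial_trace(backend="xla")
    def extract(self, x):
        return self.backbone(x)

    def forward(self, x):
        feat = self.extract(x)
        # Dynamic part: ordinary Python control flow, kept eager.
        if feat.mean().item() > 0:
            feat = feat * 2
        return feat

net = Net()
out = net(F.ones((8, 32)))  # first call traces and compiles the static subgraph
```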

The following chart shows the training speed of mainstream neural network models before and after enabling XLA in MegEngine. Blue is the training speed with XLA off, orange is the training speed with XLA on. After enabling XLA, most models gain 10% to 40%, and some gain more than 80%.

[Figure 1: training speed of mainstream models before (blue) and after (orange) enabling XLA]

For more information about XLA and detailed usage, see https://www.megengine.org.cn/doc/stable/zh/user-guide/model-development/jit/xla.html.
