Wondering how MLGO works? By the end of this article, you will.

In today's software development world, code optimization is critical to improving performance and reducing code size, and inlining is one of the key optimization techniques. MLGO can make inline/no-inline decisions intelligently during compilation, producing more efficient and compact code. This article takes a deep dive into how MLGO works, giving you a complete picture of this exciting technology.


Inlining is an optimization technique that helps reduce code size by removing redundant code. When a program contains many functions calling each other, they form a structure known as a call graph. In the inlining phase, the compiler traverses the entire call graph and decides, for each caller-callee pair, whether to inline it according to its decision rules. This is a sequential decision process: earlier inlining decisions change the call graph, which affects later decisions and the final result. By inlining caller-callee pairs, the compiler generates more concise code and thereby reduces its size.
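A minimal sketch of that traversal is shown below. The `call_graph` object, its `edges()` and `inline()` methods, and the `should_inline` callback are hypothetical stand-ins for LLVM's internal data structures and decision logic, not the real interface.

```python
# Illustrative sketch of an inlining pass over a call graph.
# The call_graph object and should_inline callback are hypothetical.

def run_inlining_pass(call_graph, should_inline):
    """Walk caller-callee edges, deciding each one in sequence."""
    worklist = list(call_graph.edges())          # caller-callee pairs
    while worklist:
        edge = worklist.pop(0)
        if should_inline(edge):
            new_edges = call_graph.inline(edge)  # mutates the call graph
            # Inlining exposes the callee's own calls at the caller,
            # so earlier decisions change what gets decided later.
            worklist.extend(new_edges)
```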

In traditional approaches, however, the inline/no-inline decision is based on hand-written heuristics and empirical rules. These heuristics have become increasingly difficult to improve over time, especially in the face of complex code structures and large-scale software packages. To address this, MLGO (Machine Learning Guided Optimization) replaces the hand-written heuristics with a machine learning model: it trains a decision network with reinforcement learning, using policy gradient and evolution strategies algorithms, to provide inline/no-inline recommendations.
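To make the contrast concrete, here are two hedged sketches of the `should_inline` callback used in the traversal above: a hand-written size-threshold rule and an MLGO-style learned policy. The threshold value and the feature extraction are invented for illustration and do not reflect LLVM's actual heuristic.

```python
# Hand-written heuristic: inline when the callee looks "small enough".
SIZE_THRESHOLD = 50  # illustrative value, not LLVM's real threshold

def should_inline_heuristic(edge):
    return edge.callee_instruction_count < SIZE_THRESHOLD

# MLGO-style replacement: ask a trained policy instead of a fixed rule.
def make_ml_should_inline(policy, extract_features):
    def should_inline(edge):
        features = extract_features(edge)   # numeric feature vector for this call site
        return policy(features) == 1        # 1 = inline, 0 = don't inline
    return should_inline
```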


In MLGO, as the compiler traverses the call graph, it extracts relevant features for each caller-callee pair and asks the neural network model whether to inline that pair. Following the model's advice, the compiler makes inline/no-inline decisions one after another until the entire call graph has been traversed. This sequential decision process produces logs of states, actions, and rewards, which are used to continually improve the policy and update the model in an online reinforcement learning setup.
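The sketch below shows roughly what that query-and-log loop could look like during training. The feature set, the sampling of actions, and the reward (negative final code size) are simplifying assumptions rather than MLGO's exact formulation, and the call graph object is again hypothetical.

```python
import random

def compile_with_policy(call_graph, policy_net, extract_features):
    """Traverse the call graph, querying the policy and logging the episode."""
    states, actions = [], []
    for edge in list(call_graph.edges()):    # new edges from inlining omitted for brevity
        features = extract_features(edge)    # e.g. callee size, call-site count (hypothetical)
        prob_inline = policy_net(features)   # model's probability of choosing "inline"
        action = 1 if random.random() < prob_inline else 0  # sample for exploration
        if action:
            call_graph.inline(edge)
        states.append(features)
        actions.append(action)
    # The reward is only known once compilation finishes,
    # here taken as the negative of the final native code size.
    reward = -call_graph.code_size()
    return states, actions, reward
```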

During training, the compiler uses the policy currently being trained to make inline/no-inline decisions and keeps a log of the sequential decision process. These logs are then passed to the trainer to update the neural network model. This training loop is iterated until a satisfactory policy is obtained.
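A minimal REINFORCE-style policy gradient update over such logs could look like the following. It uses plain TensorFlow with invented shapes and stands in for the considerably more elaborate training setup that MLGO actually uses.

```python
import tensorflow as tf

def policy_gradient_update(policy_net, optimizer, states, actions, reward):
    """One REINFORCE step: push the policy toward decisions that earned a good reward."""
    states = tf.convert_to_tensor(states, dtype=tf.float32)    # [T, num_features]
    actions = tf.convert_to_tensor(actions, dtype=tf.float32)  # [T] of 0/1 decisions
    with tf.GradientTape() as tape:
        probs = tf.squeeze(policy_net(states), axis=-1)        # P(inline) for each decision
        log_probs = actions * tf.math.log(probs + 1e-8) \
                  + (1.0 - actions) * tf.math.log(1.0 - probs + 1e-8)
        loss = -reward * tf.reduce_sum(log_probs)              # reward-weighted log-likelihood
    grads = tape.gradient(loss, policy_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, policy_net.trainable_variables))
```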


After training is complete, the trained policy is embedded into the compiler to provide inline/no-inline decisions during actual compilation. Unlike in the training phase, the policy no longer generates logs. The TensorFlow model is compiled with XLA AOT (ahead-of-time compilation) into executable code, which avoids a dependency on the TensorFlow runtime and the extra time and memory overhead it would bring.
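One way such an export could look is sketched below: the trained policy is saved as a plain TensorFlow SavedModel, which XLA's ahead-of-time tooling (e.g. tfcompile) can turn into native code linked directly into the compiler. The feature count and signature name here are assumptions, not MLGO's actual interface.

```python
import tensorflow as tf

NUM_FEATURES = 11  # illustrative; the real MLGO feature set differs

class InlinePolicy(tf.Module):
    """Wraps a trained Keras model behind a fixed-signature decision function."""

    def __init__(self, keras_model):
        super().__init__()
        self.model = keras_model

    @tf.function(input_signature=[tf.TensorSpec([1, NUM_FEATURES], tf.float32)])
    def decide(self, features):
        # Returns 1 to inline, 0 to keep the call; no logging in deployment.
        prob = self.model(features)
        return tf.cast(prob > 0.5, tf.int64)

# Example export (assuming `trained_model` is the trained Keras policy):
# tf.saved_model.save(InlinePolicy(trained_model), "/tmp/inline_policy")
# The SavedModel can then be AOT-compiled and linked into the compiler,
# removing the TensorFlow runtime dependency at compile time.
```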

Experiments on a large internal software package show that an inlining policy trained with MLGO generalizes to the compilation of other software, reducing code size by 3% to 7%. This generality holds not only across different software, but also over time as both the software and the compiler continue to evolve: evaluated three months later on the same set of software, the policy showed only slight degradation in performance.

With the continuous progress and development of machine learning technology, we can expect MLGO to further promote innovation in the field of code optimization in the future, bringing us more efficient and reliable software.

Origin blog.csdn.net/Fsafn/article/details/131679567