[Deep Learning Compiler Series] The Deep Learning Compiler Embedded in the AI Framework


Foreword

Hello everyone, the topic of this sharing is deep learning compilers.

As we mentioned in our sharing on cutting-edge AI framework technology, deep learning compilation, which links flexible model expression with efficient model execution, is a major direction in the evolution of AI frameworks. In this sharing we will cover the following three aspects in detail:

  • Looking at deep learning compilers from the evolution of AI frameworks
  • Challenges Facing Deep Learning Compilation Technology
  • SenseParrots' deep learning compilation solution

Looking at deep learning compilers from the evolution of AI frameworks

The author believes an AI framework has two core tasks: first, to define a user-friendly way of expressing a model, and second, to execute that model efficiently.

From the first-generation AI frameworks Caffe, Theano, and Torch, to the second-generation frameworks TensorFlow and PyTorch, to the newer frameworks TVM, PaddlePaddle, MindSpore, MegEngine, and OneFlow that have emerged in recent years, most deep learning framework technology serves these two tasks. Specifically,

  • Building deep learning models in Python, together with in-framework technologies such as automatic differentiation and the dynamic graph mechanism, can be seen as serving the first task;
  • Model representation and execution based on computational graphs, along with parallel computing and communication, quantization, mixed precision, and other technologies, can be seen as serving the second task.

Compilation is the process of turning a source program written in a source language into a target program, so the process by which an AI framework turns the user's model expression into executable code is itself a compilation process, and an AI framework can be regarded as a composite of a deep learning compiler, an execution engine, and other components.

Based on this view, with designs such as TensorFlow's XLA, PyTorch's TorchScript, and the two-layer IR (Intermediate Representation) in TVM, the deep learning compiler has become deeply embedded in AI framework design.

Challenges Facing Deep Learning Compilation Technology

We believe that the challenges faced by deep learning compilation technology mainly exist in the following aspects:

  • Challenges at the level of user model expression and parsing
  • Deep Learning Compiler System Design Challenges
  • Challenges at the High-Performance Code Generation Level

Challenges at the Level of User Model Expression and Parsing

Existing AI frameworks fall into two schools of front-end model expression. Some frameworks build models from a limited set of operators, such as TensorFlow and PyTorch; others build models on a custom DSL (Domain Specific Language), such as TVM.

Each approach has advantages and disadvantages. Building models from a limited operator set makes it easier to convert the model expression into the framework's intermediate representation, but expressiveness is limited, especially for control flow. Building models on a custom DSL usually offers more complete expressiveness, but it makes model expression parsing, and subsequent optimization based on the intermediate representation, more difficult.

Regardless of which front-end expression is used, building deep learning models in Python has become the consensus, and with it come common challenges. We summarize these challenges in four aspects:

1. Multiple data structures:

When expressing deep learning models, and especially when adding custom operators, algorithm researchers usually take the tensor data structure defined by the AI framework (Tensor in PyTorch) as the basic data structure, complemented by Python's native data structures (list, tuple, and dict). Model expressions given by users therefore always mix multiple data structures, and expressing all of these data structure types during compilation is a challenge.

Moreover, Python is a dynamically typed language in which data types cannot be predicted ahead of time, which further complicates the problem.
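To make the mixed-structure problem concrete, here is a minimal sketch of mapping runtime Python values onto compiler IR type descriptors. The function name and the IR type strings (`i1`, `i64`, `f64`, `tuple<...>`) are illustrative assumptions, not SenseParrots APIs:

```python
# Hypothetical sketch: inspect runtime Python values and describe them with
# made-up IR type strings, since types cannot be known before execution.
def infer_ir_type(value):
    """Map a runtime Python value to an illustrative IR type string."""
    if isinstance(value, bool):   # check bool before int: bool subclasses int
        return "i1"
    if isinstance(value, int):
        return "i64"
    if isinstance(value, float):
        return "f64"
    if isinstance(value, (list, tuple)):
        return f"tuple<{', '.join(infer_ir_type(v) for v in value)}>"
    if isinstance(value, dict):
        items = ', '.join(f"{k}: {infer_ir_type(v)}" for k, v in value.items())
        return f"dict<{items}>"
    raise TypeError(f"unsupported type: {type(value).__name__}")

print(infer_ir_type([1, 2.0, (True, 3)]))  # tuple<i64, f64, tuple<i1, i64>>
```

Note that the nested containers force the compiler to carry structured, recursive type information rather than a flat tensor type.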

2. Various operations:

The second challenge comes from the variety of computing operations.

In the model expression given by the user, three types of operations may be used:

  1. Basic operations expressed as Python operators, such as addition, subtraction, multiplication, and division;
  2. APIs provided by third-party software, such as the series of API interfaces provided by PyTorch;
  3. Indexing operations.

In particular, implementations of indexing operations often introduce the concept of a view: the output and the input share the same physical memory, so processing the output also changes the input. This memory-sharing mechanism must be correctly tracked and handled during compilation.
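The aliasing behavior can be sketched in plain Python; this toy `Tensor` class is illustrative and not PyTorch internals. The view holds no data of its own, only a reference into the base buffer, so a write through the view is visible in the base tensor:

```python
# Toy sketch of view semantics: views alias the base storage instead of copying.
class Tensor:
    def __init__(self, data):
        self.storage = list(data)   # underlying physical buffer
        self.offset = 0
        self.length = len(self.storage)

    def view(self, offset, length):
        t = Tensor.__new__(Tensor)
        t.storage = self.storage    # shared, not copied
        t.offset, t.length = offset, length
        return t

    def __setitem__(self, i, v):
        self.storage[self.offset + i] = v

    def tolist(self):
        return self.storage[self.offset:self.offset + self.length]

base = Tensor([0, 1, 2, 3])
v = base.view(1, 2)   # aliases elements 1..2 of base
v[0] = 99             # write through the view
print(base.tolist())  # the base tensor sees the write: [0, 99, 2, 3]
```

A compiler that reorders or eliminates writes without tracking this aliasing would silently change program semantics.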

3. Auxiliary statement processing:

When expressing deep learning models, users may also use auxiliary statements for code safety and ease of use. So in addition to the necessary computation and memory access statements, auxiliary statements such as decorators, assertions, and function calls also need to be processed when parsing the user's model expression. These statements often do not map directly into the deep learning compiler's IR, yet they must be guaranteed to have no side effects.
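One common way to handle such statements is a source-level transform before lowering. As a sketch (assuming, as the text requires, that the asserts are side-effect free), Python's standard `ast` module can drop `assert` statements from a model function:

```python
import ast

# Sketch: remove `assert` statements from a model expression before lowering,
# since they have no counterpart in the compiler IR.
class StripAsserts(ast.NodeTransformer):
    def visit_Assert(self, node):
        return None  # returning None deletes the node from its parent

src = """
def model(x):
    assert x > 0, "x must be positive"
    return x * 2
"""
tree = StripAsserts().visit(ast.parse(src))
ast.fix_missing_locations(tree)
print(ast.unparse(tree))  # the assert is gone; only the computation remains
```

Decorators and other auxiliary constructs can be handled with similar `NodeTransformer` passes; `ast.unparse` requires Python 3.9 or later.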

4. Uncertain Behavior:

Finally, beyond the challenges above, there is some non-deterministic behavior in user model expressions. We summarize two kinds of uncertain behavior here:

  1. Value dependence: different input data can change the shape of the output or the direction taken by control flow;
  2. Impure functions: a function may produce different results even when its input data are exactly the same.
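Both behaviors are easy to reproduce in plain Python (the functions below are illustrative examples, not framework code):

```python
# 1. Value dependence: the output's length depends on the input *values*,
#    not just their shape, so it cannot be fixed at compile time.
def nonzero_indices(xs):
    return [i for i, x in enumerate(xs) if x != 0]

print(nonzero_indices([0, 5, 0, 7]))  # [1, 3]
print(nonzero_indices([1, 1, 1, 1]))  # [0, 1, 2, 3] -- a different length

# 2. Impurity: identical inputs yield different results because the function
#    reads and updates hidden state outside its arguments.
_counter = 0

def stamp(x):
    global _counter
    _counter += 1
    return (x, _counter)

print(stamp(42) == stamp(42))  # False: the hidden counter changed between calls
```

Neither function can be safely compiled once and reused on the assumption that equal inputs imply equal outputs, which is exactly what makes these behaviors hard for a compiler.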

Deep Learning Compiler System Design Challenges

From a system design perspective, deep learning compiler design needs to consider the following challenges:

1. Compilation timing and overhead:

First, the main source of performance gains in deep learning compilation is generating code through one-time compilation and optimization and then reusing that compiled code. In a model expression, a function may be called many times, possibly with different input data on each call. How to manage the compilation overhead of repeated function calls is a challenge for deep learning compiler design.
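A common way to amortize this overhead is to compile once per input *signature* (e.g. shape and dtype) and reuse the artifact on later calls. The sketch below is illustrative (not the SenseParrots implementation), with Python functions standing in for compiled code:

```python
# Sketch: cache "compiled" code keyed by (function, input signature) so that
# repeated calls with the same signature pay the compile cost only once.
compiled_cache = {}
compile_count = 0

def signature(args):
    # Key on shape-like properties (here, container type and length),
    # not on the element values themselves.
    return tuple((type(a).__name__, len(a) if isinstance(a, list) else None)
                 for a in args)

def run(fn, *args):
    global compile_count
    key = (fn.__name__, signature(args))
    if key not in compiled_cache:
        compile_count += 1        # stand-in for an expensive compile step
        compiled_cache[key] = fn  # stand-in for the generated code
    return compiled_cache[key](*args)

def double(xs):
    return [2 * x for x in xs]

run(double, [1, 2, 3])
run(double, [4, 5, 6])   # same signature: cache hit, no recompile
run(double, [1, 2])      # new length => new signature: recompile
print(compile_count)     # 2
```

The trade-off is that an overly fine-grained signature triggers frequent recompilation, while an overly coarse one may reuse code that is suboptimal for the actual inputs.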

On the other hand, as PyTorch's user base keeps growing, some originally static-graph frameworks, such as TensorFlow and PaddlePaddle, have also adopted dynamic graph mode as the default execution mode. Choosing the right moment in a dynamic-graph framework to convert dynamic graphs to static ones and to apply compilation optimizations is also a key issue.

2. Compatibility with the backward process:

Almost all AI frameworks have automatic differentiation mechanisms, whose implementations usually require that the framework's operators provide both forward and backward implementations.

For user-defined operators, how to obtain an IR representation of the backward process and then compile and optimize it is another challenge.
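The underlying idea can be sketched with a minimal reverse-mode example (illustrative only, not the framework's actual mechanism): each forward operation records a rule for propagating gradients, from which a backward computation can be derived:

```python
# Minimal reverse-mode sketch: forward ops record their backward rules,
# so the gradient computation can be derived from the forward expression.
class Var:
    def __init__(self, value, grad_fn=None, parents=()):
        self.value, self.grad_fn, self.parents = value, grad_fn, parents
        self.grad = 0.0

    def __mul__(self, other):
        out = Var(self.value * other.value, parents=(self, other))
        # backward rule for multiplication: d(xy)/dx = y, d(xy)/dy = x
        out.grad_fn = lambda g: (g * other.value, g * self.value)
        return out

    def backward(self, g=1.0):
        self.grad += g
        if self.grad_fn:
            for parent, pg in zip(self.parents, self.grad_fn(g)):
                parent.backward(pg)

x, y = Var(3.0), Var(4.0)
z = x * y
z.backward()
print(x.grad, y.grad)  # 4.0 3.0
```

A compiler that sees only the user's forward code must either derive such rules automatically from its IR or require the user to supply the backward implementation explicitly.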

Challenges at the High-Performance Code Generation Level

The performance of the generated code has always been a key metric for evaluating deep learning compilers. From the earliest template-based code generation, to the separation of compute and schedule proposed in Halide, to auto-scheduling in TVM, to today's Ansor, polyhedral optimization, and more, all of these aim to improve the performance of the code that deep learning compilers generate.
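The Halide/TVM idea of separating *what* to compute from *how* to execute it can be illustrated with a toy example (plain Python standing in for generated loop nests): the compute definition stays fixed while different "schedules" produce the same result with different loop structures:

```python
# Toy compute/schedule separation: one compute definition, two loop schedules.
N = 8

def compute(i, j):            # what to compute: C[i][j] = i * j
    return i * j

def schedule_naive():         # how #1: plain row-major loops
    return [[compute(i, j) for j in range(N)] for i in range(N)]

def schedule_tiled(tile=4):   # how #2: iterate in cache-friendly tiles
    C = [[0] * N for _ in range(N)]
    for i0 in range(0, N, tile):
        for j0 in range(0, N, tile):
            for i in range(i0, min(i0 + tile, N)):
                for j in range(j0, min(j0 + tile, N)):
                    C[i][j] = compute(i, j)
    return C

print(schedule_naive() == schedule_tiled())  # True: same compute, different schedule
```

Because correctness depends only on the compute definition, the compiler (or an auto-scheduler like Ansor) is free to search over schedules for the fastest one on a given hardware target.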

SenseParrots' deep learning compilation solution

In our self-developed AI framework SenseParrots, we built the self-developed deep learning compiler Elena and redesigned the deep learning compilation scheme by co-designing it with the AI framework's execution engine. The scheme consists of three parts:

  1. User model expression parsing module;
  2. Code generation module based on two-stage IR;
  3. AI framework execution engine.

Here, the user model expression parsing module is responsible for translating the user's Python-based model expression into the AI framework's own IR. In later installments of this series we will describe how to solve the problems of multiple data structures, multiple operations, auxiliary statement processing, and uncertain behavior.

The code generation module based on two-stage IR is responsible for generating high-performance executable code; some of its code generation techniques will also be introduced later in this series.

The AI framework execution engine plays the connecting role here, combining the user model expression parsing module with the two-stage-IR-based code generation module to drive the entire deep learning compilation process.

Epilogue

Thank you for reading. For questions about deep learning compilation, feel free to leave us a message in the comment area and discuss with us~

We look forward to having students with the same interests join us to explore and solve the problems and challenges of deep learning compilers!


Follow the public account "SenseParrots" to get the latest industry trends and technical thoughts on artificial intelligence frameworks:

