Tianqi Chen (Carnegie Mellon University): TVM, an automated deep learning compiler
- Getting started is hard: a dedicated AI chip typically exposes only its instruction set, and working out the algorithm for mapping workloads onto it is difficult;
- TVM's goal is to automatically map AI algorithms onto diverse AI hardware, lowering automatically from high-level representations down to low-level operators;
- One alternative is to provide a hand-written operator library, but it has to be reimplemented for every platform, which is very time-consuming;
- TVM: programming stays at a high level, similar to writing ordinary functions; one defines a search space of possible implementations and searches it for the corresponding low-level operators. The search itself can be driven by a machine-learned cost model that predicts the cost of each candidate and picks a low-cost plan (see the compute/schedule sketch after this list);
- TVM itself is a complete deep learning compiler with two layers of optimization: the first layer optimizes a high-level differentiable IR (the computation graph), and the second layer optimizes individual operators (see the graph-level sketch after this list);
- For operator-level optimization, one must first define as large a search space as possible, trying to exhaust every optimization one can think of;
- From general-purpose processors to GPUs to NPUs, the number of processing elements keeps growing, from scalar to one-dimensional to two-dimensional arrays; on top of that, irregular register files and addressing schemes create great difficulty for both programmers and compilers;
- Tensorization requires a unified declaration describing which operations the NPU supports, and the program itself should be written in terms of higher-level operations; writing at a higher level is more flexible, since going from high level down to low level is relatively easy, while the reverse is not (see the tensorize sketch after this list);
- TVM has its own website, and there are industry deployments one can refer to; in some cases TVM can be more efficient than hand-mapped operators or common frameworks (common frameworks do not account for variant workloads, and their performance indicators differ);
- Few companies make their NPU instruction sets public, so there is little research on NPU compilers; open-sourcing a deep learning accelerator and its compiler therefore brings great benefits to researchers and also provides a reference for NPU companies developing their software stacks;
- The hardware itself can be described, the NPU's instruction set can be simulated alongside the compiler, and how fast the hardware would run can be evaluated, enabling hardware/software co-design;
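
A minimal sketch of the compute/schedule style described above, assuming TVM's `te` tensor-expression API (module paths and names vary across TVM versions). The schedule here is one hand-picked point in the search space; an auto-tuner such as AutoTVM or the auto-scheduler would normally explore many such points guided by a learned cost model:

```python
import tvm
from tvm import te

# Declare WHAT to compute (a matrix multiply), not HOW to compute it.
n = 1024
A = te.placeholder((n, n), name="A")
B = te.placeholder((n, n), name="B")
k = te.reduce_axis((0, n), name="k")
C = te.compute((n, n), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

# A schedule is one concrete point in the optimization search space:
# here a fixed 32x32 loop tiling; an auto-tuner would instead search over
# many tilings, orderings, vectorization choices, etc.
s = te.create_schedule(C.op)
io, ii = s[C].split(C.op.axis[0], factor=32)
jo, ji = s[C].split(C.op.axis[1], factor=32)
s[C].reorder(io, jo, ii, ji)

# Lower the chosen schedule to low-level code for a target backend.
func = tvm.build(s, [A, B, C], target="llvm")
print(tvm.lower(s, [A, B, C], simple_mode=True))
```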
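
A sketch of the graph-level layer, assuming the Relay IR (again version-dependent): graph-level passes such as operator fusion and constant folding run first, and each resulting fused operator is then lowered and tuned at the operator level:

```python
import tvm
from tvm import relay

# A tiny network expressed in the high-level, differentiable graph IR.
x = relay.var("x", shape=(1, 3, 224, 224), dtype="float32")
w = relay.var("w", shape=(16, 3, 3, 3), dtype="float32")
y = relay.nn.relu(relay.nn.conv2d(x, w, padding=(1, 1)))
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))

# Graph-level optimizations (fusion, constant folding, layout transforms, ...)
# run inside this pass context; operator-level optimization happens per
# fused operator during lowering.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm")
```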
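
For the tensorization point, a hedged sketch based on TVM's `te.tensorize` mechanism (adapted from the public TVM tensorize tutorial). The tensor-intrinsic declaration plays the role of the "unified description" of what the hardware supports; the extern function name `gemv_update` is a placeholder for a hardware intrinsic or micro-kernel and is not defined here, so only `tvm.lower` is called:

```python
import tvm
from tvm import te

def intrin_gemv(m, l):
    # Declarative description of the operation the accelerator supports:
    # an (m x l) matrix-vector product.
    a = te.placeholder((l,), name="a")
    b = te.placeholder((m, l), name="b")
    k = te.reduce_axis((0, l), name="k")
    c = te.compute((m,), lambda i: te.sum(b[i, k] * a[k], axis=k), name="c")
    Ab = tvm.tir.decl_buffer(a.shape, a.dtype, name="A", offset_factor=1, strides=[1])
    Bb = tvm.tir.decl_buffer(b.shape, b.dtype, name="B", offset_factor=1,
                             strides=[te.var("s1"), 1])
    Cb = tvm.tir.decl_buffer(c.shape, c.dtype, name="C", offset_factor=1, strides=[1])

    def intrin_func(ins, outs):
        # Replace the matched loop nest with a call to the hardware intrinsic
        # (placeholder extern function "gemv_update").
        ib = tvm.tir.ir_builder.create()
        bb, aa = ins
        cc = outs[0]
        ib.emit(tvm.tir.call_extern("int32", "gemv_update",
                                    cc.access_ptr("w"), aa.access_ptr("r"),
                                    bb.access_ptr("r"), m, l, bb.strides[0]))
        return ib.get()

    return te.decl_tensor_intrin(c.op, intrin_func, binds={a: Ab, b: Bb, c: Cb})

# The high-level program is written with ordinary tensor expressions.
N, M, L = 1024, 512, 64
A = te.placeholder((N, L), name="A")
B = te.placeholder((M, L), name="B")
k = te.reduce_axis((0, L), name="k")
C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[j, k], axis=k), name="C")

# Tensorize: map the inner loops onto the declared hardware operation.
s = te.create_schedule(C.op)
yo, yi = s[C].split(C.op.axis[1], factor=16)
s[C].tensorize(yi, intrin_gemv(16, L))
print(tvm.lower(s, [A, B, C], simple_mode=True))
```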
Q&A:
- The questions were quite specialized and hard for me to follow;
- At present it is not fully automated; the approach is to advance gradually from manual to automated;