CNCC2020_陈天奇_TVM: An automated deep learning compiler

Speaker: Tianqi Chen, Carnegie Mellon University

  1. Getting started is hard: a dedicated AI chip typically exposes only its instruction set, and writing the mapping algorithm for it is difficult;
  2. TVM's goal is to automatically map AI algorithms onto diverse AI hardware, lowering high-level representations to low-level operators automatically;
  3. One alternative is to provide a hand-written operator library, but the library must be reimplemented for every platform, which is very time-consuming;
  4. With TVM, programming stays high-level, similar to writing functions: the developer defines a space of possible implementations, and TVM searches it for the corresponding low-level operators; the search itself can be guided by machine learning, training a cost model to predict the cost of each candidate and choose a low-cost plan;
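The search loop described above can be sketched in a few lines. This is a toy illustration, not TVM's actual API: the candidate space is just a set of tile sizes, and `predicted_cost` is a stand-in heuristic for a learned cost model.

```python
# Toy sketch of cost-model-guided schedule search (illustrative only):
# enumerate candidate schedules and let a cost model rank them.

def candidate_schedules(n):
    """All tile sizes that evenly divide the loop extent n."""
    return [t for t in range(1, n + 1) if n % t == 0]

def predicted_cost(n, tile, cache_lines=8):
    # Stand-in analytical model: small tiles pay loop overhead,
    # tiles past the assumed "cache" size pay a spill penalty.
    overhead = n / tile
    spill = max(0, tile - cache_lines) * n
    return overhead + spill

def search(n):
    """Pick the schedule the model predicts to be cheapest."""
    return min(candidate_schedules(n), key=lambda t: predicted_cost(n, t))
```

In TVM the cost model is itself trained on measured runtimes, so the predictions improve as the search explores more of the space.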
  5. TVM itself is a complete deep learning compiler with two layers of optimization: the first layer optimizes a high-level differentiable IR, and the second layer optimizes at the operator level;
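A classic first-layer (graph-level) pass is operator fusion. The sketch below runs it over a toy operator list; the op names and the fusion rule are illustrative, not TVM's actual IR:

```python
# Illustrative graph-level pass: fuse an elementwise op into the
# compute op that produces its input, so both run as one kernel.

def fuse_elementwise(graph):
    """Greedily fuse trailing elementwise ops ("relu", "add")."""
    fused, i = [], 0
    while i < len(graph):
        if i + 1 < len(graph) and graph[i + 1] in ("relu", "add"):
            fused.append(graph[i] + "+" + graph[i + 1])  # one fused kernel
            i += 2
        else:
            fused.append(graph[i])
            i += 1
    return fused

graph = ["conv2d", "relu", "conv2d", "relu", "softmax"]
# fuse_elementwise(graph) → ["conv2d+relu", "conv2d+relu", "softmax"]
```

After this pass, the second layer only has to tune the fused kernels, not each tiny op separately.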
  6. For operator-level optimization, we must first define as large a search space as possible, trying to exhaust every optimization one can think of;
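Such a search space is typically the cross product of a few schedule "knobs". The knob names below are illustrative; the point is how quickly the space grows:

```python
# Illustrative operator-level search space: the cross product of a
# few schedule knobs (names are made up for this sketch).
from itertools import product

knobs = {
    "tile_x":    [1, 2, 4, 8],
    "tile_y":    [1, 2, 4, 8],
    "unroll":    [0, 1],
    "vectorize": [0, 1],
}

search_space = list(product(*knobs.values()))
# Even four small knobs already give 4 * 4 * 2 * 2 = 64 configurations,
# which is why the search needs a cost model rather than brute force.
```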
  7. From general-purpose processors to GPUs to NPUs, the compute units keep growing: from scalar to one-dimensional vector to two-dimensional tensor units; in addition, irregular registers and addressing bring great difficulty to both programmers and compilers;
  8. Tensorization requires a unified specification describing which operations the NPU supports, and programs themselves should be written in terms of higher-level operations; (recovering high-level operations from low-level code is hard, whereas lowering from high level to low level is relatively easy)
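Tensorization can be pictured as rewriting the inner loops of a kernel into calls to a hardware intrinsic. In this sketch `npu_gemm_2x2` is a mock stand-in for a real NPU instruction, not an actual API:

```python
# Tensorization sketch: the inner 2x2x2 loop nest of a matmul is
# replaced by a (mock) NPU GEMM intrinsic on 2x2 blocks.

def npu_gemm_2x2(acc, a, b):
    """Mock hardware intrinsic: acc += a @ b for 2x2 blocks."""
    for i in range(2):
        for j in range(2):
            for k in range(2):
                acc[i][j] += a[i][k] * b[k][j]

def matmul_tensorized(A, B, n):
    """n x n matmul (n even) whose innermost computation is the intrinsic."""
    C = [[0] * n for _ in range(n)]
    for io in range(0, n, 2):
        for jo in range(0, n, 2):
            for ko in range(0, n, 2):
                # Gather 2x2 blocks, run the intrinsic, scatter back.
                a = [[A[io + i][ko + k] for k in range(2)] for i in range(2)]
                b = [[B[ko + k][jo + j] for j in range(2)] for k in range(2)]
                acc = [[C[io + i][jo + j] for j in range(2)] for i in range(2)]
                npu_gemm_2x2(acc, a, b)
                for i in range(2):
                    for j in range(2):
                        C[io + i][jo + j] = acc[i][j]
    return C
```

The unified hardware specification mentioned above is what tells the compiler that an intrinsic like this exists and which loop shapes it can absorb.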
  9. TVM has its own website, and there are industrial applications to refer to; in some cases TVM is more efficient than hand-mapped operators or common frameworks; (common frameworks do not account for operator variants, whose performance differs)
  10. Few companies make their NPU's instruction set public, so there is little research on NPU compilers. Open-sourcing a deep learning accelerator and its compiler therefore brings great benefit to researchers, and also serves as a reference for NPU companies developing their software stacks;
  11. The hardware can be described declaratively, the NPU's instruction set can be simulated, and the compiler can evaluate how fast the hardware would run, enabling software/hardware co-design;
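The co-design loop above can be sketched minimally: score a candidate instruction stream against a declarative hardware description, then change the description and re-score. Both the ISA and the latencies here are invented for illustration:

```python
# Co-design sketch: a hardware description as data, and a compiler-side
# evaluator that estimates how fast a program runs on it.

HW_DESC = {          # hypothetical NPU: cycles per instruction
    "load":  4,
    "gemm":  16,
    "store": 4,
}

def estimate_cycles(program, hw=HW_DESC):
    """Sum the modeled latency of each instruction in the stream."""
    return sum(hw[op] for op in program)

prog = ["load", "load", "gemm", "store"]

# Re-scoring the same program under a modified HW_DESC (say, a faster
# gemm unit) is the software/hardware co-design loop in miniature.
faster_hw = dict(HW_DESC, gemm=8)
```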

Q&A:

  1. The material is quite specialized and hard to fully follow;
  2. At present the process is not fully automated; it is advancing gradually from manual to automated;


Source: blog.csdn.net/weixin_41754258/article/details/112095578