How should one appreciate a deep learning framework?

Author: Yuan Jinhui

https://zhuanlan.zhihu.com/p/117269565

This article is reposted with the original author's authorization; further reproduction without permission is prohibited.

In March, several more Chinese deep learning frameworks were open-sourced, and OneFlow is also making its final preparations for open-sourcing. 2020 is shaping up to be a very busy year for deep learning frameworks. There are many dimensions along which to judge whether a framework is good or bad, and what users care about may differ from what framework developers care about.

Most users do not dig into a framework's internal implementation. For them, the experience comes first: are the examples and documentation complete, and can they easily get a first example running by following the README?

Framework developers approach a new framework with a different mindset. When someone who builds frameworks looks at a new one, the first thing they check is how it handles the problems their own framework solves with its signature tricks: whether it solves them at all, and whether it solves them cleverly. Next, they look for any unique strengths in the other framework that they could learn from.

Of course, for a framework to succeed, the user's perspective is definitely the most important one; when framework developers make trade-offs, user needs should take priority over technical aesthetics. In this article, however, I take the framework developer's perspective and share the technical points I personally pay attention to. I first discuss how to assess a framework as a whole, and then go into a few specific topics about a framework's internal details.

Deep learning frameworks have matured to the point where some features are baseline requirements: ease of use, efficiency, and completeness (operators, models, supporting toolchain, documentation, examples). A new framework should have no obvious weak spot in these areas. But for a newly released framework to succeed, having no weak spots is not enough. It must also have a clear strength, a unique capability that goes beyond other frameworks: something other frameworks simply cannot do, or can do only with great difficulty. Only then can it open a small gap in some market segment and then gain a foothold.

Innovation is the first thing I look at. Framework developers are technology geeks who value innovation highly. If a framework has no breakthrough ideas and simply reinvents the wheel, it will not attract the interest of technical experts. The question peers care most about is: what incremental innovation does this new framework bring to the family of framework products? Innovation is also crucial to a framework's success. Caffe and Theano are the pioneering frameworks, and each contributed some of the most original ideas. TensorFlow and MXNet, with higher (industrial-grade) engineering quality, took the computation graph abstraction to a new level: compared with Caffe, their main contribution is the computation graph abstraction; compared with Theano, it is mainly the efficiency and scalability that come from a high-quality C++ implementation. PyTorch succeeded by introducing eager execution and a seamless Python experience (strictly speaking, Chainer was the first framework to implement this idea, and PyTorch carried it forward). For a domestic deep learning framework to stand out, innovative ideas are definitely the primary factor. Micro-innovations are not enough to change the landscape, and they are also easy to copy.

Engineering quality is the second thing I look at. A project whose engineering quality does not hold up will not go very far. One sometimes hears a project described as "student work" or as "industrial-grade code", and such judgments are not without reason. Quality can be examined at multiple levels, from large to small: design, architecture, modules, patterns, abstractions, all the way down to individual algorithms. Each should be just right; in particular, beware of over-engineering (a little more would be too much, a little less would be too little). Code can also be judged aesthetically, for example whether it follows the Google style guide and uses tools such as clang-format and cpplint. Another angle: if the codebase is the foundation for large-scale collaborative development, then even though it is large, high-quality code remains easy to understand; also check whether the codebase's conventions impose constraints, such as defensive programming, that prevent low-level errors. Code reflects the programmer's thinking: writing beautiful code requires a clear mind and a deep understanding of the nature of the problem.

The third thing I look at is how the specific techniques particular to deep learning frameworks are implemented. These include:

(1) Deep learning frameworks use a dataflow abstraction (eager mode is control flow). I look at how the framework implements the supporting abstractions, such as operators, graphs, and streams: how operators are defined, how kernels support multiple data types and multiple devices, how the computation graph is implemented, and so on.

(2) The engine that executes the computation graph. The general principle is that, given a computation graph, the engine traverses it in topological order according to data dependencies; that is the most basic implementation. If the underlying device is a CPU, the execution engine is a thread pool, and what matters is how the dispatcher interacts with the thread pool. If the underlying device is a GPU, what matters is how the dispatcher interacts with the device; because of the GPU's own characteristics, the use of streams and events is key.

(3) How memory is managed. A static-graph engine can apply many memory management techniques. Inference is basically done with static graphs, so inference frameworks have the deepest experience with static memory management: instead of allocating memory separately for each blob, they can allocate a single block of memory for the entire graph. For dynamic graphs, memory management (garbage collection, lifetime management, and so on) is very complex. Sublinear memory allocation and Microsoft's ZeRO also belong to the family of memory optimization techniques.

(4) Interface usability, and support for both static-graph and dynamic-graph execution modes. In practice, this mainly involves the interaction between C++ and Python.

(5) Compiler optimization of single-device code, mainly device-independent graph optimization. There is dedicated compiler work in the industry, such as Glow, XLA, and TVM Relay, and some deep learning frameworks also include such optimizations, for example Paddle and MegEngine.

(6) Device-dependent code optimization, i.e., code generation; TVM, Jittor, and PlaidML belong to this category.

(7) Distributed parallelism: support for data parallelism, model parallelism, and pipeline parallelism. Even data parallelism alone can be implemented in a range of ways, such as a parameter server (PS), ring allreduce, and double binary tree allreduce; industry implementations include NCCL, ByteDance's BytePS, and Microsoft's DeepSpeed.

(8) Others, to be added later.
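For item (1), the following is a minimal sketch of one common way to let a single operator definition have many kernels: a registry keyed by (operator type, device, data type), with dispatch at call time. The names register_kernel, KERNEL_REGISTRY, and Operator are hypothetical and not taken from any real framework; data types are plain strings and tensors are Python lists purely for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: (op_type, device, dtype) -> kernel function.
KernelKey = Tuple[str, str, str]
KERNEL_REGISTRY: Dict[KernelKey, Callable[..., List]] = {}


def register_kernel(op_type: str, device: str, dtype: str):
    """Decorator that records a kernel for one (op, device, dtype) combination."""
    def wrap(fn):
        KERNEL_REGISTRY[(op_type, device, dtype)] = fn
        return fn
    return wrap


@register_kernel("add", "cpu", "float32")
def add_cpu_float32(a, b):
    return [float(x) + float(y) for x, y in zip(a, b)]


@register_kernel("add", "cpu", "int32")
def add_cpu_int32(a, b):
    return [int(x) + int(y) for x, y in zip(a, b)]


class Operator:
    """A device- and dtype-agnostic description of a computation."""

    def __init__(self, op_type: str):
        self.op_type = op_type

    def __call__(self, *inputs, device: str = "cpu", dtype: str = "float32"):
        key = (self.op_type, device, dtype)
        kernel = KERNEL_REGISTRY.get(key)
        if kernel is None:
            raise NotImplementedError(f"no kernel registered for {key}")
        return kernel(*inputs)


add = Operator("add")
print(add([1, 2], [3, 4], dtype="int32"))    # [4, 6]
print(add([1, 2], [3, 4], dtype="float32"))  # [4.0, 6.0]
```

Adding a new device or data type then means registering one more kernel, without touching the operator definition itself.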
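For item (2), here is a sketch of the most basic CPU case: a dispatcher thread that tracks how many inputs each node is still waiting for, and submits a node to a thread pool as soon as its count reaches zero, which amounts to a topological traversal driven by data dependencies. The function name run_graph and the dict-based graph format are assumptions made for illustration; GPU streams and events are not modeled.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_graph(nodes: Dict[str, Callable[..., float]],
              deps: Dict[str, List[str]],
              num_threads: int = 4) -> Dict[str, float]:
    """Run a DAG on a thread pool; a node is dispatched once all its inputs exist.

    `nodes` maps a node name to the function it computes; `deps` maps a node
    name to the names of its inputs (nodes absent from `deps` have no inputs).
    The graph is assumed to be acyclic.
    """
    pending = {name: len(deps.get(name, [])) for name in nodes}
    consumers: Dict[str, List[str]] = {name: [] for name in nodes}
    for name, inputs in deps.items():
        for src in inputs:
            consumers[src].append(name)

    results: Dict[str, float] = {}
    done = queue.Queue()  # worker threads report (name, future) here

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        def dispatch(name: str) -> None:
            args = [results[src] for src in deps.get(name, [])]
            fut = pool.submit(nodes[name], *args)
            fut.add_done_callback(lambda f, n=name: done.put((n, f)))

        # The dispatcher (this thread) first seeds the pool with source nodes.
        for name, count in pending.items():
            if count == 0:
                dispatch(name)

        # Then it waits for completions and releases consumers whose inputs are ready.
        for _ in range(len(nodes)):
            name, fut = done.get()
            results[name] = fut.result()
            for nxt in consumers[name]:
                pending[nxt] -= 1
                if pending[nxt] == 0:
                    dispatch(nxt)
    return results


graph = {
    "a": lambda: 2.0,
    "b": lambda: 3.0,
    "c": lambda a, b: a + b,  # needs a and b
    "d": lambda c: c * 10,    # needs c
}
print(run_graph(graph, {"c": ["a", "b"], "d": ["c"]})["d"])  # 50.0
```

Only the dispatcher thread reads and writes results and pending; workers communicate back solely through the thread-safe queue, which keeps the bookkeeping free of locks.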
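For item (3), the point that a static graph lets one allocate a single block for the whole graph can be illustrated with a small offline planner: because every tensor's lifetime (the step that produces it and the last step that reads it) is known in advance, tensors whose lifetimes do not overlap can share the same bytes of one arena. The greedy heuristic below (largest tensor first, placed at the lowest non-conflicting offset) is just one simple choice assumed for illustration, not how any particular framework does it; TensorInfo, plan_memory, and arena_size are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TensorInfo:
    name: str
    size: int       # bytes
    first_use: int  # step at which the tensor is produced
    last_use: int   # last step at which it is read


def plan_memory(tensors: List[TensorInfo]) -> Dict[str, int]:
    """Assign each tensor an offset in one shared arena.

    Two tensors may share bytes only if their lifetimes do not overlap.
    """
    offsets: Dict[str, int] = {}
    placed: List[TensorInfo] = []
    for t in sorted(tensors, key=lambda x: x.size, reverse=True):
        # Byte ranges already taken by placed tensors that are live at the same time as t.
        busy = sorted(
            (offsets[p.name], offsets[p.name] + p.size)
            for p in placed
            if not (p.last_use < t.first_use or t.last_use < p.first_use)
        )
        candidate = 0
        for start, end in busy:
            if candidate + t.size <= start:
                break  # t fits in the gap before this range
            candidate = max(candidate, end)
        offsets[t.name] = candidate
        placed.append(t)
    return offsets


def arena_size(tensors: List[TensorInfo], offsets: Dict[str, int]) -> int:
    return max(offsets[t.name] + t.size for t in tensors)


# A chain a -> b -> c -> d: each tensor is only needed until the next one is produced.
ts = [
    TensorInfo("a", 400, 0, 1),
    TensorInfo("b", 400, 1, 2),
    TensorInfo("c", 400, 2, 3),
    TensorInfo("d", 400, 3, 4),
]
offsets = plan_memory(ts)
print(offsets)                  # {'a': 0, 'b': 400, 'c': 0, 'd': 400}
print(arena_size(ts, offsets))  # 800, versus 1600 if every tensor had its own buffer
```

A dynamic-graph engine cannot do this ahead of time, because lifetimes are only discovered as the program runs.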
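For item (5), device-independent graph optimizations are rewrites of the graph itself, independent of the hardware it will run on. The sketch below shows two classic passes, constant folding and dead-code elimination, on a toy IR invented for illustration (the tuple node format and the function names are assumptions, not the IR of Glow, XLA, TVM Relay, or any framework named above).

```python
from typing import Dict, List, Tuple

# Hypothetical toy IR: each node is (output_name, op, args). op is one of
# "input", "const" (args = (value,)), "add" or "mul" (args = two input names).
Node = Tuple[str, str, tuple]


def constant_fold(graph: List[Node]) -> List[Node]:
    """Replace every add/mul whose inputs are all constants with a const node.

    Assumes `graph` is already in topological order.
    """
    known: Dict[str, float] = {}
    folded: List[Node] = []
    for out, op, args in graph:
        if op == "const":
            known[out] = args[0]
            folded.append((out, op, args))
        elif op in ("add", "mul") and all(a in known for a in args):
            x, y = known[args[0]], known[args[1]]
            value = x + y if op == "add" else x * y
            known[out] = value
            folded.append((out, "const", (value,)))
        else:
            folded.append((out, op, args))
    return folded


def eliminate_dead_code(graph: List[Node], outputs: List[str]) -> List[Node]:
    """Drop nodes that no requested output depends on (walks the graph backwards)."""
    live = set(outputs)
    kept: List[Node] = []
    for out, op, args in reversed(graph):
        if out in live:
            kept.append((out, op, args))
            if op in ("add", "mul"):
                live.update(args)
    return kept[::-1]


g = [
    ("x", "input", ()),
    ("c1", "const", (2.0,)),
    ("c2", "const", (3.0,)),
    ("s", "mul", ("c1", "c2")),  # foldable: both inputs are constants
    ("y", "add", ("x", "s")),    # not foldable: depends on the runtime input x
]
print(eliminate_dead_code(constant_fold(g), ["y"]))
# [('x', 'input', ()), ('s', 'const', (6.0,)), ('y', 'add', ('x', 's'))]
```

The passes compose: folding turns c1 and c2 into dead nodes, and the second pass removes them.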
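For item (6), code generation means emitting specialized source for a particular computation and compiling it, rather than calling a fixed library kernel for each operator. Systems like TVM, Jittor, and PlaidML target real devices (for example CUDA or LLVM); the toy below instead emits Python source for a fused chain of elementwise operations and compiles it with the built-in compile and exec, only to show the idea that the fused chain becomes a single loop with no intermediate arrays. The op table, the "_" operand convention, and codegen_fused are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical elementwise op table: op name -> expression template.
TEMPLATES = {
    "add": "({a} + {b})",
    "mul": "({a} * {b})",
    "relu": "max({a}, 0.0)",
}


def codegen_fused(ops: List[Tuple[str, ...]], inputs: List[str]) -> Tuple[str, Callable]:
    """Generate and compile one fused loop for a chain of elementwise ops.

    Each op is (name, operand1[, operand2]); an operand is either an input
    name or "_", meaning the result of the previous op.
    """
    expr = None
    for name, *operands in ops:
        args = [expr if o == "_" else f"{o}[i]" for o in operands]
        expr = TEMPLATES[name].format(a=args[0], b=args[1] if len(args) > 1 else "")
    src = (
        f"def fused_kernel({', '.join(inputs)}):\n"
        f"    n = len({inputs[0]})\n"
        f"    out = [0.0] * n\n"
        f"    for i in range(n):\n"
        f"        out[i] = {expr}\n"
        f"    return out\n"
    )
    namespace: dict = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return src, namespace["fused_kernel"]


# relu(x * w + b), fused into a single loop.
src, kernel = codegen_fused([("mul", "x", "w"), ("add", "_", "b"), ("relu", "_")], ["x", "w", "b"])
print(src)
print(kernel([1.0, -2.0], [3.0, 3.0], [0.5, 0.5]))  # [3.5, 0.0]
```

On a real accelerator the benefit of fusion is that intermediate results stay in registers instead of round-tripping through device memory.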
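For item (7), ring allreduce is easiest to understand by simulating it in a single process. Each of the N workers splits its gradient into N chunks; in a reduce-scatter phase of N - 1 steps, every worker sends one chunk to its right-hand neighbour, which adds it in, so that each worker ends up owning one fully summed chunk; an all-gather phase of another N - 1 steps then circulates those finished chunks. Each worker therefore sends 2(N - 1) chunks of about L/N elements each, roughly 2L in total regardless of N. The function ring_allreduce below is a hypothetical, communication-free illustration, not NCCL's implementation.

```python
from typing import List


def ring_allreduce(grads: List[List[float]]) -> List[List[float]]:
    """Simulate a sum allreduce over len(grads) workers arranged in a ring."""
    n = len(grads)
    length = len(grads[0])
    bufs = [list(g) for g in grads]                     # per-worker copies
    bounds = [(k * length) // n for k in range(n + 1)]  # chunk k = [bounds[k], bounds[k+1])

    def chunk(w: int, k: int) -> List[float]:
        return bufs[w][bounds[k]:bounds[k + 1]]  # slicing returns a copy

    # Phase 1: reduce-scatter. Afterwards worker i owns the full sum of chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot every outgoing message first, since all sends happen at once.
        msgs = [((i - step) % n, chunk(i, (i - step) % n)) for i in range(n)]
        for i, (k, data) in enumerate(msgs):
            dst = (i + 1) % n
            for j, v in enumerate(data):
                bufs[dst][bounds[k] + j] += v

    # Phase 2: all-gather. Each worker forwards the most recently completed chunk.
    for step in range(n - 1):
        msgs = [((i + 1 - step) % n, chunk(i, (i + 1 - step) % n)) for i in range(n)]
        for i, (k, data) in enumerate(msgs):
            dst = (i + 1) % n
            bufs[dst][bounds[k]:bounds[k + 1]] = data
    return bufs


grads = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    [100.0, 200.0, 300.0, 400.0, 500.0, 600.0],
]
for worker, buf in enumerate(ring_allreduce(grads)):
    print(worker, buf)  # every worker: [111.0, 222.0, 333.0, 444.0, 555.0, 666.0]
```

Because per-worker traffic stays roughly constant as workers are added, ring allreduce scales bandwidth well, at the cost of latency that grows with the number of steps, which is one motivation for tree-based variants.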

A framework that has big, innovative ideas and also does a great job on all of the above aspects is very close to technical perfection.


Origin blog.csdn.net/weixin_42137700/article/details/105251232