350,000 Lines of Code: Megvii Open-Sources Its Deep Learning Framework MegEngine (Tianyuan), with Four Features That Simplify Development

  

On March 25, 2020, the artificial intelligence company Megvii held an online launch event at which co-founder and CTO Tang Wenbin announced the open-sourcing of MegEngine (Chinese name: Tianyuan), the deep learning framework at the core of its AI productivity platform Brain++. The release is an Alpha version under the Apache License 2.0, opening roughly 350,000 lines of C++, CUDA, and Python code to the public on GitHub.

At the event, Tian Zhongbo, senior technical director of Megvii Research Institute, walked through the details of the newly open-sourced deep learning framework.

The AI World Gains Another Development Framework as Megvii Open-Sources MegEngine

Megvii is open-sourcing MegEngine at a time when deep learning frameworks are flourishing.

 

Since Theano was born in 2007, more than a decade of development has seen deep learning technology and its applications advance by leaps and bounds, and deep learning frameworks have kept iterating and evolving alongside them. At the same time, the idea of open source has gained acceptance worldwide, making environment setup, deployment, testing, and the continuous tuning of accuracy and performance far easier for AI developers. Open-source deep learning frameworks have become indispensable platforms and tools in the field.

 

Through the joint efforts of academia and industry, a whole family of frameworks has emerged: the early Caffe, Torch, and Theano born in academia; the industry-leading TensorFlow; MXNet, backed by Amazon; PyTorch, built by Facebook; CNTK, open-sourced by Microsoft; and more niche engines such as DSSTNE.

 

A quick walk through these mainstream deep learning frameworks shows that each has its own character:

        


       

TensorFlow, officially open-sourced by Google in November 2015, quickly became the dominant framework in deep learning, and many companies, including Xiaomi, JD.com, and Airbnb, build products on it. Its broad support for development languages, model training, servers, and mobile devices has made it the most widely used deep learning framework in industry.

 

The MXNet project was born in September 2015, created by Mu Li, then a PhD student at Carnegie Mellon University. This lightweight, portable, and flexible distributed open-source deep learning framework became the one officially championed by Amazon. It supports CNNs, RNNs, and LSTMs, and provides excellent tools for recognition and prediction on images, handwritten text, and speech, as well as for natural language processing.

 

Keras was created by Google AI researcher François Chollet and has grown into the second most popular deep learning framework since it was open-sourced in 2015. This artificial neural network library, written in Python, can serve as a high-level API on top of TensorFlow, CNTK, and Theano for designing, testing, evaluating, and visualizing deep learning models; its goal is to let you build a neural network in just a few lines of code.

 

In 2016, Microsoft's Cognitive Toolkit (CNTK) arrived. With support for RNN and CNN model types, it is well suited to image processing, handwriting, and speech recognition problems. Although CNTK offers high-performance distributed computing, its lack of ARM support limits its usefulness on mobile devices.

 

In 2017, Facebook open-sourced PyTorch, a Python package for training neural networks. Adapted from Torch, the Lua-based deep learning library, it feels much like NumPy, is very Pythonic, and integrates easily with the rest of the Python ecosystem. With its support for dynamic graphs, PyTorch is far more flexible than TensorFlow, which makes it especially good for quickly validating and reproducing algorithms and has earned it great favor in academia.

 

With such powerful frameworks available, AI developers largely rely on them for research and for putting products into production. Yet the frameworks in widest use still come from Google, Facebook, Microsoft, and Amazon; although several domestic Internet giants have begun working in this area, none has yet become a phenomenon.

 

In 2016, Internet giant Baidu open-sourced PaddlePaddle, probably the most influential homegrown AI framework to date; in 2019, telecom giant Huawei announced that it would open-source MindSpore in the first quarter of 2020, though there has been no further news; and on March 25, MegEngine (Tianyuan), the deep learning framework developed by Megvii, was officially opened to the public.

               

Compared with the mainstream deep learning frameworks, what sets Megvii's MegEngine apart?

350,000 Lines of Open-Source Code and a Fresh Take on Architecture

According to Tang Wenbin, this release of MegEngine opens about 350,000 lines of code in total, spanning C++, CUDA, and Python.

Tang Wenbin, co-founder and CTO of Megvii

 

MegEngine is a framework forged through Megvii's own hands-on experience in the AI industry and is one of the core components of Brain++. For this open-source release, Megvii gave it a complete overhaul.

 

Development began in early 2014, the framework went into full internal use in 2015, and it was open-sourced this March; today all of Megvii's algorithms are trained and served with MegEngine. It has not only given Megvii a buff in the arena of AI competitions, but also carried much of the weight of the company's engineering and products.

 

At the launch event, Tian Zhongbo, head of the MegEngine project and senior technical director of Megvii Research Institute, described MegEngine as an industrial-grade deep learning framework that unifies training with inference, and dynamic graphs with static graphs.

             

From top to bottom, MegEngine is organized into five layers. At the top sits the compute interface layer, which exposes Python and C++ interfaces so that developers can program against the framework, and design, develop, train, and run inference with the whole system in either language.

 

Below that is the graph representation layer, which provides both dynamic-graph and static-graph functionality.

 

Further down is a complete, unified core computation graph engine, with automatic differentiation and graph compilation and optimization; it is this layer that allows the dynamic and static interfaces above it to work at full strength.

 

Beneath that is the runtime management layer, which has two main parts. One is computation scheduling: compute devices are abstracted as execution streams, and a scheduler dispatches work onto those streams sensibly. The other is a set of memory management mechanisms covering both static and dynamic memory. This module also ships with several advanced memory optimizations; notably, it implements a sublinear memory optimizer for static graphs that dramatically improves memory efficiency.

 

At the bottom is the compute kernel layer that underpins the whole system. It contains a high-performance operator library covering common compute devices, including x86, CUDA, ARM, and dedicated AI chips, as well as a high-performance heterogeneous communication library, so the framework can run large-scale multi-node distributed computation and support larger training jobs.
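To give a feel for how that device coverage surfaces to the developer, here is a minimal Python sketch. It assumes MegEngine's `set_default_device` helper and device strings such as "cpu0" and "gpu0"; the names follow MegEngine's public documentation, but the Alpha release may differ in detail.

```python
import numpy as np
import megengine as mge
import megengine.functional as F

# Assumption: set_default_device selects the compute backend by name
# ("cpu0", "gpu0", ...); the same user code then runs on x86 or CUDA,
# with the kernel layer supplying the device-specific operators.
mge.set_default_device("cpu0")

a = mge.tensor(np.random.randn(4, 8).astype("float32"))
b = mge.tensor(np.random.randn(8, 2).astype("float32"))
print(F.matmul(a, b).shape)  # (4, 2), executed on the chosen device
```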

Four Features: How MegEngine Simplifies Development

 

Tian Zhongbo, senior technical director of Megvii Research Institute

Over the past few years, Megvii has run into many pain points common across the industry, and MegEngine's key features are aimed squarely at them.

 

Specifically, MegEngine has four core features: unified training and inference, dynamic-static unity, broad compatibility, and flexibility with efficiency.

 

  1. Unified training and inference

 

The first pain point: the path from deep learning research to production is complex, and model accuracy is often hard to keep aligned across its stages.

 

Tian Zhongbo pointed out that in the traditional deep learning workflow, the training framework and the inference framework are designed and implemented separately. An algorithm is first trained into a model on the training framework, then converted into a representation the inference framework accepts, and finally executed by the inference framework on whatever compute device it targets.

 

This training-to-inference conversion creates plenty of problems. Because the two frameworks are designed independently, some operators may simply not be supported, so the conversion cannot be completed automatically and has to be tuned by hand. The conversion can also introduce large numbers of redundant operators, leaving the final model's performance and accuracy unsatisfactory. And when a problem finally surfaces once the inference framework runs the computation on a chip, the complexity of the whole pipeline makes it hard to pin down exactly where things went wrong.

 

Hence the design philosophy of MegEngine: training and inference should be one and the same, in a framework that can both train and infer.

              

MegEngine's unified training and inference addresses this pain point well:

 

(1) No model conversion is needed; the model obtained from training can be used directly for inference.

(2) This mechanism keeps speed and accuracy consistent between training and inference.

(3) When a trained model must run inference on different devices, the framework keeps accuracy aligned across devices, minimizing the accuracy gap.

(4) The workflow is simplified: the framework's built-in, automated model optimization replaces error-prone manual handling, and the whole pipeline can be built automatically, forming an efficient development system.

 

As a result, the questions that real-world AI deployments must face, serving models online and deploying to many kinds of endpoints, are addressed, and the cost of diagnosing training problems drops sharply.
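As a rough sketch of what "train it, then serve it, no conversion" could look like, the snippet below traces a trained network and dumps it straight to a file for the inference side to load. The `trace` and `dump` names follow MegEngine's public documentation, but the exact signatures used here are assumptions rather than a verbatim copy of the Alpha API.

```python
import numpy as np
import megengine as mge
import megengine.module as M
from megengine.jit import trace

class TinyNet(M.Module):
    def __init__(self):
        super().__init__()
        self.fc = M.Linear(16, 4)

    def forward(self, x):
        return self.fc(x)

net = TinyNet()  # in practice: the network you just finished training

# Trace the inference function as a static graph...
@trace(symbolic=True)
def infer(data):
    return net(data)

sample = mge.tensor(np.random.randn(1, 16).astype("float32"))
infer(sample)                     # run once so the graph is captured
infer.dump("tinynet_deploy.mge")  # ...and dump it directly for deployment,
                                  # with no cross-framework conversion step
```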

 

  2. Dynamic-static unity

 

The second pain point: static graphs are good for deployment and dynamic graphs are easy to debug, but it is hard to have both. Tian Zhongbo explained that deep learning frameworks broadly fall into two camps. One is the static-graph camp represented by TensorFlow 1.0: easy to deploy, quick to turn into products, high-performance, and light on memory, which is why industry favors it for deployment, but hard to debug. Academia prefers the dynamic-graph camp represented by PyTorch, which is easier to debug and more flexible at the research stage, but dynamic graphs have their own flaws, such as heavy memory consumption and difficulty of optimization.

 

Faced with this you-can't-have-both dilemma, Megvii tried to combine the strengths of the two approaches in MegEngine's design, aiming for unity of dynamic and static graphs.

               

The slide above shows MegEngine code switching from dynamic to static execution. A function is decorated with the Python decorator @trace, after which it both runs correctly in dynamic mode and can be converted to run in static form. By simply setting an "enabled" switch to True or False, the user freely chooses between dynamic and static computation.

 

In this way, developers can prototype and debug comfortably in dynamic mode, and then, when moving to production or when they want the speed of static optimization and compilation, switch over and enjoy the performance of static graphs.
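Here is a minimal sketch of that switch based on the description above. The decorator lives in megengine.jit, and the `trace.enabled` flag mirrors the "enabled" switch mentioned in the talk; treat the flag name and the loss/optimizer calls as assumptions about the Alpha API (`net` and `opt` are a module and optimizer constructed elsewhere).

```python
import megengine.functional as F
from megengine.jit import trace

# One switch chooses the execution mode for every @trace-decorated function:
# False = eager/dynamic (easy to debug), True = compiled static graph.
# The attribute name follows the talk; the released code may spell it differently.
trace.enabled = True

@trace
def train_step(data, label, *, net, opt):
    logits = net(data)
    loss = F.cross_entropy_with_softmax(logits, label)
    opt.zero_grad()
    opt.backward(loss)
    opt.step()
    return loss
```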

 

Tian Zhongbo said that in their tests, static graphs often bring a 5% to 20% speedup, saving time and improving efficiency.

 

  3. Broad compatibility

 

The third pain point: there are many frameworks, each with a different interface. For academic exchange, one first has to work out which framework a piece of work was implemented in, set up that framework and its environment, and then re-implement the model, a costly affair for the average developer.

 

To simplify this, MegEngine was designed to be as compatible and inclusive as possible.

               

The slide above shows code written with MegEngine. Its style is very close to PyTorch and NumPy: a Pythonic, simplified API that users take to naturally, with function naming and parameter design that respect the existing conventions of the Python community.
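As an illustration of that familiarity, a tiny network written against MegEngine's documented module and functional APIs reads almost like its PyTorch equivalent. This is a sketch, not code taken from Megvii's slide:

```python
import numpy as np
import megengine as mge
import megengine.module as M       # analogous to torch.nn
import megengine.functional as F   # analogous to torch.nn.functional

class MLP(M.Module):
    def __init__(self):
        super().__init__()
        self.fc0 = M.Linear(784, 256)
        self.fc1 = M.Linear(256, 10)

    def forward(self, x):
        x = F.relu(self.fc0(x))
        return self.fc1(x)

net = MLP()
x = mge.tensor(np.random.randn(8, 784).astype("float32"))
print(net(x).shape)  # (8, 10)
```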

 

It is also worth mentioning an experimental feature that lets developers bring modules they have already written, such as a PyTorch Module, directly into MegEngine and use them alongside its other components, making it easier to reproduce models.

 

In addition, Tian Zhongbo noted that Megvii has accumulated unique expertise in computer vision, and that expertise has been folded into MegEngine, which integrates a number of operators specially optimized for computer vision to make vision research easier.

 

  4. Flexible and efficient

 

The fourth pain point: a company bringing AI into production may face a wide range of devices and scenarios, and it needs top performance on every one of them.

 

In its design, MegEngine holds to the principle of being flexible and efficient, so that it can deliver leading performance across many devices and algorithms. Tian Zhongbo then showed training performance comparisons against several frameworks known for strong inference.

             

The results show that MegEngine has a clear performance advantage in CPU inference as well as in training, maintaining high performance across both. Moreover, whether the goal is to deploy better algorithms onto a variety of devices, or to use existing hardware to train larger models and support more kinds of algorithms, on-chip or device memory usage is a key factor, so saving memory is another of MegEngine's concerns.

 

MegEngine has built-in, highly effective memory optimization strategies that significantly cut memory usage during training, so the same device can train larger models and support more algorithms.

 

MegEngine also has many mechanisms for optimizing memory and speed, such as sublinear memory optimization. In one example, MegEngine's dynamic graph supports a batch size of about 32; switching to the static graph raises that to 64. If an even larger batch or model is needed, the automatic sublinear memory optimization technique can push training capacity to a batch size of 256 with almost no loss of computation speed, and the larger and deeper the model, the better the effect.
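Below is a hedged sketch of how that sublinear optimization is switched on. `SublinearMemoryConfig` and the `sublinear_memory_config` argument to `trace` follow MegEngine's documentation, but treat the exact names and defaults as assumptions for the Alpha release; `net` and `opt` are again a module and optimizer built elsewhere.

```python
import megengine.functional as F
from megengine.jit import trace, SublinearMemoryConfig

# Assumption: passing a SublinearMemoryConfig makes the static-graph compiler
# trade a little recomputation for a much smaller peak memory footprint,
# which is what enables the larger batch sizes quoted above.
config = SublinearMemoryConfig()

@trace(symbolic=True, sublinear_memory_config=config)
def train_step(data, label, *, net, opt):
    logits = net(data)
    loss = F.cross_entropy_with_softmax(logits, label)
    opt.zero_grad()
    opt.backward(loss)
    opt.step()
    return loss
```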

 

Tian Zhongbo said that in internal evaluations, MegEngine can save more than 20x memory when training some large models, with speed almost unchanged.

 

These characteristics allow MegEngine to turn a laboratory prototype into an industrially deployable product in a matter of hours, and to support large-scale, flexible training as well as top research teams working at the academic frontier.

 

In this way, MegEngine delivers "simple development," letting developers genuinely train well, train big, and train fast.

The Story Behind MegEngine, and a First Look at Its R&D Roadmap

From its earliest iterations rooted in Theano to the MegEngine Alpha released today, the framework's birth was hard-won, the result of the Megvii Research Institute team grinding it out from zero to one.

 

Megvii was founded with the ambition of bringing computer vision to traditional industries and changing the world with technology. In 2013, when deep learning was just emerging, an intern holed up in a Tsinghua University dormitory for two weeks and built a face detection algorithm whose performance caused a sensation, setting Megvii firmly on the road of solving problems with neural networks.

At first, Megvii wrote its model code and trained its neural networks on Theano, but as the networks grew larger and more complex, the framework's inefficiency and long turnaround became unbearable, and some of the company's experts began looking for another way.

 

At the end of 2013, Cao Zhimin, then Megvii's head of research and development, proposed building Cycle++, an automated algorithm development system that would connect data, training, and business, so that the loop from algorithm development to application could run itself without heavy investment of people and time (an early vision of what became Brain++). Then, in early 2014, the first version of Megvii's self-developed deep learning framework was born.

 

After a period of running-in, Megvii finished moving all of its operations onto the internal framework in mid-2015, and every model in production across the company was switched to versions trained with the self-developed framework.

 

On November 9, 2015, Google officially released and open-sourced TensorFlow. Megvii found that the two teams had built the same kind of thing, both computation-graph-based frameworks, and the release shook confidence in the in-house framework, sparking disagreement inside the company over whether to keep going with it. After intense discussion and careful evaluation, Megvii found TensorFlow's performance less than ideal, several times slower than its own framework, and ultimately chose to stay on the path of self-development.

 

Since then the framework has kept iterating while being tempered in industrial practice, and alongside it Megvii has also been transforming its data and compute infrastructure. In 2013, Megvii Research Institute set up its own data team; as business data exploded and data management problems kept surfacing, Megvii began building its own data management system, MegData.

 

By the end of 2015, MegEngine had entered a period of steady development, but the "small workshop" mode the company had started with could no longer carry the business, and compute resources became the bottleneck. So Megvii built a proper machine room, developed MegCompute, a deep learning cloud computing platform, and in only a quarter completed the full migration of its business from standalone machines to clusters.

 

Megvii's move of its entire business onto a self-developed deep learning framework and its own compute clusters marked the formal "unification" of its three core elements: data, algorithms, and computing power. From then on, the shape of Megvii's AI productivity platform, Brain++, began to emerge.

              

In 2016, Megvii began building larger teams to keep refining the whole Brain++ development pipeline, and in 2019 it started preparing to open-source the deep learning framework at Brain++'s core, giving MegEngine its Chinese name, Tianyuan. During this period the framework's R&D team went through something of a rebirth, restructuring and repackaging the original code so that outside developers could get started quickly.

 

After a year of preparation, MegEngine has finally been open-sourced on schedule, putting it in developers' hands. There are more plans for its future, and at the launch event Megvii revealed the MegEngine development roadmap for the first time.

              

Tian Zhongbo said that what Megvii is open-sourcing now is the Alpha version of MegEngine. The plan is to release a Beta version in June of this year, adding support for ARM-series CPUs, more accelerator devices, and quantization and low-bit computation; the official 1.0 release is planned for September, with more comprehensive support for mainstream computing devices, upgraded dynamic-graph capability, and an optimized end-to-end training-and-inference experience.

 

He added that between the Beta and the official release, he hopes more people will join in and contribute code: "Perhaps the Beta and official versions of MegEngine will not be made by Megvii's R&D team alone, but created together with all of you, so we hope to build a better deep learning framework with you."

Is MegEngine Easy to Use? How Do You Use It?

Having covered MegEngine's architecture, its technical details, and the twists and turns of its development, we come to the soul-searching questions: is Megvii's open-source deep learning framework actually easy to use? Why should someone already fluent in NumPy, TensorFlow, PyTorch, or Keras switch to MegEngine? And how hard is the learning curve?

 

Tian Zhongbo set out to dispel these doubts. The framework's interface design and usage, he said, respect the habits people have built up with PyTorch and the conventions of machine learning and mathematics, and the overall design and workflow aim to minimize friction so that the framework is easy to pick up.

 

Notably, this release already ships with a number of tools, such as MegStudio, an online deep learning tool that lets developers quickly try out the MegEngine framework and run deep learning training with ease.

 

MegStudio demonstration

 

Meanwhile, the surrounding tooling, including quantization tools and modules supporting model compression and deployment, is still being polished and is expected around mid-year, with an integrated visualization system to follow later.

 

On documentation, Tian Zhongbo said the basic manual is developed in step with the code, and Megvii has internal processes to keep the documentation maintained and its quality assured; he hopes more volunteers will join in maintaining and correcting it.

 

MegEngine also provides a model center, ModelHub, which gathers pre-trained models of leading algorithms and publishes the latest techniques and advances from Megvii Research Institute. Megvii says more SOTA models are being added.

 

From building everything from scratch to moving from "giving people fish" to "teaching people to fish," Megvii is showing real sincerity: by opening up Brain++ it is trying to create a "Visual Studio for AI" and put AI capability into the hands of more developers, providing a well-equipped "alchemy lab" for the model-training "alchemy" of algorithm research, while the raw materials and firewood remain for users to supply according to their own needs.

 

At the launch event, Megvii announced the GitHub address where MegEngine's code is hosted; the best way to find out how good it is, is to try it yourself.

 

GitHub: https://github.com/MegEngine/MegEngine

Understand Megvii's deep learning framework MegEngine (Tianyuan) in 3 minutes

 

Want to get started right away?

Visit the MegEngine official website:

https://megengine.org.cn/

 

             
