Exclusive Interview with Jia Yangqing, Vice President of Alibaba Group: Why Did I Choose to Join Alibaba?

After spending some time with Jia Yangqing, born in the 1980s, I found that the young scientist so many people call an "AI framework god" is more like a gentle, approachable top student from next door: formidably skilled, yet never putting on airs. Since joining Alibaba, Jia Yangqing has been busy getting to know the group's very wide range of products and services, and over the past two months he has begun appearing in his new role at a number of important events. He admitted in the interview that Alibaba is so large and covers so many directions that even after several months he has not fully gotten his head around it all. Although several media outlets reported that Jia Yangqing would be based at Alibaba's Silicon Valley research institute, his team is largely in China, so since joining Alibaba he has spent more of his time in Hangzhou. The new role has brought him plenty of challenges: he has been too busy to get a haircut, has flown across time zones three times in a month, and on the day of one event barely had time to eat. Yet even as a vice president of Alibaba Group, Jia Yangqing remains a typical engineer at heart; when the conversation turned to the AI technologies and platforms he knows best, he grew visibly excited and his speech sped up.

In this interview, Jia Yangqing explained why he joined Alibaba and described in detail the work he is now responsible for. He not only looked back at how AI frameworks have changed over the past six years, but also shared his observations on the current state of the AI field and its future development. Drawing on his own experience, he also offered advice on choosing a direction and developing a career in AI, from which many practitioners can learn.

From Facebook to Alibaba

Known internally as the "head of Alibaba's computing platform," Jia Yangqing currently leads Alibaba Cloud Intelligence's Computing Platform Business Unit, which is responsible for both big data and artificial intelligence platforms. On the big data side this includes Flink, Spark, and MaxCompute, Alibaba's home-grown big data platform; on the AI side it spans everything from underlying resource management to the intermediate layer of AI framework development.

Compared with his work at Facebook, where he was responsible for AI frameworks and AI platforms, Jia Yangqing's scope of responsibility at Alibaba is broader. In his own words, at Facebook he was more of a heavy user of big data, relying on big data platforms to support AI training; at Alibaba's Computing Platform Business Unit, he must support both AI and big data. In his view, the unit is one of the very few departments in the world that puts big data and artificial intelligence together, but since the two are inherently tightly coupled, combining them in a single division makes a great deal of sense.

The rapid development of deep learning over the past few years is largely thanks to data; to some extent, today's artificial intelligence could be called "data intelligence," in that AI needs to extract what we call models from large amounts of data. Large-scale AI computing can therefore be summarized into two models. The first is "intelligence computing": the model training, inference, and iteration familiar to AI engineers. The second is "data computing": how to funnel massive amounts of data into the training and inference pipeline. Data computing has always been what big data systems are good at; combining the two means pairing high-performance data pipelines for massive data with AI algorithms and a series of high-performance computing techniques to deliver a complete solution. From this perspective, Jia Yangqing believes artificial intelligence and big data naturally fit together.

Beyond the change in technical scope, the move from Facebook to Alibaba brought Jia Yangqing another new challenge: a change in role. Early on, as a researcher, he only needed to focus on technology; after being promoted to Director of AI Infrastructure at Facebook, he moved into technical management; now, as president of the Computing Platform Business Unit, he must manage technology, products, and business. The latter two are the bigger challenges for him, but also very interesting ones. Jia Yangqing told reporters: "Just as open source eventually has to land commercially, technology ultimately needs to be tempered by products and business, which to some extent is why I came to Alibaba. Moreover, cloud native is the trend the whole IT industry is following; on the cloud there will certainly be a new round of technological evolution, so for me this is a very good time to take on a new challenge."

Alibaba will see open source through to the end

In addition to the title above, Jia Yangqing now also supports a number of Alibaba's open source efforts, which is why his talk at the developer conference revolved around open source. He has great enthusiasm for developer community work, and hopes that after joining Alibaba he can better promote the development of the domestic open source community.

In the interview, Jia Yangqing described in detail Alibaba's strategic priorities in open source over the next few years. Alibaba hopes to consolidate its work and promote community collaboration in the following three areas:

  • The underlying operating system. Alibaba has its own Apsara (Feitian) base system, and many components, such as Linux itself, are open source; Alibaba recently released Alibaba Cloud Linux 2, and will next consider how best to contribute its capabilities in this area back to the community. The other part is the cloud native layer above it, which can broadly be regarded as an operating system as well: to some extent, related applications such as AI platforms and big data platforms can, on the cloud, be considered the base of a large-scale cloud native operating system. Alibaba therefore also hopes to work with open source organizations on cloud native systems such as Kubernetes, and to strengthen its contributions to open source big data products such as Flink and Spark.
  • The front-end development model. Compared with underlying systems, the front end cares more about design and interaction; Ant Financial's Ant Design is a very good representative project here, and is very popular in the front-end community.
  • The tooling layer, a direction Alibaba has always been very interested in: using open source projects and solutions to help open source developers work more efficiently, including testing and deployment tools, source code management tools, project communication platforms, and so on. Overseas communities do relatively better here, so how to better build an exchange platform for domestic engineers and engineering culture is a key concern of Alibaba's open source efforts going forward.

In Jia Yangqing's view, when companies adopt open source projects, how closely they work with the community often determines whether the adoption succeeds or fails. A situation that used to occur frequently in the industry was that every company kept its own modified version of an open source project, and in the end the company's changes and the community's progress became difficult to merge back together. Today, however, the industry has learned from past experience and found better ways to work with the community. Take Apache Flink, the real-time computing engine into which the Computing Platform Business Unit has poured a great deal of effort: on the one hand, the company is deeply involved in its development and optimization; on the other, it contributes its system architecture and software design ideas back to the open source community, avoiding the old practice of everyone maintaining private forks inside the company. This promotes the healthier development of the open source project, and in turn the company can more effectively benefit from the latest open source results.

Beyond the group-level open source strategy, Jia Yangqing also revealed some of the next steps planned for the Apache Flink open source community.

Flink already supports stream computing scenarios very well; next, Alibaba will continue to optimize and improve Flink based on user needs. Meanwhile, Alibaba will keep running events such as Flink Forward, using them to interact with developers and communicate Flink's upcoming roadmap; it also hopes to use this channel to engage the community and gather more input on how Flink's system design should evolve, including input on the roadmap. Jia Yangqing said that for a healthy open source project, both its own system design and input from user needs are essential. That is why many open source projects hold a corresponding developer conference as a medium for interacting with developers, and Flink will be no exception.

The present and future of AI technology

Caffe is "a work of the past"

As an "AI framework god," Jia Yangqing is known to many for writing the AI framework Caffe, but that was already six years ago. When the conversation turned to this masterpiece, Jia Yangqing deliberately paused to stress that it was "a work of the past." Six years ago no existing framework met his needs, so he developed Caffe himself to solve problems he encountered in his thesis work. In his view, that was still a relatively primitive, slash-and-burn era: people were mostly writing research papers, and thinking about how to do proper software engineering for frameworks was still at an early stage. With the rise and development of deep learning and the influx of people into the industry, today's mainstream frameworks such as TensorFlow and PyTorch handle modeling for images, natural language processing, speech, and a whole series of problems far better than in the past; the problems today's mainstream frameworks are trying to solve are also much bigger than those Jia Yangqing faced when he developed Caffe six years ago.

Jia Yangqing classifies Caffe, Theano, and Torch as the previous generation of AI frameworks: early frameworks with a strong academic stamp, built mostly for research exploration. The current second generation, such as TensorFlow and PyTorch, has expanded the concept of a framework much further: no longer just for deep neural network modeling, these frameworks are increasingly about how to design a general scientific computing engine, while also exploring compilers, upper-layer frameworks, hardware-software co-design for more complex modeling, and other directions. "The framework I talked about six years ago may, from today's perspective, be just a very narrow part of the entire software stack."

When Caffe was first launched, Jia Yangqing hoped it would become "the Hadoop of machine learning and deep learning." Looking back at that goal now, he finds some of the coincidences particularly interesting. The goal was set so that Caffe could become as widespread as Hadoop; he never expected the two to actually go through similar trajectories. On the big data side, the field has evolved from Hadoop to more sophisticated engines such as Spark and Flink: Hadoop's main computing model was MapReduce, while Spark and Flink use more sophisticated approaches. Spark, for example, models computation as a directed acyclic graph (DAG) over RDDs, describing the dependencies between RDDs, which is more flexible than the MapReduce model. Caffe compared with the current generation of frameworks shows a similar situation: Caffe's design was highly specialized for neural networks, with a concept called a Network containing Layers, each with a forward and a backward computation, a rather rigid structure, a bit like MapReduce. Today's mainstream frameworks such as TensorFlow, PyTorch, and Caffe2 use a more general computation graph model. It can be said that AI frameworks and big data frameworks have gone through a similar development, evolving from a first generation to a second.
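The contrast described above can be sketched in a few lines of plain Python (this is illustrative pseudocode, not actual Caffe or TensorFlow internals): a Caffe-style Network is a rigid sequence of Layers, while a general computation graph is an arbitrary DAG of nodes executed in dependency order, so branches and merges come for free.

```python
# Caffe-style: a Network is a fixed pipeline of Layers, each with a
# forward step -- non-linear topologies are awkward to express.
class SequentialNet:
    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# Second-generation style: an arbitrary DAG of nodes, evaluated
# recursively with memoization (i.e., in dependency order).
class Node:
    def __init__(self, op, inputs=()):
        self.op, self.inputs = op, inputs

def run(node, cache=None):
    cache = {} if cache is None else cache
    if node not in cache:
        args = [run(dep, cache) for dep in node.inputs]
        cache[node] = node.op(*args)
    return cache[node]

# A diamond-shaped graph -- x feeds two branches that merge again:
# x -> (double, square) -> sum.  A fixed layer pipeline cannot
# express this shape directly.
x = Node(lambda: 3)
double = Node(lambda v: 2 * v, (x,))
square = Node(lambda v: v * v, (x,))
out = Node(lambda a, b: a + b, (double, square))

print(run(out))  # 2*3 + 3*3 = 15
```

The memoized `run` also mirrors how DAG engines avoid recomputing shared subgraphs: `x` is evaluated once even though two branches depend on it.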

AI frameworks should focus on challenges beyond reinventing the wheel

In an earlier internal talk at Alibaba, Jia Yangqing expressed the view that "the homogenization of AI frameworks shows that the technical challenges now lie in other, broader directions." He explained the reasoning behind this judgment in more detail to InfoQ's reporters.

Jia Yangqing attributes the appeal of today's AI frameworks to two main points. One is simple and flexible support for modeling, which could also be called development flexibility. The other is efficient computation, because once AI algorithms are applied in engineering, infrastructure efficiency becomes critical: a recommendation system, for example, may run on tens of thousands or even hundreds of thousands of machines, where performance optimization is unavoidable. Most current frameworks are working toward both goals: TensorFlow 2.0 added Eager Mode, and PyTorch 1.0 merged the old PyTorch with Caffe2, each gradually addressing these two problems and shoring up its own weaknesses. In fact, the frameworks are learning from one another; the problems to be solved have gradually become clear, and everyone's designs are converging in the same direction. "At this point, how much significance is there, really, in reinventing the wheel? That is a question AI engineers need to think about further."
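The "flexibility" side that frameworks are converging on is define-by-run (eager) execution: operations run immediately, and the graph needed for gradients is recorded as a byproduct. A minimal sketch of the idea in plain Python (not TensorFlow or PyTorch code; the `Value` class is a toy scalar stand-in for a tensor):

```python
# Minimal define-by-run autograd: each op executes eagerly and records
# how to propagate gradients back to its inputs.
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topologically order the recorded graph, then apply the chain
        # rule from the output back to the leaves.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# y = x*x + 2*x at x = 3  ->  y = 15, dy/dx = 2x + 2 = 8
x = Value(3.0)
y = x * x + Value(2.0) * x
y.backward()
print(y.data, x.grad)  # 15.0 8.0
```

Ordinary Python control flow (loops, conditionals) works between ops, which is precisely the flexibility eager mode buys; the efficiency side is then recovered by compiling or optimizing the recorded graph.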

To put it more deeply: a few years ago, whenever people talked about AI they more or less equated it with AI frameworks, but today, when you look at a complete AI engineering solution, the framework is in fact only a thin slice of it. Jia Yangqing thinks an AI framework is like a programming language: C++, for instance, is a very useful language, but the language alone is not enough; what truly gives a framework value is a rich ecosystem around it, with plenty of scientific computing and external services. So, starting from the framework and looking both up and down the stack, there are many new battlefields, or rather new areas, that deserve our attention.

Going down the stack, innovation may include systems work such as high-performance computing and hardware-software co-design; going up the stack, the framework itself does not provide a complete toolchain for work such as large-scale training and model iteration. So Alibaba's first concern now is to embrace the frameworks, and the second is to build out the full AI pipeline. For example, MNN, the engine Alibaba open-sourced some time ago, lets models run better on mobile and embedded devices. Alibaba also has an open source project called XDL, which explores how to build large-scale sparse recommendation systems. Sparse modeling is a very common capability that many general frameworks lack, so on top of the basic framework there needs to be a layer of abstraction, or business-oriented platform tools, to solve this problem. Why are large-scale sparse systems useful? Because recommendation systems are all, so to speak, about relationships: how Alibaba makes recommendations, or which top stories interest which consumers on different news sites, are all questions over sparse data. A lightweight general-purpose computing framework alone cannot solve this; it requires AI engineering effort across the entire stack.
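Why sparse modeling needs its own machinery can be shown with a toy sketch (plain Python, not XDL code; all names here are illustrative): in recommendation, each sample activates only a handful of feature IDs out of billions, so parameters live in a keyed table rather than a dense matrix, and updates touch only the active rows.

```python
import random
from collections import defaultdict

DIM = 4
# Embedding table keyed by feature ID; rows are created lazily, so
# memory scales with *observed* IDs, not with the full ID space.
table = defaultdict(lambda: [random.uniform(-0.1, 0.1) for _ in range(DIM)])

def embed(feature_ids):
    """Sum-pool the embeddings of the few active IDs in one sample."""
    pooled = [0.0] * DIM
    for fid in feature_ids:
        row = table[fid]
        for i in range(DIM):
            pooled[i] += row[i]
    return pooled

def sparse_update(feature_ids, grad, lr=0.1):
    """Apply a gradient only to the rows this sample actually touched."""
    for fid in feature_ids:
        row = table[fid]
        for i in range(DIM):
            row[i] -= lr * grad[i]

# One sample activates 3 IDs out of a conceptually enormous ID space.
sample = ["user:42", "item:98765", "city:hangzhou"]
vec = embed(sample)
sparse_update(sample, grad=[0.01] * DIM)
print(len(table))  # only the 3 touched rows exist
```

A dense framework would instead allocate and update the full parameter matrix, which is infeasible at billions of IDs; production systems like the ones described add distributed parameter servers and ID hashing on top of this basic pattern.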

Hardware and software co-design

In Jia Yangqing's view, the AI compiler is the next interesting and very important research direction. First, deep learning frameworks used to require their various operators to be written by hand; whenever new hardware comes out, these functions often need to be re-optimized. Second, it is far from certain that these hand-written functions are optimal: even experts cannot necessarily exhaust all the possibilities to find the best-performing implementation. AI compilers such as XLA and TVM exist precisely to solve these problems.

The AI compiler belongs to the category of hardware-software co-design, and its purpose is to squeeze the most out of the chip. With new chips now appearing one after another, the old model of hand-writing software on top of each hardware design can no longer keep up. As hardware grows more and more complex, we no longer know whether a hand-written design is optimal, so people have begun to explore AI-guided compilation, or performance-guided compilation, to better combine the hardware's capabilities with the software's flexibility. Take the TVM project as an example: it can use runtime computation patterns together with hardware characteristics to design or generate optimal code. These are all directions being explored in hardware-software co-design, which has become a more meaningful direction than the framework itself, in research and applications alike.
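One concrete optimization compilers such as XLA and TVM perform automatically is operator fusion. The toy sketch below (plain Python, not TVM code) shows the idea: unfused, each elementwise op makes a full pass over memory and materializes an intermediate buffer; fused, a single pass does all the work, which matters enormously on memory-bandwidth-bound hardware.

```python
def scale_shift_relu_unfused(xs, scale):
    """Three separate ops, as a naive framework would execute them."""
    scaled = [x * scale for x in xs]       # pass 1: intermediate buffer
    shifted = [x + 1.0 for x in scaled]    # pass 2: another buffer
    return [max(0.0, x) for x in shifted]  # pass 3: final output

def scale_shift_relu_fused(xs, scale):
    """One traversal, no intermediate buffers -- the loop a compiler
    emits after fusing the three elementwise ops above."""
    return [max(0.0, x * scale + 1.0) for x in xs]

data = [-2.0, -0.5, 0.0, 1.5]
assert scale_shift_relu_unfused(data, 2.0) == scale_shift_relu_fused(data, 2.0)
print(scale_shift_relu_fused(data, 2.0))  # [0.0, 0.0, 1.0, 4.0]
```

The compiler's job, and the hard research problem, is deciding automatically which ops to fuse and how to schedule the fused loop for a given chip, rather than relying on a hand-written kernel for every op-hardware pair.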

But hardware-software co-design is very hard to do: it requires people with deep, cross-stack architecture experience, and such engineers are both rare and difficult to train. If architecture design can be modeled as an optimization problem, machine learning can come into play.

AI and computer system architecture

Starting in March 2018, Jeff Dean and others launched the SysML conference, which focuses on hardware infrastructure and computer systems for machine learning and deep learning. So what new opportunities can AI bring to computer system architecture? Jia Yangqing considered the question from two angles.

When Jeff Dean talked about SysML, he in fact raised two concepts: Machine Learning for Systems and Systems for Machine Learning. Today we mostly do Systems for Machine Learning: given a machine learning workload, how do we build a system to meet its needs. On the other hand, when building computer systems, we can also consider Machine Learning for Systems: using machine learning methods and data-driven design to optimize systems, replacing the traditional reliance on human experience in system design. This is still in a relatively early stage of exploration, and it is one of the bottlenecks AI needs to break through next.

The commercial landing of AI

Industry adoption is another direction Jia Yangqing is focusing on at this stage. Compared with when he entered the field, the biggest change he sees is that industry investment in AI applications and algorithms keeps growing. Back in the early 2000s, the industry had a saying about machine learning: it spends 80% of the time solving 80% of the problem with 80% accuracy. In effect, that meant it could not cross the threshold of practical commercialization. Today, deep learning has achieved great success in different areas, and the accuracy of many algorithms has risen above the threshold where applications become viable. As a result, industry has begun to use these algorithms at scale, which in turn promotes the further development of the algorithms: more and more real-world industry needs are fed back into research, and the number of people working in AI research and development has grown enormously.

At the same time, barriers to adoption still exist in practice. How to make artificial intelligence general-purpose, so that "AI + industry" can truly take off, appears to Jia Yangqing to be another bottleneck the field currently faces.

From a scientific point of view, the field of artificial intelligence still has many open problems: what is the nature of intelligence, given that what we do now is mostly prediction; how to handle causation and causal reasoning; how to make artificial intelligence explainable; and so on.

How to choose an AI research direction?

There is a very hot question on Zhihu whose gist is: "Which research directions in machine learning are particularly treacherous right now (2019)? Which niches have poor practicality or are very hard to make progress in? Or which niches can only be pulled off by insiders?" We put this question to Jia Yangqing as well. He declined, however, to predict the future: when he first started in machine learning, everyone was sure neural networks were a dead end. "So nobody really knows. But one thing is fairly certain: don't write yet another old-style framework; look at new directions instead."

Jia Yangqing said bluntly that writing another framework like Caffe would be boring, but if you study the strengths and weaknesses of TensorFlow and PyTorch and write a framework that does better from a programming language or systems perspective, that can still be worth doing. In short: don't just follow; make something new. For example, Google recently has a very popular project called JAX: it combines naturally with Python, and it applies compiler optimizations at the bottom layer; these are very interesting new research directions. Jia Yangqing does not think JAX will immediately replace TensorFlow, nor that it will necessarily solve every problem, but it really is a good direction to explore, just like Theano in 2008 and Caffe in 2013: a new thing worth watching.

In addition, Jia Yangqing said he has recently seen rather a lot of papers centered on manual network hyperparameter tuning, and he believes the research value of performance gains obtained by manually tweaking parameters is gradually declining. Researchers should instead ask whether there is a better methodology for automatic network tuning. In other words, "the more manual a research direction is, the lower its value; the more it can extract a common methodology and apply it to large-scale systems, the more interesting the research may be."

On big data computing platforms

Deep learning today still depends heavily on large amounts of data. With the rapid development of the internet and of edge devices, the data generated is not only large but also changes very fast. How do we ingest the latest data quickly, process it, and produce more accurate models? This places new demands on AI infrastructure, including big data computing platforms.

Earlier systems like Hadoop kept compute and storage together; the industry then began to promote separating compute from storage, making elastic scaling easier to implement. When compute-storage separation was first discussed, "compute" mostly meant big data computing; today there is a new kind of compute, the heterogeneous computing brought by AI training. Jia Yangqing calls the corresponding new architecture the separation of storage, data computing, and scientific computing: distributed storage mainly solves problems such as stability and throughput for massive data; data computing solves data preprocessing, cleaning, and transformation; and scientific computing solves how to use hardware to quickly perform the large volume of high-performance computation that models express mathematically. How to design such a system in a modular way, and how to combine the different modules organically, is a new design challenge for big data systems.
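The three-way separation described above can be sketched as three composable pipeline stages (plain Python; the stage names and the in-memory "storage" are illustrative stand-ins, not any real Alibaba system):

```python
def storage_read():
    """Storage layer: stream raw records (an in-memory stand-in for a
    distributed store whose job is throughput and durability)."""
    yield from [" 3.0 ", "bad", "4.0", None, "5.0 "]

def data_compute(records):
    """Data computing: clean and transform raw records into features."""
    for r in records:
        if r is None:
            continue  # drop missing rows
        try:
            yield float(r.strip())
        except ValueError:
            continue  # drop unparseable rows

def scientific_compute(features):
    """Scientific computing: the numeric/model part (here, just a mean;
    in practice, the hardware-accelerated training or inference step)."""
    feats = list(features)
    return sum(feats) / len(feats)

result = scientific_compute(data_compute(storage_read()))
print(result)  # (3.0 + 4.0 + 5.0) / 3 = 4.0
```

Because the stages only share a streaming interface, each can be scaled or swapped independently, which is the design point of separating the three concerns.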

As for the directions big data computing platforms need to focus on in the future, Jia Yangqing believes the main ones are further improving efficiency in each scenario and optimizing the user experience. He said big data computing today has four main scenarios: first, traditional batch computing; second, stream computing, as in Flink; third, interactive queries answered in seconds or even milliseconds; and fourth, connecting the big data pipeline with AI to train and deploy large-scale intelligent models.

Stream computing has attracted more and more attention in recent years, and Flink has been pushing batch-stream unification, on which Jia Yangqing also has his own views.

Batch and stream face very different scenarios in the big data field. In my opinion, these two scenarios arise naturally from the underlying system design: different engines are good at different things, some at batch, some at stream. Flink is second to none in streaming, and it pursues batch-stream unification in order to give users a more complete experience.

The batch-stream unification we often discuss now actually has a practical background. Many workloads are primarily stream computing but sometimes need some batch computing applied, without demanding the highest efficiency. In that situation, switching engines entirely costs too much; the engine should instead consider how to shore up its weaker side and give the application an end-to-end experience, so that differing needs do not force all the data to be moved again in the middle of the computation. In batch-stream unification, I think Flink will keep strengthening its leading position in stream computing while filling in batch computing and interactive queries, so that users whose scenario is mainly streaming with a smaller batch component can quickly build their own solutions.
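The two computation styles being unified can be contrasted in a toy example (plain Python, not Flink code): batch processes a bounded dataset in one go, while streaming folds each event into running state as it arrives; unification means one engine and one API serving both shapes of the same logic.

```python
from collections import Counter

events = ["click", "view", "click", "buy", "click"]

def batch_count(dataset):
    """Batch: the whole (bounded) dataset is available up front."""
    return Counter(dataset)

def stream_count(stream):
    """Stream: events arrive one at a time; state updates incrementally.
    In a real engine, a window or trigger would emit `state` along
    the way instead of only at the end."""
    state = Counter()
    for event in stream:
        state[event] += 1
    return state

# On the same data, the two styles agree -- a batch is just a stream
# that happens to end, which is the premise of unification.
assert batch_count(events) == stream_count(iter(events))
print(stream_count(iter(events)))
```

The engineering difficulty is that each style wants different optimizations (sorting and bulk I/O for batch, low-latency state and checkpointing for stream), which is why filling in the weaker side of one engine is the pragmatic path described above.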

Today we see more and more of these two scenarios, stream computing and interactive computing. As for whether the future will be mainly stream computing or mainly batch computing, I personally feel that for a long time to come both types will continue to exist; each is quite specifically optimized for its own scenarios, and it is hard for one approach to solve all problems.

Advice for AI practitioners

As a leading AI scientist, Jia Yangqing completed the transformation from engineer and researcher to technology manager in just a few years, and at different stages of career development even the "gods" face different challenges. In the last part of the interview, Jia Yangqing shared a few lessons drawn from his own career, which we hope will be helpful.

Keep learning, and exchange more with peers

AI technology is advancing rapidly. For developers and engineers, perhaps the biggest headache is that new technologies, new frameworks, and new algorithm models emerge every day; a moment's inattention and your knowledge may be out of date. It also makes choosing a research topic increasingly difficult (how hard is it, is the result actually new, has someone already done it — all of this must be considered).

Jia Yangqing stressed that engineers in the AI field must learn to actively absorb new information and new technical results, continuously iterating their knowledge. Hacker News, Reddit's machine learning community, and the like are all good sources of information. There are also now many media outlets promoting the spread of new ideas, new results, and new research directions in AI, which is a good thing.

On choosing a research direction, Jia Yangqing believes there are two principles: one is to follow your interest, and the other is to exchange more with your peers.

Make your technical skills more versatile

As mentioned earlier, in Jia Yangqing's view the AI framework is no longer the shackle of the AI engineer, so the AI engineer's responsibilities will change accordingly: the focus needs to shift toward real application scenarios. The biggest opportunity next is how to truly bring AI to the ground, which requires engineers to go from doing only AI to having full-stack capability, including how to interface with IoT devices, how to put AI capabilities into cars, and so on. Besides strengthening their AI skills, engineers should also cultivate more versatile technical capabilities.

Jia Yangqing said frankly that, to some extent, the title "AI engineer" is a bit overused; it is a role born of AI's current popularity.

"In fact, there is no such thing today as a Java engineer who does only Java, because Java is just a tool, and AI is likewise just a tool; what matters more is what you use it for. Just as every engineer has to be able to program, or rather every engineer has to have certain basic engineering capabilities, AI is becoming one of those basic engineering capabilities."

For ordinary developers, whatever area you work in, it is worth learning to apply AI; you do not necessarily need to know how to build an AI framework, only how to use AI the way you use tools such as Excel or Java. That is a rather interesting direction for improving your abilities. Engineers or researchers who specialize in AI, on the other hand, still need to dig deep in their own field.

From front-line developer to manager: learn to take a step back and enable others

In his first year after moving from front-line development into management, Jia Yangqing was still hard at work writing code, with an output second to none on the team, but his investment in supporting and growing the team fell short. For a front-line developer that might count as a good performance, but for a manager it does not work, because as a manager he was not providing enough value to a growing team. A person's energy is limited: even working from nine in the morning until midnight or later, one cannot both write code and support the team.

What happened that year affected Jia Yangqing deeply. He later realized that what a manager really needs to do is help the rest of the team maximize their abilities, rather than charging ahead alone as one does in technical development. Managers should learn to take a step back, provide the necessary guidance and empowerment, trust others and create space for them, so that front-line colleagues get more exercise — shifting their own mindset from drilling into the technology themselves to supporting the team in scaling it.

Recommended activities

An opportunity for continuous learning and peer exchange is coming, with a hand from Jia Yangqing himself: Alibaba Cloud's Computing Platform Business Unit, the Tianchi platform, and Intel have jointly launched the first Apache Flink Geek Challenge!
Focused on two popular fields, machine learning and computing performance, the contest is a chance to make your skills more versatile, with prizes worth 100,000 RMB up for grabs.

For contest details, please click:

https://tianchi.aliyun.com/markets/tianchi/flink2019



Origin yq.aliyun.com/articles/711546