[Reading Notes] Xinzhiyuan: Why is deep learning deep? --Zhou Zhihua

Original: https://mp.weixin.qq.com/s/C5Xq2P5v3lGmFOivJ_PTzw

 

At the 2018 Jingdong Artificial Intelligence Innovation Summit held on April 15, ① Zhou Bowen, vice president of Jingdong Group and head of its AI platform and research department, unveiled the panorama of Jingdong's AI strategy under its technical layout: three main bodies (an AI open platform, AI basic research, and AI business innovation), seven application scenarios, and five directions for industrializing artificial intelligence. Through industry-academia-research cooperation, high-end talent training, and the recruitment of core talent, Jingdong aims to build its technological capability and apply AI to financial technology, smart logistics, smart consumption, smart supply, and external empowerment. ② Professor Zhou Zhihua, director of the Department of Computer Science and dean of the School of Artificial Intelligence at Nanjing University, gave a keynote speech entitled "Thinking About Deep Learning". Professor Zhou Zhihua pointed out that what the artificial intelligence era most lacks is talent, because in this industry your artificial intelligence can only be as good as the people who build it.

 

Professor Zhou Zhihua's keynote speech, "Thinking About Deep Learning", started from the theoretical basis of deep learning and examined the question "why is a deep neural network deep" from the perspective of model complexity. He argued that although deep learning has many successful applications, it also suffers from problems such as difficult parameter tuning and poor reproducibility, and it is not the best choice for many tasks. Exploring models beyond deep neural networks is therefore an important challenge.

 

The following is an excerpt of Professor Zhou Zhihua's speech:

1. What is deep learning?

The theoretical basis of deep learning is unclear.

I think most people's answer is that deep learning is more or less equivalent to deep neural networks. SIAM News, the flagship newspaper of the well-known Society for Industrial and Applied Mathematics (SIAM), describes deep learning as a subfield of machine learning that uses deep neural networks (so if we want to talk about deep learning, deep neural networks cannot be avoided).

First we have to start with neural networks. Neural networks are not a new thing; they have been studied in the field of artificial intelligence for more than half a century. Traditionally, we would use a neural network with one or two hidden layers, in which each unit is a very simple computational model, and these units together form a system.

Today's deep neural network, simply put, is a neural network with many layers. In 2012, when deep learning first began to receive attention, the champion of that year's ImageNet competition used an 8-layer neural network; in 2015 it was 152 layers, and in 2016, 1207 layers. This is a very large system, and training such a system is very difficult. Crucially, the activation function of the computing units in the neural network is continuous and differentiable, such as the sigmoid function, the ReLU function, or their variants. This lets us easily compute gradients and train the network with the well-known BP (backpropagation) algorithm.
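To make this description concrete, here is a minimal NumPy sketch of such a network: one hidden layer of simple units with a differentiable (sigmoid) activation, trained on a toy XOR problem by gradient descent via backpropagation. The architecture, data, and hyperparameters are purely illustrative and are not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy data: learn XOR with one hidden layer of simple units.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=1.0, size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=1.0, size=(8, 1))
b2 = np.zeros(1)

lr = 1.0
for step in range(5000):
    # Forward pass: the differentiable activation makes gradients easy to compute.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass (the "BP algorithm"): apply the chain rule layer by layer.
    d_out = (out - y) * out * (1 - out)   # gradient of squared error at the output
    d_h = (d_out @ W2.T) * h * (1 - h)

    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(3))  # should approach [0, 1, 1, 0]
```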

 

2. Why is deep learning deep? Model complexity perspective

So far there is no unified view in academia, and there are many discussions. Professor Zhou Zhihua's discussion below is mainly from the perspective of model complexity.

For a machine learning model: ① its complexity is related to its capacity, and its capacity is related to its learning ability; ② if we can increase the complexity of a learning model, then its learning ability can be improved.

There are two obvious ways to increase the complexity of a neural network: ① make the model deeper, or ② make the model wider; making it deeper is clearly more effective. Widening only adds more computing units and more functions, while deepening not only adds functions but also increases the degree of nesting (composition) between functions. From this point of view, we should try to make the network deeper (a small comparison is sketched below).
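A rough way to see the difference, assuming plain fully connected layers with arbitrary illustrative widths: with comparable parameter budgets, the wide network offers only one level of function composition, while the deep network nests several.

```python
def mlp_param_count(layer_sizes):
    """Number of parameters in a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# A wide, shallow net versus a narrower, deeper net with a similar budget.
wide = [100, 2000, 1]               # one huge hidden layer: more units, more functions
deep = [100, 300, 300, 300, 1]      # three stacked hidden layers: nested functions

print(mlp_param_count(wide))  # 204001 parameters, one level of composition
print(mlp_param_count(deep))  # 211201 parameters, three levels of composition
```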

Then everyone may ask: since deepening is the way to go, and this has been known for a long time, why not just do it? This touches on another problem: strengthening the learning ability of a machine learning model is not necessarily a good thing, because one of the problems machine learning has always struggled with is overfitting. That is why we used to be reluctant to use overly complicated models.

So why can we use such models now? There are many factors: first, we have bigger data; second, we have powerful computing devices; third, we have many effective training techniques. These allow us to use high-complexity models, and deep neural networks happen to be a high-complexity model that is very easy to implement.

Why can't flat, or wide, networks achieve the performance of deep neural networks? After all, widening a network also increases its complexity. So this question is hard to answer from the complexity perspective alone, and we need to think a little more deeply.

 

3. What is the most essential thing in a deep neural network?

Deep learning frees researchers from manually designing features: the model can learn features from data and perform feature representation on its own. Compared with previous machine learning techniques, this is a big step forward.

For " end-to-end learning ", we need to look at it from two aspects: on the one hand, combining feature learning and classifier learning can achieve a joint optimization effect (similar to the wrapper-based method of feature selection), which is an advantage; On the one hand, we don't know what happened during the learning process, such end-to-end learning is not necessarily really good. So it's not the most important thing.

The most important thing about deep neural networks is their ability to do representation learning.

What is most important in this representation learning? It is the layer-by-layer processing. The bottom layer sees raw pixels, and layer by layer the network gradually learns edges, contours, and even object parts, continually abstracting the object. From this point of view, this is a key factor in why deep learning succeeds, because a flat neural network has no way to perform such layer-by-layer processing.

"Processing layer by layer" is not new in machine learning either. Such as decision trees and boosting. The reasons why they can't do deep neural networks so well are: 1. Its complexity is not enough; 2. There is no feature transformation during the learning process.

So our current view is that the key reasons for the success of deep neural networks are, first, the layer-by-layer processing, and second, the internal feature transformation.

 

4. Three factors for deep learning success

Taken together, there are three factors for the success of deep learning: first, layer-by-layer processing; second, internal transformation of features; third, sufficient model complexity.

 

5. Problems with deep learning

Common problems of deep models: 1. they overfit easily, so we need big data; 2. they are hard to train, so we need many training tricks; 3. their computational cost is very large, so we need very powerful computing equipment such as GPUs.

If these three key factors are what matter, then it follows that we do not have to use deep neural networks; other models can also be used, as long as they satisfy these three conditions at the same time.

First, deep neural networks take a lot of effort to tune, and experience in parameter tuning is difficult to share.

Second, deep learning has the weakest reproducibility.

Third, the model complexity of a deep neural network must be specified in advance; in practice, people usually set the complexity higher than needed.

Therefore, in the past three or four years, many cutting-edge works in deep learning have been about effectively reducing the complexity of the network. For example, ResNet effectively reduces complexity by adding shortcut connections; model compression and weight binarization are other examples. In effect, an excessive complexity is used first and then brought down. So could we instead let the model's complexity grow with the data from the start? This may be difficult for neural networks, but it is possible for other models (a minimal shortcut-block sketch follows).
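For illustration, a minimal NumPy sketch of the shortcut idea: a two-layer block whose output is added back to its input, so the block can fall back toward the identity when the extra layers are not needed. This is only a schematic of the residual connection, not ResNet itself.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, b1, W2, b2):
    """A two-layer transformation plus an identity shortcut.

    The shortcut lets the block behave approximately like the identity when
    the extra layers are not needed, which is one way an over-specified deep
    network can effectively act like a simpler, shallower one.
    """
    out = relu(x @ W1 + b1)
    out = out @ W2 + b2
    return relu(out + x)   # shortcut: add the input back in

rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=(4, d))
W1, W2 = rng.normal(scale=0.1, size=(d, d)), rng.normal(scale=0.1, size=(d, d))
b1, b2 = np.zeros(d), np.zeros(d)
print(residual_block(x, W1, b1, W2, b2).shape)  # (4, 16)
```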

There are many other problems, such as the difficulty of theoretical analysis, the need for very large amounts of data, and the black-box nature of the models. From another angle, someone might say: you are doing academic research, so you need to worry about these things; I am doing applications, and I don't care which model is used as long as it solves my problem. In fact, even from this perspective, we need to study models other than neural networks.

The tasks that deep neural networks win are typically tasks on images, video, and sound. For tasks involving hybrid modeling, discrete modeling, or symbolic modeling, their performance may actually be worse than that of other models such as random forests or XGBoost (a small tabular baseline is sketched below).
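For example, on a small tabular task a tree ensemble is already a strong baseline with almost no tuning; the dataset below is just an illustrative stand-in for the discrete/tabular setting.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# A tabular task where tree ensembles are strong baselines out of the box.
X, y = load_wine(return_X_y=True)
forest = RandomForestClassifier(n_estimators=300, random_state=0)
print(round(cross_val_score(forest, X, y, cv=5).mean(), 3))
```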

 

To summarize from an academic point of view, the deep models we talk about today are basically deep neural networks. In technical terms, a deep neural network is a model composed of multiple layers of parameterizable, differentiable nonlinear modules, and such a model can be trained with the BP algorithm.

 

6. Exploring methods beyond deep learning: deep forests

So there are two problems here. First, the problems we encounter in the real world are not all best modeled by differentiable functions. Second, over the past few decades the machine learning community has built many, many models that could serve as building blocks for a system, and a considerable number of these modules are non-differentiable. Can such modules be used to build deep models? Can building deep models out of them give better performance, so that even on the tasks where deep neural networks today lose to random forests and similar methods, we can still benefit from depth? This is a big challenge, not only academic but also technical: can we build deep models with non-differentiable modules?

 

Once this question is answered, many other questions can be answered at the same time. For example, must a deep model be a deep neural network? If we make a non-differentiable model deeper, we cannot train it with the BP algorithm; can we then let deep models win on more tasks? After we raised this question, some scholars around the world put forward similar views. As many know, Professor Geoffrey Hinton, a very famous leader of deep learning, also proposed that he hopes deep learning can get rid of the BP algorithm in the future; he proposed this later than we did.

I think such a question should be explored from a very forward-looking perspective. I have just analyzed three conclusions with you: first, we need layer-by-layer processing; second, we need internal feature transformation; third, we hope to obtain sufficient model complexity. My own research group has recently done some work in this direction.

Deep Forest. This is a tree-model-based method that borrows many ideas from ensemble learning. On many different tasks, its results are highly competitive with deep neural networks, except on some large-scale image tasks, which are basically the killer application of deep neural networks; it performs very well on many other tasks, especially across tasks. We can use the same set of parameters on different tasks and the performance is still good, so we no longer need to tune parameters slowly, task by task; at the same time it has far fewer hyperparameters to tune, which makes it much easier to use. Another important feature is its adaptive model complexity: it can automatically decide how deep the model should grow based on the amount of data (a minimal cascade sketch follows).
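To make the cascade idea concrete, here is a minimal sketch in the spirit of Deep Forest (gcForest): each level trains a couple of forests, their class-probability outputs are appended to the original features for the next level, and the cascade stops growing when held-out accuracy stops improving, which gives the adaptive complexity mentioned above. This is a simplified illustration, not Zhou's actual implementation; it omits multi-grained scanning and the cross-validated generation of probability features.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

def cascade_level(feats_tr, y_tr, feats_val, seed):
    """One cascade level: two different forests whose class-probability
    outputs become extra features for the next level."""
    forests = [RandomForestClassifier(n_estimators=100, random_state=seed),
               ExtraTreesClassifier(n_estimators=100, random_state=seed)]
    proba_tr, proba_val = [], []
    for f in forests:
        f.fit(feats_tr, y_tr)
        # NOTE: the real method uses k-fold cross-validation to generate these
        # probabilities and avoid overfitting; this sketch skips that step.
        proba_tr.append(f.predict_proba(feats_tr))
        proba_val.append(f.predict_proba(feats_val))
    return np.hstack(proba_tr), np.hstack(proba_val), forests[0]

best_acc, level = 0.0, 0
feats_tr, feats_val = X_tr, X_val
while True:
    aug_tr, aug_val, probe = cascade_level(feats_tr, y_tr, feats_val, seed=level)
    acc = accuracy_score(y_val, probe.predict(feats_val))
    print(f"level {level}: validation accuracy {acc:.3f}")
    if acc <= best_acc or level >= 5:   # adaptive depth: stop when accuracy stalls
        break
    best_acc, level = acc, level + 1
    # The next level sees the original features plus this level's probabilities.
    feats_tr = np.hstack([X_tr, aug_tr])
    feats_val = np.hstack([X_val, aug_val])
```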

On the other hand, we should see this as a brand-new exploration of a development direction for the field of deep learning. Although it can already solve some problems today, it should develop further, and its prospects may not be fully foreseeable today.
