Zhou Zhihua's latest speech: Why is deep learning deep? AI can only be as good as the talent behind it


Abstract: At the 2018 JD.com (Jingdong) Artificial Intelligence Innovation Summit held yesterday, Professor Zhou Zhihua, Director of the Department of Computer Science and Dean of the School of Artificial Intelligence at Nanjing University, gave a keynote speech entitled "Thinking about Deep Learning". Professor Zhou pointed out that what the artificial intelligence era lacks most is talent, because in this industry, artificial intelligence can only be as good as the people who build it.

Yesterday, the 2018 Jingdong Artificial Intelligence Innovation Summit was held. Zhou Bowen, vice president of JD.com Group and head of its AI platform and research department, unveiled the panorama of JD.com's AI strategy under its technology layout. The panorama can be summed up as "three main bodies, seven application scenarios and five directions for industrializing artificial intelligence": with the AI open platform, AI basic research and AI business innovation as the three main bodies, building technological capability through industry-university-research cooperation, high-end talent training and the introduction of core talent, and applying AI to financial technology, smart logistics, smart consumption, smart supply and external empowerment. At the summit, JD.com's AI open platform NeuHub was officially released, and the "JD Dialog Challenge", the world's first grand prix for task-oriented multi-round dialogue systems, was officially launched.

At the meeting, Professor Zhou Zhihua, Director of the Department of Computer Science and Dean of the School of Artificial Intelligence at Nanjing University, gave a keynote speech entitled "Thinking about Deep Learning". Starting from the theoretical foundations of deep learning, Professor Zhou discussed the question "why are deep neural networks deep" from the perspective of model complexity. He pointed out that although deep learning has many successful applications, it also suffers from difficult parameter tuning, poor reproducibility and other problems, and is not the best choice for many tasks. Exploring models beyond deep neural networks is therefore an important challenge.

Professor Zhou Zhihua also shared his views on the development of the artificial intelligence industry. He said, "What the artificial intelligence era lacks most is talent, because for this industry, artificial intelligence can only be as good as the talent you have." In addition, Xinzhiyuan reported that Professor Zhou Zhihua has been appointed a member of the Academic Committee of the JD.com Artificial Intelligence Research Institute. At the same time, JD.com has begun establishing the Nanjing branch of the institute, for which Professor Zhou Zhihua will serve as chief academic advisor, and Nanjing University will cooperate closely with JD.com in AI talent training.

The following is the content of Professor Zhou Zhihua's speech:


Zhou Zhihua:

First of all, I am very happy to come to JD.com's event today. You may have heard that Nanjing University recently established a School of Artificial Intelligence, the first among China's C9 universities. We will carry out very in-depth cooperation with JD.com in scientific research and talent training.

Thanks to Dr. Bowen Zhou for the invitation. Before I came, I asked him what he would like me to talk about today, and he told me that there would be many technical people present and suggested I talk about some cutting-edge academic issues. So today I will share some of our rather superficial views on deep learning, offered only for discussion and criticism. We all know that one of the most important technologies directly behind the current artificial intelligence boom is deep learning. Today, deep learning has all kinds of applications and is everywhere, whether in images, video, sound, natural language processing, and so on. So let us ask a question: what is deep learning?

The theoretical basis of deep learning is unclear

I think the answer for most people is that deep learning is almost equivalent to deep neural networks. There is a very famous society called SIAM, the Society for Industrial and Applied Mathematics, and it has a flagship newsletter called SIAM News. In June last year there was an article on the front page of this newsletter that said exactly this: deep learning is a subfield of machine learning that uses deep neural networks.

So if we want to talk about deep learning, we cannot avoid deep neural networks. First we have to start from the neural network itself. Neural networks are not new; they have been studied in artificial intelligence for more than half a century. In the past, however, we generally used networks with one hidden layer in the middle, or perhaps two. In such a network, each unit is a very simple computational model: it receives some inputs, each input is multiplied by a connection weight, the results are summed and passed through an activation function, a very simple formula. A so-called neural network is a system obtained by nesting many such formulas inside one another. So what do we actually mean when we say we use deep neural networks today? Simply put, the number of layers is very large. In 2012, when deep learning was just beginning to receive attention, the winner of the ImageNet competition was an 8-layer neural network; in 2015 the winner used 152 layers, and in 2016, 1,207 layers. These are very large, very huge systems, and training such a system is very difficult.
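As a rough formalization (generic notation of my own, not taken from the talk): a single unit computes a weighted sum of its inputs passed through an activation function f,

    y = f\Big(\sum_i w_i x_i + b\Big),

and a network with two hidden layers is simply this formula nested inside itself:

    \hat{y} = f\Big(W^{(3)}\, f\big(W^{(2)}\, f(W^{(1)} x + b^{(1)}) + b^{(2)}\big) + b^{(3)}\Big).

Going "deeper" adds more levels of this nesting; going "wider" only adds more terms inside one level.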

There is one piece of very good news: the activation function of the computing units in a neural network is continuous and differentiable. For example, the sigmoid function we used in the past is continuously differentiable, and so are the commonly used ReLU function and its variants. This allows us to compute gradients easily, and the network can then be trained with the well-known BP (backpropagation) algorithm. With algorithms like this, neural networks have achieved many victories.
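To make this concrete, here is a minimal NumPy sketch, written under my own assumptions rather than taken from the talk, of a tiny one-hidden-layer network with sigmoid units and a single BP (gradient-descent) update; ReLU is shown alongside as the other activation mentioned above. The toy data, layer sizes and learning rate are invented for illustration.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def relu(z):                      # the other common activation; not used below
        return np.maximum(0.0, z)

    # Toy data: 4 samples, 3 features, binary targets (made up for illustration)
    X = np.array([[0., 1., 2.], [1., 0., 1.], [2., 1., 0.], [1., 1., 1.]])
    y = np.array([[0.], [1.], [1.], [0.]])

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)   # input -> hidden
    W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)   # hidden -> output
    lr = 0.1

    # Forward pass: a nested application of "weighted sum, then squash"
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)

    # Backward pass (BP): possible because sigmoid is differentiable
    dz2 = (p - y) / len(X)                  # grad of cross-entropy w.r.t. output pre-activation
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * h * (1 - h)        # chain rule through the hidden sigmoid
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)

    # One gradient-descent update
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1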

But in fact, academia has never really figured out one thing: why do we need such a deep model? Deep learning has achieved great success today, but there is a big problem: its theoretical basis is unclear. In theory, we still cannot say how it does what it does, why it succeeds, or what the key to it is. If we want to do a theoretical analysis, we first need some intuition about why it works; that is how an analysis gets started. But at present we do not even really know how to look at it.

Why is deep learning deep? Model complexity perspective

Why do deep neural networks need to be deep? So far there is no unified view in academia, and there are many discussions. Let me share an argument we gave some time ago, which approaches the question mainly from the perspective of model complexity.

We know that the complexity of a machine learning model is related to its capacity, and its capacity is related to its learning ability; so learning ability and complexity are related. The machine learning community has long known that if we can increase the complexity of a learning model, its learning ability can improve. How do we increase complexity? For a model like a neural network, there are two obvious ways: one is to make the model deeper, the other is to make it wider. From the point of view of increasing complexity, going deeper is more effective. When you make the network wider, you only add computing units and increase the number of functions; when you make it deeper, you not only increase the number but also the depth of nesting. So from this point of view, we should try to make it deeper.

Then you may ask: since we knew it should be deeper, why didn't we just do it earlier? This involves another problem: strengthening the learning ability of a machine learning model is not necessarily a good thing, because one problem machine learning has always struggled with is overfitting. What kind of phenomenon is that? You give me a data set, and in machine learning I want to learn what lies behind it. What I hope to learn are general rules, which can be used to predict future cases. But sometimes I may learn peculiarities of this particular data set rather than the general rule, and a huge mistake is made when those peculiarities are mistaken for the general rule. This phenomenon is called overfitting.
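To illustrate (a toy example of my own, not from the talk): fit polynomials of low and high degree to a handful of noisy points; the high-degree fit drives the training error toward zero by memorizing the noise in this particular sample, and typically does worse on new points drawn from the same rule.

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 1.0, 10)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)  # general rule + noise

    simple = np.polyfit(x, y, deg=3)    # limited capacity: roughly the general trend
    complex_ = np.polyfit(x, y, deg=9)  # high capacity: passes through every noisy point

    x_new = np.linspace(0.05, 0.95, 7)  # unseen points governed by the same rule
    y_new = np.sin(2 * np.pi * x_new)
    err_simple = np.mean((np.polyval(simple, x_new) - y_new) ** 2)
    err_complex = np.mean((np.polyval(complex_, x_new) - y_new) ** 2)
    # Typically err_complex > err_simple: the degree-9 fit has learned the
    # peculiarities of the 10 training points rather than the underlying rule.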

Why do we end up learning peculiarities of the data itself? The reason is quite clear: the model's learning ability is too strong. When its capacity is very, very strong, it may pick up peculiarities of the data and treat them as general rules. That is why people used to be reluctant to use overly complicated models.

So why can we use such models now? There are many factors. The first is that we now have big data. For example, if I only have 3,000 samples in hand, the characteristics I learn from them are generally unlikely to be general laws; but if there are 30 million samples, the characteristics in that data may already reflect a general rule. So using big data is a key way to alleviate overfitting. Second, today we have many very powerful computing devices, which make it possible to train such models. Third, through the efforts of many scholars in our field, there are now many tricks and algorithms for training such complex models, which makes using them feasible. To sum up: first, we have larger data; second, we have powerful computing devices; third, we have many effective training techniques. These allow us to use high-complexity models, and deep neural networks happen to be a kind of high-complexity model that is easy to implement.

So this line of reasoning seems able to explain why we can now use deep neural networks and why deep neural networks succeed: it is because of complexity. When we gave this explanation more than a year ago, many colleagues at home and abroad agreed with it and thought it quite reasonable. But I myself have never been particularly satisfied with it, because it leaves an underlying question unanswered.

What matters most in neural networks is the ability to do representation learning

If we explain things only from the perspective of complexity, we cannot say why flat or wide networks cannot achieve the performance of deep neural networks. In fact, widening a network, although not as efficient, also increases its complexity.

In fact, with even a single hidden layer, if you add an unlimited number of neurons, the complexity also becomes very large. But however we try such models in applications, we find they are not as good as deep neural networks. So it may be hard to answer this question from the complexity perspective alone; we need somewhat deeper thinking. So we have to ask: what is the most essential thing inside deep neural networks?

Today our answer is: the ability to do representation learning. In the past, when we used machine learning to solve a problem, we first took the data, say the data objects were images, and described them with many features, such as color, texture and so on. These features were all designed by hand by human experts; only after the data was represented this way did we start learning. Today, with deep learning, we no longer need to design features manually. You throw the data in at one end and the model comes out at the other; all the features in between can be learned by the model itself. This is what we call feature learning, or representation learning. Compared with earlier machine learning techniques, this is a big step forward: we no longer need to rely on human experts to design features.

Something friends often mention is end-to-end learning. We should look at this from two sides. On one hand, when we consider feature learning and classifier learning jointly, we get the benefit of joint optimization; that is the good side. On the other hand, if we do not know what is happening inside, such end-to-end learning is not necessarily really good, because it may well be that the first part is pulling east while the second part is pulling west; taken together, the system seems to move a bit more to the east, but internally some effects have already cancelled out. In fact, end-to-end learning has long existed in machine learning. For example, in feature selection there is a class of wrapper-based methods that is end-to-end, but are these methods necessarily stronger than other feature selection methods? Not necessarily. So end-to-end learning is not the most important thing.

What really matters is feature learning, or representation learning. If we then ask the next question: what is the key to representation learning? Our current answer is layer-by-layer processing. Let me cite a figure from the very popular book Deep Learning. When we get an image, if we view the neural network as many layers, at the bottom layer what we see is something like pixels; as we go up layer by layer, gradually there may be edges, then contours, and even parts of objects, and so on. Of course, this is only an illustration; in a real neural network model there will not necessarily be such a clean layering. But overall, as we go upward, the network is indeed abstracting the object step by step. We now think this may be one of the key reasons why deep learning succeeds. A flat neural network can do many of the things a deep neural network can do, but there is one thing it cannot do: when it is flat, it does not perform this kind of deep, staged processing. So deep, layer-by-layer abstraction may be crucial.

You may then ask: layer-by-layer processing is nothing new in machine learning. For example, decision trees are a very typical kind of layer-by-layer processing, and decision tree models have a history of fifty or sixty years. Why can't they do as well as deep neural networks? I think the answer is this. First, their complexity is insufficient: if we only consider discrete features, the depth of a decision tree cannot exceed the number of features, so the model complexity is limited. Second, during the entire learning process of a decision tree, no feature transformation takes place internally; everything always happens in the same feature space. This may also be a problem. If you know some more advanced machine learning models, you may ask: what about boosting? Many winning models today, such as XGBoost, belong to the boosting family, and they also proceed layer by layer. Why have they not achieved the same kind of success as deep neural networks? I think the problems are about the same: first, the complexity is still insufficient; second, and perhaps more crucially, they always work in the original feature space. All these learners operate on the original features, with no feature transformation in between. So our current view is: why do deep neural networks succeed, and what are the key reasons for their success? I think the first is layer-by-layer processing, and the second is internal feature transformation.
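As a minimal sketch of this feature-space point (a generic boosting loop of my own, not XGBoost itself): each new tree is fit to the residuals of the current ensemble, but every tree receives the same original feature matrix X; no new representation of the data is ever constructed along the way.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def toy_boosting(X, y, n_rounds=10, lr=0.1, max_depth=3):
        """Gradient boosting for squared loss; note that X is never transformed."""
        pred = np.full(len(y), y.mean())
        trees = []
        for _ in range(n_rounds):
            residual = y - pred                        # what the ensemble still gets wrong
            tree = DecisionTreeRegressor(max_depth=max_depth)
            tree.fit(X, residual)                      # always the ORIGINAL features X
            pred = pred + lr * tree.predict(X)
            trees.append(tree)
        return trees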

Three factors behind deep learning's success

When we take these two things into account, we find that a deep model is actually a very natural choice: with such a model, we can easily do both. But once we choose a deep model, many problems follow: it easily overfits, so we need big data; it is hard to train, so we need many training tricks; the computational cost of such a system is very large, so we need very powerful computing devices, such as GPUs.

In fact, all of these things are consequences of choosing a deep model; they are not the reasons we use deep learning. This is different from the way people used to think, namely that because we had these things, we came to use deep models. We now believe the causality runs the other way: because we want to use deep models, we have to think about all the things above. There is another point worth noting: when we have very large training data, we must have a sufficiently complex model. Otherwise, if we used a linear model, 20 million samples or 200 million samples would make little difference; it simply cannot absorb any more. And having sufficient complexity is precisely another point in favor of deep models. It is for these reasons that we think these are the most crucial things about deep models.

So this is our current understanding: first, we need layer-by-layer processing; second, we need internal feature transformation; third, we need sufficient model complexity. These three things are, in our view, the relatively key reasons why deep neural networks can succeed. Or rather, this is a conjecture we offer.

Problems with deep learning

If these are the conditions that matter, we can immediately realize that we do not necessarily have to use neural networks. A neural network may be only one of many options; as long as we can do these three things at the same time, we may be able to do it with other models. It does not have to be a deep neural network.

First, everyone who has used deep neural networks knows that you have to spend a lot of effort tuning their parameters, because these are huge systems. This brings many problems. To begin with, the experience of tuning parameters is hard to share. Some friends may say: the experience I gained tuning parameters on a first image dataset can surely be partly reused on a second image dataset. But consider this: if we have built a large deep neural network for images and then want to work on speech, the parameter-tuning experience gained on images is basically of little use for speech problems. So when we cross tasks, this experience is hard to share.

The second problem: today everyone is very concerned about the reproducibility of results. Whether in scientific research or technology development, we all hope results are reproducible. Across the whole field of machine learning, it is fair to say that deep learning has the weakest reproducibility. We often encounter situations where one group of researchers publishes a result that other researchers find very hard to reproduce, because even with the same data and the same method, as long as the hyperparameter settings differ, the results will differ.

There are many other problems. For example, when we use a deep neural network, the model complexity must be specified in advance: before training, the architecture of the network must already be fixed, and only then can we train it with the BP algorithm and so on. This creates a big problem, because before we have solved the task, how do we know how large the complexity should be? So in practice people usually set a complexity larger than necessary.

If you have followed the progress of deep learning over the past three or four years, you can see that much of the most cutting-edge work is about effectively reducing network complexity. For example, ResNet effectively reduces complexity by adding shortcuts. Model compression, which many people now use, and even weight binarization, are also about reducing complexity. In effect, one first uses an excessively large complexity and then brings it down. So could we let the model's complexity grow with the data from the very beginning? This may be difficult for neural networks, but it is possible for other models.
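A minimal sketch of the shortcut idea (schematic NumPy of my own, not the actual ResNet layers with convolutions and batch normalization): the block only has to learn a residual correction F(x), because the input x is added back unchanged at the output.

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def residual_block(x, W1, W2):
        """y = F(x) + x: the identity shortcut carries x past the two weight layers."""
        f = relu(x @ W1) @ W2    # the learned residual F(x)
        return relu(f + x)       # shortcut added before the final activation

    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))             # 4 samples, 8 features (toy sizes)
    W1 = 0.1 * rng.normal(size=(8, 8))
    W2 = 0.1 * rng.normal(size=(8, 8))
    y = residual_block(x, W1, W2)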

There are many other problems as well, such as the difficulty of theoretical analysis, the need for very large amounts of data, the black-box nature of the model, and so on. From another angle, someone may say: you do academic research, so you have to worry about these things; I do applications, and I don't care what the model is as long as it solves my problem. But even from that perspective, research on models beyond neural networks is very much needed.

Although deep neural networks are so popular and so successful today, we can see that on many tasks the best-performing models are not necessarily deep neural networks. For example, if you follow the many competitions on Kaggle, which cover all kinds of real-world problems such as buying air tickets, booking hotels and various product recommendations, and look at the winning models, the winners on many tasks are not neural networks; they are often models such as random forests or XGBoost. The tasks where deep neural networks win tend to be the typical image, video and audio tasks, whereas on other tasks involving mixed modeling, discrete modeling or symbolic modeling, their performance may actually be somewhat worse than that of other models. So, is it possible to build suitable deep models that achieve better performance on these tasks?

To sum up from an academic point of view: the deep models we talk about today are basically all deep neural networks. In technical terms, they are models composed of multiple layers of parameterized, differentiable nonlinear modules, and such models can be trained with the BP algorithm.

Exploring approaches beyond deep neural networks: Deep Forest

There are two issues here. First, the various problems we encounter in the real world are not all of a nature that is differentiable, or best modeled with differentiable models. Second, over the past few decades the machine learning community has produced a great many models, all of which could serve as building blocks for constructing a system, and quite a few of these modules are non-differentiable. Can such components be used to build deep models? Can we get better performance by building deep models out of them? By making them deep, can we obtain better results on the tasks where deep models today still lose to random forests and similar models? There is now a big challenge, not only an academic one but also a technical one: can we build deep models with non-differentiable modules?

Once this question is answered, we obtain answers to many other questions at the same time. For example, are deep models necessarily deep neural networks? Can we make non-differentiable models deep, in which case we cannot train them with the BP algorithm, and can we thereby make deep models win on more tasks? After we raised this question, some scholars internationally also put forward similar views. As you probably know, Professor Geoffrey Hinton, the famous leading figure in deep learning, also proposed that he hopes deep learning can eventually move beyond the BP algorithm; he raised this somewhat later than we did.

I think such questions should be explored from a frontier standpoint. The analysis I just shared led to three conclusions: first, we need layer-by-layer processing; second, we need internal feature transformation; third, we hope for sufficient model complexity. My own research group has recently done some work along these lines. We recently proposed a method called Deep Forest. It is a method based on tree models, and it borrows many ideas from ensemble learning. On many different tasks, its results are highly comparable to those of deep neural networks. Apart from some large-scale image tasks, which are basically the killer application of deep neural networks, it performs very well on many other tasks, and especially across tasks: we can use the same set of parameters on different tasks and still get decent performance, without having to tune parameters slowly task by task. At the same time, it has far fewer hyperparameters to tune and is much easier to tune. Another important property is that it has adaptive model complexity: it can automatically decide how far the model should grow based on the size of the data.
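For illustration only, here is a highly simplified cascade in the spirit of Deep Forest, built with scikit-learn forests. This is a sketch under my own assumptions, not Professor Zhou's gcForest implementation: each level's class-probability outputs are concatenated with the original features and passed to the next level, and the cascade stops growing once validation accuracy no longer improves, which is where the adaptive model complexity comes from.

    import numpy as np
    from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
    from sklearn.model_selection import train_test_split

    def grow_cascade(X, y, max_levels=10, seed=0):
        X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=seed)
        aug_tr, aug_va = X_tr, X_va
        best_acc, levels = 0.0, []
        for _ in range(max_levels):
            # One level = a small ensemble of forests (ideas borrowed from ensemble learning)
            forests = [RandomForestClassifier(n_estimators=100, random_state=seed),
                       ExtraTreesClassifier(n_estimators=100, random_state=seed)]
            for f in forests:
                f.fit(aug_tr, y_tr)
            probs_va = [f.predict_proba(aug_va) for f in forests]
            acc = np.mean(forests[0].classes_[np.mean(probs_va, axis=0).argmax(axis=1)] == y_va)
            if acc <= best_acc:                        # stop growing: adaptive model complexity
                break
            best_acc = acc
            levels.append(forests)
            # Class-probability vectors act as the learned representation for the next level
            probs_tr = [f.predict_proba(aug_tr) for f in forests]
            aug_tr = np.hstack([X_tr] + probs_tr)      # original features + new representation
            aug_va = np.hstack([X_va] + probs_va)
        return levels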

On the other hand, we should see that this is a brand-new exploration in the development of deep learning as a field. Although it can already solve some problems today, as it develops further, its prospects may be something we cannot yet fully foresee.

I often say that there is no truly disruptive technology; all technologies develop step by step. Take CNNs, the most famous model in deep neural networks today: thirty years passed from when they were first proposed to their victory on ImageNet, and twenty years passed from when the algorithm fully took shape to when it was ready for wide industrial use, through the exploration and improvement of countless people. So although some of today's new explorations can already solve some problems, what matters more is the long view: after much further effort, today's explorations may lay an important foundation for the technologies of the future.

We used to say that deep learning is a dark room. What is inside the dark room? Everyone knows: deep neural networks. Now we have opened a door into this room and put deep forests inside, and I think many more things may follow. Perhaps, from the perspective of the discipline, this is the more important value of this work.

Finally, I would like to share some views on the development of the artificial intelligence industry, because, as you know, the School of Artificial Intelligence at Nanjing University is about to carry out in-depth cooperation with JD.com in scientific research and talent training. Regarding the development of the AI industry, we have to ask: what do we really need? Do we need equipment? AI research does not require special, classified equipment; as long as you spend the money, the equipment can be bought. Are we short of data? Our ability to collect, store, transmit and process data has improved dramatically, and data is everywhere.

What is really lacking? What the AI era lacks most is talent. For this industry, artificial intelligence can only be as good as the talent you have. That is why we see a global scramble for AI talent, not only in China but also in the United States. This is one of the considerations behind establishing our School of Artificial Intelligence. After informatization, human society will inevitably move toward intelligentization; this is an irreversible, unchangeable trend. Providing people with intelligent assistance based on data and information, making things easier for them, is what we all wish for. The steam engine revolution liberated us from physical labor; the artificial intelligence revolution should liberate us from repetitive, simple mental labor.

The discipline of artificial intelligence is different from some short-term investment trends and short-term hot spots: after more than sixty years of development, it already has a large and solid body of knowledge. The scarcity of high-level AI talent is a worldwide problem. Many companies are now poaching people with large sums of money, but poaching does not create new talent. So I think we need to start at the source and cultivate high-level AI talent for the development of the country, society and industry. In this regard, we thank JD.com, a socially responsible enterprise, for being willing to build a research institute right next to our school, to jointly carry out new explorations in the training of high-level AI talent and in cooperation. Finally, friends from all walks of life are welcome to support the School of Artificial Intelligence of Nanjing University in various ways. Thank you!

