General explanation of CNN, RNN, and DNN [reprint]

General explanation of CNN, RNN, and DNN

https://www.jianshu.com/p/bab3bbddb06b?utm_campaign=maleskine&utm_content=note&utm_medium=seo_notes&utm_source=recommendation

 



What are the differences between the internal network structures of CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), and DNN (Deep Neural Network)?

Reposted from the Zhihu answer by 科言君.

Neural network technology originated in the 1950s and 1960s, when it was called the perceptron. It had an input layer, a hidden layer, and an output layer: the input feature vector is transformed by the hidden layer and reaches the output layer, where the classification result is obtained. An early promoter of the perceptron was Rosenblatt. (An unrelated aside: computing technology being what it was back then, the perceptron's transfer function was implemented mechanically, with wires pulling rheostats to change the resistance; picture the brain scientists tugging at a dense tangle of wires...)

However, Rosenblatt's single-layer perceptron had a serious problem: it was powerless against even slightly more complex functions (the most typical being the "exclusive or" (XOR) operation). If it cannot even fit XOR, what practical use can you expect the thing to have? o(╯□╰)o

With the development of mathematics, this shortcoming was not overcome until the 1980s, with the invention of the multilayer perceptron (MLP) by Rumelhart, Williams, Hinton, LeCun, and others (in any case, a bunch of big names). The MLP is, as the name suggests, a perceptron with more hidden layers (obviously...). Come, let us look at the structure of the MLP:

 

Figure 1: A neural network in which the neurons of adjacent layers are all connected to each other, i.e. the multilayer perceptron

The MLP shed the early discrete transfer function and instead used continuous functions such as the sigmoid or tanh to simulate a neuron's response to excitation; for training it used the back-propagation (BP) algorithm invented by Werbos. Yes, this thing is what we now call a neural network (NN), and "neural network" sounds incomparably more high-end than "perceptron"! This once again tells us that picking a good name matters a lot for research (read: showing off)!

The MLP model overcame the earlier inability to simulate XOR logic, and the additional layers also let the network better describe complex situations in the real world. I am sure the young Hinton of that time must have been flushed with success.

The insight the MLP gives us is that the number of layers of a neural network directly determines its ability to model reality: with more layers, fewer neurons per layer can fit more complex functions [1].

(As Bengio puts it: functions that can be compactly represented by a depth k architecture might require an exponential number of computational elements to be represented by a depth k-1 architecture.)

Even though the big names had long expected that neural networks would need to become deeper, a nightmare always lingered nearby. As the layers of a neural network deepen, the optimization is more and more likely to fall into a local optimum, and this "trap" drifts further and further away from the true global optimum. A deep network trained on limited data can even perform worse than a shallower one. At the same time, another problem that cannot be ignored is that as the number of layers increases, the "vanishing gradient" phenomenon becomes more serious. Specifically, we often use the sigmoid as the neuron's input-output function. For a signal with amplitude 1, each time the gradient is passed back through a layer during BP back-propagation it decays to 0.25 of its previous value. Once there are many layers, the gradient decays exponentially and the lower layers essentially receive no effective training signal.
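A minimal numerical sketch of that attenuation (plain Python; the layer counts below are just made-up illustrations): the derivative of the sigmoid peaks at 0.25, so a gradient passed back through n sigmoid layers is scaled by at most 0.25^n.

```python
import math

def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid; its maximum value is 0.25 (at x = 0)."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Best case: every neuron sits at x = 0, where the derivative peaks at 0.25.
peak = sigmoid_grad(0.0)          # 0.25

# Upper bound on the gradient scale after back-propagating through n layers.
for n in (1, 5, 10, 20):
    print(n, peak ** n)
# 1  0.25
# 5  0.0009765625
# 10 about 9.5e-07
# 20 about 9.1e-13
```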

In 2006, Hinton used a pre-training method to alleviate the local-optimum problem and pushed the hidden layers to 7 [2]. Neural networks gained "depth" in a real sense, and this set off the deep learning boom. The "depth" here has no fixed definition: in speech recognition a 4-layer network can already be considered "deep", while in image recognition networks of more than 20 layers are commonplace. To overcome vanishing gradients, transfer functions such as ReLU and maxout replaced the sigmoid, forming what is essentially today's DNN. Purely in terms of structure, a fully connected DNN is no different from the multilayer perceptron of Figure 1.

It is worth mentioning that the highway network and deep residual learning that appeared this year further avoid vanishing gradients, pushing networks to an unprecedented depth of over a hundred layers (deep residual learning: 152 layers) [3,4]! You can search for the specific structures yourself. If you previously suspected that much of what is advertised as "deep learning" was a gimmick, these results are genuinely convincing.

 

Figure 2: A scaled-down version of the deep residual learning network with only 34 layers; the final version has 152 layers. Take a moment to let that sink in.

Looking at the structure of a fully connected DNN, any neuron in a lower layer can connect to every neuron in the layer above, and the potential problem this causes is an explosion in the number of parameters. Suppose the input is a 1K*1K-pixel image and the hidden layer has 1M nodes; this one layer alone has 10^12 weights to train, which not only makes overfitting easy but also makes it very easy to fall into a local optimum. Moreover, images have inherent local patterns (such as contours, edges, eyes, noses, mouths, and so on) that can be exploited, so image processing concepts should obviously be combined with neural network techniques. At this point we can bring out the convolutional neural network (CNN) that the questioner mentioned. In a CNN, not all neurons in adjacent layers are directly connected; instead they are connected through a "convolution kernel" acting as an intermediary. The same kernel is shared across the whole image, and the image retains its original spatial relationships after the convolution operation. A schematic of the convolutional connection between two layers is shown below:

 

Figure 3: A hidden layer of a convolutional neural network (from the Theano tutorial)

Let us illustrate the structure of a convolutional neural network with a simple example. Suppose layer m-1 in Figure 3 is the input layer and we need to recognize a color image with four channels, ARGB (alpha plus red, green, and blue, corresponding to four images of the same size). Suppose the convolution kernel size is 100*100 and that 100 kernels, w1 through w100, are used (intuitively, each kernel should learn a different structural feature). Convolving w1 over the ARGB image gives the first image of the hidden layer; the top-left pixel of this hidden-layer image is the weighted sum of the pixels in the 100*100 top-left region of the four input images, and so on. Counting the other kernels in the same way, the hidden layer corresponds to 100 "images", each of which is a response to a different feature of the original image. The network continues to propagate forward in this fashion. CNNs also include operations such as max-pooling that further improve robustness.
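A minimal sketch of that example in NumPy (not the author's code; the sizes below are scaled way down from the 100*100 kernels in the text so it runs in a blink): each hidden "image" comes from sliding one shared kernel over all input channels and taking a weighted sum of each patch.

```python
import numpy as np

# Scaled-down version of the example: 4 input channels (ARGB),
# 8 kernels instead of 100, kernel size 5*5 instead of 100*100.
in_channels, n_kernels, ksize = 4, 8, 5
image = np.random.rand(in_channels, 32, 32)             # 4 channels, 32x32 pixels
kernels = np.random.rand(n_kernels, in_channels, ksize, ksize)

out_h = image.shape[1] - ksize + 1                       # "valid" convolution
out_w = image.shape[2] - ksize + 1
hidden = np.zeros((n_kernels, out_h, out_w))

for k in range(n_kernels):                               # one hidden "image" per kernel
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum over a ksize*ksize patch of *all* input channels,
            # exactly as the top-left hidden pixel is described in the text.
            patch = image[:, i:i + ksize, j:j + ksize]
            hidden[k, i, j] = np.sum(patch * kernels[k])

print(hidden.shape)   # (8, 28, 28): one feature map per shared kernel
```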

 

Figure 4: A typical convolutional neural network structure; note that the last layer is actually a fully connected layer (from the Theano tutorial)

In this example, notice that the number of parameters from the input layer to the hidden layer drops at a stroke to 100*100*100 = 10^6! This makes it possible to obtain a good model from the training data we already have. The suitability for image recognition that the questioner mentions is precisely due to the CNN limiting the number of parameters and exploiting local structure. Following the same line of thought, by exploiting local information in the speech spectrogram, CNNs can just as well be applied to speech recognition.
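A quick back-of-the-envelope check of the two parameter counts quoted above (a sketch; biases are ignored and, as in the text, the channel dimension is left out of the convolutional count):

```python
# Fully connected: every input pixel connects to every hidden unit.
input_pixels = 1000 * 1000          # 1K * 1K image
hidden_units = 1_000_000            # 1M hidden nodes
fc_weights = input_pixels * hidden_units
print(f"fully connected: {fc_weights:.0e} weights")    # 1e+12

# Convolutional: weights live only in the shared kernels.
n_kernels, ksize = 100, 100
conv_weights = n_kernels * ksize * ksize               # 100 * 100 * 100
print(f"convolutional:   {conv_weights:.0e} weights")  # 1e+06
```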

The fully connected DNN has another problem: it cannot model changes along a time sequence. However, the order in which samples appear in time is very important for applications such as natural language processing, speech recognition, and handwriting recognition. To meet this need, the other network structure the questioner mentioned appeared: the recurrent neural network (RNN).

In an ordinary fully connected network or CNN, the signals of each layer's neurons can only propagate upward to the next layer, and the processing of samples is independent at each time step; such networks are therefore also called feed-forward neural networks. In an RNN, however, a neuron's output can act directly on itself at the next time step: the input of a layer-i neuron at time m includes not only the output of the layer (i-1) neurons at that time, but also its own output at time (m-1)! Drawn as a diagram, it looks like this:

 

Figure 5: RNN network structure

We can see that interconnections have been added between the hidden-layer nodes. For convenience of analysis, we often unroll the RNN in time, obtaining the structure shown in Figure 6:

 

Figure 6: The RNN unrolled in time

Cool: the final output of the network at time (t+1), O(t+1), is the joint result of the input at that moment and all of the history! This achieves the goal of modeling a time sequence.
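A minimal sketch of that recurrence in NumPy (the layer sizes and sequence length are made up): the output at each step depends on the current input and, through the hidden state, on everything that came before.

```python
import numpy as np

n_in, n_hidden, n_out, T = 3, 5, 2, 10                   # made-up sizes and sequence length
rng = np.random.default_rng(0)
W_xh = rng.standard_normal((n_hidden, n_in)) * 0.1       # input  -> hidden
W_hh = rng.standard_normal((n_hidden, n_hidden)) * 0.1   # hidden -> hidden (the recurrent link)
W_ho = rng.standard_normal((n_out, n_hidden)) * 0.1      # hidden -> output

h = np.zeros(n_hidden)                   # h(0): no history yet
xs = rng.standard_normal((T, n_in))      # an input sequence x(1)..x(T)

for t in range(T):
    # h(t) mixes the current input with h(t-1), which already summarizes x(1)..x(t-1).
    h = np.tanh(W_xh @ xs[t] + W_hh @ h)
    o = W_ho @ h                         # O(t) therefore depends on the whole history
print(o)
```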

Has the questioner noticed that an RNN can be seen as a neural network that propagates through time, whose depth is the length of the sequence? As we said above, the "vanishing gradient" phenomenon appears again, only this time along the time axis. For time t, the gradient it produces vanishes after propagating back only a few steps into the history and simply cannot influence the distant past. So the earlier claim that "all of the history" acts together is only the ideal case; in practice this influence lasts for just a handful of time steps.

To solve vanishing gradients in time, the machine learning field developed the long short-term memory unit (LSTM), which uses the opening and closing of gates to implement memory over time and keep the gradient from vanishing. An LSTM unit looks like this:

 

Figure 7: What an LSTM unit looks like
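A minimal sketch of one LSTM step in NumPy (the standard gate formulation, with made-up sizes; not tied to any particular diagram or library): the input, forget, and output gates decide what is written into, kept in, and read out of the cell state, and it is this gated cell state that lets useful gradients survive over many time steps.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the parameters of the four internal
    transforms: input gate (i), forget gate (f), output gate (o), candidate (g)."""
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])   # input gate: what to write
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])   # forget gate: what to keep
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])   # output gate: what to expose
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])   # candidate cell content
    c = f * c_prev + i * g          # cell state: gated copy of the past plus gated new input
    h = o * np.tanh(c)              # hidden output read from the cell
    return h, c

# Tiny usage example with made-up sizes.
n_in, n_hidden = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((n_hidden, n_in)) * 0.1 for k in "ifog"}
U = {k: rng.standard_normal((n_hidden, n_hidden)) * 0.1 for k in "ifog"}
b = {k: np.zeros(n_hidden) for k in "ifog"}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.standard_normal((5, n_in)):     # a length-5 input sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h)
```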

Besides the three networks the questioner is puzzled about, and the deep residual learning and LSTM mentioned earlier, deep learning has many other structures. For example, since an RNN can inherit information from the past, could it also absorb some information from the future? In sequence signal analysis, if I could foresee the future, that would surely help recognition as well. Hence the bidirectional RNN and bidirectional LSTM, which use historical and future information at the same time.

 

Figure 8: A bidirectional RNN
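A minimal sketch of the bidirectional idea in NumPy (reusing the simple tanh recurrence from the earlier RNN sketch; all sizes are made up): one pass reads the sequence forward, another reads it backward, and concatenating the two hidden states gives every time step a view of both the past and the future.

```python
import numpy as np

def rnn_pass(xs, W_xh, W_hh):
    """Run a simple tanh RNN over xs and return the hidden state at every step."""
    h = np.zeros(W_hh.shape[0])
    hs = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h)
        hs.append(h)
    return np.stack(hs)

n_in, n_hidden, T = 3, 4, 6
rng = np.random.default_rng(0)
xs = rng.standard_normal((T, n_in))

# Two independent RNNs: one reads the sequence forward, one backward.
fw = rnn_pass(xs,       rng.standard_normal((n_hidden, n_in)) * 0.1,
                        rng.standard_normal((n_hidden, n_hidden)) * 0.1)
bw = rnn_pass(xs[::-1], rng.standard_normal((n_hidden, n_in)) * 0.1,
                        rng.standard_normal((n_hidden, n_hidden)) * 0.1)[::-1]

# At each time step, the combined state sees both the past (fw) and the future (bw).
states = np.concatenate([fw, bw], axis=1)     # shape (T, 2 * n_hidden)
print(states.shape)
```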

In fact, whichever network it is, they are often used in combination in practical applications; for example, CNNs and RNNs are usually followed by fully connected layers before the final output, so it is hard to say which category a given network really belongs to. It is not hard to imagine that as the deep learning craze continues, more flexible combinations and more network structures will be developed. Although they seem endlessly varied, the researchers' starting point is always to solve a specific problem. If the questioner wants to do research in this area, it is worth carefully analyzing the characteristics of each of these structures and the means by which they achieve their goals. For getting started, you can refer to:

Ng's UFLDL: UFLDL Tutorial - Ufldl

You can also look at the tutorials bundled with Theano; the examples are very concrete: Deep Learning Tutorials

Everyone is welcome to keep recommending additions.

Of course, if the questioner just wants to join in the fun and be trendy for a bit, or get a rough idea to impress people with later, then a skim like this will do.

References:

[1] Bengio Y. Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, 2009, 2(1):1-127.

[2] Hinton G E, Salakhutdinov R R. Reducing the Dimensionality of Data with Neural Networks. Science, 2006, 313(5786):504-507.

[3] He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. arXiv:1512.03385, 2015.

[4] Srivastava R K, Greff K, Schmidhuber J. Highway Networks. arXiv:1505.00387, 2015.


Source: www.cnblogs.com/jinanxiaolaohu/p/11933032.html