What is the impact of network depth on the performance of deep learning models?

Original Address: https://www.jianshu.com/p/72481d794a6b

Hello everyone, this is the second article in the column "AI perplexed territory," on the relationship between model depth and model performance.

Entering this realm means starting to take the initiative, and in this state you need to think independently. If learning is a process that runs from imitation, to following, to creation, then at this stage we should leap past the imitation and following stages and enter the stage of creation. From this realm onward, the questions discussed may have no definitive answers; what we aim to do together, above all, is stimulate thinking.

Author & editor | 言有三 (Yan Yousan)

Sufficient network depth has been key to the success of deep learning models on a wide variety of tasks. But is it true that the deeper the model, the better its performance?

1 Why does deepening improve performance?

In a 2007 article [1], Bengio and LeCun wrote: "We claim that most functions that can be represented compactly by deep architectures cannot be represented by a compact shallow architecture." In plain terms: most functions that a deep architecture represents compactly cannot be represented by any compact shallow architecture.

To solve more complex problems, you must increase either depth or width, and the cost of increasing width is often much higher than that of increasing depth.
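To make that cost asymmetry concrete, here is a quick back-of-the-envelope calculation on a toy fully connected network (my own illustration with arbitrary widths, not an example from the article):

```python
def mlp_params(widths):
    """Total weights + biases of a fully connected net with these layer widths."""
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths, widths[1:]))

base   = mlp_params([512, 512, 512, 10])       # two hidden layers of width 512
deeper = mlp_params([512, 512, 512, 512, 10])  # one extra hidden layer
wider  = mlp_params([512, 1024, 1024, 10])     # hidden width doubled instead

# Deepening adds parameters roughly linearly (one more n*n block),
# while widening grows each n*n block quadratically.
print(base, deeper, wider)  # 530442 793098 1585162
```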

Ronen Eldan et al. even constructed a simple function computable by a small 3-layer network that no 2-layer network of reasonable width can represent. In short, a certain amount of depth is necessary.

So as models get deeper, what exactly are the benefits?

1.1, Better ability to fit features.

The main modules of today's deep learning architectures are convolution, pooling, and activation, the standard combination of linear and non-linear transformations. The deeper the model, the stronger its non-linear expressive power: it can learn more complex transformations and thus fit more complex input features.

Look at the comparison below from [2]: the solid line is a model with a single layer of 20 neurons, and the dashed line is a model with two layers of 10 neurons each. As the figure shows, the two-layer network fits the target better, and the same trend applies to deeper networks.

[Figure from [2]: fit of a one-layer, 20-neuron model (solid line) vs. a two-layer, 10-neurons-per-layer model (dashed line)]
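As a minimal sketch of this comparison (my own toy reproduction, with an assumed target curve and training settings, not the authors' code), the snippet below fits both architectures to the same 1-D function:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-3, 3, 256).unsqueeze(1)
y = torch.sin(3 * x) * x  # assumed toy target curve

def fit(model, steps=3000):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# One hidden layer of 20 neurons vs. two hidden layers of 10 neurons each:
one_layer = nn.Sequential(nn.Linear(1, 20), nn.ReLU(), nn.Linear(20, 1))
two_layer = nn.Sequential(nn.Linear(1, 10), nn.ReLU(),
                          nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 1))
print("one-layer MSE:", fit(one_layer))
print("two-layer MSE:", fit(two_layer))
```

With the same total number of hidden neurons (20), the two-layer model typically reaches a lower error on a wiggly target, mirroring the figure.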

1.2, Deeper networks let each layer do something simpler.

In a deep network, each layer has its own duty. Let's use the classic ZFNet deconvolution visualizations to see what each layer of a network learns.

The first layer learns edges, the second learns simple shapes, the third begins to learn the shapes of targets, and deeper layers learn ever more complex representations. If there were only a single layer, it would have to learn a very complex transformation in one step, which is very hard to do.

[Figures: ZFNet deconvolution visualizations of the features learned at successive layers]
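If you want to peek at this yourself, the quick sketch below visualizes the first-layer convolution filters of a pretrained network; I use torchvision's AlexNet as a stand-in, since ZFNet (essentially a tuned AlexNet) isn't bundled with torchvision:

```python
import torchvision.models as models
import matplotlib.pyplot as plt

# AlexNet as a stand-in for ZFNet; its first conv layer has 64 filters of 3x11x11.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
filters = model.features[0].weight.data.clone()

# Normalize each filter to [0, 1] so the edge/color detectors are visible.
fmin = filters.amin(dim=(1, 2, 3), keepdim=True)
fmax = filters.amax(dim=(1, 2, 3), keepdim=True)
filters = (filters - fmin) / (fmax - fmin)

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for ax, f in zip(axes.flat, filters):
    ax.imshow(f.permute(1, 2, 0).numpy())
    ax.axis("off")
plt.show()
```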

2 How to quantitatively assess the relationship between depth and model performance

The above are the two main benefits of deepening a network: stronger feature-learning ability and a layer-by-layer division of labor.

In theory a 2-layer network can fit any bounded continuous function, but only at an enormous width, which is unrealistic in practice; that is why we use deep networks.

We know that, within limits, a deeper model is a better model. But how do we measure the relationship between depth and capability with quantitative indices? There are two kinds of schemes: direct methods and indirect methods.

Direct methods define theoretical indices to analyze network capacity; indirect methods compare metrics such as accuracy across a series of tasks.

2.1, Direct methods

Early work on shallow networks assessed capability through function approximation, comparisons with Boolean circuits, the VC dimension of the network, and so on, but these tools do not transfer well to deep networks.

Currently a good line of research for directly assessing network capability is counting linear regions. A neural network with piecewise-linear activations computes a piecewise linear function, and fitting a curve perfectly would require infinitely many linear regions. The more linear regions a network has, the more flexible it is.

[Figure: a piecewise linear function approximating a curve; each segment corresponds to one linear region]
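To make "linear regions" tangible, here is a small sketch of my own (not code from the cited papers) that counts the regions a random ReLU network carves a 1-D input interval into, by detecting where the pattern of active units changes:

```python
import numpy as np

rng = np.random.default_rng(0)

def count_linear_regions_1d(widths, lo=-5.0, hi=5.0, samples=100_000):
    """Approximately count linear regions of a random ReLU MLP on [lo, hi]
    by tracking the joint ReLU activation pattern along a dense grid."""
    h, n_in = np.linspace(lo, hi, samples).reshape(-1, 1), 1
    patterns = []
    for n_out in widths:
        W = rng.standard_normal((n_in, n_out))
        b = rng.standard_normal(n_out)
        pre = h @ W + b
        patterns.append(pre > 0)            # which units are active at each x
        h, n_in = np.maximum(pre, 0), n_out
    pattern = np.concatenate(patterns, axis=1)
    # A new linear region starts wherever the activation pattern changes.
    return int(np.any(pattern[1:] != pattern[:-1], axis=1).sum()) + 1

print("one layer, 20 units :", count_linear_regions_1d([20]))
print("two layers, 10 each :", count_linear_regions_1d([10, 10]))
```

With random weights the counts fluctuate from run to run; it is the theoretical maxima, discussed next, that separate shallow from deep architectures.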

Yoshua Bengio's group accordingly proposed measuring a model's flexibility by the number of its linear regions. A deeper network can divide the input space into far more linear response regions; its capacity is an exponential multiple of a shallow network's.

For a network with $n_0$ inputs, $n$ outputs, and a single hidden layer of $kn$ units, the maximal number of linear regions is:

$$\sum_{j=0}^{n_0} \binom{kn}{j}$$

For a multi-layer network with just as many parameters — $n_0$ inputs, $n$ outputs, and $k$ hidden layers of $n$ units each — the maximal number is:

$$\left\lfloor \frac{n}{n_0} \right\rfloor^{k-1} \sum_{j=0}^{n_0} \binom{n}{j}$$

Because $n_0$ is usually small, the number of regions of the multi-layer network is an exponential multiple (through the exponent $k$) of the single-layer network's. The counting is done with computational geometry; see paper [3] for the details.
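Plugging numbers into the two expressions above shows the gap (the equations were reconstructed from [3] for this repost, so treat the exact values as illustrative):

```python
from math import comb, floor

def shallow_max_regions(n0, n, k):
    # Single hidden layer of k*n units: hyperplane-arrangement (Zaslavsky) bound.
    return sum(comb(k * n, j) for j in range(n0 + 1))

def deep_max_regions(n0, n, k):
    # k hidden layers of width n: achievable-regions bound as given in [3].
    return floor(n / n0) ** (k - 1) * sum(comb(n, j) for j in range(n0 + 1))

n0, n = 2, 16
for k in (1, 2, 4, 8):
    print(f"k={k:>2}: shallow {shallow_max_regions(n0, n, k):>8,}"
          f"   deep {deep_max_regions(n0, n, k):>14,}")
# The deep count grows exponentially in k; the shallow count only polynomially.
```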

Beyond this there are other research directions, such as the Betti numbers used by Monica Bianchini et al. [4] and the trajectory length proposed by Maithra Raghu et al. [5].

Although these indices carry little weight in engineering practice, and may not even always be valid, they provide theoretical guidance for understanding the relationship between depth and model performance.

2.2, Indirect methods

Indirect methods simply present experimental results. Deepening a network improves model performance; this can be verified on almost every classic network. Comparing across different model families may be unfair, so let's get a feel for it within single families, looking at the VGG series and the ResNet series. All results are taken from the original papers.

[Tables: results of the VGG series and the ResNet series at different depths, taken from the original papers]

Within a certain range, the deeper the network, the better the performance indeed is.

 

3 Is deeper always better?

We said above that depth can improve model performance to a certain extent, but that does not mean the deeper the better. Let's look at this from two angles: performance gains and optimization.

3.1, Optimization problems brought by depth

Why has ResNet been so successful? Because it made the training of very deep neural networks feasible. Techniques such as good initialization and BN layers also help train deeper networks, but on their own they rarely break through 30 layers.

VGGNet has 19 layers, GoogLeNet 22, and MobileNet 28; among the classic networks, the ones beyond 30 layers are essentially the ResNet family, with ResNet50 and ResNet152 the common variants. This is partly because, with the ImageNet competition winding down, people began to pursue more efficient and practical models, but it is also a matter of training difficulty.

The gradient instability and network degradation that come with depth are always present; they can be alleviated but not eliminated. So it is entirely possible that as a network gets deeper, its performance starts to drop instead.
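This is exactly the problem skip connections attack. Below is a minimal sketch of a basic residual block (a simplified rendition of the idea, not the exact ResNet code): the identity shortcut gives gradients a direct path backward, and the stacked layers only need to learn a residual correction.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Simplified residual block: out = relu(F(x) + x).
    The identity shortcut keeps gradients flowing in very deep stacks."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(self.body(x) + x)  # F(x) plus identity shortcut

x = torch.randn(1, 64, 32, 32)
block = BasicResidualBlock(64)
print(block(x).shape)  # torch.Size([1, 64, 32, 32])
```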

3.2, Saturation as the network deepens

Deeper is not always better; below we demonstrate this with a few experiments. Researchers have already run plenty of experiments on public datasets such as ImageNet, so we chose two other datasets and two models.

The first dataset is the GHIM dataset; the second, Place20, is a selection of 20 categories from the Places data. One is fairly easy, the other fairly hard.

The first model is a simple convolution-plus-activation model; the second is the MobileNet model.

First let's look at the baseline structure of the first model. It contains 5 convolutional layers and one fully connected layer, so we call it allconv6, a convolutional network of depth 6.

[Table: the allconv6 baseline architecture, 5 convolutional layers plus 1 fully connected layer]
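Since the configuration table above is an image, here is a hedged reconstruction of what an allconv6-style network could look like; the channel counts and strides are my assumptions, not the article's exact settings:

```python
import torch.nn as nn

def conv_block(c_in, c_out, stride):
    # the article's basic unit: convolution + activation
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1),
        nn.ReLU(inplace=True),
    )

# allconv6: 5 conv layers + 1 fully connected layer.
# Channels/strides below are assumed for illustration, not the paper's table.
allconv6 = nn.Sequential(
    conv_block(3, 16, 2),
    conv_block(16, 32, 2),
    conv_block(32, 64, 2),
    conv_block(64, 128, 2),
    conv_block(128, 256, 2),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(256, 20),  # assuming 20 classes, as in the Place20 setup
)
```

The deeper variants (allconv7, allconv8) would insert additional conv_block layers, and allconv5 would drop one.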

Next we experiment with various configurations, from depth 5 up to depth 8. Below are the stride and channel settings for every layer.

[Table: per-layer stride and channel configurations for allconv5 through allconv8]

Now the results. All models were optimized with the same set of hyperparameters, which had been tuned; for reasons of space we skip the details.

[Figure: results of allconv5 through allconv8 on the GHIM dataset]

As you can see, performance does not drop as the network deepens, but it barely improves either. allconv5 is clearly worse, and depth is surely one of the factors.

We can also add a BN layer after every convolutional layer and rerun the experiment. The results are below: allconv7_1 and allconv8_1 perform comparably and both clearly beat allconv6, giving the same conclusion as before.

[Figure: results of the allconv variants with BN layers on the GHIM dataset]

So how do things look on a more complex dataset? Below are the results on Place20, where the pattern is even clearer.

[Figure: results of allconv5 through allconv8 on Place20]

allconv5 and allconv6 are clearly worse than allconv7 and allconv8, while allconv7 and allconv8 perform comparably. So within the allconv family, once depth reaches allconv7, simply stacking more depth brings little further improvement.

Next let's look at how MobileNets of different depths perform on these two datasets; the original MobileNet is a 28-layer structure.
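MobileNet's 28 layers come from stacking depthwise separable convolutions, so varying its depth amounts to stacking more or fewer of these blocks. Here is a minimal sketch of the block (the article does not spell out its exact truncation scheme, so this shows only the building unit):

```python
import torch.nn as nn

def depthwise_separable(c_in, c_out, stride=1):
    """MobileNet building block: depthwise 3x3 conv, then pointwise 1x1 conv.
    Each block contributes two conv layers to the network's total depth."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, stride=stride, padding=1,
                  groups=c_in, bias=False),        # depthwise
        nn.BatchNorm2d(c_in),
        nn.ReLU(inplace=True),
        nn.Conv2d(c_in, c_out, 1, bias=False),     # pointwise
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )
```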

The results of MobileNets of different depths on the GHIM dataset are as follows:

[Figure: results of MobileNets of different depths on the GHIM dataset]

You can see that once the model reaches around 16 layers, performance essentially saturates.

The results of MobileNets of different depths on the Place20 dataset are as follows:

[Figure: results of MobileNets of different depths on Place20]

Compared with the GHIM results, the gains from depth are more noticeable here, but they too gradually saturate.

This problem is unavoidable; there is no such thing as endless gains from endless depth. How to choose the right depth is still an open question, and for now it can only be settled by more experiments.

Beyond this, deepening a model can also degrade the learning ability of some shallow layers, limiting what the deep network learns; this is an important reason why structures such as skip connections are effective.

That's all for now on how network depth affects model performance.

[1] Bengio Y, LeCun Y. Scaling learning algorithms towards AI[J]. Large-scale kernel machines, 2007, 34(5): 1-41.

[2] Montufar G F, Pascanu R, Cho K, et al. On the number of linear regions of deep neural networks[C]//Advances in neural information processing systems. 2014: 2924-2932.

[3] Pascanu R, Montufar G, Bengio Y. On the number of response regions of deep feed forward networks with piece-wise linear activations[J]. arXiv preprint arXiv:1312.6098, 2013.

[4] Bianchini M, Scarselli F. On the complexity of neural network classifiers: A comparison between shallow and deep architectures[J]. IEEE transactions on neural networks and learning systems, 2014, 25(8): 1553-1565.

[5] Raghu M, Poole B, Kleinberg J, et al. On the expressive power of deep neural networks[C]//Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org, 2017: 2847-2854.

 

Summary

Deep learning carries "deep" in its very name, which shows how important model depth is. This time we covered why depth improves a model, how to quantitatively assess depth's contribution to model performance, and the problems encountered as networks get deeper.


