Deep Learning Introductory Notes (xx): Classic Neural Networks (LeNet-5, AlexNet and VGGNet)

Column: Deep Learning Introductory Notes

Disclaimer

1) This article organizes material generously shared online by experts in machine learning; see the cited references for the specific sources.
2) This article is for academic, non-commercial use only, so individual passages do not carry detailed citations. If any part inadvertently infringes on someone's interests, please forgive us and contact the blogger to have it removed.
3) The blogger's knowledge is limited; if anything in the article is inappropriate, please point it out so we can improve together. Thank you.
4) This is a first version; mistakes will be revised and content added or removed over time. Corrections are very welcome. If everyone shares a little, together we can help advance the country's scientific research.

Deep Learning Introductory Notes (xx): Classic Neural Networks (LeNet-5, AlexNet and VGGNet)

1. Why look at case studies?

The fastest way to get an intuitive feel for how the building blocks of a network (convolutional layers, pooling layers, and fully connected layers) fit together is to analyze examples of convolutional neural networks. Just as many people learn programming by reading other people's code, studying cases that others have built from effective components is a good way to learn. It also turns out that a network architecture that performs well on one computer vision task often works well on other tasks: for example, if someone has already trained a neural network that is good at recognizing cats, dogs, and people, that architecture can be borrowed for a different computer vision task, such as recognition for autonomous vehicles.

Following this outline, you should then be able to read some computer vision research papers, such as those describing these classic networks:

  • LeNet-5, the first-generation handwritten digit recognition network, which as I recall dates from the 1980s;
  • The frequently cited AlexNet, which kicked off a new wave of enthusiasm for deep learning;
  • The VGG network, which revealed how much increasing network depth improves performance;
  • Then ResNet, also known as the residual network;
  • And an analysis of Google's Inception network as a worked example.

After getting to know these networks, I believe you will have a much better sense of how to build an effective convolutional neural network! Many of their components are useful in their own right: even if computer vision is not your main direction, you will find good ideas in examples such as ResNet and Inception.

2. Classic networks

The classic neural network architectures covered in these notes are:

  • LeNet-5
  • AlexNet
  • VGGNet

Let's get started.


1)LeNet-5

First, let's look at the LeNet-5 network structure.

Paper: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf

Suppose you have a 32×32×1 image; LeNet-5 can recognize the handwritten digit in it, for example a 7. LeNet-5 was trained on grayscale images, which is why the input is only 32×32×1. The first layer of LeNet-5 uses 6 filters of size 5×5 with a stride of 1 and no padding, so the output is 28×28×6. Next comes a pooling operation; when the paper was written people preferred average pooling, whereas today max pooling would more likely be used. Either way, the pooling filter is 2×2 with a stride of 2, so the height and width of the image are halved and the output is a 14×14×6 volume.
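These sizes follow from the standard formula for the output of a convolution (or pooling) layer with input size $n$, filter size $f$, padding $p$, and stride $s$:

$$n_{\text{out}} = \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1$$

For the first convolution, $(32 + 2\cdot 0 - 5)/1 + 1 = 28$; for the 2×2, stride-2 pooling, $(28 - 2)/2 + 1 = 14$.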
The next layer is a convolutional layer with a set of 16 filters of size 5×5, so the output has 16 channels. The LeNet-5 paper was written in 1998, when padding was not used; people always used valid convolutions, which is why the height and width shrink after every convolution: the image is reduced from 14×14 to 10×10. Then another pooling layer halves the height and width again, giving an output of 5×5×16. Multiplying all these numbers together gives 400.

The next layer is a fully connected layer: the 400 nodes are fully connected to 120 neurons, giving one fully connected layer. This is followed by a second, smaller fully connected layer with 84 units, so there are two fully connected layers in total.

The final step uses these 84 features to produce the output: one last node predicts $\hat{y}$, where $\hat{y}$ has 10 possible values corresponding to the digits 0-9. A modern version of this network would use a softmax function at the output; at the time, LeNet-5 used a different classifier in the output layer that is rarely seen today.

By modern standards LeNet-5 is a small network, with only about 60,000 parameters; today it is common to see networks with ten million to a hundred million parameters, a thousand times larger than this one. In any case, moving from left to right through the network, the height and width of the image keep shrinking, from the initial 32×32 to 28×28, then 14×14, 10×10, and finally just 5×5, while as the network gets deeper the number of channels keeps increasing, from 1 to 6 and then to 16.
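To make the layer-by-layer shapes concrete, here is a minimal PyTorch sketch of a LeNet-5-style network. It follows the modern variant described above (max pooling, ReLU, and a softmax-style output) rather than the original 1998 implementation:

```python
import torch
import torch.nn as nn

# Minimal LeNet-5-style sketch (modern variant: max pooling + ReLU,
# logits for a softmax output; not the original implementation).
class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, stride=1),   # 32x32x1 -> 28x28x6
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),       # -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5, stride=1),   # -> 10x10x16
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),       # -> 5x5x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                 # 5*5*16 = 400
            nn.Linear(400, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),                   # logits for softmax
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Quick shape check on a dummy grayscale image
x = torch.randn(1, 1, 32, 32)
print(LeNet5()(x).shape)  # torch.Size([1, 10])
```

Counting the weights of these layers gives roughly the 60,000 parameters mentioned above.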
For those who want to try reading the paper, a few additional points:

  • The next few points are intended for those who plan to read the classic papers, and go into a bit more depth.
  • If you do not plan to read the papers, you can skip them.
  • Of course, you can also treat them as a short review of neural network history.
  • Finally, it is fine if you do not understand everything; take it slowly.

If you read the classic LeNet-5 paper carefully, you will find that people then used the sigmoid and tanh functions rather than ReLU. The network structure is also unusual in that some layers are only partially connected to each other, which looks quite strange today. For example, computers were very slow at the time, so to reduce the amount of computation and the number of parameters, the classic LeNet-5 used a rather complicated connection scheme; those intricate details are discussed in the paper but are generally no longer used.

2)AlexNet

The second classic network is AlexNet, named after the paper's first author, Alex Krizhevsky; the other two co-authors are Ilya Sutskever and Geoffrey Hinton. Who knows, if you work hard, maybe one day a network architecture will be named after you too.

Paper: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
The input is a 227×227×3 image. The original paper actually uses 224×224×3, but if you try to work through the dimensions you will find that 227×227 fits better.

The first layer uses 96 filters of size 11×11 with a stride of 4, so the spatial dimensions shrink to 55×55, roughly a factor-of-4 reduction. Then a max pooling layer with a 3×3 filter ($f = 3$) and a stride of $s = 2$ reduces the volume to 27×27×96.

Next comes a 5×5 same convolution; with padding, the output is 27×27×256. Another max pooling then shrinks it to 13×13.

Then another same convolution, with the same padding, gives 13×13×384, i.e. 384 filters.

One more same convolution follows, then another just like it, and finally a max pooling layer reduces the volume to 6×6×256.

6×6×256 equals 9216, so the volume is unrolled into 9216 units, followed by several fully connected layers.

Finally, a softmax function outputs the recognition result: which of 1000 possible object classes the image belongs to.

In fact, you can see that this network has a lot in common with LeNet, but AlexNet is much larger: as mentioned earlier, LeNet-5 has roughly 60,000 parameters, while AlexNet has about 60 million. AlexNet builds on very similar basic building blocks, but with far more hidden units and trained on far more data, and that is where it shines.
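For reference, here is a rough PyTorch sketch that follows the AlexNet dimensions walked through above. Dropout, local response normalization, and the original two-GPU split are omitted, so this is an illustrative approximation rather than the paper's exact implementation:

```python
import torch
import torch.nn as nn

# AlexNet-style sketch following the shapes in the walkthrough above.
class AlexNetSketch(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),    # 227x227x3 -> 55x55x96
            nn.MaxPool2d(kernel_size=3, stride=2),                     # -> 27x27x96
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),   # -> 27x27x256 (same conv)
            nn.MaxPool2d(kernel_size=3, stride=2),                     # -> 13x13x256
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),  # -> 13x13x384
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),  # -> 13x13x384
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),  # -> 13x13x256
            nn.MaxPool2d(kernel_size=3, stride=2),                     # -> 6x6x256
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                        # 6*6*256 = 9216
            nn.Linear(9216, 4096), nn.ReLU(),
            nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),        # logits for softmax over 1000 classes
        )

    def forward(self, x):
        return self.classifier(self.features(x))

x = torch.randn(1, 3, 227, 227)
print(AlexNetSketch()(x).shape)  # torch.Size([1, 1000])
```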

Another reason AlexNet performs so much better than LeNet is that it uses the ReLU activation function.
As before, some of what follows is fairly deep in the weeds; if you do not plan to read the paper, it is fine to skip it.

First, when this paper was written GPUs were still relatively slow, so AlexNet used a rather elaborate scheme to train on two GPUs: roughly speaking, the layers were split across two GPUs, with a dedicated mechanism for the two GPUs to communicate with each other.

The paper also mentions that the classic AlexNet architecture contains another type of layer, called a Local Response Normalization (LRN) layer. This kind of layer is rarely applied nowadays, and hardly anyone uses it any more, so it is not covered in detail here. The basic idea of local response normalization is this: take one block of the network, say 13×13×256; LRN picks a position, runs through the channels at that position to get 256 numbers, and normalizes them.
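As an illustration only, PyTorch ships a local response normalization layer that normalizes each activation by a sum over a window of neighboring channels at the same spatial position. The hyperparameter values below are placeholders, not necessarily the ones used in the AlexNet paper:

```python
import torch
import torch.nn as nn

# Local response normalization across neighboring channels at each position.
# size/alpha/beta/k below are illustrative placeholder values.
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

x = torch.randn(1, 256, 13, 13)   # a 13x13x256 block, as in the example above
y = lrn(x)
print(y.shape)                     # torch.Size([1, 256, 13, 13])
```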
You might ask: why do local response normalization at all?

The intuition is that for each position in this 13×13 image we may not want too many neurons with very high activations. Later, however, many researchers found that LRN does not help much; it is one of the ideas that has since been crossed out, and LRN is no longer used when training networks.

3)VGGNet

The third and final example in these notes is VGG, also called the VGG-16 network (VGG comes in 16- and 19-layer versions, but people usually mean the 16-layer one).

Paper: https://arxiv.org/pdf/1409.1556.pdf

One point worth noting is that VGG-16 is a very deep network, but one of its big advantages is that it simplifies the architecture: there are not that many hyperparameters, and it is a simple network in which you only need to focus on building convolutional layers.
The input image is 224×224×3. The first convolution produces a 224×224×64 feature map, followed by another 224×224×64 layer, giving two convolutional layers of depth 64; that is, two convolutions are performed, both with 3×3 filters, a stride of 1, and same padding.

Next comes a pooling layer, which compresses the input. From 224×224×64 it shrinks to, that's right, 112×112×64.

Then several more convolutional layers follow, using 128 filters and same convolutions, giving an output of 112×112×128.

Pooling again; you can work out that the result after pooling is 56×56×128.

Next come three convolutions with 256 filters each, then pooling, then three more convolutions, then pooling again. After a few more rounds like this, the final 7×7×512 feature map is fed into fully connected layers of 4096 units, and a softmax activation outputs the result over 1000 object classes.
By the way, the 16 in VGG-16 refers to the fact that the network has 16 layers with weights (convolutional and fully connected layers). It really is a large network, with roughly 138 million parameters in total, which is big even by today's standards, yet the structure of VGG-16 is not complicated, which is very appealing. The architecture is also very regular: a few convolutional layers are always followed by a pooling layer that compresses the image, shrinking its height and width.
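Here is a minimal PyTorch sketch of that regular block structure, just to illustrate the "several same convolutions followed by a pooling layer" pattern; it is a simplified reconstruction under the shapes described above, not the authors' released model:

```python
import torch
import torch.nn as nn

# One VGG block: num_convs 3x3 same convolutions, then a 2x2 max pool
# that halves the height and width.
def vgg_block(in_ch, out_ch, num_convs):
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU()]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

class VGG16Sketch(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            vgg_block(3, 64, 2),     # 224x224x3 -> 112x112x64
            vgg_block(64, 128, 2),   # -> 56x56x128
            vgg_block(128, 256, 3),  # -> 28x28x256
            vgg_block(256, 512, 3),  # -> 14x14x512
            vgg_block(512, 512, 3),  # -> 7x7x512
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                        # 7*7*512 = 25088
            nn.Linear(25088, 4096), nn.ReLU(),
            nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),        # logits for softmax
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = VGG16Sketch()
x = torch.randn(1, 3, 224, 224)
print(model(x).shape)                                  # torch.Size([1, 1000])
print(sum(p.numel() for p in model.parameters()))      # ~138 million parameters
```

Counting the weighted layers in this sketch gives 13 convolutional layers plus 3 fully connected layers, the 16 layers the name refers to.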

At the same time, the number of filters in the convolutional layers follows a clear pattern: it doubles from 64 to 128, then to 256 and 512. The authors probably felt 512 was large enough, so the later layers stop doubling. In any case, doubling at each step, or doubling the number of filters for each group of convolutional layers, is another simple principle behind this design. This relatively consistent architecture is very attractive to researchers; its main drawback is the huge number of parameters that have to be trained.

Some articles also describe the VGG-19 network, which is even larger than VGG-16, but because VGG-16 performs almost as well as VGG-19, many people still use VGG-16. One more thing, and the thing I like most about it: the paper shows that as the network gets deeper, the height and width of the image shrink at a regular rate, halving exactly after each pooling layer, while the number of channels keeps growing, doubling exactly after each group of convolutions. In other words, the rate at which the image shrinks and the rate at which the channel count grows both follow a pattern, making for a very neat, almost symmetric structure. From that point of view the paper really is appealing.

3. Summary

Those are the three classic network architectures. If you are interested in the papers, the suggested reading order is to start with the AlexNet paper, then the VGG paper, and finally, when you have time, the LeNet paper. It is somewhat hard going, but very helpful for understanding these architectures.


References

  • Andrew Ng, "Neural Networks and Deep Learning" video course


Source: blog.csdn.net/TeFuirnever/article/details/104338254