Deep Learning with Theano, Official Tutorial (Chinese translation), Part 4: Convolutional Neural Networks (CNN)

This translation is meant for exchange and learning. My ability is limited, so if you find errors of any kind, please do not hesitate to point them out so that we can all improve together. If you reproduce this translation, please credit the source: http://www.cnblogs.com/charleshuang/ .

This article is translated from: http://deeplearning.net/tutorial/lenet.html

The code in this article appears as screenshots that are not very clear; please refer to the original text at the URL above.

 

1. Motivation

A convolutional neural network (CNN) is a variant of the multi-layer perceptron (MLP) that evolved from ideas in biology. From the early work of Hubel and Wiesel on the cat's visual cortex, we know that the visual cortex contains a complex arrangement of cells. These cells are sensitive to small local sub-regions of the input, called receptive fields, which are tiled in such a way as to cover the entire visual field. The cells act as local filters over the input, and are thus well suited to exploiting the strong spatially local correlations present in natural images (the target information).

In addition, two related types of cells have been identified in the visual cortex: simple cells (S cells) respond maximally to specific edge-like patterns within their receptive field, while complex cells (C cells) have larger receptive fields and are locally invariant to the exact spatial position of the stimulus pattern in the image.

As the most powerful visual processing system known, the visual cortex has attracted wide attention, and many neurally-inspired models have been proposed in the academic literature, for example: NeoCognitron [Fukushima], HMAX [Serre07], and LeNet-5 [LeCun98], the focus of this tutorial.

 

2. Sparse Connectivity

CNNs exploit the spatially local correlation of natural images (the structure of the target of interest) by enforcing a local connectivity pattern between the nodes of adjacent layers. Each hidden node at layer m is connected to only a subset of the nodes at layer m-1, namely those lying within its spatially contiguous receptive field. This connectivity can be illustrated by the following figure.

Suppose layer m-1 is the input retina (receiving the natural image). As the figure shows, each node of layer m has a receptive field of width 3 on the retina, i.e., it is connected to 3 adjacent nodes in the retina layer. The nodes of layer m+1 have a similar connectivity to the layer below them: each is connected to 3 adjacent nodes of layer m, but their receptive field with respect to the input (retina) layer is larger, 5 in this figure. Each trained filter therefore responds most strongly to a spatially local pattern (each node does not respond to anything outside its receptive field). However, as the figure also shows, stacking many such layers makes the filters increasingly global (covering a larger region of pixel space): for example, a node at layer m+1 can encode a non-linear feature of width 5 (in terms of pixel space).
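The growth of receptive fields described above can be checked with a tiny helper (my own illustration, not part of the original tutorial), assuming stride-1 filters and no pooling:

```python
def effective_receptive_field(kernel_widths):
    """Receptive-field width on the input after stacking stride-1 filters."""
    width = 1
    for k in kernel_widths:
        width += k - 1
    return width

# layer m sees 3 retina nodes; layer m+1 connects to 3 nodes of layer m
print(effective_receptive_field([3]))     # 3 (width on the retina)
print(effective_receptive_field([3, 3]))  # 5 (width on the retina)
```

Each extra layer widens the field by (k - 1), which is exactly why the m+1 nodes in the figure see 5 retina pixels.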

 

3. Shared Weights

In CNNs, each sparse filter h_i is additionally replicated across the entire visual field. The replicated nodes form a feature map, and all nodes within one feature map share the same parameters, i.e., the same weight matrix and the same bias vector.

 

In the figure above, the three hidden nodes belong to the same feature map, and weights drawn in the same color must be identical, i.e., they are shared. Gradient descent can still be used to train such shared parameters, with only a minor change to the original algorithm: the gradient of a shared weight is simply the sum of the gradients of all the parameters that share it.

Why is weight sharing so interesting? Replicating units in this way allows a feature to be detected regardless of its position in the visual field. Moreover, weight sharing offers a very efficient way to achieve this, because it greatly reduces the number of parameters that need to be learned (trained). By controlling model capacity in this way, CNNs tend to achieve better generalization on computer-vision problems.
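The parameter savings can be made concrete with the sizes used later in this article (3 input maps of 120*160, two 9*9 filters); comparing against a hypothetical unshared variant where every output position gets its own kernel:

```python
# Parameter count with and without weight sharing (illustrative numbers).
in_maps, img_h, img_w = 3, 120, 160
n_filters, filt_h, filt_w = 2, 9, 9
out_h, out_w = img_h - filt_h + 1, img_w - filt_w + 1  # 'valid' output size

# shared weights: one (3 x 9 x 9) kernel plus one bias per filter
shared = n_filters * (in_maps * filt_h * filt_w + 1)

# no sharing: a separate kernel and bias at every output position
unshared = n_filters * out_h * out_w * (in_maps * filt_h * filt_w + 1)

print(shared, unshared)  # 488 vs 8307712
```

Sharing cuts the learnable parameters by a factor of out_h * out_w, i.e., by the number of positions the filter is replicated over.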

 

4. Details and Notation

Conceptually, a feature map is obtained by convolving the input image with a linear filter, adding a bias term, and then applying a non-linear function to the result. If we denote the k-th feature map of a given layer by h^k, determined by its filter weights W^k and bias b_k, then the feature map is computed as follows (taking tanh as the non-linear function):

h^k_ij = tanh( (W^k * x)_ij + b_k )
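This formula can be checked numerically. The sketch below is my own illustration (not the tutorial's code); it uses scipy.signal.convolve2d, which, as noted later in this article, is the operation Theano's ConvOp replicates:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))   # input image x
W = rng.standard_normal((3, 3))   # filter weights W^k
b = 0.1                           # bias b_k

# h^k_ij = tanh((W^k * x)_ij + b_k), using a 'valid' convolution
h = np.tanh(convolve2d(x, W, mode="valid") + b)
print(h.shape)  # (6, 6)
```

A 'valid' convolution of an 8*8 image with a 3*3 filter yields a 6*6 feature map, and tanh keeps every output in (-1, 1).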

A note on convolution: recall that the 2D convolution of an image f with a kernel g is defined as

o[m, n] = (f * g)[m, n] = sum_u sum_v f[u, v] g[m - u, n - v]

To form a richer representation of the data, each hidden layer is composed of multiple feature maps, {h^k, k = 1, 2, ..., K}. The weights W of such a layer are determined by four indices (destination feature-map index, source feature-map index, source vertical position, source horizontal position), so W can be viewed as a 4-dimensional tensor; the bias b is a vector with one element per destination feature map. We illustrate this as follows:

 

Figure 2 shows two layers of a CNN: layer m-1 contains 4 feature maps, and layer m contains 2 feature maps (h^0 and h^1). Each output pixel (neuron) of h^0 and h^1 is computed from the pixels of layer m-1 that lie within its 2*2 receptive field. Note that the receptive field spans all four input feature maps, so the weights W^0 and W^1 of h^0 and h^1 are 3D tensors: one dimension indexes the input feature maps, and the other two index the pixel coordinates. Putting it all together, W^kl_ij denotes the weight connecting each pixel of the k-th feature map at layer m with the pixel at coordinates (i, j) of the l-th feature map at layer m-1.

 

5. The Convolution Operator (ConvOp)

ConvOp is Theano's implementation of a convolutional layer. It replicates the behaviour of SciPy's scipy.signal.convolve2d. In general, ConvOp takes two inputs (parameters):

(1) a 4D tensor corresponding to a mini-batch of input images, of shape: [mini-batch size, number of input feature maps, image height, image width];

(2) a 4D tensor corresponding to the weights W, of shape: [number of feature maps at layer m, number of feature maps at layer m-1, filter height, filter width].

The following code implements a convolutional layer like the one in Figure 1. The input consists of 3 feature maps of size 120*160 (an RGB color image), and we use two convolutional filters with 9*9 receptive fields.
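The original code screenshot is missing here. As a stand-in, the following NumPy/SciPy sketch performs the same computation that the Theano code would (the original uses Theano's symbolic conv operator; the seed, variable names, and random input here are my own choices):

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(23455)

in_maps, out_maps = 3, 2          # RGB input maps, two filters
fh, fw = 9, 9                     # 9x9 receptive fields
fan_in = in_maps * fh * fw        # see the initialization note below
bound = 1.0 / fan_in
W = rng.uniform(-bound, bound, size=(out_maps, in_maps, fh, fw))
b = np.zeros(out_maps)

img = rng.random((in_maps, 120, 160))   # one 120x160 RGB image

# each output map sums 'valid' convolutions over all input maps
out = np.stack([
    np.tanh(sum(convolve2d(img[l], W[k, l], mode="valid")
                for l in range(in_maps)) + b[k])
    for k in range(out_maps)])
print(out.shape)  # (2, 112, 152)
```

With 'valid' convolution the 120*160 input shrinks to (120-9+1) by (160-9+1) = 112*152, one such map per filter.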

 

 

We find that even randomly initialized filters act very much like edge detectors.

In addition, we use the same weight-initialization formula as with the MLP: the weights are sampled from a uniform distribution over the range [-1/fan-in, 1/fan-in], where fan-in is the number of inputs to a hidden node. For an MLP, this is simply the number of nodes in the layer below; for a CNN, however, we must also take into account the number of input feature maps and the size of the receptive field.
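For concreteness, the fan-in of the layer above (3 input maps, 9*9 receptive fields) works out as follows (a small illustration of the rule, not tutorial code):

```python
def cnn_fan_in(n_input_maps, filt_h, filt_w):
    """Fan-in of a CNN hidden node: input maps times receptive-field size."""
    return n_input_maps * filt_h * filt_w

fan_in = cnn_fan_in(3, 9, 9)
print(fan_in, 1.0 / fan_in)  # 243, so weights are drawn from [-1/243, 1/243]
```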

 

6. Max-Pooling

Another important concept in CNNs is max-pooling, a form of non-linear down-sampling. Max-pooling partitions the input image into a set of non-overlapping rectangular sub-regions and, for each sub-region, outputs its maximum value.

Max-pooling is useful in vision problems for two reasons: (1) it reduces the computational complexity of upper layers, and (2) it provides a form of translation invariance. To understand this invariance, suppose a max-pooling layer is cascaded with a convolutional layer. A single input pixel can be translated in 8 directions; if max-pooling is applied over a 2*2 window, 3 of these 8 possible configurations produce exactly the same output at the convolutional layer. With a 3*3 window, this probability rises to 5/8.

Since it provides robustness to small shifts of the input, max-pooling is a very flexible way of reducing the dimensionality of intermediate representations.

Max-pooling is implemented in Theano by theano.tensor.signal.downsample.max_pool_2d. This function takes an N-dimensional tensor as input (with N >= 2) together with a down-scaling factor, and applies max-pooling over the tensor's two trailing dimensions. Here is the sample code:
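The sample-code screenshot is missing; the NumPy sketch below (my own illustration) reproduces what max_pool_2d computes for non-overlapping windows when incomplete border regions are discarded (Theano's ignore_border=True behaviour):

```python
import numpy as np

def max_pool_2d(x, ds):
    """Non-overlapping max-pooling over the two trailing dimensions,
    discarding incomplete border regions."""
    dh, dw = ds
    h, w = x.shape[-2] // dh, x.shape[-1] // dw
    x = x[..., :h * dh, :w * dw]                      # drop ragged borders
    x = x.reshape(x.shape[:-2] + (h, dh, w, dw))      # split into windows
    return x.max(axis=(-3, -1))                       # max inside each window

a = np.arange(16.0).reshape(4, 4)
print(max_pool_2d(a, (2, 2)))
# [[ 5.  7.]
#  [13. 15.]]
```

Each 2*2 block of the 4*4 input collapses to its maximum, halving both spatial dimensions.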

 

Note that, unlike most Theano code, the max_pool_2d operation requires the down-sampling factor ds (a tuple of length 2 holding the scaling factors for the image width and height) to be known at the time the Theano graph is built. This may change in future versions.

 

7. The Full Model: LeNet

Sparse connectivity, convolutional layers, and max-pooling are the core concepts of the LeNet family of models. Because the exact details of these models vary considerably, we use the figure below to show the full LeNet model.

 

The lower layers of the model are composed of convolution and max-pooling layers, while the upper layers form a fully-connected MLP neural network (hidden layer + logistic regression, ANN). The input to the upper layers is the set of feature maps from the layer below.

From an implementation point of view, this means the lower layers operate on 4D tensors, which are then flattened to a 2D matrix of rasterized feature maps, so as to be compatible with the earlier MLP implementation.
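That flattening step amounts to a single reshape. For example, assuming a mini-batch of 500 images that has become 50 feature maps of size 4*4 after the last pooling layer (hypothetical sizes of my own choosing):

```python
import numpy as np

pooled = np.zeros((500, 50, 4, 4))   # (batch, feature maps, height, width)
flat = pooled.reshape(500, -1)       # rasterized maps, one row per image
print(flat.shape)  # (500, 800)
```

Each image becomes one row of 50 * 4 * 4 = 800 values, exactly the 2D matrix the MLP layers expect.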

   

 

8. The Full Code

 

 

 

It should be noted that, when initializing the weights, fan-in is determined by the size of the receptive field and the number of input feature maps. Finally, using the LogisticRegression and HiddenLayer classes defined in the previous chapters, LeNet is ready to work.
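Since the full code appears only as screenshots, here is a NumPy sketch of the conv + pool building block at the heart of that code (the class name and sizes are my own; the real implementation is symbolic Theano code that also handles mini-batches and gradients):

```python
import numpy as np
from scipy.signal import convolve2d

class ConvPoolLayer:
    """Convolution, then max-pooling, then tanh, for one image at a time."""
    def __init__(self, rng, filter_shape, poolsize=(2, 2)):
        out_maps, in_maps, fh, fw = filter_shape
        fan_in = in_maps * fh * fw
        bound = 1.0 / fan_in                     # init rule from section 5
        self.W = rng.uniform(-bound, bound, size=filter_shape)
        self.b = np.zeros(out_maps)
        self.poolsize = poolsize

    def forward(self, img):                      # img: (in_maps, H, W)
        out_maps, in_maps, fh, fw = self.W.shape
        conv = np.stack([
            sum(convolve2d(img[l], self.W[k, l], mode="valid")
                for l in range(in_maps))
            for k in range(out_maps)])
        dh, dw = self.poolsize                   # non-overlapping max-pool
        h, w = conv.shape[1] // dh, conv.shape[2] // dw
        pooled = (conv[:, :h * dh, :w * dw]
                  .reshape(out_maps, h, dh, w, dw).max(axis=(2, 4)))
        return np.tanh(pooled + self.b[:, None, None])

layer = ConvPoolLayer(np.random.default_rng(0), (2, 1, 5, 5))
out = layer.forward(np.zeros((1, 28, 28)))
print(out.shape)  # (2, 12, 12)
```

On a 28*28 input, the 5*5 'valid' convolution yields 24*24 maps, and 2*2 pooling halves that to 12*12, matching the classic LeNet geometry on MNIST.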

 

9. Tips and Tricks

Hyper-parameter selection: since CNNs have many more hyper-parameters than a standard MLP, training a CNN takes considerable skill. While the usual rules of thumb for learning rates and regularization still apply, the following points are worth keeping in mind when optimizing a CNN.

(1) Number of filters: when choosing the number of filters per layer, keep in mind that computing the activations of a single convolutional filter is much more expensive than in a traditional MLP! Suppose layer (l-1) contains K^(l-1) feature maps and M*N pixel positions (i.e., number of positions times number of feature maps), and layer l has K^l filters of size m*n. Then computing one feature map costs (M-m)*(N-n)*m*n*K^(l-1), and the total cost is K^l times that. Things become more complicated if not all feature maps at one layer are connected to all feature maps at the previous layer. For a standard MLP, the cost would only be K^l * K^(l-1), where K^l is the number of units at layer l. As a result, the number of feature maps used in CNNs is typically much smaller than the number of hidden units in MLPs, and it also depends on the size of the feature maps. Since feature-map size decreases with depth, layers near the input tend to have fewer feature maps, while higher layers can have many more. In fact, to roughly equalize the computation performed at each layer, the product of the number of feature maps and the number of pixel positions is usually kept roughly constant across layers. To preserve the information about the input would require keeping the total number of activations (number of feature maps times number of pixel positions) non-decreasing from one layer to the next (of course we could hope to get away with less when we are doing supervised learning). So the number of feature maps directly controls model capacity, and the right value depends on the number of available examples and the complexity of the task.
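Plugging hypothetical numbers into the cost formula above makes the gap vivid (a 28*28 pixel grid with 20 maps at layer l-1, and 50 filters of size 5*5 at layer l; all numbers are my own, chosen only for illustration):

```python
M, N = 28, 28         # pixel positions at layer l-1
m, n = 5, 5           # filter size at layer l
K_prev, K_l = 20, 50  # feature maps at l-1, filters at l

conv_cost = K_l * (M - m) * (N - n) * m * n * K_prev
mlp_cost = K_l * K_prev
print(conv_cost, mlp_cost)  # 13225000 1000
```

The convolutional layer costs (M-m)*(N-n)*m*n = 13225 times more per map pair than the MLP layer, which is why CNNs get by with far fewer feature maps than MLPs have hidden units.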

(2) Filter shapes: filter shapes reported in the literature vary widely, usually depending on the dataset. The best CNN results on MNIST use 5*5 windows (receptive fields) on the first layer for the 28*28 input images, while natural images generally call for larger windows, such as 12*12 or 15*15. The challenge is thus to find the right level of granularity for extracting features at the proper scale, given a particular dataset.

(3) Max-pooling shapes: typical values are 2*2, or no max-pooling at all. Very large input images may warrant 4*4 pooling in the lower layers of the CNN. Keep in mind, however, that while such pooling reduces the dimensionality, it can also discard a serious amount of information.

(4) A final note: if you want to try a CNN on a new dataset, it may help to whiten the data first (e.g., with PCA) and to decay the learning rate at every training iteration; this often yields better experimental results.

Reprinted from: https://www.cnblogs.com/charleshuang/p/3651843.html
