How Does a Convolutional Neural Network Learn Features?

https://www.toutiao.com/a6700385442048508420/

 

Introduction

Convolutional neural networks. The name may sound at first like a strange mix of biology, computer science, and mathematics. Strange or not, convolutional neural networks are among the most influential innovations in computer vision.

2012 was the year convolutional neural networks rose to prominence. That year, Alex Krizhevsky entered the ImageNet competition (roughly the Olympics of computer vision) with a convolutional neural network and made a splash, cutting the classification error rate from 26% to 15%. Since then, many companies have built deep learning into their core services. For example, Facebook uses it in its automatic tagging algorithms, Google in photo search, Amazon in product recommendations, Pinterest in personalizing its home feed, and Instagram in its search engine.

However, the classic and arguably most common use of these networks is image processing. So that is what we will talk about here: how to use a convolutional neural network (abbreviated CNN below) for image classification.

Problem Description

Image classification means feeding an image to the machine and having it tell us the image's category (cat, dog, and so on) or, if it is unsure, the probability that the image belongs to each category (it may be a dog, but I'm not sure). For us humans nothing could be simpler: from birth we learn to recognize the objects around us quickly. When we see a scene, we identify the objects in it right away, even subconsciously, without deliberate thought. Machines do not have this ability. So let's cherish our brains! _(:зゝ∠)_

Input and output

Computers do not see an image the way people do. When we input an image, the computer gets an array that records the pixel information. The size of the array depends on the image's resolution and size. Suppose we have a 480 × 480 color image in JPG format; it is then represented as a 480 × 480 × 3 array. Each number in the array describes the pixel intensity at that position and lies in the range [0, 255].

These numbers mean nothing to us, but they are the only input the computer gets (and they are enough). Put abstractly, we need a model that accepts an array as input and outputs an array giving the probability of each category.
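
As a rough illustration of that input and output, the sketch below builds a tiny stand-in image as nested Python lists (height × width × 3 channels, values in [0, 255]) and a made-up probability output. The 4 × 4 size, the random pixel values, the class names, and the probabilities are all assumptions chosen for illustration; plain Python lists are used only to keep the example dependency-free.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

HEIGHT, WIDTH, CHANNELS = 4, 4, 3  # a real image might be 480 x 480 x 3

# Each pixel holds three intensities (red, green, blue) in [0, 255].
image = [[[random.randint(0, 255) for _ in range(CHANNELS)]
          for _ in range(WIDTH)]
         for _ in range(HEIGHT)]

shape = (len(image), len(image[0]), len(image[0][0]))
print(shape)  # (4, 4, 3)

# Hypothetical model output: one probability per category.
class_names = ["cat", "dog", "bird"]
probabilities = [0.10, 0.85, 0.05]
best = max(range(len(class_names)), key=lambda i: probabilities[i])
print(class_names[best])  # dog
```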

What do we want the computer to do?

Now that we understand the problem, we need to think about how to solve it. What we want the computer to do is tell images apart and pick out the features that identify, say, a dog.

We humans recognize a picture by its distinctive features, such as a dog's paws or its four legs. In a similar way, a computer can recognize an image by first detecting low-level features such as curves and straight lines. Convolutional layers detect these features, and stacking more convolutional layers lets the computer combine them into high-level features such as paws and legs, much as a human would, to complete the task. That is a rough outline of what a CNN does. Below, we discuss it in more detail.

Connection to biology

Before we officially start, let's go over some of the CNN backstory. When you first heard the term convolutional neural network, you may have thought of something related to neuroscience or biology, and you would be right: CNNs really do have a connection to them.

CNNs are indeed inspired by the visual cortex of the brain. Some neurons in the visual cortex are sensitive only to specific regions of the visual field. An experiment (video) by Hubel and Wiesel in 1962 confirmed and expanded on this idea. They found that some individual neurons fire only in the presence of edges of a particular orientation. For example, some neurons fire when they see a horizontal edge, while others fire only when a vertical edge appears. All of these neurons are organized in a columnar architecture and are thought to work together to produce visual perception.

The idea that specialized components within a system each have a specific task (neurons in the visual cortex each look for specific features) also applies to many machines, and it is the basic principle behind CNNs. (Translator's note: the author does not spell out the analogy to CNNs; it is presumably that different convolution kernels look for different features in the image.)

Neural Network Architecture

Back to the topic.

In more detail, a CNN works like this: you pass an image to the model, it goes through a series of convolutional, nonlinear (activation function), pooling, and fully connected layers, and finally you get a result. As we said earlier, the output can be a single category or a set of probabilities over categories. Now comes the hard part: understanding what each layer does.

The first layer (convolutional layer) - mathematical description

First, you need to know what the input to the convolutional layer is. As mentioned before, the input is an array of pixel values, say 32 × 32 × 3. Now, let me explain what a convolutional layer is. The best way to explain it is to imagine a flashlight shining on the top-left corner of the image. Suppose the light of this flashlight covers a 5 × 5 region. Now imagine the flashlight sliding across all regions of the image. In machine learning terms, the flashlight is called a convolution kernel (also a filter or a neuron), and the region it shines on is called the receptive field. The kernel is itself an array of numbers (called weights or parameters). Importantly, the depth of the kernel must equal the depth of the input (this is what makes the math work), so here the kernel is 5 × 5 × 3.

Now, take the kernel's initial position, the top-left corner of the image, as an example. As the kernel scans its receptive field (the 5 × 5 × 3 region in the top-left corner), it multiplies its weights by the corresponding pixel values of the image (this is element-wise multiplication, not to be confused with matrix multiplication), and the products are summed (at this position there are 5 × 5 × 3 = 75 products). You now have a single number. This number describes only the case where the kernel sits in the top-left corner. We then repeat the process so that the kernel scans the whole image (the next step moves one pixel to the right, the step after that one more pixel to the right, and so on), and every position produces one number. Once the whole image has been scanned, you have a new 28 × 28 × 1 array of numbers. (Translator's note: (32 − 5 + 1) × (32 − 5 + 1) × 1.) We call this array an activation map or feature map.
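
The sliding-window procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a stride of one pixel and no padding; the function name `convolve` and the all-ones image and kernel are made up for the example, and a real network would use an optimized library instead of nested loops.

```python
def convolve(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding).

    image:  H x W x D nested lists
    kernel: k x k x D nested lists (same depth D as the image)
    Returns an (H - k + 1) x (W - k + 1) activation map.
    """
    height, width, depth = len(image), len(image[0]), len(image[0][0])
    k = len(kernel)
    out = []
    for top in range(height - k + 1):
        row = []
        for left in range(width - k + 1):
            # Multiply element-wise over the receptive field and sum.
            total = 0.0
            for i in range(k):
                for j in range(k):
                    for d in range(depth):
                        total += image[top + i][left + j][d] * kernel[i][j][d]
            row.append(total)
        out.append(row)
    return out


# A 32 x 32 x 3 input of all ones and a 5 x 5 x 3 kernel of all ones.
image = [[[1.0] * 3 for _ in range(32)] for _ in range(32)]
kernel = [[[1.0] * 3 for _ in range(5)] for _ in range(5)]

activation_map = convolve(image, kernel)
print(len(activation_map), len(activation_map[0]))  # 28 28
print(activation_map[0][0])  # 75.0, the sum of 5 * 5 * 3 products
```

With all-ones inputs every position sums 75 products of 1 × 1, which makes the 5 × 5 × 3 = 75 count from the text easy to check.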

If we increase the number of kernels, for example to two, we get a 28 × 28 × 2 array. By using more kernels, we preserve the spatial dimensions better.

Mathematically, this is what a convolutional layer does.

The first layer (convolutional layer) - a high-level perspective

Let's talk about what this convolution is doing from a high level. Each kernel can be seen as a feature identifier. By features I mean things like straight edges, simple colors, and curves: the simplest characteristics that all images share. Take a 7 × 7 × 3 kernel as an example, and say its role is to detect curves. (In this section, for simplicity, we ignore the depth of the kernel and consider only its first slice.) As a curve detector, the kernel has larger numerical values in the region shaped like the curve. (Remember, a kernel is just an array of numbers.)

Now let's look at this intuitively. Suppose we want to classify the image below. Let's put our kernel at the top-left corner of the image.

[Figure: the example image to classify, with the kernel placed over its top-left corner]

Remember, what we do is multiply the kernel's weights by the pixel values of the input image.

[Figure: the products of the kernel weights and the pixel values in the receptive field, and their sum]

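(Translator's note: the sum in the figure should end with zero terms; because there are so many zeros, they are skipped rather than written out.)

Basically, if the input image contains a shape that closely resembles the one the kernel represents, the sum of all the products will be large. Now let's see what happens if we move the kernel.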

[Figure: the kernel moved to a different region of the image]

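As you can see, the value is much smaller! This is because nothing in the receptive field matches the shape the kernel represents. Remember, the output of a convolutional layer is an activation map. So, in the simple case of convolving with a single kernel, and assuming that kernel is a curve detector, the resulting activation map shows where curves are most likely to be. In this example, the top-left value of our activation map is 6600. Such a large number indicates that this region very likely contains a curve, which caused the kernel to activate (translator's note: that is, to produce a large value). The top-right value of the activation map is 0, because nothing there made the kernel activate (put simply, that region of the input image contains no curve).

But remember, this is just one kernel, one that detects curves bending to the right. We can add other kernels, for example one that detects curves bending to the left. The more kernels there are, the greater the depth of the activation map, and the more information we have about the input image.

The kernel described in this article is simplified for the purpose of illustration. The figure below shows what the kernels of the first convolutional layer of a real trained network look like when visualized. Either way, the principle is the same. The first-layer kernels scan the whole image and activate when they detect the corresponding feature.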
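
To make the "large sum when the shapes match" idea concrete, here is a small sketch with a single 7 × 7 slice. The kernel weights, the pixel intensities, and the helper name `response` are hypothetical values chosen for illustration; they are not the numbers behind the 6600 in the article's figure.

```python
def response(patch, kernel):
    """Sum of element-wise products between a patch and a kernel."""
    return sum(p * w
               for patch_row, kernel_row in zip(patch, kernel)
               for p, w in zip(patch_row, kernel_row))


# Hypothetical 7 x 7 kernel whose large weights trace a rough curve.
kernel = [
    [0, 0, 0, 0, 0, 30, 0],
    [0, 0, 0, 0, 30, 0, 0],
    [0, 0, 0, 30, 0, 0, 0],
    [0, 0, 0, 30, 0, 0, 0],
    [0, 0, 0, 30, 0, 0, 0],
    [0, 0, 0, 30, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

# A patch containing the same curve shape (pixel intensity 50).
curve_patch = [[50 if w else 0 for w in row] for row in kernel]
# A patch whose bright pixels fall only where the kernel is zero.
other_patch = [[0 if w else 40 for w in row] for row in kernel]

print(response(curve_patch, kernel))  # 9000
print(response(other_patch, kernel))  # 0
```

The matching patch lines up six bright pixels with six large weights (6 × 50 × 30 = 9000), while the other patch never overlaps a nonzero weight, so the kernel does not activate.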

[Figure: visualized kernels from the first convolutional layer of a trained network]

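Going deeper into the network

In a traditional CNN architecture, other layers are interspersed between the convolutional layers. I strongly encourage anyone interested to read up on them and understand them. In general, they provide nonlinearity and preserve dimensions, which helps improve the robustness of the network and control overfitting. A classic CNN architecture looks like this: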

[Figure: a classic CNN architecture]

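The last layer of the network is important, and we will come to it later.

Now, let's step back and review what we have learned so far.

We said that the kernels of the first convolutional layer are there to detect features, specifically low-level features such as curves and edges. But as you can imagine, to predict the category of an image the network must be able to recognize higher-level features such as hands, paws, or ears. Let's think about what the output of the first layer is. Suppose we have five 5 × 5 × 3 kernels and a 32 × 32 × 3 input image; we then get a 28 × 28 × 5 array. When we reach the second convolutional layer, the output of the first layer becomes the input of the second. This is a little harder to visualize. The input to the first layer was the original image, but the input to the second layer is the set of activation maps produced by the first layer, and each slice of it marks where a low-level feature appears. If we process it with another set of kernels, we get activation maps that mark where higher-level features appear. These features might be semicircles (a combination of a curve and a straight edge) or rectangles (a combination of four edges). As you add more convolutional layers, you eventually get kernels that respond to handwriting, pink objects, and so on.

If you want to know more about visualizing kernels, see this research paper and this video.

Another interesting point: as the network gets deeper, the kernels have a larger and larger receptive field relative to the input image. This means they can take into account information from a larger area of the input image (in other words, they respond to a larger region of pixels).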
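
The shape bookkeeping in this paragraph can be checked with a short helper. This is a sketch assuming stride 1 and no padding; the function name `conv_output_shape` and the second layer's configuration (eight 3 × 3 kernels) are assumptions made up for the example.

```python
def conv_output_shape(input_shape, kernel_size, num_kernels):
    """Output shape of a convolutional layer with stride 1 and no padding."""
    height, width, _depth = input_shape
    return (height - kernel_size + 1, width - kernel_size + 1, num_kernels)


input_shape = (32, 32, 3)

# First layer: five 5 x 5 x 3 kernels (kernel depth matches the input depth).
layer1 = conv_output_shape(input_shape, kernel_size=5, num_kernels=5)
print(layer1)  # (28, 28, 5)

# Second layer: its kernels must now have depth 5, the depth of layer1.
# Eight kernels of size 3 x 3 x 5 are an arbitrary choice for illustration.
layer2 = conv_output_shape(layer1, kernel_size=3, num_kernels=8)
print(layer2)  # (26, 26, 8)
```

Note that the output depth is set by the number of kernels, not by the depth of the input, which is why one kernel per feature adds one slice to the activation map.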
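
The growth of the receptive field can be computed directly. The sketch below assumes a plain stack of convolutions with stride 1 and no pooling or dilation, in which case each layer with a k × k kernel widens the receptive field by k − 1 pixels; the function name `receptive_field` is made up for the example.

```python
def receptive_field(kernel_sizes):
    """Receptive field (in input pixels) of stacked stride-1 convolutions."""
    size = 1
    for k in kernel_sizes:
        size += k - 1  # each layer sees k - 1 extra pixels of context
    return size


print(receptive_field([5]))        # 5
print(receptive_field([5, 5]))     # 9
print(receptive_field([5, 5, 5]))  # 13
```

Pooling layers or larger strides would make the receptive field grow faster than this simple formula suggests.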

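Fully connected layer

So far, we have detected those high-level features. The finishing touch at the end of the network is the fully connected layer.

Put simply, this layer takes an input (from a convolutional layer, a pooling layer, or an activation function) and outputs an N-dimensional vector, where N is the number of possible categories. For example, if you want to write a program that recognizes digits, N is 10, because there are 10 digits. Each number in the N-dimensional vector is the probability of belonging to one category. For instance, if you get [0 0.1 0.1 0.75 0 0 0 0 0 0.05], the image has a 10% probability of being a 1, a 10% probability of being a 2, a 75% probability of being a 3, and a 5% probability of being a 9 (note: there are other ways to represent the output, but here I only show softmax; translator's note: an activation function commonly used for classification). The fully connected layer works by looking at the output of the previous layer (the activation maps that represent high-level features, as described earlier) and deciding which category the image most likely belongs to. For example, if the program needs to predict which images are dogs, the fully connected layer outputs a large value when it receives activation maps indicating something like a paw and four legs. Likewise, to predict a bird, the fully connected layer pays more attention to activation maps indicating wings and a beak.

Basically, the fully connected layer looks for the high-level features that correlate most strongly with a particular category, and it has weights such that you get the correct probabilities for the categories.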
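
As a rough sketch of this step, the code below computes one weighted score per category from a flattened feature vector and turns the scores into probabilities with softmax. The three features (paw, wing, beak), the two categories, and all the weights are hypothetical; a real fully connected layer would have far more inputs and learned weights.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fully_connected(features, weights, biases):
    """One score per class: dot(features, class weights) + bias."""
    return [sum(f * w for f, w in zip(features, class_weights)) + b
            for class_weights, b in zip(weights, biases)]


# Hypothetical flattened features: [paw detected, wing detected, beak detected]
features = [0.9, 0.1, 0.0]
# Hypothetical weights for two classes: dog, bird
weights = [[4.0, -2.0, -2.0],   # the dog output responds to paws
           [-2.0, 3.0, 3.0]]    # the bird output responds to wings and beaks
biases = [0.0, 0.0]

scores = fully_connected(features, weights, biases)
probabilities = softmax(scores)
print([round(p, 3) for p in probabilities])
```

Because the paw feature is strong and the dog weights favor it, nearly all of the probability mass ends up on the dog category.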

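Training (that is, how to make the network work)

Now let's talk about the aspect of neural networks that I have intentionally not mentioned so far, and which is probably the most important one. While reading, you may have had plenty of questions. How do the kernels of the first convolutional layer know whether to look for edges or curves? How does the fully connected layer know which activation maps to look for? How are the parameters in each layer determined? The way the machine determines its parameters (or weights) is called backpropagation.

Before getting into backpropagation, we need to look back at what a neural network needs in order to work. When we are born, we do not know what a dog or a bird looks like. Similarly, before a CNN starts training, its weights are randomly generated. The kernels do not know whether to look for edges or curves. The kernels of deeper layers do not know whether to look for paws or beaks.

As we grow up, our teachers and parents show us different pictures and tell us what they are (that is, their categories). This idea of providing an image together with its category is the basic idea behind CNN training. Before going into the details of backpropagation, let's assume we have a training set of thousands of images of different animals together with their categories.

Backpropagation can be split into four distinct parts: the forward pass, the loss function, the backward pass, and the weight update.

In the forward pass, we take a training image and pass it through the whole network. For the first input image, since all the weights were randomly generated, the output is likely to be something like [.1 .1 .1 .1 .1 .1 .1 .1 .1 .1], which in general does not favor any category. With its current weights, the network cannot find the low-level features and so cannot draw a reasonable conclusion about the category.

Next comes the loss function. Note that we are using training data, which contains both images and labels. Say the first input image is the digit "3". Its label is then [0 0 0 1 0 0 0 0 0 0]. A loss function can be defined in many ways, but a common one is MSE (mean squared error), defined here as $\frac{(\text{actual} - \text{predicted})^2}{2}$.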
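
The loss for the digit "3" example can be worked out numerically. The sketch below assumes the per-class terms (actual − predicted)² / 2 are summed over the ten outputs, which is one common convention; the function name `squared_error_loss` and the "confident" output vector are made up for illustration.

```python
def squared_error_loss(actual, predicted):
    """Sum over classes of (actual - predicted)^2 / 2."""
    return sum(0.5 * (a - p) ** 2 for a, p in zip(actual, predicted))


label = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # one-hot label for the digit 3
untrained = [0.1] * 10                   # output of a randomly initialized net
confident = [0, 0, 0.05, 0.9, 0, 0, 0, 0, 0, 0.05]  # a made-up better output

print(round(squared_error_loss(label, untrained), 4))  # 0.45
print(round(squared_error_loss(label, confident), 4))  # 0.0075
```

The untrained output is penalized mostly for the 0.9 gap on the correct class, while the confident output leaves only small errors, so its loss is far lower.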

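Let the variable L be the value of the loss function. As you can imagine, the loss will be very high for the first few training images. Let's think about this intuitively. We want to reach the point where the CNN's predictions match the labels exactly (which means the network predicts correctly). To get there, we want to minimize the loss. Viewing this as an optimization problem in calculus, we want to find out which weights contribute most to the network's loss.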

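This is the mathematical equivalent of $\frac{\partial L}{\partial W}$ (translator's note: the derivative of L with respect to W), where W is the weights of a given layer. Now we perform a backward pass through the network, which determines which weights contributed most to the loss and how to adjust them so that the loss decreases. Once these derivatives are computed, we reach the last step: the weight update. Here we adjust the weights of each layer in the direction opposite to the gradient.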

The learning rate is a parameter chosen by the programmer. A high learning rate means bigger steps are taken in the weight updates, so the model may take less time to converge on a good set of weights. However, a learning rate that is too high can produce jumps that are too large and not precise enough to reach the optimal point.
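
The effect of the learning rate is easy to see on a toy problem. The sketch below applies the update "new weight = old weight − learning rate × gradient" to a single weight with the made-up loss L(w) = (w − 3)², whose minimum is at w = 3; the loss, the step count, and the three learning rates are assumptions chosen for illustration.

```python
def gradient_descent(learning_rate, steps=20, start=0.0):
    """Minimize L(w) = (w - 3)^2 with w = w - learning_rate * dL/dw."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)            # dL/dw
        w = w - learning_rate * gradient  # step against the gradient
    return w


print(gradient_descent(0.1))  # creeps toward the minimum at w = 3
print(gradient_descent(0.5))  # reaches w = 3 in a single step
print(gradient_descent(1.1))  # overshoots further each step and diverges
```

On this toy loss a rate of 0.5 happens to be ideal; in a real network the right value is not known in advance and is usually found by experiment.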

The forward pass, loss function, backward pass, and weight update together make up one training iteration. The program repeats this process for a fixed number of iterations for each set of training images (commonly called a batch). Once the parameter update on the last training example is finished, the network should hopefully be trained well enough, with the weights of its layers properly tuned.

Testing

Finally, to check whether our CNN works, we use a different set of images and labels. We pass these images through the network, compare the outputs with the labels, and see how well the network performs.
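
The four steps can be seen together in a very small training loop. This sketch swaps the CNN for a one-weight model y = w · x so that the gradient can be written by hand; the data, batch size, learning rate, and number of epochs are all assumptions made up for the example.

```python
import random

random.seed(0)

# Toy training set for y = 2x; the data and the model are made up.
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]]
batch_size = 2
learning_rate = 0.05
w = random.uniform(-1.0, 1.0)  # weights start out random

for epoch in range(50):
    random.shuffle(data)
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # 1. Forward pass: predictions with the current weight.
        predictions = [w * x for x, _ in batch]
        # 2. Loss: mean of (actual - predicted)^2 / 2 over the batch.
        loss = sum(0.5 * (y - p) ** 2
                   for (_, y), p in zip(batch, predictions)) / len(batch)
        # 3. Backward pass: dL/dw for this batch.
        gradient = sum(-(y - p) * x
                       for (x, y), p in zip(batch, predictions)) / len(batch)
        # 4. Weight update: step against the gradient.
        w -= learning_rate * gradient

print(round(w, 3))  # 2.0
```

In a real CNN, step 3 is done by backpropagation through every layer rather than by a hand-written derivative, but the loop has the same shape.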
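
One simple way to summarize that comparison is accuracy: the fraction of test images whose most probable predicted category matches the label. The sketch below uses made-up network outputs for four test images and three categories; the function name `accuracy` and all the numbers are hypothetical.

```python
def accuracy(predicted_probabilities, labels):
    """Fraction of examples whose most probable class matches the label."""
    correct = 0
    for probabilities, label in zip(predicted_probabilities, labels):
        predicted_class = max(range(len(probabilities)),
                              key=lambda i: probabilities[i])
        if predicted_class == label:
            correct += 1
    return correct / len(labels)


# Made-up network outputs for four held-out test images (three classes).
outputs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
true_labels = [0, 1, 2, 1]

print(accuracy(outputs, true_labels))  # 0.75
```

The test images must be ones the network never saw during training, otherwise the score says little about how it handles new data.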

Disclaimer

Although this article is a good starting point for learning about CNNs, it is by no means a comprehensive overview. Topics not discussed here include the nonlinear and pooling layers and hyperparameters of the network such as kernel size, stride, and padding. Network architecture, data normalization, vanishing gradients, dropout (translator's note: a method for preventing overfitting; no suitable translation found), initialization techniques, non-convex optimization, biases, choice of loss function, data augmentation, regularization methods, computational considerations, modifications of backpropagation, and more were also left out.

Origin blog.csdn.net/weixin_42137700/article/details/91360087