Some thoughts on deep learning

This article jots down some of my thoughts on deep learning. It is stream-of-consciousness writing: I write whatever comes to mind, and will update and supplement it from time to time.

Before I came into contact with deep learning, I felt it was a very lofty technology: it seemed to demand a heavy mathematical foundation and have a very high barrier to entry. I suspect many people share this impression. It largely comes from the overwhelming media coverage of AI. Many of the people writing those reports have no relevant theoretical background, or even an engineering background, but articles that grab attention bring traffic, so some exaggeration is inevitable. Terms like neural networks, artificial intelligence, gradient explosion, and parallel computing optimization get thrown around, and for people who have never touched the field, they easily create the impression of something demanding and mysterious.

In fact, neural networks are nothing new; they existed many years ago. Convolution is not new either; it has long been widely used in pattern recognition and image processing. Deep learning is hot now because of this stage of the times: the growth of computing power, the development of chips, ever-faster GPUs, better compute-optimization techniques, and the huge amounts of data of the Internet age have made training convolutional neural networks practical rather than merely theoretical. Large amounts of training data give deep learning better performance in many business areas, particularly image processing.

The essence of deep learning

Let me state the conclusion up front: the essence is to find the most suitable convolution kernels and the most appropriate feature weights so as to fit the training samples as well as possible. The essence is computation.

Take image processing as an example. How does the traditional approach do it? Manual feature extraction + traditional machine learning. How does deep learning do it? End-to-end learning: no more hand-crafting features, it learns the features by itself.
For example, looking at the figure below, you can easily recognize that it is a cat.

Have you ever wondered how your brain recognizes it as a cat? Through the eyes, ears, mouth, tail, legs? Or through some combination of these? Here, "ears, mouth, tail" and so on are the so-called "features." You may not realize it, but your brain really does run a series of complex operations before telling you it is a cat. The neurons are simply too fast for you to notice.

To a computer, the figure above is just a pile of numbers. An 800*600 image, for example, is an 800*600*3 matrix; to simplify, take grayscale, i.e., an 800*600 matrix whose elements are the pixel values of the corresponding pixels. Of course, the concept of a "pixel value" is one that we humans assign; the computer knows only the number itself, nothing more.
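
If you want to see this for yourself, here is a minimal sketch in numpy (the random image is just a stand-in; any real photo loaded as an array behaves the same way):

```python
import numpy as np

# A grayscale "image" is nothing but a 2-D matrix of numbers.
# Here we fake an 800x600 one with random pixel values in [0, 255].
img = np.random.randint(0, 256, size=(600, 800), dtype=np.uint8)

print(img.shape)   # (600, 800): height x width
print(img[0, :5])  # the first five pixel values of the first row
# To the computer these are just integers; "pixel value" is a meaning we assign.
```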

So the question is: what kind of numbers represent "a cat's eyes," and what kind represent "a cat's ears"? This job used to be done manually; now the neural network does it automatically.
For example, for five consecutive pixel values 1, 34, 67, 89, 213, I might decide that these five pixels represent a cat's head; that is a rule I define in advance. (Of course this definition is made up; five pixels obviously cannot represent a cat's head.) Doing it manually requires understanding the business. For vehicle recognition versus cat recognition, say, (1, 34, 67, 89, 213) might represent a cat's head in a cat picture but the rear of a car in a car picture; the meaning is different. Unfortunately, different domains mean different rules, and that is what makes feature engineering damn near metaphysics: there are thousands upon thousands of candidate features you could never exhaust, and the gap between good and bad features in the final recognition result is enormous. So by what measure does a feature count as good?

In deep learning, the input is just a pile of numbers. The model does not understand the business meaning the numbers carry, and it does not need to; it just works out the convolution matrices that minimize the loss. As said before, the essence is computation.

Here you can also see that the traditional method and deep learning are two entirely different lines of thinking. The former puts more weight on defining rules, that is, on extracting features; the latter puts more weight on the input data: as long as the data is plentiful and of good enough quality, it can automatically extract effective features and weights. (Of course, "feature" is again a meaning we humans assign; what the model learns is really just a pile of numbers. We call them features, but at this point they may no longer correspond to a cat's head or ears; they may be a point or a line that the naked eye can no longer interpret, as shown below.)

This is also one of the two reasons mentioned earlier why deep learning has taken the historical stage now: massive data.

Now let's come to the concept of convolution.

Convolution


Why can this convolution kernel act as an edge detector? Think about how you yourself judge the edges of an image. If you were given a solid-color image, where would its edges be?
For example:
Obviously it has none. In a solid-color image every pixel value in the matrix is the same, so there is no such thing as an edge. Now you can see it: you judge whether a pixel is an edge by how much it differs from its neighbors. The greater the difference, the more likely it is an edge. Now look at the convolution kernel above and it becomes clear. After the convolution (if you don't know the convolution operation, see the earlier essay at https://www.cnblogs.com/sdu20112013/p/10149529.html), each pixel value x becomes 8*x + (-1)*(the surrounding pixels), i.e., 8*x minus the sum of the surrounding pixels. Isn't this exactly comparing the difference between the current pixel and its surroundings? So after the convolution, plotting the resulting matrix shows the edges.
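
The kernel figure may not display here, so here is a minimal sketch assuming the classic 3x3 kernel the text describes (8 in the center, -1 all around), applied with scipy:

```python
import numpy as np
from scipy.ndimage import convolve

# The edge-detection kernel described above: 8*x minus the 8 surrounding pixels.
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])

# A toy image: a bright square on a dark background.
img = np.zeros((8, 8))
img[2:6, 2:6] = 255.0

edges = convolve(img, kernel, mode='constant', cval=0.0)
print(edges)
# In the flat regions (all-equal neighborhoods) the response is 0;
# only where a pixel differs from its neighbors -- the edge -- is it nonzero.
```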

Following this idea, it is easy to understand why different convolution kernels have different effects: they extract different features. People have designed many good convolution kernels, each performing a different function. Traditional image processing uses these kernels, together with rules, to do feature extraction adapted to one's own business data.

OK, here comes the key point. As said above, the "edge" feature has been extracted. Now take the matrix representing the image's edges, call it matrix_a, find another convolution kernel kernel_a, and convolve matrix_a with it to get matrix_b. What does matrix_b mean? Convolve matrix_b again to get matrix_c. What does matrix_c stand for? The answer is: we don't know. As mentioned above, the final matrix, plotted out, may already be just a point or a line; we can no longer match it by eye to an object in the real world. But that does not make it meaningless. Unlike high-level features such as a cat's eyes or ears, the points and lines obtained at this stage are highly abstract low-level features, and it is precisely large numbers of these low-level features that images are composed of.
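
A minimal sketch of this chaining, reusing the edge kernel above as a stand-in for kernel_a (in a real network every layer's kernels are learned, not fixed like this):

```python
import numpy as np
from scipy.ndimage import convolve

kernel_a = np.array([[-1, -1, -1],
                     [-1,  8, -1],
                     [-1, -1, -1]])

matrix_a = np.random.rand(32, 32)          # stand-in for the extracted edge map
matrix_b = convolve(matrix_a, kernel_a)    # convolve the edge map again...
matrix_c = convolve(matrix_b, kernel_a)    # ...and again

# matrix_b and matrix_c no longer correspond to anything we can name by eye.
# Stacking layers in a CNN is exactly this: convolution applied to the
# output of a convolution.
print(matrix_c.shape)
```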

So what does deep learning actually do? It searches for thousands upon thousands of convolution kernels, obtains thousands upon thousands of features, and then, via classification or regression, treats our target as a weighted sum of features: target = sum of (feature weight * feature). For example, obj = 0.3*feature1 + 0.5*feature2, where obj = 1 means cat and obj = 2 means dog. Given a new picture, we feed it to the model; the model computes feature1 and feature2 by convolution, then computes obj, and then we know whether the picture is a cat or a dog.
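
The toy decision rule above, written out in Python (the feature values and the nearest-label rule are made up purely for illustration):

```python
# The toy rule from the text: obj = 0.3*feature1 + 0.5*feature2.
def predict(feature1, feature2):
    obj = 0.3 * feature1 + 0.5 * feature2
    # Map obj to the nearest label: 1 means cat, 2 means dog.
    return 'cat' if abs(obj - 1) < abs(obj - 2) else 'dog'

# In a real model, feature1/feature2 would come out of the convolution layers.
print(predict(feature1=2.0, feature2=0.8))   # obj = 1.0 -> cat
print(predict(feature1=4.0, feature2=1.6))   # obj = 2.0 -> dog
```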

Of course, the kernels are not found by blind guessing. As for what numbers to fill into the kernel matrix: if you tried values one by one at random, no GPU or chip, however mighty, could ever finish. This is where the definition of the loss function and gradient descent come in.
For details, see my machine learning articles. If you don't want to read them, just know this: during training, the kernel values are not filled in at random. Every time backpropagation updates the kernels, it updates them toward making the loss smaller, that is, toward making the model more accurate (more accurate with respect to your training data: with the same network structure, the parameters of the model trained on your machine will differ from someone else's if your training data differ). That's all there is to it.
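
Here is a one-parameter sketch of that update loop, not a CNN but the same idea: nudge the weight in whichever direction makes the loss smaller.

```python
import numpy as np

# Toy data: the "right" weight is 3, i.e. y = 3*x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0          # the weight (think: one kernel value), initialized arbitrarily
lr = 0.01        # learning rate

for step in range(200):
    pred = w * x
    loss = np.mean((pred - y) ** 2)        # mean squared error
    grad = np.mean(2 * (pred - y) * x)     # dLoss/dw
    w -= lr * grad                         # step toward lower loss

print(w, loss)   # w converges toward 3.0, loss toward 0
```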

How do you design a filter / convolution kernel?


For example, the convolution kernel in the figure above can recognize the curve on the right. The reason is obvious: the shape of the kernel resembles the shape of the curve we want. When it meets an image region of a similar shape, the convolution (multiply corresponding pixel values, then add them up) yields a large number; otherwise a small one. In this way, curves of the desired shape are picked out.
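
A minimal numpy sketch of this "matched shape gives a big number" effect, using a diagonal line as a stand-in for the curve:

```python
import numpy as np

# A kernel whose "shape" is a diagonal line (the curve we want to detect).
kernel = np.eye(3)                   # 1s on the diagonal, 0s elsewhere

patch_match = np.eye(3) * 255        # a patch containing the same diagonal
patch_flat  = np.full((3, 3), 85.0)  # a flat patch with the same total brightness

# Convolution at one location = elementwise multiply, then sum.
print(np.sum(kernel * patch_match))  # 765.0: strong response, shapes align
print(np.sum(kernel * patch_flat))   # 255.0: weak response, no alignment
```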

The relationship with signal processing

In college, signal processing class was all Fourier transforms, day in and day out, and I had no idea what any of it was for. To be honest, the level of many college teachers is not great: they mostly read from the textbook or show slides that haven't changed in ages; perhaps they don't deeply understand the material themselves, or they are too detached from industry and never explain the real-world applications of the theory. Explaining the practical significance really isn't that hard. So, to students still in school: study hard, study hard, study hard. Important things are said three times. The things you think are useless now may come in handy someday.

Looking back now, discrete convolution and the discrete Fourier transform are two sides of the same coin: convolution in the spatial domain corresponds to multiplication in the frequency domain. Understood from the signal angle, isn't a convolution kernel just a filter? Isn't applying a kernel to an image just filtering the image-as-signal? And what is filtering? It is feature extraction.
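
You can check this correspondence numerically; here is a minimal sketch with numpy's FFT (using circular convolution, which is the version the DFT pairs with):

```python
import numpy as np

a = np.random.rand(8)
b = np.random.rand(8)

# Circular convolution computed directly from the definition...
direct = np.array([sum(a[k] * b[(n - k) % 8] for k in range(8))
                   for n in range(8)])

# ...equals the inverse FFT of the product of the FFTs (convolution theorem).
via_fft = np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

print(np.allclose(direct, via_fft))   # True
```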
The Fourier transform takes time-domain and spatial-domain information into the frequency domain. In image processing, what we handle is mostly spatial-domain information; in plain language, spatial information. For a single-frame image, the features we convolve out, whether points or lines, are shapes, i.e., information in space. Only sequences of images carry temporal information: multiple frames are related to one another, as in video, where the time domain becomes important.

The following passage is quoted from https://www.zhihu.com/question/20099543/answer/13971906:

First, the physical meaning of image frequency. An image can be viewed as a signal defined on a two-dimensional plane, whose amplitude corresponds to pixel grayscale (for a color image, the three RGB components). If we consider only a single row of pixels, it can be viewed as a signal defined in one-dimensional space, formally similar to the time-varying signals of traditional signal processing, except that one is defined over space and the other over time. Hence the frequency of an image is also called spatial frequency; it reflects how the grayscale of the image's pixels varies across space. For example, in an image of a wall, the grayscale is flatly distributed, so its low-frequency components are strong and its high-frequency components weak; for images with rapid spatial variation, such as a chessboard or a satellite photo of crisscrossing ravines, the high-frequency components are relatively strong and the low-frequency ones relatively weak (note: relatively speaking).
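
The wall-versus-chessboard contrast is easy to reproduce; a minimal sketch with numpy's 2-D FFT:

```python
import numpy as np

# A "wall": flat grayscale, and a "chessboard": rapid spatial variation.
wall  = np.full((8, 8), 128.0)
board = np.indices((8, 8)).sum(axis=0) % 2 * 255.0

def high_freq_energy(img):
    # 2-D FFT; the [0, 0] bin is the DC (zero-frequency) component.
    spectrum = np.abs(np.fft.fft2(img))
    return spectrum.sum() - spectrum[0, 0]   # energy outside DC

print(high_freq_energy(wall))    # ~0: a flat image has no high frequencies
print(high_freq_energy(board))   # large: fast spatial change = strong high freq
```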

What does it mean that an image's spatial information is lost?

First, look at where the fully connected layer parameters in a CNN come from; see https://zhuanlan.zhihu.com/p/33841176.

Take VGG-16 as an example. In VGG-16's fully connected layers, for a 224x224x3 input, the last convolution layer outputs 7x7x512. If the next layer is an FC layer with 4096 neurons, this fully connected operation can be implemented as a global convolution with a 7x7x512x4096 kernel.
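
A minimal PyTorch sketch of this equivalence (the layers below are freshly initialized, so the outputs only match in shape unless you copy the reshaped FC weights into the conv):

```python
import torch
import torch.nn as nn

# The last conv output of VGG-16: batch x 512 channels x 7 x 7.
x = torch.randn(1, 512, 7, 7)

# The usual FC layer: flatten 7*7*512 = 25088 inputs into 4096 neurons.
fc = nn.Linear(512 * 7 * 7, 4096)
out_fc = fc(x.flatten(1))          # shape: (1, 4096)

# The same operation as a "global" convolution: a 7x7 kernel over 512
# channels, with 4096 output channels -- one kernel per FC neuron.
conv = nn.Conv2d(512, 4096, kernel_size=7)
out_conv = conv(x)                 # shape: (1, 4096, 1, 1)

print(out_fc.shape, out_conv.shape)
```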

What benefits and what problems does this bring?

The benefit and the drawback are one and the same: the influence of position information is removed. Which one it is depends on the problem you are solving. For classification we don't care about position; we want a certain combination of pixels to be recognized as a certain feature no matter where in the image matrix it appears. In that case, it is a benefit.

But for image segmentation it is a drawback, because there I need the position information: I have to know, say, whether the cat in the picture is in the top-left or the bottom-right corner in order to segment it accurately. That is why segmentation models replace the fully connected layers with convolutional layers.

While writing this article I did a bit of googling to see whether anyone had written on similar topics, and found two very well-written articles, from which I have also borrowed some figures. Recommended.
