Day 06: A Powerful Tool for Image Processing - Convolutional Neural Network (CNN)


Natural User Interface (NUI)

The current wave of artificial intelligence has brought breakthroughs in the natural user interface (NUI), including the recognition, generation, and analysis of images (image, video), voice, and text. Giving machines these innate human communication abilities not only makes interaction with users more intuitive, but also lets machines make more reasonable and intelligent judgments about, and reactions to, their surroundings. Building these abilities into products gives application development enormous potential, including unmanned aerial vehicles (UAVs), smart homes, manufacturing robots, chatbots, and so on.
Starting with this article, we will explore the algorithms related to images, voice, and text one by one. Earlier, we recognized the 10 digits with only a few lines of code, which was exciting. Next, we introduce another algorithm, the convolutional neural network (CNN), which performs feature extraction automatically and therefore excels in image recognition and natural language processing (NLP). In addition, because it introduces the concept of the convolution layer, it can substantially reduce the computational load of training a neural network.

Convolutional Neural Network (CNN)

A CNN also mimics the way the human brain perceives. For example, when we recognize a picture, we first notice brightly colored points, lines, and surfaces, which then combine into distinct shapes (eyes, nose, mouth, ...). This process of abstraction is how the CNN algorithm builds its model. The convolution layer replaces point-by-point comparison with the comparison of local regions: features are judged block by block, and the comparison results are stacked and combined layer by layer to obtain good recognition results, as shown in the figure below.
https://ithelp.ithome.com.tw/upload/images/20171206/20001976ks1ZnVMOYD.png
Figure. The CNN concept. Source: An Intuitive Explanation of Convolutional Neural Networks

Convolution Layer

How do we go from points to regions? Simply put, we take each point of the image as a center and take the surrounding N x N grid of points to form a patch (N is called the kernel size, and the N x N weight matrix is called the 'convolution kernel'). Each cell is given a different weight, and the weighted sum is computed as the output for that point. We then move to the next point and process it in the same way, continuing until the last point of the image. This is the convolution layer of a CNN. For an illustration, see the Convolution Demo section of CS231n: Convolutional Neural Networks for Visual Recognition, which shows the sampling process as an animation.
The convolution layer works much like classical image processing: it computes with a sliding window, and by applying convolution kernels with different combinations of weights it can detect edges and corners, and can also produce effects such as noise removal and sharpening. Using these extracted features as the basis for recognition overcomes a weakness of regression, where outliers can severely distort the prediction. For example, if a person has a mole on their nose, we should still be able to recognize it as a nose from its shape.
https://ithelp.ithome.com.tw/upload/images/20171206/200019764fPvV5vbGn.png
Figure. How the convolution layer operates. Source: CS231n: Convolutional Neural Networks for Visual Recognition
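The sliding-window weighted sum described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: a single-channel image, no padding, a stride of 1, and a hand-picked 3 x 3 kernel (the function name `convolve2d` and the toy image are made up for this example). In a real CNN the kernel weights are learned during training rather than chosen by hand.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide an N x N kernel over a 2-D image and return the weighted sums.

    Assumptions for this sketch: no padding and a stride of 1, so the
    output is smaller than the input by N - 1 in each dimension. As in
    most deep learning libraries, the kernel is not flipped.
    """
    n = kernel.shape[0]
    out_h = image.shape[0] - n + 1
    out_w = image.shape[1] - n + 1
    output = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            patch = image[row:row + n, col:col + n]   # the N x N window
            output[row, col] = np.sum(patch * kernel)  # weighted sum
    return output

# Toy 6 x 6 image: dark left half (0), bright right half (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked kernel that responds to dark-to-bright vertical edges.
vertical_edge_kernel = np.array([
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
])

feature_map = convolve2d(image, vertical_edge_kernel)
print(feature_map)  # every row is [0. 3. 3. 0.]
```

The output is nonzero only where the window straddles the boundary between the dark and bright halves, which is what "detecting an edge" means here.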

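Without convolution layers, using only plain hidden layers (Dense) as in the second article requires a great deal of memory and a long computation time. Consider a real case: the ImageNet 2012 challenge involves recognizing full-color images of 227 x 227 pixels. Each pixel has three color channels (R/G/B, 24 bits in total), so a single image contributes 227 x 227 x 3 input values. Suppose there are 60,000 samples and the hidden layer outputs 1,000 variables. The matrix operation is then the product of a (60000, 227 x 227 x 3) matrix and a (227 x 227 x 3, 1000) matrix, which is an enormous computation. The idea behind the convolution layer is that when we look at a picture, each neuron receives reflected light from only a small region, called the 'receptive field'. In other words, a hidden-layer neuron connects only to the inputs inside its receptive field in the previous layer (11 x 11) rather than to all of the inputs (227 x 227). This is called 'locally connected', as opposed to 'fully connected' to every input.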
https://ithelp.ithome.com.tw/upload/images/20171206/20001976qngwkb1dnN.png
Figure. 'Fully connected' vs. 'locally connected'. Source: CS231n Convolutional Neural Networks for Visual Recognition

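Each hidden-layer neuron operates only on an (11, 11) input matrix, so the computational burden is clearly reduced. There is one more assumption, called 'shared weights': every receptive field uses the same set of weights (weight matrix) toward the next hidden layer, as shown in the figure below. This reduces the number of weights that must be estimated and lightens the computation further. The purpose of the convolution layer, then, is to exploit the characteristics of images or language to simplify the calculation and shorten the computation time.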
https://ithelp.ithome.com.tw/upload/images/20171206/20001976Fi8bleLow0.png
Figure. 'Shared weights'. Source: What exactly is meant by shared weights in convolutional neural network?
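To get a feel for how much local connectivity and weight sharing save, the weights can simply be counted. The sketch below uses the 227 x 227 input and 11 x 11 receptive field from the text, and assumes, purely for illustration, a single channel, a stride of 1, no padding, no bias terms, and one hidden neuron per receptive-field position.

```python
# Assumptions for this sketch: one channel, stride 1, no padding, no biases.
image_size = 227   # the input is 227 x 227
field_size = 11    # each receptive field is 11 x 11

inputs_per_image = image_size * image_size

# One hidden neuron per position where an 11 x 11 window fits in the image.
positions_per_side = image_size - field_size + 1
hidden_neurons = positions_per_side * positions_per_side

# Fully connected: every hidden neuron sees every input.
fully_connected = hidden_neurons * inputs_per_image

# Locally connected: every hidden neuron sees only its own 11 x 11 field.
locally_connected = hidden_neurons * field_size * field_size

# Shared weights: all receptive fields reuse one 11 x 11 weight matrix.
shared_weights = field_size * field_size

print(f"hidden neurons:    {hidden_neurons:,}")
print(f"fully connected:   {fully_connected:,} weights")
print(f"locally connected: {locally_connected:,} weights")
print(f"shared weights:    {shared_weights:,} weights")
```

Under these assumptions the count drops from about 2.4 billion weights to about 5.7 million with local connections, and to just 121 once the weights are shared.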

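When using the convolution layer functions (Conv1D, Conv2D, Conv3D, ...), we can set the number of filters. During training, the system learns filters for the various shapes that appear in the input images (for example +, X, O, ...). By stacking a few more convolution layers, we may find the various features an image contains, such as eyes, mouths, and noses. Let's look at the filters after four rounds of convolution (image source: https://cs.nyu.edu/~fergus/drafts/utexas2.pdf ). The first layer detects only lines, while by the fourth layer we get almost the entire outline.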
https://ithelp.ithome.com.tw/upload/images/20171206/20001976R3o43jBPr2.png
Figure. First-layer filters.
https://ithelp.ithome.com.tw/upload/images/20171206/20001976ErhREDupYr.png
Figure. Second-layer filters.
https://ithelp.ithome.com.tw/upload/images/20171206/20001976NX1pYtnkcr.png
Figure. Third-layer filters.
https://ithelp.ithome.com.tw/upload/images/20171206/20001976ig5SL4Evbr.png
Figure. Fourth-layer filters.

Pooling Layer

A pooling layer is usually added between convolution layers. It is a way of compressing the image while retaining the important information. Sampling again uses a sliding window, but it typically takes the maximum value (max pooling) rather than a weighted sum. If the sliding window size is set to 2 and the stride is also 2, the amount of data is reduced to a quarter of the original, yet because the maximum is taken, the strongest match within each local region is preserved. In other words, the pooled information focuses more on whether a matching feature exists in the picture than on where in the picture it exists. This helps the CNN determine whether a picture contains a particular feature without caring about the feature's location, so that an image can still be recognized even when it is shifted (part of this text is quoted from the article on the operating principle of convolutional neural networks).

https://ithelp.ithome.com.tw/upload/images/20171206/20001976muBoxxCMbE.png
Figure. Max pooling with a window size of 2 and a stride of 2. Source: A Beginner's Guide To Understanding Convolutional Neural Networks
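Max pooling with a window size of 2 and a stride of 2 can be sketched in NumPy as follows. The function name `max_pool2d` and the sample values are made up for this illustration; the second half shows, on a toy case, why a small shift inside a pooling window does not change the pooled result.

```python
import numpy as np

def max_pool2d(feature_map, window=2, stride=2):
    """Keep only the maximum of each window as it slides over the map."""
    out_h = (feature_map.shape[0] - window) // stride + 1
    out_w = (feature_map.shape[1] - window) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            top, left = row * stride, col * stride
            pooled[row, col] = feature_map[top:top + window,
                                           left:left + window].max()
    return pooled

feature_map = np.array([
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
], dtype=float)

pooled = max_pool2d(feature_map)
print(pooled)                          # [[6. 5.] [7. 9.]]
print(pooled.size / feature_map.size)  # 0.25, a quarter of the data

# A feature that moves by one pixel within the same 2 x 2 window
# produces exactly the same pooled output.
original = np.zeros((4, 4))
original[0, 0] = 1.0
shifted = np.zeros((4, 4))
shifted[1, 1] = 1.0
print(np.array_equal(max_pool2d(original), max_pool2d(shifted)))  # True
```

Note that this tolerance only covers shifts that stay inside one pooling window; larger shifts are handled by stacking several convolution and pooling layers.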

Epilogue

Extracting features through several convolution/pooling layers and then feeding them as input into one or more fully connected layers for classification is the typical way to build a CNN. Next time, we will use a CNN to recognize Arabic numerals and see how it differs. Right after that, we will introduce two CNN applications to show that neural networks are not limited to classification.
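The convolution, pooling, and fully connected pipeline can be sketched as a single forward pass in NumPy. Everything here is an assumption made for illustration: the 28 x 28 random "image", the four random 3 x 3 kernels, the ReLU activation (not discussed in this article), and the untrained random dense weights. It shows only how the data flows and changes shape, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernel):
    """Weighted sum over a sliding window (square input, stride 1, no padding)."""
    n = kernel.shape[0]
    size = image.shape[0] - n + 1
    out = np.zeros((size, size))
    for r in range(size):
        for c in range(size):
            out[r, c] = np.sum(image[r:r + n, c:c + n] * kernel)
    return out

def max_pool(fmap, window=2):
    """Max pooling with stride equal to the window size (square input)."""
    size = fmap.shape[0] // window
    trimmed = fmap[:size * window, :size * window]
    return trimmed.reshape(size, window, size, window).max(axis=(1, 3))

def softmax(scores):
    exp = np.exp(scores - scores.max())  # subtract the max for stability
    return exp / exp.sum()

image = rng.random((28, 28))              # hypothetical grayscale input
kernels = rng.standard_normal((4, 3, 3))  # 4 untrained 3 x 3 filters

# Feature extraction: convolution -> ReLU -> max pooling, once per filter.
pooled_maps = [max_pool(np.maximum(conv2d(image, k), 0.0)) for k in kernels]

# Flatten the pooled maps into one feature vector: 4 * 13 * 13 = 676 values.
features = np.concatenate([m.ravel() for m in pooled_maps])

# Classification: one fully connected layer producing 10 class scores.
dense_weights = rng.standard_normal((features.size, 10)) * 0.01
dense_bias = np.zeros(10)
probabilities = softmax(features @ dense_weights + dense_bias)

print(features.shape)       # (676,)
print(probabilities.shape)  # (10,)
```

In practice a library such as Keras provides these layers and learns the kernel and dense weights from data.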

Understanding these concepts will be a great help in the hands-on work that follows, so please read patiently. The application articles that come later are closely tied to them.
See you tomorrow!

 

Original: Large column  Day 06: image processing tool - convolution neural network (Convolutional Neural Network)


Origin www.cnblogs.com/petewell/p/11489757.html