Popular Science: Convolution in Deep Learning

The purpose of convolution is to extract useful features from the input. In image processing, there is a wide range of filters to choose from, and each type of filter helps extract a different feature from the input image, such as horizontal, vertical, or diagonal edges. In a CNN, different features are likewise extracted by convolution, but the filter weights are learned automatically during training. All the extracted features are then combined to make a decision.
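To make the idea of "different filters extract different features" concrete, here is a minimal sketch using NumPy. The Sobel kernels and the 6x6 test image are illustrative assumptions, not taken from the article, and `conv2d` is a hypothetical helper written for this example. The image contains a single vertical edge, so the vertical-edge filter responds and the horizontal-edge filter does not.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and
    sum the element-wise products at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 6x6 image: dark left half (0), bright right half (1),
# so it contains exactly one vertical edge.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-designed Sobel kernels (an illustrative choice, not from the article).
sobel_vertical = np.array([[-1, 0, 1],
                           [-2, 0, 2],
                           [-1, 0, 1]], dtype=float)
sobel_horizontal = sobel_vertical.T

vertical_response = conv2d(image, sobel_vertical)
horizontal_response = conv2d(image, sobel_horizontal)

print(vertical_response)    # non-zero only where the window covers the edge
print(horizontal_response)  # all zeros: the image has no horizontal edge
```

In a CNN the kernel values would not be fixed by hand like this; they would start from random values and be adjusted during training.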

 

The advantages of convolution are weight sharing and translation invariance. Convolution also takes the spatial relationship between pixels into account, which is especially useful in computer vision, since these tasks often involve identifying objects whose parts have a fixed spatial relationship to one another (for example, a dog's body usually connects to its head, four legs, and tail).

 

Single-channel version

Convolution on a single channel

 

In deep learning, convolution is element-wise multiplication followed by addition. The figure shows the convolution for an image with a single channel. Here the filter is a 3x3 matrix [[0,1,2], [2,2,0], [0,1,2]]. The filter slides across the input; at each position it multiplies its weights element-wise with the values underneath and sums the products, so each sliding position yields one output value. (Note that in this example, stride = 1 and padding = 0.)
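The sliding-window procedure can be sketched in a few lines of NumPy. The 3x3 filter below is the one given in the text; the 5x5 input values are an assumption, since the original figure is not reproduced here, and `conv2d_single` is a hypothetical helper name.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1, padding=0):
    """Single-channel convolution as used in CNNs: at each position,
    multiply element-wise and sum."""
    if padding > 0:
        image = np.pad(image, padding)  # zero padding on every side
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Filter from the text.
kernel = np.array([[0, 1, 2],
                   [2, 2, 0],
                   [0, 1, 2]])

# Assumed 5x5 single-channel input (illustrative values only).
image = np.array([[3, 3, 2, 1, 0],
                  [0, 0, 1, 3, 1],
                  [3, 1, 2, 2, 3],
                  [2, 0, 0, 2, 2],
                  [2, 0, 0, 0, 1]])

result = conv2d_single(image, kernel, stride=1, padding=0)
print(result)
# [[12 12 17]
#  [10 17 19]
#  [ 9  6 14]]
```

With stride = 1 and padding = 0, a 5x5 input and a 3x3 filter give a 3x3 output, because the filter fits in three positions along each axis.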

 

Multi-channel version

In many applications, we need to deal with multi-channel images. The most typical example is the RGB image.

Different channels emphasize different aspects of the original image

Another example of multi-channel data is the layers inside a CNN. A convolutional layer usually consists of multiple channels (typically hundreds). Each channel describes a different aspect of the previous layer. How do we transition between layers of different depths? How is a layer of depth n transformed into a following layer of depth m?

 

Before describing this process, we first clarify some terminology: layers, channels, feature maps, filters, and kernels. In terms of hierarchy, layers and filters sit at the same level, while channels and kernels sit one level below. Channels and feature maps are the same thing. A layer can have multiple channels (or feature maps): if the input is an RGB image, it has three channels. "Channel" is usually used to describe the structure of a "layer". Similarly, "kernel" is used to describe the structure of a "filter".

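The difference between a filter and a kernel is subtle. The two terms are often used interchangeably, which can cause confusion. So where does the difference lie? A "kernel" usually refers to a 2D array of weights, while a "filter" refers to a 3D structure made of multiple kernels stacked together. For a 2D filter, the two are the same thing. A 3D filter, as used in most deep learning convolutions, is a collection of kernels. Each kernel is unique and emphasizes a different aspect of its input channel.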
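The kernel/filter distinction is easy to see in array shapes. The following is a small sketch in NumPy; the kernel values are arbitrary placeholders, and putting the channel axis first is an assumed layout chosen for this example.

```python
import numpy as np

# Three distinct 2D kernels, one per input channel (placeholder values).
kernel_r = np.array([[0, 1, 2],
                     [2, 2, 0],
                     [0, 1, 2]])
kernel_g = np.ones((3, 3), dtype=int)
kernel_b = np.eye(3, dtype=int)

# A 3D filter is the stack of those kernels (channel axis first, by assumption).
filter_3d = np.stack([kernel_r, kernel_g, kernel_b])

print(kernel_r.shape)   # (3, 3)    -> a kernel is a 2D weight array
print(filter_3d.shape)  # (3, 3, 3) -> a filter holds one kernel per channel
```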

 

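With the terminology in place, we continue with multi-channel convolution. Each kernel is applied to one input channel of the previous layer to generate one channel. This is a kernel-wise process, and we repeat it for all kernels to generate multiple channels. These channels are then added together to form a single output channel, as illustrated in the figure below: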

 

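The input is a 5x5x3 matrix with three channels. The filter is a 3x3x3 matrix. First, each kernel in the filter is applied separately to one of the three channels of the input layer. Three convolutions are performed, producing three channels of size 3x3.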
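The two steps described above (one convolution per channel, then a sum across channels) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the input and filter values are random placeholders, the arrays are stored channel-first as (3, 5, 5) and (3, 3, 3) for convenience, and `conv2d` and `conv_multichannel` are hypothetical helper names.

```python
import numpy as np

def conv2d(channel, kernel):
    """Single-channel convolution, stride 1, no padding."""
    kh, kw = kernel.shape
    out_h = channel.shape[0] - kh + 1
    out_w = channel.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=channel.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(channel[i:i + kh, j:j + kw] * kernel)
    return out

def conv_multichannel(inputs, filt):
    """inputs: (C, H, W), filt: (C, kh, kw).
    Returns the per-channel results and their sum (one output channel)."""
    # Step 1: apply each kernel to its own input channel.
    per_channel = [conv2d(x, k) for x, k in zip(inputs, filt)]
    # Step 2: add the per-channel results into a single output channel.
    return per_channel, np.sum(per_channel, axis=0)

rng = np.random.default_rng(0)
inputs = rng.integers(0, 4, size=(3, 5, 5))   # 5x5 image with 3 channels
filt = rng.integers(-1, 2, size=(3, 3, 3))    # one 3x3 kernel per channel

per_channel, output = conv_multichannel(inputs, filt)
print([p.shape for p in per_channel])  # three 3x3 channels
print(output.shape)                    # (3, 3): a single output channel
```

Under these assumptions, one filter yields exactly one output channel, whatever the depth of the input.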

Origin www.cnblogs.com/elitphil/p/12040671.html