A Primer on Convolution in Deep Learning

The purpose of convolution is to extract useful features from the input. In image processing, there are many filters to choose from, and each one extracts a different kind of feature, such as horizontal, vertical, or diagonal edges. A CNN also extracts different features by convolution, but the filter weights are learned automatically during training. All the extracted features are then combined to make a decision.


Convolution has two main advantages: weight sharing and translation invariance. It also takes the spatial relationships between pixels into account. This is especially useful in computer vision, since these tasks usually involve recognizing objects whose parts stand in a particular spatial relationship to one another (for example, a dog's body is usually connected to its head, limbs, and tail).


Single-channel version

(Figure: convolution for a single channel)


In deep learning, convolution is element-wise multiplication followed by addition. The figure illustrates this for an image with a single channel. Here the filter is a 3x3 matrix with elements [[0, 1, 2], [2, 2, 0], [0, 1, 2]]. The filter slides across the input. At each position it multiplies its weights element-wise with the patch of input underneath and sums the products, so each position yields one output value. (Note that in this example, stride = 1 and padding = 0.)
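The sliding multiply-and-add described above can be sketched in NumPy. This is only a minimal illustration, not an optimized implementation. The function name conv2d_single and the 5x5 input are assumptions made for the example; only the 3x3 filter values come from the text.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1, padding=0):
    """Slide `kernel` over a 2D `image`; at each position multiply
    element-wise and sum. As is usual in deep learning, the kernel
    is not flipped (strictly speaking this is cross-correlation)."""
    if padding > 0:
        # Zero-pad all four sides by `padding` pixels.
        image = np.pad(image, padding, mode="constant")
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = image[top:top + kh, left:left + kw]
            out[i, j] = np.sum(window * kernel)
    return out

# The 3x3 filter from the text.
kernel = np.array([[0, 1, 2],
                   [2, 2, 0],
                   [0, 1, 2]])

# Hypothetical 5x5 single-channel input (not the values from the figure).
image = np.arange(25).reshape(5, 5)

result = conv2d_single(image, kernel)  # stride = 1, padding = 0
print(result.shape)  # (3, 3)
```

With stride = 1 and padding = 0, a 5x5 input and a 3x3 filter leave 3 valid positions along each axis, which is why the output is 3x3.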


Multi-channel version

In many applications, we need to deal with multi-channel images. The most typical example is the RGB image.

(Figure: different channels emphasize different aspects of the original image)

Another example of multi-channel data is the layers of a CNN. A convolutional layer usually consists of multiple channels (typically hundreds). Each channel describes a different aspect of the previous layer. How do we transition between layers of different depths? How do we transform a layer of depth n into the next layer of depth m?


Before describing this process, we first introduce some terminology: layers, channels, feature maps, filters, and kernels. In terms of hierarchy, layers and filters sit at the same level, while channels and kernels sit one level below. Channels and feature maps are the same thing. A layer can have multiple channels (or feature maps): if the input is an RGB image, the input layer has three channels. "Channel" is usually used to describe the structure of a "layer". Similarly, "kernel" is used to describe the structure of a "filter".

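The difference between a filter and a kernel is subtle. The two terms are often used interchangeably, which can cause confusion. So where do they differ? A "kernel" usually refers to a 2D matrix of weights, while a "filter" refers to a 3D structure made of multiple kernels stacked together. For a 2D filter, the two are the same thing. But a 3D filter, as used in most deep learning convolutions, is a collection of kernels. Each kernel is unique and emphasizes a different aspect of its input channel.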
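The relationship between filters and kernels can be made concrete with a short NumPy sketch of the standard multi-channel convolution. It assumes stride 1 and no padding, and the names conv2d_multi, rgb, and filters are hypothetical, chosen only for this example. Each filter holds one kernel per input channel; the per-channel products are summed into a single output channel, so m filters turn a layer of depth n into a layer of depth m.

```python
import numpy as np

def conv2d_multi(inputs, filters):
    """inputs:  (n, H, W)       n input channels
    filters: (m, n, kh, kw)  m filters, each a stack of n 2D kernels
    returns: (m, H-kh+1, W-kw+1), one output channel per filter.
    Assumes stride 1 and no padding."""
    n, height, width = inputs.shape
    m, n_kernels, kh, kw = filters.shape
    assert n == n_kernels, "each filter needs one kernel per input channel"
    out_h, out_w = height - kh + 1, width - kw + 1
    out = np.zeros((m, out_h, out_w))
    for f in range(m):
        for i in range(out_h):
            for j in range(out_w):
                # 3D window covering all n channels at this position.
                window = inputs[:, i:i + kh, j:j + kw]
                # Multiply by the filter's n kernels and sum everything
                # into a single number.
                out[f, i, j] = np.sum(window * filters[f])
    return out

rng = np.random.default_rng(0)
rgb = rng.random((3, 5, 5))         # hypothetical RGB input, depth n = 3
filters = rng.random((4, 3, 3, 3))  # m = 4 filters, each with 3 kernels of 3x3

features = conv2d_multi(rgb, filters)
print(features.shape)  # (4, 3, 3): depth 3 in, depth 4 out
```

In this sketch the kernel is the 2D slice filters[f, c], while the filter is the whole 3D stack filters[f].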


Origin www.cnblogs.com/elitphil/p/12040671.html