The difference between a convolution filter and a kernel

Among the many posts that introduce convolutions, this one stands out for its colorful, eye-catching animations and highly intuitive visuals, easy enough to follow even right before bed.


An intuitive description of the various fascinating CNN layers

A short introduction

A convolution uses a "kernel" to extract certain "features" from an input image. A kernel is a matrix that slides over the image and is multiplied with the input, so that certain desired behaviors are enhanced in the output. Look at the following GIF.

The kernel above can be used to sharpen an image. But what is so special about this kernel? Consider the two input images shown in the figure. In the first image, the output at the center is 3*5 + 2*(-1) + 2*(-1) + 2*(-1) + 2*(-1) = 7, so the center value rises from 3 to 7. In the second image, the output is 1*5 + 2*(-1) + 2*(-1) + 2*(-1) + 2*(-1) = -3, so the center value drops from 1 to -3. Clearly, the contrast between 3 and 1 has been stretched to the contrast between 7 and -3, making the image clearer and sharper.
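The arithmetic above can be checked with a minimal pure-Python 2D convolution (valid mode, stride 1). The 3x3 sharpen kernel and the two 3x3 patches below are illustrative values matching the example, not taken from any library.

```python
# A minimal "valid" 2D convolution: slide the kernel over the image and
# take the elementwise product-sum at each position.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# Classic sharpen kernel: boost the center, subtract the 4 neighbors.
sharpen = [[ 0, -1,  0],
           [-1,  5, -1],
           [ 0, -1,  0]]

patch_a = [[2, 2, 2],   # center 3 surrounded by 2s
           [2, 3, 2],
           [2, 2, 2]]
patch_b = [[2, 2, 2],   # center 1 surrounded by 2s
           [2, 1, 2],
           [2, 2, 2]]

print(conv2d(patch_a, sharpen))  # [[7]]:  3*5 - 4*2 = 7
print(conv2d(patch_b, sharpen))  # [[-3]]: 1*5 - 4*2 = -3
```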

With a deep CNN, we do not need to hand-design kernels to extract features; instead, the network can learn the values of these kernels directly and extract the latent features itself.


Kernel and Filter

Before going deeper, I would like to clearly distinguish the two terms "kernel" and "filter", because I have seen many people confuse them. As described above, a kernel is a matrix of weights that is multiplied with the input to extract relevant features. The dimensionality of the kernel matrix gives the convolution its name: for example, a 2D convolution uses a 2D kernel matrix.

A filter, however, is a concatenation of multiple kernels, with each kernel assigned to a particular input channel. A filter always has one more dimension than its kernels. For example, in a 2D convolution, a filter is a 3D matrix (essentially 2D matrices, i.e. kernels, stacked in series). Thus, for a CNN layer with kernel size h*w and k input channels, each filter has size k*h*w.
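The kernel/filter/layer hierarchy can be sketched as nested lists; only the nesting depth matters here, and the sizes (3x3 kernels, 4 input channels, 8 filters) are illustrative placeholders.

```python
# Shape-only sketch: kernel (2D) -> filter (3D, one kernel per input
# channel) -> layer (4D, one filter per output channel).
import random

h, w = 3, 3          # kernel size
k = 4                # input channels
num_filters = 8      # output channels

kernel = [[random.random() for _ in range(w)] for _ in range(h)]       # h x w
filt = [[[random.random() for _ in range(w)] for _ in range(h)]
        for _ in range(k)]                                             # k x h x w
layer = [[[[random.random() for _ in range(w)] for _ in range(h)]
          for _ in range(k)] for _ in range(num_filters)]              # f x k x h x w

# One filter holds k*h*w weights; the layer holds num_filters times that.
params_per_filter = k * h * w
print(params_per_filter)                # 36
print(num_filters * params_per_filter)  # 288
```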

A typical convolutional layer actually consists of multiple such filters. To simplify the discussion below, assume there is only one filter unless otherwise noted, since every filter repeats the same operations.


1D, 2D and 3D convolution

1D convolutions are generally used for analyzing time-series data (because the input is one-dimensional in that case). As noted above, one-dimensional data can still have multiple input channels. The filter can move in only one direction, so the output is 1D. See the following example of a single-channel 1D convolution.
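A single-channel 1D convolution can be written in a few lines. The signal and the edge-detecting kernel below are illustrative choices.

```python
# Minimal "valid" 1D convolution, stride 1: slide the kernel along the
# signal and take the product-sum at each position.
def conv1d(signal, kernel):
    n, m = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(m))
            for i in range(n - m + 1)]

signal = [0, 0, 1, 1, 1]     # a step in the time series
kernel = [-1, 0, 1]          # simple difference kernel: responds to rising edges

print(conv1d(signal, kernel))  # [1, 1, 0] -> fires where the step occurs
```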

We already saw an example of a single-channel 2D convolution at the beginning of this post, so let us visualize a multi-channel 2D convolution and try to understand it. In the figure below, the kernel size is 3*3, and the filter contains multiple such kernels (marked in yellow). This is because the input has multiple channels (marked in blue), and we have one kernel corresponding to each input channel. Clearly, the filter here can move in two directions, so the final output is 2D. 2D convolutions are the most common and are widely used in computer vision.
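The multi-channel case can be sketched directly: each channel is convolved with its own kernel, and the per-channel results are summed elementwise into one 2D output. All values below are illustrative.

```python
# "Valid" 2D convolution of a single channel.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

# One filter = one kernel per input channel; channel outputs are summed.
def conv2d_multichannel(channels, filt):
    per_channel = [conv2d(c, k) for c, k in zip(channels, filt)]
    h, w = len(per_channel[0]), len(per_channel[0][0])
    return [[sum(p[i][j] for p in per_channel) for j in range(w)]
            for i in range(h)]

channels = [[[1, 0], [0, 1]],    # input channel 0 (2x2)
            [[0, 2], [2, 0]]]    # input channel 1 (2x2)
filt = [[[1, 1], [1, 1]],        # kernel for channel 0
        [[1, 0], [0, 1]]]        # kernel for channel 1

print(conv2d_multichannel(channels, filt))  # [[2]]: (1+0+0+1) + (0+0+0+0)
```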

It is difficult to visualize a 3D filter (since it is a 4D matrix), so we discuss only the single-channel 3D convolution. As shown in the 3D convolution below, the kernel can move in three directions, so the output is also 3D.
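The 1D/2D/3D cases share one output-shape rule for valid (no padding, stride 1) convolutions: each spatial dimension shrinks by kernel_size - 1, and a 3D convolution just applies this along three axes. The sizes below are illustrative.

```python
# Output shape of a valid convolution, one entry per spatial dimension.
def valid_out_shape(in_shape, kernel_shape):
    return tuple(n - k + 1 for n, k in zip(in_shape, kernel_shape))

print(valid_out_shape((8,), (3,)))            # 1D: (6,)
print(valid_out_shape((8, 8), (3, 3)))        # 2D: (6, 6)
print(valid_out_shape((8, 8, 8), (3, 3, 3)))  # 3D: (6, 6, 6)
```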

Most of the work on modifying and customizing CNN layers focuses on the 2D convolution, so from now on I will only discuss variants of the 2D convolution.


Transposed Convolution

The GIFs above documented well how a 2D convolution reduces the size of the input. But sometimes we need the opposite, such as increasing the size of the input (also referred to as "upsampling").

To achieve this with convolutions, we use a modified version called the transposed convolution or deconvolution (it is not a true mathematical "inverse" of the convolution operation, which is why many people dislike the term deconvolution). In the GIFs below, the dashed blocks represent padding.
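The core mechanics can be sketched in 1D: each input value "stamps" a scaled copy of the kernel into the output at stride-spaced positions, overlaps are summed, and the output grows to (n-1)*stride + k. The values below are illustrative.

```python
# Minimal 1D transposed convolution (stride s, no padding).
def conv_transpose1d(signal, kernel, stride=1):
    n, k = len(signal), len(kernel)
    out = [0] * ((n - 1) * stride + k)   # upsampled output length
    for i, x in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i * stride + j] += x * w # scatter-add the scaled kernel
    return out

print(conv_transpose1d([1, 2, 3], [1, 1], stride=2))
# [1, 1, 2, 2, 3, 3]: length grows from 3 to (3-1)*2 + 2 = 6
```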

These animations show very intuitively how different padding modes produce different upsampled outputs from the same input. This kind of convolution is very common in modern CNN architectures, mainly because of its ability to increase the image size.

Separable Convolutions

A separable convolution decomposes the convolution kernel into lower-dimensional kernels. There are two main types of separable convolutions. The first is the spatially separable convolution, shown below.

A standard 2D convolution kernel


A spatially separable 2D convolution
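Spatial separability can be verified directly: a 3x3 Sobel-style kernel factors into the outer product of a 3x1 column and a 1x3 row, so one 2D convolution can be replaced by two cheaper 1D passes with an identical result. The image below is an illustrative example.

```python
# "Valid" 2D convolution of a single channel.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

col = [1, 2, 1]     # vertical smoothing component
row = [-1, 0, 1]    # horizontal gradient component
sobel = [[c * r for r in row] for c in col]   # outer product = full 3x3 kernel

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 8, 7, 6],
         [5, 4, 3, 2]]

direct   = conv2d(image, sobel)                               # 9 multiplies/pixel
two_pass = conv2d(conv2d(image, [[c] for c in col]), [row])   # 3 + 3 multiplies/pixel

print(direct == two_pass)  # True
```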

Spatially separable convolutions are not very common in deep learning. Depthwise separable convolutions, however, are widely used in lightweight CNN models and deliver very good performance. See the following example.

A standard 2D convolution with 2 input channels and 128 filters


A depthwise separable 2D convolution, which first processes each channel separately and then applies a cross-channel (pointwise) convolution

But why use separable convolutions? Efficiency! Using separable convolutions can significantly reduce the number of parameters required. As our deep learning networks keep growing in complexity and size, there is a pressing need to deliver similar performance with fewer parameters.
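The savings are easy to quantify for the figure above (2 input channels, 128 filters); the 3x3 kernel size is an assumption, since the figure does not state it.

```python
# Parameter counts: standard conv vs depthwise separable conv.
c_in, c_out, k = 2, 128, 3

standard  = c_out * c_in * k * k   # every filter spans all input channels
depthwise = c_in * k * k           # one spatial kernel per input channel
pointwise = c_out * c_in * 1 * 1   # 1x1 cross-channel convolution
separable = depthwise + pointwise

print(standard)   # 2304
print(separable)  # 274 -> roughly 8x fewer parameters
```

With more input channels the ratio approaches k*k, i.e. about 9x for 3x3 kernels.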

Dilated/Atrous Convolution

As you can see in all the convolution layers above, without exception, they process all adjacent values together. Sometimes, however, it may be better to skip certain input values, which is why dilated convolutions (also called atrous convolutions) were introduced. This modification allows the kernel to enlarge its receptive field without increasing the number of parameters.

Clearly, as the animation above shows, the kernel can process a wider neighborhood using the same 9 parameters as before. This also means some information is lost, since fine-grained detail cannot be processed (certain values are skipped). In some applications, however, the overall effect is positive.
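In 1D the trade-off is easy to see: the kernel taps are spaced `dilation` apart, so the receptive field widens to (m-1)*dilation + 1 while the parameter count stays at m. The signal and kernel below are illustrative.

```python
# Minimal 1D dilated convolution (valid, stride 1).
def dilated_conv1d(signal, kernel, dilation=1):
    n, m = len(signal), len(kernel)
    span = (m - 1) * dilation + 1   # effective receptive field
    return [sum(signal[i + j * dilation] * kernel[j] for j in range(m))
            for i in range(n - span + 1)]

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]                  # 3 parameters either way

print(dilated_conv1d(signal, kernel, dilation=1))  # [6, 9, 12, 15, 18]
print(dilated_conv1d(signal, kernel, dilation=2))  # [9, 12, 15] (span 5, skips odd taps)
```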


Deformable Convolution

Convolutions are very rigid in terms of the shape used for feature extraction. That is, kernel shapes are limited to squares/rectangles (or some other manually determined shapes), so they can only operate in those patterns. What if the shape of the convolution itself were learnable? That is the core idea behind deformable convolutions.

In practice, deformable convolution is quite simple to implement. Each kernel is represented with two different matrices. The first branch learns to predict "offsets" from the origin. These offsets indicate which inputs around the origin should be processed. Since each offset is predicted independently, the sampled points need not form any rigid shape, hence the deformable property. The second branch is simply the convolution branch, whose inputs are the values at those offset positions.
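A highly simplified 1D sketch of the sampling step: an offset branch decides where to sample, and the convolution branch weights those samples. Real deformable convolutions predict fractional 2D offsets per position and use bilinear interpolation; here the offsets are hand-picked integers purely for illustration, not learned.

```python
# One output position of a (toy) deformable convolution: sample the input
# at center+offset for each tap, then take the weighted sum.
def deformable_tap(signal, center, offsets, weights):
    return sum(signal[center + o] * w for o, w in zip(offsets, weights))

signal  = [0, 1, 2, 3, 4, 5, 6]
weights = [1, 1, 1]

rigid      = deformable_tap(signal, 3, [-1, 0, 1], weights)  # ordinary 3-tap shape
deformable = deformable_tap(signal, 3, [-3, 0, 2], weights)  # irregular, "learned" shape

print(rigid)       # 2 + 3 + 4 = 9
print(deformable)  # 0 + 3 + 5 = 8
```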

What's next?

There are many variants of CNN layers that can be used alone or combined with one another to create successful and complex architectures. Each variant arises from an intuition about how feature extraction should work. Therefore, even though these deep CNN networks learn weights we cannot interpret, I believe the intuitions that produced them matter greatly for their performance, and further research in this direction is essential to the success of highly complex CNNs.


Origin www.cnblogs.com/yibeimingyue/p/11964515.html