Understanding common convolution kernels in convolutional neural networks, with example applications based on PyTorch (complete code included)

0. Preface

The purpose of this article: to introduce the convolution kernels commonly used in convolution operations, explain how they work, and apply them in examples built on the PyTorch framework.

Prerequisites: you should understand how the convolution operation works in a convolutional neural network (CNN). There are many articles on CSDN covering this, so it is not repeated here. Recommended article: Convolution process of RGB color image (GIF animation demonstration).

1. Commonly used convolution kernels

The convolution kernel is the most important weight parameter in a convolutional neural network (CNN); during training, the network is essentially learning a suitable set of convolution kernels. Below, the commonly used kernels are divided into three categories according to how they process an image.

1.1 Edge detection convolution kernels

This type of kernel is the most widely studied, and many variants exist. Their common feature is that all values in the kernel sum to 0. The reason is that pixel values change abruptly in edge regions, so convolving an edge region with such a kernel produces a value clearly different from 0; outside the edges the pixel values are very close to each other, so the convolution yields a value approximately equal to 0.

Commonly used edge detection convolution kernels are:
① Roberts operator
$$\left\{ \begin{matrix} -1 & 0 \\ 0 & 1 \end{matrix} \right\} \qquad \left\{ \begin{matrix} 0 & -1 \\ 1 & 0 \end{matrix} \right\}$$

② Prewitt operator
$$\left\{ \begin{matrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{matrix} \right\} \qquad \left\{ \begin{matrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{matrix} \right\}$$

③ Sobel operator
$$\left\{ \begin{matrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{matrix} \right\} \qquad \left\{ \begin{matrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{matrix} \right\}$$

④ Laplace operator
$$\left\{ \begin{matrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{matrix} \right\}$$
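To make the sum-to-zero property concrete, here is a minimal sketch (my own addition, using torch.nn.functional.conv2d on small hand-made single-channel patches): a flat patch produces a response of 0, while a patch containing a vertical brightness jump produces a clearly non-zero response.

import torch
import torch.nn.functional as F

# Prewitt vertical-edge kernel, shaped [out_channels, in_channels, kH, kW]
prewitt = torch.tensor([[[[-1., 0., 1.],
                          [-1., 0., 1.],
                          [-1., 0., 1.]]]])

flat = torch.full((1, 1, 3, 3), 0.5)            # uniform region: no edge
edge = torch.tensor([[[[0., 0., 1.],
                       [0., 0., 1.],
                       [0., 0., 1.]]]])         # dark-to-bright jump

print(F.conv2d(flat, prewitt))   # tensor([[[[0.]]]])  -> flat region gives 0
print(F.conv2d(edge, prewitt))   # tensor([[[[3.]]]])  -> the edge gives a large response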

1.2 Blurring convolution kernels

This type of kernel averages the pixel values so that changes between pixels become smoother, which blurs the image within a region. For example:
$$\left\{ \begin{matrix} 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \end{matrix} \right\}$$

1.3 Sharpening convolution kernels

This type of kernel highlights regions where the pixel values change, making regions with a large pixel-value gradient (edge regions) show an even larger gradient. In edge detection, the kernel is designed so that all its values sum to 0; here the requirement is the opposite: the values in the kernel should not sum to 0, so that regions with a larger pixel-value gradient are emphasised. For example, the following kernel:
$$\left\{ \begin{matrix} -1 & -1 & -1 \\ -1 & 9 & -1 \\ -1 & -1 & -1 \end{matrix} \right\}$$
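A quick numeric check (a minimal sketch, my own addition): because the weights of this sharpening kernel sum to 1, a flat patch passes through unchanged, while a pixel that differs from its neighbours is pushed even further away from them.

import torch
import torch.nn.functional as F

sharpen = torch.tensor([[[[-1., -1., -1.],
                          [-1.,  9., -1.],
                          [-1., -1., -1.]]]])

flat = torch.full((1, 1, 3, 3), 0.5)
spot = flat.clone()
spot[0, 0, 1, 1] = 0.6           # centre pixel slightly brighter than its neighbours

print(F.conv2d(flat, sharpen))   # tensor([[[[0.5000]]]]) -> flat region unchanged
print(F.conv2d(spot, sharpen))   # tensor([[[[1.4000]]]]) -> the difference is amplified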

Once you understand the principles of the kernels above, you can design your own. For example, the following blurring kernel also works:
$$\left\{ \begin{matrix} 1/16 & 1/16 & 1/16 & 1/16 \\ 1/16 & 1/16 & 1/16 & 1/16 \\ 1/16 & 1/16 & 1/16 & 1/16 \\ 1/16 & 1/16 & 1/16 & 1/16 \end{matrix} \right\}$$
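As a sketch of how such a kernel could be built programmatically (the helper name mean_blur_kernel and the in_channels handling are my own assumptions, not from the article): the 4×4 kernel above has 16 weights of 1/16 for a single channel; for a 3-channel RGB input, dividing by in_channels · n · n keeps the total weight at 1, so the overall brightness is preserved.

import torch

def mean_blur_kernel(n, in_channels=3):
    # Box blur: every weight equal, all weights together summing to 1.
    return torch.full((1, in_channels, n, n), 1.0 / (in_channels * n * n))

print(mean_blur_kernel(4).sum())   # ≈ tensor(1.) -> the blurred output stays in range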

2. Example applications of convolution kernels based on PyTorch

2.1 Converting between images and tensors

① Image to tensor: use Image.open() from PIL to open the image, then use torchvision.transforms.ToTensor() to convert it to a tensor:

image = Image.open('image_path').convert('RGB')  # load the image
image_to_tensor = torchvision.transforms.ToTensor()   # instantiate ToTensor
original_image_tensor = image_to_tensor(image).unsqueeze(0)     # convert the image to a tensor

The purpose of .unsqueeze(0) here is to add a dimension in preparation for the subsequent convolution: Conv2d requires a 4-dimensional input tensor of shape [batch, channel, H, W], and the added dimension corresponds to the batch.
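A small shape check illustrating this (a sketch; 'image_path' stands for your own file, and H, W are the image's height and width):

from PIL import Image
import torchvision

image = Image.open('image_path').convert('RGB')
t = torchvision.transforms.ToTensor()(image)
print(t.shape)                 # -> torch.Size([3, H, W]): channels, height, width
print(t.unsqueeze(0).shape)    # -> torch.Size([1, 3, H, W]): the 4-D shape Conv2d expects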

② Tensor to image: use torchvision.utils.save_image():

torchvision.utils.save_image(tensor, 'save_image_path')
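If you want a PIL image back in memory (for display rather than saving to disk), torchvision.transforms.ToPILImage is an alternative; a small sketch, assuming the original_image_tensor from above:

import torchvision

to_pil = torchvision.transforms.ToPILImage()
# ToPILImage expects [C, H, W], so drop the batch dimension first;
# clamp keeps float values inside the [0, 1] range it assumes.
pil_image = to_pil(original_image_tensor.squeeze(0).clamp(0, 1))
pil_image.show()   # or pil_image.save('copy.png')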

2.2 Specifying the convolution kernel

Because nn.Conv2d() initialises its convolution kernel (weight) randomly by default, the kernel needs to be specified explicitly first:

# Kernel: Laplace
conv_laplace = torch.nn.Conv2d(in_channels=3,out_channels=1,kernel_size=3,padding=0,bias=False)
conv_laplace.weight.data = torch.tensor([[[[0,1,0],[1,-4,1],[0,1,0]],
                                            [[0,1,0],[1,-4,1],[0,1,0]],
                                            [[0,1,0],[1,-4,1],[0,1,0]]]], dtype=torch.float32)

Pay attention to the dimensions of the kernel tensor: as with the input, it must be 4-dimensional, here [out_channels, in_channels, kernel height, kernel width].
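Equivalently, the kernel can be applied without defining a layer, through the functional API torch.nn.functional.conv2d; the weight keeps the same 4-D shape. A sketch (repeating the 3×3 kernel across the 3 RGB channels is my own choice):

import torch
import torch.nn.functional as F

laplace_weight = torch.tensor([[[[0., 1., 0.],
                                 [1., -4., 1.],
                                 [0., 1., 0.]]] * 3])   # repeat the 3x3 kernel for the 3 input channels
print(laplace_weight.shape)                             # torch.Size([1, 3, 3, 3])

# output = F.conv2d(original_image_tensor, laplace_weight)   # same result as the Conv2d layer above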

3. Image processing results of different convolution kernels

① Original image: (image)

② After convolution with the edge detection kernel (Prewitt horizontal operator): (image)

③ After convolution with the edge detection kernel (Prewitt vertical operator): (image)

④ After convolution with the edge detection kernel (Laplace operator): (image)

Compare the outputs of the three edge detection operators above, in particular the difference between the horizontal and the vertical operator.

⑤ After convolution with the blurring kernel: (image)

⑥ After convolution with the sharpening kernel: (image)

Note that the sharpening kernel also highlights noise points in the background.

4. Complete code

import torch
from PIL import Image
import torchvision


image = Image.open('girl.png').convert('RGB')  # load the image
image_to_tensor = torchvision.transforms.ToTensor()   # instantiate ToTensor
original_image_tensor = image_to_tensor(image).unsqueeze(0)     # convert the image to a tensor


# Kernel: Prewitt horizontal
conv_prewitt_h = torch.nn.Conv2d(in_channels=3,out_channels=1,kernel_size=3,padding=0,bias=False)  # bias must be False, otherwise a random bias is generated and the result differs on every run
conv_prewitt_h.weight.data = torch.tensor([[[[-1,-1,-1],[0,0,0],[1,1,1]],
                                            [[-1,-1,-1],[0,0,0],[1,1,1]],
                                            [[-1,-1,-1],[0,0,0],[1,1,1]]]], dtype=torch.float32)


# Kernel: Prewitt vertical
conv_prewitt_l = torch.nn.Conv2d(in_channels=3,out_channels=1,kernel_size=3,padding=0,bias=False)
conv_prewitt_l.weight.data = torch.tensor([[[[-1,0,1],[-1,0,1],[-1,0,1]],
                                            [[-1,0,1],[-1,0,1],[-1,0,1]],
                                            [[-1,0,1],[-1,0,1],[-1,0,1]]]], dtype=torch.float32)

# Kernel: Laplace
conv_laplace = torch.nn.Conv2d(in_channels=3,out_channels=1,kernel_size=3,padding=0,bias=False)
conv_laplace.weight.data = torch.tensor([[[[0,1,0],[1,-4,1],[0,1,0]],
                                            [[0,1,0],[1,-4,1],[0,1,0]],
                                            [[0,1,0],[1,-4,1],[0,1,0]]]], dtype=torch.float32)


# Kernel: blur (every weight is 0.04 = 1/25, a 5x5 average over each channel)
conv_blur = torch.nn.Conv2d(in_channels=3,out_channels=1,kernel_size=5,padding=0,bias=False)
conv_blur.weight.data = torch.full((1,3,5,5),0.04)


# Kernel: sharpen
conv_sharp = torch.nn.Conv2d(in_channels=3,out_channels=1,kernel_size=3,padding=0,bias=False)
conv_sharp.weight.data = torch.tensor([[[[-1,-1,1],[-1,-1,-1],[-1,-1,-1]],
                                            [[-1,-1,1],[-1,22,-1],[-1,-1,-1]],
                                            [[-1,-1,1],[-1,-1,-1],[-1,-1,-1]]]], dtype=torch.float32)

# Apply the convolutions and save the resulting images
tensor_prewitt_h = conv_prewitt_h(original_image_tensor)
torchvision.utils.save_image(tensor_prewitt_h, 'prewitt_h.png')

tensor_prewitt_l = conv_prewitt_l(original_image_tensor)
torchvision.utils.save_image(tensor_prewitt_l, 'prewitt_l.png')

tensor_laplace = conv_laplace(original_image_tensor)
torchvision.utils.save_image(tensor_laplace, 'laplace.png')

tensor_blur = conv_blur(original_image_tensor)
torchvision.utils.save_image(tensor_blur, 'blur.png')

tensor_sharp = conv_sharp(original_image_tensor)
torchvision.utils.save_image(tensor_sharp, 'sharp.png')
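One optional refinement (my own suggestion, not part of the original code): since no training happens here, the forward passes can run inside torch.no_grad() so that PyTorch does not track gradients for these one-off convolutions, e.g.:

with torch.no_grad():
    torchvision.utils.save_image(conv_blur(original_image_tensor), 'blur.png')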
