Image convolution and filtering

A summary of some key points about convolution.

1. The basic concepts of linear filtering and convolution

Linear filtering is arguably the most fundamental method of image processing, and it lets us process images to produce many different effects. The method is very simple. First, we have a two-dimensional filter matrix (which goes by the grander name of convolution kernel) and a two-dimensional image to be processed. Then, for each pixel of the image, we compute the products of its neighboring pixels with the corresponding elements of the filter matrix and add them up as the value at that pixel position. This completes the filtering.

Multiplying the image and the filter matrix element by element and summing the results is equivalent to shifting one two-dimensional function over every position of another two-dimensional function. This operation is called convolution or correlation. The difference between the two is that convolution first flips the filter matrix by 180 degrees; if the matrix is symmetric, the two operations are identical.

Correlation and convolution are arguably the most basic operations of image processing, yet they are very useful. Both operations have two key properties: they are linear and shift-invariant. Shift invariance means that we perform the same operation at every location in the image. Linearity means that we replace each pixel with a linear combination of its neighborhood. These two properties make the operations very simple to reason about: linear operations are the simplest kind, and doing the same thing everywhere is simpler still.

In fact, in the field of signal processing convolution has a much broader meaning and a strict mathematical definition, but we will not focus on that here. Direct 2D convolution requires four nested loops, so it is not fast unless we use a small convolution kernel; 3x3 or 5x5 kernels are typically used. Moreover, filters follow certain rules and conventions:

1) The size of the filter should be odd so that it has a center, e.g., 3x3, 5x5 or 7x7. Having a center also gives it a radius; for example, the radius of a 5x5 kernel is 2.

2) The sum of all elements of the filter matrix should equal 1, which ensures that the brightness of the image stays the same before and after filtering. This is not a hard requirement, of course.

3) If the elements of the filter matrix sum to more than 1, the filtered image will be brighter than the original; conversely, if they sum to less than 1, it will be darker. If the sum is 0, the image does not become black, but it will be very dark.

4) The filtered result may contain negative numbers or values greater than 255. In this case we can simply clamp them to the range 0 to 255. For negative numbers, taking the absolute value is another option.

2. The magic of the convolution kernel

As mentioned above, filtering an image means applying a small convolution kernel to it. So what magic does this small kernel hold that can transform an image so dramatically? Let's look at what some simple but not simplistic convolution kernels can do.

Do nothing

Haha, can you see anything? This filter does nothing: the resulting image is identical to the original, because only the center element has a value of 1, and the weights of all the neighboring points are 0, so they contribute nothing to the filtered value.
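
For reference, this is the identity kernel (it also appears in the experiment code at the end of this article):

kernel = [0, 0, 0
          0, 1, 0
          0, 0, 0];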

Image Sharpening Filter (Sharpness Filter)

Image sharpening is closely related to edge detection: first find the edges, then add them back onto the original image, which strengthens the image's edges and makes it look sharper. A sharpening filter combines these two operations; that is, on top of an edge detection filter, add 1 at the center position, so that the filtered image keeps the same brightness as the original but comes out sharper.

Its main purpose is to emphasize the details of the image. The simplest 3x3 sharpening filter is as follows:
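
(This is the sharpen kernel used in the experiment code below:)

kernel = [-1, -1, -1
          -1,  9, -1
          -1, -1, -1];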

In effect it computes the difference between the current point and its surrounding points, and then adds this difference back to the original value. Note also that the center weight exceeds the combined magnitude of the surrounding weights by exactly 1, so all the elements sum to 1, which means the pixel keeps its original overall brightness.

Edge Detection

Let's look for horizontal edges first. Note that the elements of this matrix sum to 0, so the filtered image will be very dark, with only the edges showing up bright.

Why does this filter find horizontal edges? Because convolving with it is a discrete version of differentiation: you subtract the previous pixel value from the current pixel value, which gives you the difference, or slope, of the function between those two positions. The next filter finds vertical edges, using the pixel values both above and below the current pixel:

The following filter can find edges at 45 degrees. The value -2 has no special significance; it is chosen simply to make the elements of the matrix sum to 0.

And the following filter can detect edges in all directions:
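
(This is the all-direction edge detection kernel from the experiment code below:)

kernel = [-1, -1, -1
          -1,  8, -1
          -1, -1, -1];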

To detect edges, we need to compute the image gradient in the corresponding direction, and convolving the image with this kernel does exactly that. In practice, however, this simple approach also amplifies noise. Note once again that all the values of the matrix add up to 0.

Embossing Filter

The emboss filter gives the image a 3D shadow effect. It simply subtracts the pixels on one side of the center from the pixels on the other side. The resulting pixel values may be negative; we treat negative values as shadow and positive values as light, and then add an offset of 128 to the result, at which point most of the image becomes gray. Below is the 45-degree emboss filter:
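
(This is the emboss kernel used in the experiment code below:)

kernel = [-1, -1, 0
          -1,  0, 1
           0,  1, 1];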

If we enlarge the filter, we get an even more exaggerated effect:

The effect is quite beautiful, as if the image were carved into stone and lit from one direction. Unlike the previous filters, it is asymmetric. It also produces negative values, so we must offset the result to bring it back into the image's grayscale range.

A: Original image. B: Sharpen. C: Edge detection. D: Emboss.

Mean Blur / Box Filter (Averaging)

We can average the current pixel together with the pixels in its 4-neighborhood and divide by 5, or equivalently put the value 0.2 in those 5 positions of the filter:
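
(This is the averaging kernel used in the experiment code below:)

kernel = [0, 1, 0
          1, 1, 1
          0, 1, 0] / 5;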

As you can see, this blur is fairly gentle. We can make the filter larger for a stronger blur; note that the sum is then divided by 13.
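
The larger kernel itself is not shown here; a plausible reconstruction, consistent with the divisor of 13, is the 5x5 diamond-shaped box filter:

kernel = [0, 0, 1, 0, 0
          0, 1, 1, 1, 0
          1, 1, 1, 1, 1
          0, 1, 1, 1, 0
          0, 0, 1, 0, 0] / 13;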

So if you want a stronger blur, just increase the size of the filter. Applying the blur to the image several times also works.

Gaussian blur

Mean blur is simple, but the result is not very smooth. Gaussian blur does better in this respect, so it is widely used for image noise reduction, especially before edge detection, where it is used to suppress detail. The Gaussian filter is a low-pass filter.
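
The kernel weights come from sampling the 2D Gaussian function around the center and normalizing them to sum to 1:

G(x, y) = 1 / (2·π·σ²) · exp(−(x² + y²) / (2·σ²))

In the experiment code at the end, this kernel is generated with fspecial('gaussian', 5, 0.8), i.e. a 5x5 kernel with σ = 0.8.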

Motion Blur

Motion blur can be achieved by blurring in only one direction, for example with the 9x9 motion blur filter below. Note that the sum is divided by 9.
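
(The experiment code at the end uses a smaller 5x5 version of the same idea, a diagonal of ones divided by 5:)

kernel = [1, 0, 0, 0, 0
          0, 1, 0, 0, 0
          0, 0, 1, 0, 0
          0, 0, 0, 1, 0
          0, 0, 0, 0, 1] / 5;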

The effect is as if the camera is moving from the upper left corner to the lower right corner.

3. Calculation of convolution

For image processing there are two broad categories of methods: spatial domain processing and frequency domain processing. Spatial domain processing means computing directly on the original pixels, while frequency domain processing means first transforming the image into the frequency domain and then performing filtering and other operations there.

Spatial domain calculation - direct 2D convolution

2D convolution

Direct 2D convolution is what was described at the beginning: for each pixel of the image, compute the products of its neighboring pixels with the corresponding elements of the filter matrix, and add them up as the value at that pixel position.

The straightforward implementation is also called brute force, because it implements the definition exactly, without any optimizations; in a parallel implementation it is also the most flexible approach. There are optimized versions as well: if the kernel is separable, convolution can be made roughly 5 times faster.
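
As a minimal Matlab sketch, assuming a grayscale double image I, an odd-sized kernel K, and zero extension at the borders (the first of the boundary strategies discussed below); the function name conv2d_brute is ours, for illustration:

function J = conv2d_brute(I, K)
    [h, w]   = size(I);
    [kh, kw] = size(K);
    ry = (kh - 1) / 2;            % kernel radius in y
    rx = (kw - 1) / 2;            % kernel radius in x
    K  = rot90(K, 2);             % flip 180 degrees: convolution, not correlation
    J  = zeros(h, w);
    for y = 1:h                   % four nested loops: the "brute force"
        for x = 1:w
            acc = 0;
            for i = -ry:ry
                for j = -rx:rx
                    yy = y + i;
                    xx = x + j;
                    if yy >= 1 && yy <= h && xx >= 1 && xx <= w
                        acc = acc + I(yy, xx) * K(i + ry + 1, j + rx + 1);
                    end
                end
            end
            J(y, x) = acc;
        end
    end
end

For a separable kernel, conv2(u, v, I) with the two 1D factors u and v applies the column and row passes separately, which is where the speedup comes from.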
Boundary processing

What happens when the convolution kernel reaches the edge of the image? For example, the pixels in the top row of the image have no pixels above them, so how do we compute their values? There are four mainstream ways to handle this; we illustrate them using 1D convolution and mean filtering.

In a 1D image, we replace the value of each pixel with the average of it and its two neighbors. Suppose we have a 1D image I of 10 pixels (the original figure is not reproduced here; the values used in the calculations below are I(1)=5, I(2)=4, I(3)=2, I(4)=3, I(5)=7, I(9)=3 and I(10)=6):

Operating on pixels that are not on the border of the image is relatively simple. Suppose we take the local average at the fourth pixel of I, whose value is 3. That is, we average 2, 3 and 7 and use the result to replace the pixel value at this position. Averaging produces a new image J with, at the same position, J(4) = (I(3)+I(4)+I(5))/3 = (2+3+7)/3 = 4. Similarly, J(3) = (I(2)+I(3)+I(4))/3 = (4+2+3)/3 = 3. Note that each pixel of the new image depends only on the old image: when computing J(4) it would be wrong to use the already-computed J(3); we must use I(3), I(4) and I(5). So each pixel becomes the average of itself and its two neighbors. Averaging is a linear operation, because each new pixel is a linear combination of the old pixels.

For convolution there is one more situation to consider: what do we do at the borders of the image? What should the value of J(1) be? It depends on I(0), I(1) and I(2), but we have no I(0); there are no values to the left of the image. There are four ways to deal with this problem:

1) The first is to imagine that I is part of an infinitely long image in which all pixels outside the part we were given have the value 0. In that case I(0) = 0, so J(1) = (I(0)+I(1)+I(2))/3 = (0+5+4)/3 = 3. Likewise, J(10) = (I(9)+I(10)+I(11))/3 = (3+6+0)/3 = 3.

2) The second method also imagines that I is part of an infinite image, but extends the unspecified part with the values on the image boundary. In our example, because the leftmost value of I is I(1) = 5, all values to its left are taken to be 5; likewise, all values to the right of the image are taken to be 6, the value I(10) on the right boundary. Then J(1) = (I(0)+I(1)+I(2))/3 = (5+5+4)/3 = 14/3, and J(10) = (I(9)+I(10)+I(11))/3 = (3+6+6)/3 = 5.

3) The third option is to treat the image as periodic; that is, I keeps repeating, with period equal to the length of I. In our case, I(0) has the same value as I(10), and I(11) the same value as I(1). So J(1) = (I(0)+I(1)+I(2))/3 = (I(10)+I(1)+I(2))/3 = (6+5+4)/3 = 5.

4) The last option is to treat everything outside I as undefined. Since there is no way to use undefined values, any pixel whose computation would need them simply cannot be computed. Here neither J(1) nor J(10) can be computed, so the output J is smaller than the original image I.

Each of these four methods has its advantages and disadvantages. If we think of our image as a small window onto the world, then when we need values beyond the window boundary, they are generally almost the same as the values on and near the boundary, so the second method usually makes the most sense.
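
The four strategies map directly onto standard Matlab options, sketched below on the 1D example above. The middle values of I (4, 6, 5) are hypothetical placeholders, since only the border values appear in the calculations; padarray is from the Image Processing Toolbox:

I = [5, 4, 2, 3, 7, 4, 6, 5, 3, 6];  % values 6-8 are hypothetical placeholders
k = [1, 1, 1] / 3;                   % the 3-point mean filter

J1 = conv(I, k, 'same');                                 % 1) extend with zeros
J2 = conv(padarray(I, [0 1], 'replicate'), k, 'valid');  % 2) replicate the boundary values
J3 = conv(padarray(I, [0 1], 'circular'), k, 'valid');   % 3) treat I as periodic
J4 = conv(I, k, 'valid');                                % 4) undefined outside: J shrinks to length 8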

Frequency domain calculation - Fast Fourier Transform (FFT) convolution

This fast implementation relies on the convolution theorem: convolution in the time domain equals multiplication in the frequency domain. So we transform the image and the filter into the frequency domain, multiply them directly, and then transform the result back to the time domain (that is, the spatial domain of the image):

result = IFFT(FFT(image) ∘ FFT(kernel)), where ∘ denotes element-wise multiplication of matrices. And what do we use to transform the image and filter from the spatial domain to the frequency domain? The famous Fast Fourier Transform, FFT (CUDA, for example, ships with an FFT implementation).

To filter an image in the frequency domain, the size of the filter must match the size of the image so that the element-wise multiplication lines up. Since a filter is generally smaller than the image, we need to expand the kernel to the same size as the image.

Because the FFT implementation in CUDA is periodic, the values of the kernel must also be arranged so as to respect this periodicity.

To ensure that the pixels on the border of the image also get a corresponding output, we must likewise expand the input image, and the expansion must also respect the periodic representation.
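
A minimal Matlab sketch of this expansion and periodic arrangement (an illustration of the idea, not the CUDA implementation itself): the kernel is placed in a zero image of the full size and circularly shifted so that its center wraps to the first element, matching the FFT's periodic indexing.

A = double(imread('test.jpg'));
A = A(:, :, 1);                        % one channel, for simplicity
K = [0, 1, 0; 1, 1, 1; 0, 1, 0] / 5;   % the averaging kernel from earlier
kpad = zeros(size(A));
kpad(1:size(K, 1), 1:size(K, 2)) = K;  % expand the kernel to the image size
kpad = circshift(kpad, [-(size(K, 1) - 1) / 2, -(size(K, 2) - 1) / 2]);  % wrap center to (1,1)
J = real(ifft2(fft2(A) .* fft2(kpad))); % periodic (circular) convolution via the FFT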

If we apply the convolution theorem without modifying the input at all, what we get is a periodic (circular) convolution. That may not be what we want, because circular convolution wraps around the input data and introduces artifacts.

Given I and K of length N, to obtain a linear convolution we need to zero-pad both I and K. Why add zeros? Because the DFT assumes its input is infinite and periodic with period N. As the figure above shows, without padding, I and K are implicitly assumed to be periodic with their length N as the period: the I and K of length N are the black dashed parts, and without padding, countless copies of the same I are implicitly appended outside those N samples, as in the red dashed line, one period over; the same holds for K. With zero padding, everything outside the black dashed part is 0, as shown in the blue part of the figure. Convolving I and K without padding therefore produces artifacts, shown in the red part, while the padded result is the solid blue line.
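
A minimal 1D Matlab sketch of the same idea, with hypothetical example signals:

I = [1, 2, 3, 4, 5];   % hypothetical signal
K = [1, 0, -1];        % hypothetical kernel
n = numel(I) + numel(K) - 1;             % length needed for a linear convolution
J = real(ifft(fft(I, n) .* fft(K, n)));  % zero-pad to n, multiply spectra, invert
% J matches conv(I, K) up to rounding; the padding to length n is what
% turns the circular convolution into a linear one.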

Experimental code

This is the Matlab experiment code for the second part:

clear, close all, clc

%% read image
image = imread('test.jpg');

%% define filter
% Each assignment below overwrites the previous one, so keep only the
% kernel you want to test and comment out the rest.

% ----- Identity filter -----
kernel = [0, 0, 0
          0, 1, 0
          0, 0, 0];

% ----- Average Blur -----
kernel = [0, 1, 0
          1, 1, 1
          0, 1, 0] / 5;

% ----- Gaussian Blur -----
kernel = fspecial('gaussian', 5, 0.8);

% ----- Motion Blur -----
kernel = [1, 0, 0, 0, 0
          0, 1, 0, 0, 0
          0, 0, 1, 0, 0
          0, 0, 0, 1, 0
          0, 0, 0, 0, 1] / 5;

% ----- Edge Detection -----
kernel = [-1, -1, -1
          -1,  8, -1
          -1, -1, -1];

% ----- Sharpen filter -----
kernel = [-1, -1, -1
          -1,  9, -1
          -1, -1, -1];

% ----- Emboss filter -----
kernel = [-1, -1, 0
          -1,  0, 1
           0,  1, 1];

%% convolve the image with the defined kernel or filter
result = zeros(size(image));
result(:, :, 1) = conv2(double(image(:, :, 1)), double(kernel), 'same');
result(:, :, 2) = conv2(double(image(:, :, 2)), double(kernel), 'same');
result(:, :, 3) = conv2(double(image(:, :, 3)), double(kernel), 'same');

%% show the result
imshow(image);
figure
imshow(uint8(result))

