Enter the world of Computer Vision - digital image + interpolation algorithm + histogram + convolution & filtering

1. Digital image

1.1 Image Basics

When we talk about images, the word we hear most often is pixel. So what exactly is a pixel, and what does its value mean? Let's go through it together. Through this article we can get a rough grasp of the basic terms of computer vision and build up an intuition that will support our continued learning.

Digital image: an image saved by a computer is essentially a grid of pixels; an image stored this way is called a digital image

1.1.1 Pixels, Grayscale and Contrast

Pixel is a simple but important word: it is the unit of resolution and the most basic unit of a bitmap image. Each pixel has its own color, and these individual pixels together make up the images we see. Resolution is the number of pixels per unit inch (measured in PPI, pixels per inch); for displays, PPI is usually stated as the number of pixels per inch along the diagonal.

There are three very important terms about images: grayscale, channel and contrast

Grayscale is the value describing how bright or dark a pixel is, that is, the color depth of a point in a black-and-white image. Where does a grayscale image come from? It is obtained by modifying the channels of the image so that only black, white and shades of gray remain. A channel is a color channel: it decomposes the image into one or more color components.

  • Single channel: each pixel is represented by a single number, so it can only express grayscale (0 is black, 255 is white)

  • Three channels: also known as RGB mode; the image is split into red, green and blue channels and can therefore represent color

  • Four channels: an alpha (transparency) channel is added on top of the three RGB channels; an alpha of 0 means fully transparent

Contrast is another very important concept in digital images. It describes how large the difference between bright and dark regions is; a simple definition is contrast = maximum gray value / minimum gray value. A small sketch follows below.
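As a minimal illustration of channels and this max/min contrast ratio (the file name 'test.png' is only an assumption; any image will do):

import cv2

# read the same image as single-channel grayscale and as a three-channel color image
gray = cv2.imread('test.png', cv2.IMREAD_GRAYSCALE)
color = cv2.imread('test.png', cv2.IMREAD_COLOR)
print(gray.shape)    # (height, width): one value per pixel
print(color.shape)   # (height, width, 3): three values per pixel (B, G, R in OpenCV)

# contrast as the ratio of the maximum to the minimum gray value
contrast = gray.max() / max(int(gray.min()), 1)   # guard against dividing by zero
print(contrast)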

Now that we know what a color channel is, the most commonly used RGB model deserves a closer look. The three primary colors of pigment are magenta, yellow and cyan, while the three colors of the RGB model are the optical (additive) primaries: red, green and blue.

1.1.2 RGB color model

The RGB color model corresponds to a unit cube in a three-dimensional Cartesian coordinate system. Along the main diagonal of the cube the amounts of the three primaries are equal, running from dark to bright, which gives the gray levels; the other six corners of the cube are red, green, blue, yellow, cyan and magenta.

Converting RGB values to floating point: floating-point calculations are more accurate; with integer arithmetic the fractional parts are discarded at every step, which can seriously distort the color values.
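A minimal sketch of this idea (the file name 'test.png' and the brightness-halving operation are just assumptions for illustration):

import cv2
import numpy as np

img = cv2.imread('test.png')                     # uint8, values 0..255
img_f = img.astype(np.float32) / 255.0           # floats in [0, 1]: no truncation during the math
img_f = img_f * 0.5                              # example operation: halve the brightness
img_u8 = np.clip(img_f * 255.0, 0, 255).astype(np.uint8)   # convert back only at the very end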

Note: there is a common pitfall in OpenCV regarding the RGB color mode: the channels of an image read by OpenCV are arranged as BGR, not RGB.

import cv2 as cv

# Images read with OpenCV's imread() are in BGR channel order; convert them to RGB like this
img = cv.imread('test.png')
img = cv.cvtColor(img, cv.COLOR_BGR2RGB)

1.1.3 Frequency and amplitude

Frequency and amplitude are two more commonly used concepts for images. Frequency refers to how sharply the gray value changes, i.e. the gradient of the gray level across the image plane. Amplitude is the largest absolute value that appears within one period; for a sine wave it is half the peak-to-trough distance.

1.2 Image sampling and quantization

As we said before, an image saved by a computer is stored pixel by pixel, and such an image is a digital image. But how do we digitize an image, that is, convert it into individual pixels? This involves image sampling and quantization.

Sampling: how many points are used to describe the image; the quality of the sampling result is measured by the image's resolution

Quantization: the range of values used to represent each point after the image has been sampled

Digitizing the coordinate values is called sampling; digitizing the amplitude values is called quantization

The terms downsampling and upsampling simply mean shrinking and enlarging the image:

Downsampling (shrinking the image) makes the image fit the display area or produces a thumbnail of it; the main purpose of upsampling (enlarging the image / image interpolation) is to enlarge the original image so that it can be displayed on a higher-resolution device

2. Interpolation algorithm

What is interpolation? Suppose we want to enlarge an image that contains 100 pixels: simply stretching it lowers the effective resolution. How can we avoid, or at least limit, that loss? We need to insert some new pixels, so that when the 100-pixel image is enlarged ten times it really contains 1000 pixels; that solves the problems of distortion and reduced resolution. Where do these new pixels come from, what values do they take, and how are they inserted? That is exactly what interpolation algorithms do. Here we introduce several commonly used interpolation methods.

2.1 Nearest neighbor interpolation

Let's use an example to illustrate the essence of nearest neighbor interpolation. Suppose there are four adjacent pixels in the upper left corner of the image img1 that we want to enlarge. After the image is enlarged, some new pixels need to be added between these four pixels, and the value given to each new pixel depends on where it falls.

Suppose (i+u, j+v) is the coordinate of the pixel to be computed (i and j are non-negative integers, u and v are decimals between 0 and 1, the same below). Then its value f(i+u, j+v) is simply the value of the original pixel it is closest to: for example, if u < 0.5 and v < 0.5 it takes f(i, j), and correspondingly for the other three neighbors.

As for the code implementation of nearest neighbor interpolation, let's type it out to understand it. For example, we enlarge an original 400 x 400 image to 800 x 800, and the extra pixels are filled in by nearest neighbor interpolation.

import cv2
import numpy as np

# Nearest neighbor interpolation: enlarge img to 800 x 800
def function(img):
    # height, width and number of channels (img.shape returns a 3-tuple)
    height, width, channels = img.shape
    emptyImage = np.zeros((800, 800, channels), np.uint8)
    sh = 800 / height   # zoom ratio in the vertical direction
    sw = 800 / width    # zoom ratio in the horizontal direction
    for i in range(800):
        for j in range(800):
            # map the destination pixel back to the source image and truncate to int
            x = int(i / sh)
            y = int(j / sw)
            emptyImage[i, j] = img[x, y]
    return emptyImage

img = cv2.imread("lenna.png")
zoom = function(img)
print(zoom)
print(zoom.shape)
cv2.imshow("nearest interp", zoom)
cv2.imshow("image", img)
cv2.waitKey(0)

Code analysis: for image processing in Python we should know a few representative third-party libraries, namely numpy, matplotlib and opencv-python; they provide many methods that help us get the job done. np.zeros() is a numpy method that returns an array of the given shape and type filled with zeros. In image processing we can think of it as creating an empty canvas; in this example the canvas is 800x800, and once we fill it with pixel values it becomes a digital image that we can display, which is exactly what the function is meant to produce.

The two nested for loops (an image is 2-dimensional) are the key part of the interpolation: we visit every pixel of the 800x800 canvas in turn. How do we decide what value to put at each point? As stated above, the value is determined by where the point falls, which means we have to map the 800x800 coordinates back into the 400x400 image to locate it. We first compute the zoom ratios sh and sw, then divide the large-image coordinates by these ratios to get the corresponding position in the small image; converting the result to int keeps it on an existing pixel of the original image, whose value is copied over. That completes the assignment of the new pixels.
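As a quick sanity check (my own addition, not part of the original post), OpenCV's built-in resize with nearest neighbor interpolation should give a very similar result; small differences are possible because of different rounding conventions:

import cv2

img = cv2.imread("lenna.png")
zoom_cv = cv2.resize(img, (800, 800), interpolation=cv2.INTER_NEAREST)   # dsize is (width, height)
cv2.imshow("cv2 nearest interp", zoom_cv)
cv2.waitKey(0)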

2.2 Bilinear interpolation

Before learning bilinear interpolation, let's first look at single (one-dimensional) linear interpolation. In single linear interpolation we know the coordinates of two points, which is equivalent to knowing the equation of the straight line through them, so we can compute the coordinates of every point on that line segment.

From the proportional relation: y = (x1 - x)/(x1 - x0) * y0 + (x - x0)/(x1 - x0) * y1
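A tiny sketch of this formula (the sample numbers are made up for illustration):

# single linear interpolation between (x0, y0) and (x1, y1), evaluated at x
def lerp(x, x0, y0, x1, y1):
    return (x1 - x) / (x1 - x0) * y0 + (x - x0) / (x1 - x0) * y1

print(lerp(0.5, 0, 10, 1, 20))   # halfway between 10 and 20 -> 15.0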

Bilinear interpolation is really just single linear interpolation applied in two directions

(Figure: bilinear interpolation of the point P from its four neighboring pixels Q11, Q12, Q21 and Q22)

The specific process: do a single linear interpolation between Q12 and Q22 to get R2, then in the same way get R1 from Q11 and Q21; with R1 and R2 in hand, do one more single linear interpolation between these two points in the other direction, which gives the value of the point P we need.

Let's get a feel for how this algorithm is implemented from the code below

import numpy as np
import cv2
  
'''
python implementation of bilinear interpolation
'''
def bilinear_interpolation(img,out_dim):
    src_h, src_w, channel = img.shape
    dst_h, dst_w = out_dim[1], out_dim[0]
    print ("src_h, src_w = ", src_h, src_w)
    print ("dst_h, dst_w = ", dst_h, dst_w)
    if src_h == dst_h and src_w == dst_w:
        return img.copy()
    dst_img = np.zeros((dst_h,dst_w,3),dtype=np.uint8)
    scale_x, scale_y = float(src_w) / dst_w, float(src_h) / dst_h
    for i in range(3):   # loop over the three color channels
        for dst_y in range(dst_h):
            for dst_x in range(dst_w):
  
                # find the origin x and y coordinates of dst image x and y
                # use geometric center symmetry
                # if use direct way, src_x = dst_x * scale_x
                src_x = (dst_x + 0.5) * scale_x-0.5
                src_y = (dst_y + 0.5) * scale_y-0.5
  
                # find the coordinates of the points which will be used to compute the interpolation
                src_x0 = max(0, int(np.floor(src_x)))   # clamp so the index never goes negative
                src_x1 = min(src_x0 + 1, src_w - 1)
                src_y0 = max(0, int(np.floor(src_y)))
                src_y1 = min(src_y0 + 1, src_h - 1)
  
                # calculate the interpolation
                temp0 = (src_x1 - src_x) * img[src_y0,src_x0,i] + (src_x - src_x0) * img[src_y0,src_x1,i]
                temp1 = (src_x1 - src_x) * img[src_y1,src_x0,i] + (src_x - src_x0) * img[src_y1,src_x1,i]
                dst_img[dst_y,dst_x,i] = int((src_y1 - src_y) * temp0 + (src_y - src_y0) * temp1)
  
    return dst_img
  
  
if __name__ == '__main__':
    img = cv2.imread('lenna.png')
    dst = bilinear_interpolation(img,(700,700))
    cv2.imshow('bilinear interp',dst)
    cv2.waitKey()

Bilinear interpolation involves more computation than nearest neighbor interpolation, and the amount of computation is considerable, but it does not suffer from discontinuities in the gray values, so the image looks smoother

Problems with bilinear interpolation - selection of coordinate system

If the origin (0, 0) of both the source image and the destination image is taken at the upper left corner and each destination pixel is computed with the interpolation formula, then, assuming a 5x5 image has to be shrunk to 3x3, the correspondence between the pixels of the source image and the destination image is as follows:

(Figure: pixel correspondence between the 5x5 source and the 3x3 destination when the coordinates are aligned at the corner)

The result obtained this way carries a certain error, so we change the choice of coordinate system and make the geometric centers of the two images coincide (add 0.5 to the coordinates, as in src_x = (dst_x + 0.5) * scale_x - 0.5 in the code above), so that all points participate in the calculation
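For reference (my own addition, not from the original post): OpenCV's built-in bilinear resize uses this same geometric-center alignment, so its output should be close to the hand-written version above:

import cv2

img = cv2.imread('lenna.png')
dst_cv = cv2.resize(img, (700, 700), interpolation=cv2.INTER_LINEAR)   # bilinear interpolation
cv2.imshow('cv2 bilinear interp', dst_cv)
cv2.waitKey(0)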

3. Histogram, filtering and convolution

3.1 Histogram

3.1.1 Understanding the histogram, its properties and applications

The histogram is also a very important concept in image processing. The grayscale histogram describes the gray-level distribution in an image and intuitively shows what proportion of the image each gray level occupies. Generally speaking, it is a function of the gray level that records the number of pixels in the image having each gray level; the horizontal axis is the gray level and the vertical axis is the frequency (number of pixels) of that gray level

It is easy to have a misunderstanding about histograms, namely being unsure what the relationship between a pixel's spatial position and the histogram is. There is exactly one relationship between them: they have no relationship! The image histogram does not care about the spatial positions of pixels, so it is unaffected by image rotation and translation and can be used as a feature of the image

Any particular image corresponds to a unique histogram, but different images can have the same histogram. If an image consists of two disjoint regions and the histogram of each region is known, then the histogram of the entire image is the sum of the histograms of the two regions

Application of the histogram: through the histogram we can get an overview of how light or dark the entire image is; a short plotting sketch follows below
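A minimal sketch of computing and plotting a grayscale histogram with OpenCV and matplotlib (the file name 'lenna.png' is assumed):

import cv2
from matplotlib import pyplot as plt

gray = cv2.imread('lenna.png', cv2.IMREAD_GRAYSCALE)
hist = cv2.calcHist([gray], [0], None, [256], [0, 256])   # 256 bins over gray levels 0..255

plt.plot(hist)
plt.xlabel('gray level')
plt.ylabel('number of pixels')
plt.show()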

3.1.2 Histogram equalization

Histogram equalization changes the histogram of the original image into a uniform histogram through a transformation function and then modifies the original image according to that uniform histogram, obtaining a new image whose gray levels are evenly distributed. In other words, histogram equalization uses an algorithm to make the histogram roughly flat, and its role is to enhance the image (increase the contrast)

It transforms an image whose pixel values are unevenly distributed into one whose pixel values are distributed relatively uniformly

In order to widen the brightness range of the original image, a mapping function is needed to map the original pixel values evenly onto the new histogram. This mapping function must satisfy two conditions: 1. it must not disturb the original ordering, i.e. the relative relation between light and dark must not change after mapping; 2. the mapped pixel values must stay within the original range (for example 0 to 255)

The equalization algorithm has several main steps:

  1. Scan each pixel of the original grayscale image in turn and compute the grayscale histogram of the image
  2. Compute the cumulative histogram from the grayscale histogram
  3. From the cumulative histogram and the principle of histogram equalization, obtain the mapping between input and output gray levels
  4. Finally, transform the image according to that mapping to obtain the result: dst(x,y) = H'(src(x,y))

Detailed explanation of the equalization process:
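A minimal numpy sketch of these four steps (the test image 'lenna.png' is assumed, and the mapping uses the simple cumulative-histogram-scaled-to-255 form):

import cv2
import numpy as np

gray = cv2.imread('lenna.png', cv2.IMREAD_GRAYSCALE)

hist = np.bincount(gray.ravel(), minlength=256)            # step 1: grayscale histogram
cdf = hist.cumsum()                                         # step 2: cumulative histogram
mapping = np.round(cdf / cdf[-1] * 255).astype(np.uint8)    # step 3: input -> output mapping
equalized = mapping[gray]                                   # step 4: dst(x,y) = H'(src(x,y))

cv2.imshow('equalized (manual)', equalized)
cv2.waitKey(0)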


In actual development we can simply call the OpenCV interface, but we still need to understand the histogram equalization process itself
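The built-in call does the same job in one line (grayscale input and the same test image assumed):

import cv2

gray = cv2.imread('lenna.png', cv2.IMREAD_GRAYSCALE)
dst = cv2.equalizeHist(gray)   # OpenCV's histogram equalization for single-channel images
cv2.imshow('equalized (OpenCV)', dst)
cv2.waitKey(0)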

3.2 Filtering and Convolution

3.2.1 Basic principles of filtering and convolution

Linear filtering is arguably the most basic image-processing operation; it lets us process an image and produce many different effects. The principle of convolution is similar to filtering, with one small difference: a convolution is also a sum of products between a kernel and the corresponding positions of the image, but the convolution operation requires the kernel to be flipped 180 degrees before the multiplication (convolution is written with the symbol *)

Convolution and filtering are two different concepts; it is not that convolution is derived from filtering. Each is already in its final form when used and needs no extra operation; only when converting between filtering and convolution does the 180-degree flip of the kernel come in
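A small sketch of the difference using OpenCV, whose filter2D computes correlation (filtering); flipping the kernel 180 degrees first turns it into a true convolution. The deliberately asymmetric kernel and the file name are assumptions:

import cv2
import numpy as np

img = cv2.imread('lenna.png', cv2.IMREAD_GRAYSCALE)
kernel = np.array([[0, 1, 0],
                   [0, 0, 0],
                   [0, 0, 0]], dtype=np.float32)   # asymmetric on purpose

filtered = cv2.filter2D(img, -1, kernel)                   # filtering / correlation
convolved = cv2.filter2D(img, -1, cv2.flip(kernel, -1))    # convolution: kernel flipped 180 degrees first

For a kernel that is symmetric about its center the two results are identical, which is why the distinction is often glossed over.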

3.2.2 Convolution - filter/convolution kernel (Kernel)

The convolution kernel works like this: given an input image, each pixel of the output image is a weighted average of the pixels in a small region of the input image, where the weights are defined by a function; this function is called the convolution kernel

There are certain rules for the convolution kernel:

  1. The size of the filter should be an odd number so that it has a center, such as 3x3, 5x5 or 7x7. Besides a center, a kernel also has a radius; for example, the radius of a 5x5 kernel is 2
  2. The sum of all elements of the filter matrix should equal 1, which ensures that the brightness of the image is unchanged before and after filtering. This is not a hard requirement, though
  3. If the sum of all elements of the filter matrix is greater than 1, the filtered image will be brighter than the original; conversely, if it is less than 1, the result will be darker. If the sum is 0, the image does not become pure black, but it will be very dark
  4. The filtered result may contain negative numbers or values greater than 255. In that case we can simply truncate them to the range 0 to 255; for negative numbers we can also take the absolute value

In actual development we design a wide variety of convolution kernels, and images processed with different kernels show different effects, so we can think of different kernels as representing different image patterns. Convolution is very useful for extracting edges, and extracting edges means extracting features, which is the cornerstone of our later learning

If convolving an image with a particular kernel produces a relatively large value, the image is considered to be very close to the pattern of that kernel

3.2.3 Application of convolution kernel

Smoothing (mean) filter

In a typical application of mean-filter smoothing, the image to be processed contains a lot of noise (high-frequency signal) whose values differ strongly from the surrounding pixels, making the image look grainy and not smooth. So we convolve the image with a 3x3 kernel whose every element is 1/9; after the convolution the pixels corresponding to the noise are "dissolved" into their neighborhood and the whole image looks smoother (and also more blurred)
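A minimal sketch of this 3x3 mean filter (the file name is assumed):

import cv2
import numpy as np

img = cv2.imread('lenna.png')
mean_kernel = np.ones((3, 3), np.float32) / 9   # every weight is 1/9, weights sum to 1
blurred = cv2.filter2D(img, -1, mean_kernel)    # equivalent to cv2.blur(img, (3, 3))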

Gaussian filter

Gaussian smoothing uses weights with a Gaussian distribution in both the horizontal and vertical directions, which emphasizes the weight of the center point when smoothing and gives a better smoothing effect than mean filtering
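In OpenCV this is one call (3x3 kernel here; a sigma of 0 lets OpenCV derive it from the kernel size):

import cv2

img = cv2.imread('lenna.png')
gauss = cv2.GaussianBlur(img, (3, 3), 0)   # Gaussian smoothing with a 3x3 kernel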

Image sharpening

Image sharpening uses a Laplacian-style kernel: in the 3x3 convolution kernel the center is 9 and every other position is -1. The intent is to multiply the center pixel by 9 and then subtract the values of the remaining neighboring pixels, which widens the gap between the center pixel value and its neighbors, enhancing the contrast of the image and sharpening it.
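A sketch of that kernel applied with filter2D (the file name is assumed):

import cv2
import numpy as np

img = cv2.imread('lenna.png')
sharpen_kernel = np.array([[-1, -1, -1],
                           [-1,  9, -1],
                           [-1, -1, -1]], dtype=np.float32)   # center 9, neighbors -1, sum = 1
sharpened = cv2.filter2D(img, -1, sharpen_kernel)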

Sobel edge extraction

Sobel edge detection can be divided into horizontal detection and vertical detection. Generally we fuse the results of the two edge extractions; whether horizontal or vertical edges are found depends on which convolution kernel is used.
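A minimal sketch with OpenCV's built-in Sobel operator, fusing the two directions with equal weights (the file name is assumed):

import cv2

img = cv2.imread('lenna.png', cv2.IMREAD_GRAYSCALE)
sobel_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)   # gradient in the x direction
sobel_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)   # gradient in the y direction
edges = cv2.addWeighted(cv2.convertScaleAbs(sobel_x), 0.5,
                        cv2.convertScaleAbs(sobel_y), 0.5, 0)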

3.2.4 Calculation of the convolution itself

First of all, we need to introduce a concept: the stride. As the name implies, the stride is the number of pixels the convolution kernel moves each time it slides over the image. In the examples above it moved one pixel at a time and changed the value of the center pixel in turn to achieve the various effects. But what if we move s pixels each time? For an h x w image and an f x f kernel, the output size becomes ((h - f)/s + 1, (w - f)/s + 1)

But this causes a problem: if f or the stride s is greater than 1, the image becomes smaller after every convolution and a lot of information is lost. We have to think about how to solve this. Generally we use the concept of padding: in the simplest form, a ring of pixels with value 0 is added around the border, so that every original pixel is covered by the kernel center and no pixels are lost
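A quick check of the size formula, extended with a padding parameter p (added on each side); the numbers are just examples:

# output size of a convolution: ((h - f + 2p)/s + 1, (w - f + 2p)/s + 1)
def conv_output_size(h, w, f, s, p=0):
    return ((h - f + 2 * p) // s + 1, (w - f + 2 * p) // s + 1)

print(conv_output_size(5, 5, 3, 1, p=0))   # (3, 3): without padding the image shrinks
print(conv_output_size(5, 5, 3, 1, p=1))   # (5, 5): one ring of zero padding keeps the size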

Convolution: the three padding modes

Depending on how many rings of pixels are padded, we can distinguish three modes: full, same and valid. What effect does the amount of padding have? It directly determines whether the output image is larger than, the same size as, or smaller than the original, because when the filter sweeps over the image it is the positions the kernel center can reach that decide this
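If scipy is available, its convolve2d exposes exactly these three modes, which makes the size effect easy to see (the all-ones arrays are placeholders):

import numpy as np
from scipy import signal

img = np.ones((5, 5))
kernel = np.ones((3, 3))

print(signal.convolve2d(img, kernel, mode='full').shape)    # (7, 7): output larger than the input
print(signal.convolve2d(img, kernel, mode='same').shape)    # (5, 5): output same size as the input
print(signal.convolve2d(img, kernel, mode='valid').shape)   # (3, 3): output smaller than the input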

Three-channel convolution

Three-channel convolution is really a single convolution over a three-dimensional volume, and multi-channel convolution generalizes this: each channel has its own slice of the kernel (because the features to be extracted differ per channel). For three-channel convolution each kernel is 3x3x3; a single-channel kernel can be understood as a two-dimensional matrix, while a three-channel kernel is a small cube. In the figure, Filter W0 refers to one complete convolution kernel and Filter W1 to another

The Output Volume in the figure is the picture after convolution. Why does the 7x7 padded input become a 3x3x2 output? Because the stride of the convolution is 2 instead of 1: even with padding, (7 - 3)/2 + 1 = 3, and the two filters give the two output channels

For one picture we can apply n different convolution kernels. In the figure above the output has the two channels W0 and W1 because we used two convolution kernels. This leads to a very important point: the number of output channels has nothing to do with the input image's channels; the output, both in character and in number, is determined by the type and number of convolution kernels
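A rough numpy sketch of multi-channel convolution (random data, two made-up 3x3x3 filters, stride 2, matching the 3x3x2 output size mentioned above):

import numpy as np

def conv3ch(img, filters, stride=1):
    h, w, _ = img.shape
    f = filters[0].shape[0]
    out_h, out_w = (h - f) // stride + 1, (w - f) // stride + 1
    out = np.zeros((out_h, out_w, len(filters)))
    for k, kernel in enumerate(filters):                # one output channel per filter
        for y in range(out_h):
            for x in range(out_w):
                patch = img[y*stride:y*stride+f, x*stride:x*stride+f, :]
                out[y, x, k] = np.sum(patch * kernel)   # sum over height, width and all channels
    return out

img = np.random.rand(7, 7, 3)                            # e.g. a 5x5x3 input padded to 7x7x3
filters = [np.random.rand(3, 3, 3) for _ in range(2)]    # two filters -> two output channels
print(conv3ch(img, filters, stride=2).shape)             # (3, 3, 2)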

The great thing about a CNN (Convolutional Neural Network) is that the characteristics of the filters are not set by hand but are trained automatically from large numbers of pictures.


Copyright statement: The above learning content and pictures come from or refer to - Badou Artificial Intelligence