Summary of common methods for computer vision image processing

1. Fundamentals of digital image processing

1.1 Formation of human eye image

  1. Light enters the eye: When light reflects or scatters off an object and enters the eye, it travels through the cornea and lens into the interior of the eyeball.
  2. Focusing light: The cornea and lens focus the light onto the retina. The lens adjusts its shape to change the focal distance, so that a sharp image of the object is projected onto the retina.
  3. Photoreception: The retina is a layer of tissue containing light-sensitive cells of two types, cones and rods. Cones are responsible for color and brightness perception; rods handle low-light vision.
  4. Nerve Signaling: When photosensitive cells are stimulated by light, they generate nerve signals that are then transmitted to the optic nerve and brain. In the optic nerve and visual cortex, these signals are further processed and interpreted to form the visual images we see.

insert image description here

1.2 Image Digitization

Image digitization is the process of converting an image into a digital signal. A digitized image usually consists of a matrix of numbers, where each element represents a pixel of the image and the pixel's color and brightness are represented by numbers.

The process of digitizing an image usually involves the following steps:

  1. Acquisition: Acquiring images requires the use of a digital device such as a digital camera, scanner, or video camera. Digitizing devices convert images into digital signals that can be processed by computers.
  2. Sampling: Sampling is the process of converting a continuous image into discrete pixels. The digitizing device divides the image into a grid; each cell of the grid is called a pixel, and the color and brightness of each pixel are recorded. For example, an image with a resolution of 640×480 is composed of 640 × 480 = 307,200 sample points.
  3. Quantization: Quantization is the process of converting the color and brightness of each pixel into digital values; a continuous signal is converted into a discrete one. The quantization level determines how many colors and brightness levels a digital image can represent. For example, an 8-bit image means that each sample point has 2^8 = 256 levels, so 256 steps from darkest to brightest can be distinguished.
  4. Encoding: Encoding is the process of storing digitized pixel values ​​in a digital format. Encoding formats usually include JPEG, PNG, BMP, etc.

A color picture is made up of pixels on the order of its resolution (for example, 1920x1080). It is similar to the nail (diamond) paintings sold on Taobao: the picture is assembled from 1920 × 1080 nails, and the color of each nail comes from three channels (RGB), i.e. three kinds of nails called R, G and B.

These three channels are like the three primary colors (red, yellow and blue) we learn about in watercolor painting: by mixing them, any other color can be produced. You can also think of the three channels as three layers whose colors blend with one another.

From the computer's point of view, the value of each nail in each channel is a number from 0 to 255.

Now we know that, from a computer's perspective, a picture is just numbers. OpenCV image processing, photo editing, beauty filters and similar functions are really just changes to these numbers. Once the mathematical formulas and logic behind them are understood, the common OpenCV algorithm functions become clear. This article therefore explains the algorithms mainly from the perspective of linear algebra.
insert image description here
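As a quick illustration (a minimal sketch, assuming an image file named '1.jpg' exists in the working directory, as in the example further below), we can load a picture with OpenCV and confirm that it really is just an array of numbers. Note that OpenCV stores the channels in B, G, R order:

import cv2

img = cv2.imread('1.jpg')      # color image as a NumPy array (BGR channel order)
print(img.shape)               # (height, width, 3): rows, columns, channels
print(img.dtype)               # uint8: each channel value is an integer from 0 to 255
print(img[0, 0])               # the B, G, R values of the top-left pixel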

1.3 Types of images

According to the imaging effect of images in vision or equipment, images can be divided into:

  1. Grayscale image: also known as black and white photos, single channel
  2. Color image: RGB, HSV, YUV, CMYK, Lab

When detecting a specific color, we generally work in HSV space. In the following example, we can adjust the HSV values to pick out the color we want from the image.

1.3.1 Color Segmentation

To do color feature detection, we need to convert the image from RGB mode to HSV mode, where H is hue, S is saturation, and V is value (brightness).

With the following code, we can observe the color change in the picture by adjusting the value of HSV in the slider.

createTrackbar is an OpenCV API that quickly creates a slider control in the window displaying the image; it is used to adjust thresholds by hand and gives very intuitive feedback. cv2.createTrackbar(trackbarName, windowName, value, count, onChange) creates a trackbar.

  • trackbarName: the name of the trackbar;
  • windowName: the name of the image window the trackbar is attached to;
  • value: the initial value;
  • count: the maximum value of the trackbar; the minimum is 0 by default.
  • onChange: the callback function, which is called with the new value every time the slider is moved.

cv2.getTrackbarPos returns the current value at the slider's position.

import cv2
import numpy as np

# Trackbar callback: read and return the current HSV slider values
def empty(a):
    h_min = cv2.getTrackbarPos("Hue Min","TrackBars")
    h_max = cv2.getTrackbarPos("Hue Max", "TrackBars")
    s_min = cv2.getTrackbarPos("Sat Min", "TrackBars")
    s_max = cv2.getTrackbarPos("Sat Max", "TrackBars")
    v_min = cv2.getTrackbarPos("Val Min", "TrackBars")
    v_max = cv2.getTrackbarPos("Val Max", "TrackBars")
    print(h_min, h_max, s_min, s_max, v_min, v_max)
    return h_min, h_max, s_min, s_max, v_min, v_max

# Stitch images together for display (here, 4 images into one window)
def stackImages(scale,imgArray):
    rows = len(imgArray)
    cols = len(imgArray[0])
    rowsAvailable = isinstance(imgArray[0], list)
    width = imgArray[0][0].shape[1]
    height = imgArray[0][0].shape[0]
    if rowsAvailable:
        for x in range ( 0, rows):
            for y in range(0, cols):
                if imgArray[x][y].shape[:2] == imgArray[0][0].shape [:2]:
                    imgArray[x][y] = cv2.resize(imgArray[x][y], (0, 0), None, scale, scale)
                else:
                    imgArray[x][y] = cv2.resize(imgArray[x][y], (imgArray[0][0].shape[1], imgArray[0][0].shape[0]), None, scale, scale)
                if len(imgArray[x][y].shape) == 2: imgArray[x][y]= cv2.cvtColor( imgArray[x][y], cv2.COLOR_GRAY2BGR)
        imageBlank = np.zeros((height, width, 3), np.uint8)
        hor = [imageBlank]*rows
        hor_con = [imageBlank]*rows
        for x in range(0, rows):
            hor[x] = np.hstack(imgArray[x])
        ver = np.vstack(hor)
    else:
        for x in range(0, rows):
            if imgArray[x].shape[:2] == imgArray[0].shape[:2]:
                imgArray[x] = cv2.resize(imgArray[x], (0, 0), None, scale, scale)
            else:
                imgArray[x] = cv2.resize(imgArray[x], (imgArray[0].shape[1], imgArray[0].shape[0]), None,scale, scale)
            if len(imgArray[x].shape) == 2: imgArray[x] = cv2.cvtColor(imgArray[x], cv2.COLOR_GRAY2BGR)
        hor= np.hstack(imgArray)
        ver = hor
    return ver



path = '1.jpg'
# Create a window to hold the six trackbars
cv2.namedWindow("TrackBars")
cv2.resizeWindow("TrackBars",640,240)
cv2.createTrackbar("Hue Min","TrackBars",0,179,empty)
cv2.createTrackbar("Hue Max","TrackBars",19,179,empty)
cv2.createTrackbar("Sat Min","TrackBars",110,255,empty)
cv2.createTrackbar("Sat Max","TrackBars",240,255,empty)
cv2.createTrackbar("Val Min","TrackBars",153,255,empty)
cv2.createTrackbar("Val Max","TrackBars",255,255,empty)

while True:
    img = cv2.imread(path)
    imgHSV = cv2.cvtColor(img,cv2.COLOR_BGR2HSV)
    # Get the current trackbar values
    h_min = cv2.getTrackbarPos("Hue Min","TrackBars")
    h_max = cv2.getTrackbarPos("Hue Max", "TrackBars")
    s_min = cv2.getTrackbarPos("Sat Min", "TrackBars")
    s_max = cv2.getTrackbarPos("Sat Max", "TrackBars")
    v_min = cv2.getTrackbarPos("Val Min", "TrackBars")
    v_max = cv2.getTrackbarPos("Val Max", "TrackBars")
    lower = np.array([h_min,s_min,v_min])
    upper = np.array([h_max,s_max,v_max])
    # Mask of pixels that fall within the specified HSV range
    mask = cv2.inRange(imgHSV,lower,upper)
    # Bitwise AND with the original image; only the masked region is kept
    imgResult = cv2.bitwise_and(img,img,mask=mask)


    # cv2.imshow("Original",img)
    # cv2.imshow("HSV",imgHSV)
    # cv2.imshow("Mask", mask)
    # cv2.imshow("Result", imgResult)

    imgStack = stackImages(0.6,([img,imgHSV],[mask,imgResult]))
    cv2.imshow("Stacked Images", imgStack)

    cv2.waitKey(1)

insert image description here

1.4 Relationships between pixels

In computer vision, a digital image is essentially a matrix.

insert image description here
A pixel's subscript is also called its coordinate (x, y); from the coordinates we can derive the spatial relationships between pixels.

1.4.1 Neighborhoods

  • 4-neighborhood: for a pixel P with coordinates (x, y), the four horizontally and vertically adjacent pixels $(x-1,y), (x+1,y), (x,y-1), (x,y+1)$ form its 4-neighborhood.
  • Diagonal neighborhood: P has four diagonally adjacent pixels, $(x-1,y-1), (x-1,y+1), (x+1,y-1), (x+1,y+1)$.
  • 8-neighborhood: the 4-neighborhood and the diagonal neighborhood together are called the 8-neighborhood of the pixel.

insert image description here

1.4.2 Adjacency and connectivity

Two pixels are connected (adjacent) if they are neighbors in spatial position and their pixel values also satisfy a similarity criterion.
The similarity criterion means that the gray values of the pixels are equal, or that the pixel values all lie in a gray-level set V.

For example, an 8-bit grayscale image has pixel values in the range 0–255 ($2^8$ levels), while a 7-bit image has values 0–127; the values from 128 to 255 that lie between the 7-bit and the 8-bit range can then be taken as one gray-level set.

  • 4-adjacency: pixels p and q both take values in the set V and are in each other's 4-neighborhood.
  • 8-adjacency: pixels p and q both take values in the set V and are in each other's 8-neighborhood.
  • Pixel connectivity: a concept built on top of adjacency. If p is adjacent to q, q to r, r to s, and s to t, then p and t are connected.

insert image description here
If the connected pixels form a closed region, they are also called a connected component (connected region).
insert image description here

Image processing algorithms make use of the spatial relationships between pixels and their values, combined with various kinds of mathematics and linear algebra, to change the appearance of an image.

2. Image preprocessing technology

The input and output forms of image processing have the following forms:

Input              Output
single image       single image
multiple images    single image
single image       numbers or symbols, etc.
multiple images    numbers or symbols, etc.

The main purpose of image preprocessing is to remove irrelevant information from the image and extract the useful information (much like feature extraction for structured data), to increase the detectability of the relevant information and to simplify the data as much as possible, thereby improving the reliability of feature extraction, image segmentation, matching and recognition; the result can then be used for deep learning analysis and prediction. A typical flow is as follows:
insert image description here

The image preprocessing flow mainly includes grayscale transformation, geometric transformation, image enhancement, image filtering, and so on.

2.1 Gray scale transformation

Grayscale transformation refers to a certain mapping transformation of the pixel grayscale value of an image, so that the brightness, contrast or color of the image can be adjusted to achieve a specific visual effect.

We denote the original pixel value by s, the grayscale mapping function by T(s), and the transformed pixel value by d, namely:

$$d = T(s)$$
The following introduces several common image processing grayscale transformation methods:

2.1.1 Linear Transformation

Linear transformation is a simple grayscale transformation method, which linearly maps the grayscale value of the image, usually expressed by the following formula:

$$g(x,y) = a \cdot f(x,y) + b$$

where $f(x,y)$ is the gray value of the original image, $g(x,y)$ is the transformed gray value, and a and b are constants; adjusting them controls the magnitude and direction of the transformation.
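Below is a minimal sketch of a linear transformation with NumPy; the gain a = 1.5 and bias b = 20 are arbitrary illustrative values, and 'test.jpg' is the sample file name used in the later examples:

import cv2
import numpy as np

img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)

a, b = 1.5, 20   # illustrative gain and bias
# g(x,y) = a*f(x,y) + b, clipped back to the valid 0..255 range
img_linear = np.clip(a * img.astype(np.float32) + b, 0, 255).astype(np.uint8)
# cv2.convertScaleAbs(img, alpha=a, beta=b) produces a similar saturated result

cv2.imshow('Linear transform', img_linear)
cv2.waitKey(0)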

2.1.2 Logarithmic transformation

Logarithmic transformation can enhance the dark details of the image, usually expressed by the following formula:

$$g(x,y) = c \cdot \log(1 + f(x,y))$$

where $f(x,y)$ is the gray value of the original image, $g(x,y)$ is the transformed gray value, and c is a constant that controls the magnitude of the transformation.
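A sketch of the logarithmic transformation; here c is chosen so that an input of 255 maps to an output of 255, which is one common but not mandatory choice:

import cv2
import numpy as np

img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE).astype(np.float32)

c = 255 / np.log(1 + 255)                  # scale so that 255 stays 255
img_log = (c * np.log(1 + img)).astype(np.uint8)

cv2.imshow('Log transform', img_log)
cv2.waitKey(0)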

2.1.3 Power law transformation

The power law transformation can enhance the details of the bright part of the image, and is usually expressed by the following formula:

$$g(x,y) = c \cdot f(x,y)^{\gamma}$$

where $f(x,y)$ is the gray value of the original image, $g(x,y)$ is the transformed gray value, and c and γ are constants that control the magnitude and direction of the transformation.

2.1.4 Inversion

Inversion is a simple grayscale transformation that darkens areas with high brightness and brightens areas with low brightness, thereby enhancing contrast. It is trivial to implement: each pixel's gray value is simply inverted. If the gray value in the original image is g, the inverted gray value is 255 - g.
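In code, inversion is a one-liner (sketch):

import cv2

img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)
img_inverted = 255 - img        # 255 - g for every pixel
# cv2.bitwise_not(img) gives the same result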

2.1.5 Contrast enhancement

Contrast enhancement is a method of increasing the contrast of an image by remapping the grayscale values ​​in the image to a wider range. There are many ways to achieve contrast enhancement. One of the commonly used methods is grayscale stretching. Specifically:

Assuming that the pixel value range of the original image is [a,b], and stretching it linearly to the range of [0,255], the stretching function can be expressed as:

$$g(x)=\frac{(x-a)\times 255}{b-a}, \quad x\in[a,b]$$

where x is a pixel value of the original image and $g(x)$ is the stretched pixel value.

In OpenCV, it is also possible to use LUT (look-up table) to achieve grayscale stretching.

The specific steps are as follows:
(1) Compute the stretching function $g(x)=\frac{(x-a)\times 255}{b-a}$ for $x\in[a,b]$, where a and b are the minimum and maximum pixel values of the original image, respectively.
(2) Create a 256-element lookup table lookup, where lookup(i) is the stretched pixel value for an original pixel value of i.
(3) Traverse each pixel of the original image, find the corresponding new pixel value in the lookup table, and assign it to the output image.

Here is a code example for grayscale stretching using LUTs:

import cv2
import numpy as np

# Read the source image as grayscale
img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)

# Compute the stretching function from the image's min and max
a = np.min(img)
b = np.max(img)
g = lambda x: (x-a)*255/(b-a)

# Build the 256-entry lookup table
lookup = np.zeros(256, dtype=np.uint8)
for i in range(256):
    lookup[i] = np.clip(g(i), 0, 255)

# Apply the lookup table to perform the grayscale stretch
img_stretched = cv2.LUT(img, lookup)

# Show the original and the stretched image
cv2.imshow('Original Image', img)
cv2.imshow('Stretched Image', img_stretched)
cv2.waitKey(0)
cv2.destroyAllWindows()

In this example we use the np.clip() function to clamp the pixel values to [0, 255] so that the output values do not go out of range.

2.1.6 Contrast Compression

Contrast compression is a method of reducing the contrast of an image by remapping the gray-scale values ​​in an image into a narrower range. There are also many ways to implement contrast compression, one of which is the logarithmic transformation. Specifically, the logarithmic transformation takes the logarithm of the gray value in the image and then scales it to the range [0,255]. The formula is as follows:

$$g(x)=\frac{(x-a)\times 255}{b-a}, \quad x\in[a,b]$$

where x is a pixel value of the original image and $g(x)$ is the compressed pixel value.

In OpenCV, contrast compression can be achieved using a LUT (Look Up Table). The specific steps are as follows:
(1) Compute the compression function $g(x)=\frac{(x-a)\times 255}{b-a}$ for $x\in[a,b]$, where a and b are the minimum and maximum pixel values of the compressed range, respectively.
(2) Create a 256-element lookup table lookup, where lookup(i) is the compressed pixel value for an original pixel value of i.
(3) Traverse each pixel of the original image, find the corresponding new pixel value in the lookup table, and assign it to the output image.

Here is a code example for implementing contrast compression using a LUT:

import cv2
import numpy as np

# Read the source image as grayscale
img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)

# Compute the compression function
a = 50
b = 200
g = lambda x: (x-a)*255/(b-a)

# Build the 256-entry lookup table
lookup = np.zeros(256, dtype=np.uint8)
for i in range(256):
    lookup[i] = np.clip(g(i), 0, 255)

# Apply the lookup table to perform the contrast compression
img_compressed = cv2.LUT(img, lookup)

# Show the original and the compressed image
cv2.imshow('Original Image', img)
cv2.imshow('Compressed Image', img_compressed)
cv2.waitKey(0)
cv2.destroyAllWindows()

2.1.7 Gamma Correction

Gamma correction is a method of adjusting the brightness of an image by nonlinearly transforming the gray values ​​in the image. The principle of gamma correction is to map the gray value in the original image through a nonlinear function, so that the area with lower brightness value is darkened, and the area with higher brightness value is brightened. Specifically, gamma correction uses the following formula for grayscale value transformation:

$$g' = A g^{\gamma}$$

Here g is the pixel gray value in the original image, g' is the gray value after gamma correction, and A and γ are parameters. A scales the gray value and is usually set to A = 1; γ controls how quickly the gray values change and typically lies in the range [0.5, 2.5]. When γ is less than 1, the dark regions of the image are stretched, which brightens them and improves contrast there; when γ is greater than 1, the bright regions are stretched but the image becomes darker overall.
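A sketch of gamma correction using a lookup table, in the same style as the LUT examples above; A = 1.0 and γ = 0.5 are illustrative values:

import cv2
import numpy as np

img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)

A, gamma = 1.0, 0.5   # illustrative parameters
# Build a 256-entry table: normalize to [0,1], apply A*g^gamma, scale back to [0,255]
lookup = np.array([np.clip(A * (i / 255.0) ** gamma * 255, 0, 255)
                   for i in range(256)], dtype=np.uint8)

img_gamma = cv2.LUT(img, lookup)

cv2.imshow('Gamma corrected', img_gamma)
cv2.waitKey(0)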

Inversion, contrast enhancement, contrast compression and gamma correction are commonly used image grayscale transformation methods, which can be used to adjust image brightness, contrast and other attributes. Selecting the appropriate grayscale transformation method according to actual needs can improve the visual effect of the image and improve the effect of image analysis and processing.

2.2 Image histogram

Image histogram: a histogram counts how often each pixel value (0–255) occurs over the whole grayscale range of the image; the chart built from these counts is called the image histogram. It reflects the distribution of gray levels and is a statistical feature of the image.

If we compute a histogram for each of the three RGB channels separately, these three histograms characterize the image.

insert image description here
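A small sketch of computing the per-channel histograms with cv2.calcHist; matplotlib is assumed to be available for plotting:

import cv2
from matplotlib import pyplot as plt

img = cv2.imread('test.jpg')   # BGR image

# One 256-bin histogram per channel (OpenCV channel order is B, G, R)
for i, color in enumerate(('b', 'g', 'r')):
    hist = cv2.calcHist([img], [i], None, [256], [0, 256])
    plt.plot(hist, color=color)

plt.xlim([0, 256])
plt.show()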

2.2.1 Histogram equalization

Histogram equalization is a more advanced method of enhancing image contrast. The basic idea is to transform the pixel values ​​of the image so that the pixel values ​​are evenly distributed throughout the gray scale, thereby enhancing the contrast of the image.
insert image description here
Equalization conversion
insert image description here
You can see that equalization stretches the histogram horizontally. (It makes the differences between regions of the image appear smaller and balances the tones of the whole picture, so that no single tone stands out compared to the rest when we look at the image.)

Histogram equalization: Histogram equalization is a common image enhancement method. It equalizes the gray histogram of the image to make the brightness distribution of the image more uniform, thereby enhancing the contrast and details of the image. The specific implementation method can refer to the following steps:

(1) Calculate the gray histogram of the original image;
(2) Calculate the cumulative distribution function of the gray histogram;
(3) Map the gray value of the original image according to the cumulative distribution function;
(4) Obtain the equalized image.

Specifically, suppose the pixel values of the original image lie in [0, 255], its grayscale histogram is $H(i), i\in[0,255]$, its cumulative distribution function (CDF) is $C(i), i\in[0,255]$, and the equalized pixel value is $g(i), i\in[0,255]$. Then:

$$C(i)=\sum_{j=0}^{i} H(j)$$

$$g(i)=\left\lfloor \frac{255 \times C(i)}{MN} \right\rfloor$$

where M and N are the width and height of the original image. In OpenCV, histogram equalization is available through the equalizeHist() function. Note that histogram equalization can sometimes amplify noise, so it should be used with care in practice.
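A sketch using OpenCV's built-in function; for color images, equalization is often applied only to a luminance channel (e.g. the Y channel of YUV), but a grayscale image keeps things simple here:

import cv2

img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)
img_eq = cv2.equalizeHist(img)     # histogram equalization

cv2.imshow('Original', img)
cv2.imshow('Equalized', img_eq)
cv2.waitKey(0)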

2.3 Spatial filtering

Spatial filtering is an image processing method based on the local neighborhood pixels of the image, which changes the characteristics of the image by performing weighted average or other mathematical operations on the neighborhood pixels around the image pixels.

Spatial filtering is widely used for image denoising, edge detection, image enhancement, and so on.

Common spatial filtering algorithms include mean filtering, median filtering, and Gaussian filtering.

  • Mean filter: averages the gray values of the pixels in the neighborhood around each pixel; used to reduce noise in the image.
  • Median filter: replaces the current pixel value with the median of its neighborhood; effective at removing nonlinear noise such as salt-and-pepper noise.
  • Gaussian filter: computes a weighted average of the neighborhood gray values, with weights given by a Gaussian function; it smooths the image effectively while preserving detail reasonably well.

The general steps of spatial filtering are:

  1. Define a filter of fixed size (also called a convolution kernel or template); the filter is usually a matrix.
  2. Align the center of the filter with the current pixel, combine every filter element with the corresponding neighborhood pixel by weighting or another mathematical operation, and take the result as the output value of the current pixel.
  3. Move the filter and repeat step 2 until every pixel has been processed.

Different filters (convolution kernels, i.e. matrices) are used to change the image pixels. The three main uses are image blurring/denoising, image gradients/edge detection, and image sharpening/enhancement. Here I treat all of these as image enhancement, because they all modify the image's pixels.

Filtering-based enhancement emphasizes the parts of the image we are interested in. Enhancing the high-frequency components makes object contours and details in the image clearer; enhancing the low-frequency components reduces the influence of noise (by processing the pixel values), and can also blur the image.

2.3.1 Mean filtering

Mean filtering replaces the current pixel value with the mean of the N×N pixels surrounding it.

insert image description here
$$\frac{112\times 6+110\times 4+60\times 8+70\times 6}{24}\approx 83.83$$

2.3.2 Box Filtering

Box filtering does not necessarily compute the mean: you can choose whether or not to normalize the result, i.e. whether the output is the average of the neighborhood pixel values or simply their sum.

2.3.3 Gaussian filtering

In mean filtering and box filtering, every pixel in the neighborhood has the same weight. A Gaussian filter gives the center point a larger weight and points farther from the center smaller weights, and then computes the weighted sum of the pixel values in the neighborhood.

2.3.4 Median filtering

Replace the pixel value of the current pixel with the median value of all pixel values ​​in the neighborhood.
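A sketch of the four smoothing filters from 2.3.1 to 2.3.4; the 5×5 kernel size and the aperture of 5 are illustrative choices:

import cv2

img = cv2.imread('test.jpg')

blur_mean   = cv2.blur(img, (5, 5))                             # mean filter
blur_box    = cv2.boxFilter(img, -1, (5, 5), normalize=False)   # box filter: sum instead of mean (saturates for uint8)
blur_gauss  = cv2.GaussianBlur(img, (5, 5), 0)                  # Gaussian filter, sigma derived from the kernel size
blur_median = cv2.medianBlur(img, 5)                            # median filter, aperture must be odd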

2.3.5 Bilateral filtering

Bilateral filtering is a nonlinear filtering method that preserves edge information while smoothing images. Its core idea is to achieve the effect of filtering by weighting the similarity between the spatial position of the pixel and the pixel value.

The bilateral filtering formula is:

insert image description here
where $I_{filtered}(x,y)$ is the filtered pixel value, $I(i,j)$ is the value of the neighborhood pixel $(i,j)$, $f_p(i,j)$ is the similarity between pixel $(i,j)$ and the center pixel $(x,y)$, $w_s$ and $w_r$ are the spatial weight and the range (pixel-value) weight respectively, and $W_p(x,y)$ is the normalizing sum of the weights, which keeps the filtered pixel values within [0, 255].

In practice, $f_p(i,j)$ is usually computed with a Gaussian function; the spatial weight $w_s$ and the range weight $w_r$ can also be computed with Gaussian functions. Their values depend on two parameters, the spatial-domain parameter and the grayscale-domain parameter: the spatial parameter determines the radius of the filter, and the grayscale parameter determines how sensitive the filter is to grayscale differences.
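A sketch of bilateral filtering; d = 9, sigmaColor = 75 and sigmaSpace = 75 are commonly quoted illustrative values, not prescribed ones:

import cv2

img = cv2.imread('test.jpg')

# d: neighborhood diameter; sigmaColor: range (grayscale) parameter; sigmaSpace: spatial parameter
img_bilateral = cv2.bilateralFilter(img, 9, 75, 75)

cv2.imshow('Bilateral', img_bilateral)
cv2.waitKey(0)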

2.3.6 Edge sharpening

The image gradient measures how fast the image changes.

For the edge part of the image, the gray value changes greatly, and the gradient value is also large; on the contrary, for the smooth part of the image, the gray value changes small, and the corresponding gradient value is also small. In general, the gradient calculation of an image is the edge information of the image.

In fact, the gradient is the derivative, but the image gradient generally obtains the approximate value of the gradient by calculating the difference between the pixel values, which can also be said to be an approximate derivative. This derivative can be expressed in calculus.

In calculus, the first derivative of a one-dimensional function is defined as:

$$\frac{df}{dx}=\lim_{\epsilon\to 0}\frac{f(x+\epsilon)-f(x)}{\epsilon}$$

An image is a two-dimensional function $f(x,y)$ with two directions, x and y, so partial derivatives are needed:

$$\frac{\partial f(x,y)}{\partial x}=\lim_{\epsilon\to 0}\frac{f(x+\epsilon,y)-f(x,y)}{\epsilon}$$

$$\frac{\partial f(x,y)}{\partial y}=\lim_{\epsilon\to 0}\frac{f(x,y+\epsilon)-f(x,y)}{\epsilon}$$

The overall gradient magnitude of the two-dimensional function is then:

$$G=\sqrt{\left(\frac{\partial f(x,y)}{\partial x}\right)^2+\left(\frac{\partial f(x,y)}{\partial y}\right)^2}$$

The gradient at each pixel is determined jointly by the 8 pixels around it.

To extract the basic edge features of an image we need spatial filters of the same kind; in this context the filter is also called an operator. The main edge operators are Sobel, Roberts and Laplacian.

Sobel operator

Sobel operator template in the X direction:

$$G_x=\begin{bmatrix} -1 & 0 & +1\\ -2 & 0 & +2\\ -1 & 0 & +1 \end{bmatrix} \tag{2}$$

Sobel operator template in the Y direction:

$$G_y=\begin{bmatrix} -1 & -2 & -1\\ 0 & 0 & 0\\ +1 & +2 & +1 \end{bmatrix} \tag{3}$$

Roberts operator

Roberts operator templates:

$$G_1=\begin{bmatrix} -1 & 0\\ 0 & 1 \end{bmatrix} \qquad G_2=\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \tag{4}$$

For matrix multiplication, an m×n matrix times an r×c matrix requires n = r. Since the operator here is 2×2, the image block it acts on must also have a 2×2 shape, i.e. a window of the same size as the operator.

insert image description here
insert image description here
insert image description here

Laplacian operator

The Laplacian operator is based on the second-order differential calculation, which is defined as follows:

$$G= \frac{\partial^2 f(x,y)}{\partial x^2}+\frac{\partial^2 f(x,y)}{\partial y^2}$$

where:

$$\frac{\partial^2 f(x,y)}{\partial x^2}=f(x+1,y)+f(x-1,y)-2f(x,y)$$

$$\frac{\partial^2 f(x,y)}{\partial y^2}=f(x,y+1)+f(x,y-1)-2f(x,y)$$

Laplacian operator template:

$$G=\begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix} \tag{4}$$
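A sketch applying the Sobel and Laplacian operators with OpenCV; the Roberts operator has no built-in function, but it could be applied with cv2.filter2D and the 2×2 kernels above:

import cv2

img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)

# Sobel gradients in x and y; CV_64F keeps negative values before taking the absolute value
grad_x = cv2.convertScaleAbs(cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3))
grad_y = cv2.convertScaleAbs(cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3))
edges_sobel = cv2.addWeighted(grad_x, 0.5, grad_y, 0.5, 0)   # approximate total gradient

# Laplacian (second-order derivative)
edges_laplacian = cv2.convertScaleAbs(cv2.Laplacian(img, cv2.CV_64F))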

2.4 Coordinate transformation

The coordinate transformation of the image is also called the geometric calculation of the image. The common basic transformations are: image translation, mirroring, scaling, rotation, affine

These transformations are often used for data augmentation in deep learning.

cv2.warpAffine() affine transformation

dst = cv2.warpAffine(src, M, dsize[, dst[, flags[, borderMode[, borderValue]]]])
  • src: input image
  • M: 2*3 transformation matrix (transition matrix)
  • dsize: the size of the output image, the format is (cols,rows), width corresponds to cols, height corresponds to rows
  • flags: optional parameter, combination of interpolation methods (int type), default value INTER_LINEAR
  • borderMode: optional parameter, border pixel mode (int type), default value BORDER_CONSTANT
  • borderValue: optional parameter, border padding value; by default, the default value of Scalar() is 0

Think of the image as a matrix. warpAffine(img, M, (cols, rows)) implements the basic affine transformation effects; in this case black borders will appear. The last parameter, borderValue, is the color used to fill the border (black by default). M is the transformation matrix, and the OpenCV function is passed the image matrix.

2.4.1 Image translation

For image translation, M is the transformation matrix: $$M=\begin{bmatrix} 1 & 0 & dx\\ 0 & 1 & dy \end{bmatrix} \tag{3}$$

The image translation formula is as follows, where dx and dy are the offsets in the x and y directions:
insert image description here
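A sketch of translating an image by (dx, dy) = (100, 50); the offsets are arbitrary illustrative values:

import cv2
import numpy as np

img = cv2.imread('test.jpg')
rows, cols = img.shape[:2]

dx, dy = 100, 50                       # illustrative offsets
M = np.float32([[1, 0, dx],
                [0, 1, dy]])           # translation matrix
img_shifted = cv2.warpAffine(img, M, (cols, rows))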

2.4.2 Rotation

For image rotation, M takes the value: $$M=\begin{bmatrix} \cos\theta & -\sin\theta & 0\\ \sin\theta & \cos\theta & 0 \end{bmatrix} \tag{3}$$

insert image description here
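In practice the rotation matrix is usually built with cv2.getRotationMatrix2D, which also lets you choose the rotation center and a scale factor; the 45° angle below is illustrative:

import cv2

img = cv2.imread('test.jpg')
rows, cols = img.shape[:2]

# Rotate 45 degrees around the image center, keeping the original scale
M = cv2.getRotationMatrix2D((cols / 2, rows / 2), 45, 1.0)
img_rotated = cv2.warpAffine(img, M, (cols, rows))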

2.4.3 Scaling

For image scaling, M is the transformation matrix: $$M=\begin{bmatrix} S_x & 0 & 0\\ 0 & S_y & 0 \end{bmatrix} \tag{3}$$
insert image description here

2.4.4 Mirroring

Let the size of the image be m × n.

Horizontal mirroring keeps the x position unchanged and reverses the y position:
$$\alpha_{x,y}=\alpha_{x,\,n-y+1}$$

Vertical mirroring keeps the y position unchanged and reverses the x position:
$$\alpha_{x,y}=\alpha_{m-x+1,\,y}$$

Diagonal mirroring reverses both the x and y positions:
$$\alpha_{x,y}=\alpha_{m-x+1,\,n-y+1}$$

2.4.5 Image Correction

matrix = cv2.getPerspectiveTransform(pts1,pts2)
imgOutput = cv2.warpPerspective(img,matrix,(width,height))
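A fuller sketch of the two lines above; the four source points pts1 are assumed corner coordinates of the region to straighten (e.g. picked by hand for a card or a document), and width/height are the desired output size:

import cv2
import numpy as np

img = cv2.imread('test.jpg')

width, height = 250, 350                       # assumed output size
pts1 = np.float32([[111, 219], [287, 188],     # assumed corner coordinates of the
                   [154, 482], [352, 440]])    # region in the source image
pts2 = np.float32([[0, 0], [width, 0],
                   [0, height], [width, height]])

matrix = cv2.getPerspectiveTransform(pts1, pts2)
imgOutput = cv2.warpPerspective(img, matrix, (width, height))

cv2.imshow('Corrected', imgOutput)
cv2.waitKey(0)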

2.4.6 Image Scaling

In OpenCV, the image pyramid can be used to scale images, similar to the resize function.
Depending on the method, image pyramids are divided into:
Gaussian pyramid: shrinking the image, also called downsampling, uses the pyrDown function; enlarging the image, also called upsampling, uses the pyrUp function.

2.5 Image Interpolation

insert image description here

In linear interpolation methods, the same interpolation kernel is applied regardless of the position of the pixel being interpolated; this tends to blur the edges in the image.

When performing a resize operation on two-dimensional data, the original integer coordinates are often transformed into decimal coordinates. An intuitive and effective interpolation method for non-integer coordinate values ​​is bilinear interpolation.

2.5.1 Nearest Neighbor Interpolation

Algorithm idea: The purpose of interpolation is to obtain the pixel value of the unknown target image based on the pixel value of the known image. The interpolation transformation process is as shown in the figure below

In the original image (matrix), the position of each pixel is written as $(src_x, src_y)$, and $(tag_x, tag_y)$ denotes the position of the corresponding pixel in the target (interpolated) image.
insert image description here
The target position is obtained with the following formulas:

src image:
w:193 h:153
tag image:
w:375 h:284

First obtain the scaling ratios between the original image and the target image:
$$ratio_w = \frac{tag_w}{src_w} = \frac{tag_x}{src_x}=\frac{375}{193}$$
$$ratio_h = \frac{tag_h}{src_h} = \frac{tag_y}{src_y}=\frac{284}{153}$$

Using these ratios, the target coordinates are obtained:
$$tag_x = \mathrm{int}(src_x \times ratio_w)$$
$$tag_y = \mathrm{int}(src_y \times ratio_h)$$

The pixel value at $src(x, y)$ can then be assigned to $tag(x, y)$.
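A minimal NumPy sketch of nearest-neighbor resizing following the formulas above, written as the usual inverse mapping from each target pixel back to its nearest source pixel; the target size is the one from the example:

import cv2
import numpy as np

src = cv2.imread('test.jpg')
src_h, src_w = src.shape[:2]
tag_h, tag_w = 284, 375                           # target size from the example above

ratio_w, ratio_h = tag_w / src_w, tag_h / src_h

tag = np.zeros((tag_h, tag_w, 3), dtype=np.uint8)
for y in range(tag_h):
    for x in range(tag_w):
        src_x = min(int(x / ratio_w), src_w - 1)  # nearest source column
        src_y = min(int(y / ratio_h), src_h - 1)  # nearest source row
        tag[y, x] = src[src_y, src_x]

# cv2.resize(src, (tag_w, tag_h), interpolation=cv2.INTER_NEAREST) does the same job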

2.5.2 Single linear interpolation

insert image description here
As shown in the figure above, insert a blue pixel block in the middle of the red block

A straight line is determined by two points; any point between them lies on the line, and any two of the three points give the same slope:

$$\frac{y-y_1}{x-x_1}=\frac{y_2-y_1}{x_2-x_1}$$

Rearranging:
$$y=\frac{x_2-x}{x_2-x_1}y_1+\frac{x-x_1}{x_2-x_1}y_2$$

The formula for calculating the pixel value of the blue block is as follows:

$$f(x,y)=\frac{x_2-x}{x_2-x_1}f(x_1,y_1)+\frac{x-x_1}{x_2-x_1}f(x_2,y_2)$$

2.5.3 Bilinear interpolation

insert image description here

insert image description here

As shown in the figure above, we can achieve linear interpolation through the red pixel blocks to obtain the pixel values ​​of the blue and green pixel blocks. The specific formula is as follows:

First calculate the blue pixel block, where x, x1, x0 are the position coordinates of the pixel block in the matrix, and the same is true for y:

$$f(x,y_0)=\frac{x_1-x}{x_1-x_0}f(x_0,y_0)+\frac{x-x_0}{x_1-x_0}f(x_1,y_0)$$
$$f(x,y_1)=\frac{x_1-x}{x_1-x_0}f(x_0,y_1)+\frac{x-x_0}{x_1-x_0}f(x_1,y_1)$$

Then compute the pixel value of the green block:
$$f(x,y)=\frac{y_1-y}{y_1-y_0}f(x,y_0)+\frac{y-y_0}{y_1-y_0}f(x,y_1)$$
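A small sketch of bilinear interpolation at a single non-integer position (x, y), directly following the three formulas above; it assumes a grayscale image and 0 <= x < width-1, 0 <= y < height-1:

import numpy as np

def bilinear_at(img, x, y):
    """Bilinearly interpolate a grayscale image at the real-valued position (x, y)."""
    x0, y0 = int(x), int(y)
    x1, y1 = x0 + 1, y0 + 1

    # Interpolate along x on the two rows y0 and y1 (x1 - x0 == 1, so denominators drop out)
    f_x_y0 = (x1 - x) * img[y0, x0] + (x - x0) * img[y0, x1]
    f_x_y1 = (x1 - x) * img[y1, x0] + (x - x0) * img[y1, x1]

    # Then interpolate along y between the two intermediate values
    return (y1 - y) * f_x_y0 + (y - y0) * f_x_y1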

2.5.4 Bicubic

Each pixel in the target image is weighted by the gray value of 4x4=16 pixels around the corresponding point on the original image to obtain an enlarged effect closer to the high-resolution image.

insert image description here

The distances between the 4 red pixel blocks along the X axis and the green pixel block are 2, 1, -1, -2; the distances between the 4 red pixel blocks along the Y axis and the green pixel block are likewise 2, 1, -1, -2.

To obtain the value of the green pixel block (x, y), the BiCubic basis function is used to compute the weight of each of the 16 pixels; the value at (x, y) is then the weighted sum of those 16 pixels.

The parameter x in the BiCubic function is the distance from a pixel to the target pixel. For example, the distance between (x-2, y-2) and (x, y) is (2, 2), so the abscissa weight of (x-2, y-2) is W(2) and its ordinate weight is also W(2).

BiCubic weight formula is as follows
$$S(x)= \begin{cases} 1-2|x|^2+|x|^3, & |x|<1\\ 4-8|x|+5|x|^2-|x|^3, & 1\leq |x|<2 \\ 0, & |x|\geq 2 \end{cases}$$

The formula for finding (x,y) pixel value:

$$f(x,y)=A B C^T$$
$$A=\begin{bmatrix} S(2) & S(1) & S(-1) & S(-2) \end{bmatrix}$$
$$B=Src[x-1:x+1,\; y-1:y+1]$$
$$C=\begin{bmatrix} S(2) & S(1) & S(-1) & S(-2) \end{bmatrix}$$
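All of these interpolation methods are available in OpenCV through cv2.resize; the target size below is just the one from the earlier nearest-neighbor example:

import cv2

src = cv2.imread('test.jpg')

near  = cv2.resize(src, (375, 284), interpolation=cv2.INTER_NEAREST)  # nearest neighbor
lin   = cv2.resize(src, (375, 284), interpolation=cv2.INTER_LINEAR)   # bilinear (the default)
cubic = cv2.resize(src, (375, 284), interpolation=cv2.INTER_CUBIC)    # bicubic (4x4 neighborhood)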

2.6 Affine Transformation

insert image description here

Image affine transformation refers to mapping points on one two-dimensional plane to points on another two-dimensional plane through a set of linear transformations, so as to realize operations such as rotation, translation, and scaling of images. It can be expressed as a linear transformation in matrix form as follows:
insert image description here

2.7 Data Enhancement Processing

Data augmentation refers to the augmentation of datasets through a series of image processing techniques to improve the generalization ability and robustness of models. Data enhancement of color images generally includes the following:

  1. Color transformation: Change the color of the image, such as changing the hue, saturation and brightness of the image. Color transformation can make the model more robust to color changes.
  2. Geometric transformation: Change the geometric structure of the image, such as random cropping, rotation, flipping, etc. Geometric transformation can make the model more robust to changes in the position and angle of objects.
  3. Noise addition: Add noise to the image, such as Gaussian noise, salt and pepper noise, etc. Noise addition can make the model more robust to the influence of image noise.
  4. Image reconstruction: Decompose the image into different frequency components, enhance each component, and then synthesize the components into a new image. Image reconstruction can make the model more robust to image details and textures.
  5. Contrast Enhancement: Enhance the contrast of the image, making the brightness and color of the image more vivid. Contrast enhancement can make it easier for the model to distinguish different objects.
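A minimal sketch of a few of the augmentations listed above using OpenCV/NumPy; the flip direction, rotation angle, brightness gain and noise level are arbitrary illustrative choices (dedicated libraries such as albumentations or torchvision.transforms are commonly used in practice):

import cv2
import numpy as np

img = cv2.imread('test.jpg')
rows, cols = img.shape[:2]

flipped = cv2.flip(img, 1)                                    # horizontal flip
M = cv2.getRotationMatrix2D((cols / 2, rows / 2), 15, 1.0)    # small rotation
rotated = cv2.warpAffine(img, M, (cols, rows))
brighter = cv2.convertScaleAbs(img, alpha=1.2, beta=10)       # brightness/contrast jitter
noise = np.random.normal(0, 10, img.shape).astype(np.float32) # Gaussian noise
noisy = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)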

3. Image Feature Extraction

Image feature extraction refers to extracting useful features that can represent images from images. Image features are usually composed of pixel values ​​or a combination of pixel values, such as edges, corners, textures, etc. Commonly used feature extraction methods include: color histogram, gradient histogram, local binary pattern (LBP), etc. These methods obtain image descriptions by performing statistics or calculations on image pixel values.

Compared with image preprocessing, feature extraction pays more attention to extracting useful information from images, and provides effective input for subsequent tasks such as image classification, target detection, and image recognition. Image preprocessing is mainly to perform noise reduction, scale transformation, rotation, cropping and other operations on the original image, so as to better adapt to specific application scenarios.

In deep learning, image feature extraction is a very critical step. Traditional image feature extraction methods need to manually select features, which are more dependent on human experience. Image feature extraction based on deep learning can use deep learning models such as convolutional neural network (CNN) to automatically learn image features, avoiding the process of manual feature selection. This image feature extraction method based on deep learning has achieved very good results in tasks such as image classification, object detection, and image segmentation, and has become one of the current research hotspots in the field of image processing.

3.1 Image binarization

Image binarization is the process of setting the gray value of every pixel in the image to either black (0) or white (255), i.e. turning the whole image into a clearly black-and-white result.

The most commonly used approach is threshold segmentation: pixels whose gray value is above the threshold are set to white (or black), and pixels below the threshold are set to black (or white). The threshold is usually denoted T.
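A sketch of fixed thresholding and OpenCV's adaptive thresholding; T = 127, the 11×11 block size and the constant 2 are illustrative values:

import cv2

img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)

# Fixed threshold: pixels > 127 become 255 (white), the rest become 0 (black)
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

# Adaptive threshold: the threshold is computed per 11x11 neighborhood, minus a constant 2
adaptive = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                 cv2.THRESH_BINARY, 11, 2)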

insert image description here
Because the choice of threshold directly affects the quality of the binarization, choosing an appropriate threshold is the core of the algorithm; methods that compute the segmentation threshold automatically are therefore called adaptive threshold segmentation.

Common adaptive thresholding methods include the bimodal method and the maximum between-class variance method (Otsu).
insert image description here

3.1.1 Bimodal method

The bimodal method assumes that the gray histogram of the image is composed of two peaks. By finding the two peak points of the histogram, the median value of them is used as the threshold for binarization. This method works well for images with a distinctly bimodal distribution.

Algorithm steps:

  1. Convert the image to grayscale and compute the grayscale histogram.
  2. Find the two peak points of the grayscale histogram.
  3. Use the value midway between the two peaks as the binarization threshold.

Formula:
Let $h(i)$ be the number of pixels with gray value $i$, and $p(i)$ the proportion of pixels with gray value $i$ among all pixels. Then the mean $\mu$ and variance $\sigma^2$ of the gray histogram are:

$$\mu = \sum_{i=0}^{L-1} i \cdot p(i)$$

$$\sigma^2 = \sum_{i=0}^{L-1} (i-\mu)^2 \cdot p(i)$$

where $L$ is the number of gray levels.

3.1.2 Maximum between-class variance method

The maximum inter-class variance method is an adaptive threshold method, which can automatically select the appropriate threshold according to the local gray distribution of the image.

Algorithm principle:

The core idea of ​​the maximum between-class variance method is to divide the image into two classes so that the variance between the two classes is the largest. The larger the variance, the more obvious the difference between the two categories, so the selected threshold is more appropriate.

The specific implementation steps are as follows:

  1. Count the grayscale histogram of the image to get the number of pixels in each grayscale.
  2. Calculate the weight of each gray level, that is, the proportion of the gray level to the total number of pixels.
  3. Starting from gray level 1, compute for each gray level the between-class variance, i.e. the variance between the two classes obtained by splitting the image at that gray level: $$\sigma^2=\omega_1(\mu_1-\mu_t)^2+\omega_2(\mu_2-\mu_t)^2$$ where $\omega_1$ and $\omega_2$ are the proportions of the two classes of pixels among all pixels, $\mu_1$ and $\mu_2$ are the mean gray values of the two classes, and $\mu_t$ is the overall mean gray value.
  4. Find the gray level with the largest variance between classes as the threshold, which is the adaptive threshold of the image.

Formula:

The between-class variance of the Otsu method is defined as:
$$\sigma^2=\omega_1(\mu_1-\mu_t)^2+\omega_2(\mu_2-\mu_t)^2$$

where $\omega_1$ and $\omega_2$ are the proportions of the two classes of pixels among all pixels, $\mu_1$ and $\mu_2$ are the mean gray values of the two classes, and $\mu_t$ is the overall mean gray value.
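In OpenCV, Otsu's method is available as a flag to cv2.threshold; the threshold argument is ignored and the computed optimal threshold is returned (sketch):

import cv2

img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)

# THRESH_OTSU picks the threshold that maximizes the between-class variance
t, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print('Otsu threshold:', t)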

3.2 Morphological processing

Erosion & dilation are the core operations of image morphology

Erosion: the content of the image shrinks inward along its boundary (lines in the image become thinner; some of their pixels are removed).
Dilation: the content of the image expands outward along its boundary (lines in the image become thicker; pixels are added along them).

The logic of these two operations is somewhat similar to the filter-based smoothing described earlier. The difference is that erosion takes the minimum pixel value inside the kernel while dilation takes the maximum, and the computed value is copied to the pixel at the anchor position.

3.2.1 Opening & closing operation

The opening operation first erodes the image and then dilates it. It can be used to remove small details (noise) outside objects in the image.

The closing operation first dilates the image and then erodes it. It can be used to remove small details (noise) inside objects in the image.

Although erosion and dilation are inverse operations, neither opening nor closing operations will restore the image to its original state.
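A sketch of erosion, dilation, opening and closing; the 5×5 rectangular kernel is an illustrative choice:

import cv2
import numpy as np

img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)
kernel = np.ones((5, 5), np.uint8)                        # 5x5 structuring element

eroded  = cv2.erode(img, kernel, iterations=1)            # minimum within the kernel
dilated = cv2.dilate(img, kernel, iterations=1)           # maximum within the kernel
opened  = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)   # erosion followed by dilation
closed  = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)  # dilation followed by erosion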

3.3 Feature descriptors

Feature descriptor is an algorithm used to describe local features of images in image processing and computer vision. It is a technique to represent feature points in an image as a mathematical vector, which can be used in applications such as image matching, object detection and recognition.

The algorithm principle of the feature descriptor usually includes the following steps:

  1. Feature point detection: First, it is necessary to detect unique and distinguishable local feature points in the image, such as corners, edges, spots, etc.
  2. Feature point description: For each feature point, it is necessary to calculate the feature value or feature vector of its surrounding pixels, such as gradient direction, color, texture, etc., to describe the local features of the feature point.
  3. Feature point matching: By comparing feature point descriptors in different images, feature point matching can be performed for tasks such as image registration, target tracking and recognition.

OpenCV provides a variety of image feature extraction algorithms and supports a variety of feature descriptors.
The following are some commonly used feature descriptors:

  • SIFT (Scale-Invariant Feature Transform): Scale-invariant feature transformation is a descriptor based on local features. The SIFT descriptor is generated by computing a histogram of gradient orientations around keypoints, which is scale-invariant and rotation-invariant.
  • SURF (Speeded Up Robust Features): SURF is an accelerated version of SIFT, which uses some approximation algorithms to speed up calculations, and has scale invariance and rotation invariance similar to SIFT.
  • ORB (Oriented FAST and Rotated BRIEF): ORB is a feature descriptor with a faster calculation speed, which is an improved version based on the FAST corner detector and the BRIEF binary descriptor. ORB descriptors are rotation invariant and scale invariant.
  • HOG (Histogram of Oriented Gradients): HOG is a feature descriptor for target detection and classification. It is generated by computing a histogram of gradient orientations in the image and is directionally and scale invariant.
  • LBP (Local Binary Patterns): LBP is a local feature descriptor, which is generated by calculating the difference between each pixel in the image and its neighboring pixels. The LBP descriptor has rotation invariance and gray scale invariance.
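A sketch of detecting and matching ORB features between two images; '1.jpg' and '2.jpg' are assumed input files, and ORB is chosen here only because it ships with every OpenCV build (SIFT is available as cv2.SIFT_create() in recent versions):

import cv2

img1 = cv2.imread('1.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('2.jpg', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)   # keypoints and binary descriptors
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matching with Hamming distance (appropriate for binary descriptors)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)

result = cv2.drawMatches(img1, kp1, img2, kp2, matches[:20], None, flags=2)
cv2.imshow('ORB matches', result)
cv2.waitKey(0)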

Origin blog.csdn.net/weixin_42010722/article/details/127647384