[Computer Vision] Some basic operations in image processing

Image smoothing (filtering)

Due to factors such as the sensor and the atmosphere, remote sensing images may contain areas of abrupt brightness change or isolated bright spots (also called noise). The processing that smooths image brightness in order to suppress this noise is image smoothing. Image smoothing is essentially low-pass filtering, and the smoothing process blurs the edges of the image.

mean filter

Linear filtering operates on the entire image matrix.

  • [Features] The larger the window, the stronger the denoising, but also the longer the computation time and the more severe the image distortion. In practice, a balance must be struck between distortion and denoising, and an appropriate window size chosen.
  • [Advantages] The algorithm is simple, requires no complex image processing techniques, and needs little time and space, yet it can effectively suppress noise and improve image quality.
  • [Disadvantages] The mean filter has an inherent flaw: it cannot preserve image detail well. While denoising, it also destroys image details, blurring the result, and it cannot remove isolated noise points well.
  • [Scope of application] Salt and pepper noise
  • [Unsuitable range] Low illumination, strong noise images
calculation process

For a given window size, the values within the window are averaged and the mean is taken as the new value of the window's center pixel. For example, an input of 3 means a $3 \times 3$ window is used to recompute the pixel values of the given image matrix. [Note: For edge pixels that cannot be the center of a full $3 \times 3$ window, the mean is computed over the valid pixels only.]
As shown in the figure below, for pixel 1 in the first row and first column, it is still taken as the center, and the mean of the orange (valid) area becomes its new value.
[Figure: mean filtering at an edge pixel; the orange region marks the valid neighborhood]

python implementation
import cv2
import numpy as np

x = np.array([[1, 3, 2, 4, 6, 5, 7]], dtype=np.float32)  # OpenCV filters do not accept int64 input
# mean-filter the image with a 3x3 kernel
result1 = cv2.blur(x, (3, 3))
print(result1)

Gaussian filter

Linear filtering operates on the entire image matrix.

  • [Features]
    1. In Gaussian filtering, the width and height of the kernel can differ, but both must be odd.
    2. In mean filtering, every pixel in the neighborhood carries equal weight. In Gaussian filtering, the weight of the center point is increased and the weights of points farther from the center are decreased; the output is the weighted sum of the pixel values in the neighborhood.
    3. The Gaussian kernel can be regarded as a convolution kernel, i.e., a two-dimensional filter matrix. The difference is that the Gaussian kernel weights the ordinary convolution kernel (the weight matrix is computed from the Gaussian function).
  • [Advantages] While removing noise and fine detail from the image, the main features (such as contours and edges) are retained; the degree of blur can be controlled by adjusting the size and standard deviation of the Gaussian kernel.
  • [Disadvantages] The computational complexity is high because the Gaussian filter requires a convolution operation.
  • [Scope of application] Eliminate Gaussian noise
calculation process

Gaussian filtering is usually implemented in one of two ways: discrete sliding-window convolution, or via the Fourier transform. The following uses discrete sliding-window convolution.
Commonly used Gaussian templates have their parameters computed from the Gaussian function; a classic example is the $3 \times 3$ template $\frac{1}{16}\begin{bmatrix}1 & 2 & 1\\ 2 & 4 & 2\\ 1 & 2 & 1\end{bmatrix}$.
The specific operation is: scan every pixel of the image with a template (also called a convolution kernel or mask), and replace the value of the pixel at the template's center with the weighted average gray value of the pixels in the neighborhood covered by the template.
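To make the weights concrete, here is a small sketch that builds such a template with OpenCV's cv2.getGaussianKernel (the kernel size and sigma are arbitrary illustrative choices):

import cv2
import numpy as np

# 1-D Gaussian weights; a 2-D Gaussian template is separable, so it is
# the outer product of the 1-D kernel with itself.
g = cv2.getGaussianKernel(3, 0.8)  # shape (3, 1), weights sum to 1
template = np.outer(g, g)          # 3x3 Gaussian template
print(np.round(template, 3))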

python implementation
import numpy as np
import cv2

x = np.array([[1, 3, 2, 4, 6, 5, 7]], dtype=np.float32)  # OpenCV filters do not accept int64 input
# (3, 3) is the kernel size; 1.3 is the standard deviation of the filter. If the
# standard deviation is set to 0, it is computed automatically from the kernel size.
pic = cv2.GaussianBlur(x, (3, 3), 1.3, sigmaY=1.3)
print(pic)

median filter

Nonlinear filtering; it operates on the pixels of the image matrix that can serve as the center point of the window.

  • [Advantages] Very effective at smoothing impulse noise; at the same time it protects the sharp edges of the image, choosing appropriate values to replace those of contaminated points, so the processing effect is good.
  • [Disadvantages] It is easy to cause image discontinuity; compared with mean filtering, it is slower to process large-size images because of the sorting operation.
  • [Scope of application] Eliminate random noise such as salt and pepper noise, additive Gaussian noise, etc.
  • [Unsuitable range] Not suitable for continuous noise.
calculation process

For a given window size, sort the values within the window and take the middle (median) value as the window's center value. For example, an input of 3 means a $3 \times 3$ window is used to recompute the pixel values of the given image matrix. [Note: Only pixels that can serve as the center of a full $3 \times 3$ window participate in filtering; edge pixels that cannot are left out of the calculation.]

The figure below shows median filtering applied to a $6 \times 6$ image matrix. Blue marks the sliding window and red marks the filtered values. As shown, edge pixels do not participate in the calculation; only pixels that can serve as the window center are recomputed.
[Figure: median filtering of a 6×6 matrix; blue = sliding window, red = filtered values]

python implementation
import scipy.signal as ss
x = [1, 3, 2, 4, 6, 5, 7]
pic = ss.medfilt(x, 3)  # median filtering with a window size of 3
print(pic)

Image edge detection

The image edge generally refers to the position where the grayscale change rate of the image is greatest. The main reasons are as follows:

  1. The gray level of the image changes discontinuously in the normal direction of the surface;

  2. The spatial depth of objects in the image is inconsistent;

  3. Inconsistent color on smooth surfaces;

  4. Light and shadow of objects in the image.

Edge detection refers to the process of detecting edge points and edge segments from images and describing the edge direction.

Roberts operator

A first-order differential operator that computes the gradient along the diagonal directions. The magnitude of the gradient represents the strength of the edge, and the direction of the gradient is perpendicular (orthogonal) to the direction of the edge.

  • [Advantages] The calculation is simple and the edge positioning is accurate.
  • [Disadvantages] Sensitive to noise
  • [Scope of application] Good at processing images with steep, obvious edges and low noise
  • [Not applicable] Images with a lot of noise
calculation process

The Roberts operator's gradient kernel in the x direction:
$$G_x = \begin{bmatrix}1 & 0\\ 0 & -1\end{bmatrix}$$
The Roberts operator's gradient kernel in the y direction:
$$G_y = \begin{bmatrix}0 & 1\\ -1 & 0\end{bmatrix}$$
The gradient of a pixel in the image matrix combines the two (the maximum of the absolute values or the square root of the sum of squares can also be used):
$$G = |G_x| + |G_y|$$

After obtaining the two gradient kernels above and the way they are combined, apply them to the image matrix. Note that the last row and the last column of the image matrix do not participate in the calculation and are set directly to 0, because the calculation above is equivalent to computing the gradient at the inter-pixel position $(x+\frac{1}{2},\, y+\frac{1}{2})$.

python implementation
import numpy as np

def Roberts(img_arr, r_x, r_y):
    h, w = img_arr.shape
    res = np.zeros((h, w))  # output image of the same size as the input, filled with 0
    roberts_x = np.array(r_x)
    roberts_y = np.array(r_y)
    for x in range(h - 1):  # the last row and column cannot host a 2x2 window and stay 0
        for y in range(w - 1):
            sub = img_arr[x:x + 2, y:y + 2]
            # element-wise product summed over the 2x2 window yields a scalar response
            var_x = np.sum(sub * roberts_x)
            var_y = np.sum(sub * roberts_y)
            res[x][y] = abs(var_x) + abs(var_y)  # G = |Gx| + |Gy|
    return res

pic_arr = np.array([[0, 0, 0, 0], [0, 10, 10, 0], [0, 10, 10, 0], [0, 0, 0, 0]])
r_x = [[1, 0], [0, -1]]
r_y = [[0, 1], [-1, 0]]
pic = Roberts(pic_arr, r_x, r_y)
print(pic)

Morphological image processing

Erosion operator

Calculated for each pixel of the image matrix.

  • [Features] Erosion acts on the highlighted parts of the image, which correspond to the white areas of the binary image. Generally speaking, erosion shrinks the white regions of the original shape and expands the black regions; erosion is the opposite of the dilation operation.
  • [Scope of application]
    1. Eliminating noise

    2. Splitting apart or joining together image regions

    3. Finding local maxima and minima in an image

    4. Finding the gradient of an image

calculation process

Input an image matrix (a 0/1 matrix) and a template matrix (also called a convolution kernel, with 0/1 values). Using the pixel marked as the 'center' of the template matrix as the reference point, translate the template pixel by pixel across the image matrix. Take the 'AND' of [the pixels with value 1 in the template matrix] and [the corresponding pixels in the image matrix], and assign the minimum of the result to the current pixel. Only those parts of the image matrix whose shape of 1s matches the shape of 1s in the template matrix are retained.
For example, in the figure below: B is the template matrix ('origin' marks its center point), X is the image matrix, and X⊖B is the result after erosion. The result can be regarded as the 'AND' of the black part of B with the black part of X.
[Figure: erosion of the image X by the structuring element B]

python implementation
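A minimal sketch of erosion using OpenCV's cv2.erode, assuming a binary 0/1 image and a 3×3 all-ones template matrix:

import cv2
import numpy as np

# A binary image: a 4x4 white (1) square on a black (0) background
img = np.zeros((8, 8), dtype=np.uint8)
img[2:6, 2:6] = 1

# A 3x3 all-ones structuring element (the template matrix)
kernel = np.ones((3, 3), dtype=np.uint8)

# Each output pixel is the minimum of the input pixels covered by the
# 1s of the kernel, so the white square shrinks by one pixel per side.
eroded = cv2.erode(img, kernel)
print(eroded)

The printed result keeps 1s only where the full 3×3 template fits inside the white square.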

HOG (Histogram of Oriented Gradients) feature

Calculation process:
  1. Image preprocessing: convert the image to grayscale and normalize the pixel values. If the image contains strong reflections or other unstable brightness, illumination normalization and similar processing are also required.

  2. Compute gradients and orientation histograms: convolve the image to obtain the gradient magnitude and direction, then divide the image into small cells (for example, $8 \times 8$ pixels) and accumulate the gradient directions within each cell into a histogram of gradient orientations.

  3. Normalization: Normalize the gradient direction histogram within each small block to avoid the influence of factors such as lighting and shadows.

  4. Splicing: Splice the normalized histograms of all small blocks into a large vector, called the HOG feature vector.

HOG feature dimension calculation formula

Example:
Given an image of resolution $H \times W = 100 \times 100$, it is known that the cell size is $cell_{size} = 8 \times 8$ pixels, the number of histogram bins per cell is $bins = 9$, every $4 \times 4$ cells form a block, and the scan stride is 8 pixels.
The dimension of the image's HOG feature is then calculated as follows:

  1. $block_{size} = (4 \times 8) \times (4 \times 8) = 32 \times 32$
  2. By default, $block_{stride} = cell_{size} = 8$
  3. $block_{H/W} = \frac{H(W) - block_{size}}{block_{stride}} + 1 = \frac{100 - 32}{8} + 1 = 9$ (rounded down)
  4. $block_{num} = 9 \times 9 = 81$
  5. Feature dimension $= bins \times block_{num} \times (\text{number of cells per block}) = 9 \times 81 \times 16 = 11664$

Following this calculation method, an $N$-dimensional HOG feature vector is obtained.

python implementation
import cv2
import numpy as np

gray_pic = np.ones(shape=(32, 64), dtype=np.uint8)

# Specify the parameters of the HOG descriptor

# Cell size in pixels (width, height). It must be smaller than the detection
# window, and must be chosen so that the resulting block size is also smaller
# than the detection window.
cell_size = (4, 4)

# Number of cells per block in each direction (x, y). Must be chosen so that
# the resulting block size is smaller than the detection window.
num_cells_per_block = (2, 2)

# Block size in pixels (width, height). Must be an integer multiple of the
# cell size, and must be smaller than the detection window.
block_size = (num_cells_per_block[0] * cell_size[0],
              num_cells_per_block[1] * cell_size[1])

# Number of cells that fit in the image in the x and y directions
x_cells = gray_pic.shape[1] // cell_size[0]
y_cells = gray_pic.shape[0] // cell_size[1]

# Horizontal distance between blocks, in units of cell size. Must be an integer,
# and must be set so that (x_cells - num_cells_per_block[0]) / h_stride is an integer.
h_stride = 1

# Vertical distance between blocks, in units of cell size. Must be an integer,
# and must be set so that (y_cells - num_cells_per_block[1]) / v_stride is an integer.
v_stride = 1

# Block stride in pixels (horizontal, vertical). Must be an integer multiple
# of the cell size.
block_stride = (cell_size[0] * h_stride, cell_size[1] * v_stride)

# Number of gradient orientation bins
num_bins = 9


# Size of the detection window (region of interest) in pixels (width, height).
# It must be an integer multiple of the cell size and should cover the entire
# image. Because the detection window must be an integer multiple of the cell
# size, depending on the cell size the resulting detection window may be
# slightly smaller than the image, which is perfectly fine.
win_size = (x_cells * cell_size[0], y_cells * cell_size[1])

# Print the shape of the grayscale image for reference
print('\nThe gray scale image has shape: ', gray_pic.shape)
print()

# Print the HOG descriptor parameters
print('HOG Descriptor Parameters:\n')
print('Window Size:', win_size)
print('Cell Size:', cell_size)
print('Block Size:', block_size)
print('Block Stride:', block_stride)
print('Number of Bins:', num_bins)
print()

# Set up the HOG descriptor using the variables defined above
hog = cv2.HOGDescriptor(win_size, block_size, block_stride, cell_size, num_bins)

# Compute the HOG descriptor of the grayscale image
hog_descriptor = hog.compute(gray_pic)
print(hog_descriptor.shape)

Output size and receptive field formulas for ordinary convolution/pooling

Output size of ordinary convolution

Let the input size be $W \times H$, the convolution kernel size $k \times k$, the stride $S$, and the padding $P$.
Output height: $H_{out} = \frac{H - k + 2P}{S} + 1$
Output width: $W_{out} = \frac{W - k + 2P}{S} + 1$

The output size of the pooling operation

Let the input size be $W \times H$, the pooling kernel size $k \times k$, and the stride $S$.
Output height: $H_{out} = \frac{H - k}{S} + 1$
Output width: $W_{out} = \frac{W - k}{S} + 1$
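These formulas can be sketched in a few lines of Python (integer division models the usual rounding-down; the example sizes below are arbitrary):

def conv_out(size, k, s, p=0):
    # floor((size - k + 2p) / s) + 1; use p=0 for pooling
    return (size - k + 2 * p) // s + 1

print(conv_out(224, k=3, s=2, p=1))  # convolution: 112
print(conv_out(224, k=2, s=2))       # pooling: 112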

Receptive field of convolution

This is computed from deep to shallow. Suppose a $128 \times 128$ feature map is obtained after two convolution layers (k=3, s=2, p=1), and the receptive field of this feature map is required. The receptive field is computed backwards:
$$RF_{N-1} = f(RF_N, kernel, stride) = (RF_N - 1) \times stride + kernel$$
In the above example: $(((1-1)\times 2+3)-1)\times 2+3 = 7$
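The backward recursion can be sketched in a few lines of Python (the layer list encodes the two k=3, s=2 convolutions from the example):

def receptive_field(layers):
    # layers: (kernel, stride) pairs from shallowest to deepest;
    # start from RF = 1 at the deepest feature map and walk back
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf

print(receptive_field([(3, 2), (3, 2)]))  # 7, matching the example above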

Computation and parameter counts of different convolutions

Let the shape of the input feature map be $H_1 \times W_1 \times C_{in}$, the convolution kernel size $k \times k \times C_{in}$, and the shape of the output feature map $H_2 \times W_2 \times C_{out}$.

Computation and parameters of conventional convolution

Computation $= k \times k \times C_{in} \times H_{2} \times W_{2} \times C_{out}$
Parameters $= k \times k \times C_{in} \times C_{out}$

Grouped convolution

Computation $= k \times k \times C_{in} \times H_{2} \times W_{2} \times C_{out} \times \frac{1}{g}$
Parameters $= k \times k \times C_{in} \times C_{out} \times \frac{1}{g}$

Computation and parameters of depthwise separable convolution

Computation $= k \times k \times H_{2} \times W_{2} \times C_{in} + H_{2} \times W_{2} \times C_{out} \times C_{in}$
Parameters $= k \times k \times C_{in} + C_{in} \times C_{out}$

$$\frac{\text{computation of depthwise separable convolution}}{\text{computation of conventional convolution}} = \frac{1}{C_{out}} + \frac{1}{k^{2}}$$
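As a sanity check on the parameter formulas, here is a sketch using PyTorch's nn.Conv2d (assuming PyTorch is available; the channel counts and group number are arbitrary, and bias is disabled so only the kernel weights are counted):

import torch.nn as nn

k, c_in, c_out, g = 3, 64, 128, 4

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# Conventional convolution: k*k*C_in*C_out weights
conv = nn.Conv2d(c_in, c_out, k, bias=False)
print(n_params(conv), k * k * c_in * c_out)          # 73728 73728

# Grouped convolution: parameters shrink by a factor of 1/g
grouped = nn.Conv2d(c_in, c_out, k, groups=g, bias=False)
print(n_params(grouped), k * k * c_in * c_out // g)  # 18432 18432

# Depthwise separable = depthwise (groups=C_in) + 1x1 pointwise
depthwise = nn.Conv2d(c_in, c_in, k, groups=c_in, bias=False)
pointwise = nn.Conv2d(c_in, c_out, 1, bias=False)
print(n_params(depthwise) + n_params(pointwise),
      k * k * c_in + c_in * c_out)                   # 8768 8768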

Batchnorm layer operations

Input: $x = [x^{(1)}, x^{(2)}, ..., x^{(m)}]$, where $x^{(i)} \in R^{n}$ denotes the $i$-th input sample.

Compute the mean: $\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}$

Compute the variance: $\sigma^{2} = \frac{1}{m}\sum_{i=1}^{m}(x^{(i)} - \mu)^{2}$

Normalize: $\hat{x}^{(i)} = \frac{x^{(i)} - \mu}{\sqrt{\sigma^{2}+\epsilon}}$

where $\epsilon$ is a very small number that keeps the denominator from being 0.

Scale and shift: $y^{(i)} = \gamma\hat{x}^{(i)} + \beta$, where $\gamma$ and $\beta$ are learnable parameters used to scale and shift the normalized result.

During training, the mean and variance of each mini-batch are calculated and used to normalize the input data. During testing, the mean and variance of all training samples are used for normalization.
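The training-time computation can be sketched in a few lines of NumPy (the sample values, gamma, and beta below are arbitrary):

import numpy as np

def batchnorm_train(x, gamma, beta, eps=1e-5):
    # x: (m, n) mini-batch of m samples with n features each
    mu = x.mean(axis=0)                    # per-feature mean
    var = x.var(axis=0)                    # per-feature variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize
    return gamma * x_hat + beta            # scale and shift

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(batchnorm_train(x, gamma=np.ones(2), beta=np.zeros(2)))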

Learnable parameters in the Batchnorm layer

  1. $\gamma$: scaling parameter, used to scale the normalized features and increase the expressive ability of the network.

  2. $\beta$: shift parameter, used to shift the normalized features and increase the expressive ability of the network.

Both parameters are trained via the backpropagation algorithm. During the inference phase, their values remain fixed; they simply scale and shift the normalized features.

Origin blog.csdn.net/qq_42312574/article/details/132061458