Image knowledge summary

Reposting an expert's notes here to study from~~

Image composition

Image Channels and Depth

Depth: the number of bits used to store a single pixel in the computer is called the depth of the image

For example: Channel: the number of values used to describe a pixel. A grayscale image needs only one value per pixel, so it is single-channel; a pixel described by the three RGB colors is three-channel; a pixel described by RGB plus alpha is four-channel.

Exposure: the amount of light received.

  1. The memory size of an image: length * width * the memory occupied by one pixel

  2. The memory occupied by one pixel is called the color depth (i.e., how many bits are used to represent a pixel). For example, the old Nokia S40 phones supported 4096 colors, which is 2^12, so one pixel occupies 1.5 bytes (one byte is 8 bits, so 1.5 bytes is 12 bits). The newer S40 phones support 65536 colors, which is 2^16, so one pixel occupies 2 bytes.

  3. In OpenCV, each of the three RGB channels is represented by 8 bits, so one pixel is 24 bits (8*3=24) and occupies 3 bytes.

Convert color image to grayscale image

The human eye is most sensitive to green and least sensitive to blue, so a grayscale image can be obtained with a weighted average, for example:
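
A minimal sketch of such a weighted average (the exact weights are an assumption here; the commonly quoted ITU-R BT.601 weights give green the largest weight and blue the smallest):

import numpy as np

def rgb_to_gray(img_rgb):
    # Weighted average: green gets the largest weight, blue the smallest
    # (assumed BT.601 weights: 0.299*R + 0.587*G + 0.114*B)
    r, g, b = img_rgb[..., 0], img_rgb[..., 1], img_rgb[..., 2]
    return (0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)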

Color quantization of images (color reduction)

Compress the color values of the image from 256^3 to 4^3, i.e., each of R, G, B takes only the values 32, 96, 160, 224
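
A minimal NumPy sketch of this color reduction (assuming an 8-bit image array; each channel value is mapped to the center of its 64-wide bin):

import numpy as np

def reduce_colors(img):
    # Each channel value v in [0, 255] falls into one of four 64-wide bins;
    # replacing v with the bin center gives only the values 32, 96, 160, 224.
    return (img // 64 * 64 + 32).astype(np.uint8)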

Image convolution and pooling:

Convolution formula: (f * g)(n) = Σ_i f(i) * g(n-i)

Convolution: within a window (box), take a "weighted average" of the pixel values

Pooling: the most common kinds are max, min, and average pooling (i.e., within a window, take the maximum, minimum, or average value)
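
A small NumPy sketch of both ideas (the 3*3 averaging kernel and the 2*2 pooling window are assumptions chosen for illustration):

import numpy as np

def mean_convolve3x3(img):
    # "Weighted average" inside a 3x3 box, all weights equal to 1/9
    out = np.zeros_like(img, dtype=np.float32)
    h, w = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = img[y-1:y+2, x-1:x+2].mean()
    return out

def max_pool2x2(img):
    # Max pooling: take the maximum inside each non-overlapping 2x2 box
    h, w = img.shape
    return img[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).max(axis=(1, 3))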

Extension:

For two sequences {a_n}, {b_n}, their convolution is (similar to a dot product): c_n = Σ_k a_k * b_(n-k)

For example:

Two dice: the first die has 8 sides, of which 2 sides are engraved with 1, 1 side with 2, 3 sides with 3, and 2 sides with 4. The second die has 6 sides: 3 sides are engraved with 1, 2 sides with 2, and 1 side with 3. Throw the two dice together once; find the probability that the sum of the points is 4.

Solution: this question uses convolution and can be written out quickly.

Let the probability of throwing n on the first die be a_n, and the probability of throwing m on the second die be b_m. Then the probability that the sum of the points is 4 is: P(sum = 4) = Σ_(n+m=4) a_n * b_m = a_1*b_3 + a_2*b_2 + a_3*b_1 + a_4*b_0

Since the probability of throwing 0 on the second die is 0, the b_0 term drops out. So the result is:
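
Working this out with the probabilities read off from the two dice above:

P(sum = 4) = (2/8)*(1/6) + (1/8)*(2/6) + (3/8)*(3/6) = 2/48 + 2/48 + 9/48 = 13/48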

(In fact this is just enumeration; convolution simply provides the mathematical model.)

Color space:

RGB color space:

Often used for images: three RGB channels, to which an alpha channel (indicating transparency) may be added, so a pixel is usually represented by 24 or 32 bits.

Color depth: the number of binary bits used to represent a color. To represent 8 colors, only 3 binary bits are needed, so the color depth is 3. The color depth of QBitmap is 1, so it can only represent 2 colors, black and white. (It usually indicates how many bits are used to represent one pixel.)

YUV color space:

Commonly used in video processing, YCbCr is a specific implementation of YUV.

Y: stands for brightness (Luminance or Luma)

U, V: represent two chroma components, roughly blue and red, so they are also called Cb and Cr (Chrominance or Chroma)

YUV: re-encodes RGB to separate luminance from color, because the human eye is more sensitive to changes in brightness than to changes in color. It is also compatible with black-and-white TV: the Y channel alone drives a black-and-white TV (a grayscale image), while the UV channels carry the color.

HSV color space

H: Hue (which color); in OpenCV the value range is [0,179]

S: Saturation, the purity of the color (dark blue vs. light blue); the value range is [0,255]

V: Value, the lightness/brightness (how light or dark the color is)

H is the hue (i.e., which color). With H fixed, decreasing S is like adding white to the color, while increasing S makes the color purer and more vivid. Decreasing V is like adding black to the color; when V is 0 the whole color appears black.
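
A short sketch with the OpenCV Python bindings (the input file name and the hue range used for "blue" are assumptions):

import cv2
import numpy as np

img = cv2.imread("input.jpg")                # BGR image (assumed file name)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)   # H in [0,179], S and V in [0,255]

# Keep only roughly "blue" pixels by thresholding the hue channel
lower = np.array([100, 50, 50])
upper = np.array([130, 255, 255])
mask = cv2.inRange(hsv, lower, upper)
blue_only = cv2.bitwise_and(img, img, mask=mask)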

Extension:

WebRTC: Web Real Time Communication, Web instant communication

FFmpeg: Fast Forward Motion Picture Experts Group, the most commonly used open-source software for video processing

AV: Audio & Video. Common audio codecs are MP3 and AAC; common video codecs are H.262, H.264, H.265

Morphological operations (7 types)
  1. Erosion: cv2.erode

  2. Dilation: cv2.dilate

  3. Opening: erode first, then dilate (removes small burrs/noise)

  4. Closing: dilate first, then erode

  5. Morphological gradient: dilation - erosion (gives the contour)

  6. Top hat (tophat): original image - opening (gives the removed burrs)

  7. Black hat (blackhat): closing - original image (gives small dark outlines)
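
A sketch of the seven operations with the OpenCV Python API (the 5*5 kernel and the input file name are assumptions):

import cv2
import numpy as np

img = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((5, 5), np.uint8)

eroded   = cv2.erode(img, kernel)
dilated  = cv2.dilate(img, kernel)
opened   = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)      # erode then dilate
closed   = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)     # dilate then erode
gradient = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)  # dilation - erosion
tophat   = cv2.morphologyEx(img, cv2.MORPH_TOPHAT, kernel)    # original - opening
blackhat = cv2.morphologyEx(img, cv2.MORPH_BLACKHAT, kernel)  # closing - original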

Image processing (9 types)
  1. Image conversion: convert to a grayscale image or an HSV image

  2. Binarization: cv2.threshold

  3. Smoothing/filtering with gradient operators: 

4. Histogram (2): calculate histogram, histogram equalization

5. Geometric operations: scaling, shearing, shifting, rotating, mirroring
(cv2.getAffineTransform, cv2.warpAffine)

6. Special effects (6):
Negative (invert): 255 minus the pixel value
Mosaic: replace all the values in a box with a single value
Frosted glass: replace all the values in a box with a random value taken from the box
Image fusion: cv2.addWeighted
Relief (emboss): subtract two adjacent pixels (highlighting edges), then add a constant value such as 150
Oil painting:

7. Image beautification: histogram equalization, repair/inpainting (cv2.inpaint, using a mask), brightness enhancement, skin whitening (bilateral filtering)

8. Draw lines, rectangles, add text

Image applications of the Fourier transform:
  1. High frequency: places where pixel values ​​change drastically, such as borders
    Low frequency: places where pixel values ​​change slowly, such as a sea

  2. Low-pass filter: only retains low frequencies, which will blur the image
    High-pass filter: only retains high frequencies, which will enhance image details

  3. With the Fourier transform, components at particular frequencies can be selectively removed from the image
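
A minimal low-pass filtering sketch using NumPy's FFT (the 30-pixel cutoff box and the file name are arbitrary assumptions):

import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

f = np.fft.fftshift(np.fft.fft2(img))        # move the zero frequency to the center
rows, cols = img.shape
cy, cx = rows // 2, cols // 2
mask = np.zeros((rows, cols), np.uint8)
mask[cy-30:cy+30, cx-30:cx+30] = 1           # keep only the low frequencies near the center

blurred = np.abs(np.fft.ifft2(np.fft.ifftshift(f * mask)))   # low-pass result: a blurred image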

Histogram equalization:

Histogram features:

A histogram is a schematic representation of the distribution of gray values in the selected image. The x-axis is the gray value and the y-axis is the number of pixels with that gray value.

In RGB images, the histogram is obtained by calculating the gray value of each channel. 

Generally a normal histogram is high in the middle and low at both ends. A peak at the far left means the picture has shadows; a peak at the far right means the picture has highlights.

Equalization principle:

Histogram equalization is an image enhancement technique (it can brighten the image and increase its contrast) and is suitable for images that are too bright or too dark: it spreads out gray levels that are too concentrated.

It is realized through the cumulative probability: pixels with larger gray values correspond to larger cumulative probabilities

New gray value = cumulative probability (different for each gray level) * maximum gray value (a constant)
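
A NumPy sketch of exactly this rule (cv2.equalizeHist does the same thing in one call):

import numpy as np

def equalize(gray):
    # Histogram and cumulative probability of the 256 gray levels (gray assumed uint8)
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum() / gray.size
    # New gray value = cumulative probability * maximum gray value (255)
    lut = np.round(cdf * 255).astype(np.uint8)
    return lut[gray]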

Image manipulation code example:

Flip horizontally

Taking an 8*8 image stored as a one-dimensional array as an example, observation shows:

Row 0: 0 + 7 + 1 = 8 = 8 * 1 = 8 * (0+1)

Row 1: 8 + 15 + 1 = 24 = 8 * 3 = 8 * (1+2)

Row 2: 16 + 23 + 1 = 40 = 8 * 5 = 8 * (2+3)

...

From this we get: the sum of the indices of the first and last elements of each row is width * (RowIdx + RowIdx + 1) - 1

Therefore, for an image stored one-dimensionally, horizontal flipping swaps each element with its mirror within the same row; a sketch is given below.
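
A minimal sketch of this horizontal flip for a flat, row-major pixel list (width and height are parameters; the 8*8 size is just the example above):

def flip_horizontal_1d(pixels, width, height):
    # pixels is a flat list of length width*height in row-major order
    out = list(pixels)
    for row in range(height):
        # first index + last index of this row = width * (2*row + 1) - 1
        s = width * (2 * row + 1) - 1
        for j in range(row * width, row * width + width):
            out[j] = pixels[s - j]   # mirror position within the row
    return out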

Rotate clockwise (90 degrees): transpose first, then flip horizontally

If the image is represented in two dimensions, the transpose operation is very simple:

Transposed[i][j] = Array[j][i]

If the image is stored one-dimensionally (as in the example above), the row and column indices must first be computed from the flat index in order to perform the transpose.

Filtering:

Images can also be understood as the superposition of various color waves.

Several common filters:

  1. Box filter, median filter, mean filter, Gaussian filter

  2. Applying a Gaussian filter with standard deviation sigma twice is equivalent to applying a Gaussian filter with standard deviation sqrt(2)*sigma once

  3. A two-dimensional Gaussian filter can be separated into two one-dimensional Gaussian filters (one along each axis)

  4. Think about a formula for sharpening with convolution (Laplacian of Gaussian): images need convolution to obtain edges, whereas point clouds can obtain object edges directly.

  5. Use convolution to compute partial derivatives: gradient operators for the image's partial derivatives include the Sobel, Scharr, and Laplacian operators

  6. Image gradient

  7. Differentiation is very sensitive to noise, so the image is usually denoised first and then differentiated

  8. Image pyramids (2 types):

Gaussian filtering and its first derivative, second derivative

Gaussian filter: low-pass filter, the image is smoothed and blurred

Gaussian first-order derivative: high-pass filtering, Canny operator to find the edge

Gaussian second-order derivative: high-pass filter, the LoG operator (used for finding scale)

The first-order derivative extracts changes in the image's gray-level gradient; the second-order derivative extracts image details (how to understand this??) while also responding to changes in the image gradient.

Extension: Application of Gaussian function in image

One-dimensional Gaussian function: G(x) = 1 / (sqrt(2*pi) * sigma) * exp(-(x - mu)^2 / (2 * sigma^2))

Comparison of Gaussian curves with different mean, variance, and amplitude: 

The distribution of the two-dimensional Gaussian function (the distribution of the two dimensions of x and y is a one-dimensional Gaussian distribution, so it is an ellipse from the top view): 

As the Gaussian curve above shows, the closer to the center, the larger the value, and the farther from the center, the smaller the value. The practical meaning is: the closer to the center, the greater the influence; the farther from the center, the smaller the influence.

Such characteristics can be used for weight distribution: the closer to the center, the greater the weight, and the farther away from the center, the smaller the weight.

Blurring an image is a process of computing a "weighted average". Numerically this is a kind of smoothing; visually it produces a blurring effect. If every point in the window is given the same weight the result is a simple average, but that is unreasonable: images are continuous, so closer points are more closely related and farther points less so. A weighted average is therefore more reasonable, with closer points given larger weights and farther points smaller weights, and a normal (Gaussian) distribution can be used to assign these weights.

The Gaussian function is determined by 3 parameters: amplitude (how high), center coordinates, standard deviation (how wide)

(In the image domain there is also a ksize, the size of the Gaussian kernel. For example, if ksize=3 the Gaussian kernel is a 3*3 matrix.)
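
A small sketch showing how ksize and sigma determine the kernel (cv2.getGaussianKernel returns the 1-D kernel; its outer product gives the 2-D kernel):

import cv2

ksize, sigma = 3, 1.0
k1d = cv2.getGaussianKernel(ksize, sigma)   # column vector, shape (3, 1)
k2d = k1d @ k1d.T                           # 3x3 Gaussian kernel, weights sum to 1
print(k2d)

# Equivalent blur applied directly to an image:
# blurred = cv2.GaussianBlur(img, (ksize, ksize), sigma)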

Bilateral filter: edge-preserving smoothing

Denoising with a Gaussian filter blurs the edges; it does not preserve high-frequency detail well.

Bilateral filtering: a trade-off that combines spatial proximity with pixel-value similarity; it has one more Gaussian variance than Gaussian filtering. Near an edge, pixels farther away (with very different values) have little influence on the pixel value at the edge.

The role of convolution:

Extension:

Gradient and directional derivatives:

  1. The directional derivative is a number that represents the amount of change along a certain direction

  2. Gradient is a vector

  3. When the directional derivative reaches the maximum, the direction at this time is the direction of the gradient, and the directional derivative at this time is the modulus of the gradient

Gradient understanding:

  1. It can be understood as slope, i.e., the degree of inclination of the surface along a certain direction (actually the directional derivative)

  2. Indicates that the directional derivative of a certain function at this point achieves the maximum value along the direction (gradient direction), that is, the function changes the fastest (maximum rate of change) along the gradient direction

  3. In the image field, the gradient represents the speed at which the gray value of a pixel changes

OpenCV related:

OpenCV library structure:

OpenCV folder structure (after decompression, there will be 2 folders: source and build):

Under the Source folder:

module/core: the core data structure and basic operations

module/highgui: UI interface for image reading, display, storage, etc.

module/imgproc: image processing methods, such as geometric transformations, smoothing, etc.

feature2d: used to extract features

nonfree: patented algorithms, such as SIFT

objdetect: Target detection, such as Haar for face recognition, LBP features; HOG-based pedestrian, vehicle and other target detection

stitching: image stitching

ml: machine learning library

video: vision processing, such as background modeling, moving object tracking, foreground detection

Under the build folder:

doc/opencvrefman.pdf: function manual

doc/opencv_tutorials.pdf: tutorials

include folder: OpenCV header files

x86 and x64 folders: dll and lib libraries for 32-bit and 64-bit

python: Python API

java: the JAR package of the java API

OpenCV geometric transformation

Affine transformation:

Affine function: A polynomial function with a maximum degree of 1. An affine function whose constant term is 0 is called a linear function.

The mapping x-> Ax+b from R^n to R^m is called affine transformation, where A is an m* n matrix and b is an m-dimensional vector.

Linear transformation (rotation, scaling) + translation 

All of the above are centered on the origin (0,0).

To rotate counterclockwise by alpha about an arbitrary point (x0, y0): translate (x0, y0) to the origin, rotate by alpha, then translate back. After obtaining the affine matrix, use the warpAffine function to apply the transformation to the image; a sketch is given below.
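
A sketch with the OpenCV Python bindings (the rotation center, the 45-degree angle and the file name are assumptions):

import cv2

img = cv2.imread("input.jpg")
h, w = img.shape[:2]

center = (w / 2, h / 2)                        # (x0, y0), here the image center
M = cv2.getRotationMatrix2D(center, 45, 1.0)   # 45 deg counterclockwise, scale 1
rotated = cv2.warpAffine(img, M, (w, h))       # apply the affine matrix to the image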

Projective transformation and the differences between the various transformations:

  1. Rigid body transformation: also called an isometric transformation; equal to translation + rotation, with 3 degrees of freedom (1 rotation, 2 translation).
    Rigid body motion guarantees that the length of a vector and the angles between vectors are the same in every coordinate system. This transformation is called the Euclidean transformation; a Euclidean transformation consists of two parts, a rotation and a translation.

  2. Similarity transformation: rigid body transformation + scaling. There are 4 degrees of freedom: rotation, translation in x, translation in y, and the scaling factor s. The ratio of lengths before and after the transformation and the included angles remain unchanged (like similar triangles).

  3. Affine transformation: realized through a series of (5) atomic transformations: translation, scaling, rotation, flip, and shear.
    Shear is divided into horizontal shear (points on the horizontal axis stay fixed) and vertical shear (points on the vertical axis stay fixed). The third row of the transformation matrix must be: 0, 0, 1

  4. Projective transformation (also called perspective transformation): contains 8 degrees of freedom (why 8 rather than 9? because the 3*3 matrix is only defined up to scale)

 

When the last row of the projection matrix is (0, 0, 1), it is an affine transformation.
In an affine transformation, when the upper-left 2*2 matrix is orthogonal it is a Euclidean transformation (i.e., a rigid body transformation); when the determinant of the upper-left 2*2 matrix is 1 it is an orientation-preserving Euclidean transformation.
So projective transformations contain affine transformations, and affine transformations contain Euclidean (rigid body) transformations.
Perspective transformation = homography transformation (homography) + collineation

Extension:

Homography matrix: the relationship between points on the same plane seen from different viewing angles

analytical transformation matrix 

OpenCV's Mat class:

Matrix, located in core.hpp

  1. Mat m=Mat(2,3,CV_32FC(1)); //Create 2*3 matrix, F refers to float type, 1 refers to single channel

  2. You can also use the Size class. Note: the first parameter of Size is the width (number of columns) and the second is the height (number of rows). To create a 2*3 matrix: Mat m=Mat(Size(3, 2), CV_32FC(1));
    Here m.size() prints 3*2, i.e., width * height: the width is 3 and the height is 2, i.e., 2 rows and 3 columns

  3. If you want to change the elements of a two-dimensional matrix into coordinates of points, you can use

 

4. The member variable ptr of Mat points to the first address of the first row.
The elements within each row are stored contiguously, but there may be a gap between rows; m.isContinuous() tells you whether the rows are separated by gaps.

5. The member variables step and data of Mat: data is a pointer to the first value and has type uchar*.
step[0] is the number of bytes occupied by each row (including the gap between rows)
step[1] is the number of bytes occupied by each value
For example, to access row r and column c of an int-type single-channel matrix, cast with (int *), because m.data is uchar* and needs a type conversion

OpenCV camera calibration (Camera Calibration)

Camera calibration, simply put, is the process of converting from the world coordinate system to the image coordinate system (world coordinate system -> camera coordinate system -> image coordinate system), that is, finding the final projection matrix P.

  1. Camera extrinsic parameters R, t: transform from the world coordinate system to the camera coordinate system; this step is a 3D-point to 3D-point conversion

  2. Camera intrinsic parameters: from the camera coordinate system to the image coordinate system; this step is a transformation from a 3D point to a 2D point

(This part is actually very related to the orthogonal transformation and projection transformation in graphics, so I will write it in the knowledge of graphics)

Common C++ linear algebra libraries:

DCMTK: A library for processing dicom images

Eigen: Open source C++ linear algebra library, commonly used in opencv. The sample code is as follows:

#include <Eigen/SVD>
Eigen::Matrix3d w=Eigen::Matrix3d::Zero();
...
Eigen::JacobiSVD<Eigen::Matrix3d> svd(w,Eigen::ComputeFullU|Eigen::ComputeFullV);
Eigen::Matrix3d U=svd.matrixU();
Eigen::Matrix3d V=svd.matrixV();

In the above few lines of code, the w matrix is ​​decomposed by svd, and U and V are solved.

There is also a common C++ linear algebra library: Armadillo (armadillo)

BLAS, CUBLAS and LAPACK

BLAS: Basic Linear Algebra Subprograms

cuBLAS: the BLAS implementation for GPU computing (CUDA)

LAPACK: Linear Algebra Package (built on top of BLAS), a well-known public software package funded by the US National Science Foundation. It covers the most common numerical linear algebra problems in scientific and engineering computing, such as solving linear systems, linear least squares problems, eigenvalue problems, and singular value problems.

Otsu method

Maximum between-class variance method. Find a threshold that maximizes the variance gap between background and foreground (binary classification problem)

Assume a threshold that divides the image pixels into two classes, C1 (less than the threshold) and C2 (greater than the threshold). Let the mean values of the two classes be m1 and m2, the global mean of the image be mg, and the probabilities of a pixel belonging to C1 and C2 be p1 and p2. Then the between-class variance is sigma^2 = p1*(m1 - mg)^2 + p2*(m2 - mg)^2. The Otsu method finds the threshold that maximizes sigma^2 (traverse every gray value in 0-255 and compute p1, p2, m1, m2).
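
A NumPy sketch of this search (in practice cv2.threshold with the cv2.THRESH_OTSU flag does it in one call):

import numpy as np

def otsu_threshold(gray):
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    mg = np.dot(np.arange(256), prob)              # global mean
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        p1 = prob[:t].sum()                        # probability of class C1
        p2 = 1.0 - p1
        if p1 == 0 or p2 == 0:
            continue
        m1 = np.dot(np.arange(t), prob[:t]) / p1   # mean of class C1
        m2 = np.dot(np.arange(t, 256), prob[t:]) / p2
        var = p1 * (m1 - mg) ** 2 + p2 * (m2 - mg) ** 2   # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t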

Extension: Can this be used in decision trees??? Similar to LDA (note from 2020-10-05; see the note from 2020-08-24)

Local features of the image

Corner points: Harris operator, SuSAN operator, FAST operator

Gradient feature points: SIFT, SURF, GLOH, ASIFT, PSIFT operators

Edge feature (line type): Canny operator, Marr operator

Texture features: gray level co-occurrence matrix, wavelet Gabor operator

LBP feature: Local Binary Pattern local binary pattern

The original LBP operator is defined on a 3*3 window. The center pixel is used as the threshold, and the 8 neighboring pixels are compared with it: a neighbor greater than the center is marked 1, otherwise 0. The 8 neighbors in the 3*3 region thus form an 8-bit binary number, i.e., a decimal value from 0 to 255 (the LBP code), which is used to represent the texture information of that region.

The statistical histogram of the LBP feature spectrum is often used as the feature vector for classification and recognition, such as face analysis and texture classification.

Harris corner detection

(Missing an example of actual calculation)

1. When the window is in a flat area, there is no grayscale change when moving in any direction; when the window is on an edge, there is no grayscale change when moving along the edge direction; when the window is at a corner, there is an obvious grayscale change when moving in any direction.

2. E(u, v) = Σ_(x,y) w(x, y) * [I(x + u, y + v) - I(x, y)]^2

where w(x, y) is the weight matrix (generally a Gaussian), u and v are the distances moved along the x and y directions, and I(x, y) is the image grayscale

3. By Taylor expansion, E(u, v) is approximately a quadratic form in (u, v), whose level sets are ellipses (standard ellipse equation: x^2 / a^2 + y^2 / b^2 = 1). The major and minor axes of the ellipse are determined by the eigenvalues λ1 and λ2 of the matrix M (M can be diagonalized with λ1, λ2 on the diagonal). It is an edge when λ2 >> λ1 or λ1 >> λ2

When both λ1 and λ2 are large, and λ1 and λ2 are similar, it is a corner point

When both λ1 and λ2 are small, it is a flat region

When detecting corners there is no need to compute λ1 and λ2 explicitly; the response can be approximated by the formula: R = det(M) - α * trace(M)^2 = λ1*λ2 - α*(λ1 + λ2)^2

α is a constant, usually 0.04 to 0.06

When calculating the gradient

Finally calculate the R value of each pixel.
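
A sketch using OpenCV's built-in implementation (blockSize, ksize, k and the 0.01 threshold are typical values chosen here for illustration):

import cv2
import numpy as np

img = cv2.imread("input.jpg")
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

R = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)   # k is the α constant
img[R > 0.01 * R.max()] = (0, 0, 255)                      # mark strong corners in red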

Pedestrian detection: HOG+SVM

HOG: Histogram of Oriented Gradients

The gradient and direction of pixels in a certain area are counted to generate descriptors.

For example, if a cell contains 8*8 pixels and gradient information is collected into 9 direction bins, then the gradient magnitude and direction of the 8*8 pixels are computed and binned every 360/9 = 40 degrees to form a histogram. Each cell then corresponds to a 9-dimensional feature vector.

At the same time, multiple cells can also be combined into a block. For example, 2 * 2 cells form a block, and a block corresponds to a 2* 2 *9 =36-dimensional feature.

In practice, a fixed-size sliding window is used to extract HOG features. For example, with a 64*128 window, 8*8 pixels per cell and 2*2 cells per block, there are (8-1)*(16-1) = 105 blocks, so the feature dimension of each window is 105*36 = 3780
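
A sketch of the standard OpenCV pedestrian detector, which is built on exactly this 64*128 window and 3780-dimensional HOG feature (the file name and winStride are assumptions):

import cv2

hog = cv2.HOGDescriptor()                       # default 64x128 window
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("street.jpg")
rects, weights = hog.detectMultiScale(img, winStride=(8, 8))
for (x, y, w, h) in rects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)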

Pedestrian re-identification (ReID: Person Re-Identification)

Use CV technology to judge whether there is a specific pedestrian in an image or video sequence.

The data set is divided into: training set, validation set, query, and gallery. Train the model on the training and validation sets, then use it to extract features and compute the similarity between Query and Gallery images, finding the top-N most similar Gallery images for each Query.

2 general directions: feature extraction, metric learning

Existing challenges: low camera resolution, occlusions, perspective/pose changes, lighting

Face detection: Haar + Adaboost

Haar templates: OpenCV has 14 templates; originally there were only 4. The feature value is computed as sum(white) - sum(black). Choosing different templates (template size, template position) yields different feature values.

Integral image: Integral Image, also called a Summed Area Table. Computing Haar features requires repeatedly summing pixel values over rectangular regions, and using the integral image greatly reduces this computation.

If the pixel value at point (x, y) is I(x, y), then the integral image is ii(x, y) = Σ_(x'<=x, y'<=y) I(x', y').

Adaboost: through cascading (Cascade), multiple weak classifiers (CART decision trees) are combined into a strong classifier; the error rate of each classifier is computed with an exponential loss function. OpenCV also supports LBP + Adaboost and HOG + Adaboost for face detection.
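
A sketch of face detection with the pretrained Haar cascade shipped with OpenCV (the cascade file, image file name and detection parameters are assumptions chosen for illustration):

import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("people.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)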

Canny edge detection
  1. Filter the image with the Gaussian first-order partial derivative kernel to find the magnitude and direction of the gradient map

  2. Non-Maximum Suppression: Turning "Wide" Edges into "Narrow" Edges

  3. Use a high threshold to find the edge first, and then use a low threshold to find the edge connected to the edge (remove false edges)

RANSAC:Random Sample Consensus

Random sample consensus: from a set of sample points, randomly pick 2 points to define a line, and with a threshold T count how many sample points lie within distance T of the line (the inliers). Repeatedly sample points and fit lines, and keep the line with the most inliers.

In addition, RANSAC can also be used to find matching points.
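
A minimal NumPy sketch of the line-fitting procedure just described (the number of iterations and the threshold T are arbitrary assumptions):

import numpy as np

def ransac_line(points, iters=1000, T=1.0):
    # points: (N, 2) array; returns the pair of points defining the line with most inliers
    best_inliers, best_pair = 0, None
    for _ in range(iters):
        p1, p2 = points[np.random.choice(len(points), 2, replace=False)]
        d = p2 - p1
        norm = np.hypot(d[0], d[1])
        if norm == 0:
            continue
        # perpendicular distance from every point to the line through p1 and p2
        dist = np.abs(d[0] * (points[:, 1] - p1[1]) - d[1] * (points[:, 0] - p1[0])) / norm
        inliers = np.sum(dist < T)
        if inliers > best_inliers:
            best_inliers, best_pair = inliers, (p1, p2)
    return best_pair, best_inliers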

Hough transform:
  1. A straight line y = ax + b in the Cartesian coordinate system can also be written as b = -ax + y. With (x, y) fixed, b = -ax + y is itself a straight line, but in the parameter space (Hough space). So a line y = ax + b in the Cartesian coordinate system corresponds to a point (a, b) in Hough space; likewise, a line b = -ax + y in Hough space corresponds to a point (x, y) in the Cartesian coordinate system

  2. Two points in the Cartesian coordinate system determine a straight line, corresponding to the intersection of two straight lines in Hough space; three points in the Cartesian coordinate system are collinear, corresponding to the intersection of three straight lines in Hough space

  3. Since y = ax + b cannot represent a line with infinite slope (i.e., a line perpendicular to the x-axis), the Cartesian parameterization is replaced by a polar one.

In that case a point (x0, y0) in the Cartesian coordinate system corresponds to a rho-theta curve in Hough space; three collinear points in the Cartesian coordinate system correspond to three rho-theta curves intersecting at one point in Hough space

  4. Detecting a circle: the idea of the Hough transform is to find what stays constant. For detecting a line, the slope and intercept (or rho and theta) are constant; for detecting a circle, the center and radius are constant. Again, find the point in the parameter space (center, radius) where many curves intersect, and the circle is found.

Log transformation (not the LOG operator) and Box-Cox transformation

Log transformation: used to stabilize the variance; it can convert a skewed distribution toward a normal distribution and can make an image brighter (because the log function expands low values more strongly)

Box-Cox transformation: It is used when the continuous variable does not satisfy the normal distribution.

LOG operator and DOG operator:

LOG operator: Laplace Of Gaussian, edge detection is performed by the zero value of the second order derivative of the image. Since the differential operation is sensitive to noise, LoG first performs Gaussian smoothing on the image, and then uses the Laplace operator for edge detection (the LoG operator can find the scale)

DoG operator: Difference of Gaussian, the difference of the Gaussian function, compares the Gaussian filtering results of the image under different sigma parameters to obtain a difference map.

Because the DOG operator is relatively simple in calculation, the DOG operator is often used instead of the LOG operator.

The difference between the two is k-1 times, but it does not affect the detection of extreme points 

Second derivative and concavity

The sign of the first derivative represents the increase or decrease of the function value f(x)

The sign of the second-order derivative represents the increase or decrease of the slope (the slope is the first-order derivative).

Definition of concavity/convexity: it can also be judged from the sign of the second-order derivative

The second-order derivative is positive, indicating that the slope is getting larger and larger, that is, a concave function (note the image above)

The second-order derivative is negative, indicating that the slope is getting smaller and smaller, that is, a convex function

The second derivative is 0, indicating that the rate of change of the slope is 0, that is, maintaining the same slope (rate of change)

Common interpolation methods

Spline interpolation: every two points determine a function, and each such function is one spline segment; different intervals have different functions, hence "piecewise" splines. Joining all the spline segments into one function gives the final interpolation function.

Extension:

Enlargement and reduction of pictures - interpolation principle (nearest neighbor interpolation and bilinear interpolation)

For example, take a 3*3 256-level grayscale image as the source (denoted src) and enlarge it to 4*4. The value at (0, 0) in the 4*4 image comes from src coordinate (0 * 3/4, 0 * 3/4) = (0, 0)

That is, the value at (0, 0) in the 4*4 image is the value at (0, 0) in the src image

The value at (1, 0) in the 4*4 image comes from (1 * 3/4, 0 * 3/4) = (0.75, 0), i.e., the value at (0.75, 0) in the original src image. Nearest-neighbor interpolation rounds this, so it takes the value at (1, 0) in the src image

According to this method, the 4* 4 matrix is ​​calculated to obtain:

This is the simplest image scaling algorithm and also has the worst quality: enlargement produces mosaic blocks and reduction produces distortion (because of the rounding). It is not very scientific: when a coordinate is 0.75 it should not simply be rounded to 1; instead the value should be computed by some rule from the 4 real points around the virtual point in the source image. This is where bilinear interpolation comes in.

Bilinear interpolation: In the x and y directions, do linear interpolation respectively

Given the coordinates of the four points Q11, Q12, Q21, and Q22 and the values ​​of these four points, now give the coordinates of point P and find the value of point P. At this time, bilinear interpolation is used (first find the value of R1, R2, and then find the value of point P)

The bilinear interpolation formula: for example, for (1, 1) in the dst image, the corresponding src coordinate is (0.75, 0.75). This point is virtual, so its value should be determined by the 4 surrounding points (0,0), (0,1), (1,0), (1,1).

Since (0.75, 0.75) is closer to (1, 1), the point (1, 1) has a larger effect, which is reflected by the coefficient u*v = 0.75*0.75 in the formula. Since (0.75, 0.75) is farther from (0, 0), the point (0, 0) plays a smaller role, reflected by the coefficient (1-u)(1-v) = 0.25*0.25.
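
A NumPy sketch of sampling one destination pixel by bilinear interpolation (single channel; in practice cv2.resize with cv2.INTER_LINEAR is used):

import numpy as np

def bilinear_sample(src, x, y):
    # src: 2-D array; (x, y) is a "virtual" floating-point coordinate
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, src.shape[1] - 1)
    y1 = min(y0 + 1, src.shape[0] - 1)
    u, v = x - x0, y - y0
    # The closer corner gets the larger coefficient, e.g. u*v for (x1, y1)
    return ((1 - u) * (1 - v) * src[y0, x0] + u * (1 - v) * src[y0, x1]
            + (1 - u) * v * src[y1, x0] + u * v * src[y1, x1])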

NLM denoising algorithm

Non Local Means, non-local mean.

Principle: the same image usually contains many similar textures, so a noisy region can be replaced, in a weighted way, by similar texture regions elsewhere in the image. This achieves a good denoising effect without losing too much detail.

image enhancement algorithm
  1. Histogram equalization, Laplace LOG, gamma transform

  2. Image enhancement commonly adjusts the brightness, contrast, saturation, hue, etc. of the image to increase clarity and reduce noise. It is often a combination of multiple algorithms; a typical pipeline is: denoising, increasing clarity (contrast), grayscale conversion or extracting edge features (convolving the image), binarization, etc. Different enhancement methods suit different application fields, and in practice several methods need to be combined flexibly.

  3. Image denoising: equivalent to a low-pass filter (noise is high frequency)
    Increasing clarity: high-pass filter

  4. In the image field, differentiation is sharpening and integral is blurring

Image sharpening: enhance the grayscale contrast and make blurred images clear

Blurred image: The image is averaged or integrated.

Differentiation can highlight image details and make the image clearer. The Laplacian is a differential operator; applying it enhances regions of abrupt grayscale change in the image and weakens regions where the grayscale varies slowly.

Image Enhancement: Logarithmic log Transformation

Since the logarithmic function curve has a large slope in areas with low pixel values ​​and a small slope in areas with high pixel values, after logarithmic transformation, the contrast of the darker areas of the image will be improved, and the dark details of the image will be enhanced.

Image Enhancement: Gamma Transformation

Mainly used for image correction, enhancing image contrast, suitable for image correction with too high gray or too low gray. For images with low contrast and high overall brightness (camera overexposure), the enhancement effect is obvious.

With gamma = 1 as the dividing point: the smaller the value, the stronger the expansion of the low-gray-level part (similar to a log transformation); the larger the value, the stronger the expansion of the high-gray-level part (similar to an exponential transformation). Different gamma values can thus enhance the detail of the low-gray (or high-gray) regions.

A way to enhance image contrast:

Make large values larger and small values smaller (one could write a paper on this); for example, [10, 30] becomes [-10, 50] using the following method

  1. Find the mean: (10+30)/2=20

  2. Subtract the mean: [10-20, 30-20] = [-10, 10];
    multiply by a coefficient, say 2: [-10, 10] * 2 = [-20, 20]

  3. Add back to the original range: [10, 30] + [-20, 20] = [-10, 50]

Image Compression:

Image compression: SVD, Fourier transform, basis transforms (wavelet basis and Fourier basis; JPEG uses the Fourier basis)

Image edge detection:

LoG operator: Laplacian of Gaussian operator

First perform Gaussian filtering on the graphics, then find the second order derivative of Laplacian, and finally detect the zero crossing (Zero crossing) of the filtering results to obtain the edge of the image or object.

The most common methods of edge detection: Sobel operator, Laplacian operator, Canny operator, etc.

  1. Roberts operator (the direction of 0 is the direction of the edge) 

  2. Prewitt operator

3. Sobel operator: weights are added to the Prewitt operator, with the closest pixels given the highest weight

4. Laplacian operator: a two-dimensional differential operator, and a second-order differential operator

Essence: do convolution, find difference

Image registration algorithm (3 categories)

Based on grayscale and template matching algorithm:

MAD: Mean Absolute Differences mean absolute difference algorithm

SAD: Sum of Absolute Differences absolute error and

SSD: Sum of Squared Differences, the sum of squared errors, also called the sum of squared differences

MSD: Mean Squared Differences, mean squared error algorithm

NCC: Normalized CrossCorrelation, normalized cross-correlation algorithm (or normalized cross-correlation algorithm), which uses the calculation of the similarity coefficient to calculate the similarity of two images.

SSDA: Sequential Similarity Detection Algorithm: a threshold is set, the absolute errors are accumulated, and if the accumulated value exceeds the threshold, the current position is abandoned and the next match position is tried.

SATD: Sum of Absolute Transformed Difference, this algorithm is also often used in video coding

Feature-based matching algorithm:

Canny algorithm, etc.

Methods based on the transform domain:

Fourier-Mellin transform

wavelet transform.

Image restoration:
  1. The area is relatively small, called Inpainting

  2. Large area is called Image Completion

Image Super-Resolution (Super Resolution)

Interpolation-Based Reconstruction: Traditional Approaches

Probability-based reconstruction: back-projection, maximum a posteriori

Reconstruction based on machine learning and deep learning

Common algorithms for image stitching (making panoramic pictures):

Algorithms to find key points: SIFT, SURF, ORB, SuperPoint

SIFT:Scale-Invariant Feature Transform

Scale-invariant feature transformation, which is a way to find keypoints. Proceed as follows:

  1. Construct the scale space (DoG, difference of Gaussians) and detect extreme points

  2. Use interpolation and other methods to find the key points through multiple extreme points obtained in the previous step

  3. Generate feature description points: use the histogram to count the gradient direction of the pixels in the neighborhood of the key point, find the main direction of the key point, and construct the descriptor

Extract the scale-invariant region -> normalize the size -> normalize the rotation -> compute the feature descriptor (finally: position + a 128-dimensional vector)

SURF: Speeded-Up Robust Features

Accelerated robust features (also for finding key points). The overall algorithm is similar to SIFT but more efficient: it uses the determinant of the Hessian matrix for feature point detection and uses the integral image to accelerate the computation.

It is a variant of SIFT, the effect is not as good as SIFT, but it is faster.

ORB: Oriented FAST and Rotated BRIEF

Proposed on the basis of the FAST detector and the BRIEF descriptor; it has the fastest running time and is the most commonly used in actual production
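
A sketch of ORB keypoint detection and matching between two images (file names and parameters are assumptions; this is typically the first step of stitching, before estimating a homography):

import cv2

img1 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming distance matcher for the binary BRIEF descriptors
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)
vis = cv2.drawMatches(img1, kp1, img2, kp2, matches[:50], None)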

Target motion detection algorithm:
  1. background subtraction

  2. optical flow

  3. frame difference method


Origin blog.csdn.net/qq_29788741/article/details/132255190