Several classic algorithms of computer vision: least squares, RANSAC, hash algorithms (with DCT), and the K-Means image clustering algorithm

1. Least squares method (finding a linear regression function)

Before discussing the least squares method, we need to talk about linear regression. The simplest example of linear regression is y=2x: in this univariate linear equation, the slope 2 is the regression coefficient, and it defines the correspondence between x and y. Linear regression assumes that a set of discrete points is, on the whole, closest to some straight line.

What does this have to do with least squares? The least squares method (Least Square Method) is a way of finding that linear regression function: it chooses the function that minimizes the sum of squared errors. Why the sum of squared residuals? Because minimizing it gives the best fit, i.e. the fitted function stays as close as possible to the observed y values of the function being estimated.

With the least squares method, the unknown parameters are easy to obtain, and the sum of squared errors between the fitted values and the actual data is minimized, as in the sketch below.
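As a minimal sketch (using NumPy; the sample points are made up and scattered around a hypothetical line y = 2x + 1), the least squares fit of y = ax + b can be written as:

import numpy as np

# Hypothetical noisy samples scattered around y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Solve for a, b minimizing the sum of squared residuals:
# setting the partial derivatives of sum((a*x + b - y)^2) with
# respect to a and b to zero yields the normal equations,
# which np.polyfit solves for us at degree 1
a, b = np.polyfit(x, y, deg=1)
print(f"fitted line: y = {a:.3f}x + {b:.3f}")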


2. RANSAC (model known, parameters unknown)

RANSAC (Random Sample Consensus) is less a single method or algorithm than an idea: a framework for estimating the parameters of a known model. It is not tied to any particular problem; it can be applied in computer vision, in statistical mathematics, and even to model parameter estimation in economics.

It is an iterative method for estimating the parameters of a mathematical model from a set of discrete observed data. It is also a non-deterministic algorithm: it produces a reasonable result only with a certain probability, and that probability can be increased by running more iterations.

The basic assumption of RANSAC is that the data consists of "inliers", whose distribution can be explained by some set of model parameters, and "outliers", which do not fit the model. Outliers may arise from extreme noise, erroneous measurements, or incorrect assumptions about the data.

RANSAC also assumes that, given a (usually small) set of inliers, there exists a procedure that can estimate the model parameters that best explain or best fit that data.

2.1 The difference between RANSAC and the least squares method

From the above, both methods are used for fitting. So how are they related, how do they differ, and which scenarios suit each?

In real development and production, data always carries some error. Suppose we know that two variables follow the linear relation Y = aX + b and we want the specific values of a and b. In theory two points suffice, but because of error, different pairs of randomly chosen points yield different a and b. What we really want is the model whose error against the observed values is smallest.

  • Least squares method: set the partial derivatives of the mean squared error with respect to a and b to zero and solve. Least squares is in many contexts synonymous with linear regression, but it is only suitable when the errors are small
  • RANSAC: once the model is fixed and enough iterations are allowed, RANSAC can usually find a near-optimal solution (on a data set containing 80% outliers, RANSAC far outperforms least squares)

To sum up, somewhat crudely: least squares suits data with small errors, while RANSAC suits data with larger errors, provided a sufficient iteration budget is allowed. In practical image processing, the huge number of pixels in an image also makes a naive least squares fit computationally heavy and slow.

2.2 Steps of RANSAC algorithm

The input to the RANSAC algorithm is a set of observations which, given the problems RANSAC targets, usually contain heavy noise and invalid points. Besides the observations, RANSAC also needs a known model (think of it as a parametric function) and a few confidence-related parameters.

The implementation of the RANSAC algorithm can be roughly divided into five steps:

  1. Randomly select a few points from the input data and treat them as inliers, i.e. valid points rather than noise or invalid points
  2. Fit the model to this tentative inlier set
  3. Test every point not selected in step 1 against the fitted model and decide whether it is an inlier
  4. Record the number of inliers, then repeat the steps above
  5. After all iterations, keep the model that produced the largest inlier count; that model is the result we need

During execution, the way the model parameters are computed differs for every mathematical model, so RANSAC does not prescribe how to compute them; it only provides the overall idea.

The biggest limitation of RANSAC is that the mathematical model must be known in advance.

Two questions deserve attention when executing the algorithm: first, how many points should be selected as the initial inlier sample; second, what should the maximum number of iterations be? (Section 2.3 addresses both; a minimal line-fitting sketch of the loop follows.)
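To make the loop concrete, here is a minimal sketch of RANSAC fitting a line y = ax + b. The sample size, residual threshold, iteration count, and min_inliers values are illustrative assumptions, not canonical choices:

import numpy as np

def ransac_line(points, iters=100, threshold=0.5, min_inliers=10):
    """Fit y = a*x + b to an (N, 2) array of points by RANSAC."""
    best_model, best_count = None, 0
    n = len(points)
    for _ in range(iters):
        # 1. randomly pick a minimal sample (2 points define a line)
        i, j = np.random.choice(n, 2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue  # vertical line, skip this sample
        # 2. fit the model to the sample
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # 3. count points whose residual is below the threshold (inliers)
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        count = np.sum(residuals < threshold)
        # 4./5. keep the model with the most inliers
        if count > best_count and count >= min_inliers:
            best_model, best_count = (a, b), count
    return best_model, best_count

# Hypothetical demo: a noisy line with 40% injected outliers
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2 * x + 1 + rng.normal(0, 0.2, 200)
y[:80] = rng.uniform(0, 25, 80)
model, count = ransac_line(np.column_stack([x, y]))
print(model, count)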

2.3 Parameter determination of RANSAC

Let w be the probability that a point selected in step 1 is an inlier, and let n be the number of points sampled. Then w^n is the probability that all n sampled points are inliers, and 1 − w^n is the probability that at least one of the n points is not an inlier. After k repetitions, (1 − w^n)^k is the probability that every sample contained at least one outlier. So if p is the probability that the algorithm succeeds within k runs, we get the formula: p = 1 − (1 − w^n)^k

Rearranging the formula above gives the formula for k: k = log(1 − p) / log(1 − w^n)

These two formulas tell us how to raise the probability that the model we build is (near-)optimal: for a fixed w, the larger k is, the larger p becomes. For fixed w and p, a larger n requires a larger k. Since w is generally not known exactly in advance, n is usually kept small.
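A quick numeric sanity check of the formula (a small sketch; the example values of p, w, and n are arbitrary):

import math

def ransac_iterations(p, w, n):
    """Iterations k needed so that, with probability p, at least one
    sample of n points contains only inliers, given inlier ratio w."""
    return math.ceil(math.log(1 - p) / math.log(1 - w ** n))

# e.g. 99% confidence, 50% inliers, 2-point samples -> k = 17
print(ransac_iterations(0.99, 0.5, 2))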

2.4 Application of RANSAC

The most typical application of RANSAC is panorama stitching, which proceeds in roughly four steps:

  1. Take several images of one scene, usually in sequence
  2. Compute the transformation between each image and the previous one by feature matching (e.g. SIFT)
  3. Image warping: map each image into the coordinate system of the previous one
  4. Blend the transformed images together to obtain the final panorama (a rough sketch of steps 2 and 3 follows this list)
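Below is a rough OpenCV sketch of steps 2 and 3, assuming two overlapping images named img1.jpg and img2.jpg (hypothetical file names). SIFT is available as cv2.SIFT_create in OpenCV 4.4+; a naive overwrite stands in for real blending, and here img1 is mapped into img2's frame for simplicity:

import cv2
import numpy as np

img1 = cv2.imread('img1.jpg')  # hypothetical file names
img2 = cv2.imread('img2.jpg')

# Detect SIFT features and match them
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe's ratio test

# Estimate the homography; cv2.RANSAC rejects mismatched pairs
src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Map img1 into img2's coordinate system (naive blending)
h, w = img2.shape[:2]
warped = cv2.warpPerspective(img1, H, (w * 2, h))
warped[0:h, 0:w] = img2
cv2.imwrite('panorama.jpg', warped)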


2.5 Advantages and disadvantages of RANSAC algorithm

Advantages: its strength is exactly its purpose: it can find a (near-)optimal model even when the data is heavily contaminated with outliers

Disadvantages: because of its many preconditions, the drawbacks are equally obvious:

  • There is no natural upper bound on the number of iterations needed to estimate the parameters; if an upper bound is imposed, the result may not be optimal and may even be wrong
  • RANSAC yields a credible model only with a certain probability, and that probability grows with the number of iterations
  • It requires problem-specific thresholds to be set
  • RANSAC can estimate only one model from a given data set; if two (or more) models are present, it cannot find the others
  • The mathematical model must be known in advance

3. Hash algorithm

Hash algorithms are unrelated to RANSAC and least squares; they are a different family of algorithms. In computer vision they are often used for image similarity comparison. Three hash algorithms are commonly used for similar-image search: the average hash (aHash), the difference hash (dHash), and the perceptual hash (pHash).

Behind every hash algorithm is a hash function: a method of creating a small digital "fingerprint" from data of any kind. It compresses a message or data into a digest, shrinking the amount of data and fixing its format; the function scrambles the input and produces a fingerprint called a hash value, usually represented as a short string of seemingly random letters and digits.

A hash algorithm maps a binary value of arbitrary length to a shorter, fixed-length binary value, the hash value. A hash value is a compact and (practically) unique numerical representation of a piece of data: if you hash a piece of plaintext and then change even a single letter of it, the resulting hash value is completely different. In short, a hash algorithm is a function that turns almost any digital file into a seemingly garbled string of numbers and letters.

Used as a cryptographic primitive, a hash function has two essential properties (see the short demonstration after this list):

  • Irreversibility: it is very easy to compute the seemingly garbled output string (the hash value) from the input, but very, very hard to recover the input from the output
  • Uniqueness and unpredictability of the output: even a slight change in the input produces a completely different hash value
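A quick demonstration of the second property using Python's built-in hashlib (SHA-256 is chosen here purely as a familiar example of a cryptographic hash):

import hashlib

# Two inputs differing by a single character produce
# completely different digests (the avalanche effect)
print(hashlib.sha256(b'hello world').hexdigest())
print(hashlib.sha256(b'hello worle').hexdigest())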

Hamming distance

The Hamming distance is the measure used to compare how close two hash values are: it is the number of bit positions at which the two values differ, as in the helper below.
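For the fixed-length bit strings produced by the image hashes below, a minimal helper might look like this (a sketch; returning -1 for mismatched lengths is an arbitrary convention):

def hamming_distance(hash1, hash2):
    """Count the positions where two equal-length hash strings differ."""
    if len(hash1) != len(hash2):
        return -1  # hashes are not comparable
    return sum(c1 != c2 for c1, c2 in zip(hash1, hash2))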

3.1 Mean Hash Algorithm and Difference Hash Algorithm

The average hash algorithm proceeds in six steps:

  1. Scale: shrink the image to 8×8, keeping the structure and discarding detail
  2. Grayscale: convert to a grayscale image
  3. Average: compute the mean of all 64 pixels of the grayscale image
  4. Compare: record 1 for each pixel above the mean, otherwise 0, giving 64 bits in total
  5. Generate the hash: concatenating the 1s and 0s from the previous step in order yields the image's fingerprint (hash)
  6. Compare fingerprints: compute the Hamming distance between the two images' 64-bit hashes, i.e. how many bits differ; the fewer differing bits, the more similar the images

The difference hash algorithm is basically the same as the average hash in its early and late stages; it differs in how the hash bits are produced. Its distinctive steps are:

  1. Scale: shrink the image to 9×8 (9 pixels per row, 8 rows), keeping the structure and discarding detail
  2. Grayscale: convert to a grayscale image
  3. Compare: within each row, record 1 if a pixel is greater than the pixel to its right, otherwise 0; rows are not compared with each other. Each row of 9 pixels yields 8 differences, and with 8 rows this gives 64 bits
# Average hash algorithm (aHash)
import cv2

def aHash(img):
    # Shrink to 8x8
    img = cv2.resize(img, (8, 8), interpolation=cv2.INTER_CUBIC)
    # Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # s accumulates the pixel sum; hash_str accumulates the hash bits
    s = 0
    hash_str = ''
    # Sum all pixel values
    for i in range(8):
        for j in range(8):
            s = s + gray[i, j]
    # Mean gray level
    avg = s / 64
    # Pixels above the mean become 1, others 0, forming the image's hash
    for i in range(8):
        for j in range(8):
            if gray[i, j] > avg:
                hash_str = hash_str + '1'
            else:
                hash_str = hash_str + '0'
    return hash_str

# Difference hash algorithm (dHash)
def dHash(img):
    # Shrink to 9x8 (9 columns, 8 rows)
    img = cv2.resize(img, (9, 8), interpolation=cv2.INTER_CUBIC)
    # Convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    hash_str = ''
    # Within each row, a pixel greater than its right neighbour becomes 1, otherwise 0
    for i in range(8):
        for j in range(8):
            if gray[i, j] > gray[i, j + 1]:
                hash_str = hash_str + '1'
            else:
                hash_str = hash_str + '0'
    return hash_str
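Hypothetical usage, comparing two images with the functions above and the hamming_distance helper from the previous section (both file names are placeholders):

img1 = cv2.imread('lenna.png')
img2 = cv2.imread('lenna_noise.png')  # hypothetical noisy copy

# The fewer differing bits, the more similar the images
print('aHash distance:', hamming_distance(aHash(img1), aHash(img2)))
print('dHash distance:', hamming_distance(dHash(img1), dHash(img2)))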

3.2 Discrete Cosine Transform DCT and Perceptual Hash Algorithm

We said above that there are three hash algorithms, but only two have been covered. The third, the perceptual hash, depends on something new: the DCT, the discrete cosine transform.

The average hash is rather crude and not very accurate, so it is better suited to thumbnail search. For more accurate results there is the perceptual hash, which uses the DCT (discrete cosine transform) to move to the frequency domain and keep only the low frequencies. The perceptual hash has eight steps (a code sketch follows the list):

  1. Shrink the image: 32×32 is a good size, convenient for computing the DCT
  2. Convert to grayscale: convert the scaled image to a grayscale image
  3. Compute the DCT: the DCT decomposes the image into a set of frequency components
  4. Reduce the DCT: of the 32×32 DCT coefficient matrix, keep only the top-left 8×8 block, which represents the lowest frequencies of the image
  5. Compute the average: take the mean of the retained DCT coefficients
  6. Threshold: coefficients above the mean become 1, others 0
  7. Build the fingerprint: combine the 64 bits; the ordering is arbitrary as long as it is kept consistent
  8. Finally, compare the fingerprints of the two images via the Hamming distance
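A sketch of these steps, assuming OpenCV (cv2.dct requires float32 input; taking the plain mean of the retained block in step 5 is one common variant):

import cv2
import numpy as np

def pHash(img):
    # 1-2. shrink to 32x32 and convert to grayscale
    img = cv2.resize(img, (32, 32), interpolation=cv2.INTER_CUBIC)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # 3. compute the DCT (needs float32 input)
    dct = cv2.dct(np.float32(gray))
    # 4. keep the top-left 8x8 block (the lowest frequencies)
    low = dct[:8, :8]
    # 5-7. threshold against the mean to build the 64-bit fingerprint
    avg = low.mean()
    return ''.join('1' if v > avg else '0' for v in low.flatten())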

The Discrete Cosine Transform (DCT) is mainly used for data and image compression: it converts a spatial-domain signal into the frequency domain and has excellent decorrelation properties. The DCT itself is lossless, and because the transform is symmetric, the inverse DCT can be applied after quantization and coding to recover the original image information at the receiving end. The DCT is used very widely in image analysis and compression today: the familiar JPEG still-image coding standard uses it, as do moving-picture standards such as MJPEG and MPEG.

I don't know this area in depth, so for now I can only offer a bit of theory; I'll go into detail when I get the chance.

4. Image clustering algorithm K-Means

4.1 Classification and clustering

Classification

Classification is the process of mining patterns from data and using them to make judgments. Classification learning has three main steps:

  1. The training data set carries class labels, e.g. marking each sample as positive or negative
  2. A model is then learned and trained on this data set
  3. The trained model makes predictions on a test data set, and the performance of its results is measured

Clustering

Broadly speaking, clustering puts together the members of a data set that are similar in some respect. A cluster is a collection of data instances in which elements of the same cluster are similar to each other, while elements of different clusters differ from each other.

Because clustering data carries no category or grouping information, i.e. the data is unlabeled, clustering is usually classed as unsupervised learning. By the same token, classification belongs to supervised learning.

The purpose of clustering is also to sort data into categories, but we do not know the categories in advance. The algorithm alone judges the similarity between data items and puts similar ones together; before the clustering result comes out, we know nothing about what each category looks like, and we must analyze the result with human experience to see what characterizes each cluster.

In short, clustering is "birds of a feather flock together": it gathers similar, unlabeled elements by similarity. Classification, by contrast, trains a model on labeled data and uses it to predict labels for new data.

4.2 K-Means Clustering

K-Means is the most commonly used clustering algorithm. It originated in signal processing, and its goal is to partition the data points into K clusters. Its biggest advantages are that it is simple, easy to understand, and fast; its drawback is that the number of clusters must be specified before clustering.

The K-Means clustering procedure:

Step 1: determine K, i.e. the number of clusters or groups the data set will be divided into

Step 2: randomly select K data points from the data set as centroids (data centers)

Step 3: compute the distance from every point to each centroid and assign each point to its nearest centroid's group

Step 4: once every point has been assigned, recompute each cluster's centroid as the mean of its points, giving K new centroids

Step 5: repeat steps 3 and 4 until the iteration limit is reached or the stopping condition holds (the clustering result no longer changes)
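The script below applies cv2.kmeans to the pixels of lenna.png with K = 2, 4, 8, 16, and 64 and displays the quantized results side by side: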

import cv2
import numpy as np
import matplotlib.pyplot as plt

# Read the source image
img = cv2.imread('lenna.png')
print(img.shape)

# Flatten the 2-D pixel grid to one pixel (BGR triple) per row
data = img.reshape((-1, 3))
data = np.float32(data)

# Stopping criteria (type, max_iter, epsilon)
criteria = (cv2.TERM_CRITERIA_EPS +
            cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)

# Initial-center selection flag
flags = cv2.KMEANS_RANDOM_CENTERS

# K-Means clustering into 2 clusters
compactness, labels2, centers2 = cv2.kmeans(data, 2, None, criteria, 10, flags)

# K-Means clustering into 4 clusters
compactness, labels4, centers4 = cv2.kmeans(data, 4, None, criteria, 10, flags)

# K-Means clustering into 8 clusters
compactness, labels8, centers8 = cv2.kmeans(data, 8, None, criteria, 10, flags)

# K-Means clustering into 16 clusters
compactness, labels16, centers16 = cv2.kmeans(data, 16, None, criteria, 10, flags)

# K-Means clustering into 64 clusters
compactness, labels64, centers64 = cv2.kmeans(data, 64, None, criteria, 10, flags)

# Map every pixel to its cluster centroid and restore the 2-D uint8 image
centers2 = np.uint8(centers2)
res = centers2[labels2.flatten()]
dst2 = res.reshape((img.shape))

centers4 = np.uint8(centers4)
res = centers4[labels4.flatten()]
dst4 = res.reshape((img.shape))

centers8 = np.uint8(centers8)
res = centers8[labels8.flatten()]
dst8 = res.reshape((img.shape))

centers16 = np.uint8(centers16)
res = centers16[labels16.flatten()]
dst16 = res.reshape((img.shape))

centers64 = np.uint8(centers64)
res = centers64[labels64.flatten()]
dst64 = res.reshape((img.shape))

# Convert BGR to RGB for matplotlib display
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
dst2 = cv2.cvtColor(dst2, cv2.COLOR_BGR2RGB)
dst4 = cv2.cvtColor(dst4, cv2.COLOR_BGR2RGB)
dst8 = cv2.cvtColor(dst8, cv2.COLOR_BGR2RGB)
dst16 = cv2.cvtColor(dst16, cv2.COLOR_BGR2RGB)
dst64 = cv2.cvtColor(dst64, cv2.COLOR_BGR2RGB)

# Show the images
titles = ['Original', 'K=2', 'K=4', 'K=8', 'K=16', 'K=64']
images = [img, dst2, dst4, dst8, dst16, dst64]
for i in range(6):
    plt.subplot(2, 3, i + 1)
    plt.imshow(images[i], 'gray')
    plt.title(titles[i])
    plt.xticks([]), plt.yticks([])
plt.show()

K-Means Clustering and Image Processing

In image processing, the K-Means clustering algorithm can be used for image segmentation, image clustering, and image recognition. By clustering the pixels into K clusters and then replacing every pixel in a cluster with that cluster's centroid, the image's colors can be quantized and compressed without changing the resolution, achieving color-level segmentation.

K-Means has clear advantages in image processing applications. Although its results do not match a neural network's, they are sufficient in ordinary cases; it is simple and fast, it is very efficient on large data sets, and the results are very good when the clusters are dense.

One of its drawbacks, as noted above, is that the number of clusters K must be specified in advance; another is sensitivity to noise and outlier data. For ordinary problems, these flaws can be compensated by other methods.


Copyright statement: the learning content and pictures above come from or refer to Badou Artificial Intelligence (Wang Xiaotian)

Origin blog.csdn.net/qq_50587771/article/details/124433952