Image similarity comparison
1. Requirements
Suppose there is an image pool containing 100 million pictures. Given a target image, find its match in the pool:
- Determine whether the picture has already appeared in the pool (exactly the same picture).
- Determine whether a similar picture has appeared. For example, two pictures with a similarity of 90% that describe the same thing.
2. Implementation plan
For the requirements above, the idea is to convert each picture into a vector, store it in ES, and search for pictures by picture. In ES, search-by-picture is implemented with KNN, and KNN always returns the top N results: even if the image pool contains no image matching the target, the most similar images are still returned.
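The top-N behaviour can be illustrated with a minimal brute-force sketch. This is not how ES implements KNN internally; the file names, the 3-dimensional vectors, and the choice of cosine similarity are assumptions made only for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn(query, pool, k):
    """Return the k pool entries most similar to the query, best first.

    There is no minimum-similarity cutoff, so k results come back
    whenever the pool holds at least k vectors.
    """
    scored = [(cosine(query, vec), name) for name, vec in pool.items()]
    scored.sort(reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional "image vectors"; real embeddings are much longer.
pool = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.8, 0.2, 0.1],
    "car.jpg": [0.1, 0.9, 0.3],
}

# This query is not close to anything in the pool, yet three hits come back.
results = knn([0.0, 0.2, 0.9], pool, k=3)
```

Because the search only ranks candidates, the caller has to decide separately whether the best hit is actually close enough to count as a match.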
Requirement 1 is easy to meet with ES, because an exactly identical picture returns a relevance score of 1. When the search target is not in the pool, however, the returned results may be completely unrelated to the target and still carry a relevance score above 85%, so the score alone cannot tell us whether a recalled result is really similar to the target. Therefore, for requirement 2, whenever the score of the recalled result is not 1, the most relevant recalled picture must be checked again to decide whether it is really similar to the target picture.
For requirement 2, a second image similarity algorithm should be used for verification. According to our research and testing, the histogram comparison method in OpenCV gives good results. In the test cases below, a correlation greater than 85% can be used to decide whether two pictures are similar. (More cases need to be tested to find the optimal similarity threshold.)
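The two-stage decision described above could be sketched roughly as follows. The function name `classify`, the labels it returns, and the small floating-point tolerance on the score of 1 are assumptions of this sketch; the 0.85 threshold is the provisional value from the tests in this document.

```python
SIMILARITY_THRESHOLD = 0.85  # provisional value from this document; still to be tuned

def classify(es_score, histogram_similarity=None, tolerance=1e-6):
    """Classify the top ES hit as 'exact', 'similar', or 'no_match'.

    es_score: relevance score ES returned for the top hit.
    histogram_similarity: second-stage score, only needed when the
        ES score is below 1.
    tolerance: assumed slack for floating-point scores that should be 1.
    """
    if es_score >= 1.0 - tolerance:
        return "exact"      # requirement 1: the same picture is in the pool
    if histogram_similarity is None:
        raise ValueError("a second-stage score is required when es_score < 1")
    if histogram_similarity > SIMILARITY_THRESHOLD:
        return "similar"    # requirement 2: confirmed by the second check
    return "no_match"       # ES returned a neighbour, but it is not similar

# An ES relevance of 0.85 with a histogram similarity of 0.7713 is rejected.
decision = classify(0.85, 0.7713211102457774)
```

Keeping the threshold in one named constant makes it easy to adjust once more cases have been tested.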
3. Counterexamples: problems with ES vector retrieval
A negative example of ES recall:
1. The following picture is the search target picture
2. The top 3 recalled results
The three pictures above are the recalled results, ordered by the relevance ES assigned to them. If the first two images did not exist in the image pool, the third image would be recalled as the best match, which is a problem: the result could not be used for deduplication.
3. The scores ES gives the three pictures above are as follows:
Judging from the relevance scores given by ES, the first picture scores 1, so exact matches can be identified without problems. The scores of the second and third pictures are very close, yet the third picture is actually not closely related to the first. To exclude the third picture by relevance, relying only on the score returned by ES is not enough.
- Comparing two pictures with OpenCV
ES alone cannot satisfy requirement 2. When the score of the top recalled result is not 1, OpenCV can be used to compare that result against the target again.
OpenCV compares the histograms of the two pictures to produce a correlation score, which is more reliable. In the cases below, at least, it gives the result we want.
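To make the histogram comparison concrete, here is a dependency-free sketch of a correlation score between two intensity histograms. The source does not say which comparison method was used; this sketch assumes the correlation method, which in OpenCV would typically be `cv2.calcHist` followed by `cv2.compareHist` with `cv2.HISTCMP_CORREL`. The helper names, the bin count, and the toy pixel lists are hypothetical.

```python
import math

def histogram(pixels, bins=8, max_value=256):
    """Count how many pixel intensities fall into each of `bins` equal ranges."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    return counts

def correlation(h1, h2):
    """Pearson correlation between two histograms of equal length."""
    mean1 = sum(h1) / len(h1)
    mean2 = sum(h2) / len(h2)
    num = sum((a - mean1) * (b - mean2) for a, b in zip(h1, h2))
    den = math.sqrt(sum((a - mean1) ** 2 for a in h1) *
                    sum((b - mean2) ** 2 for b in h2))
    # Correlation is undefined for a flat histogram; this sketch returns 0.0.
    return num / den if den else 0.0

# Toy grayscale "images" given as flat lists of 0-255 intensities.
dark = [10, 20, 30, 40, 50, 60]
dark_shifted = [12, 22, 28, 41, 52, 58]
bright = [200, 210, 220, 230, 240, 250]

same_scene = correlation(histogram(dark), histogram(dark_shifted))
different_scene = correlation(histogram(dark), histogram(bright))
```

Two pictures with nearly the same intensity distribution score close to 1, while pictures with very different distributions score low or negative. A histogram ignores where pixels are located, so the score reflects overall tone and colour rather than layout.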
Case 1
Although the two pictures do not show the same person, they both describe the same thing. We consider these two pictures similar, with a similarity of more than 90%.
Relevance scores calculated by OpenCV:
Mean Square Error (MSE): 131.44561624837127
Structural Similarity Index (SSIM): 5.7201094656E10
Peak Signal-to-Noise Ratio (PSNR): 26.943342540382247
Image similarity (histogram): 0.8858558728156901
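As a cross-check on these metrics, PSNR can be derived from MSE. The sketch below assumes 8-bit images (peak value 255) and the usual definition PSNR = 10 * log10(MAX^2 / MSE); the function names are hypothetical. Under that assumption, the MSE reported for Case 1 reproduces the reported PSNR. The standard SSIM lies between -1 and 1, so the much larger SSIM values listed in these cases probably come from a non-standard or unnormalized computation and are best not compared against a threshold.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(mse_value, max_pixel=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse_value == 0:
        return math.inf  # identical images have no noise
    return 10.0 * math.log10(max_pixel ** 2 / mse_value)

# MSE value reported for Case 1 in this document.
case1_psnr = psnr(131.44561624837127)
```

Since PSNR is a fixed function of MSE, the two numbers carry the same information, so only one of them needs a threshold if either is used.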
Case 2
Two pictures taken from different frames of the same video. Intuitively, they show the same thing, so the similarity should be greater than 95%.
Mean Square Error (MSE): 123.0275316249348
Structural Similarity Index (SSIM): 1.909637632E9
Peak Signal-to-Noise Ratio (PSNR): 27.230780502837018
Image similarity (histogram): 0.9565945992942751
Case 3
Although both pictures show Musk, they describe two different things, so the similarity should be low.
Mean Square Error (MSE): 209.28278961867477
Structural Similarity Index (SSIM): 5.423145472E9
Peak Signal-to-Noise Ratio (PSNR): 24.923468452906206
Image similarity (histogram): 0.34953414682303025
Case 4
Of the following two pictures, the second was recalled by ES with a relevance of 85% and can be compared again using OpenCV. The OpenCV comparison gives a similarity of 77%, so this wrong result can be excluded by setting the similarity threshold to 85%.
Mean Square Error (MSE): 185.4257086381148
Structural Similarity Index (SSIM): 2.40297230336E11
Peak Signal-to-Noise Ratio (PSNR): 25.449104134454515
Image similarity (histogram): 0.7713211102457774
The ES retrieval above uses the KNN vector search of ES 8.8.
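For reference, a KNN search request body in ES 8.x might be built as in the sketch below. The `knn` section with `field`, `query_vector`, `k`, and `num_candidates` follows the ES kNN search API; the field name `image_vector`, the default values, and the helper name are assumptions, since the source does not show the actual mapping or query.

```python
def build_knn_search(query_vector, k=3, num_candidates=100,
                     field="image_vector"):
    """Build the body of an Elasticsearch `_search` request that uses kNN.

    `field` must name a dense_vector field in the index; "image_vector"
    is a placeholder, not the name used in the original setup.
    """
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return {
        "knn": {
            "field": field,
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
        }
    }

# Hypothetical 3-dimensional query vector; real image vectors are much longer.
body = build_knn_search([0.1, 0.2, 0.3])
```

The resulting dictionary would be sent as the JSON body of a `_search` request against the image index, and the top `k` hits would then go through the second-stage check described earlier.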