Perceptual Hashing - Image Similarity Analysis

The author of this article is a development engineer at 360's Qiwu Troupe.

Introduction

I recently worked on a theming ("skinning") feature for a mini program. Each theme color has its own image library, and the project references images by configuring online URLs: whenever a new set of images is added, each image has to be matched against its online URL and then configured under the corresponding key. Doing this by hand turned out to be time-consuming and tedious. Since the image libraries differ mainly in color, I wondered whether I could compare image similarity automatically and map each image to the right key. After some research, perceptual hashing solved the problem, and this article records how.

Perceptual hashing

Concept

Perceptual hashing uses a fingerprinting algorithm to produce a snippet, hash, or fingerprint from various forms of multimedia. It is a type of locality-sensitive hashing: if two pieces of multimedia have similar features, their hashes are similar as well.
In imaging, perceptual image hashing creates a fingerprint based on an image's visual appearance. This fingerprint makes it easy to compare similar images, so the technique is widely used in image search: given a query image, the engine returns visually similar ones. Google's image search, for example, is based on perceptual hashing.

Perceptual hashing vs. cryptographic hashing

Perceptual hashing is a different concept from cryptographic hash functions such as MD5 and SHA-1.
With a cryptographic hash, the output looks random: the input acts like a random seed, so identical data always produces the same result, while even slightly different data produces wildly different results. Perceptual hashes, by contrast, can be meaningfully compared with each other to measure the similarity between two pieces of data.
Comparing two SHA-1 hashes really allows only two conclusions: if the hashes differ, the data differs; if they are identical, the data is probably identical. (A matching hash does not guarantee matching data, because hash collisions are possible.)
Because of this property, MD5 or SHA-1 can be used to hash files and compare the results to detect duplicates. For image similarity, however, such a strict equal-or-not comparison is unsuitable: changing an image's format, metadata, or other non-visual information alters the binary content even when the pixels are identical, producing completely different hashes.
Likewise, images that merely differ through changes in resolution, brightness, hue, contrast, blur, scaling, rotation, cropping, or small edits will produce completely different cryptographic hashes. What image similarity really needs to compare is not the raw binary content of the file but the color distribution of the pixels in the image. In the context of perceptual hashing, this can be roughly understood as the distribution of tones in the image's color histogram, from pure black at one end to pure white at the other.

Hamming distance

Perceptual hashing turns a picture into a fingerprint string. Comparing the distance between two such strings (usually the Hamming distance) measures similarity: the smaller the distance, the more alike the two pictures. A common rule of thumb:
:::tips
Hamming distance = 0  -> effectively identical
Hamming distance < 5  -> very similar
Hamming distance > 10 -> probably different pictures
:::
The Hamming distance is a concept from error-control coding in data transmission: it is the number of positions at which the corresponding characters of two equal-length strings differ. Equivalently, XOR the two strings and count the 1 bits; that count is the Hamming distance. In information-theoretic terms, it is the number of substitutions needed to turn one string into the other. Examples:
The Hamming distance between 101100 and 111000 is 2.
The Hamming distance between 21438 and 22337 is 3.
The Hamming distance between "toned" and "roses" is 3.
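The definition above translates directly into a few lines of Python; the function names here are just illustrative:

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))

# The examples from the text:
print(hamming_distance("101100", "111000"))  # 2
print(hamming_distance("21438", "22337"))    # 3
print(hamming_distance("toned", "roses"))    # 3

def hamming_distance_int(h1, h2):
    """For two hashes stored as integers: XOR, then count the 1 bits."""
    return bin(h1 ^ h2).count("1")
```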

Implementation process

"Perceptual hash algorithm" is an umbrella term for a family of algorithms. Depending on how the color distribution is computed, the commonly used variants are aHash, pHash, and dHash.
The implementation can be simplified to three steps: simplify the picture, get the pixel values, compute the image hash. All of the variants share the same basic property: an image that has been enlarged or reduced, has a different aspect ratio, or has small color differences (contrast, brightness, blur, saturation, and so on) will still match similar images.
Average hash algorithm (aHash):
This algorithm compares each pixel of a grayscale version of the image against the image's average value.
Night view of the Bay Bridge – Photo credit: DH Parks (CC)
The process of computing the average hash in Python is shown below. The first step is to shrink the image and reduce its color depth using PIL (Pillow). This strips away detail so that the comparison focuses on structure.

image = image.resize((8, 8), Image.LANCZOS)  # Shrink to 8x8 (Image.ANTIALIAS in older Pillow).


image = image.convert("L")  # Convert it to grayscale.

Next we find the average pixel value of the image:

pixels = list(image.getdata())
avg = sum(pixels) / len(pixels)

pixels is just a list of values ranging from 0 (black) to 255 (white); we add them up and divide by the count to get the average. For this image, the average pixel value is 61 (approximately 25% gray).
Now we can compute the hash by comparing each pixel in the image to the average: a pixel below the average becomes 1 and a pixel at or above it becomes 0 (the choice of polarity is arbitrary as long as it is applied consistently). We then treat the result as a string of bits and convert it to hexadecimal.

bits = "".join(map(lambda pixel: '1' if pixel < avg else '0', pixels))  # '00010100...'
hexadecimal = format(int(bits, 2), '016x').upper()

The bits rendered back as a black-and-white image.
This yields the hash 00010E3CE08FFFFE, which can be compared by Hamming distance against any other image hashed the same way to judge how alike the two "structures" (looks) are. The closer the distance is to 0, the more similar the images; 0 means essentially the same picture.
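The resize and grayscale steps need Pillow, but the rest of the pipeline is plain Python. As a self-contained sketch (the function name and the synthetic pixel list are mine, not from the original), the averaging, thresholding, and hex steps combine like this:

```python
def average_hash_from_pixels(pixels):
    """aHash core: pixels is a flat list of 64 grayscale values (0-255),
    as produced by an 8x8 grayscale image's getdata()."""
    assert len(pixels) == 64
    avg = sum(pixels) / len(pixels)
    # '1' where the pixel is darker than the average, '0' otherwise,
    # matching the comparison in the snippet above.
    bits = "".join('1' if p < avg else '0' for p in pixels)
    return format(int(bits, 2), '016x').upper()

# A synthetic 8x8 "image": each row is dark on the left, bright on the right.
pixels = ([0] * 4 + [255] * 4) * 8
print(average_hash_from_pixels(pixels))  # 'F0F0F0F0F0F0F0F0'
```

Each row contributes the bit pattern 11110000 (0xF0), so the structure of the toy image is visible directly in the hash.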
Perceptual hash algorithm (pHash):
pHash is the variant usually called the perceptual hash algorithm. It reduces the image to its frequency components via the discrete cosine transform (DCT). In an image's spectrum, high frequencies carry detail and low frequencies carry structure; pHash preserves most of the image's character through lossy compression by ignoring the high frequencies and keeping the low ones, then compares the resulting feature values.
DCT stands for Discrete Cosine Transform. Its principle is similar to the Fourier transform: it decomposes a complicated signal into frequency components of different strengths. By Fourier's principle, a complex signal is a superposition of simple ones, so the original signal can be "reassembled" from a set of cosine basis signals of different frequencies and amplitudes; instead of recording the complex original signal, we only need to record the DCT coefficients. After the transform, the top-left of the coefficient matrix expresses the low-frequency strength and the bottom-right the high-frequency strength, so the frequency content of the signal can be read at a glance without the complexity of the original signal.
In the DCT matrix, the coefficients represent progressively higher frequencies as you move from the top-left corner toward the bottom-right, and outside the top-left region they are all close to 0. When pHash simplifies the picture it typically shrinks it to 32×32, so selecting just the 8×8 block in the top-left corner of the DCT matrix captures most of the image's features.
Each value in that 8×8 block is then compared against the block's mean, again yielding a 64-bit 0/1 hash sequence, and similarity is obtained by comparing the bits position by position.
The advantage of pHash is that it is more stable and discriminates well, at the cost of being slightly slower.
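Under the assumptions of the text (32×32 grayscale input, 8×8 low-frequency block, threshold at the block mean), the pHash core can be sketched in pure Python. This is illustration only: the naive O(N⁴) DCT below is far slower than the fast transforms in NumPy/SciPy or OpenCV, and production implementations often exclude the DC term or use the median instead of the mean.

```python
import math

def dct_2d(matrix):
    """Naive unnormalized 2-D DCT-II; acceptable for a 32x32 input."""
    n = len(matrix)
    # Precompute the cosine basis to keep the naive loop reasonably fast.
    cos = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
           for u in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            out[u][v] = sum(matrix[x][y] * cos[u][x] * cos[v][y]
                            for x in range(n) for y in range(n))
    return out

def phash_from_gray(matrix):
    """pHash core: matrix is a 32x32 grid of grayscale values (0-255)."""
    dct = dct_2d(matrix)
    # Keep only the 8x8 low-frequency block in the top-left corner.
    block = [dct[u][v] for u in range(8) for v in range(8)]
    avg = sum(block) / len(block)
    # Threshold each coefficient against the block mean, as in the text.
    return "".join('1' if c > avg else '0' for c in block)
```

Because a uniform brightness change only affects the DC coefficient, two copies of an image that differ by a constant brightness offset end up with nearly identical pHash bits.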
Difference Hash Algorithm (dHash):
The dHash algorithm is the difference hash algorithm. The principle is to compare the size of adjacent elements in each row. If the pixel on the left is brighter than the pixel on the right, it is marked as 1, otherwise it is 0, and finally combined to get Ha Greek sequence .
When this algorithm simplifies the picture, it often reduces the picture to 9
8, and the adjacent comparison of 9 elements in each line can get 8 values, a total of 8 lines, and the result is also a 64-bit 0/1 hash sequence.
The hash sequences of the two pictures can also be compared to obtain the similarity.
The advantage of dHash is its speed, and it discriminates better than aHash.
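The row-wise comparison just described is only a few lines; as with the earlier sketches, the function name and toy input are illustrative:

```python
def dhash_from_gray(rows):
    """dHash core: rows is an 8-row grid, each row holding 9 grayscale values,
    as produced by shrinking the image to 9x8."""
    assert len(rows) == 8 and all(len(r) == 9 for r in rows)
    bits = []
    for row in rows:
        # Compare each pixel with its right-hand neighbor: 9 pixels -> 8 bits.
        for left, right in zip(row, row[1:]):
            bits.append('1' if left > right else '0')
    return "".join(bits)  # 64-bit 0/1 sequence

# A toy image whose rows fade from bright (8) to dark (0), left to right:
print(dhash_from_gray([list(range(8, -1, -1))] * 8))  # 64 ones
```

Two such sequences are then compared by Hamming distance exactly as for aHash and pHash.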
All three hash algorithms can be implemented with OpenCV, and there are many implementations online, so I won't reproduce them here. In Python you can also simply use the third-party imagehash library, which packages all of them and is very easy to use.

Summary

Image similarity comparison arises in many scenarios, and only by understanding how perceptual hashing works can you judge which ones it fits. Perceptual hashing is a simple method with a high recognition rate in most cases involving slight color adjustment, scaling, or even small differences in detail. For operations that change the image's color distribution structure, such as cropping, rotation, or local additions (borders, watermarks), its error rate is comparatively high.
No image similarity algorithm is 100% accurate: the ways images can differ vary enormously, similarity is a relative concept, and even humans sometimes misjudge it. In practice, combining perceptual hashing with other auxiliary recognition algorithms suited to your actual scenario is usually enough to meet your needs.

Reference links

https://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html
https://web.archive.org/web/20171112054354/https://www.safaribooksonline.com/blog/2013/11/26/image-hashing-with-python/
https://blog.csdn.net/cjzjolly/article/details/123524616
https://zhuanlan.zhihu.com/p/68215900
https://www.yumefx.com/?p=3163

- END -

About Qiwu Troupe

Qiwu Troupe is the largest front-end team in 360 Group and represents the group in the work of W3C and ECMA (TC39). Qiwu Troupe places great value on talent development, offering career tracks such as engineer, lecturer, translator, business liaison, and team leader, supported by training courses in technical, professional, general, and leadership skills. Qiwu Troupe welcomes outstanding talent of all kinds to follow and join us with an open, talent-seeking attitude.


Origin: blog.csdn.net/qiwoo_weekly/article/details/132703140