How to use Elasticsearch to implement "picture search"

1. What is "picture search"?

"Picture search" refers to a method of image search. Users can upload a picture, and the search engine will return similar or related picture results. This search method does not require the user to input text, but finds similar or related pictures by comparing the visual information of the pictures. This technology is useful in many different applications, such as finding identical or similar images, finding the source of an image, or identifying objects in an image, among others.

The technical basis of image search mainly includes image processing and machine learning. Through image processing, the features of images (such as color, shape, texture, etc.) can be extracted, and then these features can be compared through machine learning models to find similar pictures. In recent years, deep learning has also played an important role in image search, making search results more accurate and efficient.

For example: Google "search by image", Baidu image recognition.

2. Why search by image? Isn't traditional search good enough?

Both image search and traditional text search have their own advantages and applications. Here are some reasons to use image search:

  • Find similar images

If you have a picture and want to find similar pictures, or other versions of it (for example, at a different resolution or without a watermark), image search is the most straightforward method.

  • Find the source of an image

If you find an image you like but don't know where it came from, image search can help you find its original source, such as which website it came from or who took it.

  • Identify what's in an image

Image search can also help you identify objects or people in pictures. For example, if you have a picture containing an unknown object, you can use image search to find out what it is.

  • Beyond language and cultural barriers

Sometimes, you may not be able to describe exactly what you are searching for in words, or you may not know its proper name. In this case, image search can help you find the information you need, regardless of language and cultural differences.

For example: while playing outside in our neighborhood with the kids, we came across a bug. The children gathered around, and a curious child asked, "What is this bug called?" The parents didn't know either; it looked a bit like the bean bugs I had seen as a child, but not exactly the same. In the end, we got the answer with the help of Baidu's image recognition.

Overall, image search is a very useful tool that complements and enhances traditional text search. However, it is not a panacea, and sometimes it needs to be used together with text search to get the best search results.

3. How does Elasticsearch 8.X implement image search?

From a macro point of view, much like the famous "how to put an elephant in the refrigerator" steps, Elasticsearch 8.X needs two core steps to implement image search:

Step 1: Feature Extraction

Use image processing and machine learning methods, such as convolutional neural networks, to extract features from images. These features are usually encoded as a vector, which can then be used to measure the similarity between images. Several open-source libraries can be used for image feature extraction; some examples are listed below:

  • OpenCV (C++, Python, Java): provides a variety of feature extraction algorithms such as SIFT, SURF, and ORB, as well as a range of general image-processing functions.

  • TensorFlow (Python): provides pre-trained deep neural network models such as ResNet, VGG, and Inception for extracting image features.

  • PyTorch (Python): provides pre-trained deep neural network models such as ResNet, VGG, and Inception for extracting image features.

  • VLFeat (C, MATLAB): provides a variety of feature extraction algorithms such as SIFT, HOG, and LBP.

These libraries provide a large number of tools and functions for image feature extraction, which can help developers quickly implement image feature extraction. It should be noted that different feature extraction methods may be suitable for different tasks, and which method to choose depends on the specific application requirements.
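
To make Step 1 concrete, below is a minimal sketch (not part of the original article's pipeline) showing how a feature vector could be extracted with a pre-trained ResNet-50 from torchvision, one of the PyTorch options in the list above. The image path is a placeholder.

# Sketch: extract a 2048-dimensional feature vector with a pre-trained ResNet-50.
# Assumes torchvision >= 0.13 (older versions use pretrained=True instead of weights=...).
import torch
from torchvision import models, transforms
from PIL import Image

# Load ResNet-50 and drop the final classification layer so the network
# outputs a feature vector instead of class scores.
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
feature_extractor = torch.nn.Sequential(*list(resnet.children())[:-1])
feature_extractor.eval()

# Standard ImageNet preprocessing.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_features(path: str) -> torch.Tensor:
    """Return a 2048-dim feature vector for the image at `path`."""
    img = Image.open(path).convert("RGB")
    batch = preprocess(img).unsqueeze(0)      # shape: (1, 3, 224, 224)
    with torch.no_grad():
        features = feature_extractor(batch)   # shape: (1, 2048, 1, 1)
    return features.squeeze()                 # shape: (2048,)

vector = extract_features("example.jpg")      # placeholder file name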

Step 2: Indexing and Searching

Store the extracted feature vectors in Elasticsearch, and then use Elasticsearch's search capabilities to find similar images. Elasticsearch's dense_vector field type can be used to store the vectors, and kNN search or script_score queries can be used to calculate similarity.
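
As a sketch of Step 2 with the official Python client for Elasticsearch 8.x: create an index whose image_embedding field is a dense_vector, store a document, and rank documents with a script_score query based on cosine similarity. The index and field names follow the search example later in this article; the 512-dimension size, connection details, and sample values are illustrative assumptions.

# Sketch: store embeddings in a dense_vector field and query them with script_score.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# 1) Index mapping with a dense_vector field sized to the embedding model.
es.indices.create(
    index="my-image-embeddings",
    mappings={
        "properties": {
            "image_id":        {"type": "keyword"},
            "image_name":      {"type": "keyword"},
            "relative_path":   {"type": "keyword"},
            "image_embedding": {
                "type": "dense_vector",
                "dims": 512,             # must match the feature vector length
                "index": True,
                "similarity": "cosine",  # also enables kNN search on this field
            },
        }
    },
)

# Placeholder vector; in practice use the embedding produced in Step 1.
embedding = [0.0] * 512

# 2) Store one image document together with its embedding.
es.index(
    index="my-image-embeddings",
    document={
        "image_id": "img-0001",
        "image_name": "example.jpg",
        "relative_path": "images/example.jpg",
        "image_embedding": embedding,
    },
)

# 3) Exact similarity ranking with a script_score query (kNN search is shown in 4.4).
resp = es.search(
    index="my-image-embeddings",
    query={
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "cosineSimilarity(params.query_vector, 'image_embedding') + 1.0",
                "params": {"query_vector": embedding},
            },
        }
    },
)
print(resp["hits"]["hits"])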

4. Elasticsearch 8.X "picture search" in practice

4.1 Architecture overview

  • Data layer: Image data is scattered across the Internet and needs to be collected and consolidated.

  • Collection layer: Use crawlers or existing tools to collect data and store it locally.

  • Storage layer: With the help of a vector-conversion tool or model, traverse the images, convert each one into a vector, and store the vectors in Elasticsearch (a sketch follows this list).

  • Business layer: Once the images are stored as vectors, image search is implemented with kNN retrieval.
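
A rough sketch of the storage layer described above, assuming the images have already been collected into a local folder. The model name (clip-ViT-B-32, the image encoder paired with the multilingual text model in 4.2), index name, and field names follow the rest of this article, but the helper itself is illustrative rather than the article's exact code.

# Sketch: traverse local images, convert each one into a vector, and bulk-index into Elasticsearch.
from pathlib import Path

from PIL import Image
from sentence_transformers import SentenceTransformer
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

img_model = SentenceTransformer("clip-ViT-B-32")   # CLIP image encoder
es = Elasticsearch("http://localhost:9200")        # assumed local cluster

def image_actions(image_dir: str):
    """Yield one bulk action per image file under image_dir."""
    for i, path in enumerate(sorted(Path(image_dir).glob("*.jpg"))):
        embedding = img_model.encode(Image.open(path))
        yield {
            "_index": "my-image-embeddings",
            "image_id": f"img-{i:04d}",
            "image_name": path.name,
            "relative_path": str(path),
            "image_embedding": embedding.tolist(),
        }

bulk(es, image_actions("./images"))  # placeholder directory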

4.2 Tool selection: clip-ViT-B-32-multilingual-v1

sentence-transformers/clip-ViT-B-32-multilingual-v1 is a multilingual version of OpenAI's CLIP-ViT-B32 model.


The model can map text (in more than 50 languages) and images into a common dense vector space, so that an image and its matching text end up close together. It can be used for image search (searching through a large collection of images) and for multilingual image classification (where image labels are defined as text).

Model address: https://huggingface.co/sentence-transformers/clip-ViT-B-32-multilingual-v1

4.3 Generating vectors

The following call generates a vector from an image in an existing dataset:

model.encode(image)

The generated embedding is simply a list of floating-point numbers (a dense vector), in the same form as the query_vector shown in the search request in section 4.4.

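
Expanding model.encode(image) into a runnable snippet, here is a small sketch assuming the CLIP image encoder clip-ViT-B-32 (the image-side counterpart of the multilingual model in 4.2) and a placeholder file name:

from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")   # CLIP image encoder

image = Image.open("query.jpg")                # placeholder file name
vector = model.encode(image)

print(vector.shape)   # (512,) for this model
print(vector[:4])     # first few floats of the embedding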

4.4 Executing a search

POST my-image-embeddings/_search
{
  "knn"           : {
  "field"         : "image_embedding",
  "k"             : 5,
  "num_candidates": 10,
  "query_vector"  : [
      -0.7245588302612305,
      0.018258392810821533,
      -0.14531010389328003,
      -0.08420199155807495,
      ...(remaining values omitted)...
    ]
  },
  "fields": [
    "image_id",
    "image_name",
    "relative_path"
  ]
}

The above search request uses Elasticsearch's native kNN (k-nearest-neighbor) search to find the images whose embeddings are closest to query_vector.

The specific parameter meanings are as follows:

  • knn: indicates that a k-nearest-neighbor search will be performed.

  • field: the field on which to run the kNN search. Here the image_embedding field holds each image's embedding vector.

  • k: the number of nearest neighbors to return; k: 5 means the 5 closest images are returned.

  • num_candidates: controls the trade-off between search accuracy and performance. In a large index, finding the exact k nearest neighbors can be slow, so Elasticsearch first gathers num_candidates candidates and then picks the k nearest among them. Here, num_candidates: 10 means 10 candidates are collected first, and the 5 nearest neighbors are chosen from them.

  • query_vector: the query vector to compare against. Elasticsearch computes the similarity between this vector and each indexed vector and returns the k closest ones. Here, query_vector is a long list of floats representing the query image's embedding.

  • fields: the fields to return. In this example the results contain only the image_id, image_name, and relative_path fields; if fields is not specified, the results contain the full document source.
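
To tie the pieces together, below is a hedged sketch of issuing the same kNN search from Python. The query vector can come from an image encoded with clip-ViT-B-32 or, thanks to the multilingual model described in 4.2, from a text phrase encoded with clip-ViT-B-32-multilingual-v1. Index and field names follow the request above; file names and the example text are placeholders.

# Sketch: kNN search with an image query or a multilingual text query.
from PIL import Image
from sentence_transformers import SentenceTransformer
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Both encoders map into the same dense vector space.
img_model = SentenceTransformer("clip-ViT-B-32")
txt_model = SentenceTransformer("sentence-transformers/clip-ViT-B-32-multilingual-v1")

def knn_search(query_vector, k=5, num_candidates=10):
    """Return the k images whose embeddings are closest to query_vector."""
    resp = es.search(
        index="my-image-embeddings",
        knn={
            "field": "image_embedding",
            "k": k,
            "num_candidates": num_candidates,
            "query_vector": query_vector,
        },
        fields=["image_id", "image_name", "relative_path"],
        source=False,
    )
    return resp["hits"]["hits"]

# Search with an example image ...
hits = knn_search(img_model.encode(Image.open("query.jpg")).tolist())

# ... or with a text description in any of the supported languages.
hits = knn_search(txt_model.encode("a dog playing in the snow").tolist())

for hit in hits:
    print(hit["_score"], hit["fields"]["image_name"][0])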

4.5 Image Search Result Display

(Demo: animated GIF and screenshot of the returned similar images.)

5. Summary

To sum up, the implementation of the image search function focuses on two key components: Elasticsearch and the pre-trained model sentence-transformers/clip-ViT-B-32-multilingual-v1.

Elasticsearch, a Lucene-based search server, provides a distributed, multi-user full-text search platform with a RESTful web interface. The pre-trained model sentence-transformers/clip-ViT-B-32-multilingual-v1, based on OpenAI's CLIP model, generates vector representations of both text and images, which is essential for comparing their similarity.

In the concrete implementation, the features of each image are extracted with the pre-trained model, and the resulting vector can be regarded as a mathematical representation of the image. These vectors are stored in Elasticsearch, which provides an efficient nearest-neighbor search mechanism for the image search function. When a new image is uploaded for search, the same pre-trained model extracts its features to obtain a vector, which is then compared with the image vectors stored in Elasticsearch to find the most similar images.

The whole process reflects the important role of the pre-trained model in image feature extraction and the power of Elasticsearch in efficient nearest-neighbor search; the combination of the two provides reliable technical support for implementing the image search function.

References

  • https://huggingface.co/sentence-transformers/clip-ViT-B-32-multilingual-v1

  • https://github.com/rkouye/es-clip-image-search

  • https://github.com/radoondas/flask-elastic-image-search

  • https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html

  • https://unsplash.com/data

