Use the Volcano Engine Cloud Search service ESCloud to build an image-text retrieval application (text-to-image and image-to-image search)

Image-text retrieval has a wide range of applications in daily life. The two most common forms are text-based and image-based search: by entering a text description or uploading a picture, users can quickly find identical or similar images in a massive image library. This kind of search is widely used in fields such as e-commerce, advertising, design, and search engines.

Based on the Volcano Engine Cloud Search service ESCloud and the image-text feature extraction model CLIP, this article quickly builds an end-to-end solution for both image-to-image and text-to-image search.

How it works

Image search takes text descriptions or images as queries. Features are extracted from images and text separately, and the model establishes the correlation between text and images, placing related text and images close together in a shared embedding space. Retrieval is then a feature-vector search over a massive image library that returns the most relevant records. Here, feature extraction is handled by the CLIP model, and vector retrieval is handled by the Volcano Engine Cloud Search service, which can search quickly across massive collections of image features.
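To make the idea concrete, here is a minimal sketch (not part of the solution itself) that encodes one image and one sentence with CLIP via sentence-transformers and compares them with cosine similarity; the image path and the sentence are placeholder assumptions:

from sentence_transformers import SentenceTransformer, util
from PIL import Image

# CLIP maps images and text into the same embedding space
model = SentenceTransformer('clip-ViT-B-32')

img_emb = model.encode(Image.open('example.jpg'))      # placeholder image path
txt_emb = model.encode('a dog running on the beach')   # placeholder description

# cosine similarity is high when the text describes the image
print(util.cos_sim(img_emb, txt_emb))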

 

Environment and dependency preparation

1. Log in to the Volcano Engine cloud search service, create an instance cluster, and select 7.10 for the cluster version.

2. Prepare the key Python client dependencies:

pip install -U sentence-transformers     # model-related
pip install -U elasticsearch7==7.10.1    # ES vector database client
pip install -U pandas                    # for parsing the Unsplash dataset files

Dataset preparation

We use Unsplash as the image dataset; for details see https://unsplash.com/data. In this example we download the Lite dataset, which contains approximately 25,000 photos. When the download completes, you get a zip archive containing several tab-separated files that describe the photos. By reading these files with Pandas, we obtain the URL of each image.

import glob
import pandas as pd

def read_imgset():
    path = '${path to the downloaded dataset}'
    documents = ['photos', 'keywords', 'collections', 'conversions', 'colors']
    datasets = {}

    for doc in documents:
        files = glob.glob(path + doc + ".tsv*")

        subsets = []
        for filename in files:
            # parse the TSV file with pandas
            df = pd.read_csv(filename, sep='\t', header=0)
            subsets.append(df)

        datasets[doc] = pd.concat(subsets, axis=0, ignore_index=True)

    return datasets
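As a quick sanity check (a small usage sketch, not from the original script), you can load the dataset and look at the columns that are used later for indexing:

datasets = read_imgset()

# photo_id and photo_image_url are the columns used below when building the index
print(datasets['photos'][['photo_id', 'photo_image_url']].head())
print("photos:", len(datasets['photos']), "keyword rows:", len(datasets['keywords']))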

Model selection

For both image-to-image and text-to-image search, this article uses the clip-ViT-B-32 model. It is based on CLIP, the model from OpenAI's 2021 paper, which links images and text by embedding both in a shared vector space, so a single model can represent images as well as text. For text queries, the matching multilingual text encoder clip-ViT-B-32-multilingual-v1 is used.
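For reference, a minimal sketch of loading the two encoders with sentence-transformers; the example strings are placeholders:

from sentence_transformers import SentenceTransformer

# image encoder (original CLIP) and multilingual text encoder that share CLIP's embedding space
img_model = SentenceTransformer('clip-ViT-B-32')
text_model = SentenceTransformer('clip-ViT-B-32-multilingual-v1')

# both produce 512-dimensional vectors, matching the knn_vector dimension in the index mapping below
print(img_model.encode('a photo of a sunset').shape)   # clip-ViT-B-32 also encodes text (and PIL images)
print(text_model.encode('日落 (sunset)').shape)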

ESCloud Mapping preparation

PUT image_search
{
  "mappings": {
    "dynamic": "false",
    "properties": {
      "photo_id":        { "type": "keyword" },
      "photo_url":       { "type": "keyword" },
      "describe":        { "type": "text" },
      "photo_embedding": { "type": "knn_vector", "dimension": 512 }
    }
  },
  "settings": {
    "index": {
      "refresh_interval": "60s",
      "number_of_shards": "3",
      "number_of_replicas": "1",
      "knn": "true",
      "knn.space_type": "cosinesimil"
    }
  }
}
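If you prefer to create the index from Python rather than the console, a sketch using the same mapping through the elasticsearch7 client (cloudSearch is the connection defined in the next section) could look like this:

index_body = {
    "mappings": {
        "dynamic": "false",
        "properties": {
            "photo_id": {"type": "keyword"},
            "photo_url": {"type": "keyword"},
            "describe": {"type": "text"},
            "photo_embedding": {"type": "knn_vector", "dimension": 512},
        },
    },
    "settings": {
        "index": {
            "refresh_interval": "60s",
            "number_of_shards": "3",
            "number_of_replicas": "1",
            "knn": "true",
            "knn.space_type": "cosinesimil",
        },
    },
}

# create the image_search index with the mapping above
cloudSearch.indices.create(index="image_search", body=index_body)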

ESCloud database operations

Connecting

Log in to the Volcano Engine Cloud Search service, select the newly created instance, and copy its public network access address (if public access is disabled, you can enable it first):

# connect to the cloud search instance
cloudSearch = CloudSearch("https://{user}:{password}@{ES_URL}",
                          verify_certs=False, ssl_show_warn=False)
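A quick way to verify the connection (a small check, assuming the credentials above have been filled in):

# ping() returns True when the instance is reachable and the credentials are valid
print(cloudSearch.ping())
print(cloudSearch.info())   # cluster name and version details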

Writing data

from sentence_transformers import SentenceTransformer
from elasticsearch7 import Elasticsearch as CloudSearch
from PIL import Image
import requests
import pandas as pd
import glob
from os.path import join

# We use the original clip-ViT-B-32 for encoding images
img_model = SentenceTransformer('clip-ViT-B-32')
text_model = SentenceTransformer('clip-ViT-B-32-multilingual-v1')

# Construct a document for ES
def encodedataset(photo_id, photo_url, describe, image):
    encoded_sents = {
        "photo_id": photo_id,
        "photo_url": photo_url,
        "describe": describe,
        # convert the numpy vector to a plain list so it can be JSON-serialized
        "photo_embedding": img_model.encode(image).tolist(),
    }
    return encoded_sents

# download images
def load_image(url_or_path):
    if url_or_path.startswith("http://") or url_or_path.startswith("https://"):
        return Image.open(requests.get(url_or_path, stream=True).raw)
    else:
        return Image.open(url_or_path)

# Extract the image URLs from the Unsplash TSV files and download the images;
# then generate embeddings with the model and build bulk requests to write them into ES
def get_imgset_and_bulk():
    datasets = read_imgset()
    datasets['photos'].head()
    keywords = datasets['keywords']
    docs = []
    # iterate over the photos table and download each photo by its URL
    for idx, row in datasets['photos'].iterrows():
        print("Process id: ", idx)
        # get the image URL from the TSV row
        photo_url = row["photo_image_url"]
        photo_id = row["photo_id"]
        image = load_image(photo_url)
        # build the image description from the user-suggested keywords for this photo_id
        filter = keywords.loc[(keywords['photo_id'] == photo_id) & (keywords['suggested_by_user'] == 't')]
        text = ' '.join(set(filter['keyword']))
        # wrap the document as part of an ES bulk request
        one_document = encodedataset(photo_id=photo_id, photo_url=photo_url, describe=text, image=image)
        docs.append({"index": {}})
        docs.append(one_document)
        if idx % 20 == 0:
            # write in batches of 20
            resp = cloudSearch.bulk(docs, index='image_search')
            print(resp)
            docs = []
    # flush any remaining documents that did not fill a complete batch
    if docs:
        resp = cloudSearch.bulk(docs, index='image_search')
        print(resp)
        docs = []
    return docs

if __name__ == '__main__':
    docs = get_imgset_and_bulk()
    print(docs)
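Once the job has run (the index refresh_interval is 60s, so allow up to a minute), a simple way to verify the data landed in the index is to count the documents and look at one of them; this is a small check, not part of the original script:

# number of indexed photos; should approach ~25,000 for the Lite dataset
print(cloudSearch.count(index='image_search'))

# peek at one stored document (without the 512-dim vector) to check the fields
print(cloudSearch.search(index='image_search',
                         body={"size": 1, "_source": ["photo_id", "photo_url", "describe"]}))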

Querying

Text-to-image search: vectorize the query text, then run a knn query.

# text-to-image search
def extract_text(text):
    res = cloudSearch.search(
        body={
            "size": 5,
            "query": {"knn": {"photo_embedding": {"vector": text_model.encode(text).tolist(), "k": 5}}},
            "_source": ["describe", "photo_url"],
        },
        index="image_search",
    )
    return res

# FeatureExtractor is the demo web app's helper class (definition omitted) that wraps the search helpers shown here
fe = FeatureExtractor()

@app.route('/', methods=['GET', 'POST'])
def index():
    # ...
    resp = fe.extract_text(text)
    return render_template('index.html',
                           query_text=text,
                           scores=resp['hits']['hits'])
    # ...

Search for "sunset" and print the results:
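Calling the helper directly (a minimal usage sketch outside the web app; the field names come from the mapping above):

resp = extract_text('sunset')
for hit in resp['hits']['hits']:
    print(hit['_score'], hit['_source']['photo_url'], hit['_source']['describe'])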

 

Image-to-image search: vectorize the query image, then run a knn query.

# image-to-image search
def extract(img):
    res = cloudSearch.search(
        body={
            "size": 5,
            "query": {"knn": {"photo_embedding": {"vector": img_model.encode(img).tolist(), "k": 5}}},
            "_source": ["describe", "photo_url"],
        },
        index="image_search",
    )
    return res

fe = FeatureExtractor()

@app.route('/', methods=['GET', 'POST'])
def index():
    # ...
    # save the query image
    img = Image.open(file.stream)  # PIL image
    uploaded_img_path = "static/uploaded/" + datetime.now().isoformat().replace(":", ".") + "_" + file.filename
    img.save(uploaded_img_path)

    # run the search
    resp = fe.extract(img)
    return render_template('index.html',
                           query_path=uploaded_img_path,
                           scores=resp['hits']['hits'])
    # ...

Search with a picture of a seal and print the results:
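Again, a quick command-line check (a sketch; seal.jpg is a placeholder path to a local query image):

from PIL import Image

resp = extract(Image.open('seal.jpg'))   # placeholder local image path
for hit in resp['hits']['hits']:
    print(hit['_score'], hit['_source']['photo_url'], hit['_source']['describe'])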

 


The Volcano Engine Cloud Search service ESCloud is compatible with Elasticsearch, Kibana, and other common open-source software and plug-ins. It provides multi-condition retrieval, statistics, and reporting over structured and unstructured text, supports one-click deployment, elastic scaling, and simplified operation and maintenance, and helps you quickly build business capabilities such as log analysis and information retrieval.
