Computing the Semantic Similarity Between Images and Text with Python

This sits at the intersection of images and natural language. Whether you are comparing image with image, image with text, or text with text, the similarity computation ultimately comes down to comparing feature vectors.

Computing image–text similarity is, in effect, evaluating how accurately a piece of text describes an image. This is very useful in visual-understanding areas such as Image Caption, Video Caption, and VQA.

Code used in this article: https://github.com/hila-chefer/Transformer-MM-Explainability/tree/main/CLIP

Official site: https://openai.com/blog/clip/

As the official algorithm diagram shows, computing image–text similarity means mapping both the image and the text into a shared feature space, then using metric learning to pull semantically matching image features and text features closer together.
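To make the idea concrete, here is a minimal sketch (plain NumPy, with made-up toy vectors rather than real CLIP embeddings) of how "distance" in that shared feature space reduces to the cosine of the angle between L2-normalized vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    a = a / np.linalg.norm(a)   # L2-normalize each vector
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))  # dot product of unit vectors = cos(angle)

# Toy "image" and "text" features (real CLIP features are 512-dimensional)
image_feat = np.array([0.8, 0.1, 0.6])
text_feat_match = np.array([0.7, 0.2, 0.5])   # semantically close caption
text_feat_other = np.array([-0.5, 0.9, 0.1])  # semantically distant caption

print(cosine_similarity(image_feat, text_feat_match))  # close to 1
print(cosine_similarity(image_feat, text_feat_other))  # much lower
```

Training pushes matching image/text pairs toward a cosine of 1 and mismatched pairs lower, which is why a simple dot product of normalized features works as a similarity score at inference time.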

(OK, that's already over two minutes... let's skip the theory and just run the code; as long as it works, that's good enough.)

Experiment environment:

cuda 10.0.130

torch 1.7.1

torchvision 0.8.1

ftfy、regex、tqdm

Project setup: create an empty PyCharm project and put all the files in the project directory. Download the pretrained model from https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt and place it in the project directory as well.

Running the code: create a main.py file and copy the code below into it. Barring surprises, it should run without errors.

import torch
import clip
from PIL import Image

# Use the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the downloaded ViT-B/32 checkpoint together with its image preprocessing pipeline
model, preprocess = clip.load("/home/cbl/caoyi/wq/ViT-B-32", device=device)

# Preprocess one image and tokenize the candidate captions to compare against it
image = preprocess(Image.open("/home/cbl/caoyi/wq/1.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a dog is running", "a people is jump", "some people are running"]).to(device)

# Uncomment one of the pairs below to test the other images
#image = preprocess(Image.open("/home/cbl/caoyi/wq/2.jpeg")).unsqueeze(0).to(device)
#text = clip.tokenize(["a man is singing", "a man is eating", "a woman is eating"]).to(device)

#image = preprocess(Image.open("/home/cbl/caoyi/wq/3.jpeg")).unsqueeze(0).to(device)
#text = clip.tokenize(["a dog is runnning", "a man is running", "a man is walking the dogs"]).to(device)

with torch.no_grad():
    # Encode both modalities into the shared feature space
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # The forward pass returns scaled cosine similarities (logits) for each image-caption pair
    logits_per_image, logits_per_text = model(image, text)
    # Softmax over the candidate captions turns the logits into probabilities
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)
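For reference, logits_per_image can be reproduced by hand from the two feature tensors: CLIP L2-normalizes them, takes their dot products, and multiplies by the learned temperature model.logit_scale.exp() (roughly 100 in the released checkpoints). A minimal NumPy sketch with dummy feature vectors (not real CLIP outputs, and logit_scale=100 is an assumption here, not read from a model):

```python
import numpy as np

def clip_probs(image_features, text_features, logit_scale=100.0):
    """Reproduce CLIP's image->text probabilities from raw feature arrays.

    logit_scale stands in for model.logit_scale.exp() (assumed ~100 here).
    """
    # L2-normalize every feature vector
    img = image_features / np.linalg.norm(image_features, axis=-1, keepdims=True)
    txt = text_features / np.linalg.norm(text_features, axis=-1, keepdims=True)
    # Scaled cosine similarities: one logit per (image, caption) pair
    logits = logit_scale * img @ txt.T
    # Numerically stable softmax over the captions
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Dummy 4-dim features: one image, three candidate captions
image_features = np.array([[0.2, 0.9, 0.1, 0.4]])
text_features = np.array([
    [0.1, 0.8, 0.2, 0.5],   # close to the image feature
    [0.9, 0.1, 0.3, 0.0],   # far from it
    [0.0, 0.5, 0.9, 0.2],
])
print(clip_probs(image_features, text_features))  # each row sums to 1
```

Because the logits are cosine similarities scaled by ~100, the softmax is sharply peaked, which is why the probabilities in the results below concentrate on one caption.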

Test results (the "similarity" values are softmax probabilities over each set of three captions):

For 1.jpg:

"a dog is running": similarity 0.1931

"a people is jump": similarity 0.29

"some people are running": similarity 0.517

For 2.jpeg:

"a man is singing": similarity 0.003326

"a man is eating": similarity 0.9517

"a woman is eating": similarity 0.045

For 3.jpeg:

"a dog is runnning": similarity 0.11115

"a man is running": similarity 0.001089

"a man is walking the dogs": similarity 0.8877

Reposted from blog.csdn.net/XLcaoyi/article/details/119413324