BOF image retrieval



Image retrieval is used in many fields. The most familiar everyday example is Baidu's image search: we input a picture, and it finds many similar pictures for us. So how does an image search engine like Baidu's recognize the picture we input? And once it has recognized the picture, how does it find similar pictures to show the user? That is the image retrieval explained below.

Image retrieval comes in two main flavors: text-based image retrieval (TBIR) and content-based image retrieval (CBIR).

Text-Based Image Retrieval

Text-based image retrieval annotates images with keywords and matches queries against those annotations. Since every image must be annotated by hand, this approach is very labor-intensive.

Content-Based Image Retrieval

Content-based image retrieval eliminates manual annotation. It first needs a dataset for training: an algorithm extracts a feature vector (here, SIFT features) from each image in the dataset, and these features are stored to form a database. To search for an image, you input a query picture, extract its features, compare them against the features in the database under some matching criterion, and finally return the database images ranked by similarity, from most to least similar.
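To make the matching step concrete, here is a minimal sketch that ranks database images by the Euclidean distance between their feature histograms. It assumes each image has already been reduced to a histogram; the names query_hist and db_hists are illustrative, not from any library.

import numpy as np

# rank database images by Euclidean distance between feature histograms
def rank_by_similarity(query_hist, db_hists):
    dists = {img_id: np.linalg.norm(query_hist - h)
             for img_id, h in db_hists.items()}
    return sorted(dists, key=dists.get)  # smallest distance first

# toy 3-bin histograms (made up for illustration)
db_hists = {'I1': np.array([0.7, 0.2, 0.1]),
            'I2': np.array([0.1, 0.8, 0.1])}
print(rank_by_similarity(np.array([0.6, 0.3, 0.1]), db_hists))  # ['I1', 'I2']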

BOF (Bag of Features)

Principle (BOW and BOF)

1. BOF grows out of BOW. BOW is Bag of Words: packing identical words together. Given an article, if we ignore word order and grammar, we can group identical words and build a text descriptor from each word's frequency (number of occurrences), much like a histogram. This is BOW. For example, in the article shown in the figure below, both "I" and "you" appear twice, while the other words appear only once. (A tiny code illustration of this counting follows this list.)
[Figure: BOW word-frequency example]
2. Extending BOW to image retrieval, we can pack the image's features (SIFT features) into the bag instead of words, building a set of feature identifiers that constitutes a visual dictionary.
[Figure: BOF feature bagging]
3. The visual dictionary is built from all the feature vectors extracted from all training images. With many images, the dictionary inevitably becomes very large, and K-means clustering can be used to shrink it. An image yields many SIFT feature points, some of which are very similar; K-means groups similar feature points into one cluster and takes the cluster center as their representative. These cluster centers are the image's "(visual) words", and they stand in for the image's features. The number of words is the size of the visual dictionary.
[Figure: visual dictionary]
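As a tiny illustration of the BOW idea from point 1, counting word frequencies while ignoring order is a one-liner with Python's Counter (the sentence is a made-up example):

from collections import Counter

# ignore order and grammar; just count occurrences of each word
text = "I like you and you like me I think"
bow = Counter(text.lower().split())
print(bow)  # Counter({'i': 2, 'like': 2, 'you': 2, 'and': 1, ...})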

K-means clustering

1. The K-means step of BOF minimizes the Euclidean distance between each feature $x_i$ and its cluster center $m_k$. The objective is:

$$D = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - m_k \rVert^2$$

2. The algorithm flow is:

  • Randomly initialize K cluster centers
  • Assign each feature to its nearest cluster center (category) according to the distance
  • For each category, recompute the cluster center from the features currently assigned to it
  • Repeat the assignment and update steps until the centers stop changing

3. K-means clustering is an unsupervised learning strategy (learning without a teacher), and the clustering is the key to building the visual dictionary. Some sample visual words are shown below, followed by a small code sketch of the clustering step.
[Figure: sample visual words 1]
[Figure: sample visual words 2]
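Here is a minimal K-means sketch mirroring the three steps above, assuming the SIFT descriptors have been stacked into an (n, 128) NumPy array. It is an illustrative reimplementation only; the project code later relies on PCV's voc.train for this step.

import numpy as np

def kmeans(features, K, n_iter=20):
    # 1. randomly initialize K cluster centers from the data
    rng = np.random.default_rng(0)
    centers = features[rng.choice(len(features), K, replace=False)]
    for _ in range(n_iter):
        # 2. assign every feature to its nearest center (category)
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. recompute each center as the mean of its assigned features
        for k in range(K):
            if np.any(labels == k):
                centers[k] = features[labels == k].mean(axis=0)
    return centers, labels  # the centers are the visual words

# cluster 1000 fake 128-d descriptors into 10 visual words
centers, labels = kmeans(np.random.rand(1000, 128), K=10)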

The process of the BOF algorithm

1. Prepare a training image set and extract SIFT features from these images.

2. Run K-means clustering on the features to create the visual dictionary.

3. For an input feature set, quantize it against the visual dictionary (see the sketch after this list).

4. Input a test picture, extract its SIFT features, and convert the image into a frequency histogram of visual words.

5. Build a feature-to-image inverted index (posting list), and use it to quickly look up related images.

6. Perform histogram matching on the indexed candidates.
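A rough sketch of the quantization in steps 3 and 4: map every descriptor of an image to its nearest visual word, then build the normalized word-frequency histogram. Here centers is assumed to be the (K, 128) array of visual words from the k-means sketch earlier.

import numpy as np

def project(descriptors, centers):
    # nearest visual word for each descriptor
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    # normalized frequency histogram over the K visual words
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()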

TF-IDF weights for visual words

1. In an article, some words appear very frequently, for example "you", "to", "of". During text retrieval these high-frequency words can badly skew the results.

2. IDF (Inverse Document Frequency): the same happens in the BOF algorithm for image retrieval. Some similar features appear in many images, and these high-frequency features cause wrong images to be retrieved. The solution is to weight the extracted features: if a word appears in many images, its weight should be low; conversely, a word that appears in few images is more representative of those images, so its weight should be high. The IDF weight of word $w$ is computed as below, where $N$ is the number of images in the database and $n_w$ is the number of images containing $w$; the $+1$ prevents the denominator from being 0:

$$\mathrm{idf}_w = \log\frac{N}{n_w + 1}$$
3. TF (Term Frequency): while IDF measures how rare a visual word is across the whole image database, TF measures how often it occurs within a single image; the more frequently a visual word appears in an image, the more representative of that image it is. With $n_{w,d}$ the number of occurrences of word $w$ in image $d$ and $n_d$ the total number of words in $d$, the TF weight is:

$$\mathrm{tf}_{w,d} = \frac{n_{w,d}}{n_d}$$
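Putting both weights together, here is a hedged sketch of TF-IDF weighting for visual-word histograms; histograms (mapping image id to raw count vector) is an illustrative name, not part of PCV.

import numpy as np

def tf_idf(histograms):
    counts = np.array(list(histograms.values()), dtype=float)  # shape (N, K)
    N = len(counts)
    tf = counts / counts.sum(axis=1, keepdims=True)  # word frequency per image
    n_w = (counts > 0).sum(axis=0)                   # images containing each word
    idf = np.log(N / (n_w + 1.0))                    # +1 keeps the denominator nonzero
    return tf * idf                                  # weighted histograms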

Inverted index

1. Building an inverted index lets us look up images quickly. It is a collection of key-value pairs, similar to a map in C++. Suppose we have a database of 100 images I1~I100 and a visual dictionary of 100 visual words W1~W100, as below.
[Figure: dataset and visual dictionary]
2. Build the inverted index of the data. The index in the figure records that visual word W1 appears in images I1 and I2, and so on. Now input a picture Image0 with 5 features; looking those words up in the index shows that the input image's visual words all appear in image I2, so I2 can be returned to the user as a result.
[Figure: inverted index]
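A minimal sketch of this index as a Python dict, matching the W1-appears-in-I1-and-I2 example above (image and word ids are illustrative):

from collections import defaultdict

inverted = defaultdict(set)  # visual word -> set of images containing it

def add_image(image_id, words):
    for w in words:
        inverted[w].add(image_id)

def candidates(query_words):
    # collect every image that shares at least one word with the query
    hits = set()
    for w in query_words:
        hits |= inverted.get(w, set())
    return hits

add_image('I1', ['W1', 'W3'])
add_image('I2', ['W1', 'W2', 'W5'])
print(candidates(['W2', 'W5']))  # {'I2'}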

image retrieval code

data

The training set contains 148 images in total (more than that and memory overflows; see the problems section at the end)
[Figure: training set]

the code

1. Generate the vocabulary (visual dictionary) for the training image set. SIFT feature points are extracted from each image before the dictionary is generated.

# -*- coding: utf-8 -*-
import pickle
from PCV.imagesearch import vocabulary
from PCV.tools.imtools import get_imlist
from PCV.localdescriptors import sift

# get the list of images
# imlist = get_imlist('D:/pythonProjects/ImageRetrieval/first500/')
imlist = get_imlist('D:/pythonProjects/ImageRetrieval/animaldb/')
nbr_images = len(imlist)

# build the list of SIFT feature files (one .sift file per image)
featlist = [imlist[i][:-3]+'sift' for i in range(nbr_images)]

# extract SIFT features for every image in the folder
for i in range(nbr_images):
    sift.process_image(imlist[i], featlist[i])

# generate the vocabulary: 1000 cluster centers, subsampling rate 10
voc = vocabulary.Vocabulary('ukbenchtest')
voc.train(featlist, 1000, 10)

# save the vocabulary
with open('D:/pythonProjects/ImageRetrieval/animaldb/vocabulary.pkl', 'wb') as f:
    pickle.dump(voc, f)
print('vocabulary is:', voc.name, voc.nbr_words)

2. Add the images to the database

# -*- coding: utf-8 -*-
import pickle
from PCV.imagesearch import imagesearch
from PCV.localdescriptors import sift
from sqlite3 import dbapi2 as sqlite
from PCV.tools.imtools import get_imlist

# get the list of images
imlist = get_imlist('D:/pythonProjects/ImageRetrieval/animaldb/')
nbr_images = len(imlist)
# build the list of SIFT feature files
featlist = [imlist[i][:-3]+'sift' for i in range(nbr_images)]

# load vocabulary
with open('D:/pythonProjects/ImageRetrieval/animaldb/vocabulary.pkl', 'rb') as f:
    voc = pickle.load(f)

# create the indexer
indx = imagesearch.Indexer('testImaAdd.db', voc)
indx.create_tables()

# go through all images, project features onto the vocabulary and insert
for i in range(nbr_images)[:500]:
    locs, descr = sift.read_features_from_file(featlist[i])
    indx.add_to_index(imlist[i], descr)

# commit to database
indx.db_commit()

con = sqlite.connect('testImaAdd.db')
print(con.execute('select count (filename) from imlist').fetchone())
print(con.execute('select * from imlist').fetchone())

3. Image retrieval test

# -*- coding: utf-8 -*-
# representing images by visual words discards the position of the features
import pickle
from PCV.localdescriptors import sift
from PCV.imagesearch import imagesearch
from PCV.geometry import homography
from PCV.tools.imtools import get_imlist

# load image list
#imlist = get_imlist('E:/Python37_course/test7/first1000/')
imlist = get_imlist('D:/pythonProjects/ImageRetrieval/animaldb/')
nbr_images = len(imlist)
# load feature list
featlist = [imlist[i][:-3]+'sift' for i in range(nbr_images)]

# load vocabulary
with open('D:/pythonProjects/ImageRetrieval/animaldb/vocabulary.pkl', 'rb') as f:
    voc = pickle.load(f)

# the Searcher reads the images' word histograms and runs the query
src = imagesearch.Searcher('testImaAdd.db', voc)

# index of query image and number of results to return
q_ind = 0          # index of the query image
nbr_results = 148  # size of the dataset

# regular query (sort results by Euclidean distance)
res_reg = [w[1] for w in src.query(imlist[q_ind])[:nbr_results]]
print('top matches (regular):', res_reg)

# load image features for the query image
q_locs, q_descr = sift.read_features_from_file(featlist[q_ind])
fp = homography.make_homog(q_locs[:, :2].T)

# RANSAC model for homography fitting
model = homography.RansacModel()
rank = {}

# load image features for each result
for ndx in res_reg[1:]:
    try:
        # because 'ndx' is a rowid of the DB that starts at 1
        locs, descr = sift.read_features_from_file(featlist[ndx])
    except:
        continue
    # get matches
    matches = sift.match(q_descr, descr)
    ind = matches.nonzero()[0]
    ind2 = matches[ind]
    tp = homography.make_homog(locs[:, :2].T)
    # compute the homography and count inliers; if there are not enough
    # matches, H_from_ransac raises and we fall back to an empty list
    try:
        # note: 'match_theshold' is the parameter name as spelled in PCV
        H, inliers = homography.H_from_ransac(fp[:, ind], tp[:, ind2], model, match_theshold=4)
    except:
        inliers = []
    # store inlier count
    rank[ndx] = len(inliers)

# sort the dictionary so the results with the most inliers come first
sorted_rank = sorted(rank.items(), key=lambda t: t[1], reverse=True)
res_geom = [res_reg[0]] + [s[0] for s in sorted_rank]
print('top matches (homography):', res_geom)

# show the query results
imagesearch.plot_results(src, res_reg[:6])   # regular query
imagesearch.plot_results(src, res_geom[:6])  # rearranged by homography

Running results

The number of K-means cluster categories is changed for each run, and the results are observed:

# the second argument is the number of k-means cluster categories
voc.train(featlist, 1000, 10)

[Figure: input query image]

Results obtained with different number of cluster categories

1. K=100
[Figure: regular query results]
[Figure: rearranged results]

2. K=500
[Figure: regular query results]
[Figure: rearranged results]

3. K=1000
[Figure: regular query results]
[Figure: rearranged results]

Result analysis

The runs above show that as the number of cluster categories increases, the retrieval results improve and become more reasonable, but past a certain point the quality starts to drop again. The number of cluster categories therefore has a large effect on retrieval quality. In my runs, K=500 gave relatively good results; a somewhat larger K might do better still, but by K=1000 retrieval quality had already begun to decline. In general, the best K is found by exhaustive search.
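A sketch of that exhaustive search, reusing the training calls from the vocabulary script above (quality at each K is then judged by inspecting the plotted results, as in these experiments):

# train one vocabulary per candidate K and compare retrieval quality
for k in [100, 500, 1000]:
    voc = vocabulary.Vocabulary('ukbenchtest_k%d' % k)
    voc.train(featlist, k, 10)  # k cluster categories, same subsampling as before
    with open('vocabulary_k%d.pkl' % k, 'wb') as f:
        pickle.dump(voc, f)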

Take the K=1000 run as an example: the input is an image of a puppy, searched against the database. The results include images of cats, dogs, and sheep, not only puppies. Looking at the input puppy image, the overall tone is orange-yellow, and the corner points mostly fall on the dog's nose, mouth, eyes, feet, and ears, so that is where the extracted SIFT feature points concentrate.

Among the retrieved images, the second puppy is darker, with clear corners at the nose and ears and a similar color, so the SIFT features there resemble those at the nose and ears of the input image, and it is considered a match.

In the third photo, the kitten's body is mostly black, but its ears and eyes form obvious corners and the colors are similar, so the kitten is returned as a match for the puppy.

The fourth image, a goat, is matched because its feature points at the eyes, ears, nose, and mouth are similar to the feature points at the corresponding positions in the input image, and all of these regions are black. Although the goat is mostly white, those white regions happen not to be detected as corners, so they are ignored.

The kitten in the fifth picture is in a similar situation to the kitten in the third: the corners at its eyes and ears are black and orange, close to the puppy in the input image.

The puppy in the last picture likewise resembles the input puppy in the SIFT features of its eyes, mouth, and nose.

Problems encountered during the experiment

1. When extracting SIFT features, the images at first failed to load from the folder with the get_imlist method. A careful look at the method shows that it only reads files with the .jpg extension, so the images must be in .jpg format.
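In other words, get_imlist behaves roughly like the filter below (paraphrased, not the library's exact source), so files in other formats are silently skipped:

import os

def get_imlist(path):
    """Return filenames of all .jpg images in a directory."""
    return [os.path.join(path, f) for f in os.listdir(path)
            if f.endswith('.jpg')]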

2. Reading the images, extracting features, and generating the vocabulary raised an error: there were too many training images (originally 1000), so reading them exceeded the available memory (the failed allocation below is 347792 × 128 float64 values, roughly 347792 × 128 × 8 bytes ≈ 356 MB for a single array). There are two solutions: increase the available memory, or reduce the number of images. I chose the second and cut the set down to 148 images.

numpy.core._exceptions.MemoryError: Unable to allocate array with shape (347792, 128) and data type float64

3. When testing input-image retrieval, an error was reported. Wrapping the offending line in a try/except statement made it run normally:

File "D:/pythonProjects/ImageRetrieval/rearrangement_3.py", line 46, in <module>
    locs,descr = sift.read_features_from_file(featlist[ndx]) # because 'ndx' is a rowid of the DB that starts at 1
IndexError: list index out of range

# add a try/except statement at the failing line
    try:
        locs,descr = sift.read_features_from_file(featlist[ndx])  # because 'ndx' is a rowid of the DB that starts at 1
    except:
        continue

Origin blog.csdn.net/weixin_44500303/article/details/117594154