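1. Introduction

Faiss is a library for efficient similarity search over dense vectors, widely used for ANN (Approximate Nearest Neighbor) search. It can be used to index features and to retrieve them.

It runs on the CPU and can also use GPUs to speed up retrieval.

For details, see: https://github.com/facebookresearch/faiss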

2. Installation

CPU version, works on all major operating systems:

pip install faiss-cpu

CPU + GPU version, currently not available on Windows:

pip install faiss-gpu

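3. Examples

Creating an index

import faiss

# Feature dimension
dim = 2048

# IndexFlatIP compares features by inner product (larger means more similar)
# Features are usually L2-normalized first, so that the inner product equals cosine similarity
index_ip = faiss.IndexFlatIP(dim)

# IndexFlatL2 compares features by L2 distance (smaller means more similar)
index_l2 = faiss.IndexFlatL2(dim)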

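Adding features to the index

import numpy as np

# Create one feature of dimension 2048 with shape (1, 2048); faiss expects float32 arrays
feature = np.random.random((1, 2048)).astype('float32')
index_ip.add(feature)

# Several features can also be added in a single call
features = np.random.random((10, 2048)).astype('float32')
index_ip.add(features)

# Print the number of features stored in index_ip
print(index_ip.ntotal)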

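Specifying your own id for each feature

In the step above, where features are added, each feature's id is simply its position in insertion order. To assign ids yourself, wrap the index in IndexIDMap, as shown below:
```
index_ids = faiss.IndexFlatIP(2048)
index_ids = faiss.IndexIDMap(index_ids)

# Add a feature with an explicit id; note that ids must be of type int64
ids = 20
feature_ids = np.random.random((1, 2048)).astype('float32')
index_ids.add_with_ids(feature_ids, np.array([ids], dtype='int64'))
```
Note that the index being wrapped must be empty: wrap it right after creating it. You cannot add some features first and wrap the index afterwards.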

Searching

feature_search = np.random.random((1, 2048)).astype('float32')

# Retrieve the topK most similar features
topK = 5
D, I = index_ip.search(feature_search, topK)
# D holds the similarities (or distances); I holds the ids (indices) of the topK features

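Saving / loading index files

Another advantage of faiss is that an index holding features can be persisted to a file, much like a database, so the features do not have to be extracted again every time.
```
# Save the index
faiss.write_index(index_ip, 'my.index')

# Load the index
index = faiss.read_index('my.index')
```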

Using the GPU

# Use a single GPU
res = faiss.StandardGpuResources()

# 0 is the id of the GPU device to use
gpu_index = faiss.index_cpu_to_gpu(res, 0, index_ip)

For other operations, see the faiss repository on GitHub.

That's all.

Using the faiss Python interface
