PostgreSQL [Application 01]: Vector similarity queries with the Vector (pgvector) plug-in, including how to install pgvector on a Docker-deployed PostgreSQL, and a comparison with the Milvus vector database

1. Background

Suppose you need to implement feature-vector similarity search in a project that is developed in Java and uses PostgreSQL as its database. The available options are:

  • Vector database: Milvus is easy to deploy and comes with the visual interface Attu and a Java SDK, but it has to be deployed as a separate service.
  • PostgreSQL plug-ins: Cube supports 100 dimensions, Pase supports 512 dimensions, and Vector (pgvector) supports 16,000 dimensions.

Since the feature vectors extracted from the images have 1024 dimensions, only Milvus and the PostgreSQL plug-in Vector can be used.

2. Application

2.1 Milvus

The Milvus official website has a detailed installation guide and sample code, so I won’t go into details here. It is installed with Docker, and the version is 2.2.9. Below is a simple tool class. The connection parameters are hard-coded rather than parameterized, so feel free to optimize it. The result data is simplified into a small result class.

Result encapsulation:

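import lombok.Builder;
import lombok.Data;

// One search hit: the similarity score and the path of the matched image
@Data
@Builder
public class MilvusRes {
    public float score;
    public String imagePath;
}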

Tools:

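import io.milvus.client.MilvusServiceClient;
import io.milvus.grpc.MutationResult;
import io.milvus.grpc.SearchResultData;
import io.milvus.grpc.SearchResults;
import io.milvus.param.ConnectParam;
import io.milvus.param.MetricType;
import io.milvus.param.R;
import io.milvus.param.RpcStatus;
import io.milvus.param.collection.LoadCollectionParam;
import io.milvus.param.dml.InsertParam;
import io.milvus.param.dml.SearchParam;
import io.milvus.response.SearchResultsWrapper;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

@Slf4j
@Component
public class MilvusUtil {

    public MilvusServiceClient milvusServiceClient;

    @PostConstruct
    private void connectToServer() {
        milvusServiceClient = new MilvusServiceClient(
                ConnectParam.newBuilder()
                        .withHost("your service host")
                        .withPort(19530)
                        .build());
        // Load the collection data so that it can be searched
        LoadCollectionParam faceSearchNewLoad = LoadCollectionParam.newBuilder().withCollectionName("CollectionName").build();
        R<RpcStatus> rpcStatusR = milvusServiceClient.loadCollection(faceSearchNewLoad);
        log.info("Milvus LoadCollection [{}]", rpcStatusR.getStatus());
    }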

    public int insertDataToMilvus(String id, String path, float[] feature) {
        // Box the float[] into the List<Float> that the vector field expects
        List<Float> featureList = new ArrayList<>(feature.length);
        for (float v : feature) {
            featureList.add(v);
        }
        // One row per call: every field carries a single-element list
        List<InsertParam.Field> fields = new ArrayList<>();
        fields.add(new InsertParam.Field("field1", Collections.singletonList(id)));
        fields.add(new InsertParam.Field("field2", Collections.singletonList(path)));
        fields.add(new InsertParam.Field("field3", Collections.singletonList(featureList)));

        InsertParam insertParam = InsertParam.newBuilder()
                .withCollectionName("CollectionName")
                //.withPartitionName("novel")
                .withFields(fields)
                .build();
        R<MutationResult> insert = milvusServiceClient.insert(insertParam);
        return insert.getStatus();
    }

    public List<MilvusRes> searchImageByFeature(float[] feature) {
        List<Float> featureList = new ArrayList<>(feature.length);
        for (float v : feature) {
            featureList.add(v);
        }
        // Return the image path (field2 in insertDataToMilvus) with every hit
        List<String> queryOutputFields = Arrays.asList("field2");
        SearchParam faceSearch = SearchParam.newBuilder()
                .withCollectionName("CollectionName")
                .withMetricType(MetricType.IP)
                // Placeholder: the name of the vector field (field3 in insertDataToMilvus)
                .withVectorFieldName("VectorFieldName")
                .withVectors(Collections.singletonList(featureList))
                .withOutFields(queryOutputFields)
                .withTopK(10).build();
        // Run the search and log the elapsed time in milliseconds
        long l = System.currentTimeMillis();
        R<SearchResults> respSearch = milvusServiceClient.search(faceSearch);
        log.info("MilvusServiceClient.search cost [{}]", System.currentTimeMillis() - l);
        // Stop early if the search failed, because getData() would be null
        if (respSearch.getStatus() != R.Status.Success.getCode()) {
            log.error("Milvus search failed, status [{}]", respSearch.getStatus());
            return Collections.emptyList();
        }
        // Parse the result data (index 0 refers to the single query vector)
        SearchResultData results = respSearch.getData().getResults();
        int scoresCount = results.getScoresCount();
        SearchResultsWrapper wrapperSearch = new SearchResultsWrapper(results);
        List<SearchResultsWrapper.IDScore> idScores = wrapperSearch.getIDScore(0);
        List<?> imagePaths = wrapperSearch.getFieldData("field2", 0);
        List<MilvusRes> milvusResList = new ArrayList<>();
        for (int i = 0; i < scoresCount; i++) {
            float score = idScores.get(i).getScore();
            Object imagePath = imagePaths.get(i);
            MilvusRes milvusRes = MilvusRes.builder().score(score).imagePath(imagePath.toString()).build();
            milvusResList.add(milvusRes);
        }
        return milvusResList;
    }
}
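
The search above uses MetricType.IP. An inner-product score only behaves like a cosine similarity when the stored vectors and the query vector all have unit length. The post does not say whether the extracted features are normalized, so the following is only a sketch under the assumption that they are not. FeatureVectors and its methods are hypothetical helper names, not part of the Milvus SDK.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the Milvus SDK) for preparing feature vectors. */
final class FeatureVectors {

    private FeatureVectors() {
    }

    /** Returns a copy scaled to unit L2 norm; an all-zero vector is returned unchanged. */
    static float[] l2Normalize(float[] feature) {
        double sumOfSquares = 0.0;
        for (float v : feature) {
            sumOfSquares += (double) v * v;
        }
        float[] normalized = feature.clone();
        if (sumOfSquares == 0.0) {
            return normalized;
        }
        double norm = Math.sqrt(sumOfSquares);
        for (int i = 0; i < normalized.length; i++) {
            normalized[i] = (float) (feature[i] / norm);
        }
        return normalized;
    }

    /** Boxes a float[] into the List<Float> shape that the tool class builds by hand. */
    static List<Float> toList(float[] feature) {
        List<Float> list = new ArrayList<>(feature.length);
        for (float v : feature) {
            list.add(v);
        }
        return list;
    }
}
```

If this assumption applies, the same normalization would have to be applied both before inserting a feature and before searching with one, otherwise the scores are not comparable.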

The data volume is as shown in the picture:

[Image: data volume of the Milvus collection]

The performance test result is as follows (search time in milliseconds, as logged by the tool class):

MilvusServiceClient.search cost [24]

2.2 Vector

Basic information is explained on the following websites, so I won’t go into details here.

The PostgreSQL database is deployed with Docker, and the version is 12.12. The plug-in is installed as follows:

# Enter the container
docker exec -it <CONTAINER_ID> /bin/bash

# 1. Update the apt-get package lists
apt-get update
# Installing directly without updating first fails with an error like this:
#   Reading package lists... Done
#   Building dependency tree... Done
#   Reading state information... Done
#   E: Unable to locate package postgresql-12-postgis-3
#   E: Unable to locate package postgresql-12-postgis-3-dbgsym
#   E: Unable to locate package postgresql-12-postgis-3-scripts

# 2. Install the plug-in
apt-get install postgresql-12-pgvector

Database operations:

-- Add the vector extension
CREATE EXTENSION vector;

-- List the available extensions
SELECT * FROM pg_available_extensions;

-- Create the table
CREATE TABLE "public"."test" ( 
  "field1" VARCHAR ( 64 ), 
  "field2" VARCHAR ( 128 ), 
  "field3" vector ( 1024 ), 
  CONSTRAINT "test_pkey" PRIMARY KEY ( "field1" ) 
);

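When creating an index, choose the operator class that matches the distance operator used in the query (here vector_ip_ops for the inner product operator <#>):

-- Create the index. The statements below are alternatives with different parameters; keep only one.
-- Without an operator class, ivfflat uses L2 distance (vector_l2_ops)
CREATE INDEX ON test USING ivfflat ( field3);
-- Inner product, with different numbers of lists
CREATE INDEX ON test USING ivfflat ( field3 vector_ip_ops) WITH (lists = 50);
CREATE INDEX ON test USING ivfflat ( field3 vector_ip_ops) WITH (lists = 500);
CREATE INDEX ON test USING ivfflat ( field3 vector_ip_ops) WITH (lists = 1024);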
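
The lists value is a tuning knob. As a starting point, the pgvector README suggests rows / 1000 for tables of up to one million rows and sqrt(rows) above that, with sqrt(lists) as a starting value for the number of probes at query time. The sketch below only turns that rule of thumb into code. IvfflatTuning and its method names are hypothetical, and the result is a first guess to be checked against recall and latency on real data.

```java
/** Hypothetical helper that encodes the starting-point heuristics from the pgvector README. */
final class IvfflatTuning {

    private IvfflatTuning() {
    }

    /** rows / 1000 up to one million rows, sqrt(rows) above that; never less than 1. */
    static int suggestedLists(long rowCount) {
        if (rowCount <= 1_000_000L) {
            return (int) Math.max(1L, rowCount / 1000L);
        }
        return (int) Math.round(Math.sqrt((double) rowCount));
    }

    /** sqrt(lists) as a starting value for the number of probes; never less than 1. */
    static int suggestedProbes(int lists) {
        return Math.max(1, (int) Math.round(Math.sqrt((double) lists)));
    }

    public static void main(String[] args) {
        long rows = 50_000L; // assumed row count, for illustration only
        int lists = suggestedLists(rows);
        System.out.println("lists = " + lists + ", probes = " + suggestedProbes(lists));
    }
}
```

The probes value is applied per session with SET ivfflat.probes = N; before running the query. More probes improve recall at the cost of speed.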

Here is how the SQL can be written in a mapper file (it queries the ten most similar rows):

    <select id="queryId" resultType="map">
        SELECT
        field1,
        field2,
        field3 <![CDATA[ <#> ]]> CAST ( #{feature}  AS vector ) AS "score"
        FROM test
        ORDER BY field3 <![CDATA[ <#> ]]> CAST ( #{feature}  AS vector )
        LIMIT 10;
    </select>
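
For the CAST to vector to work, the mapper parameter has to arrive as text in pgvector's input format, a bracketed comma-separated list such as [1,2,3]. The following sketch assumes the feature is passed to the mapper as a String. PgVectorLiteral and toLiteral are hypothetical names, and the dimension check mirrors the vector(1024) column defined above.

```java
/** Hypothetical helper that renders a float[] in pgvector's text format, e.g. [1.0,2.5,-3.0]. */
final class PgVectorLiteral {

    private PgVectorLiteral() {
    }

    static String toLiteral(float[] feature, int expectedDimensions) {
        if (feature.length != expectedDimensions) {
            throw new IllegalArgumentException(
                    "expected " + expectedDimensions + " dimensions but got " + feature.length);
        }
        StringBuilder sb = new StringBuilder(feature.length * 10 + 2);
        sb.append('[');
        for (int i = 0; i < feature.length; i++) {
            float v = feature[i];
            // NaN and infinite components cannot be compared meaningfully, so reject them early
            if (Float.isNaN(v) || Float.isInfinite(v)) {
                throw new IllegalArgumentException("non-finite value at index " + i);
            }
            if (i > 0) {
                sb.append(',');
            }
            sb.append(v);
        }
        sb.append(']');
        return sb.toString();
    }

    public static void main(String[] args) {
        // A 3-dimensional vector is used here only to keep the output short
        System.out.println(toLiteral(new float[] {1f, 2.5f, -3f}, 3));
    }
}
```

With the table above, the call would be toLiteral(feature, 1024), and the returned String is what the mapper parameter receives.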

Operator description:

  1. L2 Distance (<->): L2 distance, also known as Euclidean distance, measures the straight-line distance between two vectors. It is calculated by adding the squares of the differences between corresponding elements of the two vectors and then taking the square root. A smaller L2 distance indicates that the vectors are closer to each other.
  2. Inner Product (<#>): The inner product (dot product) is the sum of the products of corresponding elements of the two vectors. A larger inner product indicates higher similarity, and for vectors normalized to unit length it equals the cosine similarity. Because PostgreSQL index scans only support ascending order, the <#> operator returns the negative inner product, so the smallest returned value is the most similar.
  3. Cosine Distance (<=>): Cosine distance is the complement of cosine similarity, that is, 1 minus the cosine of the angle between the two vectors. Cosine similarity is the dot product of the two vectors divided by the product of their norms. The value ranges from 0 to 2, where 0 means the vectors point in the same direction and 2 means they point in opposite directions.
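
To make the three operators concrete, here is a plain-Java sketch of the values that <->, <#> and <=> return in pgvector: the L2 distance, the negative inner product, and 1 minus the cosine similarity. VectorDistances is a hypothetical class written for illustration. It is not how pgvector is implemented internally.

```java
/** Hypothetical reference implementations of the values returned by <->, <#> and <=>. */
final class VectorDistances {

    private VectorDistances() {
    }

    /** <-> : square root of the sum of squared element-wise differences. */
    static double l2Distance(float[] a, float[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** <#> : the inner product with its sign flipped, so that smaller means more similar. */
    static double negativeInnerProduct(float[] a, float[] b) {
        return -dot(a, b);
    }

    /** <=> : 1 - cosine similarity, in the range 0 to 2; NaN if either vector is all zeros. */
    static double cosineDistance(float[] a, float[] b) {
        return 1.0 - dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    private static double dot(float[] a, float[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    private static void requireSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
    }
}
```

For example, l2Distance of (0, 0) and (3, 4) is 5, negativeInnerProduct of (1, 2) and (3, 4) is -11, and cosineDistance of (1, 0) and (0, 1) is 1.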

The performance test result is as follows:

PostgreSQL.vector.search cost [30]
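
Each figure in this comparison is a single log line, and a single timed call can be skewed by connection setup or a cold cache. The following sketch shows one way to get a steadier number by averaging several runs after a warm-up. SearchTimer is a hypothetical helper, and the Supplier would wrap whichever search call is being measured, for example the Milvus tool class or the mapper query.

```java
import java.util.function.Supplier;

/** Hypothetical helper that averages the wall-clock time of repeated calls. */
final class SearchTimer {

    private SearchTimer() {
    }

    /** Runs the search warmupRuns times untimed, then returns the mean time per call in milliseconds. */
    static <T> double averageMillis(Supplier<T> search, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            search.get();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            search.get();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / measuredRuns;
    }
}
```

A call such as SearchTimer.averageMillis(() -> milvusUtil.searchImageByFeature(feature), 5, 50) would then report the mean of 50 searches, where milvusUtil and feature stand for an injected tool-class instance and a query vector.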

3. Summary

Each has its own advantages: Milvus does not need the index to be rebuilt and its query was slightly faster in the tests above (24 versus 30); Vector does not require a separate deployment and is easy to maintain.

Origin blog.csdn.net/weixin_39168541/article/details/131482197