Comprehensively improve the quality of RAG! Zilliz joins hands with Zhiyuan to integrate Sparse Embedding and Reranker

Zilliz continues to empower AI application developers!

Recently, Zilliz reached a cooperation agreement with the Zhiyuan Research Institute (BAAI) to integrate a range of open-source BGE (BAAI General Embedding) models with the open-source vector database Milvus. Thanks to the Sparse Vector and Multi-Vector support newly launched in Milvus 2.4, developers now have several options: not only the Dense Embedding models widely used across the industry, but also BGE's newly released Sparse Embedding (sparse retrieval) and Reranker models. Developers can easily combine these tools to build a more powerful recall solution that covers semantic retrieval, full-text retrieval, and fine-grained ranking.

The integration of BGE and Milvus comprehensively improves the quality of RAG while maintaining flexibility and can better serve AI application developers.

01. Sparse Embedding and Reranker: New trends in improving RAG

RAG (Retrieval-Augmented Generation) is a technique that uses information retrieved from external knowledge bases to improve the accuracy and reliability of large language models. RAG has proven effective against a series of core problems that hinder the application of large models, such as hallucinations, stale knowledge, insufficient domain expertise, and data security concerns. Embedding models and vector databases are the key components of this approach: better models and more capable vector databases directly improve the quality of RAG responses and give large language models a better end-user Q&A experience.
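A minimal sketch of the retrieve-then-augment loop, in plain Python with no dependencies, may help make this concrete. The word-overlap scorer is a hypothetical stand-in for an embedding model plus a vector database, and the prompt template is an assumption; a real RAG system would send the resulting prompt to a large language model.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def overlap_score(query: str, passage: str) -> int:
    """Hypothetical relevance score: number of distinct words shared with the query."""
    return len(set(tokens(query)) & set(tokens(passage)))


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages with the highest toy relevance score."""
    ranked = sorted(corpus, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Augment the user question with retrieved context before it goes to an LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
    )


corpus = [
    "Paris is the capital of France.",
    "Milvus is an open source vector database.",
    "France is known for its wine and cheese.",
]
query = "What is the capital of France?"
print(build_prompt(query, retrieve(query, corpus)))
```

The point of the design is that the model never has to "remember" the answer: whatever the retrieval step returns is placed directly in the prompt, so improving retrieval quality improves the answer.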

However, because of the limitations of plain Dense Embedding with vector recall, the end results of RAG are unsatisfactory in some scenarios. The industry currently tends to adopt two approaches to improve RAG Q&A quality:

Option one: use Sparse Vectors and Dense Vectors for two-way recall. Sparse Vectors cover the capabilities of traditional full-text search and help identify and capture specific keywords, while Dense Vectors capture the overall semantics of the text more effectively. Merging the results of the two recalls yields richer and more complete information, which improves the effectiveness of RAG.
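To make the difference between the two recall paths concrete, here is a minimal pure-Python sketch of how each kind of vector is typically scored. It assumes inner-product scoring for sparse vectors and cosine similarity for dense vectors; the token weights and 3-dimensional embeddings are invented for illustration and are not BGE output.

```python
import math


def sparse_ip(a: dict[str, float], b: dict[str, float]) -> float:
    """Inner product of two sparse vectors: only terms present in both contribute."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b[t] for t, w in a.items() if t in b)


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two dense vectors: every dimension contributes."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Hypothetical term weights; a real sparse model keys weights by vocabulary token IDs.
query_sparse = {"milvus": 0.8, "index": 0.3}
doc1_sparse = {"milvus": 0.6, "gpu": 0.4}
doc2_sparse = {"database": 0.7, "index": 0.2}

# Hypothetical 3-dimensional dense embeddings (real embeddings have far more dimensions).
query_dense = [0.9, 0.1, 0.4]
doc1_dense = [0.8, 0.2, 0.5]
doc2_dense = [0.1, 0.9, 0.3]

print("sparse:", sparse_ip(query_sparse, doc1_sparse), sparse_ip(query_sparse, doc2_sparse))
print("dense:", round(cosine(query_dense, doc1_dense), 3), round(cosine(query_dense, doc2_dense), 3))
```

The sparse score is driven entirely by exact term matches (here the keyword "milvus"), while the dense score compares the whole vector, which is why the two signals complement each other.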

Option two: use a Cross-Encoder Reranker as a second, fine-grained ranking stage. First, use Dense Vectors, Sparse Vectors, or both for coarse ranking; then use the Reranker model to further filter and re-sort the first-stage results, improving the quality of the final output.
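The coarse-then-fine flow can be sketched in plain Python. This is a toy under stated assumptions, not the BGE Reranker: the word-overlap first stage stands in for vector recall, and the hypothetical `pair_score` function only imitates the key property of a cross-encoder, namely that it looks at the query and a candidate together. In the tiny corpus, both candidates tie in the order-blind first stage, and only the second stage can tell which one actually matches the query.

```python
import re
from typing import Callable


def tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def coarse_score(query: str, doc: str) -> float:
    """Cheap first-stage score: shared distinct words, ignoring order (stand-in for vector recall)."""
    return float(len(set(tokens(query)) & set(tokens(doc))))


def pair_score(query: str, doc: str) -> float:
    """Hypothetical cross-encoder stand-in: scores query and doc jointly, rewarding
    matching word order via the fraction of query bigrams that appear in the doc."""
    q, d = tokens(query), tokens(doc)
    q_bigrams = set(zip(q, q[1:]))
    d_bigrams = set(zip(d, d[1:]))
    return len(q_bigrams & d_bigrams) / len(q_bigrams) if q_bigrams else 0.0


def two_stage_search(
    query: str,
    corpus: list[str],
    first_k: int = 20,
    final_k: int = 5,
    coarse: Callable[[str, str], float] = coarse_score,
    rerank: Callable[[str, str], float] = pair_score,
) -> list[tuple[str, float]]:
    # Stage 1: rank the whole corpus with the cheap scorer and keep first_k candidates.
    candidates = sorted(corpus, key=lambda doc: coarse(query, doc), reverse=True)[:first_k]
    # Stage 2: re-score only those candidates with the expensive pairwise scorer.
    rescored = sorted(((doc, rerank(query, doc)) for doc in candidates),
                      key=lambda pair: pair[1], reverse=True)
    return rescored[:final_k]


corpus = ["dog bites man", "man bites dog", "a cat sleeps"]
print(two_stage_search("man bites dog", corpus, first_k=2, final_k=1))
```

In a Milvus deployment the first stage would be a vector search such as `client.search(...)` and the second a real reranker; the sketch only mirrors that structure, running the expensive scorer on `first_k` candidates instead of the whole corpus.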

Note that option one can be used on its own, merging the two recall results with a rule-based algorithm such as the commonly used Reciprocal Rank Fusion (RRF). It can also be combined with option two, using a Cross-Encoder Reranker to merge and re-sort the two result sets.
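RRF itself is simple enough to sketch directly. Each ranked list contributes 1/(k + rank) to every document it contains, and documents are sorted by the summed score, so items ranked highly by both dense and sparse recall rise to the top. k = 60 is the value commonly used in practice; the document IDs below are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each list adds 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical dense-recall order
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical sparse-recall order
for doc_id, score in reciprocal_rank_fusion([dense_hits, sparse_hits]):
    print(doc_id, round(score, 5))
```

Because RRF only uses ranks, it needs no calibration between the very different score scales of sparse inner products and dense similarities, which is what makes it a convenient rule-based default.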

02. Milvus joins hands with BGE: Flexibly solving RAG's quality problems

Milvus is an open-source vector database for AI applications, serving vector search and unstructured data management. It was originally created by Zilliz and open-sourced in 2019. Since then, Milvus has been widely adopted by the AI developer community and by enterprise users. With more than 26,000 stars and more than 260 contributors on GitHub, and more than 20 million downloads and installations worldwide, it has become one of the most widely used vector databases in the world.

Not long ago, Zilliz founder and CEO Charles Xie officially released Milvus 2.4 at the NVIDIA GTC conference, a release that can be called a revolutionary upgrade for the industry. Besides the much-discussed GPU-based vector indexing and search acceleration, support for Sparse Vectors and Multi-Vectors is another highlight. The integration of Milvus and Zhiyuan's BGE not only brings together, in one step, the Sparse Embedding, multi-channel recall, and Reranker capabilities needed to improve RAG quality, but also gives developers recall solutions of multiple types and levels, so they can build AI applications flexibly according to their actual needs.

The BGE models integrated with Milvus this time are general-purpose semantic models developed by the Zhiyuan Research Institute. Since the first release in August 2023, the Zhiyuan team has successively launched the Chinese and English BGE v1.0 and v1.5 models, as well as BGE-M3, which supports more than 100 languages and multiple retrieval methods. To date, the BGE series has been downloaded more than 15 million times worldwide, ranking first among Chinese open-source AI models, and BGE-M3 once ranked among the top three trending models on Hugging Face. The latest Milvus 2.4 client provides simple, easy-to-use wrappers for the BGE models, so developers can more easily use a variety of open-source BGE models together with the Milvus vector database to build multi-channel, multi-level recall solutions that comprehensively improve RAG quality. The open-source BGE models integrated so far include:

Embedding models

  • BAAI/bge-m3

  • BAAI/bge-large-en-v1.5

  • BAAI/bge-base-en-v1.5

  • BAAI/bge-small-en-v1.5

  • BAAI/bge-large-zh-v1.5

  • BAAI/bge-base-zh-v1.5

  • BAAI/bge-small-zh-v1.5

Reranker models

  • BAAI/bge-reranker-v2-m3

  • BAAI/bge-reranker-large

  • BAAI/bge-reranker-base

03. Code examples

Starting with version 2.4, the Milvus Python client includes an optional pymilvus[model] component. Whether you want multi-way recall that combines Sparse Vectors with Dense Vectors, or a Cross-Encoder Reranker to improve the relevance of first-stage recall results, the model component of pymilvus supports both flexibly.

For example, after running pip install "pymilvus[model]", you can use the dense vectors generated by the BGE-M3 model for nearest-neighbor vector search and then refine the results with a BGE Reranker model:

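from pymilvus import MilvusClient
from pymilvus.model.hybrid import BGEM3EmbeddingFunction
from pymilvus.model.reranker import BGERerankFunction

# Assumes a running Milvus server and an existing "my_collection" that stores
# BGE-M3 dense vectors together with a "text" field
client = MilvusClient(uri="http://localhost:19530")
embed_fn = BGEM3EmbeddingFunction(device="cuda:0")  # BGE-M3 embedding model on the first GPU
rerank_fn = BGERerankFunction(device="cuda:0")      # BGE cross-encoder reranker on the first GPU
query = "tell me information about France."
# Encode the query and keep only its dense vector
query_vector = [embed_fn([query])["dense"][0]]
# Search for the top 20 nearest-neighbour vectors
retrieved_results = client.search(collection_name="my_collection", data=query_vector, limit=20, output_fields=["text"])
# Rerank the vector search results and keep the top 5
final_results = rerank_fn(query, [hit["entity"]["text"] for hit in retrieved_results[0]], top_k=5)
for result in final_results:
    print(result.score, result.text)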

More examples, including hybrid sparse and dense search, are available at https://github.com/milvus-io/pymilvus/blob/master/examples/hello_hybrid_sparse_dense.py .

04. Future Outlook

As a leader in the vector database industry, Zilliz has worked with the industry-leading Zhiyuan Research Institute to support a variety of recall solutions built on the open-source BGE models and the Milvus vector database, integrating support for both Sparse Embedding and Reranker and making development much more convenient for RAG developers.

Chen Jiang, head of ecosystem integration and AI platform at Zilliz, said: "In the future, Milvus will continue to cooperate closely with Zhiyuan on model research, developer outreach, and other areas to help further popularize and improve AI applications."

The head of the BGE team at the Zhiyuan Research Institute said: "The integration of BGE and Milvus makes it much easier for community users to quickly build a 'trinity' retrieval pipeline (dense retrieval, sparse retrieval, and reranking). In the future, we look forward to further cooperation with industry-leading companies like Zilliz to jointly empower AI application developers."

Origin my.oschina.net/u/4209276/blog/11063572