Comprehensively improving RAG quality: Zilliz partners with Zhiyuan (BAAI) to integrate BGE open-source models such as Sparse Embedding and Reranker


Zilliz continues to empower AI application developers!


Recently, Zilliz reached a partnership with the Zhiyuan Research Institute (BAAI) to integrate the various BGE (BAAI General Embedding) open-source models with the open-source vector database Milvus. Thanks to the Sparse Vector and Multi-Vector support newly introduced in Milvus 2.4, developers now have a range of choices: beyond the Dense Embedding (dense vector) models widely used in the industry, they can also adopt BGE's latest Sparse Embedding (sparse retrieval) and Reranker (reranking) models. Developers can easily combine these tools to build more powerful recall pipelines covering semantic retrieval, full-text retrieval, and fine-grained reranking.


The integration of BGE and Milvus comprehensively improves RAG quality while maintaining flexibility, better serving AI application developers.



01.

Sparse Embedding and Reranker: new trends for improving RAG quality


RAG (Retrieval-Augmented Generation) is a technique that uses information retrieved from external knowledge bases to enhance the accuracy and reliability of large language models. RAG has been shown to effectively address a series of core problems that hinder the application of large models, such as hallucinations, poor timeliness, insufficient domain knowledge, and data security issues. Embedding models and vector databases are the key to realizing this approach: better models and more capable vector databases directly improve the quality of RAG responses and help large language models deliver a better end-user Q&A experience.


However, due to the limitations of basic Dense Embedding and vector recall schemes, RAG results are unsatisfactory in some scenarios. The industry currently tends to adopt two solutions to improve the quality of RAG Q&A:


Option one: use Sparse Vector and Dense Vector for two-way recall. The Sparse Vector covers the capabilities of traditional full-text search and helps identify and capture specific keywords, while the Dense Vector more effectively captures the overall semantic information of the text. By merging the results of these two recall paths, richer and more comprehensive information is obtained, improving the effectiveness of RAG (a small encoding sketch follows below).
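As an illustration of what the two representations look like in practice, here is a minimal sketch using the BGE-M3 wrapper from pymilvus[model] to produce both dense and sparse vectors; the sample texts are made up, the parameter choices are illustrative, and writing the vectors into a Milvus collection is omitted:

from pymilvus.model.hybrid import BGEM3EmbeddingFunction

# Load BGE-M3, which produces dense and sparse representations in one pass
embed_fn = BGEM3EmbeddingFunction(device="cpu", use_fp16=False)

docs = [
    "Paris is the capital and most populous city of France.",
    "The Eiffel Tower was completed in 1889.",
]

doc_embeddings = embed_fn.encode_documents(docs)
dense_vectors = doc_embeddings["dense"]    # list of dense float vectors, one per document
sparse_vectors = doc_embeddings["sparse"]  # sparse token-weight matrix, one row per document

query_embeddings = embed_fn.encode_queries(["Tell me about France."])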


Option two: use a Cross-Encoder Reranker as a second-stage fine ranker. First, use Dense Vector, Sparse Vector, or a combination of the two for coarse-grained recall; then use the Reranker model to further filter and reorder the first-stage results, improving the quality of the final output.


It is worth noting that option one can be used on its own, merging the two-way recall results with a rule-based algorithm such as the commonly used Reciprocal Rank Fusion (RRF), or it can be combined with option two, using a Cross-Encoder Reranker to merge and reorder the two recall paths.
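To make the rule-based fusion concrete, here is a small, self-contained sketch of RRF; rrf_fuse is an illustrative helper rather than a Milvus or BGE API, and the document ids are invented:

# Reciprocal Rank Fusion: each list contributes 1 / (k + rank) to a document's score
def rrf_fuse(ranked_lists, k=60):
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc3", "doc1", "doc7"]   # ids from the dense (semantic) recall path
sparse_hits = ["doc1", "doc9", "doc3"]  # ids from the sparse (keyword) recall path
print(rrf_fuse([dense_hits, sparse_hits]))  # fused ranking, e.g. ['doc1', 'doc3', ...]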


02.

Milvus joins hands with BGE to flexibly solve RAG's quality problems


Milvus is an open-source vector database for AI applications, providing vector search and unstructured data management. It was originally created by Zilliz and open-sourced in 2019. Since then, Milvus has been widely adopted by the AI developer community and enterprise users. With more than 26,000 stars and more than 260 contributors on GitHub, and more than 20 million downloads and installations worldwide, it has become one of the most widely used vector databases in the world.


Not long ago, Zilliz founder and CEO Charles Xie (Xingjue) officially announced Milvus 2.4 at the NVIDIA GTC conference, a release that can fairly be called a landmark upgrade for the industry. Beyond the much-discussed GPU-based vector indexing and search acceleration, support for Sparse Vector and Multi-Vector is another highlight. The integration of Milvus and Zhiyuan's BGE brings together, in one place, the Sparse Embedding, multi-path recall, and Reranker capabilities needed to improve RAG quality, and offers developers multiple types and levels of recall solutions so they can flexibly build AI applications according to their actual needs.


BGE, now integrated with Milvus, is a family of general-purpose semantic models created by the Zhiyuan Research Institute (BAAI). Since the first release in August 2023, the Zhiyuan team has successively launched the Chinese and English models BGE v1.0 and v1.5, as well as the BGE-M3 model, which supports more than 100 languages and multiple recall methods. To date, the BGE series has been downloaded more than 15 million times worldwide, ranking first among domestic open-source AI models, and BGE-M3 once ranked among the top three trending models on Hugging Face. The latest Milvus 2.4 client provides a simple, easy-to-use wrapper for the BGE models, so developers can more conveniently use the various BGE open-source models together with the Milvus vector database to build multi-path, multi-stage recall solutions and comprehensively improve RAG quality. The BGE open-source models integrated so far include:


  • Embedding model


  • BAAI/bge-m3

  • BAAI/bge-large-en-v1.5

  • BAAI/bge-base-en-v1.5

  • BAAI/bge-small-en-v1.5

  • BAAI/bge-large-zh-v1.5

  • BAAI/bge-base-zh-v1.5

  • BAAI/bge-small-zh-v1.5


  • Reranker model


  • BAAI/bge-reranker-v2-m3

  • BAAI/bge-reranker-large

  • BAAI/bge-reranker-base


03.

Code examples


Starting from version 2.4, the Milvus Python client ships a new pymilvus[model] component. Whether it is multi-path recall combining Sparse Vector with Dense Vector, or using a Cross-Encoder Reranker to improve the relevance of first-stage recall, the model component provided by Milvus supports it flexibly.


For example, after running pip install pymilvus[model], you can easily use the Dense Vector generated by the BGE-M3 model to perform a nearest-neighbor vector search, and then use the BGE Reranker model to refine the results:

from pymilvus import MilvusClient
from pymilvus.model.hybrid import BGEM3EmbeddingFunction
from pymilvus.model.reranker import BGERerankFunction

# Connect to Milvus and load the BGE-M3 embedding and BGE reranker models
client = MilvusClient(uri="http://localhost:19530")
embed_fn = BGEM3EmbeddingFunction(device="cuda:0")
rerank_fn = BGERerankFunction(device="cuda:0")

query = "tell me information about France."
# Encode the query with BGE-M3 and take its dense vector
query_vector = [embed_fn([query])["dense"][0]]

# Search for the top 20 nearest neighbor vectors
retrieved_results = client.search(collection_name="my_collection", data=query_vector, limit=20, output_fields=["text"])

# Rerank the retrieved texts with the BGE reranker and keep the top 5
final_results = rerank_fn(query, [result["entity"]["text"] for result in retrieved_results[0]], top_k=5)

In addition, more examples can be found at this link (https://github.com/milvus-io/pymilvus/blob/master/examples/hello_hybrid_sparse_dense.py).
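The linked example goes one step further, letting Milvus itself fuse the sparse and dense recall paths described in option one. The following is a shortened sketch in that style; the collection name "my_collection" and the vector field names "dense_vector" and "sparse_vector" are assumptions that must match your own schema:

from pymilvus import connections, Collection, AnnSearchRequest, RRFRanker
from pymilvus.model.hybrid import BGEM3EmbeddingFunction

connections.connect(uri="http://localhost:19530")
col = Collection("my_collection")  # assumed to have "dense_vector" and "sparse_vector" fields

embed_fn = BGEM3EmbeddingFunction(device="cuda:0")
query_embeddings = embed_fn(["tell me information about France."])

# One ANN request per recall path: dense (semantic) and sparse (keyword)
dense_req = AnnSearchRequest(query_embeddings["dense"], "dense_vector", {"metric_type": "IP"}, limit=20)
sparse_req = AnnSearchRequest(query_embeddings["sparse"], "sparse_vector", {"metric_type": "IP"}, limit=20)

# Milvus merges both result lists with Reciprocal Rank Fusion
results = col.hybrid_search([dense_req, sparse_req], rerank=RRFRanker(), limit=10, output_fields=["text"])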


04.

Future outlook


As a leader in the vector database industry, Zilliz has partnered with the industry-leading Zhiyuan Research Institute to support multiple recall solutions built on the open-source BGE models and the Milvus vector database, integrating their support for Sparse Embedding and Reranker and greatly easing the work of RAG developers.


Chen Jiang, head of ecosystem integrations and AI platform at Zilliz, said: "Going forward, Milvus will continue to cooperate deeply with Zhiyuan on model research and developer outreach, helping AI applications become more widely adopted and more capable."


The head of the BGE team at the Zhiyuan Research Institute said: "The integration of BGE and Milvus makes it extremely convenient for community users to quickly build a 'three-in-one' retrieval pipeline (dense retrieval, sparse retrieval, and reranking). We look forward to further cooperation with outstanding companies like Zilliz to jointly empower AI application developers."


