手把手系列！使用 Zilliz Cloud 和 AWS Bedrock 搭建 RAG 应用

检索增强生成（Retrieval Augemented Generation, RAG）是一种 AI 框架，它通过结合信息检索和自然语言处理（NLP）能力从而增强文本生成。具体而言，RAG 系统中的语言模型通过一种检索机制查询和搜索知识库或外部数据库，将最新信息结合到生成的响应中，从而使得最终输出结果更准确、包含更多上下文。

Zilliz Cloud（https://zilliz.com.cn/cloud）基于 Milvus（https://milvus.io/）向量数据库构建，提供存储和处理大规模向量化数据的解决方案，可用于高效管理、分析和检索数据。开发人员可以利用 Zilliz Cloud 的向量数据库功能来存储和搜索海量 Embedding 向量，进一步增强 RAG 应用中的检索模块能力。

AWS Bedrock 云服务（https://aws.amazon.com/cn/bedrock/），提供多种预训练基础模型，可用于部署和扩展 NLP 解决方案。开发人员可以通过 AWS Bedrock 将语言生成、理解和翻译的模型集成到 AI 应用中。此外，AWS Bedrock 还可以针对文本生成相关和具备丰富上下文的响应，从而进一步增加 RAG 应用的能力。

01.使用 Zilliz Cloud 和 AWS Bedrock 搭建 RAG 应用

我们将通过以下示例代码（https://colab.research.google.com/github/milvus-io/bootcamp/blob/master/bootcamp/RAG/bedrock_langchain_zilliz_rag.ipynb#scrollTo=fHn0m6Y0ytIP），演示如何使用 Zilliz Cloud 与 AWS Bedrock 搭建 RAG 应用。基本流程如图 1 所示：

图1. 使用 Zilliz Cloud 与 AWS Bedrock 搭建 RAG 应用的基本流程

#download the packages then import them
! pip install --upgrade --quiet  langchain langchain-core langchain-text-splitters langchain-community langchain-aws bs4 boto3

# For example
import bs4
import boto3

连接至 AWS Bedrock 和 Zilliz Cloud

接着，设置连接 AWS 和 Zilliz Cloud 服务所需的环境变量。您需要提供 AWS 服务地域、访问密钥和 Zilliz Cloud 的 Endpoint URI 和 API 密钥以连接至 AWS Bedrock 和 Zilliz Cloud 服务。

# Set the AWS region and access key environment variables
REGION_NAME = "us-east-1"
AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")

# Set ZILLIZ cloud environment variables
ZILLIZ_CLOUD_URI = os.getenv("ZILLIZ_CLOUD_URI")
ZILLIZ_CLOUD_API_KEY = os.getenv("ZILLIZ_CLOUD_API_KEY")

通过上述提供的访问凭证，我们创建了一个 boto3 客户端（https://boto3.amazonaws.com/v1/documentation/api/latest/index.html），用于连接到 AWS Bedrock Runtime 服务并集成 AWS Bedrock 中的语言模型。接着，我们初始化一个 ChatBedrock 实例（https://python.langchain.com/v0.1/docs/integrations/chat/bedrock/），连接到客户端，并指定使用的语言模型。本教程中我们使用 anthropic.claude-3-sonnet-20240229-v1:0的模型。这一步帮助我们设置了生成文本响应的基础设施，同时还配置了模型的 temperature 参数，从而控制生成响应的多样性。BedrockEmbeddings 实例可用于将文本等非结构化数据（https://zilliz.com.cn/glossary/%E9%9D%9E%E7%BB%93%E6%9E%84%E5%8C%96%E6%95%B0%E6%8D%AE）转化为向量。

# Create a boto3 client with the specified credentials
client = boto3.client(
    "bedrock-runtime",
    region_name=REGION_NAME,
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
)

# Initialize the ChatBedrock instance for language model operations
llm = ChatBedrock(
    client=client,
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    region_name=REGION_NAME,
    model_kwargs={"temperature": 0.1},
)

# Initialize the BedrockEmbeddings instance for handling text embeddings
embeddings = BedrockEmbeddings(client=client, region_name=REGION_NAME)

收集和处理信息

Embedding 模型初始化成功后，下一步是从外部来源加载数据。创建WebBaseLoader 实例（https://python.langchain.com/v0.1/docs/integrations/document_loaders/web_base/）以从指定的网络来源抓取内容。

本教程中，我们将从 AI agent 相关文章中加载内容。加载器使用 BeautifulSoup（https://www.crummy.com/software/BeautifulSoup/bs4/doc/）的 SoupStrainer 解析网页的特定部分——即带有 "post-content"、"post-title" 和 "post-header" 的部分，从而确保只检索相关内容。然后，加载器从指定的网络来源检索文档，提供了一系列的相关内容以便后续处理。接着，我们使用 RecursiveCharacterTextSplitter 实例（https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/recursive_text_splitter/）将检索到的文档分割成更小的文本块。这样可以使内容变得更方便管理，也可以将这些文本块传入到其他组件中，例如文本 Embedding 或语言生成模块。

# Create a WebBaseLoader instance to load documents from web sources
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
# Load documents from web sources using the loader
documents = loader.load()
# Initialize a RecursiveCharacterTextSplitter for splitting text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)


# Split the documents into chunks using the text_splitter
docs = text_splitter.split_documents(documents)

生成响应

prompt 模板预先定义了每个响应的结构，可以引导 AI 尽可能使用统计信息和数字，并在缺乏相关知识时避免编造答案。

# Define the prompt template for generating AI responses
PROMPT_TEMPLATE = """
Human: You are a financial advisor AI system, and provides answers to questions by using fact based and statistical information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
<context>
{context}
</context>

<question>
{question}
</question>

The response should be specific and use statistics or numbers when possible.

Assistant:"""
# Create a PromptTemplate instance with the defined template and input variables
prompt = PromptTemplate(
    template=PROMPT_TEMPLATE, input_variables=["context", "question"]
)

初始化 Zilliz vector store，并连接到 Zilliz Cloud 平台。vector store 负责将文档转化成向量，以便后续快速高效地检索文档。然后检索到的文档经过格式化组织称成连贯的文本，AI 将相关信息整合到响应中，最终提供高度准确度和相关的答案。

# Initialize Zilliz vector store from the loaded documents and embeddings
vectorstore = Zilliz.from_documents(
    documents=docs,
    embedding=embeddings,
    connection_args={
        "uri": ZILLIZ_CLOUD_URI,
        "token": ZILLIZ_CLOUD_API_KEY,
        "secure": True,
    },
    auto_id=True,
    drop_old=True,
)

# Create a retriever for document retrieval and generation
retriever = vectorstore.as_retriever()

# Define a function to format the retrieved documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

最后，我们创建一个完整的 RAG 链路用于生成 AI 响应。这个链路首先从 vector store 中检索与用户查询相关的文档，通过检索和格式化，然后将它们传递给 prompt template（https://python.langchain.com/v0.1/docs/modules/model_io/prompts/）生成响应结构。然后将这种结构化输入传入语言模型，生成连贯的回应，最终回应被解析成字符串格式并呈现给用户，提供准确、富含上下文的答案。

# Define the RAG (Retrieval-Augmented Generation) chain for AI response generation
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# rag_chain.get_graph().print_ascii()

# Invoke the RAG chain with a specific question and retrieve the response
res = rag_chain.invoke("What is self-reflection of an AI Agent?")
print(res)

以下为示例响应结果：

Self-reflection is a vital capability that allows autonomous AI agents to improve iteratively by analyzing and refining their past actions, decisions, and mistakes. Some key aspects of self-reflection for AI agents include:
1. Evaluating the efficiency and effectiveness of past reasoning trajectories and action sequences to identify potential issues like inefficient planning or hallucinations (generating consecutive identical actions without progress).
2. Synthesizing observations and memories from past experiences into higher-level inferences or summaries to guide future behavior.

02.使用 Zilliz Cloud 和 AWS Bedrock 的优势

如表 1 所示，Zilliz Cloud 可与 AWS Bedrock 无缝集成，增强 RAG 应用的效率、可扩展性和准确性。开发人员可以利用这两个服务开发全面的解决方案，处理海量数据集、简化 RAG 应用流程、提升 RAG 生成响应的准确性。

表1. 使用 Zilliz Cloud 和 AWS Bedrock 的优势

03.总结

本文主要介绍了如何使用 Zilliz Cloud 和 AWS Bedrock 构建 RAG 应用。

基于 Milvus 构建的向量数据库 Zilliz Cloud 可为 Embedding 向量提供可扩展的存储和检索解决方案，而 AWS Bedrock 则提供了强大的预训练模型用于语言生成。我们通过示例代码展示了如何连接至 Zilliz Cloud 和 AWS Bedrock、从外部源加载数据、处理和拆分数据，并最终构建一个完整的 RAG 链路。本文搭建的 RAG 应用能够最大限度地减少 LLM 产生幻觉和提供不准确响应的概率，充分发挥了现代 NLP 模型和向量数据库之间的协同效果。我们希望通过本教程抛砖引玉，帮助大家在搭建 RAG 应用中采用类似的技术。