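We have heard from developers building on vector databases that using GPT to convert documents into a different format can make vector search more reliable when building RAG applications.
For example, converting documents into question-and-answer pairs and indexing the embedded documents generated from those pairs intuitively seems like it should give better results for queries phrased as questions.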
```json
{
  "questions_and_answers": [
    {
      "question": "Who is the email from?",
      "answer": "The email is from [email protected]."
    },
    {
      "question": "Who is the email to?",
      "answer": "The email is to [email protected]."
    },
    {
      "question": "What is the issue the back office is having?",
      "answer": "The back office is having a hard time dealing with the $11 million dollars that is to be recognized as transport expense by the west desk then recouped from the Office of the Chairman."
    },
    ...
  ]
}
```
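
As a rough illustration of how output in this shape could be prepared for indexing, the sketch below flattens each pair into one text entry tagged with the ID of its source document. The function name `qa_pairs_to_entries`, the `doc_id` field, and the `Q: ... A: ...` text layout are assumptions made for this example and are not taken from the benchmark itself.

```python
import json


def qa_pairs_to_entries(raw_json: str, doc_id: str) -> list[dict]:
    """Turn a {"questions_and_answers": [...]} payload into indexable entries."""
    payload = json.loads(raw_json)
    entries = []
    for pair in payload.get("questions_and_answers", []):
        question = pair.get("question", "").strip()
        answer = pair.get("answer", "").strip()
        if not question or not answer:
            continue  # skip incomplete pairs rather than index partial text
        entries.append({
            "doc_id": doc_id,  # hypothetical field linking back to the source email
            "text": f"Q: {question}\nA: {answer}",
        })
    return entries


# Made-up payload for illustration only.
example = json.dumps({
    "questions_and_answers": [
        {"question": "Who is the email from?", "answer": "The email is from the sender."},
        {"question": "", "answer": "Dropped because the question is empty."},
    ]
})
entries = qa_pairs_to_entries(example, doc_id="email-001")
print(entries)
```

Keeping the question and the answer in the same entry is one possible choice; embedding only the question, or only the answer, are alternatives that a benchmark could compare.
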
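We were curious whether this holds in practice as well as in theory, so we built a basic benchmark with LangChain and FAISS to determine whether these performance improvements are real and, if so, under what conditions.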
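
To make the shape of such a comparison concrete, here is a minimal sketch of a top-1 retrieval check. It is a toy stand-in and not the LangChain and FAISS benchmark: `embed` is a bag-of-words counter rather than a real embedding model, the helper names are hypothetical, and the two tiny indexes are contrived, so the numbers it prints carry no evidential weight.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top1_hit_rate(index, queries):
    """index: list of (doc_id, text); queries: list of (query, expected_doc_id)."""
    vectors = [(doc_id, embed(text)) for doc_id, text in index]
    hits = 0
    for query, expected in queries:
        q = embed(query)
        best_doc, _ = max(vectors, key=lambda item: cosine(q, item[1]))
        hits += best_doc == expected  # True counts as 1
    return hits / len(queries)


# Contrived data: the same two documents indexed as raw text and as Q&A text.
raw_index = [
    ("a", "Invoice for the March shipment is attached."),
    ("b", "Meeting moved to Friday at noon."),
]
qa_index = [
    ("a", "Q: What is attached? A: The invoice for the March shipment."),
    ("b", "Q: When is the meeting? A: Friday at noon."),
]
queries = [("When is the meeting?", "b"), ("What is attached?", "a")]

raw_rate = top1_hit_rate(raw_index, queries)
qa_rate = top1_hit_rate(qa_index, queries)
print(f"raw: {raw_rate:.2f}  qa: {qa_rate:.2f}")
```

A real run would swap `embed` for an actual embedding model and the linear scan for a vector index, and would use a query set large enough for the hit rates to mean something.
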
# Summary of results
Compared with vectorizing the raw emails