Danswer Quick Guide: Build your enterprise-level open source knowledge question answering system in less than 15 minutes

1. Preface

I will not repeat here why enterprises need their own knowledge base and question-answering retrieval system, or what the shortcomings of existing GPT models are in enterprise applications. My previous articles on other knowledge-base projects have already covered these points, so if you are interested, you can read them.

Let's get straight to today's topic: a good open source project called Danswer. Compared with other open source products, Danswer's design has some clear advantages worth learning from, and the most notable is its connectors. The project currently provides 12 built-in connectors, which make it easy to add any documents you need to the vector database for indexing, knowledge retrieval and question answering.

2. Basic introduction to Danswer

Danswer lets you ask natural language questions about internal documents and get reliable answers backed by citations and references to the source material, so you can always trust the results you get. It can connect to many popular tools such as Slack, GitHub, Confluence, and more.

2.1. What are the advantages of Danswer?

  • It is completely open source (MIT license) and free to use, and it can be deployed entirely on your own infrastructure, keeping your data private and secure.

  • It lets you plug and play different LLMs, such as GPT, Hugging Face models, GPT4All, llama.cpp, and even custom self-hosted models.

  • It comes with key features out of the box, including document access control, a front-end UI, an admin dashboard, polling for document updates, and flexible deployment options.

  • It offers a good selection of connectors to Slack, GitHub, Google Drive, and other tools.

  • You can ask questions directly and get answers without opening any documents. Answers are powered by GPT-4 (or your self-hosted model of choice), and each one comes with citations and references, so you can always check where an answer came from.

  • Danswer uses state-of-the-art language models to find the most relevant documents.

  • Custom deep learning models tailor the search to the user's intent, and smart text chunking keeps fine-grained details from getting lost.

By default, Danswer uses OpenAI's GPT models, but you can also switch to open source models as needed, such as the recently popular Llama 2.

3. How Danswer works

Danswer combines open source NLP models with the latest generative AI technology. From receiving a user's query to returning a result, the process works as follows:

1) When a user submits a query, it is first processed by a user intent model, which decides whether to use keyword search or semantic search. This model was fine-tuned from a DistilBERT checkpoint on a dataset generated by prompting GPT, with bad examples filtered out manually.

2) Semantic search is split into two steps: retrieval and reranking.

  • Retrieval works by embedding text with a bi-encoder model. The vectors are stored in the Qdrant vector database; at query time, the user's query is embedded the same way and the closest vectors are fetched. Techniques applied during retrieval include context-aware chunking of documents with overlap between adjacent chunks, and embedding each document several times at different scales: 512-token chunks for broader context, then 128-token chunks for finer detail.

  • Reranking uses a separate set of models combined as an ensemble. Combining several models that each do well on different datasets yields the best reranking results even when the individual models are small, which also keeps this step fast overall.

3) Finally, the most relevant sections of the documents are passed to the generative model, which is prompted to provide citations along with its answer. The citations are then matched to the source documents and shown to the user together with the answer.
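The three steps above can be sketched end to end in plain Python. This is a toy illustration under loud assumptions, not Danswer's code: a keyword heuristic stands in for the DistilBERT intent model, bag-of-words vectors in a list stand in for the bi-encoder and Qdrant, the chunk sizes are tiny instead of 512/128 tokens, and the "ensemble" simply averages two basic scorers. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens (a toy stand-in for a real tokenizer)."""
    return re.findall(r"\w+", text.lower())

def route_query(query):
    """Hypothetical stand-in for the DistilBERT intent model.

    Returns only a label; the rest of this sketch implements the semantic path.
    """
    if '"' in query or len(tokenize(query)) <= 2:
        return "keyword"
    return "semantic"

def chunk(tokens, size, overlap):
    """Windows of `size` tokens; consecutive windows share `overlap` tokens."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

def embed(tokens):
    """Bag-of-words vector: a toy stand-in for a bi-encoder embedding."""
    return Counter(tokens)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def build_index(docs, scales=((8, 2), (4, 1))):
    """Index each document at two chunk scales (toy sizes instead of 512/128 tokens)."""
    index = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for size, overlap in scales:
            for piece in chunk(tokens, size, overlap):
                index.append((doc_id, piece, embed(piece)))
    return index

def retrieve(index, query, k=5):
    """Stage 1: the k chunks whose vectors are most similar to the query vector."""
    q = embed(tokenize(query))
    return sorted(index, key=lambda c: cosine(q, c[2]), reverse=True)[:k]

def rerank(candidates, query, k=2):
    """Stage 2: reorder candidates by the average of two different scorers."""
    q_tokens = tokenize(query)
    q_vec = embed(q_tokens)
    return sorted(candidates, key=lambda c: (cosine(q_vec, c[2]) + jaccard(q_tokens, c[1])) / 2,
                  reverse=True)[:k]

def build_prompt(question, top_chunks):
    """Number each chunk so the generative model can cite it as [n]."""
    sources = [f"[{i}] ({doc_id}) {' '.join(toks)}" for i, (doc_id, toks, _) in enumerate(top_chunks, 1)]
    return "Answer from the sources below and cite them as [n].\n" + "\n".join(sources) + f"\nQuestion: {question}"

def match_citations(answer, source_ids):
    """Map [n] markers in a generated answer back to source document IDs."""
    cited = dict.fromkeys(int(n) for n in re.findall(r"\[(\d+)\]", answer))
    return [source_ids[n - 1] for n in cited if 1 <= n <= len(source_ids)]
```

For example, `top = rerank(retrieve(build_index(docs), q), q)` returns the best chunks for question `q`, `build_prompt(q, top)` produces the prompt for the model, and `match_citations(answer, [c[0] for c in top])` turns the model's `[n]` markers back into document IDs. Replacing `embed` with a real embedding model and the list with a vector database keeps the same two-stage shape.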

4. Rapid deployment of Danswer

Danswer provides Docker containers that can be easily deployed on any cloud, either on a single instance or via Kubernetes. In this demo, we will use Docker Compose to run Danswer locally.

  • First, clone the repository:
git clone https://github.com/danswer-ai/danswer.git
  • Next, navigate to the deployment directory:
cd danswer/deployment/docker_compose
  • [Optional] The default Danswer model is GPT-3.5-Turbo, so you don't need to change anything if you want to use it.

If you want to use an open source model API such as Llama 2, you can override some of the defaults by creating a .env file that configures Danswer to use your Llama 2 endpoint:

INTERNAL_MODEL_VERSION=request-completion  # Make Danswer talk to the LLM through the requests library instead of a client library
GEN_AI_HOST_TYPE=colab-demo  # Format the request headers/body the way the model endpoint expects
GEN_AI_ENDPOINT=<REPLACE-WITH-YOUR-NGROK-PUBLIC-URL>/generate  # Model endpoint URL
  • Start Danswer (requires Docker version ≥ 1.13.0):
docker compose -f docker-compose.dev.yml -p danswer-stack up -d --pull always --force-recreate
  • Once it finishes, Danswer will be available at http://localhost:3000.
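Because `up -d` returns as soon as the containers are started, the web UI may need a little longer before it responds. A small standard-library Python sketch like the one below can wait for it. The polling interval and timeout are arbitrary assumptions, and any HTTP status below 500 is treated as "up".

```python
import time
import urllib.error
import urllib.request

def wait_for_http(url, timeout=120.0, interval=2.0):
    """Poll `url` until the server answers or `timeout` seconds pass.

    Returns True once any response with status < 500 arrives, else False.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True  # urlopen only returns normally for non-error statuses
        except urllib.error.HTTPError as err:
            if err.code < 500:
                return True  # the server is up; it just rejected this request
        except OSError:
            pass  # connection refused, reset, or timed out: not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_http("http://localhost:3000")` right after the `docker compose` command blocks until the UI is reachable, or returns False after two minutes.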

For more details, see the Quickstart page of the Danswer documentation.

5. Using Danswer in practice

5.1. Add documents to Danswer

Danswer provides 12 types of connectors for linking it to your data sources, so that questions can be answered from your organization's own knowledge. If none of the built-in connectors meets your needs, you can file a request with the maintainers or implement a connector yourself based on the open source code. Data sources are connected through the admin page: click the [Admin Panel] menu in the upper right corner to set them up.

The configuration and usage of each connector are described in detail in the official documentation.
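If you write your own connector, the real interfaces live in the Danswer backend (see the connectors README in the references). The sketch below is a hypothetical, simplified contract and not Danswer's actual API: a connector yields batches of documents, each carrying an ID, a source link that citations can point to, and text, and an indexing loop consumes those batches. The names `Document`, `Connector`, `InMemoryConnector`, and `index_all` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Protocol

@dataclass
class Document:
    id: str
    source_url: str  # where a citation should link back to
    text: str

class Connector(Protocol):
    """Hypothetical contract: yield documents in batches for indexing."""
    def load(self) -> Iterator[List[Document]]: ...

class InMemoryConnector:
    """Hypothetical example connector serving pages from a dict of url -> text."""
    def __init__(self, pages: Dict[str, str], batch_size: int = 2):
        self.pages = pages
        self.batch_size = batch_size

    def load(self) -> Iterator[List[Document]]:
        batch: List[Document] = []
        for url, text in self.pages.items():
            batch.append(Document(id=url, source_url=url, text=text))
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch:  # flush the final, partially filled batch
            yield batch

def index_all(connector: Connector, store: Dict[str, Document]) -> int:
    """Pull every batch into a toy store keyed by document ID; return the count."""
    count = 0
    for batch in connector.load():
        for doc in batch:
            store[doc.id] = doc  # re-indexing the same ID overwrites the old copy
            count += 1
    return count
```

Batching keeps memory bounded for large sources, and keying the store by document ID means a re-run updates documents instead of duplicating them.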

As a demonstration, we add a web connector as a data source. All we need to do is give the web connector a publicly accessible URL, and Danswer will crawl the page content and store it in the vector database.
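The basic idea behind a web connector can be illustrated with the Python standard library. This is a simplified sketch, not Danswer's implementation: it parses one already-fetched HTML page, keeps the visible text (skipping `script`, `style` and `noscript`), and collects links on the same site as candidates to crawl next. Fetching, politeness rules, and JavaScript rendering are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class PageParser(HTMLParser):
    """Collect visible text and hyperlinks from one HTML page."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # > 0 while inside a tag whose content is not visible

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))  # resolve relative links

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())

def extract(base_url, html):
    """Return (page_text, same_site_links) for a fetched HTML page."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    host = urlparse(base_url).netloc
    same_site = [u for u in parser.links if urlparse(u).netloc == host]
    return " ".join(parser.text_parts), same_site
```

The returned text is what would then be chunked, embedded and stored, while the same-site links feed the crawl queue.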

5.2. Get answers from Danswer

After adding the data source, click the Danswer icon to return to the home page, where you can now type questions about your documents directly.

6. Conclusion

Danswer is an excellent open source enterprise knowledge question answering system. Its connectors make it easy to add all kinds of documents to the vector database for indexing, and users can ask natural language questions about internal documents and get reliable, informative answers. Its advantages include being completely open source and free to use, keeping data private and secure, and supporting plug-and-play of different LLMs. It also connects seamlessly with many common tools (such as Slack, GitHub, and Confluence), giving enterprises an efficient and reliable solution for knowledge retrieval and question answering.

7. References

  • Danswer GitHub: https://github.com/danswer-ai/danswer

  • Danswer Docs: Overview - Danswer Documentation

  • Danswer Connectors: https://github.com/danswer-ai/danswer/blob/main/backend/danswer/connectors/README.md

Source: blog.csdn.net/FrenzyTechAI/article/details/132365199