DB-GPT: Using private LLM technology to define the next generation of database interaction

What is DB-GPT?

As large language models are released and iterated on, they keep getting more capable. Using them, however, raises serious data security and privacy challenges: private data and the environment it lives in need to stay in our own hands, fully under our control, without privacy leaks or security risks. For this reason we launched the DB-GPT project, which aims to provide a complete private LLM solution for all database-based scenarios. Because the solution supports local deployment, it can run in a standalone private environment, and it can also be deployed and isolated per business module, so that LLM capabilities remain private, secure, and controllable. Our vision is to make it easier and more convenient to build LLM applications around databases.

DB-GPT is an experimental open-source, database-oriented GPT project. It uses locally deployed large models to interact with your data and environment, so data does not leave your own infrastructure and stays 100% private.

DB-GPT video introduction

Demo

The examples below were run on an RTX 4090 GPU.

Generate analysis charts from natural language conversations

Generate SQL from natural language dialogue

Chat with database metadata to generate accurate SQL statements

Chat with your data and view execution results directly

Knowledge base management

Chat with a knowledge base built from documents such as PDF, CSV, TXT, and Word files

Features at a Glance

We have released a number of key features so far. The capabilities of the current release are listed below.

  • SQL language capabilities

    • SQL generation
    • SQL diagnostics
  • Private Domain Q&A and Data Processing

    • Knowledge base management (currently supports txt, pdf, md, html, doc, ppt, and url)
    • Database knowledge Q&A
    • Data processing
  • Plug-in model

    • Supports custom plug-ins to perform tasks and natively supports Auto-GPT plug-ins, for example:
      • Automatically executing SQL to obtain query results
      • Automatically crawling and learning knowledge
  • Knowledge Base Unified Vector Storage/Index

    • Supports unstructured data, including PDF, Markdown, CSV, and web URLs
  • Multiple model support

    • Supports multiple large language models; currently Vicuna (7b, 13b), ChatGLM-6b (int4, int8), guanaco (7b, 13b, 33b), and Gorilla (7b, 13b)
    • TODO: codet5p, codegen2
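
The plug-in idea in the list above (a named task such as "execute SQL and return the rows") can be illustrated with a minimal Python sketch. Everything here is hypothetical: the `PLUGINS` registry, the `register` decorator, and `run_plugin` are names invented for this example and are not the Auto-GPT or DB-GPT plug-in API. SQLite from the standard library stands in for whatever database is actually used.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

# Hypothetical registry mapping a plug-in name to a callable.
# This is NOT the Auto-GPT or DB-GPT plug-in API, only an illustration.
PLUGINS: Dict[str, Callable[..., object]] = {}


def register(name: str):
    """Decorator that records a function under `name` in the registry."""
    def wrap(func):
        PLUGINS[name] = func
        return func
    return wrap


@register("execute_sql")
def execute_sql(conn: sqlite3.Connection, sql: str) -> List[Tuple]:
    """Run a query and return all rows; this sketch only accepts SELECT."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed in this sketch")
    return conn.execute(sql).fetchall()


def run_plugin(name: str, *args):
    """Look up a plug-in by name and call it with the given arguments."""
    return PLUGINS[name](*args)


def demo() -> List[Tuple]:
    """Create a throwaway in-memory table and query it through the plug-in."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return run_plugin("execute_sql", conn, "SELECT id, name FROM users ORDER BY id")
```

Restricting the sketch to SELECT is a design choice made here for safety when SQL comes from a model; the source does not say how the project itself guards execution.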

Architecture

DB-GPT builds its large model runtime environment on FastChat and provides Vicuna as the base large language model. In addition, it provides private-domain knowledge base question answering through LangChain. It also supports a plug-in mode and is designed to support Auto-GPT plug-ins natively. Our vision is to make it easier and more convenient to build applications around databases and LLMs.

The overall architecture of DB-GPT is shown in the figure below.

The core capabilities include the following parts.

  1. Knowledge base capability: supports private-domain knowledge base question answering.
  2. Large model management capability: provides a large model runtime environment based on FastChat.
  3. Unified vectorized data storage and indexing: provides a unified way to store and index various data types.
  4. Connection module: connects different modules and data sources to enable data flow and interaction.
  5. Agent and plug-in: provides an agent and plug-in mechanism so that users can customize and extend the behavior of the system.
  6. Automatic prompt generation and optimization: automatically generates high-quality prompts and optimizes them to improve the system's response efficiency.
  7. Multi-terminal product interface: supports a variety of client products, such as web, mobile, and desktop applications.
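
To make item 3 more concrete, here is a toy sketch of what "one index for many source types" might look like. It is not DB-GPT's implementation: the `VectorIndex` class and the `embed` function are hypothetical, and the word-count "embedding" is a deliberately crude stand-in for a real embedding model. The point is only that text from a PDF, a Markdown file, or a CSV ends up in the same store and is retrieved by the same similarity search.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: lower-cased word counts (stand-in for a real model)."""
    return dict(Counter(text.lower().split()))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class VectorIndex:
    """Hypothetical unified store: every chunk is kept as (source, text, vector)."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, str, Dict[str, float]]] = []

    def add(self, source: str, text: str) -> None:
        """Index a text chunk regardless of which file type it came from."""
        self._items.append((source, text, embed(text)))

    def search(self, query: str, k: int = 1) -> List[Tuple[str, str]]:
        """Return the k chunks most similar to the query as (source, text)."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[2]), reverse=True)
        return [(source, text) for source, text, _ in ranked[:k]]
```

In a knowledge base Q&A flow, the chunks returned by `search` would typically be placed into the prompt sent to the model; that step is omitted here.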

Install

Quick start

Multilingual switching

In the .env configuration file, set the LANGUAGE parameter to switch between languages. The default is English (zh for Chinese, en for English; other languages to be added).
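
As an illustration of the setting above, the following sketch reads a LANGUAGE value from .env-style text and falls back to English. How DB-GPT itself loads the file is not shown in this document, so `parse_env` and `resolve_language` are hypothetical helpers, and falling back to the default for unsupported codes is an assumption made for this example.

```python
from typing import Dict

SUPPORTED_LANGUAGES = {"en", "zh"}  # the two codes named in the text
DEFAULT_LANGUAGE = "en"             # the stated default


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_language(env_text: str) -> str:
    """Return the configured language code, or the default if unset or unsupported."""
    lang = parse_env(env_text).get("LANGUAGE", DEFAULT_LANGUAGE).lower()
    return lang if lang in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE
```

For example, a .env file containing the line `LANGUAGE=zh` would switch the interface to Chinese, while a file without a LANGUAGE line keeps English.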

Usage

Using multiple models

User guide

If you encounter nltk-related errors when using the repository, you need to install the nltk toolkit; see the nltk documentation for details. Then run the Python interpreter and type the following commands:

>>> import nltk
>>> nltk.download()

We provide a brand-new user interface for working with DB-GPT. We have also prepared the following reference articles on the code and principles behind the project.

  1. Large Models in Practice (1): Hands-on with a Langchain-Vicuna Application
  2. Large Models in Practice (2): DB-GPT Alibaba Cloud Deployment Guide
  3. Large Models in Practice (3): Principle and Application of the DB-GPT Plug-in Model

Acknowledgements

The project owes its achievements to the technical community, especially the following projects.

Contributing

  • Please run black . before submitting code.

This is a complex and innovative tool for databases. The project is under active development, and new features will be released continuously. If you run into specific problems while using it, please submit an issue under the project first. If necessary, contact us via the WeChat account below and we will do our best to help. Everyone is very welcome to take part in building the project.

Roadmap

Reposted from: blog.csdn.net/sinat_37574187/article/details/131735236