What is DB-GPT?
As large models are released and iterated, they keep getting more capable, but using them also raises serious data security and privacy challenges. When we use large-model capabilities, our private data and environment must stay in our own hands and remain fully controllable, with no risk of data leakage or other security issues. To address this, we launched the DB-GPT project to build a complete private large-model solution for all database-based scenarios. Because the solution supports local deployment, it can run in an independent private environment and can also be deployed and isolated separately for each business module, keeping large-model capabilities private, secure, and controllable. Our vision is to make it easier and more convenient to build large-model applications around databases.
DB-GPT is an open-source, database-oriented GPT experimental project. It uses locally deployed GPT large models to interact with your data and environment, so there is no risk of data leakage and everything stays 100% private.
Demo
The examples below were run on an RTX 4090 GPU.
Generate analysis charts from natural-language conversations
Generate SQL from natural-language dialogue
Converse with database metadata to generate accurate SQL statements
Chat with your data and view the execution results directly
Knowledge base management
Chat with a knowledge base built from documents such as PDF, CSV, TXT, and Word files
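The idea of generating accurate SQL from database metadata can be sketched in a few lines: read the schema, place it in the prompt next to the user's question, then execute whatever SQL the model returns and show the rows. This is a minimal sketch using Python's built-in sqlite3, not DB-GPT's actual implementation; `read_schema`, `build_sql_prompt`, `run_query`, and the prompt wording are hypothetical, and the model call itself is left out.

```python
import sqlite3


def read_schema(conn: sqlite3.Connection) -> str:
    """Collect the CREATE TABLE statements that describe the database."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def build_sql_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Hypothetical prompt template: schema metadata first, then the question."""
    return (
        "You are a SQL assistant. Use only the tables below.\n"
        f"{read_schema(conn)}\n"
        f"Question: {question}\n"
        "SQL:"
    )


def run_query(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute model-generated SQL and return the rows for display."""
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (32.5,)])
    print(build_sql_prompt(conn, "What is the total order amount?"))
    # Stand-in for the SQL a model might return for that question.
    print(run_query(conn, "SELECT SUM(amount) FROM orders"))
```

Grounding the prompt in the real schema is what lets the model reference existing table and column names instead of guessing them.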
Features at a Glance
We have already released a number of key features. The capabilities of the current release are listed below.
- SQL language capabilities
  - SQL generation
  - SQL diagnostics
- Private domain Q&A and data processing
  - Knowledge base management (currently supports txt, pdf, md, html, doc, ppt, and url)
  - Database knowledge Q&A
  - Data processing
- Plug-in mode
  - Supports custom plug-ins to perform tasks, and natively supports Auto-GPT plug-ins, for example:
    - Automatically executing SQL to obtain query results
    - Automatically crawling and learning knowledge
- Unified vector storage and indexing for the knowledge base
  - Unstructured data support, including PDF, Markdown, CSV, and web URLs
- Multiple model support
  - Supports multiple large language models, currently Vicuna (7b, 13b), ChatGLM-6b (int4, int8), guanaco (7b, 13b, 33b), and Gorilla (7b, 13b)
  - TODO: codet5p, codegen2
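The plug-in idea, custom plug-ins that perform tasks such as automatically executing SQL, can be illustrated with a small registry. This is a hypothetical sketch, not the DB-GPT or Auto-GPT plug-in interface: `PluginRegistry`, `make_sql_plugin`, and the `sql_executor` name are invented for illustration. The SQL plug-in only accepts statements starting with SELECT, one simple way to keep automatic execution read-only.

```python
import sqlite3
from typing import Callable, Dict


class PluginRegistry:
    """Maps a plug-in name to a callable that takes a string and returns a string."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, func: Callable[[str], str]) -> None:
        if name in self._plugins:
            raise ValueError(f"plug-in {name!r} is already registered")
        self._plugins[name] = func

    def run(self, name: str, argument: str) -> str:
        if name not in self._plugins:
            raise KeyError(f"unknown plug-in {name!r}")
        return self._plugins[name](argument)


def make_sql_plugin(conn: sqlite3.Connection) -> Callable[[str], str]:
    """Build a plug-in that executes read-only queries and formats the rows."""

    def execute(sql: str) -> str:
        # A prefix check is a simple guard, not a complete SQL sanitizer.
        if not sql.lstrip().upper().startswith("SELECT"):
            raise PermissionError("only SELECT statements are allowed")
        rows = conn.execute(sql).fetchall()
        return "\n".join(", ".join(str(v) for v in row) for row in rows)

    return execute


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [("ann", 31), ("bob", 27)])
    registry = PluginRegistry()
    registry.register("sql_executor", make_sql_plugin(conn))
    print(registry.run("sql_executor", "SELECT name, age FROM users ORDER BY age"))
```

Keeping each plug-in behind a plain string-in, string-out callable makes it easy for a model-driven agent to pick a plug-in by name and feed its text result back into the conversation.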
Architecture plan
DB-GPT builds its large-model runtime environment on FastChat and provides Vicuna as the large language model on top of it. In addition, it provides private domain knowledge base question answering through LangChain. It also supports a plug-in mode, and its design natively supports Auto-GPT plug-ins. Our vision is to make it easier and more convenient to build applications around databases and LLMs.
The overall architecture of DB-GPT is shown in the figure below.
Its core capabilities include the following parts:
- Knowledge base capability: supports private domain knowledge base question answering.
- Large model management capability: provides a large-model runtime environment based on FastChat.
- Unified data vectorized storage and indexing: provides a unified way to store and index various data types.
- Connection module: connects different modules and data sources to enable data flow and interaction.
- Agent and plug-in: provides agent and plug-in mechanisms so that users can customize and extend the system's behavior.
- Automatic prompt generation and optimization: automatically generates high-quality prompts and optimizes them to improve the system's response efficiency.
- Multi-terminal product interface: supports a variety of client products, such as web, mobile, and desktop applications.
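To make knowledge base question answering over unified vector storage concrete, here is a toy in-memory index: documents are split into chunks, each chunk becomes a vector, and a question retrieves the most similar chunks to place in the prompt. This is a self-contained sketch under loud assumptions: the bag-of-words "embedding" stands in for a real embedding model, an in-memory list stands in for a vector store such as Chroma or Milvus, and `ToyVectorIndex` and `build_qa_prompt` are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """Stores (chunk, vector) pairs and returns the top-k most similar chunks."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, text: str, chunk_size: int = 40) -> None:
        # Split the document into fixed-size word windows before indexing.
        words = text.split()
        for start in range(0, len(words), chunk_size):
            chunk = " ".join(words[start:start + chunk_size])
            self._items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_qa_prompt(index: ToyVectorIndex, question: str) -> str:
    """Put the retrieved chunks in front of the question as context."""
    context = "\n".join(index.search(question))
    return f"Answer using only this context:\n{context}\nQuestion: {question}"


if __name__ == "__main__":
    index = ToyVectorIndex()
    index.add("MySQL supports the InnoDB storage engine with row-level locking.")
    index.add("PostgreSQL uses MVCC to handle concurrent transactions.")
    print(index.search("How does PostgreSQL handle concurrency?", k=1))
```

Because every data type ends up as the same kind of (chunk, vector) pair, swapping the toy embedding or the in-memory list for a real model or vector database does not change the retrieval flow.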
Install
Multilingual switching
To switch languages, set the LANGUAGE parameter in the .env configuration file. The default is English; supported values are zh (Chinese) and en (English), with other languages to be added.
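The language switch comes down to reading one key from `.env` and falling back to English. The sketch below parses simple `KEY=VALUE` lines with the standard library only; it illustrates that lookup and is not DB-GPT's own configuration loader. `SUPPORTED_LANGUAGES`, `parse_env`, `resolve_language`, and the fallback for unknown codes are assumptions made for illustration.

```python
from pathlib import Path

# Codes listed in the README: "zh" (Chinese) and "en" (English, the default).
SUPPORTED_LANGUAGES = {"zh", "en"}
DEFAULT_LANGUAGE = "en"


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Trim whitespace and optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_language(env: dict[str, str]) -> str:
    """Return the configured language; assumed fallback to English if unset or unknown."""
    lang = env.get("LANGUAGE", DEFAULT_LANGUAGE).lower()
    return lang if lang in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text(encoding="utf-8")) if env_file.exists() else {}
    print(resolve_language(env))
```

For example, a `.env` containing the line `LANGUAGE=zh` resolves to `zh`, while a file without a LANGUAGE entry resolves to `en`.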
Instructions for use
Multi-model use
If you encounter nltk-related errors when using the repository, you need to download the nltk data packages. For more details, see the nltk documentation. Run the Python interpreter and enter the following commands:
>>> import nltk
>>> nltk.download()
We provide a brand-new user interface for using DB-GPT. We have also prepared the following reference articles covering the code and principles behind the project.
- Large Model Hands-on Series (1): Hands-on with the Langchain-Vicuna Application
- Large Model Hands-on Series (2): DB-GPT Alibaba Cloud Deployment Guide
- Large Model Hands-on Series (3): Principles and Applications of the DB-GPT Plug-in Model
Acknowledgements
This project builds on the work of the technical community, and we especially thank the following projects:
- FastChat for providing the chat service
- vicuna-13b as the base model
- langchain for the toolchain
- Auto-GPT for the general plug-in template
- Hugging Face for large model management
- Chroma for vector storage
- Milvus for distributed vector storage
- ChatGLM as a base model
- llama-index for In-Context Learning based on the existing knowledge base, enhancing database-related knowledge
Contributing
- Please run the following command before submitting code:
black .
DB-GPT is a complex and innovative tool for databases. The project is under rapid development, and new features will be released one after another. If you have any specific questions while using it, please open an issue in the project first. If necessary, contact me via the WeChat account below, and I will do my best to help. Everyone is very welcome to contribute to the project.