TiDB Bot: Using Generative AI to build an enterprise-specific user assistant bot

This article introduces how PingCAP used Generative AI to build a user assistant bot backed by an enterprise-specific knowledge base. In addition to the knowledge-base-grounded answering approach commonly used in the industry, we also experimented with using the model itself to detect toxicity via few-shot prompting. After launch, the thumbs-down rate from users was below 5%, and the bot has been rolled out across TiDB's channels for global customers.

The magic of Generative AI has unfolded.

Since 2022, Generative AI (hereinafter GenAI) has swept the world. MidJourney ( https://www.midjourney.com/ ) and DALL-E ( https://openai.com/dall-e-2 ) popularized text-to-image generation, and then ChatGPT ( https://openai.com/chatgpt ) fully captured people's attention with natural, fluent conversations, making GenAI a topic that can no longer be avoided. Whether AI can help humans live and work better in more general scenarios has become one of the core questions of 2023.

Meanwhile, the rise of development tools such as LangChain ( https://www.langchain.com/ ) means that engineers have begun building GenAI-based applications at scale. PingCAP has also run some experiments and completed some work, such as:

● OSS Insight’s Data Explorer ( https://ossinsight.io/explore/ ): a project that uses natural language to generate SQL for exploring GitHub open source software

● TiDB Cloud’s Chat2Query ( https://docs.pingcap.com/tidbcloud/explore-data-with-chat2query ): a project that generates SQL from natural language against databases in TiDB Cloud.

After building these applications, the author began to wonder whether GenAI's capabilities could be used to build more general applications and bring greater value to users.

Identifying the need

With the steady global growth of TiDB and TiDB Cloud, supporting global users has become increasingly important. While the number of users grows geometrically, the number of user-facing support staff cannot grow nearly as fast. How to serve a large number of users has therefore become an urgent question.

Based on hands-on support experience and surveys of user questions in the global community and the internal ticketing system, more than 50% of user questions can actually be answered by the official documentation; the documentation is simply so large that the answers are hard to find. So if we could provide a bot equipped with the knowledge of all official TiDB documentation, it might help users use TiDB better.

The gap between Generative AI and the requirements

Having identified the need, we also have to understand the characteristics and limitations of GenAI to confirm whether it can be applied here. Based on the work completed so far, the author can summarize some characteristics of GenAI. Here, GenAI mainly refers to GPT (Generative Pre-trained Transformer) style models focused on text dialogue, and the remainder of this article is described in terms of GPT.

1  Capabilities of GPT

●  The ability to understand semantics  . GPT has extremely strong semantic understanding and can read essentially any text without difficulty. Whatever the language (human or computer), whatever the level of expression, even text that mixes multiple languages or contains grammatical and wording errors, GPT can still understand the user's question.

●  The ability of logical reasoning  . GPT has a degree of logical reasoning capability. Without any special prompting, GPT can make simple inferences and dig into the deeper content of a problem. With certain prompting techniques, GPT can achieve stronger reasoning. These techniques include Few-shot, Chain-of-Thought (CoT), Self-Consistency, Tree of Thoughts (ToT), and so on.

●  The ability to attempt to answer all questions  . GPT, especially chat-tuned GPT such as GPT-3.5 and GPT-4, will always try to answer every user question in conversational form, subject to its alignment values, even if the answer amounts to "I can't answer that."

●  The ability of general knowledge  . GPT itself holds a large amount of general knowledge, with high accuracy and wide coverage.

●  The ability to hold multi-turn dialogue  . Based on the roles it is given, GPT can follow the meaning of multiple turns of conversation between different speakers, which means a follow-up question can build on the dialogue instead of repeating all the key historical information in every message. This matches human thinking and conversational logic well.

2  Limitations of GPT 

●  Passive triggering  . GPT replies only after the user provides some content; it does not initiate interactions on its own.

●  Out-of-date knowledge  . This applies specifically to GPT-3.5 and GPT-4, whose training data ends in September 2021, so GPT knows nothing about what happened afterwards. You cannot expect GPT itself to provide new knowledge.

●  Hallucination in specialized domains  . Although GPT performs excellently on general knowledge, in a specific knowledge domain, such as the database industry where the author works, most of GPT's answers contain errors of one kind or another and cannot be accepted as-is.

●  Conversation length  . GPT limits the length of each conversation, so if you provide content exceeding that limit, the call will fail.

3  Gaps in meeting the requirements

The author hopes to use GPT to implement an "enterprise-specific user assistant bot", which implies the following requirements:

● Requirement 1: Hold multi-turn dialogue, understand the user’s questions, and give answers.

● Requirement 2: The content about TiDB and TiDB Cloud in the answer must be correct.

● Requirement 3: Must not answer content unrelated to TiDB and TiDB Cloud.

Analyzing these requirements:

● Requirement 1: basically met, thanks to GPT's abilities to understand semantics, reason logically, attempt to answer questions, and hold multi-turn dialogue.

● Requirement 2: not met, because of GPT's out-of-date knowledge and hallucination in specialized domains.

● Requirement 3: not met. Because GPT tries to answer all questions, any question will get an answer, and GPT itself does not refuse non-TiDB questions.

Therefore, building this assistant bot is mainly about solving Requirements 2 and 3.

Answering correctly with domain-specific knowledge

This section addresses Requirement 2.

Making GPT answer user questions based on specific domain knowledge is not a new field. In the author's earlier optimization of OSS Insight's Data Explorer, using domain-specific knowledge improved the executability rate of natural-language-generated SQL (that is, the rate at which the generated SQL runs successfully in TiDB) by more than 25%.

What we need here is the similarity search capability of a vector database. This generally takes three steps:

1  Store domain knowledge in a vector database

The first step is to put the official documentation of TiDB (  https://docs.pingcap.com/tidb/stable  ) and TiDB Cloud (  https://docs.pingcap.com/tidbcloud  ) into the vector database.

After obtaining the documents, feed their text content into the Embedding model to generate the corresponding vectors, and store these vectors in a specific vector database.

In this step, there are two points to watch:

● If the quality of a document is poor, or its format does not meet expectations, pre-process it first to convert it into relatively clean text that an LLM can easily understand.

● If a document is long and exceeds the length of a single GPT session, it must be split into chunks to meet the length limit. There are many splitting methods, such as splitting on specific characters (comma, period, semicolon), splitting by text length, and so on.
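The ingestion step above can be sketched as follows. This is a minimal illustration, not the production pipeline: `chunk_text` shows one possible separator-plus-length splitting rule, while `embed()` and `VectorStore` are stand-ins for a real Embedding model (e.g. an API call) and a real vector database.

```python
def chunk_text(text: str, max_len: int = 500, seps: str = ".;,") -> list[str]:
    """Split a document into chunks of at most max_len characters,
    preferring to break at sentence-level separators."""
    chunks, current = [], ""
    for ch in text:
        current += ch
        # Hard break at max_len, or soft break at a separator once the
        # chunk is at least half full.
        if len(current) >= max_len or (ch in seps and len(current) > max_len // 2):
            chunks.append(current.strip())
            current = ""
    if current.strip():
        chunks.append(current.strip())
    return chunks


class VectorStore:
    """Minimal in-memory stand-in for a vector database."""
    def __init__(self):
        self.rows = []  # list of (vector, text) pairs

    def add(self, vector: list[float], text: str) -> None:
        self.rows.append((vector, text))


def embed(text: str) -> list[float]:
    # Placeholder: a real system calls an Embedding model here.
    return [float(len(text)), float(sum(map(ord, text)) % 997)]


store = VectorStore()
docs = [
    "TiDB is a distributed SQL database. It is MySQL compatible.",
    "TiDB Cloud is the fully managed service for TiDB.",
]
for doc in docs:
    for chunk in chunk_text(doc, max_len=60):
        store.add(embed(chunk), chunk)
```

In a real deployment the chunk size would be tuned to the model's context limit, and the store would be a dedicated vector database rather than a Python list.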

2  Search for relevant content in the vector database

The second step is, when the user asks a question, to search the vector database for text content relevant to that question.

When a user starts a conversation, the system converts the user's message into a vector through the Embedding model, then queries the vector database with this vector against the stored document vectors. During the query, a similarity metric (such as cosine similarity or dot product) is used to find the most similar domain-knowledge vectors, and the text content corresponding to those vectors is extracted.

A user's question may require multiple documents to answer, so the search selects the Top N most similar results (currently N is 5). This Top N can span multiple documents and becomes the content provided to GPT in the next step.
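The Top-N similarity search can be sketched as follows, assuming the same (vector, text) row layout as before; a real vector database performs this search internally and far more efficiently (e.g. with approximate nearest-neighbor indexes).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0


def top_n(query_vec: list[float], rows: list[tuple[list[float], str]], n: int = 5) -> list[str]:
    """Return the texts of the n rows most similar to query_vec."""
    scored = sorted(rows, key=lambda r: cosine_similarity(query_vec, r[0]), reverse=True)
    return [text for _, text in scored[:n]]
```

With N = 5 as in the article, the five returned chunks are what gets packed into the prompt in step 3.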

3  Provide the relevant content and the user's question to GPT

The third step is to assemble all relevant information and provide it to GPT.

Include the task goal and the relevant domain knowledge in the system prompt, and assemble the chat history from past conversation turns. Feeding everything into GPT together yields an answer grounded in that domain knowledge.
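The assembly might look like the sketch below. The message layout follows the common chat-completion format (a system message, then alternating user/assistant history, then the new question); the system prompt wording here is an illustrative assumption, not the production prompt.

```python
def build_messages(domain_chunks: list[str],
                   chat_history: list[dict],
                   user_question: str) -> list[dict]:
    """Assemble the system prompt (task goal + retrieved knowledge),
    the prior conversation, and the new question into one message list."""
    system = (
        "You are TiDB Bot, an assistant for TiDB and TiDB Cloud.\n"
        "Answer ONLY from the knowledge below; if the answer is not there, "
        "say you don't know.\n\n"
        "Knowledge:\n" + "\n---\n".join(domain_chunks)
    )
    messages = [{"role": "system", "content": system}]
    messages += chat_history  # e.g. [{"role": "user", ...}, {"role": "assistant", ...}]
    messages.append({"role": "user", "content": user_question})
    return messages
```

The resulting list is what would be sent to the chat model in a single call.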

After completing the above steps, Requirement 2 is basically met: the bot answers based on specific domain knowledge, and answer accuracy improves greatly compared with asking GPT directly.

Limiting the answer domain

This section addresses Requirement 3.

This bot is provided to users as part of enterprise support, so it should only answer content related to the enterprise, such as TiDB and TiDB Cloud themselves, SQL questions, application-building questions, and so on. For anything beyond that scope, such as weather, cities, or art, the bot is expected to refuse to answer.

Because GPT's "ability to try to answer all questions" was mentioned before, for the setting of GPT itself, any question should be answered in line with human values. Therefore, we cannot rely on GPT to help us build this layer of restrictions. We can only try to impose restrictions on the application side.

Only by meeting this requirement can the business truly go online and serve users. Unfortunately, the industry currently has no good off-the-shelf implementation of this, and most application designs do not cover it.

1  Concept: Toxicity

As mentioned just now, GPT tries to keep its answers consistent with human values. In model training this step is called alignment ("Align"), and it teaches GPT to refuse questions involving hatred and violence. When GPT refuses such a question as configured, it is said to have detected toxicity.

Therefore, for the bot the author is building, the scope of toxicity effectively expands: any content outside the company's business can be regarded as toxic. Under this definition we can draw on previous detoxification work. Johannes Welbl et al. (2021) of DeepMind ( https://aclanthology.org/2021.findings-emnlp.210.pdf ) showed that language models can be used to detect toxicity. GPT's capabilities are now strong enough that it has become feasible to use GPT directly to judge whether a user's question falls within the company's business scope.

To "limit the answer field", two steps are required.

2  Judging whether a question is in scope

The first step is to judge the user's original question.

Here we use the few-shot method to construct a toxicity-detection prompt, so that, given several examples, GPT can judge whether the user's question is within the scope of the enterprise's services.

For example:

<< EXAMPLES >>
instruction: who is Lady Gaga?
question: is the instruction out of scope (not related with TiDB)?
answer: YES
instruction: how to deploy a TiDB cluster?
question: is the instruction out of scope (not related with TiDB)?
answer: NO
instruction: how to use TiDB Cloud?
question: is the instruction out of scope (not related with TiDB)?
answer: NO

After the judgment, GPT will output the text "YES" or "NO" for subsequent processing. Note that YES here means toxic (not related to the business) and NO means non-toxic (related to the business).
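The few-shot judgment can be sketched as below. The prompt mirrors the examples shown above, while the function names and the exact parsing rule for the model's YES/NO reply are illustrative assumptions rather than the production implementation.

```python
# Few-shot examples, matching the << EXAMPLES >> block shown above.
FEW_SHOT = """\
<< EXAMPLES >>
instruction: who is Lady Gaga?
question: is the instruction out of scope (not related with TiDB)?
answer: YES
instruction: how to deploy a TiDB cluster?
question: is the instruction out of scope (not related with TiDB)?
answer: NO
instruction: how to use TiDB Cloud?
question: is the instruction out of scope (not related with TiDB)?
answer: NO
"""


def build_toxicity_prompt(user_question: str) -> str:
    """Append the user's question in the same format, leaving 'answer:'
    for the model to complete."""
    return (FEW_SHOT
            + f"instruction: {user_question}\n"
            + "question: is the instruction out of scope (not related with TiDB)?\n"
            + "answer:")


def is_toxic(model_output: str) -> bool:
    """YES -> out of scope (toxic); anything else -> in scope."""
    return model_output.strip().upper().startswith("YES")
```

The prompt would be sent to GPT, and `is_toxic` applied to the completion to drive the branching described next.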

In the second step, once we have the toxicity result, we branch the flow into the toxic (abnormal) path and the non-toxic (normal) path.

The normal path is the domain-knowledge answering flow described above, so here we mainly explain the abnormal path.

When the system sees the output "YES", it routes the flow into the toxic-content reply path. A system prompt that refuses to answer, together with the user's question, is then submitted to GPT, and the end user receives a refusal.
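The branching can be sketched as follows, assuming hypothetical `answer_with_knowledge` and `refuse` callables standing in for the normal answering flow and the refusal call to GPT; the refusal prompt wording is illustrative.

```python
# Illustrative wording; the production refusal prompt is not shown in the article.
REFUSAL_SYSTEM_PROMPT = (
    "You are TiDB Bot. The user's question is outside the scope of TiDB "
    "and TiDB Cloud. Politely decline to answer."
)


def route(user_question: str, toxicity_answer: str,
          answer_with_knowledge, refuse) -> str:
    """Branch on the YES/NO toxicity judgment from the previous step:
    YES -> refusal path, NO -> normal domain-knowledge answering path."""
    if toxicity_answer.strip().upper().startswith("YES"):
        return refuse(REFUSAL_SYSTEM_PROMPT, user_question)
    return answer_with_knowledge(user_question)
```

In the real system both callables would each make a GPT call; here they are injected so the routing logic stands alone.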

With these two steps, Requirement 3 is basically met.

Overall logical architecture

At this point, we have an assistant bot that can provide users with enterprise-specific domain knowledge. We call this bot TiDB Bot.

Results after TiDB Bot went online

TiDB Bot began internal testing on March 30 and was officially opened to Cloud users on July 11.

In the 103 days of TiDB Bot's incubation, thanks to the feedback from countless community members and developers on the test product, TiDB Bot gradually became generally available. During the testing phase, 249 users sent 4,570 messages in total. By the end of testing, 83 users had given 266 pieces of feedback; thumbs-down feedback accounted for 3.4% of all messages and thumbs-up feedback for 2.1%.

Beyond the community users who used it directly, there were also users who offered suggestions and ideas, and community users who contributed further solutions. Thanks to all the communities and developers; without you there would be no TiDB Bot release.

Follow-up

As the number of users grows, there are still many challenges in the accuracy of the recalled content and of the toxicity judgment. In actual service, the author has therefore kept optimizing TiDB Bot's accuracy and steadily improving its answers. These topics will be covered in subsequent articles.


Origin blog.csdn.net/TiDB_PingCAP/article/details/132274782