From knowledge base to interactive bot: a GPT tutorial for building a content-based intelligent dialogue system



1. Introduction

This idea comes from a personal need. I have published nearly 100 issues of my newsletter and accumulated a lot of content. I wanted to import these materials into an AI so that it could use them to answer my questions and even give me writing suggestions.

At first, I tried a very naive method: pasting my newsletter text into the prompt every time I asked a question. The prompt looked roughly like this:

Please summarize the following sentences to make them easier to understand.

Text: """
My newsletter
"""


This method works, but ChatGPT currently has a hard limit: a maximum of 4,096 tokens, which is roughly 16,000 characters. Note that this budget covers both the request and the response, so the space actually available for the request is even smaller. In other words, I cannot feed ChatGPT too much content at a time (a single issue of my newsletter runs nearly 5,000 words). This problem blocked me for a long time, until I came across GPT Index and the Lenny's Newsletter example.
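If you want to check how much of that 4,096-token budget your own text would eat up, a quick sketch using OpenAI's tiktoken tokenizer looks like this. This is not part of the original tutorial; the newsletter.txt path and the model name are my own illustrative assumptions:

# A quick way to check whether a document fits in the 4,096-token budget.
# Assumes the newsletter text is stored in newsletter.txt; adjust the path and model as needed.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")  # tokenizer matching the chat models

with open("newsletter.txt", encoding="utf-8") as f:
    text = f.read()

n_tokens = len(enc.encode(text))
print(f"{n_tokens} tokens used by the document alone")
print("Fits in one request?", n_tokens < 4096)  # the response also counts against the limit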

I tried it and it works very well, and the steps are simple: even if you don't know how to program, you can follow along and build this yourself.

I have slightly optimized the code from that example and added a short introduction to how it works. I hope you find it useful.

2. How it works

In fact, traditional chatbots already have off-the-shelf solutions for this kind of need. You have probably seen e-commerce customer service bots with similar functionality: you type a sentence and the bot replies.

These traditional bots usually answer questions based on intents. Suppose we build a customer service bot; it works like this:

[Figure: how an intent-based customer service bot matches a user question to an intent and returns that intent's answer]

When a user asks "What should I do if I forgot my password?", the bot finds the closest matching intent, such as "password". Each intent contains a set of sample questions, for example "How do I retrieve a forgotten password?" and "What should I do if I forgot my password?", and each intent is paired with an answer such as "Click button A to reset your password". The bot matches the user's question to the closest sample question and returns that intent's answer.

But there is a problem with this approach: we have to define a lot of intents, such as "unable to log in", "forgot password", and "login error". Even though they may all describe the same thing, we still have to set up three intents and three sets of questions and answers.

Although traditional bots have many limitations, this approach gives us some inspiration.

We can borrow this idea to work around the token limit: we only pass the documents that match a given intent to the AI, and the AI generates its answer using only those documents:

[Figure: match the question to an intent, retrieve the related document from the knowledge base, and pass it to GPT-3 to generate the answer]

Take the customer service bot above as an example. When the user asks "What should I do if I forgot my password?", the question is matched to the "login" intent, then documents in the knowledge base with the same or a similar intent are retrieved, such as a "Login troubleshooting guide". Finally, we pass this document to GPT-3, which uses its content to generate the answer.

Simply put, the GPT Index library handles the left-hand part of the diagram above. It works as follows:

  1. Build an index over the knowledge base or documents
  2. Find the index entries most relevant to the question
  3. Pass the content of those entries to GPT-3
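To make this retrieve-then-generate flow concrete, here is a minimal sketch written directly against the OpenAI API (in the pre-1.0 openai Python SDK style), without GPT Index. The document list, model names, and prompt wording are my own illustrative choices, not the tutorial's actual code:

# Minimal retrieve-then-generate sketch (openai<1.0 SDK style).
# 1. Embed every document once, 2. embed the question, 3. pick the closest
#    document by cosine similarity, 4. let the completion model answer from it.
import numpy as np
import openai

openai.api_key = "sk-..."  # your own key

documents = [
    "Login troubleshooting guide: if you forgot your password, click button A to reset it.",
    "Shipping policy: orders are shipped within 3 business days.",
]

def embed(text):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

doc_vectors = [embed(d) for d in documents]          # step 1: index the knowledge base

question = "What should I do if I forgot my password?"
q_vector = embed(question)                           # step 2: embed the question

similarities = [cosine(q_vector, v) for v in doc_vectors]
best_doc = documents[int(np.argmax(similarities))]   # step 3: most relevant document

prompt = (
    'Answer the question using only the context below.\n\n'
    f'Context: """{best_doc}"""\n\n'
    f'Question: {question}\nAnswer:'
)
resp = openai.Completion.create(model="text-davinci-003", prompt=prompt,
                                max_tokens=256, temperature=0)
print(resp["choices"][0]["text"].strip())            # step 4: grounded answer

The newer 1.x openai SDK replaces these calls with client.embeddings.create and chat completions, but the overall flow is the same.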

3. Limitations and caveats

Although this approach works around the token limit, it has its own limitations:

  1. When the user asks a vague question, the match may be wrong, so GPT-3 receives the wrong content and ends up generating a wildly inaccurate answer.
  2. When the user asks for information without much context, the bot sometimes fabricates an answer.

So if you want to use this technique for a customer service bot, I suggest that you:

  1. Use guiding questions to clarify the user's intent first, much like traditional customer service bots do, for example by offering a few buttons for the user to click (such as "unable to log in").
  2. If the similarity score is too low, fall back to a default answer such as "Sorry, I can't answer your question. Would you like to talk to a human agent?" (see the sketch after this list).
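Building on the retrieval sketch above, a similarity-threshold fallback could look like the following. The 0.8 threshold and the fallback wording are arbitrary values you would tune for your own data:

# Fall back to a canned answer when no document is similar enough to the question.
# `similarities` and `documents` come from the retrieval sketch above; 0.8 is an arbitrary threshold.
FALLBACK = "Sorry, I can't answer your question. Would you like to talk to a human agent?"
SIMILARITY_THRESHOLD = 0.8

def answer_or_fallback(similarities, documents, generate_answer):
    best = max(range(len(documents)), key=lambda i: similarities[i])
    if similarities[best] < SIMILARITY_THRESHOLD:
        return FALLBACK                      # low confidence: hand off instead of guessing
    return generate_answer(documents[best])  # high confidence: answer from the matched document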

4. Practice

To make it easier for everyone to use, I have put the code in Google Colab. You don't need to set up any environment; just open it in a browser: Code file

By the way, you can make a copy and save it to your Google Drive.

Note:

Several readers have told me that the button below cannot be clicked. The image below is just a screenshot; you need to open the code file itself to run anything. As for answers that don't meet expectations, the cause is mainly imperfect vector matching, and there is no fix for that at the moment.

Step 1: Import data

There are two ways to import data. The first is to import online data.

Importing data from GitHub is relatively easy. If this is your first time, I suggest trying this method first. Clicking the play button next to the code cell below will run it.

[Screenshot: the Colab play button next to the git clone cell]

After it runs, it imports several newsletters I wrote. If you want to import your own data the same way, just change the repository URL after clone.
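For reference, that cell is essentially a Colab shell command along these lines; the repository URL here is a placeholder for your own repo, not the tutorial's actual address:

# In Colab, a leading "!" runs a shell command inside the notebook.
# Replace the placeholder URL with the Git repository that holds your own documents.
!git clone https://github.com/<your-username>/<your-repo>.git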

The second method is to import offline data. Click the folder button on the left (if you are not logged in, this step will ask you to log in), then click the upload button marked 2 in the figure below to upload your files. If you want to upload multiple files, I recommend creating a folder first and uploading all of them into it.

[Screenshot: the Colab file panel, with the folder button and the upload button marked 2]

Steps 2 & 3: Install the dependencies and set the parameters

Just click the play button.

In the third step, however, you can try changing the parameters. The ones worth adjusting are:

  1. num_outputs: the maximum number of output tokens. The larger the number, the longer the answers the model can give.
  2. temperature: controls the randomness of the model's output. In short, the lower the temperature, the more deterministic the results, but also the more plain and predictable. If you want more surprising answers, feel free to increase it. For fact-based scenarios such as data extraction or FAQ answering, it is best to set it to 0.

You can leave the other parameters alone; they don't matter much. The sketch below shows roughly what these cells contain.
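This sketch is based on the gpt-index and langchain packages that were current when this tutorial was written; the exact package versions and variable names in the notebook may differ (newer releases renamed gpt-index to llama-index):

# Step 2 (in Colab): install the libraries the notebook depends on.
!pip install gpt-index langchain openai

# Step 3: the tunable parameters discussed above.
max_input_size = 4096      # the model's total token limit (prompt + response)
num_outputs = 256          # maximum tokens in the generated answer
max_chunk_overlap = 20     # overlap between document chunks when the index splits text
chunk_size_limit = 600     # maximum tokens per chunk
temperature = 0            # 0 = deterministic, factual answers; raise it for more creative output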

Step 4: Set OpenAI API Key

This requires you to log in to OpenAI (note: the OpenAI platform, not ChatGPT). Click your avatar in the upper right corner and click "View API Keys", or go there directly via this link. Then click "Create new secret key", copy the key, and paste it into the notebook.

[Screenshot: the OpenAI API Keys page with the "Create new secret key" button]
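In the notebook this usually amounts to a single line that stores the key in an environment variable so the libraries can pick it up. The exact cell in the tutorial may look different, but a typical version is:

# Make the key available to the openai / gpt-index libraries.
# Never commit a real key to a public repository or notebook.
import os
os.environ["OPENAI_API_KEY"] = "sk-..."  # paste your own secret key here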

Step 5: Build the index

This step runs over the data imported in step 1 and calls OpenAI's embeddings API. If you uploaded your own data in step 1, just change 'Jimmy-Newsletter-Corpus' (inside the quotes) to the name of the folder you uploaded.

Notice:

  • This step consumes your OpenAI credits. The price is $0.02 per 1,000 tokens, so before running the code below, check that your account has credit available.
  • If you are using a free OpenAI account, you may hit a rate-limit warning. If that happens, wait a while before running the code again (importing a very large knowledge base can also trigger it). The best way to lift this restriction is to add a credit card on the Billing page of your OpenAI account; how to do that is something you will need to look up yourself.
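For reference, the index-building cell in notebooks of this kind is usually a variant of the following sketch, written against the early-2023 gpt-index API (later renamed llama-index, with different class names). Treat it as an approximation, not the notebook's exact code:

# Build a vector index over every file in the given folder (early-2023 gpt-index API).
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader, LLMPredictor, PromptHelper
from langchain import OpenAI

def construct_index(directory_path):
    # The same parameters discussed in step 3.
    max_input_size = 4096
    num_outputs = 256
    max_chunk_overlap = 20
    chunk_size_limit = 600

    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap,
                                 chunk_size_limit=chunk_size_limit)
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003",
                                            max_tokens=num_outputs))

    # Read every document in the folder, embed it, and build the index.
    documents = SimpleDirectoryReader(directory_path).load_data()
    index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor,
                                 prompt_helper=prompt_helper)
    index.save_to_disk('index.json')  # cache the index so it isn't rebuilt every run
    return index

index = construct_index('Jimmy-Newsletter-Corpus')  # or the folder you uploaded in step 1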

Step 6: Ask questions

At this point you can start asking questions. If you imported my preset data in step 1, try questions like:

  • What is the main content of Issue 90?
  • Recommend a book similar to the one mentioned in Issue 90

If you imported your own data, you can also try these types of tasks:

  • Summarization
  • Question answering
  • Information extraction
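Under the same early-2023 gpt-index API assumed above, the question-asking cell boils down to something like the following; the response_mode argument and the sample question are illustrative:

# Load the cached index and query it (early-2023 gpt-index API).
from gpt_index import GPTSimpleVectorIndex

index = GPTSimpleVectorIndex.load_from_disk('index.json')
response = index.query("What is the main content of Issue 90?", response_mode="compact")
print(response)  # the answer is generated only from the best-matching newsletter chunks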

That's the whole walkthrough. Follow me and send a private message to get free GPT learning materials, Midjourney AI painting materials, and GoGPT.VIP tutorials.

Search for the AI innovation workshop GOGPT. Embrace AI, embrace GPT, embrace a bright future!

