Quick start: how to build a private knowledge base with a ChatGPT-like tool

privateGPT is an open-source project that you can deploy locally. It lets you import company or personal private documents without a network connection and then ask questions about them in natural language, just as you would with ChatGPT.

No internet connection is required: you harness the power of LLMs to ask questions of your documents, and both importing documents and asking questions work fully offline. It is 100% private; no data leaves your execution environment at any time. It is built with LangChain, GPT4All, LlamaCpp, Chroma and SentenceTransformers.

What documents does privateGPT support?

txt, CSV, Word, HTML, Markdown, PDF, PPT, and more.

privateGPT project address

https://github.com/imartinez/privateGPT

The author put this guide together after running into many pitfalls, so following the steps in order should lead to a successful installation. The article is long; you may want to bookmark it before reading. All the resources used in the tutorial can be downloaded from a single Baidu network disk share: follow the official account "AI technology practice" and reply "privateGPT" to get the link. If you run into problems during deployment, you can send the author a private message through the official account.

Installation Environment

Windows 10/Windows 11, with at least 20 GB of free disk space.

1. Download the model

Download address: https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin

The file is over 4 GB, so you can carry on with the following steps while it downloads.
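If you prefer to script the download, the minimal sketch below streams the file from the URL above into a local models folder. The helper name download_model and the folder name are illustrative assumptions, and the URL may no longer be served from gpt4all.io. It writes to a temporary .part file first, so a dropped connection never leaves a truncated model under the real name; you can move the finished file under privateGPT in step 5.1.

```python
import shutil
import urllib.request
from pathlib import Path

# Model URL from step 1 (it may have moved since the article was written).
MODEL_URL = "https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin"


def download_model(url: str, dest_dir: str = "models") -> Path:
    """Stream `url` into dest_dir and return the path of the saved file."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / url.rsplit("/", 1)[-1]
    # Write to a .part file first so an interrupted download never
    # leaves a truncated file under the final model name.
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        # Copy in 1 MiB chunks so the 4 GB file is never held in memory.
        shutil.copyfileobj(response, out, length=1024 * 1024)
    partial.replace(target)
    return target
```

For example, calling download_model(MODEL_URL) from a Python prompt blocks until the whole transfer has finished and then returns the path of the saved model.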

2. Install the software

2.1 Install Visual Studio 2022

Download link: Visual Studio 2022 | Free Download

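After downloading, run the installer.

Note: select the components marked by the arrow in the screenshot. For reference, the privateGPT README asks for "Universal Windows Platform development" and "C++ CMake tools for Windows" to be selected.

Click "Install" in the lower-right corner. The installation is slow, so you can continue with the following steps in the meantime.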

2.2 Install Python

Search for Python 3.10 or later in the Microsoft Store and install it; the author uses 3.10.

2.3 Install Git

Download address: https://git-scm.com/


2.4 Install MinGW (GCC for Windows)

Download address: MinGW - Minimalist GNU for Windows

After downloading, double-click the installer and wait for it to finish; this takes a while.

When it completes, click the Close button.
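Before moving on, it can help to confirm the tools from steps 2.2 to 2.4 are actually visible. The sketch below, whose helper name check_prerequisites is hypothetical, checks the Python version and looks up git and gcc on PATH with shutil.which. It does not check Visual Studio, because its compiler is normally only on PATH inside a Developer Command Prompt. With the classic MinGW installer gcc usually lives in C:\MinGW\bin, which you may need to add to PATH yourself.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a list of problems found; an empty list means the basics are in place."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version.split()[0]}"
        )
    # git comes from step 2.3 and gcc from the MinGW install in step 2.4;
    # both must be on PATH for cmd (and pip builds) to find them.
    for tool in ("git", "gcc"):
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    for line in check_prerequisites() or ["All prerequisites found."]:
        print(line)
```

Run it in a new cmd window after the installers finish, since PATH changes only reach newly opened windows.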

3. Download privateGPT source code

Create a new directory named aiworkspace in the root directory, enter it, and run the command below. (You can use a different directory, but staying consistent with the author makes it easier to compare your output with the article.)

git clone git@github.com:imartinez/privateGPT.git

If you have not set up an SSH key for GitHub, clone over HTTPS instead:

git clone https://github.com/imartinez/privateGPT.git

4. Install project dependencies

Run cmd as administrator (every later cmd step also uses an administrator prompt).

Change to the privateGPT directory cloned in the previous step and run the following command:

pip3 install -r requirements.txt

As shown in the figure, this step takes a long time:

When the author ran it, the command failed with an error recommending a pip upgrade; after upgrading pip as prompted, re-running the command above worked.

The dependencies installed successfully:
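The upgrade-then-install sequence from this step can be scripted as below. This is a sketch, and the helper names pip_commands and install_dependencies are hypothetical. It calls pip as "python -m pip" through sys.executable, which keeps pip tied to the same interpreter that will later run privateGPT; on Windows this form is also the usual way to upgrade pip itself, because pip cannot replace its own running executable.

```python
import subprocess
import sys


def pip_commands(requirements="requirements.txt"):
    """Build the two pip invocations for step 4 using the current interpreter."""
    return [
        # Upgrade pip first; "python -m pip" avoids locking pip.exe on Windows.
        [sys.executable, "-m", "pip", "install", "--upgrade", "pip"],
        [sys.executable, "-m", "pip", "install", "-r", requirements],
    ]


def install_dependencies(requirements="requirements.txt"):
    for cmd in pip_commands(requirements):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # stop at the first failing command
```

Call install_dependencies() from inside the privateGPT directory so requirements.txt resolves correctly.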

5. Import the model and documents

5.1 Import the model

Create a models directory under the privateGPT directory and put the model file downloaded in step 1 into it:

5.2 Import documents

The source_documents directory under privateGPT holds the source documents; put the documents you want to ask questions about here. A freshly downloaded copy of the source code contains a sample document, state_of_the_union.txt, in source_documents. Delete it, and copy Bryant's info.txt (downloaded from the network disk) into this directory.

Document content:

There is a man named Bryant who is Chinese. He was born in 1991 and works as a Java developer. He graduated in 2013 and worked at "Dev AI" for three years before moving to "Test AI" where he worked for another two years. In his free time, he enjoys watching movies, playing basketball, swimming, running, and hiking.
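If you don't have the network-disk file at hand, the sketch below recreates it from the paragraph quoted above. It assumes info.txt contains just that paragraph, and the helper name prepare_source_documents is hypothetical. It also removes the bundled state_of_the_union.txt sample so answers come only from your own document.

```python
from pathlib import Path

# Text of Bryant's info.txt, as quoted in the article.
INFO_TEXT = (
    "There is a man named Bryant who is Chinese. He was born in 1991 and works "
    'as a Java developer. He graduated in 2013 and worked at "Dev AI" for three '
    'years before moving to "Test AI" where he worked for another two years. '
    "In his free time, he enjoys watching movies, playing basketball, swimming, "
    "running, and hiking."
)


def prepare_source_documents(project_dir="."):
    """Replace the bundled sample with info.txt and return the folder's file names."""
    docs = Path(project_dir) / "source_documents"
    docs.mkdir(parents=True, exist_ok=True)
    # Delete the sample document if it is still there.
    (docs / "state_of_the_union.txt").unlink(missing_ok=True)
    (docs / "info.txt").write_text(INFO_TEXT + "\n", encoding="utf-8")
    return sorted(p.name for p in docs.iterdir())
```

Run it from the privateGPT directory with prepare_source_documents("."); afterwards source_documents should contain info.txt plus any documents you added yourself.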

5.3 Modify .env

Copy example.env and rename it to .env.
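The copy step can also be done from Python, together with a quick check that the model path in the settings points at the file placed in step 5.1. This sketch assumes the settings file is plain KEY=VALUE lines and that the model location is stored under a key named MODEL_PATH relative to the privateGPT directory; check your copy of example.env, since keys can differ between versions. The helper names create_env and model_path_ok are hypothetical.

```python
import shutil
from pathlib import Path


def create_env(project_dir="."):
    """Copy example.env to .env (unless .env exists) and return its KEY=VALUE pairs."""
    root = Path(project_dir)
    env_file = root / ".env"
    if not env_file.exists():
        shutil.copyfile(root / "example.env", env_file)
    settings = {}
    for line in env_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def model_path_ok(project_dir=".", settings=None):
    """True if MODEL_PATH (assumed key name) names an existing file."""
    if settings is None:
        settings = create_env(project_dir)
    model = settings.get("MODEL_PATH")
    return model is not None and (Path(project_dir) / model).is_file()
```

If model_path_ok(".") returns False, either the model is not in the models directory yet or the file name in .env differs from the one you downloaded.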

5.4 Index the documents

In cmd, change to the privateGPT directory and run the following command to have privateGPT index our documents:

python ingest.py

The first run downloads some additional files (such as the embedding model), so later runs are faster.

Here is a screenshot of the completed indexing:

When indexing finishes, a db directory is generated automatically inside privateGPT. This is privateGPT's database: privateGPT answers questions from the data stored here, so the documents in source_documents can be deleted afterwards.
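Part of what indexing does is split each document into overlapping text chunks before they are embedded and stored in the db directory. The sketch below illustrates that idea with simple fixed-size character windows; it is a simplification, not privateGPT's code, which uses LangChain's RecursiveCharacterTextSplitter (that splitter prefers paragraph and sentence boundaries). The defaults of 500 characters with 50 characters of overlap are assumed values.

```python
def split_into_chunks(text, chunk_size=500, chunk_overlap=50):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk when it is shorter than the overlap. Bryant's info.txt is under 500 characters, so with these settings it would form a single chunk.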

To change the source content, edit the files in the source_documents directory, delete the db folder, and run the command above again to rebuild the database.
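The rebuild procedure can be wrapped in a small helper. This is a sketch: the name rebuild_index is hypothetical, and db is assumed to be the database directory name, matching the db folder described above. Close any running privateGPT.py first, because files still open inside db can prevent deletion on Windows.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def rebuild_index(project_dir=".", db_dir="db", run_ingest=True):
    """Delete the existing database directory, then optionally re-run ingest.py."""
    root = Path(project_dir)
    db = root / db_dir
    if db.exists():
        # Remove the old index so stale chunks are not mixed with the new ones.
        shutil.rmtree(db)
    if run_ingest:
        # Same command as step 5.4, run from the privateGPT directory.
        subprocess.run([sys.executable, "ingest.py"], cwd=root, check=True)
    return db
```

After editing files in source_documents, calling rebuild_index(".") from the privateGPT directory deletes db and runs python ingest.py again.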

6. Q&A using GPT

Finally, it's time to use it. We can now chat with privateGPT and ask it questions in natural language. Run the following command in the privateGPT directory:

python privateGPT.py

When the "Enter a query:" prompt appears, type your question on the command line. If you are using the document provided above, you can copy the following test questions directly:

hello, you play my assistant, I'm gonna ask you some questions and you should reply briefly, if you don't know the answer, just say you don't know, do you understand?

What's his job?
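To replay the same test questions without typing them each time, you can pipe them into privateGPT.py. This sketch assumes the script reads one question per line from standard input and leaves its loop when it receives "exit", as the version the author used appears to; if your version does not recognise "exit", the script will instead stop with an error at end of input, and check=True will raise it. The helper names build_stdin and ask_all are hypothetical.

```python
import subprocess
import sys


def build_stdin(questions):
    """One question per line, followed by 'exit' to leave the query loop (assumed)."""
    return "\n".join(list(questions) + ["exit"]) + "\n"


def ask_all(questions, project_dir="."):
    """Run privateGPT.py once, feed it every question, and return what it printed."""
    result = subprocess.run(
        [sys.executable, "privateGPT.py"],
        input=build_stdin(questions),
        capture_output=True,
        text=True,
        cwd=project_dir,
        check=True,  # raises CalledProcessError if the script exits with an error
    )
    return result.stdout
```

For example, print(ask_all(["What's his job?"])) run from the privateGPT directory loads the model once and prints the whole session, answers included.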

Write a simple web application that calls a Python script to implement a simple internal enterprise knowledge base system

Origin blog.csdn.net/qqerrr/article/details/132147623