Building and deploying a local private knowledge base system based on langchain + chatGLM

Foreword

1. Autonomous GPT

A so-called autonomous GPT is an Agent designed to plan, decide, and act on its own, reaching a given goal through continuous iteration. AutoGPT is a typical example.

AutoGPT pushes GPT's capabilities to a higher application level. Give it a goal such as "make xx dollars," and through ten or more rounds of interaction with the Internet and with GPT it achieves astonishing results. What is the principle behind it?

1.1 Working memory

ChatGPT is essentially stateless; the feeling of a continuous conversation comes from sending the previous rounds of dialogue history along with each new input. This kind of multi-turn dialogue is known academically as "short-term memory". The three common techniques are:

  • Actively guide with prompt
  • ReAct
  • Self Ask

With working memory, a complex task can be split into several small subtasks, guiding chatGPT to complete increasingly complex work.
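As a minimal sketch of short-term memory, langchain's ConversationChain keeps the dialogue history in a buffer and prepends it to every request; the example inputs below are purely illustrative and this is only one of several ways to do it:

from langchain import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# The buffer memory stores all previous turns and injects them into each new prompt,
# which is what gives the stateless LLM the appearance of a continuous conversation.
conversation = ConversationChain(llm=OpenAI(temperature=0), memory=ConversationBufferMemory())

print(conversation.predict(input="My favourite director is Zhang Yimou."))
print(conversation.predict(input="Which of his films should I watch first?"))  # relies on the previous turn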

When chatGPT is asked to play a particular role, we use "long-term memory": the long-term memory is stored in an external database, and when a chatGPT conversation starts it is fetched from that database and fed to chatGPT as the initial prompt.

What if the long-term memory is so large that it exceeds the number of tokens chatGPT can handle in one request?

This situation is very common: we compress the memory with an Embedding model and store it in a vector database, retrieving only the relevant parts when needed.
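A rough sketch of that idea, reusing the Chroma vector store that appears later in this article: the long-term memory is embedded once, and at conversation start only the pieces most similar to the current topic are retrieved and placed into the initial prompt (the memory snippets and the value of k are purely illustrative):

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Long-term memory: embed each snippet and store it in a vector database
memories = [
    "You are a veteran film critic.",
    "The user prefers Chinese cinema of the 1990s.",
    "Earlier conversations discussed Zhang Yimou's early work.",
]
memory_store = Chroma.from_texts(memories, OpenAIEmbeddings())

# At conversation start, retrieve only the memories relevant to the new topic
relevant = memory_store.similarity_search("recommend a film", k=2)
initial_prompt = "\n".join(doc.page_content for doc in relevant)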

1.2 Prompt engineering

If we think of the LLM as an "intelligent" creature, how should we phrase our questions to interact with it better?

In general, we divide a prompt into four categories (a composed example follows the list):

  • Instructions: the task the LLM is required to complete
  • Output format: requirements on the form of the LLM's output
  • Context: external information given to the large model as input
  • Question: the specific question for the LLM
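As a purely illustrative example (the wording is mine, not a fixed template), the four parts can simply be concatenated into one prompt:

# Instructions / Output format / Context / Question assembled into a single prompt
instructions = "You are a film database assistant. Answer using only the context provided."
output_format = "Answer in at most two sentences of plain text."
context = "Red Sorghum (1988) is a film directed by Zhang Yimou, adapted from Mo Yan's novel."
question = "Who directed Red Sorghum?"

prompt = f"{instructions}\n{output_format}\n\nContext:\n{context}\n\nQuestion: {question}"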

2. Introduction to langchain

2.1 Introduction to langchain

Once you have these LLM models, how do you actually use them?

Langchain is an intermediate framework that connects an APP with an LLM. See the example below:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain import OpenAI, VectorDBQA
from langchain.document_loaders import DirectoryLoader
from langchain.chains import RetrievalQA
import os
os.environ["OPENAI_API_KEY"] = 'your openai key'

# Load all csv files in the folder
loader = DirectoryLoader('/movies/', glob='**/*.csv')
# Convert the data into document objects; each file becomes one document
documents = loader.load()

# Initialize the text splitter
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)
# Split the loaded documents
split_docs = text_splitter.split_documents(documents)

# Initialize the openai embeddings object
embeddings = OpenAIEmbeddings()
# Compute embedding vectors for the documents with the openai embeddings object and store them
# temporarily in the Chroma vector database for later similarity matching
docsearch = Chroma.from_documents(split_docs, embeddings)

# Create the question-answering object
qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type="stuff", vectorstore=docsearch, return_source_documents=True)
# Ask a question ("A brief introduction to the movie Red Sorghum (红高粱)?")
result = qa({"query": "电影《红高粱》简介?"})
print(result)

The code above parses the movie data from the local csv files, splits the text and stores it in the Chroma vector database, passes the question and retrieved context to OpenAI GPT via the from_chain_type parameter llm=OpenAI(), and returns the result. With the langchain framework, this is the simplest form of AI-driven APP development.

2.1.1 Text Embedding

Langchain provides interface classes for various models that support text embedding. If you use OpenAI as the LLM, you will probably use the text-embedding-ada-002 model.

Text embedding encodes a piece of text as a vector, which can then be queried with an approximate-nearest-neighbor algorithm. Common usage:

# `embeddings` is the OpenAIEmbeddings() object created in the example above
query_result = embeddings.embed_query(text)        # embed a single query string

doc_result = embeddings.embed_documents([text])    # embed a list of documents

2.1.2 Document index

langchain defines four document indexing methods; a usage sketch follows the list:

  • Stuffing: stuff the documents directly into the prompt sent to OpenAI
  • MapReduce: run a prompt (answer or summary) over each chunk, then merge the results
  • Refine: run a prompt on the first chunk to get a result, then merge in the next chunk and refine the output, and so on
  • Map-Rerank: run a prompt over each chunk and score it, then return the answer from the best-scoring document
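A small sketch of how the method is selected in practice, reusing the docsearch vector store from the earlier example; this uses the RetrievalQA interface rather than the older VectorDBQA shown above, and switching the chain_type string is the only change needed:

from langchain import OpenAI
from langchain.chains import RetrievalQA

# chain_type can be "stuff", "map_reduce", "refine" or "map_rerank"
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="map_reduce",
    retriever=docsearch.as_retriever(),
)
result = qa({"query": "电影《红高粱》简介?"})  # same query as before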

2.2 langchain agent

To achieve AutoGPT-like functionality, one cannot rely only on static, pre-coded chains; the ability to generate chains dynamically is needed. This is where the concept of an Agent comes in, and langchain provides the corresponding feature: Agents.

2.2.1 Tools

Agents do not carry out the dynamic part of the work themselves; they rely on Tools. Besides the built-in Tools, user-defined tools are supported. There are two ways to define a Tool: build it from the Tool class, or inherit from BaseTool. The most convenient way is the Python decorator @tool:

from langchain.agents import tool
from pydantic import BaseModel, Field  # needed for the custom args_schema below

# Simplest form: the function name becomes the tool name
# and the docstring becomes the tool description
@tool
def search_api(query: str) -> str:
    """Searches the API for the query."""
    return f"Results for query {query}"


# Variant with an explicit name, a custom argument schema and return_direct=True
class SearchInput(BaseModel):
    query: str = Field(description="should be a search query")

@tool("search", return_direct=True, args_schema=SearchInput)
def search_api(query: str) -> str:
    """Searches the API for the query."""
    return "Results"

2.2.2 Agents

Custom agent

from typing import Any, List, Tuple, Union
from langchain.agents import BaseSingleActionAgent
from langchain.schema import AgentAction, AgentFinish

class FakeAgent(BaseSingleActionAgent):
    # Synchronous planning: given the intermediate steps so far,
    # decide the next action (or finish)
    def plan(
        self, intermediate_steps: List[Tuple[AgentAction, str]], **kwargs: Any
    ) -> Union[AgentAction, AgentFinish]:
        ...

    # Asynchronous version of plan
    async def aplan(
        self, intermediate_steps: List[Tuple[AgentAction, str]], **kwargs: Any
    ) -> Union[AgentAction, AgentFinish]:
        ...

LLM agent

# LLM chain consisting of the LLM and a prompt
# (llm, prompt, tools and output_parser are assumed to have been defined already)
llm_chain = LLMChain(llm=llm, prompt=prompt)

tool_names = [tool.name for tool in tools]
agent = LLMSingleActionAgent(
    llm_chain=llm_chain,
    output_parser=output_parser,
    stop=["\nObservation:"],
    allowed_tools=tool_names
)

MRKL agent

MRKL (Modular Reasoning, Knowledge and Language) consists of a set of modules (such as Google search, API calls, database queries, etc.) and a router that decides how to "route" natural-language queries to the appropriate module. In the langchain framework it is built from three pieces:

  1. Tools
  2. LLMChain: Generate text and decide which action to take based on the text
  3. Agent
from langchain import LLMChain, OpenAI
from langchain.agents import Tool, ZeroShotAgent
from langchain.utilities import SerpAPIWrapper

# Tools
search = SerpAPIWrapper()
tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for when you need to answer questions about current events"
    )
]

# LLMChain; a prompt must be supplied (e.g. built with ZeroShotAgent.create_prompt(tools, ...))
llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)

# Agent
tool_names = [tool.name for tool in tools]
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names)
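To actually run the agent, it is wrapped in an AgentExecutor together with its tools; a minimal sketch (the question is illustrative):

from langchain.agents import AgentExecutor

agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)
agent_executor.run("How many people live in Canada as of 2023?")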

There are many restrictions on using OpenAI in China, and customers also worry about data security, so we looked for an open-source alternative to GPT: ChatGLM-6B.

3. Introduction to chatGLM-6B

chatGLM-6B is based on the General Language Model (GLM) architecture and is optimized for Chinese question answering and dialogue. Trained on roughly 1T tokens of bilingual Chinese-English text and refined with supervised fine-tuning, feedback bootstrapping, reinforcement learning from human feedback and other techniques, it is an LLM with 6.2 billion parameters.
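For reference, loading and chatting with ChatGLM-6B directly through HuggingFace transformers looks roughly like this (following the model's README; the .half().cuda() variant requires a GPU):

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()

# "你好" means "Hello"; history carries the previous turns of the dialogue
response, history = model.chat(tokenizer, "你好", history=[])
print(response)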

# In the langchain-ChatGLM code, you only need to modify the line below in model_config.py to load the model locally
llm_model_dict = {
    "chatglm-6b-int4-qe": "THUDM/chatglm-6b-int4-qe",
    "chatglm-6b-int4": "THUDM/chatglm-6b-int4",
    "chatglm-6b": "C:\\AlexOuyang\\app\\GPT projects\\langchain-ChatGLM\\models\\chatglm-6b",
    "chatyuan": "ClueAI/ChatYuan-large-v2",
}

4. langchain-ChatGLM

langchain-ChatGLM is an implementation that uses the langchain framework with the chatGLM-6B model in place of chatGPT, so that enterprise-level private deployment can be realized. The code logic is as follows:
[figure: overall code logic of langchain-ChatGLM]

 

Judging from the development roadmap, this project still lags behind AutoGPT (the most important part, the Agent, has not been implemented yet).

The sections above are mainly notes I took while learning these concepts; if anything infringes, please contact me and I will remove it. What follows is a record of my own local deployment experience, which I hope will help those who need it.

An official example project combining langchain and chatGLM is provided; the address is here.

At present it has nearly 10k stars, which is quite impressive.

The authors also provide a video introducing the principles, here.

From the perspective of document processing, the implementation flow is: load the local documents, split the text, vectorize the chunks and store them, vectorize the incoming question, match the top-k most similar chunks, add them to the prompt as context, and let the LLM generate the answer.

Basically, it is easy to operate by following the readme of the official project. The main things to note are that you need to prepare your local environment, install some dependencies, and modify the corresponding configuration.

The project's configuration directory (configs, containing model_config.py among others) is where the modifications happen. Here are the main points to modify:

1. The text2vec model configuration is as follows:
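The original screenshot is not reproduced here; in the version I used, the relevant part of configs/model_config.py looks roughly like the sketch below (the keys and the exact HuggingFace repo name may differ between versions):

# configs/model_config.py (excerpt, approximate)
embedding_model_dict = {
    "text2vec": "GanymedeNil/text2vec-large-chinese",
}
# Select the text2vec embedding model
EMBEDDING_MODEL = "text2vec"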

2. The llm model configuration, which is the chatglm model configuration. Because of my limited hardware resources, I chose the int4-quantized model, as follows:
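Again as an approximate sketch of configs/model_config.py: the model is selected via the LLM_MODEL key, so choosing the int4-quantized checkpoint looks roughly like this:

# configs/model_config.py (excerpt, approximate)
# Use the int4-quantized ChatGLM-6B to fit limited GPU memory
LLM_MODEL = "chatglm-6b-int4"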

3. Modify how the model is loaded. I expect most people will download the model to a local directory and load it from there rather than loading it remotely from HuggingFace; the reasons need no explanation. The modification is the llm_model_dict change shown in section 3 above: point the "chatglm-6b" entry at your local model directory.

This way the model files you downloaded locally are loaded and read directly.

The official model repository address is here.

Tsinghua University also provides its own download site for the model files, here.

However, that mirror is not complete: only the model weight files are provided, and some code and configuration files still need to be downloaded from HuggingFace. The address is here.

In my case I use the int4 version of the model; its download address is here.

You can download it to a local directory of your choice.

After completing all of the above configuration, you can start the project from a terminal simply by running the webui.py module.

The effect after startup is as follows:

You can see that the upper-right corner offers three different answering modes. Since the goal here is a private knowledge base, the knowledge-base mode is used. You can upload your own knowledge-base files, wait for the model to finish loading, and then start asking questions. The test results are very telling: if you ask where a certain company is located, the base model cannot possibly know, but after the knowledge-base data is fed in it can answer. For example, for a company in Chaoyang District, Beijing, the plain model may only answer "in Beijing", so the knowledge base has a clearly visible effect. Of course, the project itself is still developing rapidly and its functionality is not yet stable; in my own tests, the same question sometimes produced different answers.

The above is a complete record of building and deploying a private knowledge base locally. If you are interested, give it a try!

Origin blog.csdn.net/Together_CZ/article/details/131326288