LangChain: A New Chapter for Large Language Models


This article introduces the LangChain framework, which enables more powerful applications by combining large language models (LLMs) with other sources of computation and knowledge. It then explains LangChain's key concepts in detail and walks through several small case studies built on the framework, to help readers understand how LangChain works.


Introduction

Recently, large language models (LLMs) such as the GPT series have led a technological revolution in artificial intelligence. Developers are building all kinds of experiments on top of these LLMs, and although many interesting applications have emerged, it is often difficult to build powerful, practical applications with an LLM alone.

LangChain enables more powerful AI applications by combining large language models with other knowledge bases and computational logic. Put simply, I personally think of LangChain as an open-source counterpart of GPT plugins: it provides a rich set of tools around large language models and can quickly extend a model's capabilities on top of open-source models.

Here I summarize what I have learned about LangChain recently, and I welcome everyone to discuss it. LangChain makes working with language technology more flexible and diverse, and it is expected to play an important role in artificial intelligence and transform how efficiently we work. We are on the eve of an AI explosion, and actively embracing new technology will bring a whole new experience.


LangChain main concepts and examples

LangChain provides a set of tools that help us make better use of large language models (LLMs). They fall into six main categories:


▐ Models

One of LangChain's core values is that it provides a standard model interface, so we can switch freely between different models. There are currently two categories of models; given typical usage scenarios, we generally use one of them: the text generation model.

When people talk about models, they usually mean something like ChatGPT. A bare model can only generate text content.

  • Language Models

Language models take text as input and return text as output. There are two subtypes:

  1. Plain LLM: Receives a text string as input and returns a text string as output

  2. Chat model: takes a list of chat messages as input and returns a chat message

Code example:

from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

# Plain LLM: takes a string, returns a string
llm = OpenAI()
# Chat model: speaks the chat-message interface
chat_model = ChatOpenAI()
print(llm("say hi!"))
print(chat_model.predict("say hi!"))


  • Text Embedding Models

Text embedding models convert text into a vector of floating-point numbers:

These models take text as input and return a list of floats. The floats usually capture the semantic information of the text, so they can be used for tasks such as text-similarity computation and cluster analysis. Text embedding models help developers create richer connections between texts and improve the performance of applications built on large language models.

from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
text = "This is a test document."
# Embed a single query string -> list of floats
query_result = embeddings.embed_query(text)
# Embed a batch of documents -> list of float lists
doc_result = embeddings.embed_documents([text])
print(doc_result)
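
To make the similarity use case concrete, here is a minimal sketch (not from the original article) that compares two embeddings with cosine similarity; the two example sentences are illustrative:

import math
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

v1 = embeddings.embed_query("The cat sits on the mat.")
v2 = embeddings.embed_query("A kitten is lying on the rug.")
print(cosine_similarity(v1, v2))  # values closer to 1.0 mean more similar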


▐ Prompts

Prompts are how we interact with a model, i.e., the model's input. Through prompts we can get the model to return the content we expect, for example having it return data in a specific format.

LangChain provides several tools that make it easier to build the prompts we want. The main ones are covered below, each with a short example.

  • PromptTemplates

PromptTemplates are prompt templates for language models. A template lets us generate and reuse prompts: it contains a text string (the "template") that takes a set of parameters from the user and produces a prompt. A prompt typically contains:

  1. A description of the role the language model should play

  2. A small set of examples to help the LLM generate better responses (see the FewShotPromptTemplate sketch after the code example)

  3. The specific question

Code example:

from langchain import PromptTemplate




template = """
I want you to act as a naming consultant for new companies.
What is a good name for a company that makes {product}?
"""


prompt = PromptTemplate(
    input_variables=["product"],
    template=template,
)
prompt.format(product="colorful socks")
# -> I want you to act as a naming consultant for new companies.
# -> What is a good name for a company that makes colorful socks?
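
Point 2 above (a small set of examples) is what FewShotPromptTemplate handles. A minimal sketch, assuming a toy antonym task:

from langchain import FewShotPromptTemplate, PromptTemplate

examples = [
    {"word": "happy", "antonym": "sad"},
    {"word": "tall", "antonym": "short"},
]
example_prompt = PromptTemplate(
    input_variables=["word", "antonym"],
    template="Word: {word}\nAntonym: {antonym}",
)
# prefix and suffix wrap the rendered examples
few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input.",
    suffix="Word: {input}\nAntonym:",
    input_variables=["input"],
)
print(few_shot_prompt.format(input="big"))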
  • ChatPromptTemplates

ChatModels take a list of chat messages as input. The list is generally composed of different prompt messages, and each message generally has a role (system, human, or AI).

from langchain.prompts import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

template = "You are a helpful assistant that translates {input_language} to {output_language}."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

# Combine the role-specific templates into one chat prompt
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

# Format the messages that would be sent to a chat model
print(chat_prompt.format_prompt(input_language="English", output_language="French", text="I love programming.").to_messages())


  • Example Selectors

If there are multiple examples, an ExampleSelector chooses which ones the prompt should use:

  1. Custom example selector

  2. Length-based selector: the longer the input, the fewer examples are selected; the shorter the input, the more examples can be included

  3. Relevance selector: selects the example most relevant to the input

The custom selector below picks two examples at random:

from langchain.prompts.example_selector.base import BaseExampleSelector
from typing import Dict, List
import numpy as np




class CustomExampleSelector(BaseExampleSelector):


    def __init__(self, examples: List[Dict[str, str]]):
        self.examples = examples


    def add_example(self, example: Dict[str, str]) -> None:
        """Add new example to store for a key."""
        self.examples.append(example)


    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:
        """Select which examples to use based on the inputs."""
        return np.random.choice(self.examples, size=2, replace=False)




examples = [
    {"foo": "1"},
    {"foo": "2"},
    {"foo": "3"}
]


# Initialize example selector.
example_selector = CustomExampleSelector(examples)
# Select examples
print(example_selector.select_examples({"foo": "foo"}))
# -> array([{'foo': '2'}, {'foo': '3'}], dtype=object)
# Add new example to the set of examples
example_selector.add_example({"foo": "4"})
print(example_selector.examples)
# -> [{'foo': '1'}, {'foo': '2'}, {'foo': '3'}, {'foo': '4'}]
# Select examples
print(example_selector.select_examples({"foo": "foo"}))
# -> array([{'foo': '1'}, {'foo': '4'}], dtype=object)
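
For comparison, a minimal sketch of the built-in length-based selector; the antonym examples are illustrative:

from langchain.prompts import PromptTemplate
from langchain.prompts.example_selector import LengthBasedExampleSelector

examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
]
example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)
# The longer the user input, the fewer examples are selected
example_selector = LengthBasedExampleSelector(
    examples=examples,
    example_prompt=example_prompt,
    max_length=25,  # maximum combined length, measured in words
)
print(example_selector.select_examples({"input": "big"}))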


  • OutputParsers

An output parser (OutputParser) lets the LLM return more structured information. It has two main methods:

  1. get_format_instructions(): instructs the model how to format its output

  2. parse(str): parses the model's output into the desired format

The main parsers:

  1. CommaSeparatedListOutputParser: has the LLM return a comma-separated list, e.g. ['Vanilla', 'Chocolate', 'Strawberry', 'Mint Chocolate Chip', 'Cookies and Cream']

  2. StructuredOutputParser: produces structured output from a set of named fields, similar to PydanticOutputParser but without defining a class

  3. PydanticOutputParser: defines an object model and has the LLM return data that matches it

In the example below we define a Joke class, and PydanticOutputParser gets the LLM to return data in the shape of that class.

from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator

model_name = 'text-davinci-003'
temperature = 0.0
model = OpenAI(model_name=model_name, temperature=temperature)

# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    # You can add custom validation logic easily with Pydantic.
    @validator('setup')
    def question_ends_with_question_mark(cls, field):
        if field[-1] != '?':
            raise ValueError("Badly formed question!")
        return field

parser = PydanticOutputParser(pydantic_object=Joke)
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)
joke_query = "Tell me a joke."
_input = prompt.format_prompt(query=joke_query)
output = model(_input.to_string())
print(parser.get_format_instructions())
print(output)
print(parser.parse(output))
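
For the simpler list case, a minimal sketch of CommaSeparatedListOutputParser; the ice-cream query is illustrative:

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.output_parsers import CommaSeparatedListOutputParser

parser = CommaSeparatedListOutputParser()
prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
model = OpenAI(temperature=0)
output = model(prompt.format(subject="ice cream flavors"))
print(parser.parse(output))  # e.g. ['Vanilla', 'Chocolate', 'Strawberry', ...]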


▐ Indexes

Indexes structure documents so that an LLM can work with them directly and effectively, for example answering questions against a knowledge base, where the LLM first retrieves the answer from the documents.

LangChain also provides many useful functions and tools around indexing that make it easy to load and retrieve external document data.

For data indexing, the main tools LangChain provides are:

  1. Document Loaders: load documents from different data sources. After a loader reads a data source, the data must be converted into Document objects before further use.

  2. Text Splitters: split text into chunks. Every call to the OpenAI API, whether sending text as a prompt or using the embeddings endpoint, has a token limit; if we send a 300-page PDF and ask for a summary, the request will fail for exceeding the maximum token count. So we use a text splitter to split the Documents produced by the loader.

  3. VectorStores: store documents as vectors, because relevance search is really a vector operation. Whether we use the OpenAI embeddings API or query through a vector database directly, the loaded Documents must be vectorized before a vector search can run. The conversion is simple: storing the data in a suitable vector database completes the vectorization for us.

  4. Retrievers: retrieve the document data relevant to a query, as in the sketch below.
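
As a minimal sketch of what a vector-store query looks like on its own (assuming db is a populated Chroma instance like the one built in the case below):

# Returns the k Document chunks most similar to the query
docs = db.similarity_search("如何申请租户?", k=2)  # "How do I apply for a tenant?"
for doc in docs:
    print(doc.page_content)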


FAISS is another example of such a vector storage service.

A case to show how the different tools are used together:

  1. First load the document

  2. Then split the document into chunks

  3. Then convert the chunks to vectors and store them

  4. Turn the vector store into a retriever and hand it to LangChain for question answering

import os
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI

# Set up a proxy (only needed if your network requires one)
os.environ['HTTP_PROXY'] = 'socks5h://127.0.0.1:13659'
os.environ['HTTPS_PROXY'] = 'socks5h://127.0.0.1:13659'

# Create a text loader
loader = TextLoader('/Users/aihe/Downloads/demo.txt', encoding='utf8')

# Load the documents
documents = loader.load()

# Split the text into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# Compute embedding vectors
embeddings = OpenAIEmbeddings()

# Create the vector store
db = Chroma.from_documents(texts, embeddings)

# Turn the vector store into a retriever
retriever = db.as_retriever()

# Create the retrieval QA system
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=retriever)

# Run question answering
query = "如何申请租户?"  # "How do I apply for a tenant?"
print(qa.run(query))

print(qa.run("能否说明下你可以提供的功能?"))  # "Can you describe the features you provide?"


▐ Memory

By default, both agents and chains are stateless: once a call finishes, they have no memory of the previous conversation, and each query is handled independently.

But in some applications, such as chat, remembering the previous conversation matters, and LangChain provides tools for this as well.

from langchain import ConversationChain, OpenAI
from langchain.memory import ConversationBufferMemory

# Pre-seed the memory with an earlier exchange
memory = ConversationBufferMemory()
memory.chat_memory.add_user_message("你好!")   # "Hi!"
memory.chat_memory.add_ai_message("你好吗?")   # "How are you?"

llm = OpenAI(temperature=0)
chain = ConversationChain(llm=llm,
                          verbose=True,
                          memory=memory)
chain.predict(input="最近怎么样!")  # "How have you been lately?"
print(chain.predict(input="感觉很不错,刚和AI做了场对话."))  # "Feeling great; just chatted with an AI."
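
To see what the chain actually remembers, the buffer can be inspected directly; a small sketch using the same memory object:

# The full transcript is kept verbatim in the buffer
print(memory.load_memory_variables({}))
# e.g. {'history': 'Human: 你好!\nAI: 你好吗?\n...'}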


▐ Chains

Chains let us combine multiple components into a single application. For example, we can create a chain that accepts user input, formats it into a prompt with a PromptTemplate, and then passes that prompt to an LLM.

We can also combine chains to build more complex ones; a sequential-chain sketch follows the simple case below.


A simple case:

# Import the required modules and classes
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain import PromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,          # chat prompt template class
    HumanMessagePromptTemplate,  # human message template class
)

# Create the human message template
human_message_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template="给我一个制作{product}的好公司名字?",  # "Give me a good name for a company that makes {product}?"
        input_variables=["product"],  # placeholder variable
    )
)

# Create the chat prompt template
chat_prompt_template = ChatPromptTemplate.from_messages([human_message_prompt])

# Create the OpenAI chat model
chat = ChatOpenAI(temperature=0.9)

# Create an LLMChain from the chat model and the prompt template
chain = LLMChain(llm=chat, prompt=chat_prompt_template)

# Run the chain and print the result
print(chain.run("袜子"))  # "socks"
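
As mentioned above, chains can also be combined. A minimal sketch using SimpleSequentialChain, where the illustrative two-step task is "name the company, then write its slogan":

from langchain.chains import LLMChain, SimpleSequentialChain
from langchain.llms import OpenAI
from langchain import PromptTemplate

llm = OpenAI(temperature=0.9)

# Chain 1: product -> company name
name_chain = LLMChain(llm=llm, prompt=PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
))

# Chain 2: company name -> slogan
slogan_chain = LLMChain(llm=llm, prompt=PromptTemplate(
    input_variables=["company_name"],
    template="Write a catchy slogan for the company {company_name}.",
))

# The output of the first chain is fed as the input of the second
overall_chain = SimpleSequentialChain(chains=[name_chain, slogan_chain], verbose=True)
print(overall_chain.run("colorful socks"))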


▐ Agents

An agent uses the LLM as a reasoning engine to decide what to do at each moment. We give the agent a set of tools; based on our input, it judges which tools can help achieve the goal, then keeps running them until the goal is complete.

An agent can be seen as an enhanced Chain: it binds not only a prompt template and an LLM, but also a set of tools.

The agent is responsible for selecting the appropriate tool from the available set according to the user input and the application scenario, and it can adopt different strategies for deciding how to act depending on the complexity of the task.

There are two types of agents:

  1. Action Agents: perform one action at a time, then decide the next step based on the result.

  2. Plan-and-Execute Agents: first decide on a series of operations to perform, then execute them one by one according to that plan.

For simple tasks, action agents are more common and easier to implement. For more complex or long-running tasks, the up-front planning step of a plan-and-execute agent helps maintain long-term goals and focus, at the cost of more calls and higher latency. The two kinds are not mutually exclusive: an action agent can be made responsible for executing a plan-and-execute agent's plan.

The core concepts involved in agents are as follows:

  1. Agent: the main logic of the application. An agent exposes an interface that accepts the user input plus the list of actions the agent has already performed, and returns an AgentAction or AgentFinish.

  2. Tools: the actions an agent can take, for example making HTTP requests, sending emails, or executing commands.

  3. Toolkits: sets of tools designed for a specific use case. For example, for an agent to interact well with a SQL database, it may need one tool for executing queries and another for inspecting tables. A toolkit can be seen as a collection of tools.

  4. Agent Executor: wraps an agent together with a set of tools and is responsible for running the agent iteratively until a stopping condition is met.

Agent execution flow: the agent receives the input, chooses a tool, runs it, observes the result, and repeats until it can return a final answer.


A case:

# Import the required modules and classes
from langchain.agents import load_tools        # tool-loading helper
from langchain.agents import initialize_agent  # agent initialization helper
from langchain.agents import AgentType         # agent type enum
from langchain.llms import OpenAI              # OpenAI language model

# Create the language model with temperature 0, i.e. no randomness
llm = OpenAI(temperature=0)

# Load the required tools: serpapi (web search) and llm-math
tools = load_tools(["serpapi", "llm-math"], llm=llm)

# Initialize the agent with the ZERO_SHOT_REACT_DESCRIPTION type and verbose output
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

# Run the agent: ask for Trump's age, then that age divided by 2
agent.run("特朗普今年多少岁? 他的年龄除以2是多少?")  # "How old is Trump this year? What is his age divided by 2?"


  • Agent initialization types

In the code above, the agent is initialized with agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION. The agent type determines how the agent uses tools, processes input, and interacts with the user, so that it can serve the user in a targeted way. The available types are:

initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
  1. zero-shot-react-description: the agent uses the ReAct framework to decide which tool to use based solely on each tool's description; any number of tools can be provided, and every tool requires a description.

  2. react-docstore: the agent uses the ReAct framework to interact with a document store (docstore). Exactly two tools must be provided: a search tool and a lookup tool (they must be named Search and Lookup). The search tool searches for documents, while the lookup tool looks up terms in the most recently found document. This agent corresponds to the original ReAct paper, specifically its Wikipedia example.

  3. self-ask-with-search: the agent uses a single tool named Intermediate Answer, which should be able to look up factual answers to questions. This agent corresponds to the original self-ask with search paper, which used the Google Search API as the tool.

  4. conversational-react-description: intended for conversational settings. The prompt makes the agent helpful in conversation; it uses the ReAct framework to decide which tool to use and uses memory to remember previous conversational turns.

  5. structured-chat-zero-shot-react-description: can use arbitrary multi-input tools in a chat and remembers the chat context.

  • Tools

LangChain ships a set of default toolkits, such as Gmail sending, database queries, and JSON processing, plus a list of individual tools; see the documentation: https://python.langchain.com/en/latest/modules/agents/tools/getting_started.html

Here we build a custom tool to understand how tools work, since later use of LangChain usually involves customizing tools.

When writing a tool, prepare the following (a sketch follows the list):

  1. Name

  2. Tool description: what the tool does

  3. Argument schema: what structure the tool's input parameters require
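
A minimal sketch of a custom tool covering these three pieces; the word-length tool itself is illustrative:

from langchain.agents import AgentType, initialize_agent
from langchain.llms import OpenAI
from langchain.tools import Tool
from pydantic import BaseModel, Field

class WordLengthInput(BaseModel):
    """Argument schema (piece 3): a single word to measure."""
    word: str = Field(description="the word to measure")

def word_length(word: str) -> str:
    """Return the number of characters in a word."""
    return str(len(word.strip()))

tools = [
    Tool.from_function(
        func=word_length,
        name="word_length",                           # piece 1: name
        description="Returns the length of a word.",  # piece 2: description the agent reads
        args_schema=WordLengthInput,
    )
]

llm = OpenAI(temperature=0)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("How many letters are in the word 'LangChain'?")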


LangChain application cases

Suppose we need to build an LLM-based question-answering system that extracts information from a specified data source to answer users' questions. We can use LangChain's data-augmented generation features to interact with the external data source and fetch the required data, then feed that data to the LLM to generate a response. Memory features help us maintain state across multiple calls, improving the system's performance. We can also use agents to automate parts of the system's optimization. Finally, with the evaluation prompts and chain implementations LangChain provides, we can evaluate and tune the question-answering system.

▐ LangChain generates images

This case implements a text-to-image tool driven by a language model: the agent calls different tool functions and eventually produces an image. The following tools are provided:

  1. random_poem: returns a random Chinese poem.

  2. prompt_generate: generates the corresponding English prompt from a Chinese prompt.

  3. generate_image: generates an image from an English prompt.

import base64
import json
import os
from io import BytesIO


import requests
from PIL import Image
from pydantic import BaseModel, Field


from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.tools import BaseTool, StructuredTool, Tool, tool
from langchain import LLMMathChain, SerpAPIWrapper




def generate_image(prompt: str) -> str:
    """
    Generate an image from the given prompt.

    Args:
        prompt (str): English prompt

    Returns:
        str: path of the generated image
    """
    url = "http://127.0.0.1:7860/sdapi/v1/txt2img"
    headers = {
        "accept": "application/json",
        "Content-Type": "application/json"
    }
    data = {
        "prompt": prompt,
        "negative_prompt": "(worst quality:2), (low quality:2),disfigured, ugly, old, wrong finger",
        "steps": 20,
        "sampler_index": "Euler a",
        "sd_model_checkpoint": "cheeseDaddys_35.safetensors [98084dd1db]",
        # "sd_model_checkpoint": "anything-v3-fp16-pruned.safetensors [d1facd9a2b]",
        "batch_size": 1,
        "restore_faces": True
    }


    response = requests.post(url, headers=headers, data=json.dumps(data))


    if response.status_code == 200:
        response_data = response.json()
        images = response_data['images']


        for index, image_data in enumerate(images):
            img_data = base64.b64decode(image_data)
            img = Image.open(BytesIO(img_data))
            file_name = f"image_{index}.png"
            file_path = os.path.join(os.getcwd(), file_name)
            img.save(file_path)
            print(f"Generated image saved at {file_path}")
            return file_path
    else:
        print(f"Request failed with status code {response.status_code}")




def random_poem(arg: str) -> str:
    """
    Return a random Chinese poem.

    Returns:
        str: a random Chinese poem
    """
    llm = OpenAI(temperature=0.9)
    # The Chinese prompt below asks the LLM to pick a scenic, visual poem at
    # random from classical Chinese poetry, giving one example line.
    text = """
        能否帮我从中国的诗词数据库中随机挑选一首诗给我,希望是有风景,有画面的诗:
        比如:山重水复疑无路,柳暗花明又一村。
    """
    return llm(text)




def prompt_generate(idea: str) -> str:
    """
    Generate the English prompt needed for image generation.

    Args:
        idea (str): Chinese prompt

    Returns:
        str: English prompt
    """
    llm = OpenAI(temperature=0, max_tokens=2048)
    res = llm(f"""
    Stable Diffusion is an AI art generation model similar to DALLE-2.
    Below is a list of prompts that can be used to generate images with Stable Diffusion:


    - portait of a homer simpson archer shooting arrow at forest monster, front game card, drark, marvel comics, dark, intricate, highly detailed, smooth, artstation, digital illustration by ruan jia and mandy jurgens and artgerm and wayne barlowe and greg rutkowski and zdislav beksinski
    - pirate, concept art, deep focus, fantasy, intricate, highly detailed, digital painting, artstation, matte, sharp focus, illustration, art by magali villeneuve, chippy, ryan yee, rk post, clint cearley, daniel ljunggren, zoltan boros, gabor szikszai, howard lyon, steve argyle, winona nelson
    - ghost inside a hunted room, art by lois van baarle and loish and ross tran and rossdraws and sam yang and samdoesarts and artgerm, digital art, highly detailed, intricate, sharp focus, Trending on Artstation HQ, deviantart, unreal engine 5, 4K UHD image
    - red dead redemption 2, cinematic view, epic sky, detailed, concept art, low angle, high detail, warm lighting, volumetric, godrays, vivid, beautiful, trending on artstation, by jordan grimmer, huge scene, grass, art greg rutkowski
    - a fantasy style portrait painting of rachel lane / alison brie hybrid in the style of francois boucher oil painting unreal 5 daz. rpg portrait, extremely detailed artgerm greg rutkowski alphonse mucha greg hildebrandt tim hildebrandt
    - athena, greek goddess, claudia black, art by artgerm and greg rutkowski and magali villeneuve, bronze greek armor, owl crown, d & d, fantasy, intricate, portrait, highly detailed, headshot, digital painting, trending on artstation, concept art, sharp focus, illustration
    - closeup portrait shot of a large strong female biomechanic woman in a scenic scifi environment, intricate, elegant, highly detailed, centered, digital painting, artstation, concept art, smooth, sharp focus, warframe, illustration, thomas kinkade, tomasz alen kopera, peter mohrbacher, donato giancola, leyendecker, boris vallejo
    - ultra realistic illustration of steve urkle as the hulk, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha


    I want you to write me a list of detailed prompts exactly about the idea written after IDEA. Follow the structure of the example prompts. This means a very short description of the scene, followed by modifiers divided by commas to alter the mood, style, lighting, and more.


    IDEA: {idea}""")
    return res




class PromptGenerateInput(BaseModel):
    """
    Input model for generating the English prompt.
    """
    idea: str = Field()




class GenerateImageInput(BaseModel):
    """
    Input model for image generation.
    """
    prompt: str = Field(description="英文提示词")  # "English prompt" (shown to the agent)




# The Chinese tool names and descriptions are functional: the agent is driven
# with a Chinese instruction and reads them to pick tools.
tools = [
    Tool.from_function(
        func=random_poem,
        name="诗歌获取",  # "fetch a poem"
        description="随机返回中文的诗词"  # "returns a random Chinese poem"
    ),
    Tool.from_function(
        func=prompt_generate,
        name="提示词生成",  # "prompt generation"
        description="生成图片需要对应的英文提示词,当前工具可以将输入转换为英文提示词,以便方便生成",  # converts input into an English image prompt
        args_schema=PromptGenerateInput
    ),
    Tool.from_function(
        func=generate_image,
        name="图片生成",  # "image generation"
        description="根据提示词生成对应的图片,提示词需要是英文的,返回是图片的路径",  # generates an image from an English prompt, returns the file path
        args_schema=GenerateImageInput
    ),
]




def main():
    """
    Entry point: initialize the agent and run the request.
    """
    llm = OpenAI(temperature=0)
    agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
    agent.run("帮我生成一张诗词的图片?")  # "Generate an image of a poem for me?"




if __name__ == '__main__':
    main()


▐ LangChain Q&A

Refer to the Indexes section above; the same retrieval QA code applies.


▐ LangChain outputs structured JSON data

As covered in the concepts above, the prompt tooling provides OutputParsers that can turn an object definition into prompt instructions and tell the LLM what structure to return. Here a structured tool achieves a similar effect:

import requests
from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.tools import StructuredTool
from pydantic import BaseModel, Field




def post_message(type: str, param: dict) -> str:
    """
     当需要生成人群、分析画像、咨询问题时,使用如下的指示:url 固定为:http://localhost:3001/
     如果请求是生成人群,请求的type为crowd; 如果请求是分析画像,请求的type为analyze; 如果是其他或者答疑,请求的type为question;
     请求body的param把用户指定的条件传进来即可
     """
    # The Chinese docstring above is functional: StructuredTool uses it as the
    # tool description shown to the agent. Roughly: "POST to http://localhost:3001/;
    # type is 'crowd' for audience creation, 'analyze' for profiling, and
    # 'question' otherwise; pass the user-specified conditions in param."
    result = requests.post("http://localhost:3001/", json={"type": type, "param": param})
    return f"Status: {result.status_code} - {result.text}"




class PostInput(BaseModel):
    type: str = Field(description="请求的类型,人群为crowd,画像为analyze")  # request type: "crowd" or "analyze"
    param: dict = Field(description="请求的具体描述")  # detailed description of the request




llm = ChatOpenAI(temperature=0)
tools = [
    StructuredTool.from_function(post_message)
]
agent = initialize_agent(tools, llm, agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("我想生成一个性别为男并且在180天访问过淘特的人群?")  # "Create an audience of males who visited Taote within the last 180 days"


▐ LangChain builds its own chatbot

Building a chatbot would normally require some front-end code, but there are already open-source tools that visualize LangChain's components with drag and drop, so we use LangFlow directly:

pip install langflow

Then run the command:

langflow

If it conflicts with your local LangChain installation, you can run LangFlow with Docker instead:

FROM python:3.10-slim


RUN apt-get update && apt-get install gcc g++ git make -y
RUN useradd -m -u 1000 user
USER user
ENV HOME=/home/user \
    PATH=/home/user/.local/bin:$PATH


WORKDIR $HOME/app


COPY --chown=user . $HOME/app


RUN pip install langflow>=0.0.71 -U --user
CMD ["langflow", "--host", "0.0.0.0", "--port", "7860"]
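
A typical build-and-run sequence for this Dockerfile might look like the following (standard Docker CLI; the image tag is arbitrary):

docker build -t langflow .
docker run -p 7860:7860 langflow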

Configure the three LangChain components in the UI; the chat window is in the lower-right corner, where you enter your OpenAI key.


Start chatting to verify the configuration works.


Essentially no code is needed in the whole process; as long as you understand what each LangChain component does, you can build a simple chatbot.

The other LangChain components (agents, memory, and data indexes) can be used in the same way.


LangChain's future outlook

LangChain provides a powerful framework for building applications on large language models, and it will gradually be applied in many fields, such as intelligent customer service, text generation, and knowledge-graph construction. As more tools and resources are integrated with LangChain, the productivity of large language models will improve dramatically.

Some application-scenario ideas:

  1. Intelligent customer service : Combining chat models, autonomous intelligent agents and question-and-answer functions, develop an intelligent customer service system to help users solve problems and improve customer satisfaction.

  2. Personalized recommendation : Use intelligent agents and text embedding models to analyze user interests and behaviors, and provide users with personalized content recommendations.

  3. Knowledge graph construction : by combining question answering, text summarization, and entity extraction, knowledge is automatically extracted from documents to build a knowledge graph.

  4. Automatic summarization and key information extraction : Use LangChain's text summarization and extraction functions to extract key information from a large amount of text and generate concise and easy-to-understand summaries.

  5. Code review assistant : Analyze code quality through code understanding and intelligent agent functions, and provide developers with automated code review suggestions.

  6. Search engine optimization : Combining text embedding models and intelligent agents to analyze the relevance of web content and user queries to improve search engine rankings.

  7. Data analysis and visualization : By interacting with the API and querying tabular data functions, the data is automatically analyzed and a visual report is generated to help users understand the insights in the data.

  8. Intelligent programming assistant : Combining code understanding and intelligent agent functions, it can automatically generate code fragments according to user input requirements, improving the developer's work efficiency.

  9. Online education platform : Utilizes the Q&A and chat model functions to provide students with real-time academic support and help them solve problems encountered in their studies.

  10. Automated testing : Combine intelligent agents and agent simulation functions to develop automated testing scenarios and improve the efficiency and coverage of software testing.

There is already a platform that does much of this: https://zapier.com/l/natural-language-actions. Its bottom layer is an OpenAI model, and it uses Zapier to connect thousands of tools. You can configure various tools on the platform, and the model chooses the appropriate action according to your goal. The automation of routine work is not far off.


Summary

This article introduced the LangChain framework, which enables more powerful applications by combining large language models with other sources of computation and knowledge, explained its key concepts in detail, and walked through several small cases built on the framework, to help readers understand how LangChain works.

Looking ahead, LangChain is expected to play a major role across many fields and transform how efficiently we work. We are on the eve of an AI explosion, and actively embracing new technology will bring a completely different experience.

References:

  1. LangChain Chinese getting-started guide: https://github.com/liaokongVFX/LangChain-Chinese-Getting-Started-Guide

  2. LangChain official documentation: https://python.langchain.com/en/latest/modules/indexes/getting_started.html

  3. LangFlow, a visual orchestration tool for LangChain: https://github.com/logspace-ai/langflow


Team introduction

The technical team behind Taobao's user operation platform is a young, technology-driven team that understands users deeply. It is user-centered, improves the user experience across the whole life cycle through technological innovation, and continues to create value for users.

The team regards innovation as one of its core values and encourages members to continuously explore, experiment, and innovate, pushing forward both industry technology and user experience. We pay attention not only to today's industry-leading technology but also to the pre-research and application of future technology. Team members actively participate in academic research and technical communities, constantly exploring new technical directions and solutions.

On this foundation, the team builds industry-leading user-growth infrastructure. This infrastructure, represented by the media placement platform, the ABTest platform, and the user operation platform, powers user growth across the Alibaba Group, processing on the order of 100 billion data records per day with call volumes at the ten-million QPS level. The user growth technology team offers a "growth hacker" geek atmosphere and a wealth of role options, and welcomes talent from across the industry to join.

