LangChain (0.0.340) Official Documentation 5: Model

References: LangChain official documentation, LangChain official posts, langchain GitHub, langchain API documentation, llm-universe

1. Chat models

Reference: 《Chat models》

1.1 Introduction to Chat models

  LCEL provides a declarative way to compose Runnables into chains. It is a standard interface that makes it easy to define custom chains and call them in a standard way, with support for batch processing, streaming, and more. The standard interface includes the following methods (the prefix 'a' indicates the corresponding asynchronous method):

  • invoke/ainvoke: process a single input
  • batch/abatch: process a list of inputs in batch
  • stream/astream: stream the output for a single input
  • astream_log: stream data from intermediate steps as well as the final response

  All ChatModels implement the Runnable interface, which comes with default implementations of all of the above methods, giving every ChatModel basic support for asynchronous, streaming, and batch calls. Note that true streaming also requires support from the ChatModel provider itself. For details on the individual ChatModel integrations, see 《Chat models》.

  The underlying implementation of Chat models is LLMs, but they do not use a "text in, text out" API; instead they use "chat messages" as the input and output interface, i.e., the chat model works on messages (List[BaseMessage]) rather than raw text. In LangChain, the message interface is defined by BaseMessage, which has two required attributes (a short construction sketch follows the list below):

  • content: the content of the message, usually a string.
  • role: the entity category that the message comes from, for example:
    • HumanMessage: a BaseMessage from a human/user.
    • AIMessage: a BaseMessage from the AI/assistant.
    • SystemMessage: a BaseMessage from the system.
    • FunctionMessage/ToolMessage: a BaseMessage containing the output of a function or tool call.
    • ChatMessage: if none of the above roles fit, you can define a custom role. For details, see 《Types of MessagePromptTemplate》.
    • A message can also be passed as a str (which is automatically converted to a HumanMessage) or as a PromptValue (the value of a PromptTemplate).
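
  As a minimal sketch of constructing these message types (using the langchain.schema.messages module from this version; the example content is illustrative only):

from langchain.schema.messages import (
    AIMessage,
    ChatMessage,
    FunctionMessage,
    HumanMessage,
    SystemMessage,
)

messages = [
    SystemMessage(content="You are a helpful assistant"),
    HumanMessage(content="What is 1 + 1?"),
    AIMessage(content="1 + 1 = 2"),
    FunctionMessage(name="calculator", content="2"),          # output of a function/tool call
    ChatMessage(role="reviewer", content="Answer verified"),  # custom role
]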

  Correspondingly, LangChain provides different types of MessagePromptTemplate. The most commonly used are AIMessagePromptTemplate, SystemMessagePromptTemplate and HumanMessagePromptTemplate, which create AI messages, system messages and user messages respectively.

1.2 How to call Chat models

  In simple applications, using an LLM on its own is fine, but in more complex applications you may need to chain multiple large language models together, or chain an LLM with other components. There are two ways to compose a Chain:

  • Use a built-in Chain, such as LLMChain (the basic chain type), SequentialChain (chaining multiple calls in sequence), or RouterChain (routing the same input to different downstream chains).
  • Use the newer LCEL (LangChain Expression Language) framework to compose Runnables into chains (a brief sketch of both approaches follows).
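
  A minimal sketch contrasting the two approaches, assuming the Qianfan chat model and credentials configured in section 1.2.1 below:

from langchain.chains import LLMChain
from langchain.chat_models import QianfanChatEndpoint
from langchain.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("Explain {concept} in one sentence.")
chat = QianfanChatEndpoint()

# Built-in Chain: wrap the prompt and model in an LLMChain
chain = LLMChain(llm=chat, prompt=prompt)
print(chain.run(concept="model regularization"))

# LCEL: compose Runnables with the | operator
lcel_chain = prompt | chat
print(lcel_chain.invoke({"concept": "model regularization"}).content)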
1.2.1 Configure Baidu Qianfan

  This chapter uses Baidu Wenxin Yiyan (ERNIE Bot) as the example. To call the Wenxin Yiyan API, you first need to obtain an API key: go to the Baidu Wenxin Qianfan service platform, register and log in, then select "Application Access" - "Create Application". Fill in the basic information, keep the default configuration, and create the application.


  After creation, click the application's "Details" to see its AppID, API Key and Secret Key. Then go to the Baidu Intelligent Cloud online debugging platform (Sample Code Center, quick debugging interface) to obtain an AccessToken (if anything is unclear, see the API documentation). Finally, create a .env file in the project folder with vim .env (Linux) or type nul > .env (Windows cmd) and write the following into it:

QIANFAN_AK="xxx"
QIANFAN_SK="xxx"
access_token="xxx"

The following code loads these variables into the environment so that they can be used automatically later.

# Using OpenAI, Zhipu ChatGLM or Baidu Wenxin requires installing the openai, zhipuai and qianfan packages respectively
import os
import openai, zhipuai, qianfan
from langchain.llms import ChatGLM
from langchain.chat_models import ChatOpenAI, QianfanChatEndpoint

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())  # read the local .env file
openai.api_key = os.environ['OPENAI_API_KEY']
zhipuai.api_key = os.environ['ZHIPUAI_API_KEY']
qianfan.qianfan_ak = os.environ['QIANFAN_AK']
qianfan.qianfan_sk = os.environ['QIANFAN_SK']
from langchain.schema.messages import HumanMessage,SystemMessage

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(content="What is the purpose of model regularization?"),
]
chat = QianfanChatEndpoint()

For more information, please refer to 《Baidu Qianfan》.

1.2.2 Call Chat models using LCEL method
  1. Regular call
chat.invoke(messages)		# process a single input
chat.batch([messages])		# batch process a list of inputs
# stream the output for a single input
for chunk in chat.stream(messages):
    print(chunk.content, end="", flush=True)
AIMessage(content='模型正则化的目的是为了防止模型过拟合,提高模型的泛化能力。模型正则化通过引入额外的项,使得模型在训练过程中需要最小化这些项的损失,从而约束模型的复杂度,避免模型对训练数据中的噪声和异常值过度拟合。这样可以提高模型的泛化能力,使得模型在未见过的数据上也能表现良好。常见的模型正则化方法包括L1正则化、L2正则化、dropout等。', additional_kwargs={'id': 'as-2y2heq69su', 'object': 'chat.completion', 'created': 1701506605, 'result': '模型正则化的目的是为了防止模型过拟合,提高模型的泛化能力。模型正则化通过引入额外的项,使得模型在训练过程中需要最小化这些项的损失,从而约束模型的复杂度,避免模型对训练数据中的噪声和异常值过度拟合。这样可以提高模型的泛化能力,使得模型在未见过的数据上也能表现良好。常见的模型正则化方法包括L1正则化、L2正则化、dropout等。', 'is_truncated': False, 'need_clear_history': False, 'usage': {'prompt_tokens': 9, 'completion_tokens': 101, 'total_tokens': 110}})

Let’s take another simple example:

answer=chat.invoke("使用‘白小林’写一首三行诗")
answer
AIMessage(content='白小林游历天涯,\n心灵轻舞映朝霞。\n笑看人生四季花。', additional_kwargs={'id': 'as-q59v21q44t', 'object': 'chat.completion', 'created': 1701510940, 'result': '白小林游历天涯,\n心灵轻舞映朝霞。\n笑看人生四季花。', 'is_truncated': False, 'need_clear_history': False, 'usage': {'prompt_tokens': 11, 'completion_tokens': 22, 'total_tokens': 33}})

The returned result is an AIMessage, which can be converted into a dictionary with dict() for further processing:

print(answer.dict()['content'])
白小林游历天涯,
心灵轻舞映朝霞。
笑看人生四季花。
print(answer.dict()['additional_kwargs']['usage'])
{'prompt_tokens': 11, 'completion_tokens': 22, 'total_tokens': 33}
  2. Asynchronous call:
# asynchronous processing
await chat.ainvoke(messages)
async for chunk in chat.astream(messages):
    print(chunk.content, end="", flush=True)
async for chunk in chat.astream_log(messages):
    print(chunk)
1.2.3 Call Chat models using built-in Chain

In the older calling style of LangChain, you can pass one or more messages directly to the chat model to complete a chat turn; the output is a single message.

chat(messages)  		# same output as chat.invoke(messages)

You can also use generate to batch process multiple lists of messages:

batch_messages = [
    [
        SystemMessage(
            content="You are a helpful assistant that translates English to French."
        ),
        HumanMessage(content="I love programming."),
    ],
    [
        SystemMessage(
            content="You are a helpful assistant that translates English to French."
        ),
        HumanMessage(content="I love artificial intelligence."),
    ],
]
result = chat.generate(batch_messages)
result
LLMResult(generations=[[ChatGeneration(text='非常好!编程是一项非常有趣和有挑战性的工作。你更喜欢哪种类型的编程?', generation_info={'finish_reason': 'stop', 'id': 'as-eczv9t6wdu', 'object': 'chat.completion', 'created': 1701507897, 'result': '非常好!编程是一项非常有趣和有挑战性的工作。你更喜欢哪种类型的编程?', 'is_truncated': False, 'need_clear_history': False, 'usage': {'prompt_tokens': 4, 'completion_tokens': 19, 'total_tokens': 23}}, message=AIMessage(content='非常好!编程是一项非常有趣和有挑战性的工作。你更喜欢哪种类型的编程?', additional_kwargs={'id': 'as-eczv9t6wdu', 'object': 'chat.completion', 'created': 1701507897, 'result': '非常好!编程是一项非常有趣和有挑战性的工作。你更喜欢哪种类型的编程?', 'is_truncated': False, 'need_clear_history': False, 'usage': {'prompt_tokens': 4, 'completion_tokens': 19, 'total_tokens': 23}}))], [ChatGeneration(text='很好,你对人工智能的热爱很令人赞赏。人工智能技术已经变得越来越重要,它正在改变我们的生活和工作方式。', generation_info={'finish_reason': 'stop', 'id': 'as-7gc409h5d1', 'object': 'chat.completion', 'created': 1701507898, 'result': '很好,你对人工智能的热爱很令人赞赏。人工智能技术已经变得越来越重要,它正在改变我们的生活和工作方式。', 'is_truncated': False, 'need_clear_history': False, 'usage': {'prompt_tokens': 5, 'completion_tokens': 24, 'total_tokens': 29}}, message=AIMessage(content='很好,你对人工智能的热爱很令人赞赏。人工智能技术已经变得越来越重要,它正在改变我们的生活和工作方式。', additional_kwargs={'id': 'as-7gc409h5d1', 'object': 'chat.completion', 'created': 1701507898, 'result': '很好,你对人工智能的热爱很令人赞赏。人工智能技术已经变得越来越重要,它正在改变我们的生活和工作方式。', 'is_truncated': False, 'need_clear_history': False, 'usage': {'prompt_tokens': 5, 'completion_tokens': 24, 'total_tokens': 29}}))]], llm_output={}, run=[RunInfo(run_id=UUID('a81202f3-8210-44f9-959c-0e6af4686ae7')), RunInfo(run_id=UUID('564b8ec4-3531-4b6e-b50c-1baafd16e8ff'))])
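
The generate call returns an LLMResult; a minimal sketch of pulling out just the generated text and the run metadata (field names as shown in the output above):

for generation_list in result.generations:
    print(generation_list[0].text)   # text of the first candidate for each input
print(result.run[0].run_id)          # per-request RunInfo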

1.3 Caching

Reference: 《Caching》

  LangChain provides an optional caching layer for chat models. If you often make the same request multiple times, caching reduces the number of API calls, which lowers costs and speeds up the application's responses.

from langchain.globals import set_llm_cache
from langchain.chat_models import QianfanChatEndpoint

llm = QianfanChatEndpoint()
1.3.1 Memory cache
from langchain.cache import InMemoryCache
set_llm_cache(InMemoryCache())

# The first call is not cached yet, so it takes longer
llm.predict("Tell me a joke")
CPU times: user 35.9 ms, sys: 28.6 ms, total: 64.6 ms
Wall time: 4.83 s


"\n\nWhy couldn't the bicycle stand up by itself? It was...two tired!"
# The second call is served from the in-memory cache, so it is much faster
llm.predict("Tell me a joke")
CPU times: user 238 µs, sys: 143 µs, total: 381 µs
Wall time: 1.76 ms


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
1.3.2 SQLite cache

  SQLite is an embedded relational database engine: a zero-configuration, serverless, self-contained database system. Unlike a traditional database management system (DBMS), SQLite does not require a separate server process; it is embedded directly into the application, which makes it well suited to small applications and embedded devices.

rm .langchain.db

  Next, set LangChain's LLM cache to SQLiteCache, and specify the file path of the SQLite database as ".langchain.db".

from langchain.cache import SQLiteCache
set_llm_cache(SQLiteCache(database_path=".langchain.db"))
llm.predict("Tell me a joke")
CPU times: user 17 ms, sys: 9.76 ms, total: 26.7 ms
Wall time: 825 ms


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
# The second time the result is already cached, so it is faster
llm.predict("Tell me a joke")
CPU times: user 2.46 ms, sys: 1.23 ms, total: 3.7 ms
Wall time: 2.67 ms


'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

1.4 Prompts

1.4.1 Create using (role, content)

  A ChatPromptTemplate is a list of chat messages for chat models. Each chat message has content and an additional parameter called role. For example, in the OpenAI Chat Completions API, a chat message can be associated with the AI assistant, a human, or the system role. Create a chat prompt template like this:

from langchain.prompts import ChatPromptTemplate

chat_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful AI bot. Your name is {name}."),
        ("human", "Hello, how are you doing?"),
        ("ai", "I'm doing well, thanks!"),
        ("human", "{user_input}"),
    ]
)

messages = chat_template.format_messages(name="Bob", user_input="What is your name?")
messages
[SystemMessage(content='You are a helpful AI bot. Your name is Bob.'),
 HumanMessage(content='Hello, how are you doing?'),
 AIMessage(content="I'm doing well, thanks!"),
 HumanMessage(content='What is your name?')]
1.4.2 Create using MessagePromptTemplate

  ChatPromptTemplate.from_messages accepts several ways of representing messages. For example, in addition to the (type, content) 2-tuple representation used above, you can pass in an instance of MessagePromptTemplate or BaseMessage. This gives you a lot of flexibility when building chat prompts, as demonstrated below using Baidu Qianfan.

import os
import openai,qianfan

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']
qianfan.qianfan_ak = os.environ['QIANFAN_AK']
qianfan.qianfan_sk = os.environ['QIANFAN_SK']
  1. Using MessagePromptTemplate
from langchain.chat_models import QianfanChatEndpoint
from langchain.prompts import HumanMessagePromptTemplate,SystemMessagePromptTemplate

template="You are a helpful assistant that translates {input_language} to {output_language}."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template="{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

chat = QianfanChatEndpoint()
chat(chat_prompt.format_prompt(input_language="English", output_language="French", text="I love programming.").to_messages())
AIMessage(content='编程是一项非常有趣和有挑战性的工作,我很羡慕你能够享受其中的乐趣。', additional_kwargs={'id': 'as-cxezsmtfga', 'object': 'chat.completion', 'created': 1701520678, 'result': '编程是一项非常有趣和有挑战性的工作,我很羡慕你能够享受其中的乐趣。', 'is_truncated': False, 'need_clear_history': False, 'usage': {'prompt_tokens': 4, 'completion_tokens': 18, 'total_tokens': 22}})
  2. Example of using BaseMessage
from langchain.schema.messages import SystemMessage

chat_template = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content=(
                "You are a helpful assistant that re-writes the user's text to "
                "sound more upbeat."
            )
        ),
        HumanMessagePromptTemplate.from_template("{text}"),
    ]
)


chat(chat_template.format_messages(text="i dont like eating tasty things."))
AIMessage(content='很抱歉听到您不喜欢吃美味的食物。您有其他喜欢的食物类型吗?或许我们可以找到一些其他您喜欢吃的食物,您试试看是否能够喜欢呢?', additional_kwargs={'id': 'as-sdcbpxad11', 'object': 'chat.completion', 'created': 1701520841, 'result': '很抱歉听到您不喜欢吃美味的食物。您有其他喜欢的食物类型吗?或许我们可以找到一些其他您喜欢吃的食物,您试试看是否能够喜欢呢?', 'is_truncated': False, 'need_clear_history': False, 'usage': {'prompt_tokens': 8, 'completion_tokens': 34, 'total_tokens': 42}})

1.5 Tracking Token Usage

This section describes how to track token usage for model calls; it is currently only implemented for the OpenAI API.

1.5.1 Tracking a single Chat model
from langchain.callbacks import get_openai_callback
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-4")

with get_openai_callback() as cb:
    result = llm.invoke("Tell me a joke")
    print(cb)
Tokens Used: 24
    Prompt Tokens: 11
    Completion Tokens: 13
Successful Requests: 1
Total Cost (USD): $0.0011099999999999999

Track multiple requests in sequence:

with get_openai_callback() as cb:
    result = llm.invoke("Tell me a joke")
    result2 = llm.invoke("Tell me a joke")
    print(cb.total_tokens)
48
1.5.2 Track chain or agent
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenAI

tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True)
with get_openai_callback() as cb:
    response = agent.run(
        "Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?"
    )
    print(f"Total Tokens: {
      
      cb.total_tokens}")
    print(f"Prompt Tokens: {
      
      cb.prompt_tokens}")
    print(f"Completion Tokens: {
      
      cb.completion_tokens}")
    print(f"Total Cost (USD): ${
      
      cb.total_cost}")
> Entering new AgentExecutor chain...

Invoking: `Search` with `Olivia Wilde's current boyfriend`


['Things are looking golden for Olivia Wilde, as the actress has jumped back into the dating pool following her split from Harry Styles — read ...', "“I did not want service to take place at the home of Olivia's current partner because Otis and Daisy might be present,” Sudeikis wrote in his ...", "February 2021: Olivia Wilde praises Harry Styles' modesty. One month after the duo made headlines with their budding romance, Wilde gave her new beau major ...", 'An insider revealed to People that the new couple had been dating for some time. "They were in Montecito, California this weekend for a wedding, ...', 'A source told People last year that Wilde and Styles were still friends despite deciding to take a break. "He\'s still touring and is now going ...', "... love life. “He's your typical average Joe.” The source adds, “She's not giving too much away right now and wants to keep the relationship ...", "Multiple sources said the two were “taking a break” from dating because of distance and different priorities. “He's still touring and is now ...", 'Comments. Filed under. celebrity couples · celebrity dating · harry styles · jason sudeikis · olivia wilde ... Now Holds A Darker MeaningNYPost.', '... dating during filming. The 39-year-old did however look very cosy with the comedian, although his relationship status is unknown. Olivia ...']
Invoking: `Search` with `Harry Styles current age`
responded: Olivia Wilde's current boyfriend is Harry Styles. Let me find out his age for you.

29 years
Invoking: `Calculator` with `29 ^ 0.23`


Answer: 2.169459462491557Harry Styles' current age (29 years) raised to the 0.23 power is approximately 2.17.

> Finished chain.
Total Tokens: 1929
Prompt Tokens: 1799
Completion Tokens: 130
Total Cost (USD): $0.06176999999999999

2. LLMs

reference""

2.1 Calling of LLMs

  Like Chat models, LLMs also implement the Runnable interface, so you can call them with LCEL-style methods such as invoke:

from langchain.llms import QianfanLLMEndpoint

llm = QianfanLLMEndpoint()
print(llm.invoke("使用‘白小林’写一首三行诗"))
白小林在林间漫步,
阳光洒落笑语连连。
林中鸟鸣声声醉,
白小林心中乐无边。

  The other methods (ainvoke, batch/abatch, stream/astream, astream_log) behave the same as for Chat models, so they are not demonstrated one by one.
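
  A minimal sketch of two of them, assuming the same QianfanLLMEndpoint instance (for LLMs, the streamed chunks are plain strings rather than message objects):

llm.batch(["Tell me a joke", "Tell me a poem"])   # batch process a list of inputs
for chunk in llm.stream("Tell me a joke"):        # streaming output
    print(chunk, end="", flush=True)

  In addition, you can also use the legacy built-in calling style: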

llm("Tell me a joke")
'\n\nQ: What did the fish say when it hit the wall?\nA: Dam!'
llm_result = llm.generate(["Tell me a joke", "Tell me a poem"] * 3)
len(llm_result.generations)
6

In addition, for details on whether each LLM supports asynchronous and streaming calls, see the model table in 《LLMs》.

2.2 Asynchronous API

Reference 《Async API》

  LLMs get basic support for asynchronous calls from the Runnable interface. If the LLM provider has a native asynchronous implementation, it is used first; otherwise the default asynchronous support is used, which runs the call in a background thread so that other asynchronous functions in the application can continue executing.

import asyncio
import time
from langchain.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0.9)

def invoke_serially():
    for _ in range(10):
        resp = llm.invoke("Hello, how are you?")

async def async_invoke(llm):
    resp = await llm.ainvoke("Hello, how are you?")

async def invoke_concurrently():
    tasks = [async_invoke(llm) for _ in range(10)]
    await asyncio.gather(*tasks)


s = time.perf_counter()
# If running this outside of Jupyter, use asyncio.run(invoke_concurrently())
await invoke_concurrently()
elapsed = time.perf_counter() - s
print("\033[1m" + f"Concurrent executed in {
      
      elapsed:0.2f} seconds." + "\033[0m")

s = time.perf_counter()
invoke_serially()
elapsed = time.perf_counter() - s
print("\033[1m" + f"Serial executed in {
      
      elapsed:0.2f} seconds." + "\033[0m")
Concurrent executed in 1.03 seconds.
Serial executed in 6.80 seconds.

To simplify the code, we can also use abatch for asynchronous batch processing:

s = time.perf_counter()
# If running this outside of Jupyter, wrap the await in asyncio.run(...)
await llm.abatch(["Hello, how are you?"] * 10)
elapsed = time.perf_counter() - s
print("\033[1m" + f"Batch executed in {
      
      elapsed:0.2f} seconds." + "\033[0m")
Batch executed in 1.31 seconds.
A summary of the approaches and their execution times:

  • invoke_serially(): calls the LLM 10 times in a loop; about 6.8 seconds.
  • invoke_concurrently(): uses ainvoke to run the 10 calls concurrently; about 1.03 seconds, demonstrating the advantage of concurrent execution.
  • abatch: asynchronous batch request; about 1.31 seconds, demonstrating the efficiency of batching.
  • Native asynchronous implementation: if the LLM provider supports native asynchronous operation, it may be more efficient than the default thread-based fallback.

2.3 Custom LLM

2.3.1 Simple implementation of custom LLM

Reference: 《Custom LLM》

If you have your own LLM, or an LLM that LangChain does not yet wrap, you can create a custom LLM wrapper. You only need to implement:

  • the _call method: receives a string and some optional stop words, and returns a string
  • the _identifying_params property (optional): a dictionary used to help display information when printing this class

Below is a very simple custom LLM, whose function is to return the first n characters of the input text.

from typing import Any, List, Mapping, Optional

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
class CustomLLM(LLM):
    n: int

    @property
    def _llm_type(self) -> str:
        return "custom"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        return prompt[: self.n]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {"n": self.n}

This code defines a class named CustomLLM, which inherits from the LLM base class and implements custom LLM functions.

  • n: int: This is a class attribute used to store an integer value. It represents the number of characters that will be used later in the _call method.

  • _llm_type property: a private property that returns a string describing the LLM type. Here it returns "custom", identifying this as a custom LLM type.

  • _call method: this method must be implemented. It receives a prompt string as input, plus an optional stop parameter (a list of strings) and any other keyword arguments, and returns the first n characters of the given prompt string. If the stop parameter is passed in, a ValueError is raised, because stop is not permitted here.

  • _identifying_params property: returns a dictionary of identifying parameters for the LLM. In this example it contains a single key "n" whose value is self.n, the configured number of characters.

Then we can call it like any other LLM:

llm = CustomLLM(n=10)
llm("This is a foobar thing")
'This is a '

Print this LLM to see the information:

print(llm)
CustomLLM
Params: {'n': 10}
2.3.2 Custom zhipuai&Baidu Wenxin LLM

  The llm-universe project includes a custom Zhipu AI LLM implementation, zhipuai_llm.py, and a custom Baidu Wenxin LLM implementation, wenxin_llm.py, for reference (these were written for earlier versions of LangChain, which integrated fewer models).
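
  The general shape of such a wrapper follows the CustomLLM pattern above: implement _call so that it forwards the prompt to the vendor's API. The skeleton below is illustrative only; the endpoint URL, payload fields and class name are hypothetical placeholders, not the actual Zhipu or Wenxin API (see the llm-universe files for working implementations):

import requests
from typing import Any, List, Mapping, Optional

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM

class MyRemoteLLM(LLM):
    api_key: str
    endpoint: str = "https://example.com/v1/chat"   # hypothetical placeholder URL

    @property
    def _llm_type(self) -> str:
        return "my_remote_llm"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        # Forward the prompt to the (hypothetical) HTTP API and return its text reply
        resp = requests.post(
            self.endpoint,
            json={"prompt": prompt, "api_key": self.api_key},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["result"]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"endpoint": self.endpoint}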

2.4 Caching

Reference: 《Caching》, 《LLM Caching integrations》

Like the Chat model, LLM can also use caching to obtain duplicate request results.

from langchain.globals import set_llm_cache
from langchain.llms import OpenAI

# To make the caching really obvious, let's use a slower model.
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)
2.4.1 Memory cache
from langchain.cache import InMemoryCache
set_llm_cache(InMemoryCache())

# The first time, it is not yet in cache, so it should take longer
llm.predict("Tell me a joke")
# The second time it is, so it goes faster
llm.predict("Tell me a joke")
2.4.2 SQLite cache
rm .langchain.db
# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache
set_llm_cache(SQLiteCache(database_path=".langchain.db"))

# The first time, it is not yet in cache, so it should take longer
llm.predict("Tell me a joke")
# The second time it is, so it goes faster
llm.predict("Tell me a joke")
2.4.3 Turn off the cache of a specific LLM

  You can also choose to turn off the cache of a specific LLM after enabling the global cache.

from langchain.llms import OpenAI

llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2, cache=False)
llm("Tell me a joke")
CPU times: user 5.8 ms, sys: 2.71 ms, total: 8.51 ms
Wall time: 745 ms

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'
llm("Tell me a joke")  # 两次时间差不多,表示没有启用缓存的结果
CPU times: user 4.91 ms, sys: 2.64 ms, total: 7.55 ms
Wall time: 623 ms

'\n\nTwo guys stole a calendar. They got six months each.'
2.4.4 Optional chain cache

  You can also cache only some steps in a chain. In the following example, we load a map-reduce summarization chain: the results of the map step are cached, while the combine (reduce) step is not, so cached intermediate results are loaded and the final result is generated fresh.

  First create two OpenAI LLM instances, one of which has caching turned off. Then use CharacterTextSplitter to read the text of state_of_the_union.txt, split it into multiple chunks, and create the corresponding Document objects.

llm = OpenAI(model_name="text-davinci-002")
no_cache_llm = OpenAI(model_name="text-davinci-002", cache=False)
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain
from langchain.docstore.document import Document

text_splitter = CharacterTextSplitter()
with open('../../../state_of_the_union.txt') as f:
    state_of_the_union = f.read()
texts = text_splitter.split_text(state_of_the_union)
docs = [Document(page_content=t) for t in texts[:3]]

  Next, use the load_summarize_chain method to load the Map-Reduce chain of the summarizer, and specify the corresponding LLM and chain type. Here, a cache-enabled LLM is used for the Map step, and a cache-disabled LLM is used for the Reduce step.

from langchain.chains.summarize import load_summarize_chain

chain = load_summarize_chain(llm, chain_type="map_reduce", reduce_llm=no_cache_llm)
chain.run(docs)
CPU times: user 452 ms, sys: 60.3 ms, total: 512 ms
Wall time: 5.09 s


'\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure. In response to Russian aggression in Ukraine, the United States is joining with European allies to impose sanctions and isolate Russia. American forces are being mobilized to protect NATO countries in the event that Putin decides to keep moving west. The Ukrainians are bravely fighting back, but the next few weeks will be hard for them. Putin will pay a high price for his actions in the long run. Americans should not be alarmed, as the United States is taking action to protect its interests and allies.'

  Running the chain again, the execution is significantly faster, but the final result differs from the previous run: the map step results are loaded from cache, while the reduce step is not cached and its output is generated fresh.

chain.run(docs)
CPU times: user 11.5 ms, sys: 4.33 ms, total: 15.8 ms
Wall time: 1.04 s


'\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure.'

In addition, there are Redis cache, GPTCache, Momento cache and others; see 《LLM Caching integrations》 for details.
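
For example, the Redis cache can be enabled in much the same way as the SQLite cache. A minimal sketch, assuming a Redis server running locally and the redis Python package installed:

from redis import Redis
from langchain.cache import RedisCache
from langchain.globals import set_llm_cache

set_llm_cache(RedisCache(redis_=Redis()))
llm.predict("Tell me a joke")   # subsequent identical requests are served from Redis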

2.5 Serialization

  Serialization refers to converting the state of an object into a byte stream or formats such as JSON and XML, so that it can be transmitted or persisted between different systems, programming languages, or storage media. Deserialization is the process of converting serialized data into objects again.

  LangChain's Python and JavaScript libraries share a serialization scheme: whether code is written in Python or JavaScript, the same mechanism and format are used when serializing and deserializing LangChain objects, enabling cross-language interaction and data sharing.

  You can check whether a LangChain class is serializable by running the class method is_lc_serializable.

from langchain.llms import OpenAI
from langchain.llms.loading import load_llm

OpenAI.is_lc_serializable()
True
2.5.1 Storage

Any serializable object can be serialized into a dict or json string. The dumpd and dumps methods are demonstrated below.

from langchain.load import dumpd, dumps

llm = OpenAI(model="gpt-3.5-turbo-instruct")
dumpd(llm)
{'lc': 1,
 'type': 'constructor',
 'id': ['langchain', 'llms', 'openai', 'OpenAI'],
 'kwargs': {'model': 'gpt-3.5-turbo-instruct',
  'openai_api_key': {'lc': 1, 'type': 'secret', 'id': ['OPENAI_API_KEY']}}}
dumps(llm)
'{"lc": 1, "type": "constructor", "id": ["langchain", "llms", "openai", "OpenAI"], "kwargs": {"model": "gpt-3.5-turbo-instruct", "openai_api_key": {"lc": 1, "type": "secret", "id": ["OPENAI_API_KEY"]}}}'
2.5.2 Loading
from langchain.load import loads
from langchain.load.load import load

loaded_1 = load(dumpd(llm))
loaded_2 = loads(dumps(llm))
print(loaded_1.invoke("How are you doing?"))
I am an AI and do not have the capability to experience emotions. But thank you for asking. Is there anything I can assist you with?

2.6 Tracking token usage (omitted)

  Tracking LLM token usage works the same way as for Chat models and is currently only implemented for the OpenAI API; see 《Tracking token usage》 for details. Only a brief sketch is given below.
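
A minimal sketch, using the same get_openai_callback context manager with an OpenAI LLM instead of a chat model:

from langchain.callbacks import get_openai_callback
from langchain.llms import OpenAI

llm = OpenAI(model_name="gpt-3.5-turbo-instruct")
with get_openai_callback() as cb:
    llm.invoke("Tell me a joke")
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")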


Origin blog.csdn.net/qq_56591814/article/details/134753198