[LangChain] Overview (Summarization)

LangChain learning documentation


Overview

A summarization chain can be used to summarize multiple documents. One approach is to split the input into multiple smaller documents (chunks) and process them with MapReduceDocumentsChain. You can also switch the summarization chain to StuffDocumentsChain or RefineDocumentsChain.
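To compare the three strategies at a glance, the sketch below counts how many LLM calls each one makes for n chunks. It is a simplified model assumed for illustration only: it ignores any extra collapse calls a map-reduce chain may need when the combined summaries are still too long for the model's context window.

```python
def llm_calls(chain_type: str, n_chunks: int) -> int:
    """Approximate number of LLM calls per summarization strategy (simplified model)."""
    if n_chunks < 1:
        raise ValueError("need at least one chunk")
    if chain_type == "stuff":
        # All chunks are stuffed into a single prompt: one call.
        return 1
    if chain_type == "map_reduce":
        # One call per chunk (map), plus one call to combine the summaries (reduce).
        return n_chunks + 1
    if chain_type == "refine":
        # One call for the first chunk, then one refine call per remaining chunk.
        return n_chunks
    raise ValueError(f"unsupported chain_type: {chain_type}")


for chain_type in ("stuff", "map_reduce", "refine"):
    print(chain_type, llm_calls(chain_type, 3))
```

In short, stuff is the cheapest but needs all text to fit into one prompt; the map calls of map_reduce are independent of each other; refine processes the chunks strictly in order, one call at a time.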

Prepare data

First we prepare the data. In this example we create multiple documents from one long document, but the documents can be obtained in any way (the focus of this notebook is on what to do once you have the documents).

from langchain import OpenAI, PromptTemplate, LLMChain
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain
from langchain.docstore.document import Document

# LLM; temperature=0 makes the output as deterministic as possible
llm = OpenAI(temperature=0)
# Initialize the text splitter (default settings)
text_splitter = CharacterTextSplitter()
# Load the long text
with open("../../state_of_the_union.txt") as f:
    state_of_the_union = f.read()
texts = text_splitter.split_text(state_of_the_union)

# Wrap the first three chunks as Document objects
docs = [Document(page_content=t) for t in texts[:3]]
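CharacterTextSplitter splits the text on a separator and then merges the pieces back into chunks of at most chunk_size characters where it can. The sketch below is a simplified, LangChain-free stand-in for that idea: it ignores chunk_overlap, keeps any single piece longer than chunk_size whole, and its defaults (a blank-line separator and 4000 characters) are assumed to mirror the library's.

```python
from __future__ import annotations


def split_text(text: str, separator: str = "\n\n", chunk_size: int = 4000) -> list[str]:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    Simplified sketch: no chunk_overlap, and a single piece longer than
    chunk_size is kept whole instead of being cut.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for piece in pieces:
        # Characters this piece would add, including the separator that joins it.
        added = len(piece) + (len(separator) if current else 0)
        if current and length + added > chunk_size:
            # The current chunk is full: emit it and start a new one with this piece.
            chunks.append(separator.join(current))
            current, length = [], 0
            added = len(piece)
        current.append(piece)
        length += added
    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
for chunk in split_text(sample, chunk_size=40):
    print(repr(chunk))
```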

Quick start

If you just want to get started as quickly as possible, this is the recommended way:

from langchain.chains.summarize import load_summarize_chain
# Note: load_summarize_chain is used here
chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(docs)

result:

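'The AITO M5 is the first model of AITO, a premium brand jointly created by Seres and Huawei. The M5 comes in two versions: a rear-wheel-drive standard edition with a pre-sale price of 250,000 yuan and a four-wheel-drive performance edition at 280,000 yuan.'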

If you want more control over what is going on, and a better understanding of it, see the sections below.

The stuff chain

This section shows the result of summarizing with the stuff chain.

chain = load_summarize_chain(llm, chain_type="stuff")
chain.run(docs)

result:

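    ' The AITO M5 is the first model of AITO, a premium brand jointly created by Seres and Huawei. The M5 comes in two versions: a rear-wheel-drive standard edition with a pre-sale price of 250,000 yuan and a four-wheel-drive performance edition at 280,000 yuan.'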
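The stuff strategy concatenates every document into one prompt and makes a single LLM call. The sketch below shows that idea without LangChain. Doc, fake_llm and the max_chars limit are hypothetical stand-ins (a real model is limited by its token context window, not by characters), and joining the documents with a blank line is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    page_content: str  # same field name as LangChain's Document


def stuff_summarize(docs, llm, template, max_chars=12000):
    """Stuff all documents into one prompt and call the LLM once."""
    text = "\n\n".join(d.page_content for d in docs)
    prompt = template.format(text=text)
    # Hypothetical guard: stuffing breaks down once the prompt no longer fits the model.
    if len(prompt) > max_chars:
        raise ValueError(f"prompt has {len(prompt)} characters; limit is {max_chars}")
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model: report how long the prompt was.
    return f"summary of a {len(prompt)}-character prompt"


template = "Write a concise summary of the following:\n\n{text}\n\nCONCISE SUMMARY:"
docs = [Doc("Alpha."), Doc("Beta.")]
print(stuff_summarize(docs, fake_llm, template))
```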

Custom prompts

You can also use your own prompt with this chain. In this example, the summary is written in Italian.

prompt_template = """Write a concise summary of the following:

{text}

CONCISE SUMMARY IN ITALIAN:"""
# The prompt above asks for the summary in Italian
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
# summarize_chain
chain = load_summarize_chain(llm, chain_type="stuff", prompt=PROMPT)
chain.run(docs)

Result: not shown.
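PromptTemplate's input_variables must match the {...} placeholders in the template, and with the stuff summarization chain the combined document text is filled into {text}. Python's standard library can list a template's placeholders, which gives a quick check on a custom prompt before passing it to the chain. This is a standalone sketch using string.Formatter, not LangChain's own validation.

```python
from __future__ import annotations

from string import Formatter


def template_variables(template: str) -> set[str]:
    """Return the names of the {placeholders} in a str.format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


prompt_template = """Write a concise summary of the following:

{text}

CONCISE SUMMARY IN ITALIAN:"""

# The placeholders must match input_variables=["text"] used with PromptTemplate.
print(template_variables(prompt_template))
print(prompt_template.format(text="(combined document text)"))
```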

The map_reduce chain

This section shows the result of summarizing with the map_reduce chain.

chain = load_summarize_chain(llm, chain_type="map_reduce")
chain.run(docs)
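
result:

    ' The AITO M5 is the first model of AITO, a premium brand jointly created by Seres and Huawei. The M5 comes in two versions: a rear-wheel-drive standard edition with a pre-sale price of 250,000 yuan and a four-wheel-drive performance edition at 280,000 yuan.'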
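map_reduce first summarizes each chunk on its own (map) and then summarizes the partial summaries with one more call (reduce). The sketch below shows only that control flow. fake_llm is a hypothetical stand-in for the model, and the sketch assumes the partial summaries fit into a single combine prompt; LangChain's MapReduceDocumentsChain can additionally collapse them over several rounds when they do not.

```python
def map_reduce_summarize(chunks, llm, template):
    """Summarize each chunk (map), then summarize the summaries (reduce)."""
    # Map step: one independent LLM call per chunk.
    partial = [llm(template.format(text=chunk)) for chunk in chunks]
    # Reduce step: one more call over the joined partial summaries.
    return llm(template.format(text="\n\n".join(partial)))


calls = []


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model: record the prompt and return a numbered "summary".
    calls.append(prompt)
    return f"summary {len(calls)}"


template = "Write a concise summary of the following:\n\n{text}\n\nCONCISE SUMMARY:"
result = map_reduce_summarize(["chunk A", "chunk B", "chunk C"], fake_llm, template)
print(result, len(calls))
```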

Intermediate steps

If we want to inspect the intermediate steps of the map_reduce chain, we can return them as well. This is done with the return_intermediate_steps parameter; in the output below, the per-chunk summaries appear under the map_steps key.

chain = load_summarize_chain(OpenAI(temperature=0), chain_type="map_reduce", return_intermediate_steps=True)

chain({"input_documents": docs}, return_only_outputs=True)

result:

    {
    'map_steps': [" In response to Russia's aggression in Ukraine, the United States has united with other freedom-loving nations to impose economic sanctions and hold Putin accountable. The U.S. Department of Justice is also assembling a task force to go after the crimes of Russian oligarchs and seize their ill-gotten gains.",
      ' The United States and its European allies are taking action to punish Russia for its invasion of Ukraine, including seizing assets, closing off airspace, and providing economic and military assistance to Ukraine. The US is also mobilizing forces to protect NATO countries and has released 30 million barrels of oil from its Strategic Petroleum Reserve to help blunt gas prices. The world is uniting in support of Ukraine and democracy, and the US stands with its Ukrainian-American citizens.',
      " President Biden and Vice President Harris ran for office with a new economic vision for America, and have since passed the American Rescue Plan and the Bipartisan Infrastructure Law to help struggling families and rebuild America's infrastructure. This includes creating jobs, modernizing roads, airports, ports, and waterways, replacing lead pipes, providing affordable high-speed internet, and investing in American products to support American jobs."],
     'output_text': " In response to Russia's aggression in Ukraine, the United States and its allies have imposed economic sanctions and are taking other measures to hold Putin accountable. The US is also providing economic and military assistance to Ukraine, protecting NATO countries, and passing legislation to help struggling families and rebuild America's infrastructure. The world is uniting in support of Ukraine and democracy, and the US stands with its Ukrainian-American citizens."}

Custom prompts

You can also use your own prompts with this chain. In this example, the summary is written in Italian.

# This prompt asks for the summary in Italian
prompt_template = """Write a concise summary of the following:

{text}

CONCISE SUMMARY IN ITALIAN:"""
# Create the prompt template
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="map_reduce", return_intermediate_steps=True, map_prompt=PROMPT, combine_prompt=PROMPT)
chain({"input_documents": docs}, return_only_outputs=True)

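With map_prompt and combine_prompt, the map and combine phases can use different instructions, and return_intermediate_steps=True adds the per-chunk results to the output. The article passes the same Italian prompt to both arguments; the sketch below uses two different hypothetical templates so that the two phases are visible. The dictionary keys intermediate_steps and output_text are an assumption for this sketch, and fake_llm is a stand-in for the model.

```python
def map_reduce_with_steps(chunks, llm, map_template, combine_template):
    """Map/reduce summarization that also returns the per-chunk (map) results."""
    # Map step: the map template is applied to every chunk separately.
    steps = [llm(map_template.format(text=chunk)) for chunk in chunks]
    # Reduce step: the combine template receives all map outputs at once.
    output = llm(combine_template.format(text="\n\n".join(steps)))
    # Key names are an assumption for this sketch.
    return {"intermediate_steps": steps, "output_text": output}


def fake_llm(prompt: str) -> str:
    # Stand-in model: answer with the prompt's first line, so we can see which template ran.
    return f"[{prompt.splitlines()[0]}]"


map_template = "Summarize this chunk:\n{text}"
combine_template = "Combine these summaries IN ITALIAN:\n{text}"
result = map_reduce_with_steps(["chunk A", "chunk B"], fake_llm, map_template, combine_template)
print(result)
```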
Custom MapReduceChain

Multiple input prompts

You can also use prompts with multiple input variables. In this example, we use a MapReduce chain to answer a specific question about our code.

from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
# Map prompt (first prompt): describe each function
map_template_string = """Given the following python code, generate a description that explains what the code does and also mention the time complexity.
Code:
{code}

Return the description in the following format:
name of the function: description of the function
"""

# Reduce prompt (second prompt): answer the question from all descriptions
reduce_template_string = """Given the following python function names and descriptions, answer the following question
{code_description}
Question: {question}
Answer:
"""
# Map prompt template
MAP_PROMPT = PromptTemplate(input_variables=["code"], template=map_template_string)
# Reduce prompt template
REDUCE_PROMPT = PromptTemplate(input_variables=["code_description", "question"], template=reduce_template_string)
# LLM
llm = OpenAI()
# Map chain: runs the map prompt on each code chunk
map_llm_chain = LLMChain(llm=llm, prompt=MAP_PROMPT)
# Reduce chain: answers the question from the collected descriptions
reduce_llm_chain = LLMChain(llm=llm, prompt=REDUCE_PROMPT)

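# Stuff all map outputs into the {code_description} variable of the reduce prompt
generative_result_reduce_chain = StuffDocumentsChain(
    llm_chain=reduce_llm_chain,
    document_variable_name="code_description",
)

# Run the map chain on each chunk (passed in as {code}), then hand the results to the reduce chain
combine_documents = MapReduceDocumentsChain(
    llm_chain=map_llm_chain,
    combine_document_chain=generative_result_reduce_chain,
    document_variable_name="code",
)

# Split the input text on "\n##\n" and run the map-reduce pipeline over the chunks
map_reduce = MapReduceChain(
    combine_documents_chain=combine_documents,
    text_splitter=CharacterTextSplitter(separator="\n##\n", chunk_size=100, chunk_overlap=0),
)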
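In this chain the reduce prompt has two inputs: code_description, which receives the joined map outputs, and question, which comes straight from the run call. The sketch below reproduces that data flow in plain Python. REDUCE_TEMPLATE copies reduce_template_string above, while MAP_TEMPLATE is a shortened hypothetical version of the map prompt; joining the descriptions with a blank line and fake_llm are assumptions.

```python
MAP_TEMPLATE = "Describe this code and its time complexity:\n{code}\n"
REDUCE_TEMPLATE = (
    "Given the following python function names and descriptions, "
    "answer the following question\n{code_description}\nQuestion: {question}\nAnswer:\n"
)


def answer_about_code(source, question, llm, separator="\n##\n"):
    """Describe each code snippet (map), then answer one question over all descriptions (reduce)."""
    snippets = [s for s in source.split(separator) if s.strip()]
    # Map: the map prompt only uses the {code} variable.
    descriptions = [llm(MAP_TEMPLATE.format(code=s)) for s in snippets]
    # Reduce: joined descriptions fill {code_description}; the question is passed straight through.
    prompt = REDUCE_TEMPLATE.format(
        code_description="\n\n".join(descriptions), question=question
    )
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    # Stand-in model: for map prompts, name the function; for the reduce prompt, echo it back.
    if prompt.startswith("Describe"):
        name = prompt.split("def ", 1)[1].split("(", 1)[0]
        return f"{name}: sorts a list"
    return prompt


source = "def a(xs):\n    return sorted(xs)\n##\ndef b(xs):\n    return sorted(xs)\n"
print(answer_about_code(source, "Which function is faster?", fake_llm))
```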

The code snippet is:

code = """
def bubblesort(list):
    for iter_num in range(len(list)-1,0,-1):
        for idx in range(iter_num):
            if list[idx]>list[idx+1]:
                temp = list[idx]
                list[idx] = list[idx+1]
                list[idx+1] = temp
    return list
##
def insertion_sort(InputList):
    for i in range(1, len(InputList)):
        j = i-1
        nxt_element = InputList[i]
        while (j >= 0) and (InputList[j] > nxt_element):
            InputList[j+1] = InputList[j]
            j=j-1
        InputList[j+1] = nxt_element
    return InputList
##
def shellSort(input_list):
    gap = len(input_list) // 2
    while gap > 0:
        for i in range(gap, len(input_list)):
            temp = input_list[i]
            j = i
            while j >= gap and input_list[j - gap] > temp:
                input_list[j] = input_list[j - gap]
                j = j-gap
            input_list[j] = temp
        gap = gap//2
    return input_list

"""
# Ask which function has the better time complexity
map_reduce.run(input_text=code, question="Which function has a better time complexity?")

result:

    Created a chunk of size 247, which is longer than the specified 100
    Created a chunk of size 267, which is longer than the specified 100

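    'shellSort has a better time complexity than both bubblesort and insertion_sort, as it has a time complexity of O(n^2), while the other two have a time complexity of O(n^2).'

Note that this answer contradicts itself: it gives O(n^2) for all three functions. In the worst case, bubble sort and insertion sort are O(n^2), and Shell sort with the gap sequence n/2, n/4, ..., 1 used here is also O(n^2), although it usually runs faster in practice.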
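The two "Created a chunk of size ..." warnings come from the text splitter: it can only cut at the "\n##\n" separator, so a single snippet longer than chunk_size=100 cannot be made smaller and becomes an oversized chunk. The sketch below, plain Python rather than LangChain's implementation, lists the separator-delimited pieces that exceed the limit; the toy input is hypothetical.

```python
from __future__ import annotations


def oversized_pieces(text: str, separator: str, chunk_size: int) -> list[int]:
    """Return the lengths of separator-delimited pieces that alone exceed chunk_size.

    A splitter that can only cut at `separator` cannot shrink these pieces,
    so each of them ends up as a chunk longer than chunk_size.
    """
    return [len(p) for p in text.split(separator) if len(p) > chunk_size]


# Hypothetical toy input: one long function and one short one, separated by "\n##\n".
toy = "def f():\n" + "    pass\n" * 15 + "\n##\ndef g():\n    return 1\n"
print(oversized_pieces(toy, "\n##\n", 100))
```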

The refine chain

This section shows the result of summarizing with the refine chain.

chain = load_summarize_chain(llm, chain_type="refine")

chain.run(docs)

result:

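The AITO M5 is the first model of AITO, a premium brand jointly created by Seres and Huawei. The M5 comes in two versions: a rear-wheel-drive standard edition with a pre-sale price of 250,000 yuan and a four-wheel-drive performance edition at 280,000 yuan.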

Intermediate steps

If we want to inspect the intermediate steps of the refine chain, we can return them as well. This is done with the return_intermediate_steps parameter.

# Note the return_intermediate_steps argument here
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="refine", return_intermediate_steps=True)

chain({"input_documents": docs}, return_only_outputs=True)
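# Result
'The AITO M5 is the first model of AITO, a premium brand jointly created by Seres and Huawei. The M5 comes in two versions: a rear-wheel-drive standard edition with a pre-sale price of 250,000 yuan and a four-wheel-drive performance edition at 280,000 yuan.'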
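refine summarizes the first chunk and then walks through the remaining chunks one at a time, asking the model to update the running summary (existing_answer) with each new chunk. The sketch below shows that loop in plain Python and, like return_intermediate_steps=True, keeps every intermediate summary. The templates, fake_llm and the dictionary keys are hypothetical stand-ins.

```python
def refine_summarize(chunks, llm, question_template, refine_template):
    """Summarize the first chunk, then refine the running summary with each later chunk."""
    if not chunks:
        raise ValueError("need at least one chunk")
    summary = llm(question_template.format(text=chunks[0]))
    steps = [summary]
    for chunk in chunks[1:]:
        # Each refine call sees the current summary plus exactly one new chunk.
        summary = llm(refine_template.format(existing_answer=summary, text=chunk))
        steps.append(summary)
    return {"intermediate_steps": steps, "output_text": summary}


def fake_llm(prompt: str) -> str:
    # Stand-in model: "summarizing" keeps the chunk text, "refining" appends the new chunk.
    if prompt.startswith("Summarize: "):
        return prompt[len("Summarize: "):]
    existing, new = prompt[len("Current: "):].split(" | New: ")
    return f"{existing}+{new}"


result = refine_summarize(
    ["A", "B", "C"], fake_llm, "Summarize: {text}", "Current: {existing_answer} | New: {text}"
)
print(result)
```

Because each call needs the previous summary, the refine calls cannot run in parallel, and in this sketch the last intermediate step is always the same as output_text.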

Custom prompts

You can also use your own prompts with this chain. In this example, the summary is written in Italian.

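prompt_template = """Write a concise summary of the following:

{text}

CONCISE SUMMARY IN ITALIAN:"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
refine_template = (
    "Your job is to produce a final summary\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Given the new context, refine the original summary in Italian. "
    "If the context isn't useful, return the original summary."
)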
refine_prompt = PromptTemplate(
    input_variables=["existing_answer", "text"],
    template=refine_template,
)
chain = load_summarize_chain(OpenAI(temperature=0), chain_type="refine", return_intermediate_steps=True, question_prompt=PROMPT, refine_prompt=refine_prompt)
chain({"input_documents": docs}, return_only_outputs=True)

result:

    {
    'intermediate_steps': ["\n\nQuesta sera, ci incontriamo come democratici, repubblicani e indipendenti, ma soprattutto come americani. La Russia di Putin ha cercato di scuotere le fondamenta del mondo libero, ma ha sottovalutato la forza della gente ucraina. Insieme ai nostri alleati, stiamo imponendo sanzioni economiche, tagliando l'accesso della Russia alla tecnologia e bloccando i suoi più grandi istituti bancari dal sistema finanziario internazionale. Il Dipartimento di Giustizia degli Stati Uniti sta anche assemblando una task force dedicata per andare dopo i crimini degli oligarchi russi.",
      "\n\nQuesta sera, ci incontriamo come democratici, repubblicani e indipendenti, ma soprattutto come americani. La Russia di Putin ha cercato di scuotere le fondamenta del mondo libero, ma ha sottovalutato la forza della gente ucraina. Insieme ai nostri alleati, stiamo imponendo sanzioni economiche, tagliando l'accesso della Russia alla tecnologia, bloccando i suoi più grandi istituti bancari dal sistema finanziario internazionale e chiudendo lo spazio aereo americano a tutti i voli russi. Il Dipartimento di Giustizia degli Stati Uniti sta anche assemblando una task force dedicata per andare dopo i crimini degli oligarchi russi. Stiamo fornendo più di un miliardo di dollari in assistenza diretta all'Ucraina e fornendo assistenza militare,",
      "\n\nQuesta sera, ci incontriamo come democratici, repubblicani e indipendenti, ma soprattutto come americani. La Russia di Putin ha cercato di scuotere le fondamenta del mondo libero, ma ha sottovalutato la forza della gente ucraina. Insieme ai nostri alleati, stiamo imponendo sanzioni economiche, tagliando l'accesso della Russia alla tecnologia, bloccando i suoi più grandi istituti bancari dal sistema finanziario internazionale e chiudendo lo spazio aereo americano a tutti i voli russi. Il Dipartimento di Giustizia degli Stati Uniti sta anche assemblando una task force dedicata per andare dopo i crimini degli oligarchi russi. Stiamo fornendo più di un miliardo di dollari in assistenza diretta all'Ucraina e fornendo assistenza militare."],
     'output_text': "\n\nQuesta sera, ci incontriamo come democratici, repubblicani e indipendenti, ma soprattutto come americani. La Russia di Putin ha cercato di scuotere le fondamenta del mondo libero, ma ha sottovalutato la forza della gente ucraina. Insieme ai nostri alleati, stiamo imponendo sanzioni economiche, tagliando l'accesso della Russia alla tecnologia, bloccando i suoi più grandi istituti bancari dal sistema finanziario internazionale e chiudendo lo spazio aereo americano a tutti i voli russi. Il Dipartimento di Giustizia degli Stati Uniti sta anche assemblando una task force dedicata per andare dopo i crimini degli oligarchi russi. Stiamo fornendo più di un miliardo di dollari in assistenza diretta all'Ucraina e fornendo assistenza militare."}

Reference:

https://python.langchain.com/docs/modules/chains/popular/summarize


Origin blog.csdn.net/u013066244/article/details/131715331