NLP: Summarization



There are two types of summarization tasks:

  • Extractive: select the most representative sentences from the original text
  • Abstractive: restate the content of the passage in new, more abstract language

The goal of summarization: produce a condensed text that preserves the main information and meaning of the original document.

Extractive summarization

"Extractive Summarization" is an automatic text summarization method whose goal is to select some key sentences or paragraphs from the original document to form a summary. This summary should preserve as much as possible the main message and meaning of the original document. When we talk about "Single-document Extractive Summarization", we mean the process of extracting key information from a single document to generate a summary.

Single-document


The following are the general steps for Single-document Extractive Summarization:

  • Preprocessing: This usually includes cleaning the text (removing unnecessary symbols, extra spaces, etc.), tokenizing, and removing stop words (common but uninformative words such as "the" and "is").

  • Feature computation: Each sentence in the document is scored based on specific features. These might include the sentence's length, the frequency of keywords it contains, its position in the document (summarizers often prefer sentences at the beginning and end of an article, since these positions are more likely to carry the main information), and its similarity to the document as a whole.

  • Sentence selection: Based on the scores from the previous step, the highest-scoring sentences are selected for inclusion in the summary. Usually a score threshold is set or the summary length is limited.

  • Summary generation: The selected sentences are arranged in the order in which they appear in the original document to form the final summary. A minimal sketch of the whole pipeline follows.
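Putting the four steps together, here is a minimal, self-contained sketch of the pipeline (the frequency-plus-position scoring rule is an illustrative assumption; real systems use richer features):

```python
# A minimal end-to-end sketch of the four steps above (the frequency-plus-
# position scoring rule is an illustrative assumption).
import re
from collections import Counter

STOP_WORDS = {"the", "is", "a", "an", "of", "to", "and", "in", "it", "on", "was"}

def extractive_summary(text, k=2):
    # 1. Preprocessing: sentence-split, tokenize, drop stop words.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tokens = [[w for w in re.findall(r"\w+", s.lower()) if w not in STOP_WORDS]
              for s in sentences]
    # 2. Feature computation: word-frequency score with a small position bonus
    #    for the first and last sentences.
    freq = Counter(w for sent in tokens for w in sent)
    def score(i):
        base = sum(freq[w] for w in tokens[i]) / max(len(tokens[i]), 1)
        return base * (1.2 if i in (0, len(sentences) - 1) else 1.0)
    # 3. Sentence selection: keep the k highest-scoring sentences.
    chosen = sorted(range(len(sentences)), key=score, reverse=True)[:k]
    # 4. Summary generation: restore original document order.
    return " ".join(sentences[i] for i in sorted(chosen))
```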

It is worth noting that although this method is simple and effective, it has some limitations. For example, it may ignore the logical relationships and coherence between sentences, because it simply extracts sentences from the original document without reorganizing them or generating new ones. Also, since it relies on sentences from the original document, summary quality suffers when the original document is poorly written.

Content selection


  • Commonly uses unsupervised learning methods
  • The goal is to find important or salient sentences

TF-IDF Method

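TF-IDF weights a term highly when it is frequent in the document but rare across the collection; a sentence can then be scored by aggregating the TF-IDF weights of its terms. A minimal sketch using scikit-learn (averaging per sentence is one illustrative choice):

```python
# Sketch: score sentences by the average TF-IDF weight of their terms.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def tfidf_select(sentences, k=2):
    X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    total = np.asarray(X.sum(axis=1)).ravel()           # sum of TF-IDF weights
    n_terms = np.asarray((X > 0).sum(axis=1)).ravel()   # terms per sentence
    scores = total / np.maximum(n_terms, 1)             # mean weight per term
    chosen = sorted(np.argsort(scores)[-k:])            # keep document order
    return [sentences[i] for i in chosen]

sents = [
    "The council approved the new city budget on Monday.",
    "It rained heavily all afternoon.",
    "The budget raises spending on schools and public transport.",
]
print(tfidf_select(sents, k=2))
```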

Log-Likelihood Ratio Method

This approach attempts to use statistical analysis to determine which sentences or phrases are most likely to contain the key content of the original document.

The log-likelihood ratio is a measure of the relative merits of two probabilistic models; more specifically, it compares how well a more complex model (typically with more parameters) fits the observed data relative to a simpler model (typically with fewer parameters). In this setting, the log-likelihood ratio can be used to measure whether a sentence or phrase contains key information related to the whole document.

The basic idea of this method is that, for a given sentence or phrase, if its frequency in the original document is much higher than its frequency in a larger background corpus, then it is more likely to contain the key information of the original document. Therefore, we can evaluate the importance of each sentence or phrase by computing its log-likelihood ratio, and then select those with the highest scores as parts of the summary.

The advantage of this method is that it automatically selects the sentences or phrases that contain key information, without requiring a manually defined set of keywords. However, it also has limitations: for example, it may rely too heavily on frequency information while ignoring the semantic and contextual information of sentences or phrases. Also, for short or very specialized documents, this approach may not give good results.
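For concreteness, here is a sketch of Dunning's log-likelihood ratio test as it is commonly used to find "topic signature" words, with sentences then scored by their share of signature words (the threshold of 10.0 and the add-one smoothing are illustrative assumptions):

```python
# Sketch: Dunning's log-likelihood ratio for "topic signature" words; the
# threshold of 10.0 and the add-one smoothing are illustrative assumptions.
import math
from collections import Counter

def _log_l(k, n, p):
    p = min(max(p, 1e-12), 1 - 1e-12)   # guard against log(0)
    return k * math.log(p) + (n - k) * math.log(1 - p)

def llr(k1, n1, k2, n2):
    """How strongly a word's rate in the document differs from the background."""
    p, p1, p2 = (k1 + k2) / (n1 + n2), k1 / n1, k2 / n2
    return 2 * (_log_l(k1, n1, p1) + _log_l(k2, n2, p2)
                - _log_l(k1, n1, p) - _log_l(k2, n2, p))

def signature_words(doc_tokens, background_tokens, threshold=10.0):
    doc, bg = Counter(doc_tokens), Counter(background_tokens)
    n1, n2 = len(doc_tokens), len(background_tokens)
    return {w for w, k1 in doc.items()
            if llr(k1, n1, bg.get(w, 0) + 1, n2 + 1) > threshold}

def sentence_score(sent_tokens, signature):
    # Score a sentence by its share of topic-signature words.
    return sum(w in signature for w in sent_tokens) / max(len(sent_tokens), 1)
```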

Sentence Centrality Method

Sentence centrality measures the importance of a sentence within a document: roughly, a sentence is central if it is similar to many other sentences in the document. Centrality scoring is often used in tasks such as automatic text summarization and information extraction.
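A minimal sketch of one common centrality measure, the average cosine similarity between a sentence's TF-IDF vector and those of all other sentences (graph-based methods such as LexRank build on the same similarity matrix):

```python
# Sketch: centrality as the average cosine similarity of a sentence to all
# other sentences (TF-IDF vectors are one illustrative representation).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def centrality_scores(sentences):
    X = TfidfVectorizer().fit_transform(sentences)
    sim = cosine_similarity(X)        # pairwise sentence-similarity matrix
    np.fill_diagonal(sim, 0.0)        # ignore each sentence's self-similarity
    return sim.mean(axis=1)           # higher score = more central sentence
```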

RST Parsing

Rhetorical Structure Theory (RST) is a theoretical framework for describing text structure. In this framework, a text is not just a series of sentences: its parts are connected by rhetorical relations, such as causal, contrastive, and explanatory relations, which describe how sentences or groups of sentences relate to one another to form a unified, coherent information structure.

RST parsing analyzes a text under this theory to identify its rhetorical relations and to generate a structural representation called an RST tree.

In text summarization, RST parsing can help us understand the deep structure and logical relationships of the text, and thereby generate higher-quality summaries. For example, we can prioritize sentences that sit at higher (i.e., more important) levels of the RST tree, or sentences that participate in important rhetorical relations. The RST tree can also help ensure the coherence and logic of the summary: for example, if we choose a Result sentence, we may also need to choose the related Reason sentence.


Multi-document

  • The multi-document case is very similar to the single-document one, except that there may be information redundancy, because several sentences may be repeated or very similar across documents

Content selection

  • TF-IDF and the log-likelihood ratio can still be used
  • But redundant sentences should be ignored during selection

Maximum Marginal Relevance (MMR)

Maximum Marginal Relevance (MMR) is a strategy used in tasks such as information retrieval and text summarization to balance the relevance and diversity of information. The basic idea is to select the items that are most relevant to the query or topic, but least similar to the already-selected content.

In the context of text summarization, MMR can help us generate better summaries. For example, in extractive summarization, we can use MMR to select sentences so that the chosen sentences are not only related to the document's topic but also cover as much different information as possible. In this way, repetitive or redundant content is avoided, improving the summary's information density and readability.
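Formally, MMR repeatedly selects $\arg\max_{d_i \in R \setminus S}\left[\lambda\,\mathrm{sim}(d_i, Q) - (1-\lambda)\,\max_{d_j \in S} \mathrm{sim}(d_i, d_j)\right]$. A minimal sketch using TF-IDF cosine similarity, with the document centroid standing in for the query $Q$ (an illustrative assumption):

```python
# Sketch of MMR selection: lam trades off relevance (to the document) against
# redundancy (to already-selected sentences).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mmr_select(sentences, k=3, lam=0.7):
    X = TfidfVectorizer().fit_transform(sentences)
    query = np.asarray(X.mean(axis=0))          # document centroid as the "query"
    relevance = cosine_similarity(X, query).ravel()
    sim = cosine_similarity(X)                  # sentence-sentence similarity
    selected, candidates = [], set(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max(sim[i][j] for j in selected) if selected else 0.0
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in sorted(selected)]
```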

Information Ordering

  • Sort by time
  • Sort by cohesion

Sentence Realization

"Sentence Realization" (Sentence Realization) generally refers to the process of converting a semantic representation or semantic frame into a complete, grammatically correct sentence in Natural Language Generation (NLG).

This usually involves the following steps:

  • Vocabulary selection: choose appropriate words to express the meaning given by the semantic representation. For example, if the semantic representation is "move", words such as "move", "walk", or "run" could be chosen.
  • Word order determination: different languages have different word-order rules, so the order of words must be determined according to the grammar.
  • Morphological realization: in some languages, word forms change according to their role in the sentence; for example, in English the tense of a verb may need to change according to context.
  • Function word addition: extra words such as articles, prepositions, and conjunctions may need to be added to produce a grammatically correct sentence (a toy sketch of all four steps follows).
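A toy sketch of these four steps (every lexicon entry and rule below is an illustrative assumption; real realizers use broad grammars and lexicons):

```python
# Toy sketch of the four realization steps (all lexicon entries and rules
# here are illustrative assumptions).
LEXICON = {"move": "walk", "consume": "eat"}  # 1. vocabulary selection

def realize(frame):
    verb = LEXICON[frame["action"]]
    if frame.get("tense") == "past":          # 3. morphological realization
        verb += "ed"                          #    (toy regular inflection)
    goal = "to the " + frame["goal"]          # 4. function words: preposition + article
    # 2. word order: English subject-verb-complement
    sentence = f"{frame['agent']} {verb} {goal}."
    return sentence[0].upper() + sentence[1:]

print(realize({"agent": "the dog", "action": "move",
               "goal": "park", "tense": "past"}))
# -> "The dog walked to the park."
```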

Abstractive summarization

Single-document (deep learning models!)


Encoder-Decoder model


  • To train these models, we use different types of data. One option: use the first sentence of an article as the document, with the article's title as the summary.

  • An example of the results produced this way uses the following notation:

    • G is the ground truth
    • A is the generated summary
  • There are other dataset formats as well

Improvements


Encoder-Decoder with Attention

A simple explanation of how it works:

  • Encoder: The task of an encoder is to convert an input source text (e.g., an article) into a series of vector representations that capture the semantic information of the text. Common encoders are recurrent neural networks (RNN) or Transformer encoders.

  • Attention Mechanism: The attention mechanism plays an important role in the encoding and decoding process. Its basic idea is to consider not only the current state of the decoder but also all words in the source text when generating each output word, and assign different weights to different words. These weights are called "attention", and they indicate how much attention the decoder pays to each word in the source text when generating the current word. Through the attention mechanism, the decoder can better utilize the information of the source text and thus generate more accurate summaries.

  • Decoder: The task of the decoder is to generate the target text (e.g., a summary) based on the encoder output and attention weights. When generating each word, the decoder refers to all previously generated words and attention weights. Common decoders are recurrent neural network (RNN) or Transformer decoders.
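As a concrete illustration, here is a minimal sketch of the attention computation at one decoding step, using simple dot-product scoring (additive/Bahdanau or multi-head scoring are common alternatives; tensor shapes are assumptions):

```python
# Minimal sketch of dot-product attention at one decoding step.
import torch
import torch.nn.functional as F

def attention_step(dec_hidden, enc_outputs):
    """dec_hidden: (batch, hidden); enc_outputs: (batch, src_len, hidden)."""
    # Score each source position by its similarity to the decoder state.
    scores = torch.bmm(enc_outputs, dec_hidden.unsqueeze(2)).squeeze(2)
    attn = F.softmax(scores, dim=1)        # attention distribution (batch, src_len)
    # Context vector: attention-weighted sum of encoder states.
    context = torch.bmm(attn.unsqueeze(1), enc_outputs).squeeze(1)
    return context, attn                   # context: (batch, hidden)
```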

Method based on the Copy mechanism


Specifically, the method works as follows:

  • The method adds a copy mechanism: rather than decoding purely from the decoder's own state, it draws on encoder-side information at every decoding time step.
  • A Bi-LSTM serves as the encoder, and an ordinary LSTM as the decoder.
  • Suppose the current decoder time step is $t_i$. The decoder hidden vector is compared for similarity with the encoder hidden vector at each source time step, and a softmax over these scores yields the attention distribution. Weighting and summing the encoder vectors according to this distribution gives the context vector for time step $t_i$.
  • The decoder vector and the context vector are then combined to compute the scalar $P_{gen}$.
  • The vocabulary distribution produced from the decoder hidden state at $t_i$ and the original attention distribution are then mixed in proportion ($P_{gen}$ versus $1 - P_{gen}$) to obtain the final distribution used for decoding, and the word generated at the current time step is selected from it by argmax.
  • The original attention distribution is the copy part: it directly reuses information from the source text. A sketch of the final-distribution computation follows.
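The mechanism described above matches what is commonly called the pointer-generator network (See et al., 2017). A minimal sketch of the final-distribution step (tensor names and shapes are illustrative assumptions):

```python
# Sketch of the final-distribution step in a pointer-generator decoder.
import torch

def final_distribution(p_vocab, attn, p_gen, src_ids):
    """
    p_vocab: (batch, vocab_size)  decoder softmax over the fixed vocabulary
    attn:    (batch, src_len)     attention distribution over source positions
    p_gen:   (batch, 1)           generation gate in [0, 1]
    src_ids: (batch, src_len)     vocabulary ids of the source words
    """
    generate = p_gen * p_vocab                 # "generate from vocabulary" part
    copy = torch.zeros_like(p_vocab)
    # Scatter the copy probability mass onto the source words' vocabulary ids.
    copy = copy.scatter_add(1, src_ids, (1 - p_gen) * attn)
    return generate + copy                     # mix: P_gen vs. (1 - P_gen)
```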

Transformer-based


  • Since BERT is only the encoder side of the Transformer, it cannot perform this kind of task by itself
  • Such tasks require an encoder + decoder model, or a decoder-only model
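As a concrete example, a pretrained encoder-decoder Transformer can be applied off the shelf. A minimal sketch with the Hugging Face transformers library (the BART checkpoint named here is one illustrative choice, not the post's):

```python
# Sketch: abstractive summarization with a pretrained encoder-decoder model.
from transformers import pipeline

# facebook/bart-large-cnn is an encoder-decoder model fine-tuned for summarization.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "The city council met on Monday to discuss the new budget. After a long "
    "debate, members voted to increase funding for schools, public transport, "
    "and road maintenance over the next fiscal year."
)
print(summarizer(article, max_length=60, min_length=15, do_sample=False))
```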

Source: blog.csdn.net/qq_42902997/article/details/131219330