Three lines of code, automatically generate an abstract for your paper

  • Readers interested in code and programming are welcome to follow Old K, play with code, and chat with me!

I previously wrote an article on lowering a thesis's duplicate-check (similarity) score, and it was well received:
A must-have for graduates: lower your thesis similarity score in one click~!
Recently I went through the questions readers asked about their graduation theses and found that, besides lowering the similarity score, preparing and writing the "abstract" is also a major concern. Is there a shortcut for the abstract?
There is, and Old K will introduce it here:

1. First, import the required third-party libraries


# encoding:utf-8
from gensim.summarization import summarize
import re
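  • Since our papers are mostly Chinese text, we declare UTF-8 encoding at the top of the script to avoid encoding problems (Python 3 already reads source files as UTF-8 by default, so this line mainly matters for Python 2);
  • The library we will use is called gensim:
“gensim is a library for natural language processing. It was originally a tool for generating content similar to a given article, and the name gensim is itself a blend of "generate similar".”
  • This makes gensim a natural fit for implementing summarization.
  • Its dependencies include numpy and smart_open. Note that the gensim.summarization module was removed in gensim 4.0, so this code needs gensim 3.x (for example, pip install "gensim<4.0").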

    2. Preprocess the text


text = re.sub(r'[。？！]', '. ', text)
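  • gensim's summarization algorithm works at the sentence level.
  • gensim was designed for English text, so we need the substitution above to let it recognize sentences in Chinese. The full-width terminators 。？！ must go in a character class; a half-width ? right after | is a regex syntax error. Also note that gensim splits words on whitespace, so unsegmented Chinese leaves each sentence as essentially one "word"; for meaningful results you will likely need to segment the text into space-separated words first (for example with a segmenter such as jieba).
  • Ending each sentence with ". " lets gensim recognize it as a sentence terminator.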
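To see what the preprocessing step does on its own, here is a minimal stdlib-only sketch. The helper names `to_gensim_sentences` and `split_on_periods` are hypothetical, and `split_on_periods` is only a rough stand-in for gensim's real sentence splitter, used to show that the boundaries become visible once 。？！ are replaced with ". ".

```python
import re

# Full-width Chinese sentence terminators. A character class is used because a
# bare half-width "?" right after "|" is a regex syntax error ("nothing to repeat").
CN_TERMINATORS = re.compile(r'[。？！]')


def to_gensim_sentences(text):
    """Replace each Chinese sentence terminator with '. ' so an
    English-oriented sentence splitter can find the boundaries."""
    return CN_TERMINATORS.sub('. ', text)


def split_on_periods(text):
    """Rough stand-in for a sentence splitter (hypothetical helper):
    split on '. ' and drop empty pieces."""
    return [s.strip() for s in text.split('. ') if s.strip()]


sample = '今天天气很好。我们去公园吗？当然去！'
converted = to_gensim_sentences(sample)
print(repr(converted))  # '今天天气很好. 我们去公园吗. 当然去. '
print(split_on_periods(converted))  # ['今天天气很好', '我们去公园吗', '当然去']
```

Half-width ? and ! need no conversion, since gensim's splitter already treats them as sentence endings.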

    3. Summarize in one line of code


abstract = summarize(text)
  • gensim's summarization module implements a variant of TextRank that scores sentence similarity with the BM25 ranking function.
  • TextRank treats the sentence as its smallest unit: it computes how related each pair of sentences is and ranks the sentences to find the most representative ones, which are the best sentences for summarizing a long text.
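To make the ranking idea concrete, here is a minimal, self-contained TextRank sketch in plain Python. It is a simplified illustration, not gensim's implementation: it uses the word-overlap similarity from the original TextRank paper instead of gensim's BM25 weighting, and it assumes sentences are already tokenized into word lists (for Chinese, that means segmented first). All function names are hypothetical.

```python
import math


def sentence_similarity(a, b):
    """Word-overlap similarity from the TextRank paper:
    |common words| / (log|a| + log|b|); 0 when the denominator is not positive."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank_scores(sentences, d=0.85, iterations=100, tol=1e-6):
    """Weighted PageRank over an undirected sentence-similarity graph.
    `sentences` is a list of token lists."""
    n = len(sentences)
    if n == 0:
        return []
    # Symmetric edge weights; the diagonal stays 0 (no self-loops).
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = sentence_similarity(sentences[i], sentences[j])
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each neighbour j passes on its score in proportion to edge weight.
        new_scores = [
            (1 - d) + d * sum(
                w[j][i] / out_sum[j] * scores[j] for j in range(n) if w[j][i] > 0
            )
            for i in range(n)
        ]
        delta = max(abs(x - y) for x, y in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def textrank_summary(sentences, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
    "stock prices rose sharply today".split(),
]
for tokens in textrank_summary(docs, k=2):
    print(" ".join(tokens))
```

Here the two sentences that share the most words with the rest are selected, while the off-topic sentence, which shares no words with any other, keeps only the baseline score 1 - d.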

    4. Putting it all together


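# encoding:utf-8
# Requires gensim < 4.0: gensim.summarization was removed in gensim 4.0.
from gensim.summarization import summarize
import re

def do_abstract(text):
    # Turn full-width Chinese sentence terminators (。？！) into ". "
    # so gensim's English-oriented splitter can find sentence boundaries.
    text = re.sub(r'[。？！]', '. ', text)
    # summarize() keeps about 20% of the sentences by default (ratio=0.2)
    # and raises ValueError if the text contains only one sentence.
    abstract = summarize(text)
    return abstract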

The code above helps you get at the core content of a paper quickly. You can use it to:

  • Quickly grasp the core content of a reference and judge whether it fits your thesis topic
  • Quickly generate a draft abstract for your paper and edit it directly, instead of organizing the wording from scratch
“TextRank was inspired by Google's PageRank algorithm; it is the result of applying that idea to natural language processing.”
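For reference, the scoring formulas from the TextRank paper by Mihalcea and Tarau can be written as follows: the original PageRank score, the weighted variant TextRank uses for sentences, and the word-overlap similarity used as the edge weight. Here d is a damping factor (usually 0.85), w_{ji} is the weight of the edge from V_j to V_i, and w_k in the last line denotes a word; for an undirected sentence graph, In(V_i) = Out(V_i). gensim's variant swaps in BM25 for the similarity function, so treat this as background rather than a description of gensim's exact code.

```latex
S(V_i) = (1 - d) + d \sum_{V_j \in \mathrm{In}(V_i)} \frac{1}{\lvert \mathrm{Out}(V_j) \rvert} \, S(V_j)

WS(V_i) = (1 - d) + d \sum_{V_j \in \mathrm{In}(V_i)} \frac{w_{ji}}{\sum_{V_k \in \mathrm{Out}(V_j)} w_{jk}} \, WS(V_j)

\mathrm{Similarity}(S_i, S_j) = \frac{\lvert \{ w_k \mid w_k \in S_i \wedge w_k \in S_j \} \rvert}{\log \lvert S_i \rvert + \log \lvert S_j \rvert}
```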
  • summarize() takes several parameters, such as ratio (the fraction of sentences to keep, 0.2 by default) and word_count (an approximate word limit for the summary), so you can tailor the output to your needs.
  • Readers interested in TextRank can learn more about how it works from the paper "TextRank: Bringing Order into Texts" by Rada Mihalcea and Paul Tarau.
  • Readers who prefer not to read English can follow my public account "Old K Play Code"; I plan to write a Chinese walkthrough of this paper.
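To illustrate what a ratio or word_count setting controls, here is a hypothetical sketch of the two selection rules applied to sentences that already have scores. It is an assumption-based simplification, not gensim's code (gensim's own word_count handling may differ in details); the function names and sample scores are made up for illustration.

```python
def select_by_ratio(sentences, scores, ratio=0.2):
    """Keep roughly the top `ratio` fraction of sentences (at least one),
    returned in their original order."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    k = max(1, int(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda idx: scores[idx], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


def select_by_word_count(sentences, scores, word_count):
    """Take sentences in descending score order until the next one would
    push the total past `word_count` words; return them in original order."""
    chosen, used = [], 0
    order = sorted(range(len(sentences)), key=lambda idx: scores[idx], reverse=True)
    for i in order:
        length = len(sentences[i].split())
        if used + length > word_count:
            break
        chosen.append(i)
        used += length
    return [sentences[i] for i in sorted(chosen)]


sents = [
    "TextRank ranks sentences.",
    "It was inspired by PageRank.",
    "Similar sentences reinforce each other.",
    "The weather was nice.",
    "Top sentences form the summary.",
]
scores = [0.9, 0.6, 0.8, 0.1, 0.7]  # made-up scores for illustration
print(select_by_ratio(sents, scores, ratio=0.4))
print(select_by_word_count(sents, scores, word_count=13))
```

Both rules rank by score but restore the original order before returning, so the resulting draft still reads in the same sequence as the source text.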
    This code can only help you summarize the core content of an article; it cannot fully replace writing the abstract yourself.
    Old K wrote this article to share technology, and still encourages graduates to write their theses independently.

Fan benefits:


  • Read and share "Learn the basic skills of JavaScript, Old K recommends these books" for a chance to win a copy of "JavaScript Advanced Programming"
  • Read and share "11 Must-Read Bibliographies Recommended for Newbies Learning Python on Their Own" for a chance to win a copy of "Python Core Programming"

Origin: blog.51cto.com/15069443/2576231