Research Progress and Trends in Automatic Text Generation


Abstract

We look forward to the day when computers can write high-quality natural language texts like human beings. Automatic text generation is the key technique for achieving this goal. Depending on the type of input, automatic text generation techniques include text-to-text generation, meaning-to-text generation, data-to-text generation and image-to-text generation. All of these techniques are very challenging, and they are frontier research topics in natural language processing and artificial intelligence. In recent years, a number of internationally influential achievements and applications have emerged in academia and industry. In this article, we present a comprehensive survey of recent advances in automatic text generation in China and abroad, and discuss research and development trends.

Keywords: natural language generation, text-to-text generation, meaning-to-text generation, data-to-text generation, image-to-text generation

1. Introduction

Automatic text generation is an important research direction in the field of natural language processing, and its realization is also an important sign of the maturity of artificial intelligence. Simply put, we look forward to the day when computers can write like humans and produce high-quality natural language texts. Automatic text generation technology has great application prospects. For example, it can be applied to intelligent question answering, dialogue and machine translation systems to achieve more intelligent and natural human-computer interaction; automatic text generation systems could also replace editors to write and publish news automatically, which may eventually disrupt the news publishing industry; the technology may even be used to help scholars write academic papers, thereby changing the mode of scientific research and writing.

According to the type of input, automatic text generation includes text-to-text generation, meaning-to-text generation, data-to-text generation and image-to-text generation. Each of these technologies is extremely challenging, and there is a great deal of cutting-edge research on them in the fields of natural language processing and artificial intelligence. In recent years, industry has produced a number of achievements and applications with international influence. Most notably, the Associated Press has been using news writing software since July 2014 to automatically write articles reporting company earnings, which greatly reduces the workload of reporters. The Los Angeles Times also has a program for writing breaking news. Many companies in the United States provide news writing software and services; for example, Automated Insights has written 300 million reports using its "language expert" software, including sports and financial reports. These advances indicate that automatic text generation is no longer a technology on paper, but has had a major impact on human work and life.

At present, domestic academia and industry do not pay enough attention to automatic text generation technology, and there is a general lack of understanding of the cutting-edge techniques and progress in this direction. This technical report therefore presents, for the first time, a comprehensive survey, analysis and summary of the frontier of automatic text generation, providing an important reference for domestic colleagues to fully understand the technology. At the same time, we hope that academia and industry will work together to realize automatic generation systems for Chinese text as soon as possible and seize the commanding heights of Chinese text generation technology.

It should be pointed out that natural language generation, as the term is used in the natural language processing community, refers specifically to generating natural language text from machine-readable data, whereas the automatic text generation technology introduced in this article is broader, covering text-to-text generation techniques, image-to-text generation techniques, and more.

2. Text-to-text generation

2.1 Current status of international research

Text-to-text generation technology transforms and processes a given text to obtain a new text; it specifically includes text summarization (Document Summarization), sentence compression (Sentence Compression), sentence fusion (Sentence Fusion), paraphrase generation (Paraphrase Generation) and so on. These technologies have been studied internationally for many years, with results published mainly in academic conferences and journals related to natural language processing, such as ACL, EMNLP, NAACL, COLING, AAAI, IJCAI, SIGIR, INLG and ENLG. Major international research institutions include the University of Michigan, the University of Southern California, Columbia University, the University of North Texas and the University of Edinburgh. It should be pointed out that machine translation can, to some extent, also be regarded as text generation from a source language to a target language, but since machine translation is a relatively independent research field, this article does not cover it.

2.1.1 Text summarization

Text summarization technology automatically analyzes a given document or document set, extracts the key information, and outputs a short summary (usually a few sentences or a few hundred words); the sentences in the summary may be taken directly from the original text or may be rewritten. The purpose of a summary is to provide users with a concise description of the document content by compressing and refining the original text.

According to different classification standards, document summaries can be mainly divided into the following different types:

According to the number of documents processed, summarization can be divided into single-document summarization and multi-document summarization. Single-document summarization generates summaries only for a single document, while multi-document summarization generates summaries for a set of documents.

Depending on whether context is provided, summaries can be divided into topic- or query-independent summaries and topic- or query-focused summaries. Topic- or query-focused summaries explain a given topic or answer a given query, while topic- or query-independent summaries summarize a document or document collection without any given topic or query.

According to the method adopted, summarization can be divided into abstractive (generative) and extractive approaches. Abstractive methods usually need natural language understanding technology to analyze the grammar and semantics of the text and fuse information, and then use natural language generation technology to produce new summary sentences. Extractive methods are simpler: they evaluate the document's structural units (sentences, paragraphs, etc.) with various methods, assign each unit a weight, and then select the most important units to form the summary. Extraction is widely used, and the structural unit is usually the sentence.

According to the application, summaries can be divided into headline summaries, biographical summaries, movie summaries, etc., each meeting specific application requirements. For example, the purpose of a biographical summary is to generate a general description of a person, usually covering attributes such as name, gender, address, birth and hobbies; by browsing a person's biographical summary, users can get an overview of that person.

Research on automatic document summarization has always been active in library science and natural language processing, and the earliest application requirements came from libraries, which need to generate abstracts for large numbers of documents and books. Manual abstracting is very inefficient, so automatic methods were urgently needed to complete this task efficiently. With the development of information retrieval technology, automatic document summarization became increasingly important in information retrieval systems and gradually became a research hotspot. The first paper on automatic document summarization was by Luhn (1958) [1]. After decades of development, and driven by the international summarization evaluations organized by DUC and TAC, text summarization technology has made great progress. It is worth mentioning that the wide use of ROUGE [3], an automatic summarization evaluation tool developed by Dr. Chin-Yew Lin of the University of Southern California (now at Microsoft Research Asia), has also been a driving force behind the rapid development of automatic summarization. Well-known document summarization systems include ISI's NeATS system [2], Columbia University's NewsBlaster system [4] and the University of Michigan's NewsInEssence system [5]. In 2013, Yahoo spent $30 million to acquire Summly, an automatic news summarization application, marking the maturation of news summarization technology.
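To illustrate what ROUGE measures, the following is a minimal sketch of ROUGE-N recall in Python. It is a simplification for exposition only: the official toolkit additionally supports stemming, stopword handling, multiple references, and other variants such as ROUGE-L.

```python
# A minimal sketch of ROUGE-N recall (simplified; the official ROUGE toolkit
# also handles stemming, stopword removal, and multiple references).
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(reference, candidate, n=1):
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    if not ref:
        return 0.0
    overlap = sum(min(cnt, cand[g]) for g, cnt in ref.items())
    return overlap / sum(ref.values())

print(rouge_n_recall("the cat sat on the mat",
                     "the cat lay on the mat"))  # 0.833...
```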

Current text summarization methods are mainly based on sentence extraction: sentences of the original text are the units of evaluation and extraction. The advantage of such methods is that they are easy to implement and ensure that the summary sentences are readable. They involve two main steps: first, score or rank the importance of the sentences in the document; second, select important sentences and combine them into the final summary. The first step can use rule-based methods that judge importance from sentence position or cue words, or various machine learning methods (including deep learning) that combine multiple sentence features and cast importance estimation as classification, regression or ranking, for example with CRF [5], HMM [6], SVM [7][8] or RNN [9] models. The second step builds on the first: it must consider the similarity between sentences to avoid selecting repeated content (as in the MMR algorithm [10]) and arrange the selected sentences coherently (as in the bottom-up approach [11]) to obtain the final summary. In recent years, methods based on integer linear programming [12][13][14] and on maximizing submodular functions [15][16] have been proposed, which account for sentence redundancy during sentence selection. A sketch of MMR-style selection is given below.
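The following sketch implements a greedy MMR-style sentence selection, a simplified reading of the MMR idea [10]. The importance scores and the similarity function are placeholders that a real system would compute from features or embeddings; here a simple word-overlap (Jaccard) similarity stands in.

```python
def mmr_select(sentences, scores, similarity, k=3, lam=0.7):
    """Greedy MMR: trade off sentence importance against redundancy.

    sentences  -- list of candidate sentences
    scores     -- precomputed importance score per sentence (step one)
    similarity -- function(sent_a, sent_b) -> value in [0, 1]
    lam        -- interpolation weight between relevance and novelty
    """
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            redundancy = max((similarity(sentences[i], sentences[j])
                              for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]

def jaccard(a, b):
    """Toy word-overlap similarity, a stand-in for a learned measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

sents = ["The stock fell sharply on Monday.",
         "Shares dropped steeply at the start of the week.",
         "The company announced a new CEO."]
print(mmr_select(sents, scores=[0.9, 0.8, 0.6], similarity=jaccard, k=2))
```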

Unlike the above methods, compressive summarization compresses sentences so that the summary can cover more content within a given length limit. The most representative approach performs sentence selection and sentence compression simultaneously [17][18][19], which achieves better ROUGE performance. Besides compression, some works also apply techniques such as sentence fusion to transform existing sentences into new summary sentences [20][21].

Some researchers study truly abstractive summarization: the original document is semantically analyzed and expressed in a deep semantic form (such as a deep semantic graph), the deep semantic representation of the summary (such as a deep semantic subgraph) is then derived, and finally the summary text is generated from that representation. A recent attempt is abstractive summarization based on Abstract Meaning Representation (AMR) [22]. The summary sentences obtained this way are not based on original sentences but are generated directly from the semantic representation using natural language generation technology. Such methods are relatively complicated, and since neither natural language understanding nor natural language generation is itself well solved, abstractive summarization is still exploratory and its performance is not yet satisfactory.

The above methods are oriented to news summarization; in recent years, summarization of academic literature has received growing attention. On the one hand, the citation relationships among academic papers and the citing sentences can help summarize academic documents [23]; on the other hand, automatic survey generation for academic literature is also a very interesting research problem [24]. More information on text summarization can be found in the survey [25].

2.1.2 Sentence compression and fusion

Sentence compression and sentence fusion technologies are generally used within text summarization systems to produce more compact, higher-quality summaries.

Sentence compression generates a short sentence from a long one; the short sentence must retain the important information of the long sentence while remaining fluent. An example of sentence compression is given below:

Original sentence: But they are still continuing to search the area to try and see if there were, in fact, any further shooting incidents.

Compressed sentence: They are continuing to search the area to see if there were any further incidents.

The academic community has tried a variety of approaches to sentence compression, including deleting words from the sentence [26] and replacing, reordering or inserting words [27]. Among them, direct word deletion has become the mainstream approach because of its low complexity. Researchers have proposed a variety of deletion-based compression methods, including noisy channel models [28], structured discriminative models [29], tree-to-tree transformation [30] and integer linear programming [31]. In terms of overall effect, however, only a few words are deleted from most sentences, so the compression effect is not obvious.
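To give a feel for the integer-linear-programming view of word deletion, here is a toy formulation in the spirit of [31], assuming the PuLP library. The word importance weights are invented for illustration, and a real system would add many grammaticality constraints derived from a dependency or constituency parse; here only a length budget is enforced.

```python
# Toy ILP for word-deletion compression, in the spirit of [31].
# Assumes the PuLP library (pip install pulp). Purely illustrative:
# real systems add grammaticality constraints derived from a parse tree.
import pulp

words = "But they are still continuing to search the area".split()
weight = {i: (2.0 if w.lower() in {"continuing", "search", "area"} else 0.5)
          for i, w in enumerate(words)}           # made-up importance scores

prob = pulp.LpProblem("sentence_compression", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(len(words))]

prob += pulp.lpSum(weight[i] * x[i] for i in range(len(words)))  # objective
prob += pulp.lpSum(x) <= 6                                       # length budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(" ".join(w for i, w in enumerate(words) if x[i].value() == 1))
```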

Sentence fusion combines two or more related sentences with overlapping content into a single sentence. Depending on the purpose, one type of fusion retains only the information common to the input sentences and filters out the irrelevant details (similar to set intersection), while the other type retains all the information but removes the content repeated across sentences (similar to set union). The following are two related sentences and the sentences obtained after manual fusion:

Sentence 1: In 2003, his nomination to the U.S. Court of Appeals for the District of Columbia sailed through the Senate Judiciary Committee on a 16-3 vote.

Sentence 2: He was nominated to the U.S. Court of Appeals for the District of Columbia Circuit in 1992 by the first President Bush and again by the president in 2001.

Fused sentence (intersection): He was nominated to the US Court of Appeals for the District of Columbia Circuit.

Fused sentence (union): In 2003, his nomination by the first President Bush, and again by the second Bush in 2001 to the U.S. Court of Appeals for the District of Columbia sailed through the Senate Judiciary Committee on a 16-3 vote.

For the sentence fusion problem, Regina Barzilay of MIT and Kathleen McKeown of Columbia University proposed a pipeline algorithm with three steps: identification of common information, fusion lattice computation, and lattice linearization [20]. Other representative methods include structured discriminative learning [32], integer linear programming [33] and graph shortest paths [34].
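As a rough illustration of intersection-style fusion, the sketch below extracts the longest common subsequence of two token sequences. This is a crude stand-in for the lattice-based pipeline of [20], which aligns dependency structures rather than raw tokens, but it conveys the idea of recovering the material shared by two sentences.

```python
def lcs_tokens(a, b):
    """Longest common subsequence of two token lists via dynamic programming."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if a[i] == b[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    out, i, j = [], m, n
    while i and j:                      # backtrace through the DP table
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

s1 = "his nomination to the U.S. Court of Appeals for the District of Columbia".split()
s2 = "He was nominated to the U.S. Court of Appeals for the District of Columbia Circuit".split()
print(" ".join(lcs_tokens(s1, s2)))   # the material shared by the two sentences
```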

The above studies are all oriented to English. A small number of researchers have published their datasets online, but these datasets are relatively small and narrow in coverage, and no public evaluations of sentence compression or fusion have been organized. In recent years, academic papers on sentence compression and sentence fusion have been relatively rare, which is not unrelated to these factors.

2.1.3 Text paraphrasing

Paraphrase generation technology rewrites a given text to produce a new text that differs in expression but conveys essentially the same meaning. It has wide applications: in machine translation systems, paraphrasing can simplify complex input texts to ease translation; in information retrieval systems, it can rewrite user queries; in children's education systems, it can simplify difficult texts into ones children can easily understand.

Depending on actual needs, the output of paraphrase generation may differ from the original text by only one or two words (as in Example 1), or the entire text may be completely different (as in Example 2).

Example 1: all the members of -> all members of

Example 2: He said there will be major cuts in the salaries of high-level civil servants. -> He claimed to implement huge salary cut to senior civil servants.

Simple paraphrase generation can be realized by synonym replacement, or by manually or automatically constructed paraphrase rules [35]. For example, a rule that changes the position of an adverbial yields a simple paraphrase of the following sentence (a toy implementation of such a rule is sketched after the example):

Input: He booked a single room in Beijing yesterday.

Output: Yesterday, he booked a single room in Beijing.
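A rule of this kind can be approximated with a simple pattern rewrite. The sketch below handles only a sentence-final temporal adverb and is meant purely as an illustration of rule-based paraphrasing, not a robust implementation; real systems apply such rules over parse trees rather than surface strings.

```python
import re

# Toy adverbial-fronting rule: "... yesterday." -> "Yesterday, ...".
TEMPORAL = r"(yesterday|today|tomorrow)"

def front_temporal_adverb(sentence):
    m = re.match(rf"^(.*\S)\s+{TEMPORAL}\.$", sentence, flags=re.IGNORECASE)
    if not m:
        return sentence                      # rule does not apply
    body, adv = m.group(1), m.group(2)
    return f"{adv.capitalize()}, {body[0].lower()}{body[1:]}."

print(front_temporal_adverb("He booked a single room in Beijing yesterday."))
# -> "Yesterday, he booked a single room in Beijing."
```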

To realize complex paraphrase generation, researchers have proposed methods based on natural language generation [36], on machine translation [37], and on pivot languages [38][39]. Methods based on natural language generation simulate human thinking: the input sentence is first semantically analyzed to obtain a semantic representation, from which a new sentence is then generated. Machine-translation-based methods treat paraphrase generation as a monolingual machine translation problem, reusing existing translation models (such as noisy channel models) to generate paraphrases of a given text. Pivot-based methods translate the input text into another language (the pivot) and then translate it back into the original language; since each translation step should preserve the semantics between source and target, the resulting text can be expected to be semantically consistent with the input. Note that a single pivot language or multiple pivot languages can be used. The following example uses Italian as the pivot to paraphrase an English sentence through two translations:

Input English sentence: What toxins are most hazardous to expectant mothers?

Italian translation: Che tossine sono più pericolose alle donne incinte?

Back-translated English sentence: What toxins are more dangerous to pregnant women?
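The pivot method reduces to two translation calls per pivot language. The sketch below shows only the control flow; translate() is a hypothetical stand-in for any machine translation engine or API, and no specific service is assumed.

```python
def translate(text, src, tgt):
    """Hypothetical MT interface; in practice this would call an MT engine."""
    raise NotImplementedError("plug in a real machine translation backend")

def pivot_paraphrase(sentence, src="en", pivots=("it",)):
    """Round-trip the sentence through one or more pivot languages."""
    text = sentence
    for pivot in pivots:
        text = translate(text, src, pivot)   # forward translation
        text = translate(text, pivot, src)   # back translation
    return text

# With a real MT backend, pivot_paraphrase("What toxins are most hazardous
# to expectant mothers?") could return, e.g.,
# "What toxins are more dangerous to pregnant women?"
```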

In general, existing methods can generate paraphrases that differ only slightly from the given text, but they struggle to generate high-quality paraphrases with large differences: on the one hand it is difficult to guarantee semantic consistency with the original text, and on the other hand it is difficult to guarantee the readability of the generated text. In recent years, papers on paraphrase generation have rarely appeared at major natural language processing conferences, indicating that research on this problem has hit a bottleneck.

It should be pointed out that sentence simplification (Sentence Simplification) can be regarded as a special kind of paraphrase generation; its purpose is to rewrite complex long sentences into several short sentences that are simple, readable and easy to understand, so that users can read quickly. The methods mentioned above still apply, such as monolingual machine translation [40] and tree transformation [41]. Many studies of sentence simplification use Wikipedia [6] and the corresponding Simple Wikipedia [7] for training and testing. Simple Wikipedia is intended for children and for adults learning English, and its authors are asked to write articles using simple vocabulary and short sentences. A Simple Wikipedia article generally corresponds to an ordinary Wikipedia article, so a large amount of useful corpus can be obtained from this alignment between texts. Kristian Woodsend and Mirella Lapata of the University of Edinburgh proposed simplifying ordinary Wikipedia articles into Simple Wikipedia articles based on quasi-synchronous grammar and an integer linear programming model [42].

2.2 Current status of domestic research

2.2.1 Text summarization

Compared with popular fields such as machine translation, automatic question answering, knowledge graphs and sentiment analysis, text summarization has not received enough attention in China. Groups that have worked on text summarization include the Institute of Computer Science and Technology of Peking University, the Institute of Computational Linguistics of Peking University, the Information Retrieval Laboratory of Harbin Institute of Technology, and the State Key Laboratory of Intelligent Technology and Systems of Tsinghua University. Among them, the Institute of Computer Science and Technology of Peking University has conducted long-term, in-depth research on text summarization, proposing a variety of graph-ranking-based summarization methods [43][44][45][46] and compressive summarization methods [47], and exploring novel summarization tasks such as cross-language summarization, comparative summarization and evolutionary summarization [48][49][50]. For summarization of academic literature, it has proposed an automatic method for generating presentation slides based on supervised learning and integer linear programming [51] and an automatic method for generating the related-work section of academic papers [52].

Early domestic evaluations included a single-document summarization task, but the test sets were small and no automatic evaluation tools were provided. In 2015, the CCF Technical Committee on Chinese Information Technology organized the NLPCC evaluation, which included a news summarization task for Weibo, provided relatively large-scale sample and test data, and adopted automatic evaluation methods, attracting many teams to participate; the data is now publicly available. However, these Chinese summarization evaluations all target single-document summarization, and there is still no industry-recognized Chinese multi-document summarization dataset, which hinders the development of Chinese automatic summarization technology.

In recent years, some text mining products on the market have provided document summarization (especially single-document summarization), such as products from Founder Zhisi, TRS, Massive Technology and other companies. Search engines such as Baidu can provide simple single-document summaries for retrieved documents. These summarization functions are all auxiliary features of larger systems, and their implementations are relatively simple; since none of these modules has participated in public evaluations, their performance is unknown.

2.2.2 Sentence compression and fusion

A few domestic groups and scholars have studied sentence compression. For example, the Language Computing and Internet Mining Laboratory of Peking University proposed a sentence compression method based on dual decomposition [53], and the Intelligent Information Acquisition group of Tsinghua University proposed a sentence compression method based on Markov logic networks [54]. As for sentence fusion, domestic researchers have barely touched it.

The above research by domestic scholars still focuses on English data, mainly because relevant Chinese evaluation data is lacking. Constructing a high-quality Chinese dataset for sentence compression or fusion is not easy and requires annotators with a deep understanding of the language.

2.2.3 Text paraphrasing

A few domestic groups and scholars have done some research on paraphrase generation. For example, the Information Retrieval Center of Harbin Institute of Technology, in cooperation with Microsoft Research Asia, Baidu and other organizations, has proposed paraphrase generation methods that exploit various resources (multiple dictionaries, parallel corpora, etc.) [55], paraphrase generation methods using multiple machine translation engines [56], and paraphrase generation methods tailored to different applications [57].

This research is still oriented to English and evaluated on English data; few researchers have worked on Chinese paraphrase generation, which is a pity.

2.3 Development Trend and Prospect

Text-to-text generation includes multiple closely related tasks, and many methods generalize across them. In the next few years, with the development of deep semantic analysis, researchers will be able to make full use of deep semantic analysis results in their work. In addition, the maturing of deep learning opens another door for this research, but we need to think seriously about how to make good use of deep semantic analysis and deep learning. With the widespread use of social media, we can also make full use of social media data to serve this research.

In order to better promote the development of text-to-text generation technology, the industry can start from the following aspects:

First, build large-scale evaluation datasets. Data is the cornerstone of research, and large-scale, high-quality evaluation datasets are crucial to research work. However, many of the above tasks currently lack large-scale evaluation datasets, especially Chinese ones. Dataset construction requires substantial manpower and material resources, so crowdsourcing is a feasible route.

Second, build open-source platforms. Although various methods have been proposed for the above tasks, many of them are not easy to reimplement. The community needs to build an open-source platform for each task; integrating mainstream algorithms into such platforms would greatly facilitate later research and promote the development of the field.

3. Meaning-to-text generation

3.1 Current status of international research

Unlike text-to-text generation, the task of meaning-to-text generation has no academic consensus about its input. The root cause is that neither philosophers nor linguists have agreed on a definition of the semantics of natural language.

In computational linguistics, the semantic research principle generally followed by researchers is based on truth conditions: finding the conditions under which a natural language sentence is true characterizes, to some extent, its semantics. On the basis of the truth-conditional assumption, scholars generally use logical methods to represent semantics and conduct research from the perspectives of model theory and proof theory; semantics of this type is often called logical semantics. Existing research on meaning-to-text generation generally takes logical semantic representations, expressed as logical formulas, as input and natural language sentences as output, and this paper likewise introduces such studies. Figure 3.1 shows an example of semantic representation based on typed λ-calculus; the input is a λ-expression and the output is an English sentence.

Figure 3.1 Example of λ-expression-to-text generation
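Since the original figure is not reproduced here, the following illustrative input-output pair (modeled on the GeoQuery-style examples common in this literature, not taken from the original figure) shows the kind of mapping the task performs:

Input λ-expression: λx.state(x) ∧ borders(x, texas)

Output English phrase: states that border Texas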

Meaning-to-text generation is closely related to compositional semantic parsing, which aims to automatically analyze linear word sequences and obtain their truth conditions. Because the analysis follows Frege's principle of compositionality, it is called compositional semantic parsing, to distinguish it from distributional semantics. Compositional semantic parsing is a core technology of natural language processing and an important bridge toward deep semantic understanding, with potential applications in core tasks such as intelligent question answering and machine translation. By definition, meaning-to-text generation and compositional semantic parsing are a pair of inverse natural language processing tasks. Internationally, few scholars focus exclusively on meaning-to-text generation; rather, some scholars whose main focus is syntactic and semantic parsing also consider this task.

3.1.1 Text Generation Based on Deep Grammar

In early natural language processing research, computational linguistics played a major role. Computational linguists modeled natural language from a formal, computable perspective, proposed a series of syntactic and semantic models aimed at explaining the operating mechanisms of language, and built natural language processing systems on these models. This research bore rich fruit in the 1980s and 1990s: a series of grammar formalisms (Grammar Formalism) with both linguistic explanatory power and computability were proposed, such as Combinatory Categorial Grammar (CCG) [59] and Head-driven Phrase Structure Grammar (HPSG) [60]. Unlike the context-free grammars (CFG) mainly used in syntactic parsing today, these formalisms have expressive power beyond context-free; their grammatical derivations are often more complex and carry more information, which can be used for more transparent semantic analysis. Simply put, these deep grammar formalisms better support synchronized syntactic and semantic analysis: with the support of a deep grammar, the compositional semantics of natural language can be obtained through the joint derivation of syntax and semantics, and when a semantic representation is taken as input, meaning-to-text generation can be completed by running this process in reverse, as the small illustration below suggests.
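As a standard textbook-style illustration (not drawn from the survey's references), a CCG lexicon pairs each word with a syntactic category and a semantic λ-term, and syntactic combination drives semantic composition:

John := NP : john′

saw := (S\NP)/NP : λy.λx.saw′(x, y)

Mary := NP : mary′

Forward application gives "saw Mary" := S\NP : λx.saw′(x, mary′), and backward application gives "John saw Mary" := S : saw′(john′, mary′). Generation runs such derivations in reverse: given the semantics saw′(john′, mary′), the generator searches for a sequence of lexical categories whose combination yields exactly that semantics.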

Shieber [61] proposed a uniform framework for syntactic-semantic parsing and generation. In this framework, language processing is unified as logical deduction; parsing and generation differ only in the deduction's starting point (the axioms) and end point (the goal). From this perspective, traditional parsing techniques can be transplanted to text generation; for example, chart parsing can be turned into chart generation [62]. Shieber and colleagues subsequently refined the deduction idea, used unification grammar to express the syntax-semantics interface, and proposed semantic-head-driven generation [63].

Deep grammars are highly complex, and constructing broad-coverage grammar rules for intricate language phenomena is itself a great challenge. The research above mainly discussed prototype algorithms, and because no realistic, usable large-scale deep grammar was available at the time, it did not present very representative empirical results. After more than a decade of development, researchers built the English Resource Grammar (ERG) [64] on the basis of HPSG theory, a relatively successful high-coverage deep grammar rule system, and text generation research around ERG has made useful progress. Carroll and Oepen [65] revisited chart-based generation using ERG and real test data and gave a very informative empirical evaluation; they also proposed two new techniques to make unification-grammar-based generation practical: a compact representation of the search space and a related decoding algorithm, selective unpacking. The latter in particular effectively uses discriminative learning to resolve the ambiguities encountered during text generation.

Combinatory categorial grammar is a grammar formalism that has attracted wide attention in natural language processing. Its design follows the principle of type transparency and it has a streamlined syntax-semantics interface, so it is often the model of choice for semantic parsing [66] and text generation [67]. White and Baldridge [67] discussed how to combine chart generation with CCG and developed an open-source CCG-based sentence realization tool, OpenCCG. White and other scholars have since proposed further algorithms to improve text generation [68][69][70].

3.1.2 Text Generation Based on Synchronous Grammar

Over the past two decades, statistical parsing and statistical machine translation have been recognized as two natural language processing tasks that made great progress. Besides borrowing successful experience from mature statistical parsing, such as discriminative disambiguation, many scholars have tried to reuse successful machine translation models for text generation. Machine translation translates a sentence of one natural language into another natural language while keeping the meaning unchanged; text generation can be regarded as translating a "sentence" of a formal language into a natural language sentence. The two tasks are highly comparable.

Chiang [71] proposed the hierarchical phrase-based translation model (Hierarchical Phrase-based Model), whose core is a synchronous context-free grammar that coordinates the parsing of source-language sentences with the generation of target-language sentences. Synchronous grammars have since been borrowed for text generation research [72][58]. Wong and Mooney [72] considered two formal meaning representation languages: a formal language for directing robot actions and a variable-free database query language; Lu and Ng [58] studied typed λ-expressions, which have strong expressive power. What the two studies have in common is that they build tree structures over the formal language and establish a consistent correspondence between those structures and the tree structure of the natural language to be generated, thereby completing text generation; another common point is the extensive use of existing machine translation technology (including open-source software) for grammar extraction and decoding. A toy illustration of a synchronous grammar follows below.
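To make the analogy with translation concrete, a synchronous rule pairs a formal-language fragment with a natural-language fragment that share non-terminals. The following toy rules are invented for illustration (they are not taken from [72] or [58]):

QUERY → ⟨answer(STATE), what is STATE⟩

STATE → ⟨capital(STATE₁), the capital of STATE₁⟩

STATE → ⟨texas, texas⟩

Deriving top-down on the formal side yields answer(capital(texas)), while the synchronized natural-language side simultaneously yields "what is the capital of texas"; generation amounts to parsing the formal side with the synchronous grammar and reading the output off the natural-language side.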

3.2 Current status of domestic research

In domestic linguistics and computational linguistics, there are few formal studies of natural language semantics, and research on comprehensive semantic representation of Chinese remains blank. Meanwhile, researchers in natural language processing are rarely involved in deep language structure processing, research on meaning-to-text generation is even rarer, and relevant results are seldom published in important academic conferences and journals.

3.3 Development Trend and Prospect

With the development of deep natural language understanding, researchers are paying more and more attention to meaning-to-text generation, the core task of natural language generation. The task changes with the complexity of the meaning representation system. Traditional generation methods based on deep grammars still lack good technical solutions to problems such as poor decoding efficiency and insufficient grammar robustness. In recent years, some scattered work has applied mature combinatorial optimization techniques, such as Lagrangian relaxation [73][74], to parsing and machine translation in an attempt to solve the NP-hard problems involved; similar techniques could be tried for the highly complex problem of meaning-to-text generation. For the problem of insufficient robustness of deep grammars, data-driven grammar approximation (Grammar Approximation) [75] has achieved good results, showing that approximating a deep grammar with a lower-level grammar can effectively improve the robustness of deep grammar parsing; how to apply this idea to the problems encountered in text generation is also a direction well worth studying.

As far as the research on Chinese text generation is concerned, the academic circles at home and abroad need to make greater efforts. First of all, in terms of language ontology analysis, scholars need to establish a relevant semantic representation system and analyze the special language phenomena of Chinese to support the deep processing of Chinese. Secondly, in terms of text generation algorithms, we also need to devote more scientific research energy to design model algorithms suitable for automatic Chinese generation.

4. Data-to-text generation

4.1 Current status of international research

Data-to-text generation refers to generating text from given numerical data, for example generating weather forecasts, sports news, financial reports or medical reports. It has strong application prospects, and great research progress has been made: multiple generation systems for different domains and applications have been developed. Research on data-to-text generation is concentrated in a few institutions, such as the University of Aberdeen, the University of Brighton and the University of Edinburgh, and the relevant results are mainly published at specialized academic conferences such as INLG and ENLG.

Ehud Reiter of the University of Aberdeen proposed, building on the three-stage pipeline model [76], a general framework for data-to-text generation systems, as shown in the figure below:

Figure 4.1 The general framework of a data-to-text generation system

In this framework:

The signal analysis module (Signal Analysis) takes numerical data as input, detects basic patterns in the data using various data analysis methods, and outputs discrete data patterns, for example peaks or longer-term growth trends in stock data. This module is domain-dependent: the output patterns differ across application domains and data types.

The data interpretation module (Data Interpretation) takes basic patterns and events as input, infers more complex and abstract messages from them, infers the relations between those messages, and outputs the high-level messages together with their relations. For stock data, for example, a message could be created when a drop exceeds a certain threshold. Relations between messages, such as causal or temporal relations, also need to be detected. Note that not all text generation systems require the data interpretation module; in weather forecast generation, for example, the basic patterns suffice, so data interpretation is not needed.

The document planning module (Document Planning) takes messages and relations as input, analyzes and determines which of them should be mentioned in the text, determines the structure of the text, and outputs the selected messages and the document structure. The signal analysis and data interpretation modules can generate a large number of messages, patterns and events, but the text is usually limited in length and can describe only some of them, so document planning must decide which messages the text should convey. Selection can generally be based on expert knowledge and on the importance and novelty of the messages. This module is also domain-dependent: different domains weigh different factors in message selection, and the document structure differs as well.

The microplanning and realization module (Microplanning and Realization) takes the selected messages and structure as input and outputs the final text via natural language generation technology. It mainly involves sentence planning and sentence realization, and requires the realized sentences to have correct grammar, morphology and spelling and to use accurate referring expressions. There is a large body of research on these techniques; see Section 3, "Meaning-to-text generation," for details. A toy end-to-end sketch of the whole pipeline follows below.
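To make the pipeline concrete, the following toy sketch runs daily stock prices through simplified versions of the four modules (pattern detection, message creation, message selection, template realization). The thresholds, message types and templates are all invented for illustration and do not come from any published system.

```python
def signal_analysis(prices):
    """Detect basic patterns: day-over-day moves above a (made-up) threshold."""
    patterns = []
    for day in range(1, len(prices)):
        change = (prices[day] - prices[day - 1]) / prices[day - 1]
        if abs(change) >= 0.05:
            patterns.append({"day": day, "change": change})
    return patterns

def data_interpretation(patterns):
    """Turn raw patterns into higher-level messages."""
    return [{"type": "sharp_drop" if p["change"] < 0 else "sharp_rise", **p}
            for p in patterns]

def document_planning(messages, limit=2):
    """Select the most newsworthy messages under a length budget."""
    return sorted(messages, key=lambda m: abs(m["change"]), reverse=True)[:limit]

def realization(messages):
    """Realize the selected messages with simple templates."""
    tmpl = {"sharp_drop": "On day {day}, the stock fell sharply by {pct:.1f}%.",
            "sharp_rise": "On day {day}, the stock rose sharply by {pct:.1f}%."}
    return " ".join(tmpl[m["type"]].format(day=m["day"], pct=abs(m["change"]) * 100)
                    for m in messages)

prices = [100, 103, 95, 96, 108]
print(realization(document_planning(data_interpretation(signal_analysis(prices)))))
```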

The industry has developed data-to-text generation systems for multiple domains. Their architectures differ little from the general framework above; some systems merge two of the modules into one, or omit one of them.

Data-to-text generation has been applied most successfully in weather forecasting, where several systems summarize forecast data and generate forecast texts. For example, the FoG system [78] generates bilingual weather forecast texts from weather data, and the SumTime system [79] generates marine weather forecasts; experimental evaluation showed that users sometimes prefer the forecasts generated by SumTime to those written by experts [80]. In addition, Anja Belz of the University of Brighton proposed a probabilistic generative model for weather forecast text generation [81]. Anja Belz and Eric Kow further compared a variety of data-to-text generation systems on weather forecast data; the results showed that a higher degree of automation does not reduce the quality of the generated text, and that automatic text quality evaluation underestimates handcrafted rule-based systems while overestimating automatic systems [82].

The industry has also developed text generation systems for other domains, such as a system for air quality [83], a system for financial data [84], and systems for medical data including TOPAZ [85], Suregen [86] and BT-45 [87]. Among them, BT-45 generates textual summaries of monitoring data in the neonatal intensive care unit (NICU) to help doctors make decisions. The following two figures show an input sample and the corresponding text generated by the BT-45 system.

Figure 4.2 NICU data sample; from top to bottom: HR, TcPO2, TcPCO2, SaO2, T1 & T2, and mean BP [Portet et al., 2009]


Figure 4.3 Corresponding text generated by the BT-45 system [Portet et al., 2009]

Because of the huge application value of data-to-text generation, a number of companies devoted to text generation have been founded; they generate industry reports or news from data for multiple industries, saving a great deal of manpower. Well-known companies include ARRIA, Automated Insights and Narrative Science. ARRIA, headquartered in Europe and formerly known as Data2Text, was founded by professors Ehud Reiter and Yaji Sripada of the University of Aberdeen and later joined by Robert Dale, another scientist in the field of natural language generation; its core technology is the ARRIA NLG engine. Automated Insights is an American artificial intelligence company founded by former Cisco engineer Robbie Allen. It was the first to generate textual summaries from sports data and now generates text reports from data in many domains, including finance, personal fitness, business intelligence and website analytics; its core technology is the WordSmith NLG engine. Automated Insights has generated hundreds of millions of news reports for the Associated Press and many other organizations, with huge influence. Narrative Science grew out of StatsMonkey, a research project at Northwestern University, and its core technology is the Quill NLG engine. Forbes is a typical Narrative Science client: its website has a dedicated Narrative Science page on which all articles are generated automatically. Below is a sample of automatically generated news:

Figure 4.4 Sample news automatically generated by Narrative Science

4.2 Current status of domestic research

Domestic academia has done little research on data-to-text generation, and few relevant results have been published in important academic conferences and journals. Some domestic companies have developed template-based text generation systems. For example, Xinhua News Agency developed a system that generates corporate annual reports from financial statement data by filling the required data into hand-written templates. Since the templates are relatively fixed, the financial and annual reports generated for different companies are quite similar and not vivid enough.

4.3 Development Trend and Prospect

Data-to-Chinese-text generation is of great research significance and practical value. If Chinese news could be generated from data, it would greatly ease the burden on editors and reporters and transform the media and publishing industry. Realizing such a system requires cooperation between research institutes and news publishers: publishers can provide large amounts of data and expert knowledge, while research institutes are strong in the theories and methods of natural language understanding and generation.

In addition, developing a general-purpose data-to-text system that works across domains is complicated and difficult. A better approach is to select one or two domains (such as finance or sports) for system development and, once the systems mature, consider migrating them to other domains.

5. Image-to-text generation

5.1 Current status of international research

Image-to-text generation refers to generating natural language text that describes the content of a given image, such as the title attached to a news image, the description attached to a medical image, the "describe the picture" exercises common in children's education, and the descriptive text users provide when uploading pictures to microblogs and other Internet applications. Depending on the level of detail and length of the generated text, the task can be divided into automatic generation of image titles and automatic generation of image descriptions. The former must highlight the core content of the image according to the application scenario; for example, a title generated for a news photo needs to emphasize the news event closely related to the image and seek novel expressions to attract readers' attention. The latter usually needs to describe the main content of the image in detail; for example, concise yet detailed picture descriptions for visually impaired people should present the image content comprehensively and in an orderly way, with no particular requirements on the mode of expression.

For automatic image-to-text generation, humans can easily understand image content and express it in natural language according to specific needs; for computers, however, the task requires combining research results from several fields, including image processing, computer vision and natural language processing. As a landmark cross-field task, automatic image-to-text generation has attracted researchers from different fields. Since 2010, relevant papers have appeared at well-known natural language processing conferences and journals such as ACL, TACL and EMNLP; since 2013, IEEE TPAMI, the top international journal in pattern recognition and artificial intelligence, and the international journal IJCV have also begun to publish related work. By 2015, nearly ten related papers had been published at CVPR, a premier computer vision conference, and two related papers appeared at ICML, a premier machine learning conference. Automatic image-to-text generation has been recognized as a fundamental challenge in artificial intelligence.

Similar to general text generation, solving automatic image-to-text generation also follows the three-stage pipeline model [76], with some adjustments for the characteristics of image content understanding:

In terms of content extraction, concepts such as objects, orientations, actions and scenes need to be extracted from the image. Objects can be localized to specific regions of the image, while the other concepts require semantic indexing. This part mainly relies on pattern recognition and computer vision techniques.

In terms of sentence content selection, the most important concepts (e.g., those most salient in the image, or most relevant to the application scenario) that can be expressed coherently must be selected according to the application scenario. This part requires the combined use of computer vision and natural language processing technology.

Finally, in sentence realization, an appropriate mode of expression is chosen according to the actual application to organize the selected concepts into grammatical natural language sentences. This part mainly relies on natural language processing technology.

Early work mainly followed the three-stage pipeline above. For example, in the work of Yao et al. [88], the image is carefully segmented and labeled with objects and their parts as well as the scene the image depicts; on this basis a scene-specific description template is selected and the object recognition results are filled into the template to obtain the description text. Feng and Lapata [89][90] used probabilistic graphical models to jointly model textual and visual information, selected appropriate keywords reflecting the image content from the news report in which the picture appeared, and then used a language model to link the selected content words and the necessary function words into image titles that largely conform to grammar. Other works [91][92][93][94][95] rely on existing object recognition techniques from computer vision to extract objects (common categories such as people, animals, flowers, cars and tables) and localize them to obtain the spatial relations between objects, and then rely on probabilistic graphical models and language models to choose an appropriate description order and concatenate the object concepts and prepositional phrases into complete sentences. Hodosh et al. [96] used kernel canonical correlation analysis (KCCA) to capture the correlation between text and images and ranked candidate sentences by image information to obtain the best description sentence. It is worth noting that neither the work of Hodosh et al. [96] nor that of Feng and Lapata [89][90] relies on existing object recognition techniques.

Figure 5.1 A typical pipeline model

With the wide application of deep learning in pattern recognition, computer vision and natural language processing, large-scale image classification and semantic annotation based on massive data have developed rapidly; at the same time, natural-language-generation-related techniques such as statistical machine translation have also improved significantly. This has led to a series of works that jointly model image semantic annotation and natural language sentence generation: on the image side, a deep convolutional neural network (DCNN) recognizes the object concepts in the image, while on the text side a Recurrent Neural Network (RNN) or Recursive Neural Network models the generation of natural language sentences [97]. Traditional image semantic annotation focuses on recognizing specific objects and the relative positions between them, paying less attention to abstract concepts such as actions. Socher et al. [98] proposed using recursive neural networks over syntactic parse trees to model sentences, emphasizing the modeling of actions (verbs), and then jointly optimized the image side and the text side, better capturing the relations between objects and actions. To unify the two modalities in one framework, Chen and Zitnick [99] fused textual and visual information in the same recurrent neural network, using the image information as a memory module to guide sentence generation; with the help of a reconstructed image information layer, they realized bidirectional mappings from image to text and from text to image. Mao et al. [100] fused the DCNN image features and the textual information in the same recurrent neural network (m-RNN), integrating image information into the sequential process of sentence generation, with good results. Similar ideas were applied by Donahue et al. [101] to action recognition and video description generation. However, during m-RNN sentence generation there is no strong constraint from the image side; for example, in the figure below, when the word "man" is generated there is no direct or indirect association with the corresponding annotation in the image.

Figure 5.2 Multimodal m-RNN model

Researchers from Google, and from the University of Montreal and the University of Toronto, respectively drew on the latest research progress in statistical machine translation to advance the joint modeling of image-to-text generation [102][103]. The former uses a deep convolutional neural network (DCNN) to "encode" the image, and a long short-term memory (LSTM) network connected to it then directly "decodes" the encoded information into natural language sentences, without the image-word alignment and ordering sub-steps of traditional models. The latter, within the framework of neural network-based machine translation, proposes to use an "attention" mechanism to promote the alignment between words and image blocks, thereby simulating the attention mechanism of human vision during sentence generation. The transfer of "attention" and the generation of the word sequence reinforce each other, so that the generated sentences better conform to human expression habits.

Figure 5.3 Image caption generation process guided by visual “attention”

In addition, Microsoft researchers [104] used a convolutional neural network (CNN) with multiple instance learning (MIL) to model images, used a discriminative language model to generate candidate sentences, and applied minimum error rate training (MERT), a classic technique from statistical machine translation, to combine text-level and image-level features for ranking the candidate sentences.

Although image-to-text generation is still at an exploratory stage and some distance from practical industrial application, industry has begun to recognize the theoretical research value and potential applications of this technology and is actively cooperating with academia to expand related research directions. The 2015 Large-scale Scene Understanding (LSUN) Challenge, held at the well-known international computer vision conference CVPR 2015, included an evaluation task on automatic image caption generation. In the final overall ranking, Google [102] and Microsoft Research [104] tied for first place, the Montreal-Toronto joint team [103] and another Microsoft Research team [105] tied for third place, and the University of California, Berkeley [101] took fifth place.

5.2 Domestic Research Status

Research on image-to-text generation in domestic academia started relatively late, and most research institutes focus on tasks such as semantic annotation and retrieval of cross-media data. For example, in the ImageCLEF evaluation held in Europe in 2015, a joint team from Renmin University of China and Tencent won first place in the Image Sentence Generation task.

In industry, the research institutes of companies such as Baidu and Tencent are leveraging their own strengths in cross-media semantic annotation, classification and retrieval to gradually carry out research in related directions, and they have also achieved good results in the LSUN automatic image caption generation evaluation.

5.3 Development Trend and Prospect

Image-to-text generation requires integrating research results from pattern recognition and machine learning, computer vision, natural language processing, and even cognitive science, and thus has very high theoretical research value and broad application prospects. To a certain extent, this technology, together with tasks such as image semantic annotation, has become an arena in which major top research institutions compete in comprehensive research strength in artificial intelligence, which will surely drive its rapid development.

For the task itself, the bigger challenge still lies in correctly extracting the content of an image while choosing an appropriate means of expression, in line with human language habits, to convert that content into natural language sentences. It should be pointed out that current research still focuses on whether the object concepts in an image are fully extracted, whether the right words are chosen, and whether the generated sentences are grammatical; it is foreseeable that in the near future, constraints such as practical application scenarios and context will further drive progress in related techniques, which will be widely applied in fields such as news communication, online education and smart homes.

6. Summary and Outlook

This paper has given a comprehensive introduction to automatic text generation technology, covering text-to-text generation, meaning-to-text generation, data-to-text generation and image-to-text generation. Since each of these technologies has been studied by many researchers and new academic results appear continually, some omissions in this survey are inevitable. We hope the content of this article will be helpful to relevant researchers and practitioners.

Comparing the international and domestic research status of automatic text generation, it is clear that domestic research input and output in this field are far from sufficient: original methods, resources, systems and products are relatively scarce, and the field has not received enough attention from industry. We must catch up by building relevant Chinese resources, proposing original text generation methods, constructing Chinese text generation systems and developing related products, so as to occupy the commanding heights of Chinese text generation. We hope that the first practical Chinese text generation systems will be developed by domestic institutions.

 

References

[1]         Luhn, H. P. (1958). The automatic creation of literature abstracts. IBM Journal of research and development, 2(2), 159-165.

[2]         Lin, C. Y., & Hovy, E. (2002, July). From single to multi-document summarization: A prototype system and its evaluation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (pp. 457-464). Association for Computational Linguistics.

[3]         Evans, D. K., Klavans, J. L., & McKeown, K. R. (2004, May). Columbia newsblaster: multilingual news summarization on the Web. In Demonstration Papers at HLT-NAACL 2004 (pp. 1-4). Association for Computational Linguistics.

[4]         Radev, D., Otterbacher, J., Winkel, A., & Blair-Goldensohn, S. (2005). NewsInEssence: summarizing online news topics. Communications of the ACM, 48(10), 95-98.

[5]         Shen, D., Sun, J. T., Li, H., Yang, Q., & Chen, Z. (2007, January). Document Summarization Using Conditional Random Fields. In IJCAI (Vol. 7, pp. 2862-2867).

[6]         Conroy, J. M., & O’Leary, D. P. (2001, September). Text summarization via hidden markov models. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 406-407). ACM.

[7]         Schilder, F., & Kondadadi, R. (2008, June). FastSum: fast and accurate query-based multi-document summarization. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers (pp. 205-208). Association for Computational Linguistics.

[8]         Ouyang, Y., Li, W., Li, S., & Lu, Q. (2011). Applying regression models to query-focused multi-document summarization. Information Processing & Management, 47(2), 227-237.

[9]         Cao, Z., Wei, F., Dong, L., Li, S., & Zhou, M. (2015, February). Ranking with recursive neural networks and its application to multi-document summarization. In Twenty-Ninth AAAI Conference on Artificial Intelligence.

[10]      Carbonell, J., & Goldstein, J. (1998, August). The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 335-336). ACM.

[11]      Bollegala, D., Okazaki, N., & Ishizuka, M. (2010). A bottom-up approach to sentence ordering for multi-document summarization. Information processing & management, 46(1), 89-109.

[12]      McDonald, R. (2007). A study of global inference algorithms in multi-document summarization (pp. 557-564). Springer Berlin Heidelberg.

[13]      Gillick, D., & Favre, B. (2009, June). A scalable global model for summarization. In Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing (pp. 10-18). Association for Computational Linguistics.

[14]      Li, C., Qian, X., & Liu, Y. (2013, August). Using Supervised Bigram-based ILP for Extractive Summarization. In ACL (1) (pp. 1004-1013).

[15]      Lin, H., & Bilmes, J. (2010, June). Multi-document summarization via budgeted maximization of submodular functions. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 912-920). Association for Computational Linguistics.

[16]      Lin, H., & Bilmes, J. (2011, June). A class of submodular functions for document summarization. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1 (pp. 510-520). Association for Computational Linguistics.

[17]      Qian, X., & Liu, Y. (2013). Fast Joint Compression and Summarization via Graph Cuts. In EMNLP (pp. 1492-1502).

[18]      Li, C., Liu, Y., Liu, F., Zhao, L. & Weng, F. (2014). Improving Multi-documents Summarization by Sentence Compression based on Expanded Constituent Parse Trees. In EMNLP.

[19]      Berg-Kirkpatrick, T., Gillick, D., & Klein, D. (2011, June). Jointly learning to extract and compress. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1 (pp. 481-490). Association for Computational Linguistics.

[20]      Barzilay, R., & McKeown, K. R. (2005). Sentence fusion for multidocument news summarization. Computational Linguistics, 31(3), 297-328.

[21]      Bing L., Li P., Liao Y., Lam W., Guo W., & Passonneau R. J. (2015). Abstractive Multi-Document Summarization via Phrase Selection and Merging. In ACL.

[22]      Liu, F., Flanigan, J., Thomson, S., Sadeh, N., & Smith, N. A. (2015). Toward Abstractive Summarization Using Semantic Representations. In NAACL.

[23]      Abu-Jbara, A., & Radev, D. (2011, June). Coherent citation-based summarization of scientific papers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1 (pp. 500-509). Association for Computational Linguistics.

[24]      Saif Mohammad, Bonnie Dorr, Melissa Egan, Ahmed Hassan, Pradeep Muthukrishan, Vahed Qazvinian, Dragomir Radev, and David Zajic. 2009. Using citations to generate surveys of scientific paradigms. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 584-592. Association for Computational Linguistics.

[25]      Nenkova, A., & McKeown, K. (2012). A survey of text summarization techniques. In Mining Text Data (pp. 43-76). Springer US.

[26]      Knight, K., & Marcu, D. (2002). Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artificial Intelligence, 139(1), 91-107.

[27]      Cohn, T., & Lapata, M. (2008, August). Sentence compression beyond word deletion. In Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1 (pp. 137-144). Association for Computational Linguistics.

[28]      Knight, K., & Marcu, D. (2000, August). Statistics-based summarization-step one: Sentence compression. In AAAI/IAAI (pp. 703-710).

[29]      McDonald, R. T. (2006, April). Discriminative Sentence Compression with Soft Syntactic Evidence. In EACL.

[30]      Cohn, T. A., & Lapata, M. (2009). Sentence compression as tree transduction. Journal of Artificial Intelligence Research, 637-674.

[31]      Clarke, J., & Lapata, M. (2008). Global inference for sentence compression: An integer linear programming approach. Journal of Artificial Intelligence Research, 399-429.

[32]      Thadani, K., & McKeown, K. (2013). Supervised sentence fusion with single-stage inference. In Proceedings of the Sixth International Joint Conference on Natural Language Processing (pp. 1410-1418).

[33]      Elsner, M., & Santhanam, D. (2011, June). Learning to fuse disparate sentences. In Proceedings of the Workshop on Monolingual Text-To-Text Generation (pp. 54-63). Association for Computational Linguistics.

[34]      Filippova, K. (2010, August). Multi-sentence compression: finding shortest paths in word graphs. In Proceedings of the 23rd International Conference on Computational Linguistics (pp. 322-330). Association for Computational Linguistics.

[35]      Barzilay, R., & Lee, L. (2003, May). Learning to paraphrase: an unsupervised approach using multiple-sequence alignment. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1 (pp. 16-23). Association for Computational Linguistics.

[36]      Fujita, A., Inui, K., & Matsumoto, Y. (2005). Exploiting lexical conceptual structure for paraphrase generation. IJCNLP 2005, LNAI 3651, pp. 908-919.

[37]      Quirk, C., Brockett, C., & Dolan, W. B. (2004, July). Monolingual Machine Translation for Paraphrase Generation. In EMNLP (pp. 142-149).

[38]      Duboue, P. A., & Chu-Carroll, J. (2006, June). Answering the question you wish they had asked: The impact of paraphrasing for question answering. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers (pp. 33-36). Association for Computational Linguistics.

[39]      Max, A. (2009, August). Sub-sentential paraphrasing by contextual pivot translation. In Proceedings of the 2009 Workshop on Applied Textual Inference (pp. 18-26). Association for Computational Linguistics.

[40]      Wubben, S., Van Den Bosch, A., & Krahmer, E. (2012, July). Sentence simplification by monolingual machine translation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1 (pp. 1015-1024). Association for Computational Linguistics.

[41]      Zhu, Z., Bernhard, D., & Gurevych, I. (2010, August). A monolingual tree-based translation model for sentence simplification. In Proceedings of the 23rd international conference on computational linguistics (pp. 1353-1361). Association for Computational Linguistics.

[42]      Woodsend, K., & Lapata, M. (2011, April). WikiSimple: Automatic Simplification of Wikipedia Articles. In AAAI.

[43]      Wan, X., Yang, J., & Xiao, J. (2007, January). Manifold-Ranking Based Topic-Focused Multi-Document Summarization. In IJCAI (Vol. 7, pp. 2903-2908).

[44]      Wan, X., & Yang, J. (2008, July). Multi-document summarization using cluster-based link analysis. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 299-306). ACM.

[45]      Wan, X., & Zhang, J. (2014, July). CTSUM: extracting more certain summaries for news articles. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval (pp. 787-796). ACM.

[46]      Yan, S., & Wan, X. (2014). SRRank: leveraging semantic roles for extractive multi-document summarization. Audio, Speech, and Language Processing, IEEE/ACM Transactions on, 22(12), 2048-2058.

[47]      Yao, J. G., Wan, X., & Xiao, J. (2015). Compressive Document Summarization via Sparse Optimization. In IJCAI.

[48]      Yan, R., Wan, X., Otterbacher, J., Kong, L., Li, X., & Zhang, Y. (2011, July). Evolutionary timeline summarization: a balanced optimization framework via iterative substitution. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval (pp. 745-754). ACM.

[49]      Wan, X. (2011, June). Using bilingual information for cross-language document summarization. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1 (pp. 1546-1555). Association for Computational Linguistics.

[50]      Wan, X., Jia, H., Huang, S., & Xiao, J. (2011, July). Summarizing the differences in multilingual news. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval (pp. 735-744). ACM.

[51]      Hu, Y., & Wan, X. (2015). PPSGen: Learning-Based Presentation Slides Generation for Academic Papers. Knowledge and Data Engineering, IEEE Transactions on, 27(4), 1085-1097.

[52]      Hu, Y., & Wan, X. (2014). Automatic Generation of Related Work Sections in Scientific Papers: An Optimization Approach. In EMNLP.

[53]      Yao, J. G., Wan, X., & Xiao, J. (2014). Joint Decoding of Tree Transduction Models for Sentence Compression. In EMNLP.

[54]      Huang, M., Shi, X., Jin, F., & Zhu, X. (2012, July). Using first-order logic to compress sentences. In Twenty-Sixth AAAI Conference on Artificial Intelligence.

[55]      Shiqi Zhao, Cheng Niu, Ming Zhou, Ting Liu, and Sheng Li. 2008. Combining Multiple Resources to Improve SMT-based Paraphrasing Model. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-08: HLT), pages 1021-1029.

[56]      Shiqi Zhao, Haifeng Wang, Xiang Lan, and Ting Liu. 2010. Leveraging Multiple MT Engines for Paraphrase Generation. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), pages 1326-1334.

[57]      Shiqi Zhao, Xiang Lan, Ting Liu, and Sheng Li. 2009. Application-driven Statistical Paraphrase Generation. In Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2009), pages 834-842.

[58]      Wei Lu and Hwee Tou Ng. 2011. A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing.

[59]      Mark Steedman. 2000. The Syntactic Process. MIT Press.

[60]      Carl Pollard, Ivan A. Sag. 1994. Head-Driven Phrase Structure Grammar. University of Chicago Press.

[61]      Stuart M. Shieber. 1988. A uniform architecture for parsing and generation. In Proceedings of the 12th International Conference on Computational Linguistics.

[62]      Martin Kay. 1996. Chart Generation. In Proceedings of the 34th annual meeting on Association for Computational Linguistics.

[63]      Stuart M. Shieber, Gertjan van Noord, Fernando C. N. Pereira, and Robert C. Moore. 1990. Semantic-head–driven generation. Computational Linguistics.

[64]      Dan Flickinger. 2002. On building a more efficient grammar by exploiting types. Collaborative Language Engineering.

[65]      Carroll, J., & Oepen, S. (2005). High efficiency realization for a wide-coverage unification grammar. In Natural Language Processing–IJCNLP 2005 (pp. 165-176). Springer Berlin Heidelberg.

[66]      Luke S. Zettlemoyer and Michael Collins. 2005. Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. In Proceedings of UAI.

[67]      Michael White and Jason Baldridge. 2003. Adapting Chart Realization to CCG. In Proc. of the 9th European Workshop on Natural Language Generation.

[68]      Michael White. 2004. Reining in CCG Chart Realization. In Proc. of the 3rd International Conference on Natural Language Generation.

[69]      Michael White. 2006. CCG Chart Realization from Disjunctive Inputs. In Proc. of the 4th International Conference on Natural Language Generation (INLG-06).

[70]      Michael White, Rajakrishnan Rajkumar and Scott Martin. 2007. Towards Broad Coverage Surface Realization with CCG. In Proc. of the 2007 Workshop on Using Corpora for NLG: Language Generation and Machine Translation.

[71]      David Chiang. 2005. A Hierarchical Phrase-Based Model for Statistical Machine Translation. In Proceedings of the 43rd annual meeting on Association for Computational Linguistics.

[72]      Yuk Wah Wong; Raymond Mooney. 2007. Generation by Inverting a Semantic Parser that Uses Statistical Machine Translation. In Proceedings of Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics.

[73]      Terry Koo, Alexander M. Rush, Michael Collins, Tommi Jaakkola, and David Sontag. 2010. Dual Decomposition for Parsing with Non-Projective Head Automata. In Proceedings of EMNLP 2010.

[74]      Alexander M. Rush and Michael Collins. 2011. Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation. In Proceedings of ACL 2011.

[75]      Yi Zhang and Hans-Ulrich Krieger. 2011. Large-Scale Corpus-Driven PCFG Approximation of an HPSG. In Proceedings of the 12th International Conference on Parsing Technologies.

[76]      Reiter, E. & Dale, R. (2000). Building natural language generation systems (Vol. 33). Cambridge: Cambridge university press.

[77]      Reiter, E. (2007, June). An architecture for data-to-text systems. In Proceedings of the Eleventh European Workshop on Natural Language Generation (pp. 97-104). Association for Computational Linguistics.

[78]      Goldberg, E., Driedger, N., & Kittredge, R. (1994). Using natural-language processing to produce weather forecasts. IEEE Expert, 9(2), 45-53.

[79]      Sripada, S., Reiter, E., & Davy, I. (2003). SumTime-Mousam: Configurable marine weather forecast generator. Expert Update, 6(3), 4-10.

[80]      Reiter, E., Sripada, S., Hunter, J., Yu, J., & Davy, I. (2005). Choosing words in computer-generated weather forecasts. Artificial Intelligence, 167(1), 137-169.

[81]      Belz, A. (2008). Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models. Natural Language Engineering, 14(04), 431-455.

[82]      Belz, A., & Kow, E. (2009, March). System building cost vs. output quality in data-to-text generation. In Proceedings of the 12th European Workshop on Natural Language Generation (pp. 16-24). Association for Computational Linguistics.

[83]      Bohnet, B., Lareau, F., & Wanner, L. (2007). Automatic production of multilingual environmental information. In Proceedings of the 21st Conference on Informatics for Environmental Protection (EnviroInfo-07), Warsaw, Poland.

[84]      Kukich, K. (1983, June). Design of a knowledge-based report generator. In Proceedings of the 21st annual meeting on Association for Computational Linguistics (pp. 145-150). Association for Computational Linguistics.

[85]      Kahn, M. G., Fagan, L. M., & Sheiner, L. B. (1991). Combining physiologic models and symbolic methods to interpret time-varying patient data. Methods of information in medicine, 30(3), 167-178.

[86]      Hüske-Kraus, D. (2003, April). Suregen-2: A shell system for the generation of clinical documents. In Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics-Volume 2 (pp. 215-218). Association for Computational Linguistics.

[87]      Portet, F., Reiter, E., Gatt, A., Hunter, J., Sripada, S., Freer, Y., & Sykes, C. (2009). Automatic generation of textual summaries from neonatal intensive care data. Artificial Intelligence, 173(7), 789-816.

[88]      B. Yao, X. Yang, L. Lin, M. W. Lee, and S.-C. Zhu. 2010. I2T: Image parsing to text description. Proceedings of the IEEE, 98(8).

[89]      Y. Feng and M. Lapata, “How Many Words Is a Picture Worth? Automatic Caption Generation for News Images,” Proc. Assoc. for Computational Linguistics, pp. 1239-1249, 2010.

[90]      Y. Feng and M. Lapata. 2013. Automatic caption generation for news images. IEEE Trans. Pattern Anal. Mach. Intell., 35.

[91]      Y. Yang, C. L. Teo, H. Daumé III, and Y. Aloimonos. Corpus-guided sentence generation of natural images. In EMNLP, 2011.

[92]      G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. 2011. Baby talk: Understanding and generating image descriptions. In CVPR.

[93]      Kulkarni, Girish, Premraj, Visruth, Ordonez, Vicente, Dhar, Sagnik, Li, Siming, Choi, Yejin, Berg, Alexander C, and Berg, Tamara L. Babytalk: Understanding and generating simple image descriptions. PAMI, IEEE Transactions on, 35(12):2891-2903, 2013.

[94]      Mitchell, Margaret, Han, Xufeng, Dodge, Jesse, Mensch, Alyssa, Goyal, Amit, Berg, Alex, Yamaguchi, Kota, Berg, Tamara, Stratos, Karl, and Daumé III, Hal. Midge: Generating image descriptions from computer vision detections. In European Chapter of the Association for Computational Linguistics, pp. 747-756. ACL, 2012.

[95]      Elliott, Desmond and Keller, Frank. Image description using visual dependency representations. In EMNLP, pp. 1292-1302, 2013.

[96]      Hodosh, Micah, Young, Peter, and Hockenmaier, Julia. Framing image description as a ranking task: Data, models and evaluation metrics. Journal of Artificial Intelligence Research, pp. 853-899, 2013.

[97]      A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. CVPR, 2015.

[98]      R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng. Grounded compositional semantics for finding and describing images with sentences. TACL, 2014.

[99]      X. Chen and C. L. Zitnick. Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation. CVPR, 2015.

[100] Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang and Alan L. Yuille, Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN), ICLR 2015

[101]   J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. CVPR, 2015.

[102]   O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. CVPR, 2015.

[103]   Xu, K., Ba, J., Kiros, R., Courville, A., Salakhutdinov, R., Zemel, R., & Bengio, Y. (2015). Show, attend and tell: Neural image caption generation with visual attention. In ICML.

[104]   H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. Platt, C. L. Zitnick, and G. Zweig. From captions to visual concepts and back. CVPR, 2015.

[105]   Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, and Margaret Mitchell. Language Models for Image Captioning: The Quirks and What Works. arXiv, 2015.

About the Authors

Dr. Wan Xiaojun is a researcher and doctoral supervisor at the Institute of Computer Science and Technology, Peking University. The main research directions are natural language processing and text mining. Email: [email protected]

Feng Yansong, Ph.D., is a lecturer at the Institute of Computer Science and Technology, Peking University. The main research direction is natural language processing. Email: [email protected]

Dr. Sun Weiwei is a lecturer at the Institute of Computer Science and Technology, Peking University. The main research direction is computational linguistics. Email: [email protected]

Note: Sections 1, 2, 4, and 6 of this article were written by Wan Xiaojun, Section 3 was written by Sun Weiwei, and Section 5 was written by Feng Yansong. Doctoral student Yao Jinge participated in the proofreading work.

[1] http://duc.nist.gov/

[2] http://www.nist.gov/tac/

[3] http://www.berouge.com

[4] http://www1.cs.columbia.edu/nlp/newsblaster/

[5] http://lada.si.umich.edu:8080/clair/nie1/nie.cgi

[6] http://en.wikipedia.org

[7] http://simple.wikipedia.org

[8] http://www.863data.org.cn

[9] http://tcci.ccf.org.cn/conference/2015/pages/page05_evadata.html

[10] http://www.delph-in.net/erg/

[11] https://github.com/OpenCCG/openccg

[12] https://www.arria.com/

[13] http://automatedinsights.com

[14] http://www.narrativescience.com

[15] http://www.forbes.com/sites/narrativescience

 
