Evolution and Practice of JD's Intelligent Content Creation Algorithm: Automatically Generating Summaries from Keywords

Source | JD Zhilian Cloud Developers

OVERVIEW: Revealing the AI capabilities behind JD Mall: automatically generating summaries from keywords.

Over the past few decades, human computing power has improved enormously; as data has accumulated and algorithms have grown more sophisticated, we have entered the era of artificial intelligence. Admittedly, AI can be a difficult concept to grasp: the technology, data, and algorithms behind it are vast and complex. Many people wonder what practical applications AI has now, or will have in the future.

In fact, the practical applications and business value of AI are not so "fantastical"; they are often already at our side. In this [AI Paper Reading] column, we will read AI-related papers to reveal, from the field of e-commerce, how AI technology empowers the business and how it is deployed and practiced in real scenarios.

Artificial intelligence has a wealth of application scenarios in e-commerce. Scenarios are the entry point for data; data, refined by technology, in turn feeds back into the scenarios, and the two complement each other.

Based on natural language understanding and knowledge graph technology, JD has developed an AI writing service for product marketing content, and has applied this technology in JD Mall's "Discover Good Goods" channel.

JD Mall's "Discover Good Goods" channel

Hundreds of thousands of pieces of product marketing copy have been created by AI, not only filling the huge gap between the pace of product updates and the pace of human content writing, but also enriching the channel's content.

Meanwhile, on fine-grained business metrics such as exposure click-through rate (CTR) and conversion rate, AI-generated content has actually performed better than human-written marketing content.

Let's read a selected AAAI 2020 paper to see how AI can improve marketing conversion by applying different marketing strategies and different styles of marketing copy to different user groups.

Automatic text summarization ("automatic summarization" for short) is a classic task in natural language processing, first proposed in the 1950s. Given a text, the goal of automatic summarization is to produce a simplified text containing its most important information. Common approaches fall into extractive summarization and abstractive summarization. Extractive summarization produces a summary by extracting keywords, phrases, or sentences that already exist in the given text; abstractive summarization builds an abstract semantic representation of the given text and produces a summary using natural language generation techniques.
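To make the extractive idea concrete, here is a minimal frequency-based extractive summarizer. This is a toy sketch for illustration only, not the method of the paper discussed below:

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=1):
    """Toy extractive summarizer: score each sentence by the average
    corpus frequency of its words and keep the top-scoring sentences."""
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    return sorted(sentences, key=score, reverse=True)[:num_sentences]
```

Real extractive systems use far stronger sentence scorers (graph ranking, neural classifiers), but the structure is the same: select existing spans rather than generate new text.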

This article describes a keyword-guided abstractive sentence summarization method, which combines extractive and abstractive summarization and achieves better performance than the baseline models on the Gigaword sentence summarization dataset.

Paper link:

http://box.jd.com/sharedInfo/B2234BB08E365EEC

 

Abstractive Sentence Summarization

The input of the abstractive sentence summarization task is a long sentence, and the output is a simplified short sentence.

We observed that some important words in the input sentence (i.e., keywords) provide guiding clues for generating the summary. Moreover, when people write a summary of an input sentence, they also tend to first identify the keywords in the input sentence and then organize language to connect those keywords; the resulting summary not only covers the keywords but is also fluent and grammatical. We believe that, compared with purely extractive and purely abstractive summarization, keyword-guided abstractive summarization is closer to how people actually write summaries.

Figure 1: The overlapping words (marked in red) between the input sentence and the reference summary cover the critical information of the input sentence; we can therefore generate a summary based on keywords extracted from the input sentence.

We give a simple example of sentence summarization. In Figure 1, we can roughly treat the overlapping words between the input sentence and the reference summary (excluding stop words) as keywords; these overlapping words cover the key points of the input sentence. For example, from the keywords "world leaders", "close", and "Chernobyl", we can grasp the gist of the input sentence, namely that world leaders called for the closure of Chernobyl, which is consistent with the actual reference summary, "world leaders urged to support the closure plan for the Chernobyl nuclear power plant". This phenomenon is common in the sentence summarization task: on the Gigaword dataset, more than half of the words in a reference summary also appear in the corresponding input sentence.
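The overlap-based keyword construction can be sketched as follows. The stop-word list here is a small illustrative subset, and the example sentences are paraphrased for illustration, not quoted from the dataset:

```python
import re

# Illustrative stop-word subset; a real system would use a standard list.
STOP_WORDS = {"the", "a", "an", "of", "to", "for", "in", "on", "and"}

def ground_truth_keywords(source, reference):
    """Ground-truth keywords: non-stop-word tokens that appear in both
    the input sentence and its reference summary, kept in source order."""
    ref_tokens = set(re.findall(r"\w+", reference.lower())) - STOP_WORDS
    keywords, seen = [], set()
    for tok in re.findall(r"\w+", source.lower()):
        if tok in ref_tokens and tok not in STOP_WORDS and tok not in seen:
            keywords.append(tok)
            seen.add(tok)
    return keywords
```

Note that this exact-match scheme misses morphological variants (e.g., "urge" vs. "urged", "close" vs. "closure"), which is one reason the overlap keywords are a guide rather than a complete summary.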

Model Overview

The input of the sentence summarization task is a longer sentence, and the output is a brief summary. Our motivation is that keywords in the input text can provide important guidance for an automatic summarization system. First, we treat the overlapping words between the input text and the reference summary (excluding stop words) as ground-truth keywords. Through multi-task learning, we train a keyword extraction model and a summary generation model that share the same encoder over the input text: the keyword extraction model is a sequence labeling model over the encoder's hidden states, and the summary generation model is a keyword-guided end-to-end model. After both models have converged, we use the trained keyword extraction model to extract keywords from the training-set text, and use these extracted keywords to fine-tune the summary generation model. At test time, we first use the keyword extraction model to extract keywords from the test-set text, and then generate summaries from the extracted keywords together with the original text.
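The overall training and inference procedure can be summarized as pseudocode:

```text
Stage 1 (multi-task training):
    for each (source, reference) pair:
        keywords <- content words shared by source and reference   # ground truth
    jointly train, with a shared source encoder:
        keyword extractor  (sequence labeling over encoder states)
        summary generator  (keyword-guided decoder)

Stage 2 (fine-tuning):
    keywords' <- extractor(source)        # predicted, no longer ground truth
    fine-tune the generator on (source, keywords', reference)

Inference:
    keywords <- extractor(source)
    summary  <- generator(source, keywords)
```

The fine-tuning stage matters because at test time the generator only ever sees predicted keywords, so training must match that condition.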

1. Multi-task learning

The text summarization task and the keyword extraction task are similar in a sense: both extract the key information of the input text. They differ in the form of the output: text summarization outputs a complete piece of text, while keyword extraction outputs a set of keywords. We believe both tasks require the encoder to be able to identify the important information in the input text. Therefore, we adopt a multi-task learning framework in which the two tasks share the encoder, enhancing the encoder's capability.
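A hedged sketch of the joint objective (the exact losses and their weighting are not spelled out in this post, so the form below is an assumption): treating keyword extraction as binary sequence labeling over the shared encoder's hidden states,

$$
\mathcal{L}_{\mathrm{kw}} = -\sum_{i} \big[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\big],
\qquad
\mathcal{L} = \mathcal{L}_{\mathrm{sum}} + \lambda\, \mathcal{L}_{\mathrm{kw}},
$$

where $y_i$ indicates whether token $i$ of the input is a ground-truth keyword, $p_i$ is the extractor's predicted probability, $\mathcal{L}_{\mathrm{sum}}$ is the generator's cross-entropy loss, and $\lambda$ balances the two tasks.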

2. Keyword-guided summary generation model

Inspired by the work of Zhou et al. [1], we propose keyword-guided selective encoding. Specifically, since the keywords carry the more important information, we construct, under keyword guidance, a selective gate network that re-encodes the semantic information in the hidden states of the input text to build new hidden states. Subsequent decoding is based on these new hidden states.
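A minimal numerical sketch of such a keyword-guided selective gate follows. The shapes and parameterization are assumptions in the spirit of Zhou et al.'s selective encoding, not the paper's exact formulation:

```python
import numpy as np

def keyword_selective_gate(H, k, W_s, U_s, b):
    """Keyword-guided selective encoding (illustrative).
    H:        (n, d) encoder hidden states of the input sentence
    k:        (d,)   pooled keyword representation
    W_s, U_s: (d, d) gate parameters; b: (d,) bias
    Each hidden state is rescaled by a sigmoid gate conditioned on
    both the state itself and the keyword representation."""
    gate = 1.0 / (1.0 + np.exp(-(H @ W_s + k @ U_s + b)))  # (n, d), in (0, 1)
    return H * gate  # new hidden states consumed by the decoder
```

Because every gate value lies in (0, 1), the gate can only attenuate each hidden unit; units irrelevant to the keywords are suppressed before decoding begins.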

Our decoder is based on the Pointer-Generator network [2], an end-to-end model with a copy mechanism. For the generator module, we propose two ways, direct fusion and hierarchical fusion, to fuse the contextual information of the original input text and the keywords; for the pointer module, our model can selectively copy words from both the original input text and the keywords into the output summary.
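The dual copy idea can be sketched numerically. The mixture form below is an assumption about how the two copy distributions combine; `p_gen` and `lam` stand for mixing scalars that the decoder would predict at each step:

```python
import numpy as np

def scatter_attention(attn, token_ids, vocab_size):
    """Map attention weights over positions onto vocabulary ids."""
    dist = np.zeros(vocab_size)
    np.add.at(dist, token_ids, attn)  # repeated ids accumulate
    return dist

def dual_copy_distribution(p_gen, lam, p_vocab, attn_src, src_ids,
                           attn_kw, kw_ids, vocab_size):
    """Final word distribution mixing generation with copying from the
    source sentence and from the keywords (illustrative sketch)."""
    copy_src = scatter_attention(attn_src, src_ids, vocab_size)
    copy_kw = scatter_attention(attn_kw, kw_ids, vocab_size)
    return (p_gen * p_vocab
            + (1 - p_gen) * (lam * copy_src + (1 - lam) * copy_kw))
```

Since `p_vocab` and each attention distribution individually sum to 1, the mixture is itself a valid probability distribution over the vocabulary.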

Experiments and Analysis

1. Dataset

In our experiments, we chose the Gigaword dataset, which contains approximately 3.8 million training sentence-summary pairs. We used 8,000 pairs as the validation set and 2,000 pairs as the test set.

2. Experimental results

Table 1 shows that our proposed model performs better than models without keyword guidance. We tested different selective encoding schemes, namely self-selection from the input text, keyword selection, and mutual selection, and the experiments show that mutual selection works best. For the generator module, we found hierarchical fusion better than the other fusion methods; and our dual-pointer module performs better than the original model, which can only copy from the input text.

Table 1

  

Summary

This paper targets the abstractive sentence summarization task, i.e., converting a long sentence into a short summary. Our proposed model can use keywords as guidance to generate higher-quality summaries, achieving better results than the baseline models. Specifically, our model:

1) extracts keywords and generates summaries within a multi-task learning framework;

2) acquires important information during encoding through a keyword-based selective encoding strategy;

3) dynamically fuses information from the original input sentence and the keywords through a dual attention mechanism;

4) copies words from both the original input sentence and the keywords into the output summary through a dual copy mechanism.

On a standard sentence summarization dataset, we verified the effectiveness of keywords for the sentence summarization task.

References:

[1]  Zhou, Q.; Yang, N.; Wei, F.; and Zhou, M. 2017. Selective encoding for abstractive sentence summarization. In Proceedings of ACL, 1095–1104.

[2] See, A.; Liu, P. J.; and Manning, C. D. 2017. Get to the point: Summarization with pointer-generator networks. In Proceedings of ACL, 1073–1083.



Origin blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/105320698