"Natural Language Processing (NLP)" [Borealis AI] cross-domain text coherence generate neural network model! !

Source: AINLPer WeChat public account
Editor: ShuYini
Proofreading: ShuYini
Date: 2020-1-12

TITLE: A Cross-Domain Transferable Neural Coherence Model
Contributor: Borealis AI
Paper: https://www.aclweb.org/anthology/P19-1067.pdf
Code: None

Abstract

    Coherence is an important aspect of text quality and is key to ensuring readability. An important limitation of existing coherence models is that training on one domain does not easily generalize to unseen categories of text. Previous work (Li and Jurafsky, 2017) advocates generative models for cross-domain generalization, arguing that for discriminative models the space of incoherent sentence orderings to distinguish during training is prohibitively large. In this work, we propose a local discriminative neural model with a much smaller negative sampling space that can efficiently learn against incorrect orderings. The model is simple in structure, yet it significantly outperforms previous state-of-the-art methods on the standard Wall Street Journal benchmark corpus, as well as in several new, more challenging settings of transfer to unseen categories of Wikipedia articles.

Contributions

    1. Corrects the misconception that discriminative neural models cannot generalize well to cross-domain coherence scoring, by introducing a new local discriminative neural model.
    2. Proposes a set of more challenging cross-domain coherence evaluation datasets.
    3. The proposed method substantially outperforms previous methods both on the in-domain WSJ dataset and on all open-domain datasets, pushing discriminative modeling techniques a step further.
    4. Even with the simplest sentence encoder (averaged GloVe), the proposed method often outperforms previous methods, and higher accuracy can be achieved by using a more powerful encoder.

The main content of the article

Model Introduction

    This paper proposes the local coherence discriminator (LCD) model, which assumes that global document-level discrimination can be approximated by the average of coherence scores over pairs of consecutive sentences. (The authors verify this assumption.) This simplifies the learning problem to discriminating consecutive sentence pairs $(s_i, s_{i+1})$ from a training document (assumed coherent) against incoherent pairs $(s_i, s_0)$ (constructed as negatives).
    **Training objective:** Formally, the discriminative model $f_\theta(\cdot,\cdot)$ takes a pair of sentences and returns a score; the higher the score, the more coherent the input. The model is therefore trained so that positive (consecutive) pairs score higher than negative pairs.
    **Loss function:** Experiments showed that a margin loss works better for this problem. Specifically, $L$ takes the form $L(f^+, f^-) = \max(0, \eta - f^+ + f^-)$, where $\eta$ is the margin hyperparameter.
    **Negative samples:** Technically, we are free to pair any sentence $s_0$ with $s_i$ to form a negative example. However, because of potential differences in genre, topic, and writing style, such negatives could lead the discriminative model to learn cues that have nothing to do with coherence. Therefore, we construct negative pairs only from sentences within the same document, as shown in the sketch below.
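A minimal sketch of the margin loss and within-document negative sampling described above, assuming PyTorch; the function names and the rule for excluding candidate negatives are illustrative, not taken from the paper:

```python
import random

import torch


def margin_loss(f_pos, f_neg, eta=1.0):
    # L(f+, f-) = max(0, eta - f+ + f-), averaged over the batch;
    # eta is the margin hyperparameter mentioned in the text.
    return torch.clamp(eta - f_pos + f_neg, min=0.0).mean()


def sample_pairs(document, num_negatives=1):
    # Positive pairs are consecutive sentences (s_i, s_{i+1}); each negative
    # pair (s_i, s_0) draws s_0 from the *same* document (excluding s_i and
    # s_{i+1}) so the model cannot rely on topic or style cues.
    positives, negatives = [], []
    for i in range(len(document) - 1):
        positives.append((document[i], document[i + 1]))
        candidates = [k for k in range(len(document)) if k not in (i, i + 1)]
        if not candidates:
            continue
        for _ in range(num_negatives):
            j = random.choice(candidates)
            negatives.append((document[i], document[j]))
    return positives, negatives
```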

Model structure

    The specific neural architecture for $f_\theta$ is shown in the figure of the paper. Here we assume the use of some pre-trained sentence encoder.
    Given an input sentence pair, the encoder maps the two sentences to real-valued vectors $S$ and $T$. We then compute the following features: (1) the concatenation of the two vectors $(S, T)$; (2) the element-wise difference $S - T$; (3) the element-wise product $S * T$; (4) the element-wise absolute difference $|S - T|$. These features are concatenated and fed into a one-layer MLP that outputs the coherence score.
    In practice, we train a forward model with input $(S, T)$ and a backward model with input $(T, S)$; they share the same architecture but have separate parameters, which makes the overall coherence model bidirectional. The final coherence score is the average of the two models' scores (see the sketch after this section).
    Any pre-trained sentence encoder can be used here, from the simple averaged GloVe to more sophisticated supervised or unsupervised pre-trained sentence encoders. As mentioned in the introduction, since generative models can usually be turned into sentence encoders, our coherence model can benefit from the advantages of both generative and discriminative training. After initialization, we freeze the generative model's parameters to avoid overfitting.
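A minimal PyTorch sketch of the scoring function $f_\theta$ as described above; the class names and the hidden size are assumptions for illustration, not values from the paper. It concatenates the four features, feeds them to a small MLP, and averages a forward and a backward scorer to obtain the bidirectional score:

```python
import torch
import torch.nn as nn


class LCDScorer(nn.Module):
    """One direction of the local coherence discriminator: scores a sentence pair."""

    def __init__(self, emb_dim, hidden_dim=500):
        super().__init__()
        # Input features: [S; T; S - T; S * T; |S - T|]  ->  5 * emb_dim
        self.mlp = nn.Sequential(
            nn.Linear(5 * emb_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, S, T):
        feats = torch.cat([S, T, S - T, S * T, (S - T).abs()], dim=-1)
        return self.mlp(feats).squeeze(-1)


class BiLCD(nn.Module):
    """Bidirectional model: forward (S, T) and backward (T, S) scorers with
    separate parameters; the coherence score is the average of the two."""

    def __init__(self, emb_dim, hidden_dim=500):
        super().__init__()
        self.fwd = LCDScorer(emb_dim, hidden_dim)
        self.bwd = LCDScorer(emb_dim, hidden_dim)

    def forward(self, S, T):
        return 0.5 * (self.fwd(S, T) + self.bwd(T, S))
```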

Experimental results

    A new dataset was created based on Wikipedia, with three increasingly difficult cross-domain evaluation protocols. Using DBpedia category definitions, seven different categories were chosen from the Person domain, along with one category from each of three other unrelated domains. All articles in these categories were parsed, and paragraphs with more than 10 sentences were extracted as training and evaluation paragraphs. The paper reports: statistics of the dataset; discrimination and insertion task accuracy on the Wall Street Journal dataset; discrimination accuracy on Wiki-A; discrimination accuracy on Wiki-C; and paragraph order reconstruction results.
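As a usage illustration of the evaluation just described, a document's coherence can be scored by averaging pairwise scores over consecutive sentences (the LCD assumption from the model section), and the discrimination task compares an original paragraph against shuffled versions. The helper names, the `encode` function, and the number of permutations below are assumptions, not details from the paper:

```python
import random

import torch


def document_score(model, encode, sentences):
    # Average pairwise coherence scores over consecutive sentence pairs,
    # following the LCD assumption that this approximates the global score.
    S = encode(sentences[:-1])  # (n-1, emb_dim) encodings of s_1 .. s_{n-1}
    T = encode(sentences[1:])   # (n-1, emb_dim) encodings of s_2 .. s_n
    with torch.no_grad():
        return model(S, T).mean().item()


def discrimination_accuracy(model, encode, paragraphs, permutations=20):
    # Fraction of (original, shuffled) pairs where the original scores higher;
    # identical permutations and ties are ignored for simplicity.
    correct, total = 0, 0
    for sents in paragraphs:
        orig = document_score(model, encode, sents)
        for _ in range(permutations):
            shuffled = random.sample(sents, len(sents))
            correct += document_score(model, encode, shuffled) < orig
            total += 1
    return correct / total
```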


Follow us

For more natural language processing knowledge, please follow the **AINLPer** WeChat public account, where the freshest quality content is delivered promptly.



Origin: blog.csdn.net/yinizhilianlove/article/details/104033225