Dynamic memory network: one step closer to general-purpose NLP


Model

Semantic memory module

The semantic memory module refers to the word embeddings (word vector representations), such as GloVe vectors, into which the input text is converted before it is passed to the input module.
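As a minimal sketch (assuming pretrained GloVe vectors have already been loaded into a matrix; the variable names are hypothetical), the module is essentially an embedding lookup:

```python
import torch
import torch.nn as nn

# Stand-ins for real data: `glove_matrix` would hold pretrained GloVe vectors
# and `word_to_idx` would map every vocabulary word to a row of that matrix.
glove_matrix = torch.randn(10000, 300)
word_to_idx = {"john": 1, "dropped": 2, "the": 3, "football": 4}

embedding = nn.Embedding.from_pretrained(glove_matrix, freeze=True)

tokens = ["john", "dropped", "the", "football"]
ids = torch.tensor([word_to_idx[t] for t in tokens])
vectors = embedding(ids)  # shape (4, 300); these vectors are fed to the input module
```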

 Input module


The input module is a standard GRU (or BiGRU) run over the embedded input text, and the final hidden state of each sentence is explicitly accessible.
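A minimal sketch of that idea, with hypothetical shapes (the GRU is run over the whole passage and the hidden states at the end-of-sentence positions are kept as one representation per sentence):

```python
import torch
import torch.nn as nn

emb_dim, hidden_dim = 300, 80
input_gru = nn.GRU(input_size=emb_dim, hidden_size=hidden_dim, batch_first=True)

# word_vectors: GloVe vectors of the whole passage, shape (1, num_words, emb_dim)
word_vectors = torch.randn(1, 12, emb_dim)
sentence_ends = [3, 7, 11]  # hypothetical indices of each sentence's last word

outputs, _ = input_gru(word_vectors)   # (1, num_words, hidden_dim)
facts = outputs[0, sentence_ends]      # one hidden state per sentence: (3, hidden_dim)
```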

 Question module


The question module is also a standard GRU. The question to be answered is fed in as its input, and its final hidden state is accessible.
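Sketched the same way (reusing the hypothetical sizes from above), the question embedding q is simply the last hidden state of a GRU run over the question's word vectors:

```python
import torch
import torch.nn as nn

emb_dim, hidden_dim = 300, 80
question_gru = nn.GRU(input_size=emb_dim, hidden_size=hidden_dim, batch_first=True)

question_vectors = torch.randn(1, 5, emb_dim)  # GloVe vectors of the question words
_, q = question_gru(question_vectors)          # final hidden state, shape (1, 1, hidden_dim)
q = q.squeeze(0).squeeze(0)                    # the question embedding q used by later modules
```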

 Episodic memory module


This module allows the input to be processed in multiple feedforward passes. On each pass, the sentence embeddings from the input module are fed as input to the GRU of the episodic memory module. Each sentence embedding is assigned a weight that corresponds to its relevance to the question being asked.

Different feedforward passes assign different weights to the sentence embeddings. In the example from the paper, sentence (1) is not directly related to the question, so it receives little weight on the first pass. During that first pass, however, the model discovers that the football mentioned in the question is related to John, and so on the second pass sentence (1) is given a higher weight.

In the first feedforward pass (the first "episode"), the question embedding q is used to compute an attention score for each sentence embedding coming from the input module. The attention score of sentence sᵢ is then passed through a softmax layer (so that the attention scores sum to 1) or through a single sigmoid unit to obtain gᵢ. gᵢ is the weight given to sentence sᵢ and acts as a global gate on the GRU output at timestep i.

The hidden state at timestep i of episode t is computed as follows:

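In the notation used above (sᵢ is the i-th sentence embedding; the formulation follows the DMN paper), the gated update is

$$h_i^t = g_i^t \,\mathrm{GRU}\!\left(s_i,\ h_{i-1}^t\right) + \left(1 - g_i^t\right) h_{i-1}^t$$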

When g = 0, the hidden state is directly copied:

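Setting gᵢᵗ = 0 in the update above gives

$$h_i^t = h_{i-1}^t$$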

In the paper, mᵗ denotes the final hidden state of the GRU in the t-th episode, and it can be regarded as the aggregation of the facts found in episode t. From the second episode onward, mᵗ is used together with the question embedding q to compute the attention scores over the sentence embeddings in episode t+1.

The calculation process is as follows:

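A simplified version of the paper's scoring function, using only the similarity features described below (the paper's full feature vector also contains the raw vectors and bilinear terms), is

$$z_i^t = \left[\, s_i \circ q;\; s_i \circ m^{t-1};\; \lvert s_i - q \rvert;\; \lvert s_i - m^{t-1} \rvert \,\right]$$

$$g_i^t = \sigma\!\left( W^{(2)} \tanh\!\left( W^{(1)} z_i^t + b^{(1)} \right) + b^{(2)} \right)$$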

The paper uses several simple similarity measures between sᵢ and q and between sᵢ and mᵗ⁻¹, namely element-wise multiplication and absolute difference. The concatenated result is then fed into a 2-layer neural network to compute the attention score of sᵢ. For the first episode, m⁰ is replaced with q.

The number of episodes can be a fixed, predefined number, or it can be determined by the network itself. In the latter case, a special end-of-passes representation is appended to the input; if the gate function selects that vector, the iteration stops.
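Putting the pieces together, here is a minimal sketch of the episodic memory module under the assumptions above (a fixed number of episodes, sigmoid gates, and the simplified scoring features; all names and sizes are hypothetical):

```python
import torch
import torch.nn as nn

class EpisodicMemory(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.gru_cell = nn.GRUCell(hidden_dim, hidden_dim)
        # 2-layer network that turns the similarity features into a gate g_i
        self.attn = nn.Sequential(
            nn.Linear(4 * hidden_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),
        )

    def gates(self, facts, q, m):
        # similarity features between each sentence embedding s_i and (q, m^{t-1})
        z = torch.cat([facts * q, facts * m, (facts - q).abs(), (facts - m).abs()], dim=-1)
        return self.attn(z)                        # one gate per sentence, shape (num_facts, 1)

    def forward(self, facts, q, num_episodes: int = 3):
        m = q                                      # for the first episode, m^0 is replaced with q
        for _ in range(num_episodes):
            g = self.gates(facts, q, m)            # attention gates for this episode
            h = torch.zeros_like(q)
            for i in range(facts.size(0)):         # gated GRU over the sentences
                upd = self.gru_cell(facts[i].unsqueeze(0), h.unsqueeze(0)).squeeze(0)
                h = g[i] * upd + (1 - g[i]) * h
            m = h                                  # m^t = final hidden state of episode t
        return m

# usage with the hypothetical shapes from the earlier sketches
hidden_dim = 80
facts = torch.randn(3, hidden_dim)                 # one vector per sentence from the input module
q = torch.randn(hidden_dim)                        # question embedding from the question module
memory = EpisodicMemory(hidden_dim)(facts, q)      # final memory, passed on to the answer module
```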

 Answer module


The answer module consists of a decoder GRU. At each timestep, the previous output is fed back into the module as input, together with the question embedding.


A standard softmax over the vocabulary is then used to generate the output.

The decoder is initialized with the final memory vector m (the last hidden state computed by the GRU of the episodic memory module).
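In symbols (following the paper, with m the final episodic memory, q the question embedding, and yₜ the token predicted at step t):

$$a_0 = m, \qquad a_t = \mathrm{GRU}\big(\left[\, y_{t-1};\; q \,\right],\ a_{t-1}\big), \qquad y_t = \operatorname{softmax}\big(W^{(a)} a_t\big)$$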

Sentiment analysis application

At the time the paper was published, the model achieved state-of-the-art results in sentiment analysis.


For example, in the attention visualization below, the model focuses on all of the adjectives. When only one feedforward pass is allowed, the model produces an incorrect prediction; when two passes are allowed, it pays very high attention to the positive adjectives during the second pass and produces the correct prediction.

(Sentiment attention analysis)

Performance on other datasets

(Results tables from the paper.)

Replacing modules

An important advantage of modularity is that one module can be replaced with another without modifying any other modules, as long as the replacement module has the correct interface.

The paper "Dynamic Memory Network for Visual and Text Question Answering" shows the role of dynamic memory network in answering questions based on images.

The input module is replaced by a module that extracts feature vectors from the image using a CNN-based network. The extracted feature vectors are then fed into the episodic memory module as before.
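As a rough sketch of that swap (a toy CNN stands in for the pretrained feature extractor used in the paper; the grid of local image-feature vectors plays the same role that sentence embeddings played before):

```python
import torch
import torch.nn as nn

hidden_dim = 80

# Toy stand-in for a pretrained CNN: it maps a 3x224x224 image to a
# grid of local feature vectors (here 7x7 positions with 128 channels each).
cnn = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=4, padding=3), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d((7, 7)),
)
project = nn.Linear(128, hidden_dim)             # map image features into the fact space

image = torch.randn(1, 3, 224, 224)
feature_grid = cnn(image)                        # (1, 128, 7, 7)
facts = feature_grid.flatten(2).transpose(1, 2)  # (1, 49, 128): one vector per image region
facts = project(facts).squeeze(0)                # (49, hidden_dim), fed to the episodic memory module
```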




Origin: blog.51cto.com/15060462/2678954