Re43: Read the paper DNS Deep Neural Solver for Math Word Problems


Full name of the paper: Deep Neural Solver for Math Word Problems
The model has no official abbreviation, but it is referred to as DNS in (2020 COLING) Solving Math Word Problems with Multi-Encoders and Multi-Decoders.

Paper link: https://aclanthology.org/D17-1088/

This is a 2017 EMNLP paper on math word problems (MWP).
It is the first paper to solve MWP with a neural network, directly mapping the problem text to an equation with an RNN. The RNN is combined with a similarity-based retrieval model: when the retrieval model's similarity score is above a threshold, the equation template of the retrieved result is used; otherwise the RNN's output is used.

1. Background

I was too lazy to read the introduction carefully.

An interesting reference: (2016 ACL) How Well Do Computers Solve Math Word Problems? Large-Scale Dataset Construction and Evaluation, which found that simple similarity-based methods can already beat most statistical learning models.

2. Model

[Figure: overall model pipeline]
number mapping → significant number identification → retrieval → directly apply the retrieved equation template, or fall back to the seq2seq model
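
As a minimal sketch of this control flow (every function name here is hypothetical, not from the authors' code), with each stage passed in as a callable:

```python
# Hypothetical pipeline sketch; the individual stages are sketched in the
# sections below. `theta` is the retrieval similarity threshold.
def solve(problem, map_numbers, identify_numbers, retrieve, seq2seq, theta=0.5):
    mapped, number_map = map_numbers(problem)   # 1. numbers -> tokens n1, n2, ...
    mapped = identify_numbers(mapped)           # 2. keep only significant numbers
    template, score = retrieve(mapped)          # 3. most similar training problem
    if score <= theta:                          # 4. not similar enough:
        template = seq2seq(mapped)              #    generate with the RNN instead
    for token, value in number_map.items():     # map tokens back to real numbers
        template = template.replace(token, value)
    return eval(template)                       # evaluate the instantiated equation

# toy demo with stub stages
print(solve(
    "Dan has 5 pens and buys 3 more.",
    map_numbers=lambda p: ("Dan has n1 pens and buys n2 more.",
                           {"n1": "5", "n2": "3"}),
    identify_numbers=lambda p: p,
    retrieve=lambda p: ("n1 + n2", 0.9),
    seq2seq=lambda p: "n1 + n2",
))  # -> 8
```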

I won't copy out the hyper-parameter details here; it is a fairly conventional RNN setup.

Variables: $V_p=\{v_1,\dots,v_m,x_1,\dots,x_k\}$ (known numbers and unknown variables)

2.1 Data preprocessing

Number mapping
Maps equations to equation templates: the known numbers are replaced with number tokens.
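
A minimal sketch of number mapping (my own illustration; the regex and the token format are assumptions):

```python
import re

# Replace each number in the problem text with a token n1, n2, ... and apply
# the same mapping to the equation, yielding an equation template.
def map_numbers(problem: str, equation: str):
    numbers = re.findall(r"\d+(?:\.\d+)?", problem)
    mapping = {f"n{i}": num for i, num in enumerate(numbers, 1)}

    tokens = iter(mapping)  # n1, n2, ... in order of appearance
    problem = re.sub(r"\d+(?:\.\d+)?", lambda m: next(tokens), problem)

    for token, num in mapping.items():
        # word-ish boundaries so '5' does not match inside '53' or '2.5'
        equation = re.sub(rf"(?<![\w.]){re.escape(num)}(?![\w.])", token, equation)
    return problem, equation, mapping

p, eq, m = map_numbers(
    "Dan has 5 pens and buys 3 more pens. How many pens does he have now?",
    "x = 5 + 3",
)
print(p)   # Dan has n1 pens and buys n2 more pens. ...
print(eq)  # x = n1 + n2
print(m)   # {'n1': '5', 'n2': '3'}
```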

Significant number identification (SNI)
Since not all numbers in the problem are actually used in the equation, the model focuses only on significant numbers: an LSTM performs binary classification (input: the number and its context).

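A PyTorch sketch of the SNI classifier; the layer sizes and the bidirectional encoder are my assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

# An LSTM reads the problem; the hidden state at each number-token position
# feeds a binary classifier predicting whether that number is significant.
class SNIClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.clf = nn.Linear(2 * hidden_dim, 1)

    def forward(self, token_ids, number_positions):
        # token_ids: (batch, seq_len); number_positions: (batch, n_numbers)
        h, _ = self.lstm(self.embed(token_ids))           # (batch, seq, 2*hidden)
        idx = number_positions.unsqueeze(-1).expand(-1, -1, h.size(-1))
        num_states = h.gather(1, idx)                     # states at number tokens
        return torch.sigmoid(self.clf(num_states)).squeeze(-1)

model = SNIClassifier(vocab_size=1000)
tokens = torch.randint(0, 1000, (2, 20))      # fake batch of 2 problems
positions = torch.tensor([[3, 7], [5, 11]])   # positions of number tokens
print(model(tokens, positions).shape)         # torch.Size([2, 2]); P(significant)
```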

2.2 RNN-based Seq2seq Model

[Figure: seq2seq model architecture]

The encoder uses a GRU and the decoder uses an LSTM.
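
A skeletal PyTorch version of this pairing; the dimensions and the way the GRU's final state initializes the LSTM are my assumptions:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_in, vocab_out, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.enc_embed = nn.Embedding(vocab_in, embed_dim)
        self.dec_embed = nn.Embedding(vocab_out, embed_dim)
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_out)

    def forward(self, src_ids, tgt_ids):
        _, h_enc = self.encoder(self.enc_embed(src_ids))  # (1, batch, hidden)
        # init the LSTM decoder from the GRU encoder; zero cell state is one choice
        dec_out, _ = self.decoder(self.dec_embed(tgt_ids),
                                  (h_enc, torch.zeros_like(h_enc)))
        return self.out(dec_out)                          # per-step symbol logits

model = Seq2Seq(vocab_in=1000, vocab_out=30)
logits = model(torch.randint(0, 1000, (2, 20)), torch.randint(0, 30, (2, 8)))
print(logits.shape)  # torch.Size([2, 8, 30])
```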

Using a plain softmax as the output activation can generate illegal symbols (invalid equations). Therefore, characters that would be illegal given the partially generated equation are ruled out by predefined rules:
[Figures: predefined validity rules]

$\rho$ is a vector, each element 0 or 1, indicating whether a character is mathematically legal (i.e., conforms to the rules above):
[Equation: rule-masked output probability]
The probability of generating each character is then computed from the LSTM decoder output under this mask.
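
A sketch of the rule-masked distribution as I read it: illegal symbols get probability zero and the rest renormalize:

```python
import torch

# s: decoder scores for each output symbol at the current step
# rho: 0/1 validity vector from the predefined rules
def masked_softmax(s: torch.Tensor, rho: torch.Tensor) -> torch.Tensor:
    exp_s = rho * torch.exp(s)   # zero out illegal symbols
    # (for clarity; a numerically stable version would subtract s.max() first)
    return exp_s / exp_s.sum(-1, keepdim=True)

s = torch.tensor([1.2, 0.3, -0.5, 2.0])    # scores for 4 symbols
rho = torch.tensor([1.0, 0.0, 1.0, 1.0])   # symbol 2 is illegal in this context
print(masked_softmax(s, rho))              # probability 0 at the masked entry
```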

2.3 Hybrid Model

Proportion of problems solved correctly by each of the two models:
[Figure: correct rates of the two models]

2.3.1 Retrieval Model

Compute the lexical similarity between the query problem and every problem in the training set.

Representing the Question: Word TF-IDF Scores
[Equation: TF-IDF weights]

The similarity is the Jaccard similarity of the TF-IDF vectors:
[Equation: Jaccard similarity]
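
A sketch of the retrieval step under one common reading: represent each question by word TF-IDF weights and compare with the generalized Jaccard similarity Σmin/Σmax. Tokenization and the exact TF/IDF variants are my assumptions:

```python
import math
from collections import Counter

def tfidf_vectors(questions):
    docs = [q.split() for q in questions]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency
    return [{w: (tf / len(d)) * math.log(n / df[w])
             for w, tf in Counter(d).items()} for d in docs]

def jaccard_sim(v1, v2):
    words = set(v1) | set(v2)
    num = sum(min(v1.get(w, 0.0), v2.get(w, 0.0)) for w in words)
    den = sum(max(v1.get(w, 0.0), v2.get(w, 0.0)) for w in words)
    return num / den if den else 0.0

train = ["dan has n1 pens and buys n2 more pens",
         "mary has n1 apples and eats n2 apples",
         "a train travels n1 km in n2 hours"]
vecs = tfidf_vectors(train)
# similarity of the first question to the other two
print(jaccard_sim(vecs[0], vecs[1]), jaccard_sim(vecs[0], vecs[2]))
```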

An interesting observation is the relationship between the similarity threshold and the accuracy of the two models ($\theta$ is the threshold; if the similarity is greater than $\theta$, the retrieval result is used):
[Plot: accuracy of the two models vs. similarity threshold θ]
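
One way to operationalize this observation (my own sketch, not the paper's stated procedure): sweep candidate thresholds on held-out data and keep the θ with the best hybrid accuracy:

```python
# sims[i]: best retrieval similarity for held-out problem i
# retrieval_ok[i] / seq2seq_ok[i]: whether each model solves problem i
def choose_theta(sims, retrieval_ok, seq2seq_ok, candidates):
    def hybrid_acc(theta):
        hits = sum(r if s > theta else q
                   for s, r, q in zip(sims, retrieval_ok, seq2seq_ok))
        return hits / len(sims)
    return max(candidates, key=hybrid_acc)

print(choose_theta(
    sims=[0.9, 0.8, 0.4, 0.2],
    retrieval_ok=[True, True, False, False],
    seq2seq_ok=[False, True, True, True],
    candidates=[0.3, 0.5, 0.7],
))  # -> 0.5
```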

3. Experiment

3.1 Dataset

The paper introduces Math23K, a dataset of roughly 23k Chinese math word problems.
[Table: dataset statistics]

3.2 Baselines

Pure retrieval model
ZDC (Zhou et al., 2015)
KAZB (Kushman et al., 2014): too computationally expensive to run on a dataset of this scale

3.3 Results of the main experiment

[Table: main results]

3.4 Experimental analysis

[Table: pairwise model comparison]
A '>' means the row's model outperforms the column's model.



Source: blog.csdn.net/PolarisRisingWar/article/details/131772810