Full name of the paper: Deep Neural Solver for Math Word Problems
The model has no official abbreviation; it was abbreviated as DNS in (2020 COLING) Solving Math Word Problems with Multi-Encoders and Multi-Decoders.
Paper link: https://aclanthology.org/D17-1088/
This is a 2017 EMNLP paper focusing on the MWP (math word problem) task.
It is the first paper to solve MWP with a neural network: an RNN directly maps the problem text to an equation. The full system combines the RNN with a similarity-based retrieval model: when the retrieval model's similarity score is above a threshold, the equation template of the retrieved problem is used; otherwise the RNN's output is used.
1. Background
I was too lazy to read the introduction carefully.
An interesting reference: (2016 ACL) How well do Computers Solve Math Word Problems? Large-Scale Dataset Construction and Evaluation, which found that simple similarity-based methods can already beat most statistical learning models.
2. Model
number mapping → significant number identification → retrieval → either apply the retrieved equation template directly or use the seq2seq model
I'm too lazy to write down the model's hyper-parameter details; it's a fairly conventional RNN.
Variables: $V_p=\{v_1,\dots,v_m,x_1,\dots,x_k\}$ (known numbers and unknown variables)
2.1 Data preprocessing
number mapping
Map equations to equation templates: replace each known number with a number token.
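A minimal sketch of what this number mapping might look like (not the paper's actual preprocessing code; the token naming `n1, n2, ...` and the regex are my assumptions):

```python
import re

def number_mapping(problem: str):
    """Replace each number in the problem text with a token n1, n2, ...
    Returns the templated text and the token -> original-number mapping."""
    mapping = {}

    def repl(m):
        token = f"n{len(mapping) + 1}"
        mapping[token] = m.group(0)
        return token

    # Matches integers and decimals; real preprocessing may handle fractions etc.
    templated = re.sub(r"\d+(?:\.\d+)?", repl, problem)
    return templated, mapping

text, nums = number_mapping("Dan has 5 pens and 3 pencils, how many in total?")
# text: "Dan has n1 pens and n2 pencils, how many in total?"
# nums: {"n1": "5", "n2": "3"}
```

The same mapping is applied to the equation, turning e.g. `x = 5 + 3` into the template `x = n1 + n2`.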
Significant number identification
Since not all numbers in the problem text appear in the equation, the model only keeps the significant ones: an LSTM performs binary classification (input: the number and its context).
2.2 RNN based Seq2seq Model
The encoder uses a GRU and the decoder uses an LSTM.
If softmax were applied directly as the output activation, illegal symbols could be generated. Illegal characters are therefore ruled out based on the partial equation generated so far, using predefined rules:
$\rho$ is a vector in which each element is 0 or 1, indicating whether the corresponding character is mathematically legal (i.e., conforms to the rules above):
The probability of generating each character is then computed from the LSTM decoder output masked by $\rho$.
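A sketch of this rule-masked softmax, assuming a toy symbol vocabulary and two illustrative legality rules (the paper's actual rule set is richer):

```python
import numpy as np

# Toy vocabulary of equation-template symbols (assumed for illustration).
vocab = ["n1", "n2", "+", "-", "*", "/", "=", "x"]
OPERATORS = {"+", "-", "*", "/", "="}

def rule_mask(prev_token: str) -> np.ndarray:
    """The 0/1 vector rho: 1 if the symbol may legally follow prev_token.
    Simplified rules: no operator right after an operator,
    no operand right after an operand."""
    rho = np.ones(len(vocab))
    for i, sym in enumerate(vocab):
        if prev_token in OPERATORS and sym in OPERATORS:
            rho[i] = 0.0
        if prev_token not in OPERATORS and sym not in OPERATORS:
            rho[i] = 0.0
    return rho

def masked_softmax(logits: np.ndarray, rho: np.ndarray) -> np.ndarray:
    """Zero out illegal symbols, then renormalize."""
    probs = np.exp(logits - logits.max())  # stable softmax numerator
    probs *= rho
    return probs / probs.sum()

# After generating "+", only operands keep nonzero probability.
probs = masked_softmax(np.zeros(len(vocab)), rule_mask("+"))
```

With uniform logits, the mass is split evenly over the legal symbols (`n1`, `n2`, `x` here) and every operator gets probability 0.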
2.3 Hybrid Model
The proportion of problems each of the two models answers correctly:
2.3.1 Retrieval Model
Compute the lexical similarity between the test sample and every training-set sample.
Question representation: the TF-IDF scores of its words.
The similarity between two questions is the Jaccard similarity of their TF-IDF vectors:
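A minimal sketch of this retrieval similarity, assuming whitespace tokenization, plain TF-IDF weighting, and the generalized Jaccard similarity for nonnegative vectors (sum of elementwise minima over sum of elementwise maxima); the paper's exact weighting may differ:

```python
import math
from collections import Counter

def tfidf_vectors(questions):
    """Compute a simple TF-IDF weight for each word of each question."""
    docs = [q.lower().split() for q in questions]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append({w: (tf[w] / len(d)) * math.log(n / df[w]) for w in tf})
    return vecs

def jaccard(u, v):
    """Generalized Jaccard similarity of two sparse nonnegative vectors."""
    keys = set(u) | set(v)
    num = sum(min(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    den = sum(max(u.get(k, 0.0), v.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0

vecs = tfidf_vectors(["a b c", "a b d", "e f g"])
# jaccard(vecs[0], vecs[1]) is between 0 and 1; jaccard(vecs[0], vecs[2]) is 0.
```

At test time the question is compared against all training questions, and the highest-scoring one is the retrieval result.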
One observation is the relationship between the similarity threshold and the accuracy of the two models ($\theta$ is the threshold: if the similarity is greater than $\theta$, the retrieval model is used):
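The hybrid dispatch described above can be sketched as follows; `retrieve` and `seq2seq` here are hypothetical stand-ins for the two sub-models, not the paper's code:

```python
# Hypothetical stand-ins for the two sub-models (illustration only).
def retrieve(question, train_set):
    """Pretend we found the most similar training problem and its score."""
    best = max(train_set, key=lambda ex: ex["sim"])
    return best, best["sim"]

def seq2seq(question):
    """Placeholder for the RNN-generated equation template."""
    return "x = n1 + n2"

def hybrid_solve(question, train_set, theta=0.7):
    """If the best retrieval similarity exceeds theta, reuse the retrieved
    equation template; otherwise fall back to the seq2seq model."""
    best, score = retrieve(question, train_set)
    if score > theta:
        return best["template"]
    return seq2seq(question)

train = [{"sim": 0.9, "template": "x = n1 * n2"},
         {"sim": 0.3, "template": "x = n1 - n2"}]
```

Raising $\theta$ shifts more questions to the seq2seq model; the paper tunes this trade-off on the data.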
3. Experiment
3.1 Dataset
3.2 Baselines
Pure Retrieval Model
ZDC
KAZB (too large to run)
3.3 Results of the main experiment
3.4 Experimental analysis
In the comparison table, a greater-than sign means the model in that row outperforms the model in that column.