Paper Reading and Analysis: Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition

HMER paper series
1. Paper reading and analysis: When Counting Meets HMER: Counting-Aware Network for HMER
2. Paper reading and analysis: Syntax-Aware Network for Handwritten Mathematical Expression Recognition
3. Paper reading and analysis: A Tree-Structured Decoder for Image-to-Markup Generation
4. Paper reading and analysis: Watch, Attend and Parse: An End-to-end Neural Network Based Approach to HMER
5. Paper reading and analysis: Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition
6. Paper reading and analysis: Mathematical Formula Recognition Using Graph Grammar
7. Paper reading and analysis: Hybrid Mathematical Symbol Recognition Using Support Vector Machines
8. Paper reading and analysis: HMM-Based Handwritten Symbol Recognition Using On-line and Off-line Features

Main contributions:

1. Applies DenseNet, which was popular at the time, as the encoder for the HMER task;

2. Uses a multi-scale attention model that fuses high-resolution, low-semantic features with low-resolution, high-semantic features, so that small symbols such as decimal points can be recognized.

Core model:

1. Dense encoder with multi-scale attention:


k: the growth rate of each dense block.

D: the number of convolution layers in each block. For example, D = 32 means each block has 16 1×1 convolution layers and 16 3×3 convolution layers (a batch normalization layer and a ReLU activation layer follow each convolution layer).
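The channel bookkeeping implied by the growth rate k and depth D can be sketched in pure Python. This is an illustrative toy (the helper name, the growth rate 24, and the 64 input channels are made-up values, not taken from the paper):

```python
# A minimal sketch of the channel counts in a bottleneck dense block:
# D = 32 layers means 16 (1x1, 3x3) convolution pairs, and each pair
# appends k new feature maps to the running concatenation.

def dense_block_channels(c_in, k, depth):
    """Return the channel count after each (1x1, 3x3) pair.

    c_in  : channels entering the block
    k     : growth rate (new channels added by every 3x3 conv)
    depth : D, the total number of conv layers (half 1x1, half 3x3)
    """
    pairs = depth // 2              # e.g. D = 32 -> 16 bottleneck pairs
    channels = [c_in]
    for _ in range(pairs):
        # The 1x1 conv compresses the input (BN + ReLU after it); the
        # 3x3 conv then emits k maps concatenated onto the input.
        channels.append(channels[-1] + k)
    return channels

# With an illustrative growth rate k = 24 and D = 32, a block receiving
# 64 channels outputs 64 + 16 * 24 = 448 channels.
print(dense_block_channels(64, 24, 32)[-1])
```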

Decoding pipeline:

(1) A branch splits off from the standard DenseNet after the first pooling layer and is processed by dense block B, producing the high-resolution annotation $\mathbf{B}\in \mathbb{R}^{2H \times 2W \times C'}$ (twice the spatial resolution of the main-branch output $\mathbf{A}$).

(2) A GRU computes the prediction $\hat{\mathbf{s}}_t$ at step $t$:
$$\hat{\mathbf{s}}_t=\mathrm{GRU}\left(\mathbf{y}_{t-1},\mathbf{s}_{t-1}\right)$$
(3) Apply a single-scale coverage-based attention model to $\mathbf{A}$ and $\mathbf{B}$ separately:
$$\mathbf{cA}_{t}=f_{\mathrm{catt}}\left(\mathbf{A},\hat{\mathbf{s}}_{t}\right),\qquad \mathbf{cB}_{t}=f_{\mathrm{catt}}\left(\mathbf{B},\hat{\mathbf{s}}_{t}\right)$$
$f_{\mathrm{catt}}$ is the coverage-based attention computation:
$$\begin{aligned}
\mathbf{F}&=\mathbf{Q}*\sum_{l=1}^{t-1}\boldsymbol{\alpha}_l\\
e_{ti}&=\boldsymbol{\nu}_{\mathrm{att}}^{T}\tanh(\mathbf{U}_s\hat{\mathbf{s}}_t+\mathbf{U}_a\mathbf{a}_i+\mathbf{U}_f\mathbf{f}_i)\\
\alpha_{ti}&=\frac{\exp(e_{ti})}{\sum_{k=1}^L\exp(e_{tk})}\\
\mathbf{cA}_t&=\sum_{i=1}^L\alpha_{ti}\mathbf{a}_i
\end{aligned}$$
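To make $f_{\mathrm{catt}}$ concrete, here is a minimal NumPy sketch under toy dimensions. All weights ($\mathbf{U}_s$, $\mathbf{U}_a$, $\mathbf{U}_f$, $\boldsymbol{\nu}_{\mathrm{att}}$) are random stand-ins, and the convolution $\mathbf{Q}$ over the summed past attention maps is simplified to a per-position outer product rather than a real 2-D convolution:

```python
import numpy as np

rng = np.random.default_rng(0)
L, C, n, n_att, q = 6, 8, 10, 12, 4   # positions, annotation dim, state dim, attention dim, coverage dim

A     = rng.standard_normal((L, C))   # annotations a_1 .. a_L
s_hat = rng.standard_normal(n)        # prediction \hat{s}_t
past  = rng.random((3, L))            # past attention maps alpha_1 .. alpha_{t-1}

U_s, U_a, U_f = (rng.standard_normal((n_att, d)) for d in (n, C, q))
Q  = rng.standard_normal(q)           # stand-in for the conv filter Q
nu = rng.standard_normal(n_att)       # nu_att

def f_catt(A, s_hat, past_alphas):
    # F = Q * sum_l alpha_l : one coverage vector f_i per position i
    cov = past_alphas.sum(axis=0)                      # (L,)
    F = np.outer(cov, Q)                               # (L, q)
    # e_ti = nu_att^T tanh(U_s s_hat + U_a a_i + U_f f_i)
    e = np.tanh(U_s @ s_hat + A @ U_a.T + F @ U_f.T) @ nu
    alpha = np.exp(e - e.max()); alpha /= alpha.sum()  # softmax over positions
    return alpha @ A, alpha                            # context cA_t, weights

c_t, alpha = f_catt(A, s_hat, past)
print(c_t.shape)                                       # (8,)
```

The coverage term lets positions that were already attended to score lower, which discourages over- or under-parsing of symbols.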

(4) Concatenate the two context vectors:
$$\mathbf{c}_t=[\mathbf{cA}_t;\mathbf{cB}_t]$$
(5) Compute the hidden state at step $t$:
$$\mathbf{s}_t=\mathrm{GRU}\left(\mathbf{c}_t,\hat{\mathbf{s}}_t\right)$$
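The five steps above can be sketched end to end in NumPy. This is a hypothetical toy, not the authors' implementation: the weights are random, and uniform averaging stands in for the two $f_{\mathrm{catt}}$ calls:

```python
import numpy as np

rng = np.random.default_rng(1)
n, e, C = 6, 4, 5                         # state dim, embedding dim, annotation dim

def make_gru(in_dim, hid):
    # One weight matrix per gate (update z, reset r, candidate h).
    return {k: rng.standard_normal((hid, in_dim + hid)) * 0.1 for k in "zrh"}

def gru_step(cell, x, h):
    xh = np.concatenate([x, h])
    z = 1 / (1 + np.exp(-cell["z"] @ xh))             # update gate
    r = 1 / (1 + np.exp(-cell["r"] @ xh))             # reset gate
    h_tilde = np.tanh(cell["h"] @ np.concatenate([x, r * h]))
    return (1 - z) * h + z * h_tilde

gru1, gru2 = make_gru(e, n), make_gru(2 * C, n)

y_prev, s_prev = rng.standard_normal(e), np.zeros(n)
A = rng.standard_normal((7, C))           # low-resolution annotations
B = rng.standard_normal((28, C))          # high-resolution (2H x 2W) annotations

s_hat = gru_step(gru1, y_prev, s_prev)    # \hat{s}_t = GRU(y_{t-1}, s_{t-1})
cA, cB = A.mean(axis=0), B.mean(axis=0)   # stand-ins for f_catt(A, .) and f_catt(B, .)
c_t = np.concatenate([cA, cB])            # c_t = [cA_t ; cB_t]
s_t = gru_step(gru2, c_t, s_hat)          # s_t = GRU(c_t, \hat{s}_t)
print(s_t.shape)                          # (6,)
```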

2. Decoder

Define the annotation sequence $\mathbf{A}$:
$$\mathbf{A}=\{\mathbf{a}_1,\ldots,\mathbf{a}_L\},\quad \mathbf{a}_i\in\mathbb{R}^C$$
where $L=H\times W$, assuming the CNN output has shape $H \times W \times C$.

Define the LaTeX string $\mathbf{Y}$:
$$\mathbf{Y}=\{\mathbf{y}_1,\ldots,\mathbf{y}_T\},\quad \mathbf{y}_i\in\mathbb{R}^K$$
$T$: length of the LaTeX string;

$K$: size of the symbol vocabulary; each $\mathbf{y}_i$ is a one-hot vector.

Note that both the annotation sequence $\mathbf{A}$ and the LaTeX string $\mathbf{Y}$ have variable length. How is this handled?

At decoding step $t$, the attention mechanism computes a fixed-length context vector $\mathbf{c}_t$:
$$\mathbf{c}_t=\sum_{i=1}^L\alpha_{ti}\mathbf{a}_i$$
The context vector is then used to compute the probability of each candidate symbol:
$$p(\mathbf{y}_t\mid\mathbf{y}_{t-1},\mathbf{X})=g\left(\mathbf{W}_o\,h(\mathbf{E}\mathbf{y}_{t-1}+\mathbf{W}_s\mathbf{s}_t+\mathbf{W}_c\mathbf{c}_t)\right)$$
$g$: softmax activation function;

$h$: maxout activation function;

$\mathbf{E}$: embedding matrix;

$\mathbf{W}_{s}\in\mathbb{R}^{m\times n}$: $m$ and $n$ are the embedding dimension and the GRU decoder state dimension, respectively;

$\mathbf{W}_{o}\in\mathbb{R}^{K\times\frac{m}{2}}$ (the maxout activation halves the dimension, hence $\frac{m}{2}$).
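The output layer can be sketched in NumPy with made-up toy dimensions; the weights are random stand-ins, and the maxout is implemented as a pairwise max, which halves the dimension as required by $\mathbf{W}_o\in\mathbb{R}^{K\times\frac{m}{2}}$:

```python
import numpy as np

rng = np.random.default_rng(2)
K, m, n, Cc = 9, 8, 6, 10                 # vocab size, maxout in-dim, state dim, |c_t|

E   = rng.standard_normal((m, K))         # embedding matrix
W_s = rng.standard_normal((m, n))
W_c = rng.standard_normal((m, Cc))
W_o = rng.standard_normal((K, m // 2))    # K x m/2, since maxout halves m

def maxout(x):                            # h: max over pairs, dim m -> m/2
    return x.reshape(-1, 2).max(axis=1)

def softmax(x):                           # g
    z = np.exp(x - x.max())
    return z / z.sum()

y_prev = np.eye(K)[3]                     # one-hot previous symbol
s_t, c_t = rng.standard_normal(n), rng.standard_normal(Cc)

# p(y_t | y_{t-1}, X) = g(W_o h(E y_{t-1} + W_s s_t + W_c c_t))
p = softmax(W_o @ maxout(E @ y_prev + W_s @ s_t + W_c @ c_t))
print(p.shape)                            # (9,)
```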

Tricks for improving performance:

1. Beam search: keep the 10 best partial hypotheses at each step; a hypothesis terminates when it emits the <eos> symbol.

2. Ensemble: train several identically-structured models with different initializations, and average their prediction probabilities at each step of beam search.
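The beam-search procedure can be sketched in plain Python. The scorer below is a made-up stub standing in for the real decoder (an ensemble would average the per-model probabilities inside it before ranking):

```python
# Minimal beam search: keep the best `beam_width` hypotheses per step,
# and move a hypothesis to `finished` once it emits <eos>.

EOS = "<eos>"
VOCAB = ["x", "+", "1", EOS]

def next_token_logprobs(prefix):
    # Stub scorer (hypothetical): real code would run the GRU decoder here.
    scores = {t: -float(len(prefix) + i + 1) for i, t in enumerate(VOCAB)}
    scores[EOS] = -0.1 if len(prefix) >= 3 else -99.0  # finish long prefixes
    return scores

def beam_search(beam_width=10, max_len=20):
    beams = [([], 0.0)]                    # (token sequence, log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, lp in beams:
            for tok, tok_lp in next_token_logprobs(seq).items():
                candidates.append((seq + [tok], lp + tok_lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, lp in candidates[:beam_width]:
            (finished if seq[-1] == EOS else beams).append((seq, lp))
        if not beams:                      # every surviving hypothesis ended
            break
    return max(finished + beams, key=lambda c: c[1])[0]

print(beam_search())                       # -> ['x', 'x', 'x', '<eos>']
```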

Experiments:

1. Ablation study:


Comparison of recognition performance (in %) on CROHME 2014 and CROHME 2016 when employing the dense encoder and the multi-scale attention model.

2. Depth D of the multi-scale branch:


Comparison of recognition performance (in %) on CROHME 2014 and CROHME 2016 when increasing the depth of the dense block in the multi-scale branch.

3. Comparison with other models:

CROHME 2014 :


CROHME 2016 :


References:

[1]:J. Zhang, J. Du and L. Dai, “Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition,” 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 2018, pp. 2245-2250, doi: 10.1109/ICPR.2018.8546031.


[2]: Jianshu Zhang, Jun Du, Shiliang Zhang, Dan Liu, Yulong Hu, Jinshui Hu, Si Wei and Lirong Dai, "Watch, attend and parse: An end-to-end neural network based approach to handwritten mathematical expression recognition," Pattern Recognition, Volume 71, 2017, pp. 196-206, ISSN 0031-3203.


Reposted from blog.csdn.net/KPer_Yang/article/details/129484160