Paper reading and analysis: Watch, attend and parse: An end-to-end neural network based approach to HMER

HMER paper series
1. Paper reading and analysis: When Counting Meets HMER: Counting-Aware Network for HMER
2. Paper reading and analysis: Syntax-Aware Network for Handwritten Mathematical Expression Recognition
3. Paper reading and analysis: A Tree-Structured Decoder for Image-to-Markup Generation
4. Paper reading and analysis: Watch, attend and parse: An end-to-end neural network based approach to HMER
5. Paper reading and analysis: Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition
6. Paper reading and analysis: Mathematical formula recognition using graph grammar
7. Paper reading and analysis: Hybrid Mathematical Symbol Recognition using Support Vector Machines
8. Paper reading and analysis: HMM-BASED HANDWRITTEN SYMBOL RECOGNITION USING ON-LINE AND OFF-LINE FEATURES

Paper reading and analysis: Watch, attend and parse: An end-to-end neural network based approach to handwritten mathematical expression recognition

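Note: this paper shows how to publish your own work when a parallel work has already appeared on arXiv. The related-work section acknowledges the parallel work and explains the differences. The final part of the experiments then applies the proposed techniques to the parallel work and shows that they improve it as well, which supports the novelty of the contribution.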

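Main contributions:

1. It proposes a neural network, Watch, Attend and Parse (WAP), which avoids both explicit symbol segmentation and computation that relies on a predefined ME (mathematical expression) grammar;

2. The watcher is a fully convolutional network (FCN), which handles large images and variable input sizes efficiently. A coverage-based attention model addresses the lack-of-coverage problem when training WAP;

3. Attention visualization shows how WAP performs symbol segmentation and parses the two-dimensional structure.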

WAP network architecture


Fig. 1. Architectures of Watch, Attend, Parse for handwritten mathematical expression recognition

In WAP, an FCN extracts features from the image and a GRU with attention decodes them.

1. The FCN (batch normalization and ReLU activation layers are not shown):


FCN configurations. The convolutional layer parameters are denoted as “conv(receptive field size)-[number of channels]”. For brevity, the batch normalization layer and ReLU activation function is not shown.
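To make the variable-input-size point concrete, the following minimal sketch computes how many annotation vectors a fully convolutional watcher would hand to the parser for a given image size. It assumes four 2×2 max-pooling layers with floor division and size-preserving convolutions; these numbers, and the function name `annotation_grid`, are illustrative assumptions because the configuration table is not reproduced here.

```python
def annotation_grid(height, width, num_pool=4, pool=2):
    """Return (H', W', L): the feature-map size and the number of
    annotation vectors L = H' * W' after `num_pool` pooling layers.

    Assumption: convolutions keep the spatial size, and each pooling
    layer divides height and width by `pool`, rounding down.
    """
    h, w = height, width
    for _ in range(num_pool):
        h //= pool
        w //= pool
    return h, w, h * w


# Two images of different sizes go through the same network;
# only the number of annotation vectors changes.
print(annotation_grid(128, 512))  # (8, 32, 256)
print(annotation_grid(96, 300))   # (6, 18, 108)
```

Because no fully connected layer fixes the input size, the attention model simply sees a longer or shorter list of annotations.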

2. GRU with attention

The basic GRU:
[Figure: equations of the basic GRU]
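As a rough illustration of the basic GRU, here is a minimal NumPy sketch of one step of a standard GRU cell (update gate, reset gate, candidate state). The parameter names (`W_z`, `U_z`, and so on), the sizes, and the single input vector `x` are assumptions made for this example. In WAP the parser's input would also involve the previously generated symbol and the context vector, which this sketch leaves out, and some formulations swap the roles of `z` and `1 - z`.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h_prev, p):
    """One step of a standard GRU cell (illustrative, not WAP-specific)."""
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev)        # update gate
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev)        # reset gate
    h_tilde = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h_prev))  # candidate
    return (1.0 - z) * h_prev + z * h_tilde              # new hidden state


rng = np.random.default_rng(0)
n_in, n_hid = 4, 3  # hypothetical sizes

params = {}
for gate in ("z", "r", "h"):
    params["W_" + gate] = rng.normal(scale=0.1, size=(n_hid, n_in))
    params["U_" + gate] = rng.normal(scale=0.1, size=(n_hid, n_hid))

h = np.zeros(n_hid)
for _ in range(5):  # run a short random input sequence
    h = gru_step(rng.normal(size=n_in), h, params)

print(h.shape)  # (3,)
```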

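Adding the attention mechanism:

Put simply, which part of the input image the parser should attend to depends on the words already generated in the output sequence.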


The corresponding formulas are:

$$
\begin{aligned}
\boldsymbol\beta_t &= \sum_{l=1}^{t-1} \boldsymbol\alpha_l \\
\mathbf{F} &= Q * \boldsymbol\beta_t \\
e_{ti} &= \boldsymbol{\nu}_a^{\mathrm{T}} \tanh(\mathbf{W}_a \mathbf{h}_{t-1} + \mathbf{U}_a \mathbf{a}_i + \mathbf{U}_f \mathbf{f}_i)
\end{aligned}
$$

$\boldsymbol\beta_t$: the sum of the attention probabilities from past time steps;

$\mathbf{f}_i$: the coverage vector of annotation $\mathbf{a}_i$ (taken from $\mathbf{F}$), initialized to 0;
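The coverage formulas can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function names `conv2d_same` and `coverage_energies`, all shapes, and the random parameters are hypothetical, and the convolution with $Q$ is written as a same-padded cross-correlation (no kernel flip), which is the usual deep-learning convention.

```python
import numpy as np


def conv2d_same(x, kernels):
    """Same-padded 2D cross-correlation.

    x: (H, W) map; kernels: (q, k, k) with odd k. Returns (H, W, q).
    """
    q, k, _ = kernels.shape
    pad = k // 2
    xp = np.pad(x, pad)
    H, W = x.shape
    out = np.zeros((H, W, q))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k]
            out[i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return out


def coverage_energies(h_prev, a, past_alphas, p):
    """Energies e_ti with the coverage term U_f f_i.

    a: (H, W, D) annotations; past_alphas: list of (H, W) attention maps.
    """
    H, W, _ = a.shape
    # beta_t: sum of all past attention maps (zeros at the first step).
    beta = np.sum(past_alphas, axis=0) if past_alphas else np.zeros((H, W))
    F = conv2d_same(beta, p["Q"])                      # (H, W, q)
    pre = p["W_a"] @ h_prev + a @ p["U_a"].T + F @ p["U_f"].T
    return np.tanh(pre) @ p["nu_a"]                    # (H, W)


rng = np.random.default_rng(0)
H, W, D, n, n_att, q, k = 3, 5, 8, 6, 4, 2, 3  # hypothetical sizes
params = {
    "Q": rng.normal(size=(q, k, k)),
    "W_a": rng.normal(size=(n_att, n)),
    "U_a": rng.normal(size=(n_att, D)),
    "U_f": rng.normal(size=(n_att, q)),
    "nu_a": rng.normal(size=n_att),
}
a = rng.normal(size=(H, W, D))
h_prev = rng.normal(size=n)
uniform = np.full((H, W), 1.0 / (H * W))

e_first = coverage_energies(h_prev, a, [], params)
e_later = coverage_energies(h_prev, a, [uniform, uniform], params)
print(e_first.shape, e_later.shape)  # (3, 5) (3, 5)
```

At the first step the coverage vectors are all zero, so the energies reduce to the version without the $\mathbf{U}_f \mathbf{f}_i$ term; at later steps the accumulated attention changes the energies.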

$\mathbf{c}_t$: the context vector. Without the coverage term, the attention is computed as:

$$
\begin{aligned}
e_{ti} &= \boldsymbol{\nu}_a^{\mathrm{T}} \tanh(\mathbf{W}_a \mathbf{h}_{t-1} + \mathbf{U}_a \mathbf{a}_i) \\
\alpha_{ti} &= \frac{\exp(e_{ti})}{\sum_{k=1}^{L} \exp(e_{tk})} \\
\mathbf{c}_t &= \sum_{i=1}^{L} \alpha_{ti} \mathbf{a}_i
\end{aligned}
$$
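A small worked example may help with the last two formulas: the energies are turned into attention weights by a softmax, and the context vector is the weighted sum of the annotations. The function name `attend` and the toy numbers are assumptions for illustration; subtracting the maximum energy is a standard numerical-stability trick that leaves the softmax unchanged.

```python
import numpy as np


def attend(energies, annotations):
    """Return (alpha, c_t) for energies (L,) and annotations (L, D)."""
    e = energies - energies.max()          # stabilize the exponentials
    alpha = np.exp(e) / np.exp(e).sum()    # softmax over the L positions
    context = alpha @ annotations          # weighted sum of annotations
    return alpha, context


# Toy case: L = 3 annotations of dimension D = 2.
energies = np.array([0.0, 1.0, 2.0])
annotations = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [1.0, 1.0]])

alpha, c_t = attend(energies, annotations)
print(alpha)  # weights sum to 1; the largest energy gets the largest weight
print(c_t)
```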

The five learned spatial relationships:


Fig. 5. The model learning procedure of determining five spatial relationships (horizontal, vertical, subscript, superscript and inside) through attention visualization.

Experiments

1. Attention with and without the coverage vector:


Examples of attention with and without the coverage vector. The recognized LaTeX sequences of the right side of the equation are printed below each image (the white areas in the images indicate the attended regions, and the underlined text in the LaTeX sequences indicates the corresponding words).


Attention visualization of a tested mathematical expression image whose LaTeX sequence is “( sin ( x ) ) ^ { 2 } + ( cos ( x ) ) ^ { 2 }”.

2. Experiments on CROHME 2014: correct expression recognition rate (in %)


3. Experiments on CROHME 2016: correct expression recognition rate (in %)

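For readers unfamiliar with the metric, the correct expression recognition rate can be approximated as the percentage of test expressions whose predicted LaTeX token sequence exactly matches the reference. The sketch below uses that simplified definition; the function name `exp_rate` and the sample strings are made up for illustration, and the official CROHME evaluation may compare expressions differently.

```python
def exp_rate(predictions, references):
    """Percentage of expressions whose token sequence matches exactly.

    Simplifying assumption: tokens are whitespace-separated and an
    expression counts as correct only if every token matches.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must align")
    if not references:
        raise ValueError("need at least one expression")
    correct = sum(p.split() == r.split()
                  for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Hypothetical predictions and ground truth (not from the paper).
preds = ["x ^ { 2 } + 1", "\\frac { a } { b }", "a + b"]
refs = ["x ^ { 2 } + 1", "\\frac { a } { c }", "a + b"]

print(round(exp_rate(preds, refs), 2))  # 66.67
```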

4. Comparison with a parallel work:

WYGIWYS: "we find a parallel work similar to this study submitted as the arXiv preprint [48], named WYGIWYS (What You Get Is What You See), which decompiles a machine-printed mathematical expression into presentational markup."

On top of WYGIWYS, the comparison adds a deep FCN, coverage-based attention, and trajectory information.


References:

[1] J. Zhang, J. Du and L. Dai, "Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition," 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 2018, pp. 2245-2250, doi: 10.1109/ICPR.2018.8546031.

[2] J. Zhang, J. Du, S. Zhang, D. Liu, Y. Hu, J. Hu, S. Wei and L. Dai, "Watch, attend and parse: An end-to-end neural network based approach to handwritten mathematical expression recognition," Pattern Recognition, vol. 71, 2017, pp. 196-206, ISSN 0031-3203.


Reposted from blog.csdn.net/KPer_Yang/article/details/129483137