Text Understanding with the Attention Sum Reader Network

Category: Other · 2019-03-06 09:51:02

Copyright notice: this is an original article by the blogger and may not be reproduced without permission. https://blog.csdn.net/LaineGates/article/details/79240232

Keywords

Bi-GRU, Bi-LSTM, attention sum

Source

arXiv 2016.03.04 (published at ACL 2016)

Problem

Use a deep model with attention to solve cloze-style (fill-in-the-blank) questions.

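Technical details

The model is simpler than the Attentive Reader and consists of the following steps:

1. Encode the document with a bidirectional GRU/LSTM; for each word, concatenate the forward and backward hidden states to obtain doc_encoder.
2. Encode the query with a bidirectional GRU/LSTM; concatenate the forward state at the last word and the backward state at the first word to obtain query_encoder.
3. Take the dot product of doc_encoder and query_encoder at every document position and apply softmax to obtain attention_res (softmax makes the values positive and sum to 1).
4. For each candidate word, sum the attention in attention_res over all of its occurrences in the document. This is the key idea of the paper, and it became a standard component of later deep models for cloze tasks.
5. Compute the cross-entropy and backpropagate to update the parameters.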
[Figure: architecture of the Attention Sum Reader]
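The dot-product attention and the per-candidate summation can be sketched in a few lines of plain Python. This is only a minimal illustration, not the author's implementation: the encoder outputs are hard-coded toy vectors instead of Bi-GRU states, and the names `attention_sum`, `doc_tokens`, and `candidates` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_sum(doc_tokens, doc_encoder, query_encoder, candidates):
    """Dot-product attention over document positions, followed by
    summing the attention of every position where a candidate occurs."""
    # One score per document position: dot product with the query vector.
    scores = [sum(d * q for d, q in zip(vec, query_encoder))
              for vec in doc_encoder]
    attention_res = softmax(scores)
    # Accumulate attention per candidate word across all its occurrences.
    summed = {c: 0.0 for c in candidates}
    for token, weight in zip(doc_tokens, attention_res):
        if token in summed:
            summed[token] += weight
    return attention_res, summed

# Toy inputs (assumed for illustration): 5 tokens, 2-dimensional encodings.
doc_tokens = ["mary", "met", "john", "and", "mary"]
doc_encoder = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
query_encoder = [2.0, 0.0]

attention_res, summed = attention_sum(
    doc_tokens, doc_encoder, query_encoder, ["mary", "john"])
answer = max(summed, key=summed.get)
print(answer, summed)
```

Because "mary" occurs twice, its two attention weights are added together, which is why a word that appears repeatedly can win even if no single position dominates.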

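Key implementation points

Documents are long, typically 600 to 700+ tokens, with a few that are much longer. As a result, the gradients on the document side become very large during training and use a lot of memory; my 11 GB GPU frequently ran out of memory. Limiting the document length to 700 is sufficient, and a batch_size of 32 is about the limit.
When computing accuracy, accumulate it over the whole epoch instead of reporting it per batch. Otherwise the accuracy keeps jumping around and gives the impression that training is broken.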
When computing the cross-entropy in step 5, do not apply softmax a second time; normalize instead. That is, if the output of step 4 is $outputs$, then $y_{predict}=outputs/\sum(outputs)$ and $crossEntropy=-\sum(y*tf.log(y_{predict}))$.
The reason is that attention_res from step 3 has already been through softmax, so all of its values lie in $[0,1)$. With a document length of about 700, each value is roughly between a few thousandths and a few hundredths, and after a second softmax these numbers become almost uniform, for example $e^{0.005}\approx1.005$.
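The effect of a second softmax can be checked numerically. The sketch below compares sum-normalization with a second softmax on made-up summed attention values; the numbers in `outputs` and the helper name `normalized_cross_entropy` are assumptions for illustration, and the target is taken to be one-hot so the cross-entropy reduces to the negative log of the target's probability.

```python
import math

def normalized_cross_entropy(outputs, target_index):
    """Divide by the sum, then take -log of the target probability
    (equivalent to -sum(y * log(y_predict)) for a one-hot y)."""
    total = sum(outputs)
    y_predict = [o / total for o in outputs]
    return -math.log(y_predict[target_index]), y_predict

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical summed attention for three candidate words.
outputs = [0.60, 0.05, 0.01]

loss, y_predict = normalized_cross_entropy(outputs, target_index=0)
y_softmax = softmax(outputs)          # the mistake: a second softmax
loss_softmax = -math.log(y_softmax[0])

print("normalized:", y_predict, "loss:", loss)
print("second softmax:", y_softmax, "loss:", loss_softmax)
```

With inputs this small, the second softmax pushes all candidates toward similar probabilities, so the loss stays high even when the model already prefers the correct candidate.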

Implementation code

Theano version
TensorFlow version
