Basic Information

Author: Qingyun Wang

Paper: Paper Abstract Writing through Editing Mechanism (ACL)

Source code: https://github.com/EagleW/Writing-editing-Network

Data Preprocessing

1. Load Data

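Extract the headline and abstract of each paper
Tokenize (the code uses a hand-written tokenizer; a toolkit such as NLTK would also work)
Build the corpus: list (all samples) ==> sublist (one sample: headline + abstract) ==> sub-sublist (the tokens of a sentence)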
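
The loading steps above can be sketched as follows. This is a minimal illustration, not the repository's code: the regex tokenizer, the function names `tokenize` and `build_corpus`, and the (headline, abstract) pair input format are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Hypothetical tokenizer: lowercase, then split into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_corpus(records):
    """records: iterable of (headline, abstract) string pairs (assumed input format)."""
    corpus = []
    for headline, abstract in records:
        # One sample = [headline tokens, abstract tokens]
        corpus.append([tokenize(headline), tokenize(abstract)])
    return corpus

records = [("Paper Abstract Writing", "We present a model. It edits drafts.")]
corpus = build_corpus(records)
```

Each entry of `corpus` is a two-element list, so `corpus[i][0]` holds the headline tokens and `corpus[i][1]` the abstract tokens.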

2. Construct Vocabulary

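Build the vocabulary from the training data: sort words by frequency, drop low-frequency words, and cap the vocabulary at vocab_size entries
In addition to words and punctuation, add the special tokens <pad>, <unk>, <eos>, and <bos>
Put <pad> at the first position (index 0) of the dictionary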
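
A vocabulary built this way might look like the sketch below. The helper name `build_vocab` and the choice to count the special tokens towards `vocab_size` are assumptions for illustration; the repository may handle either differently.

```python
from collections import Counter

# <pad> comes first so that its id is 0
SPECIALS = ["<pad>", "<unk>", "<bos>", "<eos>"]

def build_vocab(corpus, vocab_size):
    """corpus: list of samples, each a list of token lists (headline, abstract)."""
    counts = Counter(tok for sample in corpus for text in sample for tok in text)
    # Assumption: vocab_size includes the special tokens
    keep = vocab_size - len(SPECIALS)
    words = [w for w, _ in counts.most_common(keep)]  # most frequent first
    id2word = SPECIALS + words
    word2id = {w: i for i, w in enumerate(id2word)}
    return word2id, id2word

toy_corpus = [[["a", "b"], ["a", "b", "a", "c"]],
              [["a"], ["d", "b"]]]
word2id, id2word = build_vocab(toy_corpus, vocab_size=6)
```

With `vocab_size=6` only the two most frequent words survive; every other token is later mapped to `<unk>`.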

3. word2id，id2word

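Map the tokens in the corpus to ids (no padding at this stage)
Add <bos> before and <eos> after each abstract; under teacher-forcing training the decoder input is then the target shifted by one position. Headlines get neither <bos> nor <eos>
Sort the corpus by encoder-input length (by headline, longest first)
Compute max_x_len and max_y_len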
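
The mapping, the boundary tokens, and the length sort can be sketched as below. The function names and the toy `word2id` table are hypothetical; the last two lines show what the one-position shift means under teacher forcing.

```python
def encode(tokens, word2id):
    """Map tokens to ids, falling back to <unk> for out-of-vocabulary tokens."""
    unk = word2id["<unk>"]
    return [word2id.get(t, unk) for t in tokens]

def numericalize(corpus, word2id):
    data = []
    for headline, abstract in corpus:
        x = encode(headline, word2id)  # no <bos>/<eos> on the headline
        y = [word2id["<bos>"]] + encode(abstract, word2id) + [word2id["<eos>"]]
        data.append((x, y))
    # Longest headline first, which is what packing the encoder input expects
    data.sort(key=lambda pair: len(pair[0]), reverse=True)
    max_x_len = max(len(x) for x, _ in data)
    max_y_len = max(len(y) for _, y in data)
    return data, max_x_len, max_y_len

# Toy vocabulary, for illustration only
word2id = {"<pad>": 0, "<unk>": 1, "<bos>": 2, "<eos>": 3, "a": 4, "b": 5}
corpus = [[["a"], ["a", "b"]],
          [["a", "b", "z"], ["b"]]]
data, max_x_len, max_y_len = numericalize(corpus, word2id)

# Teacher forcing: the decoder reads y[:-1] and is trained to predict y[1:]
x0, y0 = data[0]
decoder_input, decoder_target = y0[:-1], y0[1:]
```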

Model Construction

1. Embedding

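Randomly initialize the embedding matrix by passing two arguments to nn.Embedding: vocab_size and embed_dim
Once defined, the embedding is shared by the title encoder, the draft encoder, and the decoder; that is, headline, draft, and abstract all use the same embedding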
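
One way to share a single embedding across modules is to create it once and pass the same object to each of them, as in this sketch. The wrapper class `SharedEmbeddingGRU` and all sizes are made up for the example and do not come from the repository.

```python
import torch
import torch.nn as nn

class SharedEmbeddingGRU(nn.Module):
    """Hypothetical wrapper: a GRU that looks tokens up in an embedding passed in from outside."""
    def __init__(self, embedding, hidden_size, bidirectional):
        super().__init__()
        self.embedding = embedding  # the same nn.Embedding object in every module
        self.gru = nn.GRU(embedding.embedding_dim, hidden_size,
                          batch_first=True, bidirectional=bidirectional)

    def forward(self, token_ids):
        return self.gru(self.embedding(token_ids))

vocab_size, embed_dim, hidden_size = 20, 8, 16  # illustrative sizes
embedding = nn.Embedding(vocab_size, embed_dim)  # randomly initialized

title_encoder = SharedEmbeddingGRU(embedding, hidden_size, bidirectional=True)
draft_encoder = SharedEmbeddingGRU(embedding, hidden_size, bidirectional=True)
decoder = SharedEmbeddingGRU(embedding, hidden_size, bidirectional=False)

output, _ = title_encoder(torch.tensor([[4, 5, 6]]))
```

Because all three modules hold the same parameter, a gradient step from any of them updates the one shared embedding matrix.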

2. Encoder for Title

single-layer bidirectional GRU
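Because the encoder inputs are already sorted by title length, they can be packed with nn.utils.rnn.pack_padded_sequence before encoding, and the padded positions restored afterwards with nn.utils.rnn.pad_packed_sequence. This gives more accurate results because padding no longer affects the bidirectional encoding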
input_sorted_by_length ==> pack ==> encode ==> pad ==> encoder_output
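
The pack, encode, pad pipeline can be sketched with standard PyTorch calls. The sizes and the toy batch are assumptions; id 0 plays the role of <pad>, and the batch is already sorted longest first.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
vocab_size, embed_dim, hidden_size = 20, 8, 16  # illustrative sizes
embedding = nn.Embedding(vocab_size, embed_dim)
gru = nn.GRU(embed_dim, hidden_size, num_layers=1,
             batch_first=True, bidirectional=True)

# Titles sorted by length, longest first; 0 is the <pad> id
titles = torch.tensor([[5, 6, 7, 8],
                       [9, 3, 4, 0],
                       [2, 11, 0, 0]])
lengths = torch.tensor([4, 3, 2])

embedded = embedding(titles)
# enforce_sorted=True (the default) requires the descending order used here
packed = pack_padded_sequence(embedded, lengths, batch_first=True)
packed_out, h_n = gru(packed)
# Padded positions come back as all-zero vectors
encoder_output, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
```

The GRU never sees the <pad> positions, so the backward direction of each title starts at its last real token.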

3. Encoder for Draft

single-layer bidirectional GRU
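This encoder encodes the draft produced by the decoder. The input sequences all have shape [batch_size, max_y_len], but the number of valid elements differs from sequence to sequence; sorting them, encoding, and then restoring each sample's original position in the batch would be cumbersome
So nn.utils.rnn.pack_padded_sequence and nn.utils.rnn.pad_packed_sequence are not used here. The encoding is affected by the padded positions, but the effect is minor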
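
How padding reaches the encoding when packing is skipped can be checked with a small experiment. This is a toy setup with made-up sizes, not a measurement on the actual model: the same two-token draft is encoded once as is and once followed by two <pad> ids.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embed_dim, hidden_size = 8, 16  # illustrative sizes
embedding = nn.Embedding(20, embed_dim)
draft_encoder = nn.GRU(embed_dim, hidden_size, batch_first=True, bidirectional=True)

draft = torch.tensor([[5, 6]])               # two valid tokens
padded_draft = torch.tensor([[5, 6, 0, 0]])  # same draft followed by <pad> (id 0)

with torch.no_grad():
    out, _ = draft_encoder(embedding(draft))
    out_padded, _ = draft_encoder(embedding(padded_draft))

# The last dimension holds the forward states first, then the backward states.
# Forward states of the real tokens do not depend on what follows them.
forward_same = torch.allclose(out[:, :, :hidden_size],
                              out_padded[:, :2, :hidden_size], atol=1e-5)
# Backward states do: the backward pass reads the <pad> positions first.
backward_gap = (out[:, :, hidden_size:]
                - out_padded[:, :2, hidden_size:]).abs().max().item()
```

Only the backward direction changes, which is the influence of padding that the draft encoder accepts in exchange for skipping the sort and restore steps.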

4. Decoder for All-pass Decoding

single-layer unidirectional GRU
Every decoding pass uses the same decoder
It includes two kinds of attention:
1. Attention over the decoder hidden states of the previous pass
2. Attention from the current pass's decoder hidden states over the encoder hidden states
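
The two attention sources can be illustrated with a generic dot-product attention applied twice. The scoring function, the helper `dot_attention`, and the concatenation at the end are assumptions for illustration; the notes above do not say which attention form the repository uses.

```python
import torch
import torch.nn.functional as F

def dot_attention(query, memory):
    """query: [batch, hidden]; memory: [batch, steps, hidden].
    Returns the context vector [batch, hidden] and the weights [batch, steps]."""
    scores = torch.bmm(memory, query.unsqueeze(2)).squeeze(2)
    weights = F.softmax(scores, dim=1)
    context = torch.bmm(weights.unsqueeze(1), memory).squeeze(1)
    return context, weights

torch.manual_seed(0)
batch, hidden = 2, 16                            # illustrative sizes
decoder_state = torch.randn(batch, hidden)       # current decoder hidden state
encoder_states = torch.randn(batch, 5, hidden)   # title encoder hidden states
draft_states = torch.randn(batch, 7, hidden)     # hidden states from the previous pass

title_ctx, title_w = dot_attention(decoder_state, encoder_states)
draft_ctx, draft_w = dot_attention(decoder_state, draft_states)

# Assumed combination: concatenate the state with both context vectors
combined = torch.cat([decoder_state, title_ctx, draft_ctx], dim=1)
```

On the first pass there is no previous draft, so only the title context would be available.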

5. Complete Model

Contains the two encoders and the decoder described above, plus a word probability layer

6. Instantiate Model

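import torch
import torch.nn as nn

# ignore_index=0 skips <pad>, which sits at index 0 of the vocabulary
criterion = nn.CrossEntropyLoss(ignore_index=0)

# Print GPU information; `model` is the complete model built in step 5
if torch.cuda.is_available():
    # Select GPU 0 as the current device
    torch.cuda.set_device(0)
    print("Congratulations! {} GPU(s) can be used!".format(torch.cuda.device_count()))
    print("Currently, you are using GPU" + str(torch.cuda.current_device()),
          'named "{}"'.format(torch.cuda.get_device_name(torch.cuda.current_device())))

    # Move the model and the loss to the GPU
    model = model.cuda()
    criterion = criterion.cuda()

else:
    print("Sadly, CUDA is not available!")

Output: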

Congratulations! 1 GPU(s) can be used!
Currently, you are using GPU0 named "GeForce GTX 1050 Ti"

Training Process

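Enter the train_epoch function
1. Enter the train_batch function
  1. Set the previously generated draft to None
  2. input [batch_size, max_x_len], target [batch_size, max_y_len]
  3. Encode the input, [batch_size, max_x_len]
  4. Enter the multi-pass decoding loop
    1. If the previously generated draft is None, this is the first decoding pass, so only decoder-encoder attention is applied
      1. Decode to obtain the hidden vectors
      2. Project to the vocabulary size to obtain a probability distribution
      3. Pick the decoder output greedily (greedy search)
      4. Use it as the previously generated draft
    2. If the previously generated draft is not None, decoding has already run at least once, so besides decoder-encoder attention the decoder also attends to the hidden states of the generated draft
    3. In this implementation, the output of every pass (each draft) gets its own cross-entropy loss against the ground truth, followed by backpropagation and a parameter update (one batch therefore triggers several backward passes, so that the drafts also stay close to the target sequence)
2. Record the batch loss, and print the average epoch loss when the epoch finishes
3. No validation is run within an epoch
A set of parameters is saved after the epoch loop ends
If the current epoch's average loss exceeds the previous epoch's, training should stop to prevent overfitting; here, however, that loss is computed on the training set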
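
The control flow of train_batch can be sketched with a deliberately simplified model. `ToyWriterEditor` is a stand-in invented for this example: it replaces both attention mechanisms with a crude addition of final hidden states, so only the loop structure (draft set to None, one loss and one update per pass, greedy draft fed to the next pass) mirrors the description above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, embed_dim, hidden = 12, 8, 16  # illustrative sizes

class ToyWriterEditor(nn.Module):
    """Toy stand-in for the full model; not the repository's architecture."""
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.title_encoder = nn.GRU(embed_dim, hidden, batch_first=True)
        self.draft_encoder = nn.GRU(embed_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)  # word probability layer

    def forward(self, title, decoder_input, prev_draft=None):
        _, h = self.title_encoder(self.embedding(title))
        if prev_draft is not None:
            # Crude substitute for attention over the draft: add its final state
            _, h_draft = self.draft_encoder(self.embedding(prev_draft))
            h = h + h_draft
        states, _ = self.decoder(self.embedding(decoder_input), h)
        return self.out(states)  # [batch, len, vocab_size]

def train_batch(model, optimizer, criterion, title, target, num_passes=2):
    decoder_input, gold = target[:, :-1], target[:, 1:]  # teacher-forcing shift
    prev_draft, losses = None, []
    for _ in range(num_passes):
        optimizer.zero_grad()
        logits = model(title, decoder_input, prev_draft)
        loss = criterion(logits.reshape(-1, vocab_size), gold.reshape(-1))
        loss.backward()   # one backward pass and one update per decoding pass
        optimizer.step()
        prev_draft = logits.argmax(dim=-1).detach()  # greedy draft for the next pass
        losses.append(loss.item())
    return losses

model = ToyWriterEditor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss(ignore_index=0)  # 0 is <pad>
title = torch.tensor([[4, 5, 6], [7, 8, 0]])
target = torch.tensor([[2, 4, 5, 6, 3], [2, 7, 8, 3, 0]])
losses = train_batch(model, optimizer, criterion, title, target)
```

`num_passes` is a hypothetical parameter name; it controls how many drafts are generated and therefore how many updates one batch triggers.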

Inference Phase

TO DO

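Run validation at the end of every epoch, and stop training when the average validation loss exceeds the previous one.
Add a mixed objective function
Move the encoder outside the loop so that each batch is encoded only once
Use a different decoder for each pass; one could even be a Transformer and another an RNN (running beam search twice). In this implementation, the earlier passes all decode with greedy search
Work out how Transformer decoding works and what tensor shapes the beam searcher expects as input, so that it can be plugged in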
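
The first item in this list, stopping on validation loss, could be sketched as below. The function name and the two callables `train_epoch` and `validate` are hypothetical; each is assumed to return an average loss, and the toy losses at the bottom are invented to exercise the stopping rule.

```python
def train_with_early_stopping(train_epoch, validate, max_epochs):
    """Stop as soon as the validation loss rises above the previous epoch's."""
    prev_val = float("inf")
    history = []
    for epoch in range(max_epochs):
        train_loss = train_epoch()
        val_loss = validate()
        history.append((epoch, train_loss, val_loss))
        if val_loss > prev_val:
            break  # validation loss went up: stop training
        prev_val = val_loss
    return history

# Simulated losses, for illustration only
train_losses = iter([3.2, 2.4, 2.0, 1.8])
val_losses = iter([3.0, 2.5, 2.7, 2.0])
history = train_with_early_stopping(lambda: next(train_losses),
                                    lambda: next(val_losses),
                                    max_epochs=4)
```

In the simulated run the training loss keeps falling, but training stops in the third epoch because the validation loss rises from 2.5 to 2.7.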
