[The Annotated Transformer] Code Fixes

1. RuntimeError: "exp" not implemented for 'torch.LongTensor'

class PositionalEncoding(nn.Module)

div_term = torch.exp(torch.arange(0., d_model, 2) *
                             -(math.log(10000.0) / d_model))

Change “0” to “0.” so that torch.arange returns a float tensor.

Otherwise the following error is raised: RuntimeError: "exp" not implemented for 'torch.LongTensor'
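
As a quick check of why the trailing dot matters, the sketch below compares the dtypes that torch.arange returns for an integer and a float start value. The names d_model, int_steps and float_steps are illustrative only. Newer PyTorch releases may handle torch.exp on integer tensors differently, so the snippet inspects dtypes rather than trying to reproduce the error.

```python
import math
import torch

d_model = 8  # illustrative size, not taken from the original notebook

# Integer arguments give an integer (Long) tensor.
int_steps = torch.arange(0, d_model, 2)
# A float start value gives a floating-point tensor.
float_steps = torch.arange(0., d_model, 2)

print(int_steps.dtype, float_steps.dtype)

# With the float tensor, the div_term expression from the text is well defined.
div_term = torch.exp(float_steps * -(math.log(10000.0) / d_model))
print(div_term)
```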

2. RuntimeError: expected type torch.FloatTensor but got torch.LongTensor

class PositionalEncoding(nn.Module)

position = torch.arange(0., max_len).unsqueeze(1)

Change “0” to “0.” here as well.

Otherwise the following error is raised:

pe[:, 0::2] = torch.sin(position * div_term)
RuntimeError: expected type torch.FloatTensor but got torch.LongTensor
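
To see both fixes working together, here is a minimal sketch that builds the sinusoidal table with float-valued position and div_term. The helper name build_positional_table is hypothetical and d_model is assumed to be even; it is not the notebook's PositionalEncoding class, only the table computation.

```python
import math
import torch

def build_positional_table(max_len, d_model):
    """Return a (max_len, d_model) table of sinusoidal position encodings."""
    pe = torch.zeros(max_len, d_model)
    # Float start values keep both tensors in floating point.
    position = torch.arange(0., max_len).unsqueeze(1)  # shape (max_len, 1)
    div_term = torch.exp(torch.arange(0., d_model, 2) *
                         -(math.log(10000.0) / d_model))  # shape (d_model / 2,)
    pe[:, 0::2] = torch.sin(position * div_term)  # even columns
    pe[:, 1::2] = torch.cos(position * div_term)  # odd columns
    return pe

pe = build_positional_table(max_len=10, d_model=8)
print(pe.shape, pe.dtype)
```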

3. UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.

def make_model

nn.init.xavier_uniform_(p)

Change “nn.init.xavier_uniform(p)” to “nn.init.xavier_uniform_(p)”.

Otherwise the following warning is shown: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
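
The in-place initializer can be tried on a small stand-in module. The sketch below assumes the usual pattern of looping over parameters and initializing only those with more than one dimension; the nn.Linear layer is a placeholder, not the model built by make_model.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 6)  # placeholder module standing in for the full model

for p in layer.parameters():
    if p.dim() > 1:  # skip 1-D parameters such as biases
        nn.init.xavier_uniform_(p)  # trailing underscore: modifies p in place

# Xavier uniform draws from [-bound, bound] with bound = sqrt(6 / (fan_in + fan_out)).
fan_out, fan_in = layer.weight.shape
bound = math.sqrt(6.0 / (fan_in + fan_out))
print(layer.weight.abs().max().item(), bound)
```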

4. UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.

class LabelSmoothing

self.criterion = nn.KLDivLoss(reduction='sum')

Change “self.criterion = nn.KLDivLoss(size_average=False)” to “self.criterion = nn.KLDivLoss(reduction='sum')”.

Otherwise the following warning is shown: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
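
As an illustration of what reduction='sum' computes, the following sketch compares nn.KLDivLoss against a hand-written sum of target * (log(target) - input). The tensors are made-up toy values, and the input is given as log-probabilities, as nn.KLDivLoss expects.

```python
import torch
import torch.nn as nn

criterion = nn.KLDivLoss(reduction='sum')

# Toy data: the input must be log-probabilities, the target plain probabilities.
logits = torch.tensor([[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]])
log_probs = torch.log_softmax(logits, dim=-1)
target = torch.tensor([[0.1, 0.2, 0.7], [0.3, 0.3, 0.4]])

loss = criterion(log_probs, target)

# The same quantity written out by hand, summed over all elements.
manual = (target * (target.log() - log_probs)).sum()
print(loss.item(), manual.item())
```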

5. IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

class SimpleLossCompute

return loss.item() * norm

Change “loss.data[0]” to “loss.item()”.

Otherwise the following error is raised: IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
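
A small sketch of the replacement call: a reduced loss is a 0-dim tensor, and .item() returns it as a plain Python number. The tensor values are arbitrary.

```python
import torch

loss = torch.tensor([1.0, 2.0, 3.0]).sum()  # reductions such as sum() give a 0-dim tensor
print(loss.dim())

value = loss.item()  # convert the 0-dim tensor to a Python float
print(type(value), value)
```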

6. floating point exception (core dumped)

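Running “A First Example” directly fails with: floating point exception (core dumped)

Fix: https://github.com/harvardnlp/annotated-transformer/issues/26

Modify the run_epoch function so that the accumulated values are converted to NumPy, using .detach().numpy() or simply .numpy().

The following code has been tested and works: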

def run_epoch(data_iter, model, loss_compute):
    "Standard Training and Logging Function"
    start = time.time()
    total_tokens = 0
    total_loss = 0
    tokens = 0
    for i, batch in enumerate(data_iter):
        out = model.forward(batch.src, batch.trg, batch.src_mask, batch.trg_mask)
        loss = loss_compute(out, batch.trg_y, batch.ntokens)
        total_loss += loss.detach().numpy()
        total_tokens += batch.ntokens.numpy()
        tokens += batch.ntokens.numpy()
        if i % 50 == 1:
            elapsed = time.time() - start
            print("Epoch Step: %d Loss: %f Tokens per Sec: %f" % (i, loss.detach().numpy() / batch.ntokens.numpy(), tokens / elapsed))
            start = time.time()
            tokens = 0
    return total_loss / total_tokens

7. All loss values are integers

class SimpleLossCompute

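When running “A First Example”, every loss value printed is an integer, which is odd. Testing showed that the cause is the return value in class SimpleLossCompute: norm is an integer tensor, so although loss.item() is a float, the result of return loss.item() * norm is still an integer tensor.

Fix: convert norm to float before multiplying: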

return loss.item() * norm.float()
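
The effect of the explicit conversion can be checked with toy values. In the sketch below, norm is assumed to be an integer token count like batch.ntokens. Newer PyTorch releases may already promote the product to float on their own, so the explicit .float() mainly makes the result independent of the installed version.

```python
import torch

loss = torch.tensor(2.5)  # toy 0-dim loss
norm = torch.tensor(3)    # integer token count (assumed stand-in for batch.ntokens)
print(norm.dtype)

# Convert norm to float first so the product keeps its fractional part.
scaled = loss.item() * norm.float()
print(scaled, scaled.dtype)
```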