RNN: a worked example of a simple neural network training process

This article is a set of study notes compiled after working through the Jizhi Academy (集智学园) series course "Introduction to PyTorch: Deep Learning on the Torch, Natural Language Processing (NLP)".

The task is character (digit) prediction: we want the neural network to discover the pattern behind the following digit sequences.

012
00112
0001112
000011112
00000111112

Given a partial sequence (such as 0000001), the neural network should predict which digit comes next.

1. Build the neural network architecture

We define an RNN class, SimpleRNN:

class SimpleRNN(nn.Module):
    def __init__(self, ...):
        ...
    def forward(self, ...):
        ...
    def initHidden(self):
        ...

The initHidden function initializes the hidden-layer state vector:

def initHidden(self):
    # initialize the hidden units
    # note the size: [num_layers, batch_size, hidden_size]
    return Variable(torch.zeros(self.num_layers, 1, self.hidden_size))

The __init__ function

__init__ sets up the structure of the network: the input and output dimensions, the size and number of hidden layers, and every submodule needed during the computation are all defined here.

Here nn is PyTorch's own module library, which ships with ready-made layers such as Embedding, RNN, Linear, and LogSoftmax that can be used directly.

# import nn (the model module) from PyTorch, plus the other packages used below
import torch
import torch.nn as nn
import numpy as np
from torch.autograd import Variable

def __init__(self, input_size, hidden_size, output_size, num_layers = 1):
    # definitions
    super(SimpleRNN, self).__init__()
    self.hidden_size = hidden_size
    self.num_layers = num_layers
    # an embedding layer
    self.embedding = nn.Embedding(input_size, hidden_size)
    # PyTorch's RNN model; batch_first makes the first dimension of the input tensor the batch index
    self.rnn = nn.RNN(hidden_size, hidden_size, num_layers, batch_first = True)
    # the fully connected output layer
    self.linear = nn.Linear(hidden_size, output_size)
    # the final LogSoftmax layer
    self.softmax = nn.LogSoftmax(dim = 1)

The forward function: the network's forward computation

The computation is easy to follow: the input is passed step by step through the embedding layer, the RNN layer, the Linear layer, and finally a softmax layer.

  • Embedding layer: embeds the input into the hidden space. Conceptually, the input is first converted to a one-hot encoding and then re-encoded as a vector of dimension hidden_size.
  • RNN layer: runs the embedded sequence through the RNN model.
  • Linear (fully connected) layer: maps every dimension of the hidden vector onto the output, which can be understood as mixing information across dimensions.
  • Softmax layer: normalizes the output.
# the forward computation
def forward(self, input, hidden):
    # size of input: [batch_size, num_step, data_dim]

    # embedding layer:
    # computation from the input to the hidden layer
    output = self.embedding(input)
    # size of output: [batch_size, num_step, hidden_size]

    output, hidden = self.rnn(output, hidden)
    # size of output: [batch_size, num_step, hidden_size]

    # take the value of the last time step from output;
    # note that output contains the results of all time steps
    output = output[:, -1, :]
    # size of output: [batch_size, hidden_size]

    # fully connected layer
    output = self.linear(output)
    # size of output: [batch_size, output_size]

    # softmax layer, normalization
    output = self.softmax(output)
    # size of output: [batch_size, output_size]
    return output, hidden

A special operation on the RNN output

output = output[:, -1 ,:]

The output has size [batch_size, num_step, hidden_size]; this line takes the last time step along the second (time-step) dimension. Because an RNN's defining feature is memory, the output of the last step already contains information from all previous steps, so only the final one is needed here.
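
A quick way to convince yourself of the shapes involved is to slice a toy tensor of the same layout (this is only an illustrative check, not part of the model code):

import torch

out = torch.zeros(1, 5, 2)   # [batch_size = 1, num_step = 5, hidden_size = 2]
last = out[:, -1, :]         # keep only the last time step
print(last.size())           # torch.Size([1, 2]), i.e. [batch_size, hidden_size]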

Using __init__ and forward

__init__ is a method built into Python classes, and forward is the method that nn.Module invokes when an instance is called.

  • If you define __init__, its body runs automatically when the class is instantiated, and the instantiation arguments are passed to __init__ as its parameters.
  • If you define forward, calling an instance of the class like a function automatically executes forward.
# instantiate the class SimpleRNN; this executes __init__
rnn = SimpleRNN(input_size = 4, hidden_size = 1, output_size = 3, num_layers = 1)

# use the class SimpleRNN; this executes forward
output, hidden = rnn(input, hidden)

Running forward once is therefore equivalent to one forward pass of training: input -> output.
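
For the call above to actually run, input and hidden have to be prepared first. A minimal sketch, using the imports and the rnn instance defined above and the same shapes as the training loop below (the concrete digit 3, the start token, is only an example):

# one character as input, shaped [batch_size = 1, time_steps = 1]
input = Variable(torch.LongTensor([3]).unsqueeze(0))
# a fresh hidden state of size [num_layers = 1, batch_size = 1, hidden_size = 1]
hidden = rnn.initHidden()

output, hidden = rnn(input, hidden)
print(output.size())   # torch.Size([1, 3]): log-probabilities over the 3 output classes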

2. Start training

The first step is to construct the loss function (criterion) and the optimizer.

PyTorch conveniently ships with common loss functions and optimizers; a single line of code gets you each.

criterion = torch.nn.NLLLoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr = 0.001)

Loss function criterion: records the training loss; the weights are adjusted at every step according to this value. Here we use NLLLoss (negative log likelihood), a relatively simple loss that measures how far the predicted log-probabilities are from the true labels.

# output is the prediction, y is the ground truth
loss = criterion(output, y)
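
NLLLoss expects log-probabilities as its input, which is exactly why the model ends with a LogSoftmax layer; for a single sample, the loss is simply the negative log-probability assigned to the true class. A tiny standalone check (the sizes here are arbitrary, chosen only for illustration, and with a recent PyTorch this runs as-is):

import torch
import torch.nn as nn

log_probs = nn.LogSoftmax(dim = 1)(torch.randn(1, 3))  # one sample, 3 classes
target = torch.LongTensor([2])                          # the true class index
print(nn.NLLLoss()(log_probs, target))                  # equals -log_probs[0, 2]
print(-log_probs[0, 2])                                 # same value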

Optimizer optimizer: carries out the iterative update in training, including clearing the gradients and taking a gradient step. It is given the network's parameters rnn.parameters() and the learning rate lr.

# clear the old gradients
optimizer.zero_grad()
# backpropagate to compute new gradients
loss.backward()
# take one gradient step, adjusting the weights
optimizer.step()

Training process

The idea is:

  1. Prepare the training, validation, and test data (each item in the dataset is a digit sequence).
  2. Loop over the digits of each sequence, taking the current digit as the input and the next digit as the label (i.e., the ground truth).
  3. Run each step through the rnn network.
  4. Compute and record the loss train_loss for each sequence.
  5. Let the optimizer update the parameters.
  6. Repeat steps 1-5 for n epochs, where n is chosen by the user.

Preparing the training data is outside the scope of this article; a ready-made dataset is given directly below.

train_set = [[3, 0, 0, 1, 1, 2],
            [3, 0, 1, 2],
            [3, 0, 0, 0, 1, 1, 1, 2],
            [3, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2]
            ...]
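
For completeness, here is one possible way such sequences could be generated (a hypothetical helper, not the course's original data-preparation code; the set sizes and length range are arbitrary choices). Note that 3 serves as a start token and 2 as an end token, matching train_set above; a valid_set built the same way is what the validation step further down iterates over.

import numpy as np

def make_sequence(n):
    # e.g. n = 2  ->  [3, 0, 0, 1, 1, 2]
    return [3] + [0] * n + [1] * n + [2]

# training and validation sets of sequences with random lengths
train_set = [make_sequence(np.random.randint(1, 10)) for _ in range(2000)]
valid_set = [make_sequence(np.random.randint(1, 10)) for _ in range(200)]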

Begin training

# repeat the experiment for 50 epochs
num_epoch = 50
loss_list = []
for epoch in range(num_epoch):
    train_loss = 0
    # randomly shuffle train_set so that every epoch sees a different training order
    np.random.shuffle(train_set)
    # loop over the sequences in train_set
    for i, seq in enumerate(train_set):
        loss = 0
        hidden = rnn.initHidden() # initialize the hidden units for this sequence
        # loop over all characters of the sequence
        for t in range(len(seq) - 1):
            # the current character is the input
            x = Variable(torch.LongTensor([seq[t]]).unsqueeze(0))
            # x size: batch_size = 1, time_steps = 1, data_dimension = 1
            # the next character is the label
            y = Variable(torch.LongTensor([seq[t + 1]]))
            # y size: batch_size = 1, data_dimension = 1
            output, hidden = rnn(x, hidden) # RNN output
            # output size: batch_size, output_size = 3
            # hidden size: layer_size = 1, batch_size = 1, hidden_size
            loss += criterion(output, y) # accumulate the loss
        loss = 1.0 * loss / len(seq) # per-character loss
        optimizer.zero_grad() # clear the gradients
        loss.backward() # backpropagation
        optimizer.step() # one gradient-descent step
        train_loss += loss # accumulate the loss value
        # print intermediate results
        if i > 0 and i % 500 == 0:
            print('Epoch {}, sequence {}, training loss: {:.2f}'.format(epoch, i, train_loss.data.numpy()[0] / i))
    loss_list.append(train_loss)
            

Here the loss is the loss of each training epoch. In fact, no matter how we train, this loss will keep going down, because the network does its best to fit the training data; so the training-set loss by itself cannot be used to judge how good a model really is.

In real training, therefore, after each epoch we also run the model on a validation set and compute the loss there; that result is more objective.

The validation loss is computed in exactly the same way as the training loss, except that train_set is replaced by valid_set and the parameters are not updated based on the result (that already happened in the training step); the validation set only serves to check how well the model has learned:

for epoch in range(num_epoch):
    # training steps
    ...
    valid_loss = 0
    errors = 0 # number of mispredicted characters on the validation set
    for i, seq in enumerate(valid_set):
        # loop over every sequence in valid_set
        loss = 0
        outstring = ''
        targets = ''
        hidden = rnn.initHidden() # initialize the hidden units
        for t in range(len(seq) - 1):
            # loop over every character
            x = Variable(torch.LongTensor([seq[t]]).unsqueeze(0))
            # x size: batch_size = 1, time_steps = 1, data_dimension = 1
            y = Variable(torch.LongTensor([seq[t + 1]]))
            # y size: batch_size = 1, data_dimension = 1
            output, hidden = rnn(x, hidden)
            # output size: batch_size, output_size = 3
            # hidden size: layer_size = 1, batch_size = 1, hidden_size
            loss += criterion(output, y) # accumulate the loss
            # count mispredictions, as in the test section below
            mm = torch.max(output, 1)[1][0] # the predicted character
            errors += 1 - mm.eq(y).data.numpy()[0]
        loss = 1.0 * loss / len(seq)
        valid_loss += loss # accumulate the loss value
    # print the results
    print('Epoch %d, training loss: %f, validation loss: %f, error rate: %f' % (epoch, train_loss.data.numpy() / len(train_set), valid_loss.data.numpy() / len(valid_set), 1.0 * errors / len(valid_set)))

By recording the loss of each epoch this way, we can finally plot how the training and validation losses change over training.
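
A plotting sketch for that, assuming matplotlib is available and loss_list was filled per epoch as in the training loop above (the conversion to plain floats is one way to unwrap the accumulated loss values; a validation curve can be added the same way):

import matplotlib.pyplot as plt

# average per-sequence training loss for each epoch
losses = [float(l.data.sum()) / len(train_set) for l in loss_list]
plt.plot(losses)
plt.xlabel('epoch')
plt.ylabel('training loss per sequence')
plt.show()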

3. Testing the model's predictions

Construct test data and check whether the model can guess the next digit from the current one; the success rate turns out to be very high.
First, build the test data: for each n from 0 to 19, a sequence of n zeros followed by n ones.

for n in range(20):
    inputs = [0] * n + [1] * n

Then run the model on each sequence:

for n in range(20):
    inputs = [0] * n + [1] * n

    outstring = ''
    targets = ''
    diff = 0
    hiddens = []
    hidden = rnn.initHidden()
    for t in range(len(inputs) - 1):
        x = Variable(torch.LongTensor([inputs[t]]).unsqueeze(0))
        # x size: batch_size = 1, time_steps = 1, data_dimension = 1
        y = Variable(torch.LongTensor([inputs[t + 1]]))
        # y size: batch_size = 1, data_dimension = 1
        output, hidden = rnn(x, hidden)
        # output size: batch_size, output_size = 3
        # hidden size: layer_size = 1, batch_size = 1, hidden_size
        hiddens.append(hidden.data.numpy()[0][0])
        #mm = torch.multinomial(output.view(-1).exp())
        mm = torch.max(output, 1)[1][0]
        outstring += str(mm.data.numpy()[0])
        targets += str(y.data.numpy()[0])
        # count the characters where the model output differs from the target string
        diff += 1 - mm.eq(y)
    # print each generated string and its target string
    print(outstring)
    print(targets)
    print('Diff:{}'.format(diff.data.numpy()[0]))

The final output looks like this:

[0, 1, 2]
[0, 1, 2]
Diff: 0
[0, 0, 1, 1, 2]
[0, 0, 1, 1, 2]
Diff: 0
[0, 0, 0, 1, 1, 1, 2]
[0, 0, 0, 1, 1, 1, 2]
Diff: 0
...
# not all results are listed; feel free to try it yourself
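
Finally, coming back to the opening question: given a prefix such as 0000001, what does the trained model expect next? A small sketch that feeds the prefix through the rnn trained above (prepending the start token 3, just as the training sequences do):

prefix = [3, 0, 0, 0, 0, 0, 0, 1]   # start token followed by 0000001
hidden = rnn.initHidden()
for t in range(len(prefix)):
    x = Variable(torch.LongTensor([prefix[t]]).unsqueeze(0))
    output, hidden = rnn(x, hidden)
# the character the model considers most likely to come next
pred = torch.max(output, 1)[1][0]
print(pred)   # expected to be 1 if the pattern 0^n 1^n 2 has been learned: more 1s are still needed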

Summary

A neural network can be understood as a process in which various mathematical methods let the computer discover patterns in a pile of data. By dissecting simple tasks like this one, we can understand the internal mechanics of neural networks; when facing a complex task, we just hand the data to the model and it will do its best to give us a good result.

These notes come from the same series course mentioned at the top. The course also covers LSTMs, a hands-on translation task, and plenty of other fundamentals, which I hope to write about later.


Origin blog.csdn.net/weixin_34390105/article/details/90858667