RNN adder pitfalls and corresponding solutions

PJ2 for Pattern Recognition

RNN adder

First, look carefully at the skeleton code given by the teaching assistant. The main function that needs to be completed is the forward function:

import torch
import torch.nn as nn

class myPTRNNModel(nn.Module):
    def __init__(self):
        super().__init__()
        # each digit 0-9 gets a 32-dimensional embedding
        self.embed_layer = nn.Embedding(10, 32)
        # input_size=64 (two 32-d embeddings concatenated), hidden_size=64, num_layers=2
        self.rnn = nn.RNN(64, 64, 2)
        # project each hidden state onto 10 classes, one per output digit
        self.dense = nn.Linear(64, 10)

    def forward(self, num1, num2):
        # embed the digit sequences of the two addends
        num1 = self.embed_layer(num1)
        num2 = self.embed_layer(num2)
        # concatenate the two embeddings along the feature dimension
        rnn_input = torch.cat((num1, num2), 2)
        # packed = pack_padded_sequence(rnn_input, encode_length.tolist(), batch_first=True)
        # nn.RNN returns (output, h_n); passing None starts the hidden state at zero
        r_out, h_n = self.rnn(rnn_input, None)
        logits = self.dense(r_out)
        return logits
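
A quick shape check of this model (my own sanity test, not part of the assignment; the batch size and sequence length below are made up — note that nn.RNN defaults to batch_first=False, so it treats the first dimension as time):

model = myPTRNNModel()
num1 = torch.randint(0, 10, (4, 10))   # digit indices, hypothetical shape (4, 10)
num2 = torch.randint(0, 10, (4, 10))
logits = model(num1, num2)
print(logits.shape)                     # torch.Size([4, 10, 10]): a 10-way score per position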

Several things about this code puzzled me at first; here are the questions and the corresponding answers:

  1. What is an embedding?

    An embedding can be seen as a dimensionality reduction of a text encoding: for example, a sparse one-hot encoding can be mapped down to a lower-dimensional dense vector.

  2. Why use an embedding to increase the dimension here? As you can see from the code, there are only 10 digits, so why map them up to 32 dimensions?

    This is because of another role of embeddings: when low-dimensional data is mapped into a higher-dimensional space, some features may be amplified, or features that were mixed together may become separable. (A small one-hot vs. embedding comparison is sketched after this list.)

  3. What are the three parameters passed to nn.RNN in __init__, and what are the arguments when it is called?

    The first parameter of the RNN is input_size, the dimension of the input vector. Here it is 32 + 32 = 64, because the embeddings of the two numbers to be added are concatenated;

    The second parameter is hidden_size, the dimension of the hidden (output) vector; here it is also 64;

    The third parameter is num_layers, meaning the input passes through two stacked RNN layers;

    When calling the RNN, the first argument is the concatenated input and the second is the initialization of the hidden state. I passed None here, which initializes everything to zero, the crudest way to initialize it.

  4. The logits do not need to be restricted to the last time step here, because the per-step outputs are handled separately during evaluation later.
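
To make points 1 and 2 concrete, here is a small comparison of a one-hot encoding of a digit with a learned 32-dimensional embedding (my own illustration, not assignment code):

import torch
import torch.nn as nn
import torch.nn.functional as F

digit = torch.tensor([3])

# one-hot: a sparse 10-dimensional vector with a single 1
one_hot = F.one_hot(digit, num_classes=10).float()
print(one_hot.shape)    # torch.Size([1, 10])

# embedding: a dense, learned 32-dimensional vector for the same digit
embed = nn.Embedding(10, 32)
dense_vec = embed(digit)
print(dense_vec.shape)  # torch.Size([1, 32])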

Training results

The main problem is that the network cannot learn to carry: whenever a carry into a higher digit is involved, the accuracy drops to 0.

4 digits, 3000 rounds: 0.15 (only below 5)
3 digits, 3000 rounds: 27.5 (only below 5)

Directions for modification

  1. Gradient clipping (a sketch combining several of these directions follows this list)
  2. Change the RNN model: with an LSTM, the shorter the numbers are, the higher the addition accuracy. A bidirectional LSTM could also be considered (though it is probably useless). In addition, other adder implementations use encoder-decoder models; could that solve the carry problem?
  3. Use orthogonal initialization for the various gates
  4. First check the accuracy on the training set
  5. Even though the network is small, with only one RNN layer, adding dropout and L2 regularization can alleviate overfitting
  6. Choice of learning rate
  7. Would directly concatenating the two numbers (rather than interleaving their digits) cause problems?
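
A minimal sketch of what several of these directions could look like together (my own illustration, not the assignment's code; the class name MyLSTMAdder, the hyperparameters, and the toy tensors are all made up): an LSTM in place of the vanilla RNN, orthogonal initialization of its weights, dropout between layers, L2 regularization via weight decay, and gradient clipping in the training step.

import torch
import torch.nn as nn

class MyLSTMAdder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed_layer = nn.Embedding(10, 32)
        # LSTM instead of vanilla RNN; dropout is applied between the 2 layers
        self.lstm = nn.LSTM(64, 64, 2, dropout=0.2)
        self.dense = nn.Linear(64, 10)
        # orthogonal initialization of the input and recurrent weight matrices
        for name, param in self.lstm.named_parameters():
            if 'weight' in name:
                nn.init.orthogonal_(param)

    def forward(self, num1, num2):
        x = torch.cat((self.embed_layer(num1), self.embed_layer(num2)), 2)
        out, _ = self.lstm(x)    # nn.LSTM returns (output, (h_n, c_n))
        return self.dense(out)

model = MyLSTMAdder()
# L2 regularization via weight_decay
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
criterion = nn.CrossEntropyLoss()

# one hypothetical training step with gradient clipping
num1 = torch.randint(0, 10, (4, 8))
num2 = torch.randint(0, 10, (4, 8))
target = torch.randint(0, 10, (4, 8))

optimizer.zero_grad()
logits = model(num1, num2)                                         # (4, 8, 10)
loss = criterion(logits.reshape(-1, 10), target.reshape(-1))
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)   # gradient clipping
optimizer.step()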

Things in the code I did not understand at first

  1. The data-processing code includes a reverse function. This is because, in addition, a carry propagates only from the low digits to the high digits, so the digit sequence is reversed to make the low-order digits enter the RNN first (a small illustration follows).
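
As a small illustration of the reversal (my own example; the helper name to_reversed_digits is hypothetical, not the assignment's function):

def to_reversed_digits(n, width=4):
    # turn 123 into [3, 2, 1, 0]: lowest-order digit first, zero-padded to width
    digits = [int(d) for d in str(n)][::-1]
    return digits + [0] * (width - len(digits))

# The carry produced at the units digit is needed when the tens and hundreds
# digits are processed later, so feeding the low digits first lets the carry
# flow forward through the RNN's time steps.
print(to_reversed_digits(123))   # [3, 2, 1, 0]
print(to_reversed_digits(989))   # [9, 8, 9, 0]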

Other work ideas

  1. Check the accuracy on the training set
  2. Plot the curves for the training set and the test set (a plotting sketch follows this list)
  3. If you run into problems, consult the official PyTorch documentation
  4. There was also a broken relative path
  5. The pit of installing PyTorch: the -c pytorch argument tells conda to fetch PyTorch from the official pytorch channel, so simply removing -c pytorch from the command lets you install PyTorch quickly from the Tsinghua mirror source
  6. for o in list(zip(datas[2], res))[:20]: print(o[0], o[1], o[0] == o[1])
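
For point 2, a minimal plotting sketch (my own example; the accuracy lists are made-up placeholders just to show the matplotlib calls):

import matplotlib.pyplot as plt

# hypothetical per-round accuracies collected during training
train_acc = [0.10, 0.35, 0.60, 0.80, 0.92]
test_acc = [0.10, 0.30, 0.50, 0.65, 0.70]

plt.plot(train_acc, label='train accuracy')
plt.plot(test_acc, label='test accuracy')
plt.xlabel('training round')
plt.ylabel('accuracy')
plt.legend()
plt.show()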


Origin: www.cnblogs.com/samanthadigitalife/p/12759852.html