Learning LSTM While Writing Code

1. What is an LSTM?

A long short-term memory network (LSTM) is a variant of the RNN whose core concepts are the cell state and the "gate" structures. The cell state acts as a pathway along which information is transmitted, so that information can be carried through the whole chain of time steps; you can think of it as the "memory" of the network. In theory, the cell state can carry relevant information all the way through the sequence, so even information from early time steps can reach much later ones, which overcomes the short-term-memory problem of plain RNNs. Information is added to or removed from the cell state through gates, which learn during training which information to keep and which to forget.
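For reference, the standard LSTM update at each time step $t$ can be written as follows, where $\sigma$ is the sigmoid and $\odot$ is element-wise multiplication:

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$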
 


 

2. Experiment code

2.1. Build a model with a single recurrent layer and a Dense layer.
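The original code for this step is not shown in the post; the following is a minimal sketch of such a model, assuming a Keras Sequential stack with one LSTM layer feeding a Dense output (the layer sizes and input shape here are illustrative, chosen to match the experiment below):

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

# One recurrent layer followed by a Dense head: sequences of 3 time
# steps with 2 features each map to a single scalar output.
model = Sequential([
    LSTM(units=3, input_shape=(3, 2)),
    Dense(1),
])
model.compile(optimizer='adam', loss='mse')
model.summary()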

2.2. Verify the logic inside the LSTM

Suppose the input data is x = [1, 0] and the weights are set to:

kernel = [[2, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0],
          [1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0]]

recurrent_kernel = [[1, 0, 0, 1, 2, 1, 0, 1, 2, 0, 1, 0],
                    [1, 1, 0, 0, 2, 1, 0, 1, 2, 2, 0, 0],
                    [1, 0, 1, 2, 0, 1, 0, 1, 1, 0, 1, 0]]

bias = [3, 1, 0, 1, 1, 0, 0, 1, 0, 2, 0, 0]

Working one step through the gates by hand gives h = [0, 4, 0] and c = [0, 4, 1]. Note that there is no activation function: both activation and recurrent_activation are None, so every gate uses the identity.
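As a sanity check, here is a minimal NumPy sketch of that single step. It relies on the fact that Keras packs the four gates along the last weight axis in the order [input, forget, cell candidate, output], each block being units columns wide:

import numpy as np

# Weights from the example above
kernel = np.array([[2, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0],
                   [1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0]], dtype=float)
recurrent_kernel = np.array([[1, 0, 0, 1, 2, 1, 0, 1, 2, 0, 1, 0],
                             [1, 1, 0, 0, 2, 1, 0, 1, 2, 2, 0, 0],
                             [1, 0, 1, 2, 0, 1, 0, 1, 1, 0, 1, 0]], dtype=float)
bias = np.array([3, 1, 0, 1, 1, 0, 0, 1, 0, 2, 0, 0], dtype=float)

x = np.array([1.0, 0.0])   # input at the first (and only) time step
h_prev = np.zeros(3)       # initial hidden state
c_prev = np.zeros(3)       # initial cell state

# One LSTM step with identity activations (activation=None)
z = x @ kernel + h_prev @ recurrent_kernel + bias
i, f, c_bar, o = np.split(z, 4)

c = f * c_prev + i * c_bar  # new cell state
h = o * c                   # new hidden state (identity applied to c)

print('h =', h)  # [0. 4. 0.]
print('c =', c)  # [0. 4. 1.]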

The following Keras code verifies the same result:


import numpy as np
from tensorflow.keras.layers import LSTM


def change_weight():
    # Create a simple LSTM layer with identity activations so the gate
    # arithmetic can be followed by hand
    lstm_layer = LSTM(units=3, input_shape=(3, 2), activation=None, recurrent_activation=None, return_sequences=True,
                      return_state=True)

    # Simulate input data (3 sequences, 3 time steps, 2 features each)
    input_data = np.array([
                [[1.0, 2], [2, 3], [3, 4]],
                [[5, 6], [6, 7], [7, 8]],
                [[9, 10], [10, 11], [11, 12]]
        ])

    # Pass the input data through the layer to initialize the weights and biases
    lstm_layer(input_data)

    kernel, recurrent_kernel, biases = lstm_layer.get_weights()

    # Print the initial (random) weights and biases; each weight matrix packs
    # the four gates i, f, c, o along its last axis (4 * units columns)
    print("recurrent_kernel:", recurrent_kernel, recurrent_kernel.shape)  # (3, 12)
    print('kernel:', kernel, kernel.shape)  # (2, 12)
    print('biases:', biases, biases.shape)  # (12,)


    kernel = np.array([[2, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0],
                       [1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0]])

    recurrent_kernel = np.array([[1, 0, 0, 1, 2, 1, 0, 1, 2, 0, 1, 0],
                                 [1, 1, 0, 0, 2, 1, 0, 1, 2, 2, 0, 0],
                                 [1, 0, 1, 2, 0, 1, 0, 1, 1, 0, 1, 0]])

    biases = np.array([3, 1, 0, 1, 1, 0, 0, 1, 0, 2, 0.0, 0])

    lstm_layer.set_weights([kernel, recurrent_kernel, biases])
    print(lstm_layer.get_weights())

    # An alternative multi-step test sequence:
    # test_data = np.array([
    #     [[1.0, 3], [1, 1], [2, 3]]
    # ])

    # A single sequence with one time step: x = [1, 0]
    test_data = np.array([
        [[1, 0.0]]
    ])

    # output: per-step hidden states; memory_state: final h; carry_state: final c
    output, memory_state, carry_state = lstm_layer(test_data)

    print(output)
    print(memory_state)
    print(carry_state)

if __name__ == '__main__':
    change_weight()

Running the script prints:

recurrent_kernel: [[-0.36744034 -0.11181469 -0.10642298  0.5450207  -0.30208975  0.5405432
   0.09643812 -0.14983998  0.1859854   0.2336958  -0.16187981  0.11621032]
 [ 0.07727922 -0.226477    0.1491096  -0.03933501  0.31236103 -0.12963092
   0.10522162 -0.4815724  -0.2093935   0.34740582 -0.60979587 -0.15877807]
 [ 0.15371156  0.01244636 -0.09840634 -0.32093546  0.06523462  0.18934932
   0.38859126 -0.3261706  -0.05138849  0.42713478  0.49390993  0.37013963]] (3, 12)
kernel: [[-0.47606698 -0.43589187 -0.5371355  -0.07337284  0.30526626 -0.18241835
  -0.03675252  0.2873094   0.33218485  0.24838251  0.17765659  0.4312396 ]
 [ 0.4007727   0.41280174  0.40750778 -0.6245315   0.6382301   0.42889225
   0.11961156 -0.6021105  -0.43556038  0.39798307  0.6390712   0.16719025]] (2, 12)
biases: [0. 0. 0. 1. 1. 1. 0. 0. 0. 0. 0. 0.] (12,)
[array([[2., 1., 1., 0., 0., 0., 0., 1., 1., 0., 1., 0.],
       [1., 1., 0., 1., 1., 0., 0., 1., 1., 0., 0., 0.]], dtype=float32), array([[1., 0., 0., 1., 2., 1., 0., 1., 2., 0., 1., 0.],
       [1., 1., 0., 0., 2., 1., 0., 1., 2., 2., 0., 0.],
       [1., 0., 1., 2., 0., 1., 0., 1., 1., 0., 1., 0.]], dtype=float32), array([3., 1., 0., 1., 1., 0., 0., 1., 0., 2., 0., 0.], dtype=float32)]
tf.Tensor([[[0. 4. 0.]]], shape=(1, 1, 3), dtype=float32)
tf.Tensor([[0. 4. 0.]], shape=(1, 3), dtype=float32)
tf.Tensor([[0. 4. 1.]], shape=(1, 3), dtype=float32)

From the output, h = [0, 4, 0] (memory_state) and c = [0, 4, 1] (carry_state), matching the hand calculation.


Origin blog.csdn.net/keeppractice/article/details/132132220