Bidirectional LSTM: learning while writing code

1. What is Bidirectional LSTM

Bidirectional LSTM (BiLSTM) is a recurrent neural network used mainly in natural language processing. Unlike a standard LSTM, the input flows in both directions, so the network can leverage information from both sides of each position. This makes it a powerful tool for modeling sequential dependencies between words and phrases in both directions of a sequence.

To sum up, BiLSTM adds a second LSTM layer that reverses the direction of information flow: the input sequence is fed backward through this additional layer. The outputs of the two LSTM layers are then combined in one of several ways, such as averaging, summing, element-wise multiplication, or concatenation.
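
To make the combination step concrete, here is a small NumPy illustration (the arrays stand in for hypothetical per-time-step outputs of the forward and backward LSTMs; the shapes are made up for this sketch):

import numpy as np

# Hypothetical outputs of the forward and backward LSTMs:
# a sequence of 4 time steps, 3 hidden units per direction.
h_forward = np.random.rand(4, 3)
h_backward = np.random.rand(4, 3)

concat = np.concatenate([h_forward, h_backward], axis=-1)  # shape (4, 6)
summed = h_forward + h_backward                            # shape (4, 3)
averaged = (h_forward + h_backward) / 2                    # shape (4, 3)
product = h_forward * h_backward                           # shape (4, 3)
print(concat.shape, summed.shape, averaged.shape, product.shape)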

To illustrate this, picture the BiLSTM unfolded in time: a forward LSTM layer reads the sequence from first to last element, a backward LSTM layer reads it from last to first, and their outputs are merged at each time step.

This type of architecture has many advantages in real-world problems, especially in NLP. The main reason is that, for each component of the input sequence, the model has information from both the past and the future. As a result, BiLSTM can produce more meaningful outputs by combining the LSTM layers running in both directions.

For example, take this sentence:

Apple is something that…

It could be about the fruit or about Apple the company. A unidirectional LSTM cannot tell what "Apple" refers to here, because it has not yet seen the future context.

In contrast, in these two sentences:

Apple is something that competitors simply cannot reproduce.

and

Apple is something that I like to eat.

the BiLSTM will produce a different output for each component (word) of the sequence (sentence). This makes the BiLSTM model useful in NLP tasks such as sentence classification, translation, and named entity recognition. It is also used in speech recognition, handwriting recognition, protein structure prediction, and similar fields.
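
For instance, a sentence classifier built around a BiLSTM can be as small as the sketch below (the vocabulary size, embedding size, unit count, and sequence length are made-up values for illustration):

import tensorflow as tf

# Minimal BiLSTM sentence classifier (hyperparameters are illustrative)
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),   # token ids -> 64-dim vectors
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),     # read the sentence in both directions
    tf.keras.layers.Dense(1, activation='sigmoid'),              # binary sentence label
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.build(input_shape=(None, 100))   # e.g. padded sequences of 100 token ids
model.summary()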

Finally, regarding the disadvantages of BiLSTM compared to LSTM, it is worth mentioning that BiLSTM is a much slower model and requires more training time. Therefore, it is recommended to use it only when truly necessary.

2. Experimental code

2.1. Introduction to the Bidirectional layer

tf.keras.layers.Bidirectional(
    layer, merge_mode="concat", weights=None, backward_layer=None, **kwargs
)

Parameters

layer: a keras.layers.RNN instance, such as keras.layers.LSTM or keras.layers.GRU. It can also be a keras.layers.Layer instance that:
is a sequence-processing layer (accepts 3D+ input),
has go_backwards, return_sequences, and return_state attributes (with the same semantics as the RNN class),
has an input_spec attribute, and
implements serialization via get_config() and from_config(). Note that the recommended way to create new RNN layers is to write a custom RNN cell and use it with keras.layers.RNN, rather than subclassing keras.layers.Layer directly. When return_sequences is True, the output for a masked time step will be zero regardless of the layer's original zero_output_for_mask value.
merge_mode: the mode in which the forward and backward RNN outputs are combined. One of {'sum', 'mul', 'concat', 'ave', None}. If None, the outputs are not combined and are returned as a list. The default is 'concat' (see the example after this list).
backward_layer: optional keras.layers.RNN or keras.layers.Layer instance used to process the input backward. If backward_layer is not provided, the layer instance passed as the layer argument is used to generate the backward layer automatically. Note that the provided backward_layer should have attributes matching those of the layer argument; in particular, it should have the same values for stateful, return_state, return_sequences, etc. Additionally, backward_layer and layer should have different go_backwards values. If these requirements are not met, a ValueError is raised.
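
As a minimal sketch of the wrapper in use (the input shape and unit counts are made-up values for illustration), merge_mode='sum' keeps the output width equal to the number of units instead of doubling it:

import tensorflow as tf

# Sum the forward and backward outputs instead of concatenating them
inputs = tf.keras.Input(shape=(10, 8))                 # 10 time steps, 8 features (illustrative)
outputs = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(16, return_sequences=True),
    merge_mode='sum')(inputs)                          # summed outputs: shape (None, 10, 16)
model = tf.keras.Model(inputs, outputs)
model.summary()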

2.2. Build a model with a single Bidirectional LSTM layer and a Dense layer

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense

def simple_lstm_layer():
    # Bidirectional LSTM with 3 units per direction over sequences of shape (3, 2)
    model = Sequential()
    model.add(Bidirectional(LSTM(3, return_sequences=True), input_shape=(3, 2)))
    model.add(Dense(1))  # Output layer with one neuron per time step
    print(model.summary())

if __name__ == '__main__':
    simple_lstm_layer()

output

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 bidirectional (Bidirectiona  (None, 3, 6)             144       
 l)                                                              
                                                                 
 dense (Dense)               (None, 3, 1)              7         
                                                                 
=================================================================
Total params: 151
Trainable params: 151
Non-trainable params: 0
_________________________________________________________________
None
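
As a sanity check on the parameter count: each LSTM direction has 4 * (units * (units + input_dim) + units) = 4 * (3 * (3 + 2) + 3) = 72 parameters, the Bidirectional wrapper doubles that to 144, and the Dense layer maps the 6 concatenated features to a single output with a bias, giving 7.

units, input_dim = 3, 2
per_direction = 4 * (units * (units + input_dim) + units)  # kernel + recurrent kernel + bias
print(per_direction)        # 72
print(2 * per_direction)    # 144 (forward + backward LSTM)
print(2 * units + 1)        # 7   (Dense over the 6 concatenated features, plus bias)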

2.3. Verify the logic in Bidirectional LSTM

Suppose the input data is x = [1, 0], and the forward and backward layers are given identical weights:

forward_kernel = backward_kernel =
    [[2, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0],
     [1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0]]

forward_recurrent_kernel = backward_recurrent_kernel =
    [[1, 0, 0, 1, 2, 1, 0, 1, 2, 0, 1, 0],
     [1, 1, 0, 0, 2, 1, 0, 1, 2, 2, 0, 0],
     [1, 0, 1, 2, 0, 1, 0, 1, 1, 0, 1, 0]]

forward_bias = backward_bias = [3, 1, 0, 1, 1, 0, 0, 1, 0, 2, 0, 0]

Working through the LSTM equations by hand (note that there is no activation function, since both activation and recurrent_activation are set to None), the output is [[[0. 4. 0. 0. 4. 0.]]], the forward/backward memory_state (hidden state) is [[0. 4. 0.]], and the forward/backward carry_state (cell state) is [[0. 4. 1.]]. Because the test input has a single time step and both directions use identical weights, the forward and backward states coincide.
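
The hand calculation can be reproduced in plain NumPy; the sketch below computes the forward step using the Keras gate ordering [input, forget, cell candidate, output] for the packed weight matrices:

import numpy as np

units = 3
kernel = np.array([[2, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0],
                   [1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0]], dtype=float)
recurrent_kernel = np.array([[1, 0, 0, 1, 2, 1, 0, 1, 2, 0, 1, 0],
                             [1, 1, 0, 0, 2, 1, 0, 1, 2, 2, 0, 0],
                             [1, 0, 1, 2, 0, 1, 0, 1, 1, 0, 1, 0]], dtype=float)
bias = np.array([3, 1, 0, 1, 1, 0, 0, 1, 0, 2, 0, 0], dtype=float)

x = np.array([1.0, 0.0])   # single time step
h_prev = np.zeros(units)   # initial hidden (memory) state
c_prev = np.zeros(units)   # initial cell (carry) state

z = x @ kernel + h_prev @ recurrent_kernel + bias
i, f, g, o = np.split(z, 4)   # input gate, forget gate, cell candidate, output gate

c = f * c_prev + i * g        # carry_state: [0. 4. 1.]
h = o * c                     # memory_state: [0. 4. 0.] (activations are identity here)
print('h =', h, 'c =', c)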

Code to verify the above results:

import numpy as np
from tensorflow.keras.layers import Bidirectional, LSTM

def change_weight():
    # Create an LSTM layer with linear (identity) activations so the gate
    # arithmetic can be checked by hand
    lstm_layer = LSTM(units=3, input_shape=(3, 2), activation=None, recurrent_activation=None,
                      return_sequences=True, return_state=True)

    bi_lstm_layer = Bidirectional(lstm_layer, merge_mode='concat')

    # Simulate input data (3 sequences of 3 time steps with 2 features each)
    input_data = np.array([
        [[1.0, 2], [2, 3], [3, 4]],
        [[5, 6], [6, 7], [7, 8]],
        [[9, 10], [10, 11], [11, 12]]
    ])

    # Pass the input data through the layer to build it (initialize the weights and biases)
    bi_lstm_layer(input_data)

    kernel = np.array([[2, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0],
                       [1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0]])

    recurrent_kernel = np.array([[1, 0, 0, 1, 2, 1, 0, 1, 2, 0, 1, 0],
                                 [1, 1, 0, 0, 2, 1, 0, 1, 2, 2, 0, 0],
                                 [1, 0, 1, 2, 0, 1, 0, 1, 1, 0, 1, 0]])

    biases = np.array([3, 1, 0, 1, 1, 0, 0, 1, 0, 2, 0.0, 0])

    # Weight order: [forward kernel, forward recurrent kernel, forward bias,
    #                backward kernel, backward recurrent kernel, backward bias]
    bi_lstm_layer.set_weights([kernel, recurrent_kernel, biases, kernel, recurrent_kernel, biases])
    print(bi_lstm_layer.get_weights())

    # A single sequence with a single time step: x = [1, 0]
    test_data = np.array([
        [[1, 0.0]]
    ])

    # With return_state=True, Bidirectional returns the merged output followed by
    # the forward (h, c) states and the backward (h, c) states
    output, memory_state, carry_state, backward_memory_state, backward_carry_state = bi_lstm_layer(test_data)

    print('output = ', output.numpy())
    print('forward memory_state = ', memory_state.numpy())
    print('forward carry_state = ', carry_state.numpy())
    print('backward memory state = ', backward_memory_state.numpy())
    print('backward carry state = ', backward_carry_state.numpy())

if __name__ == '__main__':
    change_weight()

output

[array([[2., 1., 1., 0., 0., 0., 0., 1., 1., 0., 1., 0.],
       [1., 1., 0., 1., 1., 0., 0., 1., 1., 0., 0., 0.]], dtype=float32), array([[1., 0., 0., 1., 2., 1., 0., 1., 2., 0., 1., 0.],
       [1., 1., 0., 0., 2., 1., 0., 1., 2., 2., 0., 0.],
       [1., 0., 1., 2., 0., 1., 0., 1., 1., 0., 1., 0.]], dtype=float32), array([3., 1., 0., 1., 1., 0., 0., 1., 0., 2., 0., 0.], dtype=float32), array([[2., 1., 1., 0., 0., 0., 0., 1., 1., 0., 1., 0.],
       [1., 1., 0., 1., 1., 0., 0., 1., 1., 0., 0., 0.]], dtype=float32), array([[1., 0., 0., 1., 2., 1., 0., 1., 2., 0., 1., 0.],
       [1., 1., 0., 0., 2., 1., 0., 1., 2., 2., 0., 0.],
       [1., 0., 1., 2., 0., 1., 0., 1., 1., 0., 1., 0.]], dtype=float32), array([3., 1., 0., 1., 1., 0., 0., 1., 0., 2., 0., 0.], dtype=float32)]
output =  [[[0. 4. 0. 0. 4. 0.]]]
forward memory_state =  [[0. 4. 0.]]
forward carry_state =  [[0. 4. 1.]]
backward memory state =  [[0. 4. 0.]]
backward carry state =  [[0. 4. 1.]]

Origin blog.csdn.net/keeppractice/article/details/132461920