Combining CNN and RNN to process sequences (TensorFlow)

1. Reasons why convolution can be used to process sequences

As mentioned earlier:
a recurrent neural network is a kind of recursive neural network that takes sequence data as input, recurses along the direction in which the sequence evolves, and has all of its nodes (recurrent units) connected in a chain.
LSTM, GRU, and BRNN (bidirectional recurrent neural network) are all designed to associate a word with its surrounding words; the correlation between words that are far apart is weak, so much of that computation is wasted.

Each convolutional layer in a convolutional neural network consists of several convolution kernels whose parameters are optimized by the backpropagation algorithm. The convolution operation extracts different features of the input to produce feature maps. In other words, convolutional layers learn local patterns rather than global ones.
In vision processing we use 2D convolutions, which slide a window over the two spatial dimensions (height and width) of an image.

Similarly, a one-dimensional convolution slides a window along the time axis: it extracts segments from a sequence and can recognize local patterns within the sequence, as sketched below.
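
As a minimal sketch of this sliding-window idea (the kernel values below are made up purely for illustration), a window of length 3 is slid along the sequence and each output is a weighted sum of one segment:

import numpy as np

sequence = np.array([1., 2., 3., 4., 5., 6.], dtype=np.float32)
kernel = np.array([0.5, 1.0, 0.5], dtype=np.float32)  # illustrative kernel of length 3

# Slide the window along the time axis ("valid" style): one output per window position.
outputs = [float(np.dot(sequence[i:i + 3], kernel)) for i in range(len(sequence) - 3 + 1)]
print(outputs)  # [4.0, 6.0, 8.0, 10.0]: 4 outputs for a length-6 sequence and a window of length 3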

2. One-dimensional convolutional neural networks

One-dimensional convolutional neural networks can be used for audio generation, machine translation, and more. A 1D convnet applied to sentences can recognize words and word fragments well, and a word-level 1D convnet can learn about word formation.

1. One-dimensional convolutional layer Conv1D:

Its usage is roughly the same as that of Conv2D.

kernel_size: an integer or a tuple/list of a single integer, specifying the length of the 1D convolution window.
strides: an integer specifying the stride of the convolution.
padding: the padding method. "same" pads the edges with zeros, so when the stride is 1 the output has the same length as the input; "valid" means no padding, and positions where the window does not fit are discarded; "causal" gives a causal (dilated) convolution, i.e. output[t] does not depend on input[t+1:], which is useful for temporal data where the model must not violate the time order.

The input has shape (samples, time, features), and the convolution window slides along the time axis.

tf.keras.layers.Conv1D(
    filters, kernel_size, strides=1, padding='valid',
    data_format='channels_last', dilation_rate=1, groups=1,
    activation=None, use_bias=True, kernel_initializer='glorot_uniform',
    bias_initializer='zeros', kernel_regularizer=None,
    bias_regularizer=None, activity_regularizer=None, kernel_constraint=None,
    bias_constraint=None, **kwargs
)

import tensorflow as tf

# The inputs are 128-dimensional vectors with 10 timesteps, and the batch size
# is 4.
input_shape = (4, 10, 128)
x = tf.random.normal(input_shape)
y = tf.keras.layers.Conv1D(32, 3, activation='relu', input_shape=input_shape[1:])(x)
print(y.shape)  # (4, 8, 32)
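
To see how the padding options described above affect the output length, here is a small sketch that reuses the same random input x and compares the three options:

for pad in ['valid', 'same', 'causal']:
    y = tf.keras.layers.Conv1D(32, 3, padding=pad, activation='relu')(x)
    print(pad, y.shape)
# valid  -> (4, 8, 32):  10 - 3 + 1 = 8 timesteps remain
# same   -> (4, 10, 32): zero-padded so the length is preserved
# causal -> (4, 10, 32): padded on the left only, so output[t] never sees input[t+1:]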

2. One-dimensional pooling MaxPool1D:

Pooling extracts one-dimensional windows (subsequences) from the input and outputs the maximum value (max pooling) or the average value (average pooling) of each window, which reduces the length of the input.

Downsamples the input by taking the maximum value over a window of size pool_size. The window is shifted by strides steps each time. With the "valid" padding option, the resulting output has shape: output_shape = (input_shape - pool_size + 1) / strides (rounded up).
With the "same" padding option, the resulting output shape is: output_shape = input_shape / strides (rounded up).
pool_size: integer, the size of the max pooling window.
strides: integer or None. Specifies how far the pooling window moves for each pooling step. If None, it defaults to pool_size.
padding: one of "valid" or "same" (case-insensitive). "valid" means no padding; "same" pads the input evenly on the left/right so that, when strides=1, the output has the same length as the input.

tf.keras.layers.MaxPool1D(
    pool_size=2, strides=None, padding='valid',
    data_format='channels_last', **kwargs
)

x = tf.constant([1., 2., 3., 4., 5.])
x = tf.reshape(x, [1, 5, 1])  # (batch, steps, features)
max_pool_1d = tf.keras.layers.MaxPooling1D(pool_size=2, strides=1, padding='valid')
print(max_pool_1d(x))  # [[[2.], [3.], [4.], [5.]]], shape (1, 4, 1)
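
For the same tensor, switching to "same" padding keeps the original length, which matches the shape formulas above; a quick sketch:

same_pool = tf.keras.layers.MaxPooling1D(pool_size=2, strides=1, padding='same')
print(same_pool(x))  # [[[2.], [3.], [4.], [5.], [5.]]], shape (1, 5, 1): 5 / 1 = 5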

3. Implementing a one-dimensional convolutional neural network

Stack Conv1D and MaxPooling1D layers, and at the end apply global pooling or a Flatten layer to convert the 3D output into a 2D tensor. Dense layers can then be added to the model for classification or regression.
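
As a quick sketch of that 3D-to-2D step (the tensor shape below is made up for illustration), GlobalMaxPooling1D collapses the time axis, while Flatten concatenates it into the feature axis:

h = tf.random.normal((2, 8, 32))  # (batch, steps, features)
print(tf.keras.layers.GlobalMaxPooling1D()(h).shape)  # (2, 32)
print(tf.keras.layers.Flatten()(h).shape)             # (2, 256) = 8 * 32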

Example:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

# Load the dataset
max_features = 1000
max_len = 500

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)

# One-dimensional convolutional neural network

model = keras.Sequential([
    keras.layers.Embedding(max_features, 128, input_length=max_len),
    keras.layers.Conv1D(32, 7, activation='relu'),
    keras.layers.MaxPooling1D(5),
    keras.layers.Conv1D(32, 7, activation='relu'),
    keras.layers.GlobalMaxPooling1D(),  # global pooling
    keras.layers.Dense(1, activation='sigmoid')
])

# model.summary()

model.compile(optimizer=tf.optimizers.RMSprop(learning_rate=1e-4),
              loss=tf.losses.binary_crossentropy,
              metrics=['acc'])

history = model.fit(x_train, y_train, epochs=10, batch_size=128)
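
After training, a quick check on the held-out test set (a minimal sketch, reusing the x_test and y_test loaded above):

test_loss, test_acc = model.evaluate(x_test, y_test, batch_size=128)
print('test accuracy:', test_acc)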

3. Combining CNN and RNN to process sequences

A 1D convolutional neural network processes each input segment independently, so it is not sensitive to the order of timesteps: it learns a pattern at every position, but it does not know at which point in time that pattern occurs. For many sequence problems, however, recent data points should be interpreted differently from older ones.
One solution is to combine the speed and light weight of convolutional networks with the order sensitivity of RNNs.
Convolution can be used in front of the RNN as a preprocessing step, as sketched below.
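
Here is a minimal sketch of that idea on the IMDB setup from above (the layer sizes are illustrative, not tuned): Conv1D and MaxPooling1D first shorten the 500-step sequence into a sequence of higher-level features, and a GRU then reads that shorter sequence in order.

model = keras.Sequential([
    keras.layers.Embedding(max_features, 128, input_length=max_len),
    keras.layers.Conv1D(32, 7, activation='relu'),   # extract local features
    keras.layers.MaxPooling1D(5),                    # downsample so the RNN sees a shorter sequence
    keras.layers.Conv1D(32, 7, activation='relu'),
    keras.layers.GRU(32),                            # order-sensitive processing of the feature sequence
    keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer=tf.optimizers.RMSprop(learning_rate=1e-4),
              loss=tf.losses.binary_crossentropy,
              metrics=['acc'])
history = model.fit(x_train, y_train, epochs=10, batch_size=128)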


Origin blog.csdn.net/qq_43842886/article/details/113852939