TextCNN: model principle and implementation

Model principle

Apply the principles of convolutional neural networks to text classification: kernels of several different sizes are used to extract key information from the sentence (similar to n-grams with multiple window sizes), so as to better capture local correlations within the sentence.
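
A tiny illustration of the n-gram analogy (the example sentence is made up for illustration): kernel sizes 2, 3, 4 correspond to sliding windows of 2, 3, 4 consecutive words.

words = "I think the scenery of this place is nice".split()
for k in (2, 3, 4):                              # like kernel_size = 2, 3, 4
    windows = [words[i:i + k] for i in range(len(words) - k + 1)]
    print(k, windows[:3])                        # the first few k-word windows a kernel would see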

network structure

[Figure: TextCNN network structure]

The detailed process of TextCNN is as follows:

[Figure: detailed TextCNN computation process]

detailed process

  • Embedding: The first layer is the leftmost 7×5 sentence matrix in the figure; each row is a word vector of dimension 5, which can be thought of as the raw pixels of an image.
  • Convolution: Next come one-dimensional convolutional layers with kernel_size = 2, 3, 4, each kernel having two output channels.
  • MaxPooling: The third layer is a pooling layer, so that sentences of different lengths become a fixed-length representation after pooling (see the sketch after this list).
  • FullConnection and Softmax: Finally, a fully connected softmax layer outputs the probability of each category.
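
A minimal numpy sketch of the max-over-time pooling step (the sentence lengths 7 and 10 and the filter count 2 are assumed for illustration): feature maps of different lengths collapse to one value per filter, so the pooled representation has a fixed size.

import numpy as np

np.random.seed(0)
for seq_len in (7, 10):                                  # two sentences of different length
    feature_map = np.random.rand(seq_len - 2 + 1, 2)     # Conv1D output for kernel_size=2: (seq_len-1, 2 filters)
    pooled = feature_map.max(axis=0)                     # max over time: one value per filter
    print(seq_len, pooled.shape)                         # always (2,), regardless of sentence length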

Channels:

  • Images can use R, G, B as different channels.
  • The input channels for text are usually different embedding methods, e.g., word2vec or GloVe; in practice, using static word vectors and fine-tuned word vectors as different channels is also common (as sketched below).
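
A minimal sketch of the two-channel idea, with assumed sizes (vocabulary 5000, 60 tokens, 300-dimensional vectors) and a random matrix standing in for real pre-trained vectors; the static and fine-tuned embeddings are stacked as two channels and convolved with a 2-D kernel that spans the full embedding dimension.

import numpy as np
from keras import Input, Model
from keras.layers import Embedding, Reshape, Concatenate, Conv2D

vocab, seq_len, dim = 5000, 60, 300                    # assumed sizes
pretrained = np.random.rand(vocab, dim)                # stand-in for word2vec/GloVe vectors

tokens = Input(shape=(seq_len,))
emb_static = Embedding(vocab, dim, weights=[pretrained], trainable=False)(tokens)   # frozen channel
emb_tuned = Embedding(vocab, dim, weights=[pretrained], trainable=True)(tokens)     # fine-tuned channel
ch_static = Reshape((seq_len, dim, 1))(emb_static)
ch_tuned = Reshape((seq_len, dim, 1))(emb_tuned)
x = Concatenate(axis=-1)([ch_static, ch_tuned])        # (seq_len, dim, 2): two input channels
feat = Conv2D(filters=2, kernel_size=(2, dim))(x)      # bigram filter applied over both channels
Model(tokens, feat).summary()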

One-dimensional convolution:

  • Images are two-dimensional data.
  • Text is one-dimensional data, so TextCNN uses one-dimensional convolution (at the word level; although text becomes two-dimensional once represented by word vectors, two-dimensional convolution at the embedding level is not meaningful). With one-dimensional convolution, different receptive-field widths are obtained by designing filters with different kernel_size values, as sketched below.
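
A small numpy sketch of this one-dimensional convolution (the 7×5 sentence matrix and random kernels are assumed for illustration): each filter spans the full embedding dimension, so it slides only along the word axis and yields seq_len - k + 1 values.

import numpy as np

np.random.seed(0)
sentence = np.random.rand(7, 5)              # 7 words, 5-dim embeddings
for k in (2, 3, 4):                          # kernel widths, i.e. the n-gram window
    kernel = np.random.rand(k, 5)            # the filter covers the full embedding dimension
    feature_map = np.array([np.sum(sentence[i:i + k] * kernel)
                            for i in range(7 - k + 1)])
    print(k, feature_map.shape)              # (6,), (5,), (4,)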

Pooling layer:

There are still many open problems in using CNNs for text classification. For example, in A Convolutional Neural Network for Modeling Sentences, the most interesting idea is to replace max pooling with (dynamic) k-max pooling, which keeps the k largest values at the pooling stage and thereby retains some global sequence information.
Take a sentiment analysis scenario, for example:

“I think the scenery of this place is not bad, but there are too many people”

Although the emotion expressed in the first half is positive, the overall sentiment of the text leans toward the negative; k-max pooling can capture this kind of information well.
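
A minimal numpy sketch of k-max pooling (the feature-map values are made up for illustration): the k largest activations per filter are kept in their original sequence order, rather than only the single maximum.

import numpy as np

def k_max_pooling(feature_map, k):
    # feature_map: (seq_steps, n_filters); keep the k largest values per filter, in sequence order
    idx = np.argsort(feature_map, axis=0)[-k:]   # indices of the k largest values per column
    idx = np.sort(idx, axis=0)                   # restore original (sequence) order
    return np.take_along_axis(feature_map, idx, axis=0)

fm = np.array([[0.1, 0.9],
               [0.7, 0.2],
               [0.3, 0.8],
               [0.5, 0.4]])
print(k_max_pooling(fm, k=2))   # [[0.7, 0.9], [0.5, 0.8]]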

framework implementation

The keras library is used here to implement TextCNN:

import logging
from keras import Input
from keras.layers import Conv1D, MaxPool1D, Dense, Flatten, concatenate, Embedding
from keras.models import Model
from keras.utils import plot_model
def textcnn(max_sequence_length, max_token_num, embedding_dim, output_dim, model_img_path=None, embedding_matrix=None):
    """ TextCNN: 1. embedding layers, 2.convolution layer, 3.max-pooling, 4.softmax layer. """
    x_input = Input(shape=(max_sequence_length,))
    logging.info("x_input.shape: %s" % str(x_input.shape))  # (?, 60)
    if embedding_matrix is None:
        x_emb = Embedding(input_dim=max_token_num, output_dim=embedding_dim, input_length=max_sequence_length)(x_input)
    else:
        x_emb = Embedding(input_dim=max_token_num, output_dim=embedding_dim, input_length=max_sequence_length,
                          weights=[embedding_matrix], trainable=True)(x_input)
    logging.info("x_emb.shape: %s" % str(x_emb.shape))  # (?, 60, 300)
    pool_output = []
    kernel_sizes = [2, 3, 4] 
    for kernel_size in kernel_sizes:
        c = Conv1D(filters=2, kernel_size=kernel_size, strides=1)(x_emb)  # (?, seq_len - kernel_size + 1, 2)
        p = MaxPool1D(pool_size=int(c.shape[1]))(c)  # max over time -> (?, 1, 2)
        pool_output.append(p)
        logging.info("kernel_size: %s \t c.shape: %s \t p.shape: %s" % (kernel_size, str(c.shape), str(p.shape)))
    pool_output = concatenate(pool_output)  # concatenate the pooled features from all kernel sizes
    logging.info("pool_output.shape: %s" % str(pool_output.shape))  # (?, 1, 6)
    x_flatten = Flatten()(pool_output)  # (?, 6)
    y = Dense(output_dim, activation='softmax')(x_flatten)  # (?, 2)
    logging.info("y.shape: %s \n" % str(y.shape))
    model = Model(inputs=[x_input], outputs=[y])
    if model_img_path:
        plot_model(model, to_file=model_img_path, show_shapes=True, show_layer_names=False)
    model.summary()
    return model
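
A brief usage sketch with assumed hyperparameters (sequence length 60, vocabulary size 5000, 300-dimensional embeddings, 2 classes), matching the shapes in the logging comments above:

model = textcnn(max_sequence_length=60, max_token_num=5000, embedding_dim=300, output_dim=2)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# x_train: (n_samples, 60) integer token ids, y_train: (n_samples, 2) one-hot labels
# model.fit(x_train, y_train, batch_size=64, epochs=5, validation_split=0.1)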

word vector representation

The word vector representation methods used here are (a short sketch follows this list):

  • Large amount of data: the embeddings can simply be initialized randomly and then updated and learned by training the network on the corpus.
  • Small amount of data: word vectors can be pre-trained (pre-train) on an external corpus and fed to the embedding layer, which is initialized with the pre-trained word vector matrix (by setting weights=[embedding_matrix]).
  • Static method: the embeddings are no longer updated during training. This is essentially transfer learning, and static word vectors also work well, especially when the target domain or the amount of data is small (by setting trainable=False).
  • Non-static method: the embeddings are updated and fine-tuned during training, which can speed up convergence (by setting trainable=True).
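
A minimal sketch of these Embedding-layer options, with assumed sizes and a random matrix standing in for real pre-trained vectors:

import numpy as np
from keras.layers import Embedding

vocab, dim, seq_len = 5000, 300, 60                      # assumed sizes
embedding_matrix = np.random.rand(vocab, dim)            # stand-in for real pre-trained vectors

emb_random = Embedding(vocab, dim, input_length=seq_len)                 # large data: random init, learned from scratch
emb_static = Embedding(vocab, dim, input_length=seq_len,
                       weights=[embedding_matrix], trainable=False)      # static: pre-trained, frozen
emb_non_static = Embedding(vocab, dim, input_length=seq_len,
                           weights=[embedding_matrix], trainable=True)   # non-static: pre-trained, fine-tuned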

The structure diagram of the TextCNN model drawn by plot_model() is as follows:
[Figure: TextCNN model structure diagram generated by plot_model()]

Source: blog.csdn.net/kuxingseng123/article/details/129205906