A detailed explanation of TextCNN for NLP

What is TextCNN

CNNs are usually associated with the CV field and thought of as models for computer vision problems. In 2014, however, Yoon Kim adapted the input layer of the CNN and proposed the text classification model TextCNN. Compared with a CNN used on images, TextCNN leaves the network structure essentially unchanged (it is even simpler): as the figure below shows, TextCNN has only one convolutional layer and one max-pooling layer, and the output is finally fed to a softmax for n-way classification.

Compared with a CNN applied to images, the biggest difference in TextCNN lies in the input data:

An image is two-dimensional data; its convolution kernel slides from left to right and from top to bottom to extract features.

Natural language is one-dimensional data. Although word embedding turns a sentence into a two-dimensional matrix, it is meaningless to slide a convolution kernel from left to right across a single word vector; each word vector has to be treated as a whole during feature extraction. For example, take the vector [0, 0, 0, 0, 1] for the word "today". Sliding a 1x2 window from left to right yields the four vectors [0, 0], [0, 0], [0, 0], [0, 1], none of which corresponds to "today" on its own, so this kind of sliding does not help, as the snippet below illustrates.
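As a quick illustration, this short sketch (plain Python, only to make the point concrete) enumerates those 1x2 windows for the example vector:

```python
# One-hot style vector for the word "today" from the example above.
today = [0, 0, 0, 0, 1]

# Slide a 1x2 window from left to right over this single word vector.
windows = [today[i:i + 2] for i in range(len(today) - 1)]
print(windows)  # [[0, 0], [0, 0], [0, 0], [0, 1]]
# None of these fragments stands for "today" on its own, which is why the
# kernel width is instead fixed to the full embedding dimension.
```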


Advantages of TextCNN

  1. The biggest advantage of TextCNN is its simple network structure. Even with such a simple architecture, it achieves very good results once pre-trained word vectors are introduced, surpassing benchmarks on multiple datasets.

  2. The simple network structure means few parameters, little computation, and fast training. On a single machine with a single V100 card, training on 1.65 million samples for 260,000 steps converges in about half an hour.


The detailed process of TextCNN

  1. Pad the sequence: len_max of the model in the figure above is set to 7, and the sentence "I like this movie very much!" already has length 7, so no additional <PAD> padding is required.
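A minimal padding sketch under these conventions (the simple regex tokenizer and the pad_tokens helper are illustrative assumptions, not from the original):

```python
import re

len_max = 7
PAD = "<PAD>"

def tokenize(sentence):
    # Keep punctuation as separate tokens so "!" counts as one token.
    return re.findall(r"\w+|[^\w\s]", sentence)

def pad_tokens(tokens, max_len=len_max):
    # Truncate to max_len, then right-pad with <PAD> up to max_len.
    tokens = tokens[:max_len]
    return tokens + [PAD] * (max_len - len(tokens))

print(pad_tokens(tokenize("I like this movie very much!")))
# ['I', 'like', 'this', 'movie', 'very', 'much', '!']  -- already length 7, no <PAD> added
print(pad_tokens(tokenize("I like it")))
# ['I', 'like', 'it', '<PAD>', '<PAD>', '<PAD>', '<PAD>']
```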

  2. Build the vocabulary and perform word embedding: build a word_2_index vocabulary from all sentences in the training data and convert each word to its index. The vocabulary size is usually the number of distinct words across all sentences plus 2, because index 0 is reserved for <PAD> (padding) and index 1 for <UNK> (words not in the vocabulary). Each word is then embedded; in the figure embedding_num is set to 5, which gives a 7x5 input matrix.
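The vocabulary construction and embedding step could look roughly like this in PyTorch (the tiny corpus and any names other than word_2_index and embedding_num are assumptions for illustration):

```python
import torch
import torch.nn as nn

corpus = [["I", "like", "this", "movie", "very", "much", "!"]]

# Index 0 is reserved for <PAD> and index 1 for <UNK>, as described above.
word_2_index = {"<PAD>": 0, "<UNK>": 1}
for sentence in corpus:
    for word in sentence:
        word_2_index.setdefault(word, len(word_2_index))

embedding_num = 5
embedding = nn.Embedding(len(word_2_index), embedding_num, padding_idx=0)

ids = torch.tensor([[word_2_index.get(w, 1) for w in corpus[0]]])  # (1, 7)
x = embedding(ids)
print(x.shape)  # torch.Size([1, 7, 5]) -- the 7x5 input matrix from the figure
```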

  3. Decide the size of input_channel: an image has the three color spaces R, G, and B, so its number of input channels is 3; text usually uses a single input channel. Different embedding methods (e.g. word2vec, BERT) can be used to add input channels, and in practice static and fine-tuned word vectors have also been used as two separate input channels, but experiments show that single-channel TextCNN outperforms the multi-channel version. The figure uses a single channel. The common embedding strategies are listed below, with a small sketch after item d.

a. CNN-rand (random word vectors)
After the word-vector dimension embedding_num is chosen, the model initializes the vector of each word randomly; during the subsequent supervised training, the word vectors in the input layer are updated by back-propagation.
b. CNN-static (static word vectors)
Pre-trained word vectors are used: a tool such as word2vec, fastText, or GloVe is trained unsupervised on open-domain data, and the resulting word vectors are fed directly into the input layer and kept fixed during TextCNN training. This is a concrete application of transfer learning in NLP.
c. CNN-non-static (non-static word vectors)
Pre-trained word vectors plus dynamic adjustment: the vectors are initialized with word2vec and then fine-tuned during training.
d. multiple channel (multi-channel)
Borrowing the idea of the three RGB channels in images, the static and non-static initialization schemes can be used to build two input channels.
  4. Decide the size and number of convolution kernels: as mentioned above, when processing text the kernel only slides from top to bottom, so one kernel dimension is fixed at embedding_num (5 here). The example uses three kernel sizes, 2x5, 3x5, and 4x5; convolving them with the input matrix yields the convolved feature maps (a Conv2d sketch follows step 5 below).

a. The convolutional layer is built from kernels with different window sizes; the window size essentially determines which n-gram information is captured, and kernel size has a considerable impact on model performance.
b. The number of kernels is a hyperparameter you define yourself; it has an important effect on performance, and increasing it increases training time.
c. The parameters of a kernel are shared across positions, which greatly reduces the number of parameters; because of this sharing, one kernel can only extract one kind of feature, so each kernel acts as a detector for one feature type.
  5. Decide the size of output_channel: in the example the output channel size is 2, i.e. there are two kernels of each size, and each is convolved with the input matrix.
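A rough sketch of steps 4 and 5 with nn.Conv2d, assuming the sizes from the figure (embedding_num=5, len_max=7, kernel heights 2/3/4, two kernels per size):

```python
import torch
import torch.nn as nn

embedding_num, len_max = 5, 7
x = torch.randn(1, 1, len_max, embedding_num)  # (batch, in_channel=1, 7, 5)

# Three kernel heights (2, 3, 4); each kernel spans the full embedding width (5),
# and out_channels=2 means two kernels of each size.
convs = nn.ModuleList(
    [nn.Conv2d(in_channels=1, out_channels=2, kernel_size=(k, embedding_num))
     for k in (2, 3, 4)]
)

for conv in convs:
    print(conv(x).shape)
# torch.Size([1, 2, 6, 1])  kernel 2x5 -> 7 - 2 + 1 = 6 positions
# torch.Size([1, 2, 5, 1])  kernel 3x5 -> 5 positions
# torch.Size([1, 2, 4, 1])  kernel 4x5 -> 4 positions
```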

  6. Pooling layer: TextCNN uses max pooling; 1-max pooling selects the maximum value of each convolved feature map to represent that map.

a. Average pooling: take the mean of all values in each channel.
b. Max pooling: take the maximum of all values in each channel.
c. In TextCNN, 1-max pooling performs better than the other pooling strategies.
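A small sketch comparing the two pooling strategies on one convolved feature map (shapes follow the Conv2d sketch above; the random tensor is only a placeholder):

```python
import torch

# One convolved feature map from the 2x5 kernels: (batch, out_channels=2, positions=6, 1).
feature_map = torch.randn(1, 2, 6, 1)

avg_pooled = feature_map.mean(dim=2)         # average pooling -> (1, 2, 1)
max_pooled = feature_map.max(dim=2).values   # 1-max pooling   -> (1, 2, 1)
print(avg_pooled.shape, max_pooled.shape)
# torch.Size([1, 2, 1]) torch.Size([1, 2, 1])
# 1-max pooling keeps only the strongest activation per channel.
```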
  7. Concatenation: concatenate the pooled results of the previous step; the result is the input to the final classifier.
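The concatenation itself, assuming three kernel sizes with two output channels each:

```python
import torch

# 1-max pooled outputs for kernel sizes 2, 3 and 4 (2 channels each).
pooled = [torch.randn(1, 2) for _ in range(3)]

features = torch.cat(pooled, dim=1)  # -> (1, 6): 3 kernel sizes x 2 channels
print(features.shape)                # torch.Size([1, 6])
```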

  8. Fully connected layer: build a fully connected layer from the pooling output and the number of classes, then apply softmax to obtain the final classification result; torch.nn.Linear(input_num, num_class) is enough to define the fully connected layer.
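Putting the eight steps together, a minimal TextCNN sketch in PyTorch could look like this (hyperparameters follow the figure; vocab_size and num_class are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size, embedding_num=5, kernel_sizes=(2, 3, 4),
                 out_channels=2, num_class=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_num, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv2d(1, out_channels, (k, embedding_num)) for k in kernel_sizes]
        )
        self.fc = nn.Linear(out_channels * len(kernel_sizes), num_class)

    def forward(self, ids):                    # ids: (batch, len_max)
        x = self.embedding(ids).unsqueeze(1)   # (batch, 1, len_max, embedding_num)
        pooled = [
            F.relu(conv(x)).squeeze(3).max(dim=2).values  # 1-max pooling per kernel size
            for conv in self.convs
        ]
        features = torch.cat(pooled, dim=1)    # (batch, out_channels * num_kernel_sizes)
        return self.fc(features)               # logits; softmax / cross-entropy follows

model = TextCNN(vocab_size=100)
logits = model(torch.randint(0, 100, (4, 7)))
print(logits.shape)  # torch.Size([4, 2])
```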


Origin blog.csdn.net/fzz97_/article/details/129065899