A Progressive Semantic Matching Method for Semi-Supervised Text Classification

This article is from NAACL 2022. It introduces a semi-supervised text classification method that uses a Class Semantic Representation (CSR), which is similar to a label embedding, to match against the input text, generate pseudo-labels, and perform semi-supervised training. During this process the CSR is continuously updated, which improves the accuracy of the pseudo-labels.

Brief summary: BERT is used both to train the classifier and to encode the class-related words into the CSR (essentially what we usually call a label embedding); semantic matching between the text and the CSR then generates pseudo-labels, which are iteratively updated. The authors of the paper are from Australia.

Paper address: Progressive Class Semantic Matching for Semi-supervised Text Classification | Papers With Code

Code address: HeimingX/PCM: Official implementation of PCM "Progressive Class Semantic Matching for Semi-supervised Text Classification" (github.com). The authors have not yet open-sourced the code.


1. Summary

Semi-supervised learning is an effective way to reduce the cost of annotation for text classification. Combined with a pretrained language model (PLM), such as BERT, recent semi-supervised learning methods achieve impressive performance. In this work, we further investigate the relationship between semi-supervised learning and pretrained language models. Different from existing methods that only use PLM for model parameter initialization, we explore the inherent topic matching ability inside PLM to build a more powerful semi-supervised learning method.

Specifically, we propose a joint semi-supervised learning procedure that gradually builds a standard K-way classifier and matching network for input text and class semantic representation (CSR). The CSR will be initialized from the given labeled sentences and gradually updated during training. Through extensive experiments, we show that our method not only brings significant improvements over baselines, but is also generally more stable, achieving state-of-the-art performance in semi-supervised text classification.

2. Introduction

In this article, we further explore how SSL can make use of a PLM. Specifically, we find that some PLMs, such as BERT, are able to coherently match sentences with class-related words thanks to their pre-training pretext tasks (Devlin et al., 2019). We therefore develop a joint training procedure that gradually updates three components: a classifier that performs standard K-way classification, a Class Semantic Representation (CSR) that captures the semantics of each category, and a matching classifier that matches the input semantics against the CSR. These three components help each other during training: by generating pseudo-labels jointly with the matching classifier, the K-way classifier receives more accurate pseudo-labels, while the matching classifier improves its matching ability under the guidance of the K-way classifier. As the K-way classifier and the matching classifier improve, the CSR becomes more accurate and comprehensive.
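
To make this "topic matching" intuition concrete, here is a minimal sketch (my own illustration, not the paper's actual probing setup) that uses BERT's next-sentence-prediction head to score how coherently a sentence pairs with a short class description; the class descriptions below are made up for the example:

```python
# Sketch: using BERT's NSP head to score sentence / class-description coherence.
# This is an illustration of the intuition only, not the paper's procedure.
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

text = "The team scored twice in the final minutes to win the championship."
class_descriptions = {          # hypothetical class descriptions
    "sports": "This text is about sports and games.",
    "business": "This text is about business and finance.",
}

with torch.no_grad():
    for label, desc in class_descriptions.items():
        inputs = tokenizer(text, desc, return_tensors="pt")
        logits = model(**inputs).logits              # shape [1, 2]
        # index 0 = "sentence B follows sentence A", i.e. a coherent pairing
        score = torch.softmax(logits, dim=-1)[0, 0].item()
        print(f"{label}: coherence score = {score:.3f}")
```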

3. Related work

3.1 Semi-supervised learning

Semi-supervised learning is a long-standing research topic in machine learning. Existing methods exploit unlabeled samples in different ways, e.g., transductive models (Joachims, 2003; Gammerman et al., 2013), multi-view approaches (Blum and Mitchell, 1998; Zhou and Li, 2005) and generative models (Kingma et al., 2014; Springenberg, 2016). With the renaissance of deep neural networks, deep SSL methods based on consistency regularization (Laine and Aila, 2017; Tarvainen and Valpola, 2017; Miyato et al., 2018) have achieved impressive performance on various tasks, and our work builds heavily on such methods. The key idea of these methods is to keep the model's predictions consistent in the neighborhood of each sample in the input space. Specifically, the Π-model (Laine and Aila, 2017), UDA (Xie et al., 2019b), and FixMatch (Sohn et al., 2020) directly add various perturbations to the input data, Mean Teacher (Tarvainen and Valpola, 2017) uses a teacher model to simulate sample perturbations, and virtual adversarial training (Miyato et al., 2018) carefully constructs adversarial examples. More recently, mixup (Zhang et al., 2018) proposes another consistency constraint, requiring the inputs and outputs of the model to satisfy the same linear relationship. Many state-of-the-art methods build on this technique, such as ICT (Verma et al., 2019b), MixMatch (Berthelot et al., 2019b) and ReMixMatch.

3.2 Semi-supervised text classification

Semi-supervised learning has received extensive attention in the field of text classification. Many recent semi-supervised text classification methods focus on how to apply existing SSL methods to sentence inputs. (Miyato et al., 2017) apply perturbations to word embeddings to build adversarial and virtual adversarial training. (Clark et al., 2018) design auxiliary prediction modules with restricted views of the input to encourage cross-view consistency. With the development of PLMs, (Jo and Cinarel, 2019) perform self-training between two sets of classifiers with different initializations, one with pretrained word embeddings and the other with random values. Both (Xie et al., 2019b) and (Chen et al., 2020) use pre-trained BERT to initialize the sentence feature extractor; the former performs consistency regularization between the original sentence and its back-translated counterpart, while the latter further introduces manifold mixup (Verma et al., 2019a) into text classification. Although these methods achieve good performance, we believe they have not fully exploited the inherent knowledge in PLMs. Our work goes a step further in this direction.

4. Method

The method builds a procedure that jointly updates three components:

(1) a standard K-way classifier;

(2) a matching classifier that matches the input text to the class semantic representation;

(3) the Class Semantic Representation (CSR) itself. Updating each component helps the other components, so that classification performance improves iteratively.

We call our approach progressive class semantic matching (PCM).
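
As a rough sketch of how the joint pseudo-labelling and CSR update could work (my own reading; the averaging fusion rule and the EMA-style CSR update below are assumptions, not necessarily the paper's exact formulas):

```python
# Sketch of PCM-style joint pseudo-labelling: the K-way head and the matching
# head are fused (here by simple averaging, an assumption) and only confident
# predictions are kept; the CSR is then refreshed from confident samples.
import torch
import torch.nn.functional as F

def pseudo_label(kway_logits, match_logits, threshold=0.9):
    """kway_logits:  [B, K] logits from the standard K-way classifier.
       match_logits: [B, K] matching scores between each text and each class CSR.
       Returns (labels, mask): hard pseudo-labels and a confidence mask."""
    p_kway = F.softmax(kway_logits, dim=-1)
    p_match = F.softmax(match_logits, dim=-1)
    p_joint = 0.5 * (p_kway + p_match)        # assumed fusion rule
    conf, labels = p_joint.max(dim=-1)
    mask = conf.ge(threshold)                 # keep only confident samples
    return labels, mask

def update_csr(csr, text_emb, labels, mask, momentum=0.99):
    """Progressively refresh each class's semantic representation with the
       embeddings of confidently pseudo-labelled texts (EMA update, assumed)."""
    for k in labels[mask].unique():
        cls_emb = text_emb[mask & labels.eq(k)].mean(dim=0)
        csr[k] = momentum * csr[k] + (1 - momentum) * cls_emb
    return csr
```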


5. Experiment

Experimental settings

We keep the experimental settings consistent with MixText and evaluate PCM on four datasets: AG News, DBpedia (Lehmann et al., 2015), Yahoo! Answers (Chang et al., 2008), and IMDB (Maas et al., 2011). For data augmentation we use back-translation, implemented with the fairseq toolkit.
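
The back-translation augmentation can be done roughly like this with fairseq's pretrained WMT'19 models from torch.hub (a sketch; the actual models and language pair used by the authors may differ):

```python
# Sketch of back-translation augmentation (en -> de -> en) with fairseq.
import torch

en2de = torch.hub.load("pytorch/fairseq", "transformer.wmt19.en-de.single_model",
                       tokenizer="moses", bpe="fastbpe")
de2en = torch.hub.load("pytorch/fairseq", "transformer.wmt19.de-en.single_model",
                       tokenizer="moses", bpe="fastbpe")
en2de.eval(); de2en.eval()

def back_translate(sentence, temperature=0.9):
    # Sampling (instead of beam search) tends to give more diverse paraphrases.
    german = en2de.translate(sentence, sampling=True, temperature=temperature)
    return de2en.translate(german, sampling=True, temperature=temperature)

print(back_translate("Semi-supervised learning reduces annotation cost."))
```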

The learning rate is 5e-6 for the BERT encoder and 5e-4 for the classifier; a sketch of such a two-group setup is shown below.
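
Two learning rates like this are usually implemented with separate optimizer parameter groups, e.g. (assumed setup, not the authors' exact training code):

```python
# Sketch: different learning rates for the BERT encoder and the classifier head.
import torch
from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-uncased")
classifier = torch.nn.Linear(bert.config.hidden_size, 4)   # e.g. 4 classes for AG News

optimizer = torch.optim.AdamW([
    {"params": bert.parameters(),       "lr": 5e-6},   # encoder: small learning rate
    {"params": classifier.parameters(), "lr": 5e-4},   # head: larger learning rate
])
```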

Baseline settings:

  1. BERT-FT: directly fine-tune BERT
  2. UDA Unsupervised Data Augmentation (Xie et al., 2019b)
  3. MixText (Chen et al., 2020)


Ablation experiment
  1. Importance of using two classifiers in PCM.

  2. Is the dual loss the key to the success of the dual-classifier design?

  3. Prediction quality of K-way classifier and matching classifier

  4. Impact of Updating the CSR


6. My thoughts

After reading this article, my feeling is that it is somewhat underwhelming: the method is quite simple. It feeds "[CLS] text [SEP] label [SEP]" into BERT, encodes it, then learns an FC (K-way) classifier and a matching classifier, and uses the matching classifier to generate pseudo-labels for iterative updates; a rough sketch of that matching input is shown below.
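
Roughly, the matching input described above could be built like this (my reading of the method; the label wording and the scoring head are assumptions):

```python
# Sketch of the "[CLS] text [SEP] label [SEP]" matching input, with a simple
# scoring head on the [CLS] vector (assumed details, not the official code).
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
match_head = torch.nn.Linear(encoder.config.hidden_size, 1)   # match score

def match_score(text, label_words):
    # The tokenizer builds "[CLS] text [SEP] label_words [SEP]" automatically
    # when given a sentence pair.
    inputs = tokenizer(text, label_words, return_tensors="pt", truncation=True)
    cls_vec = encoder(**inputs).last_hidden_state[:, 0]       # [1, hidden]
    return match_head(cls_vec).squeeze(-1)                    # higher = better match

score = match_score("Stocks rallied after the earnings report.", "business finance")
```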

The comparison experiments set up by the authors are not entirely convincing: comparing only against the earlier MixText does not isolate which specific part of the work is effective, since MixText is also a pseudo-labeling method. Still, iteratively generating pseudo-labels in this way could be a useful trick for squeezing out extra points in competitions.

Origin blog.csdn.net/be_humble/article/details/127685809