Chinese short text classification example - FastText (Bag of Tricks for Efficient Text Classification)

I. Introduction

        FastText (Bag of Tricks for Efficient Text Classification) is a neural network model presented by Facebook AI Research. It is a simple and efficient linear classification model that can perform large-scale text classification in a very short time and supports data volumes in the millions.

        Facebook has also released an elegant C++ implementation of fastText with many built-in tricks; in some cases it can even outperform carefully tuned deep neural networks. Its suitability for industrial production and its ease of use make it a natural baseline for text classification tasks.

        In fact, looking at the development of deep learning and artificial intelligence, any algorithm that cracks one of the puzzles of simplicity, accuracy, or efficiency is likely to lead a revolution, and that is commendable. Simplicity: the ease of use of jieba, for example, made it the most popular and most widely used Chinese word segmenter in Python; the recently popular mobile neural network models (mobile networks), whose key is reducing the number of model parameters, and the rise of Keras and PyTorch, illustrate the same point. Accuracy: this goes without saying; accuracy is our eternal pursuit, as shown by the popularity of BERT and the dominance of CNNs in the image domain. Efficiency: speed is life; the Transformer and the rise of attention accompanied the decline of RNNs, and the wide application of fastText and word2vec reflects the same idea.

        So even though FastText looks very simple, with only a single-layer neural network structure, it should not be underestimated, especially given its many tricks.

        FastText classification code (GitHub): https://github.com/yongzhuo/Keras-TextClassification/tree/master/keras_textclassification

 

II. FastText network

         FastText mainly involves three parts: N-gram features, hierarchical softmax, and the network model itself (the fastText model).

2.1 FastText model diagram


        In fact, it is a simple linear classifier, y = softmax(Wx + b). The fastText representation of a text, i.e. the input features, is a bag of words (and n-grams) whose vectors are averaged, much like the continuous bag-of-words (CBOW) architecture of word2vec. For a classification task over N short texts, f denotes the softmax function used for classification, and the objective to be minimized is the negative log-likelihood below:

                - (1/N) * Σ_{n=1}^{N}  y_n * log( f(B A x_n) )

       The symbols in the formula above mean: x_n is the (normalized) bag of features of the n-th preprocessed text sample, y_n is its label, and A and B are weight matrices.
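
        To make this concrete, below is a minimal numpy sketch of this forward pass and the negative log-likelihood above (the sizes, random weights, and random bag-of-features inputs are toy assumptions, not values from the paper or the repository):

    import numpy as np

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    # toy sizes (assumed): N texts, vocabulary V, embedding dim d, L classes
    N, V, d, L = 4, 1000, 300, 10
    rng = np.random.default_rng(0)

    A = rng.normal(size=(d, V)) * 0.01         # embedding / look-up matrix
    B = rng.normal(size=(L, d)) * 0.01         # classifier weight matrix

    x = rng.random(size=(N, V))
    x = x / x.sum(axis=1, keepdims=True)       # x_n: normalized bag of features per text
    y = rng.integers(0, L, size=N)             # y_n: gold label per text

    probs = softmax(x @ A.T @ B.T)             # f(B A x_n) for every sample, shape (N, L)
    nll = -np.mean(np.log(probs[np.arange(N), y]))   # -(1/N) * sum_n log p(y_n | x_n)
    print(nll)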

2.2 Hierarchical softmax

       Hierarchical softmax (also called level softmax) addresses the heavy computation of standard softmax classification when there are many classes. For example, the Zhihu "Kanshan Cup" multi-label text classification competition has L (1999) classes; computing the probability of all L classes for every sample, with d the dimension of the text representation, gives a time complexity of O(dL) = 300 * 1999.

       To reduce this large computation there are two methods. One is negative sampling (Negative Sampling, NEG, a simplified version of Noise Contrastive Estimation): for each sample, draw k negative classes from the remaining L - 1 classes (typically 5-20 for small corpora, 2-5 for large ones) and compare the true class only against them, so the complexity drops to O(dk) = 300 * 5, at the cost of some approximation in the loss.
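
        Below is a minimal numpy sketch of this negative sampling idea (the true class index, the random weights, and k = 5 are toy assumptions); the point is that only k + 1 dot products of size d are computed instead of L:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    d, L, k = 300, 1999, 5                  # embedding dim, number of classes, negatives per sample

    h = rng.normal(size=d)                  # hidden text vector (A x_n)
    W = rng.normal(size=(L, d)) * 0.01      # one output vector per class (rows of B)
    pos = 7                                 # index of the true class (toy value)

    neg = rng.choice(np.delete(np.arange(L), pos), size=k, replace=False)  # k other classes

    # O(d * k) work: push the true class score up, the sampled negative scores down
    loss = -np.log(sigmoid(W[pos] @ h)) - np.sum(np.log(sigmoid(-W[neg] @ h)))
    print(loss)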

        The other method is hierarchical softmax (Hierarchical Softmax): build a Huffman tree over the labels according to their frequency of occurrence (this minimizes the total weighted path length, i.e. it is an optimal binary tree). The time complexity then becomes O(d * log2(k)), where k is the number of classes and log2(k) is the depth of the tree.
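
        As an illustration, here is a small sketch that computes Huffman code lengths from label frequencies with Python's heapq (the frequencies are toy assumptions). The code length of a label is the number of binary decisions on its path from the root, so frequent labels are reached in fewer steps:

    import heapq
    from itertools import count

    def huffman_code_lengths(freqs):
        """Huffman code length per label (= number of binary decisions on its path)."""
        tie = count()
        # each heap entry: (subtree frequency, tie-breaker, list of (label, depth) pairs)
        heap = [(f, next(tie), [(label, 0)]) for label, f in enumerate(freqs)]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            merged = [(label, depth + 1) for label, depth in left + right]
            heapq.heappush(heap, (f1 + f2, next(tie), merged))
        return dict(heap[0][2])

    # toy label frequencies (assumed): frequent labels get shorter codes
    freqs = [500, 300, 120, 50, 20, 10]
    print(huffman_code_lengths(freqs))   # e.g. {0: 1, 1: 2, 2: 3, ...}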

        For background on Huffman coding, this reference is recommended: https://www.cnblogs.com/kubixuesheng/p/4397798.html

2.3  N-gram

        Extracting N-gram features is not difficult and is easy to understand: take every n consecutive tokens in the text. For example, with the character-level sample ['我', '喜', '欢', '你'] and n = 2, the extracted bigrams are [('我', '喜'), ('喜', '欢'), ('欢', '你')].
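
        A minimal sketch of this extraction in plain Python (the function name is just for illustration):

    def extract_ngrams(tokens, n=2):
        """Return every n consecutive tokens as a tuple."""
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    print(extract_ngrams(['我', '喜', '欢', '你'], n=2))
    # -> [('我', '喜'), ('喜', '欢'), ('欢', '你')]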

 

III. Code implementation

3.1 The model code itself is not difficult to implement; the somewhat complicated parts are the n-gram features and hierarchical softmax, but there is plenty of material about them online.

3.2 GitHub code address: https://github.com/yongzhuo/Keras-TextClassification/tree/master/keras_textclassification

       This code does not implement n-grams or hierarchical softmax, but adding them should not be difficult.

       Alternatively, you can directly use Facebook's open-source fastText tool; it is simple, very well optimized, and usually works much better than a from-scratch implementation.

3.3 Key code:

    # Assumed imports (from the repository's surrounding module):
    # from keras.layers import Dense, GlobalMaxPooling1D
    # from keras.models import Model

    def create_model(self, hyper_parameters):
        """
            Build the neural network
        :param hyper_parameters: json, hyper parameters of network
        :return: tensor, model
        """
        super().create_model(hyper_parameters)                 # builds self.word_embedding in the base class
        embedding = self.word_embedding.output                 # (batch, seq_len, embed_dim) token embeddings
        x = GlobalMaxPooling1D()(embedding)                    # pool the sequence into one text vector
        output = Dense(self.label, activation=self.activate_classify)(x)  # linear layer + softmax over labels
        self.model = Model(inputs=self.word_embedding.input, outputs=output)
        self.model.summary(120)
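
        For reference, here is a standalone sketch of the same model outside the repository's class structure (vocab_size, maxlen, embed_dim and num_labels are assumed toy values; in the repository the embedding is instead wired through its word_embedding object):

    from keras.layers import Input, Embedding, GlobalMaxPooling1D, Dense
    from keras.models import Model

    vocab_size, maxlen, embed_dim, num_labels = 20000, 50, 300, 10   # assumed toy values

    inputs = Input(shape=(maxlen,), dtype='int32')          # padded token ids of one text
    emb = Embedding(vocab_size, embed_dim)(inputs)          # look up word vectors
    vec = GlobalMaxPooling1D()(emb)                         # pool the sequence into one text vector
    outputs = Dense(num_labels, activation='softmax')(vec)  # linear classifier + softmax
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    model.summary()

        Swapping GlobalMaxPooling1D for GlobalAveragePooling1D would match the averaging of word vectors described in the original fastText paper more closely.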

I hope this helps you!



Source: blog.csdn.net/rensihui/article/details/91664030