[Paper Writing Analysis] Part 5: "fastText Short Text Classification with Fused Category Feature Expansion and N-gram Subword Filtering"

[1] Reference paper information

  Paper name: "fastText short text classification with fusion of category feature expansion and N-gram subword filtering"

  Published journal: Journal of Chinese Computer Systems (《小型微型计算机系统》)

  Journal information: CSCD (Expanded Edition)
  Paper-writing analysis summary:

  From the standpoint of innovation: during text preprocessing, the paper extracts and filters the text's unigrams, bigrams, and trigrams using three basic techniques (TF-IDF, LDA, and information entropy), then feeds the result into a FastText model as its input. There does not appear to be much technical depth to it.

  My personal reading: TF-IDF, LDA, and information entropy are nothing more than feature extraction, so why not use a CNN to extract the key information, or an attention mechanism to extract salient features? Relatively speaking, the feature-extraction ability of deep learning is stronger than that of ordinary machine-learning algorithms, so I personally find the work in this paper quite debatable.

[Note]: In fact, if you first extract features with a CNN and then apply the FastText model, that is equivalent to doing text classification with the CNN directly: CNN text classification already uses multiple convolution kernels of different sizes, which plays a role similar to FastText's N-grams.

[2] Reference paper decomposition

  【Abstract part】
  Analysis:

  Its main strength is the terminology. I only understood the abstract after reading the full text. If you want to publish Chinese papers, you need impressive-sounding names for your methods.

  Using TF-IDF and LDA for feature extraction, the paper calls it "an LDA category feature extraction method based on TF-IDF to improve the quality of category features"; using information entropy to filter the unigram, bigram, and trigram features, the paper calls it "an N-gram subword filtering method based on lexical information entropy to filter out subwords with low category-discrimination contribution"; feeding the text after feature extraction into FastText, the paper calls it "building an EF-fastText short text classification model that focuses on learning semantic features with high category-discrimination contribution".

[Note]: Take careful note of this, everyone.


  【Introduction part】

  Analysis:

  Fairly conventional: it surveys the work of related papers and then states the main contributions of this one.


  【TF-IDF + LDA part】

  Analysis:

  TF-IDF is introduced first, followed by the processing flow chart of the LDA category feature extraction method based on TF-IDF:

[Note]: I personally think TF-IDF + LDA should not stand as a chapter of its own.


  【N-gram information entropy part】

  Analysis:

  The concepts of information entropy and N-grams are introduced, followed by the processing flow chart of the N-gram subword filtering method based on lexical information entropy:
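The entropy-based filtering step can be sketched like this (the threshold and the per-n-gram category-entropy formula are my illustrative choices, not the paper's exact definitions): an n-gram spread evenly over all categories has high entropy, contributes little to class discrimination, and is dropped.

```python
# Hedged sketch: filter N-gram subwords by their information entropy
# over category labels. Low entropy = concentrated in few categories
# = high discrimination contribution, so the n-gram is kept.
import math
from collections import Counter, defaultdict

def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def entropy_filter(samples, n=2, max_entropy=0.9):
    """samples: list of (token_list, label) pairs. Returns kept n-grams."""
    counts = defaultdict(Counter)          # ngram -> Counter(label -> freq)
    for tokens, label in samples:
        for g in ngrams(tokens, n):
            counts[g][label] += 1
    kept = []
    for g, dist in counts.items():
        total = sum(dist.values())
        h = -sum((c / total) * math.log2(c / total) for c in dist.values())
        if h <= max_entropy:               # concentrated -> discriminative
            kept.append(g)
    return kept

samples = [
    (["good", "movie", "tonight"], "film"),
    (["good", "movie", "review"], "film"),
    (["good", "stock", "returns"], "finance"),
]
print(entropy_filter(samples, n=1))
```

Here the unigram "good" appears in both categories, so its entropy exceeds the threshold and it is filtered out, while category-specific unigrams like "movie" and "stock" survive.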

[Note]: Again, I feel this content should not stand as a chapter of its own.


  【Classification model part】

  Analysis:

  The diagram of the FastText model after applying the two feature-extraction steps above is given:

[Note]: The FastText model itself is not changed at all; the feature-extraction steps are simply added in front of it…
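For reference, the unchanged FastText classification idea amounts to averaging the embeddings of the input words and N-gram subwords and applying a single linear softmax layer. A minimal sketch follows; the vocabulary, dimensions, and random weights are illustrative, and in the paper the feature list reaching `predict` would already have been expanded and filtered by the two preprocessing stages.

```python
# Minimal sketch of the FastText classification architecture:
# average pooling over word / n-gram embeddings + one linear softmax layer.
import numpy as np

rng = np.random.default_rng(0)
vocab = {"good": 0, "movie": 1, "good movie": 2, "bad": 3, "bad film": 4}
dim, n_classes = 8, 2

E = rng.normal(size=(len(vocab), dim))     # word / n-gram embedding table
W = rng.normal(size=(dim, n_classes))      # the single linear classifier
b = np.zeros(n_classes)

def predict(features):
    """features: word / n-gram strings, filtered upstream in the paper."""
    ids = [vocab[f] for f in features if f in vocab]
    hidden = E[ids].mean(axis=0)           # average pooling, as in FastText
    logits = hidden @ W + b
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                 # softmax class probabilities

probs = predict(["good", "movie", "good movie"])
print(probs)
```

Seen this way, the note above is easy to verify: nothing in this forward pass changes in the paper; only which n-grams enter `features` does.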


Origin blog.csdn.net/qq_43592352/article/details/124425668