NLP practice (news text classification): understanding the competition task and solution ideas

Understanding the task

Getting the data

This is the entry-level Tianchi NLP competition, and the procedure is the same as usual: register first, then download the data.
Pay attention to the competition rules.

Competition task

Because the competition data is anonymized, we cannot rely on operations such as word segmentation to extract keywords for simple prediction. Instead, we can extract features from the text and feed them to a classifier, or use a deep learning classifier. Broadly, there are four ideas:

  • Idea 1: TF-IDF + machine learning classifier: use TF-IDF to extract features from the text directly, then classify with a machine learning model. Candidate classifiers include SVM, LR (logistic regression), and XGBoost.

  • Idea 2: FastText: FastText is an entry-level word-vector method. With the FastText tool released by Facebook, you can build a classifier quickly.

  • Idea 3: Word2Vec + deep learning classifier: Word2Vec is a more advanced word-vector method, and classification is done by building a deep learning classifier on top of the vectors. The network structure can be TextCNN, TextRNN, or BiLSTM.

  • Idea 4: BERT word vectors: BERT is the high-end option among word-vector models, with strong modeling and learning capacity.
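
To make Idea 1 concrete, here is a minimal, dependency-free sketch of TF-IDF features followed by a classifier. It rests on several assumptions that the post does not state: each anonymized document is taken to be a string of space-separated token IDs, the toy texts and labels are made up, and a simple nearest-centroid rule stands in for SVM, LR, or XGBoost purely to keep the example short. The function names (fit_idf, tfidf_vector, fit_centroids, predict) are hypothetical helpers, not part of any library.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every token."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc.split()))  # count each token once per document
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def tfidf_vector(doc, idf):
    """Return an L2-normalised sparse TF-IDF vector (a dict) for one document."""
    counts = Counter(tok for tok in doc.split() if tok in idf)  # skip unseen tokens
    vec = {tok: cnt * idf[tok] for tok, cnt in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def fit_centroids(docs, labels, idf):
    """Sum the TF-IDF vectors of each class into one centroid per label."""
    centroids = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for tok, value in tfidf_vector(doc, idf).items():
            centroids[label][tok] += value
    return centroids


def predict(doc, idf, centroids):
    """Pick the label whose centroid has the highest cosine similarity."""
    vec = tfidf_vector(doc, idf)

    def score(label):
        centroid = centroids[label]
        norm = math.sqrt(sum(v * v for v in centroid.values())) or 1.0
        return sum(v * centroid.get(tok, 0.0) for tok, v in vec.items()) / norm

    return max(centroids, key=score)


# Made-up toy data: token IDs and labels are illustrative only.
train_texts = ["12 45 7 12 88", "45 7 7 301", "900 17 652 17", "652 900 33 17"]
train_labels = [0, 0, 1, 1]

idf = fit_idf(train_texts)
centroids = fit_centroids(train_texts, train_labels, idf)

print(predict("7 45 12", idf, centroids))
print(predict("17 900 5", idf, centroids))
```

Because the tokens are already separated, no word segmentation is needed, which is why TF-IDF still works on anonymized text. In practice the feature step and the classifier would more likely come from a library such as scikit-learn.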

We will implement them one by one later.

I also hope that I can stick with this competition.

Origin blog.csdn.net/weixin_45696161/article/details/107475518