Algorithms commonly used in NLP

1. Algorithms in detail

(1) Classification algorithm

(1). LR (Logistic Regression)
(2).SVM (Support Vector Machine)
(3).NB (Naive Bayes)

(4).KNN
(5). DT (Decision Tree)

  • 1). C4.5
  • 2). ID3
  • 3). CART
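
Decision-tree learners differ mainly in their split criterion: ID3 uses information gain, C4.5 the gain ratio, and CART the Gini index. A minimal sketch of the ID3 criterion in plain Python (the toy data is illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    """Information gain of splitting on one feature (the ID3 criterion):
    parent entropy minus the weighted entropy of the child partitions."""
    base = entropy(labels)
    splits = {}
    for row, label in zip(rows, labels):
        splits.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in splits.values())
    return base - remainder

# Toy data: one feature (outlook), label = play yes/no
rows = [("sunny",), ("sunny",), ("rain",), ("rain",)]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))  # 1.0: this split separates the classes perfectly
```

ID3 greedily picks the feature with the highest gain at each node; C4.5 divides the gain by the split's own entropy to avoid favoring many-valued features.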

(6). Ensemble algorithms

  • 1). Bagging
  • 2). RF (Random Forest)
  • 3). GB (Gradient Boosting)
  • 4). GBDT (Gradient Boosting Decision Tree)
  • 5). AdaBoost
  • 6). XGBoost
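
The core idea behind bagging (and hence Random Forest) can be sketched in a few lines: train each base model on a bootstrap sample of the data, then combine predictions by majority vote. The base "learner" below is a deliberately trivial placeholder (it just predicts its training sample's majority class), only to show the resampling-and-voting mechanics:

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels):
    """Placeholder learner: always predicts the majority class it was trained on.
    A real bagging ensemble would fit a decision tree here."""
    guess = majority(labels)
    return lambda row: guess

def bagging_fit(rows, labels, n_models=25, seed=0):
    """Bagging: each model sees a bootstrap sample (sampling with
    replacement); the ensemble predicts by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        models.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx]))
    return lambda row: majority([m(row) for m in models])

rows = [[0], [1], [2], [3]]
labels = ["a", "a", "a", "b"]
predict = bagging_fit(rows, labels)
print(predict([0]))  # "a": most bootstrap samples are majority-"a"
```

Random Forest adds one more ingredient on top of bagging: each tree also considers only a random subset of features at each split, which further decorrelates the base models.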

(7). Maximum entropy model

(2) Regression algorithm

(1). LR (Linear Regression)
(2). SVR (Support Vector Regression)
(3). RR (Ridge Regression)
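
Ridge regression adds an L2 penalty to least squares; for a single feature with no intercept the closed form reduces to w = Σxy / (Σx² + λ). A sketch with toy data (the data and λ values are illustrative):

```python
def ridge_fit_1d(xs, ys, lam=1.0):
    """Closed-form ridge regression for one feature, no intercept:
    w = sum(x*y) / (sum(x^2) + lambda). Larger lambda shrinks w toward 0."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                   # exact relation y = 2x
print(ridge_fit_1d(xs, ys, lam=0.0))   # 2.0 (plain least squares)
print(ridge_fit_1d(xs, ys, lam=14.0))  # 1.0 (penalty shrinks the slope)
```

With λ = 0 this is ordinary least squares; increasing λ trades a little bias for lower variance, which is the point of the penalty.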

(3) Clustering algorithms

(1). K-means
(2). Hierarchical clustering
(3). Density-based clustering (e.g. DBSCAN)
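
K-means alternates two steps until it stabilizes: assign each point to its nearest centroid, then move each centroid to the mean of its cluster. A minimal 1-D sketch with naive initialization (toy data assumed):

```python
def kmeans_1d(points, k=2, iters=20):
    """Plain k-means on scalars: assignment step, then update step."""
    centroids = points[:k]  # naive init: the first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

points = [1.0, 1.2, 0.8, 10.0, 10.4, 9.6]
print(kmeans_1d(points, k=2))  # centroids near 1.0 and 10.0
```

Real implementations initialize more carefully (e.g. k-means++), since the result depends on the starting centroids.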

(4) Parameter optimization algorithms (used to compute model parameters)

(1). SGD (Stochastic Gradient Descent)

(2). Newton's method, quasi-Newton methods

(3). Coordinate descent
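
SGD updates the parameters after every single example rather than after a full pass over the data. A sketch for fitting y ≈ w·x + b with squared loss (the toy data, learning rate, and epoch count are illustrative):

```python
import random

def sgd_linear(xs, ys, lr=0.05, epochs=200, seed=0):
    """Stochastic gradient descent for y ~ w*x + b with squared loss:
    after each example, step against the per-example gradient."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)          # visit examples in random order
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x      # d(loss)/dw = err * x
            b -= lr * err          # d(loss)/db = err
    return w, b

w, b = sgd_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(round(w, 2), round(b, 2))  # close to the true w=2, b=1
```

Batch gradient descent would average the gradient over all examples before each step; SGD's noisy per-example steps are cheaper and often converge faster on large datasets.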

(5) Probabilistic graphical model algorithms

(1). Bayesian networks
(2). HMM (Hidden Markov Model)
(3). CRF (Conditional Random Field)

(6) Text mining algorithm

(1). Models

  • 1). LDA (Latent Dirichlet Allocation, a topic model)
  • 2). Maximum entropy model

(2). Keyword extraction

  • 1). tf-idf
  • 2). BM25
  • 3). TextRank
  • 4). PageRank
  • 5). Left-right entropy: candidates with high entropy of both left and right neighboring characters are treated as keywords/new words
  • 6). Mutual information: measures the internal cohesion of a candidate term
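
tf-idf scores a term highly when it is frequent inside one document but rare across the corpus. A from-scratch sketch (tf as count/length and idf as log(N/df) are one common variant among several):

```python
import math

def tf_idf(docs):
    """TF-IDF per document: tf = count/len(doc),
    idf = log(N / df), df = number of docs containing the term."""
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    scores = []
    for doc in docs:
        s = {}
        for term in doc:
            tf = doc.count(term) / len(doc)
            s[term] = tf * math.log(n / df[term])
        scores.append(s)
    return scores

docs = [["cat", "sat", "mat"], ["cat", "ran"], ["dog", "ran"]]
scores = tf_idf(docs)
# "cat" appears in 2 of 3 docs (low idf); "mat" in only 1 (higher idf)
print(scores[0]["mat"] > scores[0]["cat"])  # True
```

For keyword extraction, the terms of a document are simply ranked by their tf-idf score.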

(3). Lexical analysis

  • 1). Word segmentation
    - ① HMM (Hidden Markov Model)
    - ② CRF (Conditional Random Field)
  • 2). Part-of-speech tagging
  • 3). Named entity recognition

(4). Syntactic analysis

  • 1). Constituency (phrase-structure) parsing
  • 2). Dependency parsing

(5). Text vectorization

  • 1). tf-idf
  • 2). word2vec
  • 3). doc2vec
  • 4). cw2vec

(6). Distance calculation

  • 1). Euclidean distance
  • 2). Similarity measures (e.g. cosine similarity)
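
The two measures above, written out directly for plain vectors:

```python
import math

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors:
    1 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

print(euclidean([0, 0], [3, 4]))          # 5.0
print(cosine_similarity([1, 0], [2, 0]))  # 1.0: parallel vectors
```

Cosine similarity is the usual choice for comparing tf-idf or word2vec vectors, because it ignores vector length and compares only direction.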

(7) Optimization algorithm

(1). Regularization

  • 1). L1 regularization
  • 2). L2 regularization
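
Both kinds of regularization add a penalty on the weights to the training loss: L1 (sum of |w|) encourages exact zeros and hence sparsity, while L2 (sum of w²) shrinks all weights smoothly. A sketch of the penalized objective (the error and weight values are illustrative):

```python
def penalized_loss(errors, weights, l1=0.0, l2=0.0):
    """Mean squared loss plus optional L1 and L2 weight penalties."""
    mse = sum(e * e for e in errors) / len(errors)
    return (mse
            + l1 * sum(abs(w) for w in weights)
            + l2 * sum(w * w for w in weights))

weights = [3.0, -0.5, 0.0]
print(round(penalized_loss([1.0, -1.0], weights), 3))          # 1.0: no penalty
print(round(penalized_loss([1.0, -1.0], weights, l1=0.1), 3))  # 1.35
print(round(penalized_loss([1.0, -1.0], weights, l2=0.1), 3))  # 1.925
```

Note how the L2 penalty is dominated by the single large weight (3.0² = 9), which is why L2 punishes large weights much harder than small ones.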

(8) Deep learning algorithm

(1). BP (Backpropagation)
(2). CNN (Convolutional Neural Network)
(3). DNN (Deep Neural Network)
(4). RNN (Recurrent Neural Network)
(5). LSTM (Long Short-Term Memory)
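
BP (error backpropagation) is the chain rule applied layer by layer: the output error is propagated backwards to compute each weight's gradient. A minimal 2-2-1 network trained on XOR with sigmoid activations; hyperparameters and seed are illustrative, and convergence on XOR depends on the random initialization:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(epochs=5000, lr=1.0, seed=1):
    """2-2-1 network, squared loss, trained by backpropagation."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    b1 = [0.0, 0.0]
    w2 = [rng.uniform(-1, 1) for _ in range(2)]
    b2 = 0.0
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    for _ in range(epochs):
        for x, t in data:
            # forward pass
            h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
                 for j in range(2)]
            o = sigmoid(w2[0] * h[0] + w2[1] * h[1] + b2)
            # backward pass: chain rule, using sigmoid' = s * (1 - s)
            d_o = (o - t) * o * (1 - o)
            d_h = [d_o * w2[j] * h[j] * (1 - h[j]) for j in range(2)]
            for j in range(2):
                w2[j] -= lr * d_o * h[j]
                w1[j][0] -= lr * d_h[j] * x[0]
                w1[j][1] -= lr * d_h[j] * x[1]
                b1[j] -= lr * d_h[j]
            b2 -= lr * d_o
    def predict(x):
        h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
             for j in range(2)]
        return sigmoid(w2[0] * h[0] + w2[1] * h[1] + b2)
    return predict

predict = train_xor()
print([round(predict(x)) for x in ([0, 0], [0, 1], [1, 0], [1, 1])])
```

CNN, RNN, and LSTM are all trained with the same backpropagation machinery; they differ only in the network architecture the chain rule is applied to.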

 

 

2. Modeling

(1) Model optimization

  • (1). Feature selection
  • (2). Gradient descent, coordinate descent, Newton and quasi-Newton methods, convex optimization
  • (3). Cross validation
  • (4). Parameter tuning
  • (5). Model evaluation: accuracy, recall, F1, AUC, ROC, loss function
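
Precision, recall, and F1 follow directly from the confusion-matrix counts. A sketch (toy labels assumed; "accuracy" in the list above is often precision in this context):

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN),
    F1 = harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(precision_recall_f1(y_true, y_pred))  # (2/3, 2/3, 2/3)
```

ROC and AUC extend this by sweeping the decision threshold and plotting the true-positive rate against the false-positive rate.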

(2) Data preprocessing

  • (1). Standardization
  • (2). Outlier handling
  • (3). Binarization and binning (discretization)
  • (4). Missing value filling: mean, median, or constant-value imputation, and multiple imputation

Deep learning networks need less data preprocessing:
    ① Missing value handling: required for machine learning and deep learning alike.
    ② Centering/standardization (computed per column), normalization (computed per row), and whitening (SVD dimensionality reduction followed by standardization).
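
Mean imputation and z-score standardization, written out for a single column (missing values represented as None for illustration):

```python
def impute_mean(column):
    """Replace None (missing) entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def standardize(column):
    """Z-score standardization: subtract the mean, divide by the
    (population) standard deviation, so the result has mean 0."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / std for v in column]

col = [1.0, None, 3.0]
print(impute_mean(col))               # [1.0, 2.0, 3.0]
print(standardize(impute_mean(col)))  # zero-mean, unit-variance values
```

Standardization operates column-wise (per feature), exactly as the note above describes; normalization, by contrast, rescales each row (sample) to unit norm.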

 

Origin blog.csdn.net/weixin_45316122/article/details/108470256