00- Summary of Machine Learning Algorithms (Algorithms)

Algorithm focus

Algorithm classification and summary:

  1. Clustering: unsupervised learning; the result is several sets whose elements are highly similar to one another;
  2. Classification: supervised learning; the result is one or more functions that divide the data into several sets; the predicted values are discrete;
  3. Regression: supervised learning; the result is one or more functions that produce continuous outputs; the predicted values are continuous;

There are three ensemble approaches (bagging, Boosting, Stacking). Common ensemble algorithms:

  • Random Forest (bagging + decision tree): sample the training set with replacement both row-wise (random samples) and column-wise (random features) to obtain n new training sets, train n decision trees on them, and let the n trees vote to determine the classification result. The main parameters are n_estimators and max_features.
  • AdaBoost (adaptive boosting: boosting + single-layer decision trees): each sample in the training data is assigned a weight, and these weights form the vector D; initially they are all equal. First, a weak classifier is trained on the training data and its error rate is calculated; then another classifier is trained on the same data set, increasing the weights of the samples the previous classifier misclassified. This repeats, producing many classifiers that vote with weights, where each classifier's weight is computed from its error rate.

  • GBDT (Gradient Boosting Decision Tree: boosting + decision trees): GBDT is similar to AdaBoost. Multiple decision trees are trained iteratively, and each round the training targets are updated along the direction of the negative gradient of the loss function. The process is more involved, so it is not detailed here. Parameters: n_estimators is the number of weak classifiers; max_depth or max_leaf_nodes controls the size of each tree; learning_rate is a hyper-parameter in the range (0, 1.0] used to balance overfitting and underfitting.

    • The decision tree has a regression counterpart, the regression tree, and GBDT likewise has a regression counterpart, GBRT.

  • Design your own: following the bagging or boosting idea, choose weak classifiers to ensemble, e.g., a KNN-based ensemble (see the sketch below).
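A minimal scikit-learn sketch of these ensembles (the data set and parameter values here are illustrative assumptions, not from the original post):

from sklearn.datasets import load_iris
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier, BaggingClassifier)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# bagging + decision tree: random forest, controlled by n_estimators / max_features
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt")
# boosting + decision stumps: AdaBoost re-weights misclassified samples each round
ada = AdaBoostClassifier(n_estimators=50)
# boosting along the negative gradient of the loss: GBDT
gbdt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
# "design your own": bagging over KNN weak learners
knn_bag = BaggingClassifier(KNeighborsClassifier(), n_estimators=10)

for name, clf in [("RF", rf), ("AdaBoost", ada), ("GBDT", gbdt), ("KNN bagging", knn_bag)]:
    print(name, clf.fit(X_train, y_train).score(X_test, y_test))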

Recall and Precision:

  • Recall: of all actual positive examples in the sample, the proportion that are predicted correctly ("find them all").  # among the truly positive samples, the fraction that the prediction also marks positive

    Purpose: used to evaluate how completely the detector covers all targets to be detected

  • Precision: of the samples predicted positive, the proportion that are truly positive ("find them correctly"); i.e., the proportion of true positives among the positive predictions.

    Purpose: used to evaluate how accurate the detector's detections are
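In confusion-matrix terms, Recall = TP / (TP + FN) and Precision = TP / (TP + FP). A small sketch with scikit-learn (the label vectors are made-up examples):

from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # actual labels
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]   # predicted labels

# Recall = TP / (TP + FN): how many of the actual positives were found
print(recall_score(y_true, y_pred))     # 3 of 4 positives found -> 0.75
# Precision = TP / (TP + FP): how many predicted positives are correct
print(precision_score(y_true, y_pred))  # 3 of 4 predicted positives correct -> 0.75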

You can refer to the blog post "Analysis and code of eight common regression algorithms".


1. Clustering Algorithms

A clustering algorithm is an unsupervised learning algorithm: there is no labeled data with which to train the model. Instead, the algorithm is used to discover hidden connections and differences among the data. After clustering, several sets are formed, and the elements within each set are highly similar. Similarity can be measured by Euclidean distance, probabilistic distance, or weighted distance. A minimal K-means sketch follows the list below.

Common clustering algorithms are:

1. Partition-based clustering: K-means algorithm, k-medoids algorithm, K-prototypes algorithm, CLARANS algorithm

2. Hierarchical clustering: BIRCH algorithm, CURE algorithm

3. Density-based clustering: DBSCAN algorithm, OPTICS algorithm, DENCLUE algorithm

4. Grid-based clustering: STING algorithm, CLIQUE algorithm, WaveCluster algorithm

5. Hybrid clustering: Gaussian mixture model (clustering for normally distributed data), CLIQUE algorithm (combining density-based and grid-based ideas)
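As mentioned above, a minimal K-means sketch (the cluster count and sample data are illustrative assumptions):

import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: two obvious groups in 2-D
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])

# K-means partitions the data into k sets by minimizing the
# within-cluster Euclidean distance to the centroids
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each sample
print(kmeans.cluster_centers_)  # the learned centroids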


2. Classification Algorithms

Classification algorithms require training samples to be fed to the model first; a function or model describing that class of data is extracted from the training samples. The model is then used to predict and classify other data. A classification algorithm is a supervised learning algorithm that models or predicts discrete random variables and produces discrete results, e.g., judging whether a patient has cancer in medical diagnosis, or rating customers in a lending process. A minimal sketch follows the list below.

Common classification algorithms:

1. Decision trees: ID3, C4.5 (C5.0), CART, PUBLIC, SLIQ, SPRINT algorithms;

2. Neural networks: BP network, radial basis function (RBF) network, Hopfield network, stochastic neural network (Boltzmann machine), competitive neural networks (Hamming network, self-organizing map);

3. Bayesian: Naive Bayes algorithm, TAN algorithm;

4. Classification based on association rules: CBA algorithm, ADT algorithm, CMAR algorithm, ARCS algorithm;

5. Hybrid classification method: Bagging algorithm, Boosting algorithm

6. Support Vector Machine: SVM

7. K-nearest neighbors (K-Nearest Neighbor, KNN)
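A minimal sketch of two of these classifiers in scikit-learn (the data set and parameters are illustrative assumptions):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Binary labels (e.g., the cancer-diagnosis example above)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)  # CART-style tree
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

print("decision tree:", tree.score(X_test, y_test))  # discrete class predictions
print("KNN:", knn.score(X_test, y_test))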

3. Regression Algorithms

Like classification, regression is a supervised learning algorithm, so training samples must also be fed to the model first. The difference is that a regression algorithm predicts and models numerical, continuous random variables, and its results are numerical values.

For example, feed a person's data into a trained regression model to estimate that person's economic status 20 years later. The model's regression output is continuous, and a regression curve is often obtained: as the independent variable changes, the dependent variable changes continuously. A minimal sketch follows the list below.

Common regression algorithms:

1. Linear regression / logistic regression (LogisticRegression) / polynomial regression: LR algorithm, LWLR algorithm (locally weighted), LRCV algorithm (cross-validated), MLP algorithm (neural network); (credit-card anti-fraud project: logistic regression)

2. Stepwise regression;

3. Ridge regression;

4. LASSO regression;

5. ElasticNet regression;
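A minimal sketch of linear, ridge, and LASSO regression in scikit-learn (the synthetic data and alpha values are illustrative assumptions):

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)  # continuous target

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X, y)
    # continuous outputs; Lasso may shrink uninformative coefficients to 0
    print(type(model).__name__, model.coef_)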

4. Data Optimization and Preprocessing Methods

  • Normalization: reduces the impact of differing orders of magnitude on prediction, mainly by bringing attributes of different scales down to a common order of magnitude.
    • Min-max normalization: the advantage is that all values are mapped into [0, 1]; the disadvantage is that it is strongly affected by outliers.
    • Zero-mean standardization: the processed data follow a standard normal distribution, i.e., mean 0 and standard deviation 1, with both positive and negative values.
  • Dimensionality reduction: these algorithms try to analyze the internal structure of the data; they are unsupervised methods that attempt to summarize or explain the data with less information. Such algorithms can be used to visualize high-dimensional data or to simplify data for supervised learning.
    • Common algorithms include: Principal Component Analysis (PCA), Partial Least Squares Regression (PLS), Sammon mapping, Multi-Dimensional Scaling (MDS), Projection Pursuit, etc.
  • Polynomial regression: increases the dimensionality of the data and prevents underfitting when data are insufficient. Typically, existing features are multiplied together, or squared, to enlarge the feature set (see the sketch after this list).

    • Feature dimension enhancement using PolynomialFeatures
    • Higher powers of a feature itself: $w_{0}x_{0}+w_{1}x_{1}+w_{2}x_{2}+w_{3}x_{1}^{2}+w_{4}x_{2}^{2}$
    • Cross terms between features: $w_{0}x_{0}+w_{1}x_{1}+w_{2}x_{2}+w_{3}x_{1}^{2}+w_{4}x_{2}^{2}+w_{5}x_{1}^{2}x_{2}+w_{6}x_{1}x_{2}^{2}$
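A minimal sketch of these preprocessing steps (the sample data are illustrative assumptions):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, PolynomialFeatures
from sklearn.decomposition import PCA

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 500.0]])

print(MinMaxScaler().fit_transform(X))       # min-max: every value mapped into [0, 1]
print(StandardScaler().fit_transform(X))     # z-score: mean 0, standard deviation 1
print(PCA(n_components=1).fit_transform(X))  # dimensionality reduction to 1 component

# PolynomialFeatures(degree=2) adds squares and cross terms:
# columns [1, x1, x2, x1^2, x1*x2, x2^2]
print(PolynomialFeatures(degree=2).fit_transform(X))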

5. Algorithm Ensembles

There are three ensemble approaches (bagging, Boosting, Stacking):

  • 1. bagging: train multiple classifiers and average their results (the most typical is the random forest)
    • Random: data sampling is random, and feature selection is random
    • Forest: many trees
    • Advantages of random forests:
      • Can handle very high-dimensional data (many features) without feature selection
      • After training, can report which features are more important (by measuring how much the result degrades when a feature is replaced with noise)
      • Easy to parallelize, so training is relatively fast
      • Can be visualized, which makes analysis easy
  • 2. Boosting: start from weak learners (weak classifiers) and train with re-weighting. First build one tree and compute the residual; each newly added tree reduces the residual relative to the previous trees. (Typical representatives: AdaBoost, XGBoost)
    • AdaBoost: in the first round every sample has the same weight; if a sample is misclassified, its weight is increased in the next round. (handwritten-digit recognition project)
    • XGBoost: an efficient implementation of the GBDT algorithm with both algorithmic and engineering improvements. (JD.com purchase-intention project)
    • LightGBM: also an improvement on the GBDT algorithm; compared with GBDT and XGBoost, LightGBM effectively handles massive data. (Tmall repurchase project)

      • The basic idea of GBDT is to use the residual of the previous training round as the input for the next round's learner; each round's input depends on the previous round's output, so this iterative training must traverse the entire data set many times. When the data set has many samples or very high dimensionality, this increases running time and memory consumption.

        As an improvement on GBDT, the XGBoost algorithm uses a pre-sorting idea to find the best split point within each feature during training. This approach also consumes a lot of memory: the algorithm must store not only the feature values but also the results of sorting the features. Moreover, computing the split gain while traversing every candidate split point is computationally expensive, and when the data volume is large this method takes too much time.

        To address these problems, Microsoft proposed the LightGBM algorithm (Light Gradient Boosting Machine) in 2017, also an improvement on GBDT. Compared with GBDT and XGBoost, LightGBM handles massive data effectively and has achieved excellent results in practical applications. Its main features include: the histogram algorithm (finding the best split point, with histogram-subtraction acceleration), a leaf-wise tree-growth strategy, GOSS, EFB, native support for categorical features, efficient parallelism, and cache hit-rate optimization.

    • The end result: each tree's weight is determined by its accuracy
  • 3. Stacking: aggregate multiple classification or regression models (see the sketch after this list)
    • Train a new model on the outputs of the previous models
    • Staged: the first stage produces the individual results; the second stage trains on the first stage's outputs
    • A wide variety of classifiers can be stacked (KNN, SVM, RF, etc.)
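A minimal two-stage stacking sketch with scikit-learn's StackingClassifier (the choice of base models and final estimator is an illustrative assumption):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Stage 1: heterogeneous base models (KNN, SVM, RF) each produce predictions
estimators = [("knn", KNeighborsClassifier()),
              ("svm", SVC(probability=True)),
              ("rf", RandomForestClassifier(n_estimators=50))]

# Stage 2: a meta-model is trained on the stage-1 outputs
stack = StackingClassifier(estimators=estimators,
                           final_estimator=LogisticRegression())
print(stack.fit(X, y).score(X, y))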

6. Natural Language Processing Algorithms

  • Probabilistic graphical models: can be divided into directed graphical models and undirected graphical models.
    • Directed graphical models (also known as Bayesian networks), e.g., the Hidden Markov Model (HMM)
    • Undirected graphical models (also known as Markov networks), e.g., Conditional Random Fields (CRF)
  • Naive Bayes: can be regarded as a special case of a Bayesian network in which the network has no edges and every node is independent (it rests on the independence assumption); a minimal sketch follows this list.
  • Bayesian networks: so, when Naive Bayes's independence assumption does not hold, what can be done? A Bayesian network can be used instead.
    • Hidden Markov: the HMM is a kind of Bayesian network, even though its name shares "Markov" with "Markov network".
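A minimal Naive Bayes text-classification sketch (the toy corpus and labels are illustrative assumptions); it relies on the independence assumption described above:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["cheap pills buy now", "meeting at noon", "buy cheap now", "lunch meeting today"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

vec = CountVectorizer()
X = vec.fit_transform(docs)           # bag-of-words features
clf = MultinomialNB().fit(X, labels)  # treats word features as conditionally independent
print(clf.predict(vec.transform(["cheap pills meeting"])))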

7. Best-Parameter Search

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

param_grid = {'C': [0.01, 0.1, 1, 10, 100, 1000], 'penalty': ['l1', 'l2']}
# Model: LogisticRegression (the liblinear solver supports both l1 and l2 penalties);
# param_grid gives the parameter combinations; cv=10 means 10-fold cross-validation
grid_search = GridSearchCV(LogisticRegression(solver='liblinear'), param_grid, cv=10)
grid_search.fit(X_train, y_train)  # fit the search on the training set
print(grid_search.best_params_, grid_search.best_score_)  # best combination found


Source: blog.csdn.net/March_A/article/details/129036253