Classification and main algorithms of machine learning

Machine learning is undoubtedly a hot topic in data analysis today, and many people use machine learning algorithms to some degree in their daily work. In scope, machine learning is close to pattern recognition, statistical learning, and data mining; combined with processing techniques from other fields, it has also formed interdisciplinary subjects such as computer vision, speech recognition, and natural language processing. Broadly speaking, then, data mining can be regarded as roughly equivalent to machine learning, and the machine learning applications we usually talk about should be general-purpose: not limited to structured data, but covering images, audio, and other data as well.

The four most important types of problems in machine learning:
Prediction: regression models
Clustering: k-Means
Classification: support vector machines
Dimensionality reduction: principal component analysis

There are many machine learning algorithms, and people are often confused by them: many algorithms belong to the same family, and some are extensions of others. Here we introduce them from two angles: the first is the way an algorithm learns, and the second is the similarity of algorithms in form and function.

1. Machine learning methods

Depending on the type of data, there are different ways to model a problem. In the field of machine learning or artificial intelligence, the first consideration is how an algorithm learns. Classifying algorithms by learning style is a good idea: it lets people choose the most suitable algorithm for the input data when modeling, in order to obtain the best results.

1.1 Supervised learning

  In supervised learning, the input data is called "training data", and each training example carries a clear label or result, such as "spam" vs. "not spam" in an anti-spam system, or "1", "2", "3", "4" in handwritten digit recognition. When building a predictive model, supervised learning sets up a learning process that compares the model's predictions with the actual labels of the training data and keeps adjusting the model until its predictions reach an expected accuracy. Common application scenarios are classification and regression problems; common algorithms include Logistic Regression and the Back Propagation Neural Network.

1.2 Unsupervised learning

In unsupervised learning, the data carries no labels, and the learning model infers some internal structure of the data. Common application scenarios include the learning of association rules and clustering. Common algorithms include the Apriori algorithm and the k-Means algorithm.
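
For example, here is a minimal k-Means sketch using scikit-learn (assuming scikit-learn and NumPy are installed; the unlabeled points are fabricated for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two made-up blobs of 2-D points, with no labels attached.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(3, 0.5, (50, 2))])

# k-Means infers structure by itself: it groups the points into 2 clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # the two inferred cluster centers
print(kmeans.labels_[:5])        # cluster assignments of the first points
```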

A typical machine learning workflow, sketched in code below:
  1. Data preparation: extract features and form feature vectors
  2. Model training
  3. Model testing
  4. Prediction on new data
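
As a concrete sketch of these four steps (assuming scikit-learn is installed; its bundled handwritten-digits dataset stands in for "data preparation"):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1. Data preparation: each 8x8 digit image is flattened into a feature vector.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2. Model training on the labeled training data.
model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# 3. Model testing: compare predictions against held-out labels.
print("test accuracy:", model.score(X_test, y_test))

# 4. Data prediction on new, unseen samples.
print("predicted digits:", model.predict(X_test[:5]))
```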

1.3 Semi-supervised learning

  In this learning style, part of the input data is labeled and part is not. Such a model can be used for prediction, but it must first learn the internal structure of the data in order to organize the data reasonably. Application scenarios include classification and regression. Algorithms include extensions of common supervised learning algorithms, which first try to model the unlabeled data and then make predictions for the labeled data on that basis, for example graph inference algorithms and the Laplacian support vector machine (Laplacian SVM).
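
A minimal sketch of this idea with scikit-learn's graph-based LabelSpreading (one member of the graph-inference family mentioned above, not the Laplacian SVM itself; unlabeled samples are marked with -1):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)

# Pretend most labels are unknown: -1 marks an unlabeled sample.
rng = np.random.RandomState(0)
y_partial = np.copy(y)
y_partial[rng.rand(len(y)) < 0.7] = -1

# The model first exploits the structure of ALL points (labeled or not),
# then propagates the few known labels through that structure.
model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)
print("accuracy against the true labels:", model.score(X, y))
```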

1.4 Reinforcement learning
In this learning mode, the input data serves as feedback to the model. Unlike in supervised models, where the input data is only used to check whether the model is right or wrong, in reinforcement learning the input data feeds straight back into the model, which must adjust immediately. Common application scenarios include dynamic systems and robot control. Common algorithms include Q-Learning and temporal difference learning.
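
A tiny tabular Q-Learning sketch on a made-up five-state corridor (reward only at the right end) shows this feedback loop:

```python
import numpy as np

n_states, n_actions = 5, 2        # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.RandomState(0)

for episode in range(200):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy: mostly exploit Q, sometimes explore.
        a = rng.randint(n_actions) if rng.rand() < epsilon else int(Q[s].argmax())
        s_next = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        # The environment's feedback (r, s_next) immediately adjusts the model.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1)[:-1])  # policy for non-terminal states: 1 = move right
```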

2. Commonly used algorithms for machine learning

According to the similarity of algorithms in function and form, we can classify them, for example as tree-based algorithms, neural-network-based algorithms, and so on. Of course, the scope of machine learning is very large, and some algorithms are difficult to place clearly in one category, while some categories contain algorithms aimed at different types of problems. Here, we try to classify the commonly used algorithms in the easiest way to understand.

2.1 Regression algorithm (supervised learning)

  In most machine learning courses, regression is the first algorithm introduced, for two reasons. First, the regression algorithm is relatively simple, and introducing it lets people migrate smoothly from statistics to machine learning. Second, the regression algorithm is the cornerstone of the powerful algorithms that follow: without understanding regression, you cannot learn them. There are two important subcategories of regression algorithms: linear regression and logistic regression.

Regression algorithms are a class of algorithms that try to explore the relationships between variables using a measure of error, and they are a powerful tool in statistical machine learning. In the field of machine learning, when people talk about regression, sometimes they mean a class of problems and sometimes a class of algorithms, which often confuses beginners. Common regression algorithms include: Ordinary Least Squares, Logistic Regression, Stepwise Regression, Multivariate Adaptive Regression Splines (MARS), and Locally Estimated Scatterplot Smoothing (LOESS).

Linear regression asks: how do we fit a straight line that best matches the data? The "least squares" method is generally used. Its idea is this: assume the fitted line represents the true values of the data, and the observations are values contaminated by error. To minimize the influence of the errors, we look for the line that minimizes the sum of squared errors. The least squares method thus turns the fitting problem into the problem of finding the extremum of a function. In mathematics an extremum is generally found by setting the derivative to zero, but this approach is not always suitable for computers: it may have no solution in that form, or it may be too expensive to compute.
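
A minimal least-squares sketch with NumPy (assuming NumPy is installed; the noisy observations are fabricated around a known true line):

```python
import numpy as np

# Made-up noisy observations around the true line y = 2x + 1.
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 1, 50)

# Design matrix [x, 1] so the model is y = w*x + b.
A = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)  # minimizes the sum of squared errors
print(f"fitted line: y = {w:.2f}x + {b:.2f}")
```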

Logistic regression is an algorithm very similar to linear regression, but in essence the type of problem it handles is different. Linear regression deals with numeric problems: its final prediction is a number, such as a house price. Logistic regression is a classification algorithm: its prediction is a discrete class, such as whether an email is spam, or whether a user will click on an advertisement.

In terms of implementation, logistic regression simply applies a Sigmoid function to the output of linear regression, converting the numeric result into a probability between 0 and 1. (The Sigmoid curve itself is not especially intuitive; you only need to know that the larger the input value, the closer the function is to 1, and the smaller the input, the closer it is to 0.) We can then make predictions from this probability: for example, if the probability is greater than 0.5, the email is classified as spam, or the tumor as malignant, and so on. Intuitively, logistic regression draws a classification boundary, and the boundary drawn by the basic algorithm is linear (there are also variants of logistic regression that draw nonlinear boundaries, but such models are very inefficient on large amounts of data).
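
The conversion is easy to see in code; a small sketch with made-up weights shows how a linear score becomes a probability and then a class:

```python
import numpy as np

def sigmoid(z):
    # Large z -> output close to 1; very negative z -> output close to 0.
    return 1.0 / (1.0 + np.exp(-z))

# Made-up linear-regression score for one email: w . x + b
score = np.dot([0.8, -1.5], [2.0, 0.3]) + 0.1   # = 1.25
p_spam = sigmoid(score)
print(p_spam)                                    # about 0.78
print("spam" if p_spam > 0.5 else "not spam")
```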

2.2 Regularization method

  Regularization methods are extensions of other algorithms (usually regression algorithms) that adjust the algorithm according to the complexity of the model. Regularization methods usually reward simple models and penalize complex ones. Common algorithms include: Ridge Regression, the Least Absolute Shrinkage and Selection Operator (LASSO), and the Elastic Net.
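
A brief sketch comparing the three in scikit-learn (data fabricated so that one feature is nearly redundant; note how Lasso tends to drive a redundant coefficient toward zero):

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=100)  # feature 2 nearly duplicates feature 0
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(size=100)

# Each model penalizes complexity differently and yields different coefficients.
for model in (Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))
```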

2.3 Instance-based algorithm

  Instance-based algorithms are often used to model decision problems. Such models typically take a batch of sample data and then compare new data with the samples using some similarity measure to find the best match. For this reason, instance-based algorithms are also called "winner-take-all" learning or "memory-based learning". Common algorithms include k-Nearest Neighbors (kNN), Learning Vector Quantization (LVQ), and the Self-Organizing Map (SOM).
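
A minimal "winner takes all" sketch with scikit-learn's kNN (the fruit samples are invented for the example):

```python
from sklearn.neighbors import KNeighborsClassifier

# Made-up samples: [weight_g, size_cm] of fruit, labeled apple or melon.
X = [[150, 7], [160, 7.5], [170, 8], [1200, 20], [1100, 19], [1300, 21]]
y = ["apple", "apple", "apple", "melon", "melon", "melon"]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
# New data is compared with the stored samples; the nearest neighbors win.
print(knn.predict([[165, 7.8], [1000, 18]]))  # ['apple' 'melon']
```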

2.4 Decision tree algorithm

The decision tree algorithm uses a tree structure to build a decision model from the attributes of the data; decision tree models are often used to solve classification and regression problems. Common algorithms include: Classification And Regression Trees (CART), ID3 (Iterative Dichotomiser 3), C4.5, Chi-squared Automatic Interaction Detection (CHAID), Decision Stump, Random Forest, Multivariate Adaptive Regression Splines (MARS), and the Gradient Boosting Machine (GBM).

A typical machine learning model considers at least two quantities: one is the dependent variable, the result we hope to predict, in this example whether Xiao Y is late; the other is the independent variable, the quantity used to predict whether Xiao Y is late. Suppose I take the day of the week as an independent variable: for example, I notice that the days Xiao Y is late are basically all Fridays, and that he is basically never late on other days. So I can build a model that relates the probability of Xiao Y being late to whether the day is Friday. See the picture below:
[Figure: a decision tree predicting whether Xiao Y will be late based on whether it is Friday]
  This is the simplest machine learning model: a decision tree.

When we consider only one independent variable, the situation is simple. Now suppose we add another independent variable: for example, Xiao Y is sometimes late when he drives over (you can take it that his driving is bad, or that the roads are congested). I can associate this information and build a more complex model with two independent variables and one dependent variable. To make it more complex still, Xiao Y's lateness is also related to the weather: when it rains, I need to consider a third independent variable.

If I want to predict the specific number of minutes Xiao Y will be late, I can build a model relating his lateness to the amount of rainfall and the independent variables considered earlier. The model can then predict a value, for example how many minutes late he is likely to be, which helps me plan when to go out. In this case a decision tree cannot serve well, because a basic decision tree only predicts discrete values; we can use linear regression to build this model instead.

If I hand these model-building steps to the computer, that is, input all the independent and dependent variables, let the computer generate a model for me, and let it suggest, based on the current situation, whether I should leave later and by how many minutes, then the process by which the computer makes these auxiliary decisions is the process of machine learning.
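
As a concrete sketch of the lateness example with scikit-learn (the observation rows are invented to mirror the story above):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up observations: [is_friday, is_raining] -> late (1) or on time (0).
X = [[1, 0], [1, 1], [1, 1], [0, 0], [0, 1], [0, 0], [1, 0], [0, 0]]
y = [1, 1, 1, 0, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["is_friday", "is_raining"]))
print(tree.predict([[1, 0]]))   # Friday, no rain -> predicted late
```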

2.5 Bayesian method

  Bayesian algorithms are a class of algorithms based on Bayes' theorem, mainly used to solve classification and regression problems. Common algorithms include: the Naive Bayes algorithm, Averaged One-Dependence Estimators (AODE), and Bayesian Belief Networks (BBN).
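
A short Naive Bayes sketch with scikit-learn (the two-feature data is fabricated for illustration):

```python
from sklearn.naive_bayes import GaussianNB

# Made-up measurements: [height_cm, weight_kg], labeled with two classes.
X = [[180, 80], [175, 77], [170, 72], [155, 50], [160, 54], [158, 52]]
y = [1, 1, 1, 0, 0, 0]

nb = GaussianNB().fit(X, y)
print(nb.predict([[172, 70]]))        # class chosen via Bayes' theorem
print(nb.predict_proba([[172, 70]]))  # posterior probability of each class
```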

2.6 Kernel-based algorithms (supervised learning)

  The most famous kernel-based algorithm is the support vector machine (SVM). Kernel-based algorithms map the input data into a higher-dimensional vector space in which some classification or regression problems can be solved more easily. Common kernel-based algorithms include: the Support Vector Machine (SVM), Radial Basis Function networks (RBF), and Linear Discriminant Analysis (LDA). Next, we focus on SVM.

Support vector machine (SVM)
  The support vector machine algorithm is a classic algorithm born in the field of statistical learning that also shines in the field of machine learning.
  In a sense, the support vector machine algorithm is an enhancement of logistic regression: by imposing stricter optimization conditions on it, the support vector machine can obtain a better classification boundary than logistic regression. Without a certain function technique, however, the support vector machine algorithm is at best a better linear classification technique.

However, combined with a Gaussian "kernel", the support vector machine can express very complex classification boundaries and thus achieve good classification results. A "kernel" is in fact a special kind of function, whose most typical feature is that it can map a low-dimensional space into a high-dimensional one.

The SVM method uses a nonlinear mapping to send the sample space into a high-dimensional or even infinite-dimensional feature space (a Hilbert space), so that a problem that is not linearly separable in the original sample space becomes linearly separable in the feature space. In general, raising the dimension increases computational complexity and can even cause the "curse of dimensionality", so people rarely pursue it for its own sake. For classification and regression problems, however, it is very likely that a sample set which cannot be handled linearly in the low-dimensional space can be separated (or regressed) by a linear hyperplane in the high-dimensional feature space. The SVM method solves the resulting difficulty cleverly: by applying the expansion theorem of kernel functions, there is no need to know the explicit form of the nonlinear mapping, and because the linear learning machine is built in the high-dimensional feature space, it hardly increases the computational complexity compared with a linear model while avoiding the "curse of dimensionality" to some extent. All of this is due to the expansion and computation theory of kernel functions.

Different kernel functions can be selected to produce different SVMs. Commonly used kernel functions include:
  - Linear kernel: K(x, y) = x · y
  - Polynomial kernel: K(x, y) = [(x · y) + 1]^d
  - Radial basis function kernel: K(x, y) = exp(-|x - y|² / d²)
  - Two-layer neural network kernel: K(x, y) = tanh(a(x · y) + b)

As shown in the figure below, how would we draw a circular classification boundary on a two-dimensional plane? Doing so directly in two dimensions may be difficult, but through the "kernel" we can map the two-dimensional space into three dimensions and then separate the data with a linear plane. In other words, a nonlinear classification boundary in the two-dimensional plane is equivalent to a linear classification boundary in three dimensions: by performing a simple linear division in three-dimensional space, we achieve the effect of a nonlinear division in the two-dimensional plane.
[Figure: classified data whose circular boundary in two dimensions becomes linearly separable after mapping to three dimensions]
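
A brief sketch of exactly this situation using scikit-learn (assuming it is installed): make_circles generates such a circular boundary, which a linear SVM cannot separate but an RBF-kernel SVM can.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Points on two concentric circles: not linearly separable in 2-D.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)   # Gaussian kernel: implicit high-dim mapping

print("linear kernel accuracy:", linear_svm.score(X, y))  # near chance level
print("rbf kernel accuracy:", rbf_svm.score(X, y))        # near 1.0
```
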
  The support vector machine is a machine learning algorithm with a strong mathematical flavor (by contrast, neural networks have a biological flavor). A core step of the algorithm proves that mapping the data from low dimensions to high dimensions does not increase the final computational complexity. So the support vector machine algorithm both maintains computational efficiency and obtains very good classification results. For this reason, the support vector machine occupied the central position in machine learning from the late 1990s and largely displaced neural network algorithms; only now that neural networks have re-emerged through deep learning has the balance between the two subtly shifted.

2.7 Clustering algorithm (unsupervised learning)

  Clustering, like regression, sometimes describes a class of problems and sometimes a class of algorithms. Clustering algorithms usually group the input data around center points or in a hierarchical way. Simply put, a clustering algorithm computes distances within the population and divides the data into groups according to those distances. All clustering algorithms try to find the internal structure of the data so that it can be grouped by its greatest common features. Common clustering algorithms include the k-Means algorithm and Expectation Maximization (EM).
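
For the EM side, scikit-learn's GaussianMixture fits cluster centers and shapes by Expectation Maximization; a minimal sketch on fabricated data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two made-up groups of 2-D points.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),
               rng.normal(5, 1, (100, 2))])

# EM alternates: assign points to components, then re-estimate the components.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(np.round(gmm.means_, 1))   # estimated centers, near (0,0) and (5,5)
print(gmm.predict(X[:3]))        # cluster labels of the first points
```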

2.8 Association rule learning

  Association rule learning looks for the rules that best explain the relationships between variables in large, multivariate data sets, thereby finding useful association rules. Common algorithms include the Apriori algorithm and the Eclat algorithm.
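
Apriori itself takes more code, but its first step, counting the support of itemsets, can be sketched in a few lines of plain Python (the shopping baskets are invented):

```python
from collections import Counter
from itertools import combinations

# Made-up shopping baskets.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"milk", "bread", "eggs"},
]

# Support of an item pair = fraction of baskets containing both items.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common(3):
    print(pair, "support =", count / len(transactions))
```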

2.9 Artificial Neural Networks (ANN) (supervised learning)

Neural network (also known as artificial neural network, ANN) algorithms were very popular in machine learning in the 1980s, but declined in the mid-1990s. Now, riding the trend of "deep learning", neural networks have made a comeback and become one of the most powerful machine learning algorithms.

The birth of neural networks originated from research on the working mechanism of the brain. Early biologists used neural networks to simulate the brain; machine learning scholars then used them for machine learning experiments and found that they performed quite well on visual and speech recognition. After the birth of the BP algorithm (a numerical algorithm that accelerates the training of neural networks), the development of neural networks entered a boom period. One of the inventors of the BP algorithm is Geoffrey Hinton, the machine learning figure introduced earlier.

Artificial neural networks are a huge branch of machine learning with hundreds of different algorithms, usually used to solve classification and regression problems (deep learning is one of these algorithms and is discussed separately below). Important artificial neural network algorithms include: the Perceptron Neural Network, Back Propagation, the Hopfield Network, the Self-Organizing Map (SOM), and Learning Vector Quantization (LVQ).

Specifically, what is the learning mechanism of a neural network? Simply put, decomposition and integration. In the famous Hubel-Wiesel experiment, scholars studied the visual analysis mechanism of cats in just this way.
  For example, a square is decomposed into four polylines that enter the next layer of visual processing, with each of four neurons handling one polyline. Each polyline is further decomposed into two straight lines, and each straight line into black and white faces. In this way a complex image becomes a large number of details entering the neurons; the neurons process them and then integrate the results, finally concluding that a square was seen. This is the mechanism of visual recognition in the brain, and also the mechanism by which neural networks work.

Let us look at the logical architecture of a simple neural network. The network is divided into an input layer, a hidden layer, and an output layer: the input layer receives the signal, the hidden layer decomposes and processes the data, and the final result is integrated in the output layer. Each circle in a layer represents a processing unit, which can be thought of as simulating a neuron; several processing units form a layer, and several layers form a network, a "neural network".
[Figure: logical architecture of a simple neural network with input, hidden, and output layers]
  The figure above describes a neural network with the most mature shallow structure (containing only a single layer of hidden neurons). The first layer is the input layer, the second the hidden layer, and the last the output layer. Neurons in adjacent layers are connected by directed edges, and each edge has its own weight. Each neuron is a computing unit: in a feed-forward neural network, every neuron except those in the input layer can be represented by a computing function f(), whose specific form you can define yourself. Perceptron-style computing neurons are the most common today, and if you know something about perceptrons this will be much easier to understand.
  The neuron's energy value can then be computed, and when it exceeds a certain threshold the neuron's state changes; a neuron has only two states, activated or inactivated. In practical artificial neural networks, a probabilistic formulation is generally used to indicate whether a neuron is activated: h(f) denotes the probability that the neuron's state changes, where f is the neuron's energy value, and the greater the energy value, the higher the probability of being in the activated state. At this point you have met a few basic neural network terms, written below in more standard notation: f() computes the activation value (the energy) of a neuron, and h(f), where h is the activation function, gives the neuron's activation state.

In a neural network, each processing unit is in fact a logistic regression model: it receives input from the layer above and passes the model's prediction as output to the layer below. By stacking this process, a neural network can perform very complex nonlinear classification.
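
A sketch of this stacking in NumPy: each unit computes a weighted sum and squashes it with a sigmoid, exactly a little logistic regression. The weights here are random stand-ins for values a training procedure would produce:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # hidden layer: 3 units, 2 inputs
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # output layer: 1 unit

x = np.array([0.5, -1.0])            # one input sample
hidden = sigmoid(W1 @ x + b1)        # each hidden unit = logistic regression on x
output = sigmoid(W2 @ hidden + b2)   # output unit = logistic regression on hidden
print(output)                        # probability-like prediction in (0, 1)
```
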
  The following figure demonstrates a well-known application of neural networks in image recognition: a program called LeNet, a neural network built on multiple hidden layers. LeNet can recognize a variety of handwritten digits with high recognition accuracy and good robustness.
[Figure: LeNet recognizing a handwritten input, with the outputs of three hidden layers shown on the left]
  The square at the lower right shows the image input to the computer, and the red word "answer" above the square shows the computer's output. The three vertical columns of images on the left show the outputs of the three hidden layers of the network; it can be seen that as the layers deepen, the details processed become lower-level, for example layer 3 is basically already processing the details of lines. The inventor of LeNet is Yann LeCun, the machine learning figure introduced earlier.

About 20 or 30 years ago, neural networks were a particularly hot direction in machine learning, but the field slowly faded out. In the 1990s, the development of neural networks entered a bottleneck period: despite the acceleration brought by the BP algorithm, the training process of a neural network was still very difficult. Therefore, in the late 1990s, the support vector machine (SVM) algorithm replaced neural networks.
  
  The reasons include the following aspects:
  1. They are prone to overtraining (overfitting), and the parameters are difficult to determine;
  2. Training is relatively slow, and with few layers (three or fewer) the results are no better than other methods;

So for about 20 years in between, neural networks received little attention; that period was basically dominated by SVM and Boosting algorithms. However, Hinton persisted and eventually (together with Bengio, Yann LeCun, and others) proposed a practical deep learning framework.

2.10 Deep learning

  Although the phrase "deep learning" sounds quite lofty, the concept is very simple: it is the traditional neural network developed to the point of having multiple hidden layers.

As mentioned above, neural networks lay dormant for a while after the 1990s, but Geoffrey Hinton, one of the inventors of the BP algorithm, never gave up on them. Once a neural network is extended to more than two hidden layers, its training becomes very slow, so its practicality had fallen below that of the support vector machine. In 2006, Geoffrey Hinton published an article in the journal Science demonstrating two points:

1. A neural network with multiple hidden layers has excellent feature learning ability, and the learned features characterize the data more essentially, which benefits visualization and classification;
2. The difficulty of training deep neural networks can be effectively overcome by "layer-by-layer initialization".

This discovery not only solved the computational difficulty of training neural networks, but also demonstrated the superiority of deep neural networks at learning. Since then, neural networks have once again become the mainstream, powerful learning technique in machine learning. Neural networks with multiple hidden layers are called deep neural networks, and learning research based on them is called deep learning.

Because of its importance, deep learning has received great attention on all fronts. Along the timeline, four landmark events are worth mentioning:

1. In June 2012, the New York Times disclosed the Google Brain project, co-led by Andrew Ng and Jeff Dean, an inventor of MapReduce. Using a parallel computing platform with 16,000 CPU cores, the project trained "deep neural network" machine learning models and achieved great success in speech recognition and image recognition. Andrew Ng is the machine learning figure introduced at the beginning of the article.
  2. In November 2012, Microsoft publicly demonstrated a fully automatic simultaneous interpretation system at an event in Tianjin, China: the speaker spoke in English while a computer automatically completed speech recognition, English-to-Chinese machine translation, and Chinese speech synthesis in one pass, with very smooth results. The key technology behind it is deep learning.
  3. In January 2013, at Baidu's annual meeting, founder and CEO Robin Li announced, in a high-profile way, the establishment of Baidu Research, whose first focus is deep learning; the Institute of Deep Learning (IDL) was set up for this purpose.
  4. In April 2013, MIT Technology Review listed deep learning first among its ten Breakthrough Technologies of 2013.

The development boom of deep learning

Hinton, a leader in neural network research, proposed the neural network Deep Learning algorithm in 2006, which greatly improved the capabilities of neural networks and challenged support vector machines. Deep Learning assumes the neural network is multi-layered: first, Restricted Boltzmann Machines (unsupervised learning) are used to learn the structure of the network, and then the weights of the network are learned through Back Propagation (supervised learning).

Deep learning algorithms are a development of artificial neural networks that has gained much attention recently; deep learning attempts to build much larger and more complex neural networks. Many deep learning algorithms are semi-supervised, used to process large data sets in which part of the data is unlabeled. Common deep learning algorithms include: the Restricted Boltzmann Machine (RBM), Deep Belief Networks (DBN), Convolutional Networks, and Stacked Auto-encoders.

In short, deep learning can obtain a better representation of the features of the data. At the same time, because the model has many layers and many parameters, and thus sufficient capacity, it can represent large-scale data. For problems whose features are not obvious (problems that used to require manually designed features and that have no intuitive physical meaning), such as images and speech, deep learning can therefore achieve better results on large-scale training data. In addition, from the pattern recognition perspective of features plus a classifier, the deep learning framework combines features and classifier into one framework and learns the features from data, reducing the huge workload of manual feature design (currently the area where industrial engineers spend the most effort). So it not only performs better, but is also much more convenient to use.

The similarities and differences between Deep Learning and traditional neural networks:

Similarities: Deep Learning uses a layered structure similar to that of neural networks: the system is a multilayer network consisting of an input layer, hidden layers (several of them), and an output layer. Only nodes in adjacent layers are connected, and nodes within the same layer are not connected to one another; each layer can be regarded as a Logistic Regression model. This layered structure is relatively close to the structure of the human brain.

Differences: in order to overcome the problems of neural network training, DL adopts a very different training mechanism. Traditional neural networks use Back Propagation: simply put, an iterative algorithm trains the whole network, with initial values set randomly; the current output of the network is computed, and the parameters of the earlier layers are changed according to the difference between the current output and the label, until convergence (overall, a gradient descent method). DeepLearning, in contrast, uses a layer-wise training mechanism. The reason is that with the back propagation mechanism alone, for a deep network (more than 7 layers), the residual propagated back to the frontmost layers becomes too small, and so-called Gradient Diffusion appears.
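
The loop just described, compute the output, compare it with the label, adjust the parameters, can be sketched for a single logistic unit (fabricated data; a full multi-layer back propagation only adds the chain rule per layer):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # made-up binary labels

w, b, lr = np.zeros(2), 0.0, 0.1
for step in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # current output of the unit
    error = p - y                           # difference from the labels
    w -= lr * X.T @ error / len(y)          # gradient descent on the weights
    b -= lr * error.mean()

print("training accuracy:", ((p > 0.5) == y).mean())
```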

2.11 Dimensionality reduction algorithms (unsupervised learning)

  Like clustering algorithms, dimensionality reduction algorithms try to analyze the internal structure of the data, but they do so in an unsupervised way, attempting to summarize or explain the data using less information. Such algorithms can be used to visualize high-dimensional data or to simplify data for supervised learning. Common algorithms include: Principal Component Analysis (PCA), Partial Least Squares Regression (PLS), Sammon mapping, Multi-Dimensional Scaling (MDS), and Projection Pursuit.

Their main feature is reducing the data from high dimensions to low dimensions. Here, dimension represents the number of features describing the data. For example, house price data might include the house's length, width, area, and number of rooms, that is, 4-dimensional data. The length and width clearly overlap with the information represented by the area, since area = length × width. Through a dimensionality reduction algorithm we can remove the redundant information and reduce the features to area and number of rooms, compressing the data from 4 dimensions to 2. Reducing data from high to low dimensions is not only convenient for presentation, it also brings speedups in computation.

The dimensions removed in the example just described were visible to the naked eye, and the compression lost no information (because the information was redundant). If the redundancy is not visible to the naked eye, or there are no redundant features, a dimensionality reduction algorithm still works, but some information will be lost. It can be proved mathematically, however, that dimensionality reduction preserves the information of the data to the greatest extent possible when compressing from high to low dimensions. So there are still many benefits to using dimensionality reduction algorithms.

The main function of dimensionality reduction algorithms is to compress data and to improve the efficiency of other machine learning algorithms: with a dimensionality reduction algorithm, data with thousands of features can be compressed to a handful of features. Another benefit is visualization of the data, for example compressing 5-dimensional data to 2 dimensions so it can be displayed in a two-dimensional plane. The main representative of dimensionality reduction algorithms is PCA, the Principal Component Analysis algorithm.
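
A short sketch of the house example above with scikit-learn's PCA, compressing the redundant 4-dimensional data to 2 dimensions (the house measurements are fabricated):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
length = rng.uniform(8, 15, 100)
width = rng.uniform(5, 10, 100)
rooms = rng.randint(1, 6, 100).astype(float)
area = length * width                        # redundant: determined by length and width

X = np.column_stack([length, width, area, rooms])   # 4-dimensional data

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)                  # compress 4 dimensions to 2
print(X_2d.shape)                            # (100, 2)
print(pca.explained_variance_ratio_.sum())   # share of variance retained
```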

2.12 Ensemble algorithms

  Ensemble algorithms train several relatively weak learning models independently on the same samples and then integrate their results to make an overall prediction. The main difficulties for an ensemble algorithm are which independent weak models to integrate and how to combine their learning results. This is a very powerful, and very popular, class of algorithms. Common algorithms include: Boosting, Bootstrapped Aggregation (Bagging), AdaBoost, Stacked Generalization (Blending), the Gradient Boosting Machine (GBM), and Random Forest.
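
A minimal sketch of bagging-style ensembling with scikit-learn's Random Forest, compared against a single tree on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

single = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Many weak-ish trees, trained independently, vote on each prediction.
print("single tree:", single.score(X_te, y_te))
print("random forest:", forest.score(X_te, y_te))
```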
