A brief introduction to machine learning classification

1. Classification from the perspective of machine learning problems

Let's start by classifying machine learning problems themselves. They can be divided into the following types.


Supervised learning

Much of machine learning falls under supervised learning. Put simply, in this kind of problem every sample in the given training set pairs an input x with a known result y. We train a model (mathematically, a mapping f: x→y) so that, given an unseen sample x', it can predict the result y'.

If the predicted result is a discrete value (often a category, such as spam versus normal mail in mail classification, or whether a user will or will not buy a certain product), we call it a classification problem. If the predicted result is a continuous value (such as a house price or a stock price), we call it a regression problem.

A range of machine learning algorithms solve supervised learning problems: for classification, classics such as Naive Bayes, logistic regression, and support vector machines; for regression, linear regression and others.


Unsupervised learning

In another type of problem, the samples come without a label or "standard answer"; we are given only the samples themselves. The task is to extract general rules from them. This is called unsupervised learning. A range of machine learning algorithms, including association rules and clustering algorithms, fall into this category.


Semi-supervised learning

In this type of problem, part of the training data is labeled and part is not. We want to learn the structure of the data and also make predictions. Approaches to such problems include self-training, transductive learning, generative models, and so on.


In general, the first two types of problems are the most common, and some machine learning algorithms for them are as follows:

[Figure: Algorithm classification]


2. Classification of algorithms from a functional point of view

We can also group machine learning algorithms by what they have in common, such as their function or how they work, as we do below. Note that this grouping leans heavily toward classification and regression, which are also the two types of problems encountered most often.


Regression Algorithms

Regression algorithms find the best combination of input features by minimizing the difference between the predicted value and the actual outcome. Linear regression and its relatives predict continuous values; logistic regression, which predicts discrete values or categories, can also be regarded as a kind of regression algorithm. Common regression algorithms are as follows:

  • Ordinary Least Squares Regression (OLSR)
  • Linear Regression
  • Logistic Regression
  • Stepwise Regression
  • Locally Estimated Scatterplot Smoothing (LOESS)
  • Multivariate Adaptive Regression Splines (MARS)
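
As a minimal sketch of "minimizing the difference between the predicted value and the actual outcome", the following fits a straight line to one input feature by ordinary least squares. The function name `fit_line` and the sample data are illustrative assumptions, not something taken from the original answer.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for a single input feature.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # prediction for an unseen input x' = 5
```

With real, noisy data the line would not pass through every point; the fit would only make the squared differences as small as possible.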


Instance-based Algorithms

Instance-based algorithms are those whose final model still depends heavily on the original training instances. To make a prediction, such an algorithm generally uses some similarity measure to compare the sample to be predicted with the stored samples, and then gives the corresponding prediction. Common instance-based algorithms are:

  • k-Nearest Neighbour (kNN)
  • Learning Vector Quantization (LVQ)
  • Self-Organizing Map (SOM)
  • Locally Weighted Learning (LWL)


Decision Tree Algorithms

Decision tree algorithms build a tree of decision paths from the features of the original data. At prediction time, a sample follows one path from the root to a leaf, and the leaf gives the decision. Common decision tree algorithms include:

  • Classification and Regression Tree (CART)
  • Iterative Dichotomiser 3 (ID3)
  • C4.5 and C5.0 (different versions of a powerful approach)
  • Chi-squared Automatic Interaction Detection (CHAID)
  • M5
  • Conditional Decision Trees
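
To illustrate only the prediction phase (choosing one path through the tree), here is a rough sketch with a hand-written tree. The tree is not learned by CART, ID3, or any algorithm in the list above, and the feature names `contains_link` and `sender_known` are made up for the example.

```python
# A hand-written (not learned) tree: internal nodes test one feature,
# leaves hold the predicted label. Features and labels are hypothetical.
tree = {
    "feature": "contains_link",
    "branches": {
        True: {
            "feature": "sender_known",
            "branches": {True: "normal", False: "spam"},
        },
        False: "normal",
    },
}


def predict(node, sample):
    """Follow one decision path from the root down to a leaf."""
    while isinstance(node, dict):
        value = sample[node["feature"]]
        node = node["branches"][value]
    return node


label = predict(tree, {"contains_link": True, "sender_known": False})
```

What the real algorithms add is the training step: deciding from data which feature to test at each node.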


Bayesian Algorithms

Bayesian algorithms here are those that explicitly apply Bayes' theorem to classification and regression problems. They include:

  • Naive Bayes
  • Gaussian Naive Bayes
  • Multinomial Naive Bayes
  • Averaged One-Dependence Estimators (AODE)
  • Bayesian Belief Network (BBN)
  • Bayesian Network (BN)


Clustering Algorithms

Clustering algorithms group the input samples into "data clusters" around some centers in order to discover regularities in the structure of the data distribution. Commonly used clustering algorithms include:

  • k-Means
  • Hierarchical Clustering
  • Expectation Maximization (EM)
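
The idea of gathering samples around centers can be sketched with a bare-bones k-means loop. This is a simplified illustration under stated assumptions: the points are 2-D, the starting centers are supplied by hand rather than chosen at random, and the number of iterations is fixed. The name `kmeans` and the data are hypothetical.

```python
def kmeans(points, centers, iterations=10):
    """Alternate between assigning points to centers and updating the centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for center, cluster in zip(centers, clusters):
            if cluster:
                new_centers.append((
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                ))
            else:
                new_centers.append(center)  # an empty cluster keeps its center
        centers = new_centers
    return centers, clusters


# Two hypothetical, well-separated groups of points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

On this toy data each center ends up at the mean of one of the two groups.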


Association Rule Learning Algorithms

Association rule algorithms try to extract the rules that best explain the associations among observed training samples, that is, to learn the dependencies or associations between one event and others. Common association rule algorithms are:

  • Apriori algorithm
  • Eclat algorithm


Artificial Neural Network Algorithms

This is a class of algorithms inspired by the way neurons in the human brain work. Note that deep learning is listed separately below; the artificial neural networks meant here are the more traditional perceptron-style algorithms, mainly including:

  • Perceptron
  • Back-Propagation
  • Radial Basis Function Network (RBFN)


Deep Learning Algorithms

Deep learning has been a very popular field of machine learning in recent years. Compared with the artificial neural network algorithms listed above, its networks usually have more layers and a more complex structure. Such algorithms are widely used in computer vision. They include:

  • Deep Boltzmann Machine (DBM)
  • Deep Belief Networks (DBN)
  • Convolutional Neural Network (CNN)
  • Stacked Auto-Encoders


Dimensionality Reduction Algorithms

To a certain extent, dimensionality reduction is similar to clustering, because it also tries to discover the inherent structure of the original training data. The difference is that a dimensionality reduction algorithm tries to summarize and describe most of the original information using less (lower-dimensional) information.

Dimensionality reduction algorithms are mainly useful for data visualization and for reducing the space needed to store and process data. In a machine learning workflow, they are often used to preprocess the data before it is fed into other learning algorithms. The main dimensionality reduction algorithms include:

  • Principal Component Analysis (PCA)
  • Principal Component Regression (PCR)
  • Partial Least Squares Regression (PLSR)
  • Sammon Mapping
  • Multidimensional Scaling (MDS)
  • Linear Discriminant Analysis (LDA)
  • Mixture Discriminant Analysis (MDA)
  • Quadratic Discriminant Analysis (QDA)
  • Flexible Discriminant Analysis (FDA)


Ensemble Algorithms

Strictly speaking, this is not a machine learning algorithm but rather an optimization method or strategy: it usually combines multiple simple, weak machine learning models to make more reliable decisions. Taking classification as an example, the intuition is that a single classifier may be wrong and unreliable, but if multiple classifiers vote, the result is much more reliable. Commonly used ensemble methods include:

  • Random Forest
  • Boosting
  • Bootstrapped Aggregation (Bagging)
  • AdaBoost
  • Stacked Generalization (blending)
  • Gradient Boosting Machines (GBM)
  • Gradient Boosted Regression Trees (GBRT)
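
The voting intuition can be sketched in a few lines. This is plain majority voting only, not an implementation of Random Forest, AdaBoost, or any other method in the list; the three "weak classifiers" are hypothetical hand-written rules on a single number.

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Return the label predicted by the largest number of classifiers."""
    votes = [clf(sample) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Three hypothetical weak classifiers for a number x; each is a crude rule.
classifiers = [
    lambda x: "positive" if x > 0 else "negative",
    lambda x: "positive" if x > -1 else "negative",  # biased toward "positive"
    lambda x: "positive" if x > 1 else "negative",   # biased toward "negative"
]

result = majority_vote(classifiers, 0.5)
```

For x = 0.5 the third rule is wrong on its own, but it is outvoted by the other two.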


3. A decision tree for choosing machine learning algorithms

To make selecting an algorithm quick, a decision tree has been drawn up for the commonly used algorithms. Each set of conditions corresponds to a path that leads to some relatively suitable solutions, as shown in the following figure:

[Figure: decision tree for selecting a machine learning algorithm]

First of all, if the sample size is very small, no machine learning algorithm can really "learn" general rules and patterns from it, so getting more data comes first. Then, depending on whether the problem is supervised or unsupervised and whether the prediction is a continuous or a discrete value, the methods fall into four categories: classification, clustering, regression, and dimension reduction. Each category is then handled differently according to the specific situation.
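
The top of that decision process can be written as a small function. This is one possible reading of the paragraph above, not a transcription of the figure: the pairing of unsupervised problems with clustering (discrete) and dimension reduction (continuous) is an assumption, as is the `min_samples` cutoff, since the text only says "very small".

```python
def choose_category(n_samples, has_labels, predicts_continuous, min_samples=50):
    """Map the questions from the text to a method category.

    min_samples is a hypothetical cutoff; the article gives no number.
    """
    if n_samples < min_samples:
        return "get more data"
    if has_labels:
        # Supervised: continuous target -> regression, discrete -> classification.
        return "regression" if predicts_continuous else "classification"
    # Unsupervised (assumed pairing): continuous -> dimension reduction,
    # discrete -> clustering.
    return "dimension reduction" if predicts_continuous else "clustering"


category = choose_category(n_samples=1000, has_labels=True, predicts_continuous=False)
```

Within each category, the choice of a specific algorithm still depends on the situation, which this sketch does not attempt to cover.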

Author: You Paiyun
Link: https://www.zhihu.com/question/27306416/answer/281031045
Source: Zhihu
The copyright belongs to the author. For commercial reprints, please contact the author for authorization, and for non-commercial reprints, please indicate the source.

Naive Bayes Classification

Naive Bayes classification is a classification method based on Bayes' theorem and the assumption that features are conditionally independent given the class. It grew out of classical mathematical theory and has a solid mathematical foundation and stable classification efficiency. It is a very simple classification algorithm, and simple is not necessarily bad. For a given item to be classified, it calculates the probability of each category given the item's features and, absent any other information, assigns the item to the category with the largest probability.

The essence of the Bayesian classification algorithm is the formula for conditional probability. The probability that event A occurs given that event B has occurred is written P(A | B).

<img src="https://pic1.zhimg.com/50/v2-aea8116a5c34605b9cc4de92d247f426_hd.jpg" data-caption="" data-size="normal" data-rawwidth="501" data-rawheight="322" class="origin_image zh-lightbox-thumb" width="501" data-original="https://pic1.zhimg.com/v2-aea8116a5c34605b9cc4de92d247f426_r.jpg">

P(A|B) is calculated as

<img src="https://pic4.zhimg.com/50/v2-de5b63eccef28c04015322ed894e4653_hd.jpg" data-caption="" data-size="normal" data-rawwidth="143" data-rawheight="43" class="content_image" width="143">


In practice, we can often obtain P(A|B) directly, while P(B|A) is difficult to obtain directly. Bayes' theorem lets us compute P(B|A) from P(A|B).
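
Written out in the usual notation, and assuming P(A) > 0 and P(B) > 0, the definition of conditional probability and the resulting statement of Bayes' theorem are:

```latex
P(A \mid B) = \frac{P(A \cap B)}{P(B)},
\qquad
P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}
```

The second identity follows from the first because P(A ∩ B) can be written both as P(A|B) P(B) and as P(B|A) P(A).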

The formal definition of Naive Bayes classification is as follows:

<img src="https://pic4.zhimg.com/50/v2-ba7a2c453ef3f3378f9fd0872b3c7a70_hd.jpg" data-caption="" data-size="normal" data-rawwidth="683" data-rawheight="199" class="origin_image zh-lightbox-thumb" width="683" data-original="https://pic4.zhimg.com/v2-ba7a2c453ef3f3378f9fd0872b3c7a70_r.jpg">

The Naive Bayes algorithm is very effective for text classification and similar tasks. For example, it is often used to filter and classify spam.
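
As a rough sketch of the spam example, the following is a tiny word-count Naive Bayes classifier. Several things here are assumptions rather than part of the original answer: the four training messages are invented, probabilities are combined in log space, and add-one (Laplace) smoothing is used so that an unseen word does not drive a probability to zero.

```python
import math
from collections import Counter, defaultdict


def train(documents):
    """documents: list of (list_of_words, label). Returns the counts needed later."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in documents:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def predict(model, words):
    """Pick the label with the largest (log) posterior score."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        # log P(label) + sum of log P(word | label), with add-one smoothing.
        score = math.log(count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in words:
            numerator = word_counts[label][word] + 1
            score += math.log(numerator / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training messages, already split into words.
training = [
    (["win", "money", "now"], "spam"),
    (["free", "money"], "spam"),
    (["meeting", "tomorrow"], "normal"),
    (["project", "meeting", "notes"], "normal"),
]
model = train(training)
label = predict(model, ["free", "money", "now"])
```

The "naive" part is the sum over words: each word contributes independently to the score, which is the conditional independence assumption in code form.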

SVM algorithm

A support vector machine (SVM) is a supervised learning method widely used for statistical classification and regression analysis. It is a generalized linear classifier that minimizes the empirical error and maximizes the geometric margin at the same time, so it is also called a maximum-margin classifier.

The support vector machine maps the input vectors to a higher-dimensional space and builds a maximum-margin hyperplane in that space. Two parallel hyperplanes are built on either side of the hyperplane that separates the data, and the separating hyperplane is chosen to maximize the distance between them. The assumption is that the larger the distance, or gap, between the parallel hyperplanes, the smaller the total error of the classifier.
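
For the linearly separable case, this is usually written as the following optimization problem. The symbols are assumptions introduced here, not taken from the original answer: w and b define the separating hyperplane, x_i are the training vectors, and y_i ∈ {−1, +1} are their labels.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w^{\top} x_i + b) \ge 1, \qquad i = 1, \dots, n
```

The two parallel hyperplanes are w^T x + b = +1 and w^T x + b = −1, and the distance between them is 2/‖w‖, so minimizing ‖w‖ maximizes the margin.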

<img src="https://pic3.zhimg.com/50/v2-2b8624274f9c2f0a0318a7482aa3574e_hd.jpg" data-caption="" data-size="normal" data-rawwidth="485" data-rawheight="523" class="origin_image zh-lightbox-thumb" width="485" data-original="https://pic3.zhimg.com/v2-2b8624274f9c2f0a0318a7482aa3574e_r.jpg">

Although the SVM algorithm is difficult to train and to interpret, it performs very well on problems that are not linearly separable and is often chosen for them.

KNN algorithm

The k-nearest neighbor algorithm (KNN for short) is also a relatively simple classification and prediction algorithm. It selects the K training samples most similar to the sample to be classified or predicted, and obtains the result by averaging their values (for prediction) or taking the mode of their labels (for classification).

<img src="https://pic3.zhimg.com/50/v2-90dade7d278aaa6a60262c1228fa08c1_hd.jpg" data-caption="" data-size="normal" data-rawwidth="190" data-rawheight="171" class="content_image" width="190">

The figure above illustrates the K-nearest neighbor algorithm. There are two types of sample data, shown as small blue squares and small red triangles, and the green circle in the middle marks the data point to be classified. Without knowing which category the green point belongs to (blue square or red triangle), we can judge from its neighboring samples.

If K=3, the 3 nearest neighbors of the green point are 2 red triangles and 1 blue square. The minority yields to the majority, so by this vote the green point is assigned to the red triangle class.

If K=5, the 5 nearest neighbors of the green point are 2 red triangles and 3 blue squares. Again the minority yields to the majority, so the green point is assigned to the blue square class.

As the example shows, when we cannot tell which known category a point belongs to, we can look at where it lies, weigh its surrounding neighbors, and assign it to the category with the larger weight. This is the core idea of the K-nearest neighbor algorithm.
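
The K=3 and K=5 cases can be reproduced with a short sketch. The coordinates below are invented so that the neighbor counts match the example (they are not read off the figure), Euclidean distance is assumed as the similarity measure, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(training, query, k):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical coordinates; the green point to classify sits at the origin.
training = [
    ((1.0, 0.0), "red triangle"),
    ((0.0, 1.2), "red triangle"),
    ((-1.1, 0.0), "blue square"),
    ((0.0, -2.0), "blue square"),
    ((2.0, 1.0), "blue square"),
    ((3.0, 3.0), "red triangle"),
    ((-3.0, 2.0), "blue square"),
]
query = (0.0, 0.0)

label_k3 = knn_classify(training, query, k=3)
label_k5 = knn_classify(training, query, k=5)
```

The same point gets a different label for K=3 and K=5, which is why the choice of K matters in practice.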

The KNN algorithm is simpler than many other algorithms and is easy to understand and implement, with no parameter estimation and no training phase. It is suitable for classifying rare events and for multi-class problems, where it can perform better than SVM.

Artificial Neural Network Algorithm

An artificial neural network (neural network for short) is a mathematical or computational model that imitates the structure and function of biological neural networks and is used to estimate or approximate functions. A neural network computes by connecting a large number of artificial neurons. In most cases, an artificial neural network can change its internal structure on the basis of external information, which makes it an adaptive system.

The following figure is a schematic diagram of an artificial neural network. The network consists of several layers: the first is called the input layer, the last the output layer, and those in between hidden layers. Each layer has many nodes. Nodes are connected by edges, and each edge has a weight. For text, the input values are the characters; for images, the input values are the pixels.


<img src="https://pic1.zhimg.com/50/v2-ffeb59eb5b88142eeb277c7dfd7b972f_hd.jpg" data-caption="" data-size="normal" data-rawwidth="599" data-rawheight="375" class="origin_image zh-lightbox-thumb" width="599" data-original="https://pic1.zhimg.com/v2-ffeb59eb5b88142eeb277c7dfd7b972f_r.jpg">


How do artificial neural networks work?

1. Forward propagation: for an input value, the output of the previous layer is multiplied by the weights of the next layer, and the bias of the next layer is added, to obtain the next layer's output. That output is passed on as the new input, layer by layer, until the final output value is obtained.

2. Backpropagation: forward propagation yields a predicted value, but it is not necessarily the true value. Backpropagation corrects the error: it compares the prediction with the true value and adjusts the weights and biases used in forward propagation.
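
The two steps can be sketched on a very small network with 2 inputs, 2 hidden nodes, and 1 output. The details are assumptions made for illustration, since the text does not specify them: a sigmoid activation, a squared-error loss, plain gradient descent, and invented starting weights and a single invented training example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, params):
    """Forward propagation: weighted sum plus bias, then activation, per layer."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output


def backward(x, target, params, learning_rate=0.5):
    """One backpropagation step for the loss 0.5 * (output - target) ** 2."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden, output = forward(x, params)
    # Error signal at the output, using sigmoid'(z) = s * (1 - s).
    delta_out = (output - target) * output * (1.0 - output)
    # Pass the error back to each hidden node through the current output weights.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    # Gradient descent updates for every weight and bias.
    new_w_out = [w - learning_rate * delta_out * h for w, h in zip(w_out, hidden)]
    new_b_out = b_out - learning_rate * delta_out
    new_w_hidden = [[w - learning_rate * d * xi for w, xi in zip(row, x)]
                    for row, d in zip(w_hidden, delta_hidden)]
    new_b_hidden = [b - learning_rate * d for b, d in zip(b_hidden, delta_hidden)]
    return new_w_hidden, new_b_hidden, new_w_out, new_b_out


# Hypothetical starting weights and a single training example.
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0], [0.5, -0.5], 0.0)
x, target = [1.0, 0.5], 1.0
_, before = forward(x, params)
for _ in range(100):
    params = backward(x, target, params)
_, after = forward(x, params)
```

After repeated corrections the output for this example moves closer to the target, which is the "correcting the error" described in step 2.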

Artificial neural networks have shown excellent performance in application scenarios such as speech, images, video, and games, but they need a large amount of training data to reach high accuracy.

