A Tour of Ten Machine Learning Algorithms for Beginners

In machine learning there is a theorem known as "No Free Lunch". In simple terms, it states (and it applies particularly to supervised learning) that there is no universal algorithm: no single algorithm works best for every problem.

For example, you cannot say that neural networks are always better than decision trees, or vice versa. Many factors are at play, such as the size and structure of the dataset.

So when you encounter a problem, you should try a variety of different algorithms on it, use a held-out "test set" of data to evaluate their performance, and choose the best one. Then go search GitHub for the best code to adapt, haha.

Of course, if you need to clean the house, you might use a vacuum cleaner, a broom, or a mop, but you would not use a shovel, right?
Likewise, the algorithms you try must suit the problem being solved; that is the right way to choose an algorithm for a machine learning task.

image1

The basic principle

There is a universal principle that underlies all supervised machine learning algorithms for predictive modeling.

A machine learning algorithm can be described as learning a target function f that best maps input variables X to an output variable Y: Y = f(X).

This is a simple learning task: given new values of the input variable X, predict Y. But we do not know the function f.

The most common type of machine learning is learning the mapping Y = f(X) so that we can predict Y for new X. The goal is to make the most accurate predictions possible.

For novices entering the field without a background in the basics of machine learning, here is a brief introduction to ten common machine learning algorithms.

1- Linear Regression

Linear regression is probably one of the best-known and easiest-to-understand algorithms in statistics and machine learning.

The main goal is to minimize the error of the model, that is, to make predictions as accurate as possible, even at the expense of interpretability. Linear regression has been borrowed from many different fields, including statistics.

Linear regression is represented by an equation that describes the line relating the input variable x to the output variable y, by finding values for coefficients called B.

image2

E.g.: y = B0 + B1 * x. Wait, isn't this just the equation of a line? Haha.

Given an input x, we predict y. The goal of the linear regression learning algorithm is to find the values of the coefficients B0 and B1, e.g. using a linear-algebra solution such as ordinary least squares (OLS), or gradient descent optimization.
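To make this concrete, here is a minimal sketch of the OLS solution for B0 and B1 in plain Python (the function name and the toy data are my own illustration, not from any particular library):

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b1 is covariance(x, y) / variance(x); b0 follows from the means
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
         sum((x - mean_x) ** 2 for x in xs)
    b0 = mean_y - b1 * mean_x
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]          # exactly y = 1 + 2x
b0, b1 = fit_simple_linear_regression(xs, ys)
print(b0, b1)  # 1.0 2.0
```

On noisy real data the recovered coefficients would of course only approximate the underlying line.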

2- Logistic Regression

Logistic regression is another technique machine learning has "borrowed" from the field of statistics. It is the preferred method for binary classification (problems with two class values).

Logistic regression is similar to linear regression in that the goal is to find the coefficient that weights each input variable. Unlike linear regression, the prediction for the output is transformed using a nonlinear function called the logistic function.

The logistic function looks like a big S, and it transforms any value into the range 0 to 1. This is useful because we can apply a rule to the output of the logistic function to snap values to 0 and 1 (e.g., IF the output is less than 0.5 THEN predict 0, ELSE predict 1) and obtain a class value.

image3

As with linear regression, logistic regression works better when you remove attributes that are unrelated to the output variable, as well as attributes that are closely correlated with each other. It is a fast and effective model to learn for binary classification problems.
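The S-shaped squashing and the 0/1 rule described above can be sketched in a few lines (the coefficients passed in below are made up for illustration; in practice they would be learned from data):

```python
import math

def logistic(z):
    """The S-shaped logistic function, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(x, b0, b1, threshold=0.5):
    """Apply the 0/1 decision rule to the logistic output."""
    p = logistic(b0 + b1 * x)   # probability-like score in (0, 1)
    return 1 if p >= threshold else 0

print(logistic(0.0))                          # 0.5, the middle of the S
print(predict_class(2.0, b0=-1.0, b1=1.0))    # score ~0.73 -> class 1
print(predict_class(-2.0, b0=-1.0, b1=1.0))   # score ~0.05 -> class 0
```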

3- Linear Discriminant Analysis

Logistic regression is a traditional classification algorithm limited to two-class problems. If you have more than two classes, Linear Discriminant Analysis (LDA) is a very important algorithm to know.

The representation of LDA is very simple: it consists of statistical properties of your data, calculated for each class. For a single input variable, this includes:

  1. The mean value for each class.

  2. The variance calculated across all classes.

image4

Predictions are made by computing a discriminant score for each class and predicting the class with the largest score.
The technique assumes that the data has a Gaussian distribution (bell curve), so it is best to remove outliers from your data before running it.

4- classification and regression trees

Decision trees are an important type of machine learning algorithm for predictive modeling.
The representation of the decision tree model is a binary tree. This is just the binary tree from algorithms and data structures, nothing fancy. Each node represents a single input variable (x) and a split point on that variable (assuming the variable is numeric).

image5

The leaf nodes of the tree contain an output variable (y) used to make a prediction. Predictions are made by walking the splits of the tree until reaching a leaf node and outputting the class value at that leaf.

Trees are fast to learn and very fast at making predictions. They are often quite accurate for many problems and do not require any special preprocessing of your data.
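The core of tree learning is choosing the split point at each node. As a minimal sketch (a single-node "stump" on one variable, with made-up data), here is how a best split can be picked by minimizing Gini impurity, one common criterion:

```python
def gini(groups, classes):
    """Weighted Gini impurity of a candidate split into groups."""
    n = sum(len(g) for g in groups)
    score = 0.0
    for g in groups:
        if not g:
            continue
        p = [sum(1 for _, y in g if y == c) / len(g) for c in classes]
        score += (1.0 - sum(pi * pi for pi in p)) * len(g) / n
    return score

def best_split(data):
    """Try every observed value as a split point; keep the purest one."""
    classes = sorted({y for _, y in data})
    best = None
    for split, _ in data:
        left = [d for d in data if d[0] < split]
        right = [d for d in data if d[0] >= split]
        g = gini([left, right], classes)
        if best is None or g < best[1]:
            best = (split, g)
    return best

data = [(1.0, 0), (1.5, 0), (2.0, 0), (5.0, 1), (5.5, 1), (6.0, 1)]
split, impurity = best_split(data)
print(split, impurity)  # 5.0 0.0 -- a perfectly pure split
```

A full tree applies this search recursively to each resulting group.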

5- Naive Bayes

Naive Bayes is a simple but surprisingly powerful algorithm for predictive modeling.

The model consists of two types of probabilities that can be calculated directly from your training data:
1) the probability of each class;
2) the conditional probability for each class given each value of x.

Once calculated, the probability model can be used to make predictions for new data using Bayes' theorem. When your data is real-valued, it is common to assume a Gaussian distribution (bell curve) so that these probabilities are easy to estimate.

image6

Naive Bayes is called "naive" because it assumes that each input variable is independent. This is a strong assumption and unrealistic for real data; nevertheless, the technique is very effective on a large range of complex problems.
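A minimal Gaussian Naive Bayes sketch for a single real-valued input (toy data and names are mine; real libraries handle many features and use log-probabilities for numerical stability):

```python
import math
from collections import defaultdict

def gaussian_pdf(x, mean, var):
    """Likelihood of x under a Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def nb_fit(xs, ys):
    """Estimate the class prior and a per-class Gaussian from the data."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    stats = {}
    for c, vals in groups.items():
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        stats[c] = (mean, var, len(vals) / len(xs))  # mean, variance, prior
    return stats

def nb_predict(x, stats):
    """Bayes' theorem: pick the class maximizing prior * likelihood."""
    return max(stats, key=lambda c: stats[c][2] * gaussian_pdf(x, *stats[c][:2]))

xs = [1.0, 1.2, 0.8, 4.0, 4.2, 3.8]
ys = [0, 0, 0, 1, 1, 1]
stats = nb_fit(xs, ys)
print(nb_predict(1.1, stats), nb_predict(3.9, stats))  # 0 1
```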

6- K-Nearest Neighbors

The KNN algorithm is very simple and very effective. The model representation for KNN is the entire training dataset. Simple, right?

We make predictions for a new data point by searching the entire training set for the K most similar instances and summarizing the output variable of those K instances.
For regression problems, this might be the mean of the output variables; for classification problems, this might be the most common class value.

The trick is how to determine the similarity between data instances. If your attributes are all on the same scale, the easiest technique is to use the Euclidean distance, a number you can calculate directly from the differences between each pair of input variables.

image7

KNN may require a lot of memory or space to store all of the data, because every prediction traverses the full training set. You can also update and curate your training instances over time to maintain prediction accuracy.

The concept of distance or closeness can break down in very high dimensions (many input variables), which can negatively affect the algorithm's performance on your problem. So it is vital to use only the input variables most relevant to predicting the output.
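The search-and-vote procedure above fits in a few lines of plain Python (the toy training set and labels are my own illustration):

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train, query, k=3):
    """Vote among the k training instances nearest to the query point."""
    neighbors = sorted(train, key=lambda row: euclidean(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "a"), ((1.2, 0.9), "a"), ((0.8, 1.1), "a"),
         ((4.0, 4.0), "b"), ((4.1, 3.9), "b"), ((3.9, 4.2), "b")]
print(knn_predict(train, (1.1, 1.0)))  # a
print(knn_predict(train, (4.0, 4.1)))  # b
```

Note there is no training step at all: the "model" is simply the `train` list, which is why memory grows with the dataset.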

7- Learning Vector Quantization

A drawback of K-Nearest Neighbors is the need to hold on to the entire training dataset. The Learning Vector Quantization algorithm (LVQ for short) is an artificial neural network algorithm that lets you choose how many training instances to retain.

image8

The representation of LVQ is a collection of codebook vectors. These are selected at random at the beginning and adapted over a number of iterations of the learning algorithm to best summarize the training dataset.
After learning, the codebook vectors can be used to make predictions just like K-Nearest Neighbors: compute the distance between each codebook vector and the new data instance, find the most similar vector (the best match), and return the class value of that best-matching vector as the prediction. Remember to normalize your data for the best results.

8- SVM

SVM is probably one of the most popular machine learning algorithms.

A hyperplane is a line that splits the input variable space.
In SVM, a hyperplane is selected to best separate the points in the input variable space by their class (class 0 or class 1).
In two dimensions, you can visualize this as a line, and let us assume that all of the input points can be completely separated by this line. The SVM learning algorithm finds the coefficients of the hyperplane that best separates the classes.

image9

The distance between the hyperplane and the nearest data points is called the margin. The best or optimal hyperplane separating the two classes is the line with the largest margin.
Only these nearest points are relevant in defining the hyperplane and in constructing the classifier; they are called the support vectors.
In practice, an optimization algorithm is used to find the coefficient values that maximize the margin.

SVM is probably one of the most powerful out-of-the-box classifiers, and it is used very frequently.
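One simple way to carry out that optimization for a linear SVM is sub-gradient descent on the hinge loss with an L2 penalty (the Pegasos-style update). This is a rough sketch under made-up data and hyperparameters, not a production solver; labels must be -1/+1:

```python
def svm_train(data, lam=0.01, lrate=0.01, epochs=300):
    """Learn w and b for the decision boundary w.x + b = 0 in 2-D."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin < 1:   # inside the margin: hinge loss is active
                w = [wi + lrate * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lrate * y
            else:            # outside the margin: only the L2 shrinkage applies
                w = [wi - lrate * lam * wi for wi in w]
    return w, b

def svm_predict(x, w, b):
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1

data = [((1.0, 1.0), -1), ((1.5, 0.5), -1), ((0.5, 1.5), -1),
        ((4.0, 4.0), 1), ((4.5, 3.5), 1), ((3.5, 4.5), 1)]
w, b = svm_train(data)
print(svm_predict((1.0, 1.0), w, b), svm_predict((4.0, 4.0), w, b))  # -1 1
```

The L2 term is what pushes the solution toward the maximum-margin line rather than just any separating line; nonlinear SVMs replace the dot product with a kernel.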

9- Bagging and Random Forests

Random Forest is one of the most popular and most powerful machine learning algorithms. It is a type of ensemble machine learning algorithm called Bootstrap Aggregation, or bagging.

The bootstrap idea: take a large number of samples of your data, calculate the mean of each, then average all of those means to get a better estimate of the true mean.

In bagging, the same approach is used, but for estimating entire statistical models (most commonly decision trees). Multiple samples of the training data are taken, and a model is built for each sample. When you need to make a prediction for new data, each model makes a prediction, and the predictions are averaged to give a better estimate of the true output value.

image10

Random Forest is a tweak on this approach in how the decision trees are created: rather than selecting the optimal split point, each tree is restricted to a random subset of candidate features, introducing deliberate variety.

As a result, the models created for each data sample differ from one another more than they otherwise would. Combining their predictions gives a better estimate of the true underlying output value.
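The bootstrap step itself is easy to show directly. Here is a sketch of the "resample, estimate, average" loop for a simple mean (sample values and function name are made up for illustration):

```python
import random

def bootstrap_mean(sample, n_resamples=500, seed=42):
    """Resample with replacement many times; average each resample's mean."""
    rng = random.Random(seed)   # fixed seed so the sketch is reproducible
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(sample) for _ in sample]  # same size, w/ repl.
        means.append(sum(resample) / len(resample))
    return sum(means) / len(means)

sample = [2.0, 4.0, 6.0, 8.0, 10.0]   # plain sample mean is 6.0
print(bootstrap_mean(sample))          # close to 6.0
```

Bagging swaps the inner "compute a mean" step for "fit a whole model", and the outer average for an average (or vote) over model predictions.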

10- Boosting and AdaBoost

image11

Boosting is an ensemble technique that tries to create a strong classifier from a number of weak classifiers. This is done by building a model from the training data, then creating a second model that attempts to correct the errors of the first. Models are added until the training set is predicted perfectly or a maximum number of models has been reached.

AdaBoost was the first truly successful boosting algorithm developed for binary classification. It is the best starting point for understanding boosting. Modern boosting methods build on AdaBoost, most notably stochastic gradient boosting machines.


AdaBoost is used with short decision trees.
After the first tree is created, the tree's performance on each training instance is used to weight how much attention the next tree should pay to each instance.
Training data that is hard to predict is given more weight, while instances that are easy to predict get less weight. Models are created one after another, each updating the weights on the training instances, and these weights affect the learning performed by the next tree in the sequence. After all the trees are built, predictions are made for new data, weighted by how accurate each tree was on the training data.

Since the algorithm puts so much effort into correcting errors, removing outliers and de-noising your data is very important.
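The reweighting loop above can be sketched with the simplest possible weak learner, a one-variable threshold "stump" (toy data and helper names are my own; labels must be -1/+1):

```python
import math

def stump_predict(x, split, polarity):
    """Weak learner: a single threshold on a single value."""
    return polarity if x < split else -polarity

def adaboost_fit(xs, ys, rounds=5):
    n = len(xs)
    w = [1.0 / n] * n                       # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        best = None
        for split in xs:                    # pick the lowest weighted-error stump
            for polarity in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if stump_predict(x, split, polarity) != y)
                if best is None or err < best[0]:
                    best = (err, split, polarity)
        err, split, polarity = best
        err = max(err, 1e-10)               # avoid division by zero / log(0)
        alpha = 0.5 * math.log((1 - err) / err)   # stump's say in the vote
        ensemble.append((alpha, split, polarity))
        # raise weights of misclassified instances, lower the rest, renormalize
        w = [wi * math.exp(-alpha * y * stump_predict(x, split, polarity))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(x, ensemble):
    vote = sum(a * stump_predict(x, s, p) for a, s, p in ensemble)
    return 1 if vote >= 0 else -1

xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [-1, -1, -1, 1, 1, 1]
model = adaboost_fit(xs, ys)
print([adaboost_predict(x, model) for x in xs])  # matches ys
```

Real AdaBoost uses short trees rather than bare thresholds, but the weight-update and weighted-vote machinery is exactly this.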

Summary

A typical question asked by beginners facing the wide variety of machine learning algorithms is, "Which algorithm should I use?" The answer depends on many factors, including:
(1) the size, quality, and nature of the data;
(2) the available computing time;
(3) the urgency of the task;
(4) what you want to do with the data.




Origin blog.csdn.net/sinat_36458870/article/details/104302249