Machine Learning: Top 10 Machine Learning Algorithms

Machine learning algorithms can be roughly divided into three categories:

  • Supervised learning algorithms: during supervised training, a pattern (a function, or learned model) is learned from a labeled training data set, and new instances are then inferred from that pattern. These algorithms require specific inputs and outputs, so you first need to decide what kind of data to use as examples: in a text recognition application, for instance, a single handwritten character or a whole line of handwritten text. The main algorithms include neural networks, support vector machines, the nearest neighbor method, naive Bayes, and decision trees.
  • Unsupervised learning algorithms: these algorithms have no specific target output; the algorithm divides the data set into different groups.
  • Reinforcement learning algorithms: reinforcement learning is highly general and is mainly used to train decision-making. The algorithm trains itself on the success or failure of its output (the decision), and with enough training experience it becomes able to make better predictions. Much as organisms come to anticipate stimuli under the rewards and punishments given by their environment, and develop habitual behaviors that obtain the greatest benefit, the algorithm learns which decisions yield the greatest reward. In operations research and cybernetics, reinforcement learning is known as "approximate dynamic programming" (ADP).
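
As a minimal illustration of the difference between the first two categories, the sketch below (using the scikit-learn toolkit, as in the rest of this article) trains a supervised classifier on labeled data and an unsupervised clustering model on the same data without labels:

from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

#digit dataset from sklearn
digits = datasets.load_digits()

#supervised: the model learns from input/output pairs (data, target)
clf = KNeighborsClassifier()
clf.fit(digits.data, digits.target)

#unsupervised: the model sees only the inputs and divides them into groups
km = KMeans(n_clusters=10, n_init=10)
groups = km.fit_predict(digits.data)
print(groups[:10])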

Basic Machine Learning Algorithms:

  • Linear Regression
  • Support Vector Machine (SVM)
  • K-Nearest Neighbors (KNN)
  • Logistic Regression
  • Decision Tree
  • K-Means
  • Random Forest
  • Naive Bayes
  • Dimensionality Reduction
  • Gradient Boosting

1. Linear Regression Algorithm

Regression analysis is a statistical method of data analysis whose purpose is to determine whether two or more variables are correlated, and the direction and strength of that correlation, and to build a mathematical model in which the observed variables are used to predict changes in the others.

The linear regression algorithm (Linear Regression) models the data by finding the line that best fits the data points. In the formula y = m*x + c, y is the dependent variable and x is the independent variable; the values of m and c are found from the given data set. Linear regression comes in two types: simple linear regression, with only one independent variable, and multiple regression, with at least two independent variables.

Here is a linear regression example, based on the Python scikit-learn toolkit:

from sklearn import linear_model, datasets

#digit dataset from sklearn
digits = datasets.load_digits()
#create the LinearRegression model
clf = linear_model.LinearRegression()

#set training set
x, y = digits.data[:-1], digits.target[:-1]
#train model
clf.fit(x, y)

#predict
y_pred = clf.predict([digits.data[-1]])
y_true = digits.target[-1]

print(y_pred)
print(y_true)

2. Support Vector Machine (SVM) Algorithm

Support vector machine/network algorithms (SVM) are classification algorithms. An SVM model represents the instances as points in space, and a line (hyperplane) is used to separate the data points. Note that SVMs require fully labeled input data and are directly applicable only to two-class tasks; applying them to a multi-class task requires reducing it to several binary problems.

from sklearn import svm, datasets

#digit dataset from sklearn
digits = datasets.load_digits()

#create the Support Vector Classifier
clf = svm.SVC(gamma=0.001, C=100)

#set training set
x, y = digits.data[:-1], digits.target[:-1]

#train model
clf.fit(x, y)

#predict
y_pred = clf.predict([digits.data[-1]])
y_true = digits.target[-1]

print(y_pred)
print(y_true)

3. Nearest Neighbor / K-Nearest Neighbors (KNN) Algorithm

The KNN algorithm is a form of instance-based learning: a local approximation and "lazy learning" that defers all computation until classification time. It predicts an unknown data point from its k nearest neighbors, and the value of k is a key factor in prediction accuracy. Whether for classification or regression, it is often useful to weight the neighbors so that closer neighbors contribute more than distant ones.

The disadvantages of the KNN algorithm are that it is very sensitive to the local structure of the data, it is computationally intensive, and it requires the data to be normalized so that every feature lies in the same range.


from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

#digit dataset from sklearn
digits = datasets.load_digits()

#create the KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=6)

#set training set
x, y = digits.data[:-1], digits.target[:-1]

#train model
clf.fit(x, y)

#predict
y_pred = clf.predict([digits.data[-1]])
y_true = digits.target[-1]

print(y_pred)
print(y_true)

Extension: one disadvantage of KNN is that it relies on the entire training data set. Learning Vector Quantization (LVQ) is a supervised artificial neural network algorithm that lets you choose which training instances to keep. LVQ is data-driven: it searches for the two neurons nearest to an input, attracts the neuron of the same class and repels the neuron of a different class, and eventually obtains the distribution pattern of the data. If KNN already classifies a data set well, LVQ can be used to reduce the storage size of the training data set. Typical learning vector quantization algorithms are LVQ1, LVQ2 and LVQ3, with LVQ2 the most widely used.
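
scikit-learn does not ship an LVQ implementation, so the following is a minimal NumPy sketch of the LVQ1 variant (one prototype per class; the initialization, learning rate, and epoch count are illustrative assumptions rather than part of the algorithm's specification):

import numpy as np

def lvq1_fit(X, y, n_epochs=30, lr=0.1):
    #one prototype (codebook vector) per class, initialized to the class mean
    classes = np.unique(y)
    protos = np.array([X[y == c].mean(axis=0) for c in classes])
    for epoch in range(n_epochs):
        alpha = lr * (1 - epoch / n_epochs)  #decaying learning rate
        for xi, yi in zip(X, y):
            #find the prototype nearest to this sample
            j = np.argmin(np.linalg.norm(protos - xi, axis=1))
            if classes[j] == yi:
                protos[j] += alpha * (xi - protos[j])  #attract: same class
            else:
                protos[j] -= alpha * (xi - protos[j])  #repel: different class
    return classes, protos

def lvq1_predict(X, classes, protos):
    #assign each point the label of its nearest prototype
    d = np.linalg.norm(X[:, None, :] - protos[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]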

4. Logistic Regression Algorithm

The Logistic Regression algorithm is generally used in scenarios that require a clear-cut output, such as predicting whether a certain event will occur (for example, whether it will rain). Usually, logistic regression uses a function to compress probability values into a fixed range. The Sigmoid function (S-function), for example, is a function with an S-shaped curve used for binary classification; it maps the probability of an event into the range from 0 to 1.

y = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))

The above is a simple logistic regression equation, where b0 and b1 are constants. These constants are calculated so as to minimize the error between the predicted and actual values.
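
Following the same pattern as the earlier examples, here is a minimal logistic regression sketch on the digits data (raising max_iter so the default solver converges on this data is an implementation assumption, not part of the algorithm):

from sklearn import datasets
from sklearn.linear_model import LogisticRegression

#digit dataset from sklearn
digits = datasets.load_digits()

#create the LogisticRegression model
clf = LogisticRegression(max_iter=5000)

#set training set
x, y = digits.data[:-1], digits.target[:-1]

#train model
clf.fit(x, y)

#predict
y_pred = clf.predict([digits.data[-1]])
y_true = digits.target[-1]

print(y_pred)
print(y_true)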

5. Decision Tree Algorithm

A decision tree is a special tree structure consisting of a decision graph and possible outcomes (such as cost and risk) that is used to aid decision making. In machine learning, a decision tree is a predictive model: each internal node in the tree represents an attribute of an object, each branch represents a possible value of that attribute, and each leaf node corresponds to the value of the object given by the path traversed from the root to that leaf. A decision tree has only a single output, and the algorithm is usually used to solve classification problems.

A decision tree contains three types of nodes:

  • Decision node: usually represented by a rectangle
  • Chance node: usually represented by a circle
  • End node: usually represented by a triangle

As an example, consider a simple decision tree for determining who in a population prefers to use a credit card. Considering the age and marital status of the population, people are more likely to choose a credit card if they are over 30 years old or married, and less likely otherwise. The decision tree can be extended further by identifying suitable attributes that define more categories. In this example, if a person is married and over 30, they are the most likely to have a credit card (100% preference). Training data is used to generate the decision tree.

Note: for data sets with an inconsistent number of samples per category, the information gain used in decision trees is biased toward features with more distinct values.
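
Following the pattern of the earlier sections, a minimal decision tree sketch on the digits data:

from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

#digit dataset from sklearn
digits = datasets.load_digits()

#create the DecisionTreeClassifier
clf = DecisionTreeClassifier()

#set training set
x, y = digits.data[:-1], digits.target[:-1]

#train model
clf.fit(x, y)

#predict
y_pred = clf.predict([digits.data[-1]])
y_true = digits.target[-1]

print(y_pred)
print(y_true)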

6. K-Means Algorithm

The k-means algorithm (K-Means) is an unsupervised learning algorithm that provides a solution to the clustering problem. K-Means divides n points (observations or instances of a sample) into k clusters so that each point belongs to the cluster with the nearest mean (the cluster center, or centroid). The assignment and update steps are repeated until the centroids no longer change.
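
A minimal K-Means sketch on the same digits data; note that no labels are passed to fit, and choosing k = 10 is an assumption based on there being ten digit classes:

from sklearn import datasets
from sklearn.cluster import KMeans

#digit dataset from sklearn
digits = datasets.load_digits()

#create the KMeans model with k=10 clusters
km = KMeans(n_clusters=10, n_init=10)

#fit on the inputs only: no labels are used
km.fit(digits.data)

#cluster index assigned to each point, and the centroid coordinates
print(km.labels_[:10])
print(km.cluster_centers_.shape)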

7. Random Forest Algorithm

The random forest algorithm (Random Forest) takes its name from the random decision forests proposed at Bell Labs in 1995. As the name suggests, a random forest can be regarded as a collection of decision trees. Each decision tree in the forest estimates a classification, a process called "voting"; ideally, the class that receives the most votes across all the trees is chosen as the final prediction.
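
Following the same digits pattern, a minimal random forest sketch (100 trees is an illustrative choice):

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

#digit dataset from sklearn
digits = datasets.load_digits()

#create the RandomForestClassifier with 100 voting trees
clf = RandomForestClassifier(n_estimators=100)

#set training set
x, y = digits.data[:-1], digits.target[:-1]

#train model: each tree is fit on a random bootstrap sample
clf.fit(x, y)

#predict: the class with the most votes across the trees
y_pred = clf.predict([digits.data[-1]])
y_true = digits.target[-1]

print(y_pred)
print(y_true)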

8. Naive Bayes Algorithm

Naive Bayes is based on Bayes' theorem from probability theory and has a wide range of applications, from text classification and spam filtering to medical diagnosis and more. Naive Bayes is suited to scenarios in which the features are independent of one another, such as predicting a flower's type from the length and width of its petals. The "naive" in the name refers to this assumed strong independence between features.

A concept closely related to the naive Bayes algorithm is maximum likelihood estimation, much of whose historical development took place within Bayesian statistics. For example, to build a model of a population's height, it is rarely feasible to measure the height of every person in a country, but the heights of a sample of people can be obtained, and the mean and variance of the distribution can then be estimated by maximum likelihood.
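
For a normal distribution, the maximum likelihood estimates are simply the sample mean and the (biased) sample variance; a minimal NumPy sketch with made-up sample heights:

import numpy as np

#heights (cm) of a small random sample; the numbers are illustrative only
heights = np.array([162.0, 171.5, 168.2, 175.4, 180.1, 166.7, 173.3])

#maximum likelihood estimates for a normal distribution
mu_hat = heights.mean()                      #MLE of the mean
var_hat = ((heights - mu_hat) ** 2).mean()   #MLE of the variance (divides by n)

print(mu_hat, var_hat)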

Naive Bayes is called naive because it assumes that each input variable is independent.
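
In the same style as the earlier sections, a minimal Gaussian naive Bayes sketch on the digits data (GaussianNB models each feature as an independent per-class Gaussian):

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

#digit dataset from sklearn
digits = datasets.load_digits()

#create the GaussianNB model
clf = GaussianNB()

#set training set
x, y = digits.data[:-1], digits.target[:-1]

#train model: estimates a per-class mean and variance for every feature
clf.fit(x, y)

#predict
y_pred = clf.predict([digits.data[-1]])
y_true = digits.target[-1]

print(y_pred)
print(y_true)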

9. Dimensionality Reduction

In machine learning and statistics, dimensionality reduction is the process of reducing the number of random variables under certain constraints to obtain a set of "uncorrelated" principal variables. It can be further divided into two major approaches: feature selection and feature extraction.

Some data sets may contain many intractable variables, especially when resources are abundant and the data collected by the system is very detailed. In such cases a data set may contain thousands of variables, most of which may be unnecessary, and it is nearly impossible to identify by hand the variables that have the greatest impact on our predictions. This is where a dimensionality reduction algorithm is needed; other algorithms may also be used in the process, for example borrowing random forests or decision trees to identify the most important variables.
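
As one concrete example of the feature extraction approach, here is a minimal principal component analysis (PCA) sketch on the 64-feature digits data; keeping 10 components is an arbitrary illustrative choice:

from sklearn import datasets
from sklearn.decomposition import PCA

#digit dataset from sklearn: 64 features per image
digits = datasets.load_digits()

#create the PCA model, keeping the 10 directions of greatest variance
pca = PCA(n_components=10)

#project the 64-dimensional data down to 10 dimensions
reduced = pca.fit_transform(digits.data)

print(digits.data.shape)                    #(1797, 64)
print(reduced.shape)                        #(1797, 10)
print(pca.explained_variance_ratio_.sum())  #fraction of variance retained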

10. Gradient Boosting

Gradient Boosting combines multiple weak learners to create a more powerful, accurate algorithm. Instead of using a single estimator, it uses multiple estimators to build a more stable and robust model. There are several gradient boosting algorithms:

  • XGBoost — uses linear and tree algorithms
  • LightGBM — uses only tree-based algorithms

Gradient boosting algorithms are characterized by high accuracy; the LightGBM algorithm in particular is known for exceptionally high performance.
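
XGBoost and LightGBM are separate third-party libraries; to stay self-contained in the style of the rest of this article, the sketch below uses scikit-learn's own GradientBoostingClassifier instead (the parameters shown are illustrative defaults):

from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier

#digit dataset from sklearn
digits = datasets.load_digits()

#create the GradientBoostingClassifier
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)

#set training set
x, y = digits.data[:-1], digits.target[:-1]

#train model: each new tree corrects the errors of the ensemble so far
clf.fit(x, y)

#predict
y_pred = clf.predict([digits.data[-1]])
y_true = digits.target[-1]

print(y_pred)
print(y_true)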

Further reading: "The Machine Learning Master"

