Top 10 Machine Learning Algorithms: An Overview for Getting Started

Basic machine learning algorithms:

  • Linear Regression
  • Support Vector Machine (SVM)
  • K-Nearest Neighbors (KNN)
  • Logistic Regression
  • Decision Tree
  • K-Means
  • Random Forest
  • Naive Bayes
  • Dimensionality Reduction
  • Gradient Boosting

I. Machine learning algorithms can be roughly divided into three categories:

1. Supervised Algorithms

In supervised learning, a model (a function / learned model) is learned or built from the training data set, and new instances are then inferred with this model. The algorithm requires specific inputs and outputs, starting with a decision about what kind of data to use as examples: for instance, a single handwritten character, or an entire line of handwritten text in a text recognition application. The main algorithms include neural networks, support vector machines, nearest neighbors, naive Bayes, decision trees, and others.

2. Unsupervised Algorithms

This type of algorithm has no specific target output; instead, it discovers structure in the data on its own, for example by dividing the data set into groups of similar instances.

3. Reinforcement Algorithms

Reinforcement learning is highly general and is trained mainly through decision-making: the algorithm trains itself according to the success or failure of its output (the decision), and after optimization through a large amount of experience it is able to make better predictions. Under the stimulus of rewards or punishments from the environment, the agent, much like a living organism, gradually forms expectations about the stimuli and develops the habitual behavior that yields the greatest benefit. In operations research and control theory, reinforcement learning is called "approximate dynamic programming" (ADP).

II. Basic machine learning algorithms:

1. Linear Regression

Regression analysis is a statistical data analysis method whose purpose is to understand whether two or more variables are related, and the direction and strength of that relationship, and to build a mathematical model so that observed values of specific variables can be used to predict changes in other variables.

The modeling process of linear regression uses the data points to find the line of best fit. In the formula y = m·x + c, y is the dependent variable and x is the independent variable; the given data set is used to find the values of m and c.

Linear regression comes in two types:

  • Simple linear regression, with only one independent variable;
  • Multiple regression, with at least two independent variables.

Here is an example of linear regression, based on the Python scikit-learn toolkit.
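A minimal sketch with scikit-learn; the data points here are made up for illustration so that they lie exactly on the line y = 2x + 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up points lying exactly on y = 2x + 1.
X = np.array([[1], [2], [3], [4]])
y = np.array([3, 5, 7, 9])

# Fit y = m*x + c: the model recovers the slope m and intercept c.
model = LinearRegression().fit(X, y)
m, c = model.coef_[0], model.intercept_
print(m, c)  # slope ~2, intercept ~1
```

With real data the points will not fall exactly on a line, and the fitted m and c minimize the squared error instead.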

2. Support Vector Machine (SVM)

The support vector machine (SVM) is a classification algorithm. An SVM model represents the instances as points in space and separates the classes with a straight line (more generally, a hyperplane). Note that support vector machines require fully labeled input data and are only directly applicable to two-class tasks; in practice, multi-class tasks are reduced to a series of binary problems.
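A minimal two-class sketch with scikit-learn's `SVC`; the points are made up so the two classes are linearly separable:

```python
from sklearn.svm import SVC

# Two made-up, linearly separable clusters of points in the plane.
X = [[0, 0], [1, 1], [2, 2], [8, 8], [9, 9], [10, 10]]
y = [0, 0, 0, 1, 1, 1]

# A linear kernel finds the separating straight line (hyperplane).
clf = SVC(kernel="linear").fit(X, y)
pred = clf.predict([[1.5, 1.5], [9.5, 9.5]])
print(pred)  # one point from each side of the boundary
```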

3. K-Nearest Neighbors (KNN)

The KNN algorithm is an instance-based form of learning, also called lazy learning: it uses only a local approximation and defers all computation until classification time. The k nearest neighbors of an unknown data point are used to predict its value, and the value of k is a key factor in prediction accuracy. Whether for classification or regression, it is often useful to weight the neighbors so that closer neighbors contribute more than distant ones.

A disadvantage of the KNN algorithm is that it is very sensitive to the local structure of the data. It is computationally heavy, and the data needs to be normalized so that every feature lies in the same range.

Extension: another disadvantage of KNN is that it depends on the entire training data set. Learning Vector Quantization (LVQ) is a supervised artificial neural network algorithm that lets you choose which training examples to keep. LVQ is data-driven: it searches for the neurons closest to an input, attracts neurons of the same class, repels neurons of a different class, and eventually captures the distribution pattern of the data. If KNN already classifies a data set well, LVQ can reduce the storage required for the training data. Typical learning vector quantization algorithms include LVQ1, LVQ2, and LVQ3, with LVQ2 the most widely used.
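A minimal KNN sketch with scikit-learn, using distance weighting as described above; the one-dimensional data is made up for illustration:

```python
from sklearn.neighbors import KNeighborsClassifier

# Two made-up clusters on a line: small values are class 0, large are class 1.
X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

# k=3 neighbors, with closer neighbors weighted more than distant ones.
knn = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X, y)
pred = knn.predict([[1.5], [10.5]])
print(pred)
```

Note that no real "training" happens in `fit`: KNN simply stores the data and defers all work to `predict`, which is why it is called lazy learning.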

4. Logistic Regression

The logistic regression algorithm is generally used in scenarios that require a discrete output, such as predicting whether a certain event will occur (for example, whether it will rain). Typically, logistic regression uses a function to compress values into a probability range. For example, the sigmoid function (S-function), which has an S-shaped curve, is used for binary classification: it converts the value of an event into a probability in the range (0, 1).

Y = e^(b0 + b1·x) / (1 + e^(b0 + b1·x))

The above is a simple logistic regression equation, where b0 and b1 are constants. These constants are calculated so that the error between predicted and actual values is minimized.
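The equation above can be written directly as a small function; the coefficient values passed in below are arbitrary, chosen only to illustrate the shape of the curve:

```python
import math

def logistic_prediction(x, b0, b1):
    """Y = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)): the sigmoid squashes
    the linear score b0 + b1*x into a probability in (0, 1)."""
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))

# At z = 0 the sigmoid is exactly at its midpoint, probability 0.5.
p = logistic_prediction(0.0, b0=0.0, b1=1.0)
print(p)
```

In practice b0 and b1 are not chosen by hand but fitted from data, e.g. by maximizing the likelihood of the observed labels.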

5. Decision Tree

A decision tree is a special tree structure consisting of a decision graph and its possible outcomes (such as cost and risk), used to assist decision-making. In machine learning, a decision tree is a predictive model: each internal node represents a test on an attribute, each branch represents a possible value of that attribute, and each leaf node holds the value assigned to the objects that reach it along the path from the root. A decision tree has only a single output, and the algorithm is typically used to solve classification problems.

A decision tree contains three types of nodes:

  • Decision nodes: usually represented by rectangles
  • Chance nodes: usually represented by circles
  • End nodes: usually represented by triangles

As an example, consider a simple decision tree for determining who in a population is likely to use a credit card, based on age and marital status: people who are over 30 or married are more inclined to choose a credit card, and vice versa. The tree can be extended further by identifying additional suitable attributes; in this example, a married person over 30 is the most likely to own a credit card (100% preference in the sample). The decision tree is generated from the training data.
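The credit-card example can be sketched with scikit-learn's `DecisionTreeClassifier`; the tiny data set below is invented to mirror the rule described above (over 30 or married → likely to use a credit card):

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up rows: [age, married (1 = yes, 0 = no)]; label 1 = uses a credit card.
X = [[25, 0], [22, 0], [35, 0], [40, 1], [28, 1], [45, 1]]
y = [0, 0, 1, 1, 1, 1]

# The tree learns splits on age and marital status from the training data.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
pred = tree.predict([[23, 0], [50, 1]])  # a young single person, an older married one
print(pred)
```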


Note: when features have differing numbers of distinct values, the information gain criterion used in decision trees is biased toward the features with more values.

6. K-Means

The K-Means algorithm is an unsupervised learning algorithm that provides a solution to the clustering problem. It divides n points (each an observation or sample instance) into k clusters, so that each point belongs to the cluster whose mean (the cluster center, or centroid) is nearest. The assignment and mean-update steps are repeated until the centroids no longer change.
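A minimal clustering sketch with scikit-learn's `KMeans`; the points are made up to form two obvious clusters, and note that no labels are provided, only the number of clusters k:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two made-up clusters of points; no class labels are given.
X = np.array([[1, 1], [1.5, 2], [1, 0],
              [10, 10], [10, 11], [11, 10]])

# k = 2: assign points to the nearest centroid, update centroids, repeat.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
print(labels)  # the first three points share one label, the last three the other
```

Which cluster gets the label 0 or 1 is arbitrary; only the grouping itself is meaningful.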

7. Random Forest

The Random Forest algorithm takes its name from the random decision forests proposed at Bell Labs in 1995. As the name suggests, a random forest can be regarded as a collection of decision trees. Each decision tree in the forest predicts a class, a process called "voting"; the forest then outputs the class that receives the most votes.
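A minimal voting-ensemble sketch with scikit-learn's `RandomForestClassifier`; the one-dimensional data is made up for illustration:

```python
from sklearn.ensemble import RandomForestClassifier

# Two made-up clusters on a line: small values are class 0, large are class 1.
X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

# 50 decision trees, each trained on a random bootstrap sample;
# the forest's prediction is the majority vote of the trees.
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
pred = rf.predict([[1], [11]])
print(pred)
```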

8. Naive Bayes

The Naive Bayes algorithm is based on Bayes' theorem from probability theory and is widely used, from text classification and spam filtering to medical diagnosis. Naive Bayes is suited to scenarios in which the features are independent of one another, such as predicting the species of a flower from the length and width of its petals. The "naive" in the name refers to this strong independence assumption between features.

A concept closely related to the Naive Bayes algorithm is maximum likelihood estimation, much of whose historical development took place within Bayesian statistics. For example, to build a model of a population's height, it is impractical to measure the height of every person in the country, but it is possible to sample the heights of some people and then estimate the mean and variance of the distribution by maximum likelihood.

Naive Bayes is called naive because it assumes that each input variable is independent of the others.
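The flower example above can be sketched with scikit-learn's Gaussian Naive Bayes on the classic iris data set, using only the petal length and width as features:

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X = iris.data[:, 2:4]  # petal length and petal width only
y = iris.target        # three flower species

# Gaussian NB models each feature independently per class (the "naive" assumption).
nb = GaussianNB().fit(X, y)
acc = nb.score(X, y)
print(acc)  # accuracy on the data it was fit on
```

Evaluating on the training data overstates real performance; in practice one would hold out a test set, but it suffices here to show the API.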

9. Dimensionality Reduction

In machine learning and statistics, dimensionality reduction is the process of reducing the number of random variables under consideration to obtain a set of "uncorrelated" principal variables. It can be further subdivided into two approaches: feature selection and feature extraction.

Some data sets contain an unmanageable number of variables. Especially when resources are abundant, the data collected can be very detailed: a data set may contain thousands of variables, most of them unnecessary, and it becomes nearly impossible to identify the variables that most influence our predictions. In this situation we need a dimensionality reduction algorithm, and other algorithms can assist in the process; for example, random forests and decision trees can be borrowed to identify the most important variables.
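A minimal feature-extraction sketch using principal component analysis (PCA), one common dimensionality reduction technique; the synthetic data is constructed so that its three columns are almost perfectly correlated, i.e. the data is essentially one-dimensional:

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up data: three correlated columns driven by a single hidden factor x.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = np.column_stack([x, 2 * x, -x]) + 0.01 * rng.normal(size=(100, 3))

# Project from 3 dimensions down to 1 principal component.
pca = PCA(n_components=1).fit(X)
ratio = pca.explained_variance_ratio_[0]
print(ratio)  # close to 1: one component captures nearly all the variance
```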

10. Gradient Boosting

Gradient Boosting combines multiple weak learners into a single stronger, more accurate model. Using an ensemble of estimators rather than a single one yields a more stable and robust algorithm. There are several gradient boosting implementations:

  • XGBoost — supports both linear and tree-based learners
  • LightGBM — uses only tree-based learners

Gradient boosting algorithms are known for their high accuracy; the LightGBM algorithm in particular is noted for its performance.
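As a minimal sketch of the idea (using scikit-learn's built-in `GradientBoostingClassifier` rather than XGBoost or LightGBM), each new tree is fitted to the errors of the ensemble so far; the tiny data set is made up for illustration:

```python
from sklearn.ensemble import GradientBoostingClassifier

# Two made-up clusters on a line: small values are class 0, large are class 1.
X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

# 20 shallow trees added sequentially, each correcting the previous ones.
gb = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)
pred = gb.predict([[1], [11]])
print(pred)
```

XGBoost and LightGBM expose very similar fit/predict interfaces but add further optimizations for speed and scale.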


Origin blog.csdn.net/gp16674213804/article/details/125685387