[AI Bottom Logic] - Chapter 5 (Part 1): Machine Learning Algorithms - Regression & Classification

Table of contents

Introduction

1. What is machine learning

    1. Set rules vs. learned rules

    2. Definition of algorithm

2. Machine Learning Algorithms

    1. Common learning methods

    2. Regression

    3. Classification

To be continued in the next article...

Past highlights:


Introduction

When computers were first invented, experts distilled professional knowledge and experience into rules and fed them into programs, but this could not keep up with the pace at which knowledge was updated, and it was time-consuming and labor-intensive. So a "lazy" method was born - machine learning: the computer learns the rules from data automatically. There is nothing mysterious about it; it is a set of rigorous computational logic (data-processing logic). Algorithms are like the engine of a machine, allowing the computer to become an "expert" in a certain field from data alone, without human intervention.

1. What is machine learning

What is "learning"? According to Professor Herbert Simon, winner of both the Turing Award and the Nobel Prize in Economics, the core of learning is to improve performance. The term "machine learning" was first coined by Arthur Samuel, whose checkers program was the world's first self-learning game program. Tom Mitchell defines machine learning this way: a computer program is said to learn from experience (E) with respect to some task (T) and performance measure (P) if its performance on T, as measured by P, improves with E. Professor Andrew Ng describes machine learning as the science of getting computers to act without being explicitly programmed.

1. Set rules vs. learned rules

Traditional programming is rule-based, and its purpose is to reach the answer quickly. The rules here are the data structures and computational procedures that programmers are familiar with; they are the core of a computer program. Once the rules are set, feeding in input data yields the corresponding answer.

The biggest difference between machine learning and traditional programming is that what the computer outputs is a process for turning input into output, not the answer itself. In machine learning, the algorithm tries to find the "association" between input and output data and continually updates and iterates on that "association". Once it passes testing, the set of association rules becomes a mathematical model.

From a mathematical point of view, machine learning has three elements: model, algorithm, and strategy.

① Model: it defines what to learn, transforming a real business problem into one that can be expressed quantitatively in mathematics.

② Algorithm: it defines how to learn, and is essentially a mathematical procedure. For example, when solving a least squares problem, the iterative method of gradient descent may be used; that is the algorithm.

③ Strategy: it defines when learning ends, i.e. the goal and criterion of the optimization. Mathematically it is usually expressed as a loss function that measures the gap between predicted and true values, turning machine learning into an optimization problem of minimizing the loss function. A sketch putting the three elements together follows.
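To make the three elements concrete, here is a minimal, self-contained sketch: a linear model (what to learn), a mean-squared-error loss function (the strategy), and gradient descent (the algorithm). All data and hyperparameters below are made up for illustration.

```python
import numpy as np

# Model: Y = theta0 + theta1 * X; strategy: minimize MSE loss;
# algorithm: gradient descent. Data are synthetic (roughly Y = 1 + 2X).
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([3.1, 4.9, 7.2, 8.8])

theta0, theta1 = 0.0, 0.0        # model parameters to be learned
lr = 0.01                        # learning rate (made-up hyperparameter)

for step in range(2000):
    pred = theta0 + theta1 * X   # the model's prediction
    err = pred - Y
    loss = np.mean(err ** 2)     # the loss function (the strategy)
    g0 = 2 * np.mean(err)        # gradient of the loss w.r.t. theta0
    g1 = 2 * np.mean(err * X)    # gradient of the loss w.r.t. theta1
    theta0 -= lr * g0            # the update rule (the algorithm)
    theta1 -= lr * g1

print(f"theta0={theta0:.3f}, theta1={theta1:.3f}, final loss={loss:.5f}")
```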

Note: "machine learning" is not the same as "machine learning algorithms". Machine learning is the complete process of learning a model from data, which is an engineering problem; machine learning algorithms are the mathematical operations used during that process, which is a mathematical problem.

In terms of scope, AI > machine learning > deep learning: machine learning is one way to realize AI, and deep learning is one technique for realizing machine learning. Deep learning is a family of models based on artificial neural networks; machine learning is broader - beyond neural network structures, any model that can be trained from data falls into its category; and AI, besides computer techniques such as machine learning, also extends into biology, cognitive science, and psychology.

2. Definition of algorithm

An algorithm is a pre-defined calculation method - a set of steps or rules that a computer follows; if a program is a dish, the algorithm is the recipe. In a computer, an algorithm is expressed as program instructions, such as bubble sort, the MD5 digest algorithm, hash algorithms, and so on. They provide general solutions and transfer well from one problem to another.

In the past, algorithmic instructions were just sets of rules whose execution logic was fixed in advance, and how "smart" they were depended entirely on the programmer. Later, machine learning algorithms emerged that automatically summarize patterns from data and can solve far more complex problems.

Today, the vast majority of programmers do not deal with algorithms directly. Algorithms are packaged into software by specialized algorithm engineers and appear as a "black box" to other programmers. Simple algorithms can be called directly from a software package, while complex ones may need to be purchased. The "algorithm" has become a commodity!

2. Machine Learning Algorithms

When first designed, an algorithm is usually simple and concise, meant to solve one specific problem. The reason algorithms can be hard to understand is that they differ greatly from the way humans think about problems. To build a machine learning model, several core issues must be solved:

① The modeling problem: using mathematical functions to express real-world problems - the hardest and most important step. The defects/limitations of AI arise not because computers cannot solve mathematical equations, but because many problems cannot be well described by mathematical functions.

② The evaluation problem: a set of criteria for judging how good or bad a candidate model is. Some "familiar phenomena" in business (such as old customers seeing higher prices than new customers) come from deliberately using evaluation functions with particular built-in goals.

③ The optimization problem: finding the function that performs best.

1. Common learning methods

By learning method, algorithms divide into unsupervised learning, supervised learning, and reinforcement learning.

① Unsupervised learning: the input data carry no labels, and the output is usually automatically aggregated groups of different categories. Grouping happens automatically - given data, similar features will be found. Its typical algorithm is the clustering algorithm.

Example: have a computer group the same kinds of fruit in a basket together, without knowing the fruit types (labels). First it needs feature data for each fruit, represented as a mathematical vector - say one containing color, taste, shape, and other features. Then fruits with similar vectors (those close in distance) are grouped into one class.
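As a toy illustration of "similar vectors group together" (the feature encodings below are invented), a single assignment pass groups each fruit vector with the nearer of two starting centers; real clustering algorithms such as k-means repeat this step until the groups stabilize:

```python
import numpy as np

# Each row is one fruit as a feature vector [color, taste, shape];
# the numeric encodings are made up for illustration.
fruits = np.array([
    [0.9, 0.8, 0.9],
    [0.8, 0.9, 0.8],
    [0.1, 0.9, 0.1],
    [0.2, 0.8, 0.2],
])
centers = fruits[[0, 2]]          # two vectors chosen as initial centers

# Distance from every fruit to every center, then nearest-center assignment.
dists = np.linalg.norm(fruits[:, None] - centers[None], axis=2)
groups = dists.argmin(axis=1)
print(groups)                     # -> [0 0 1 1]: two groups emerge
```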

② Supervised learning: the category (label) of the input data is known. Based on the labels of known data, it can predict the labels of unknown data. Typical application scenarios are recommendation and prediction; this is the most widely used kind of machine learning.

Take the fruit problem again, but this time the fruit labels (apple, banana, etc.) are known. The computer learns the association between these labels and the feature data - for example, it finds that something red, sweet, and round is very likely an apple, while something yellow, sweet, and long is very likely a banana. Once learning is done, we have a model that can judge a fruit's category.

③ Reinforcement learning: the input is the state of some data plus actions and feedback from interacting with an environment, and the output is the best action in the current state. The purpose is to maximize the long-term reward, constantly pursuing something better.

Compared with the former two, reinforcement learning is a dynamic learning process, with no explicit target output and no precise measurement standard for the results. It has a decision-making character: it keeps selecting behaviors (with no labels or data telling the computer what to do); it can only try actions and then improve based on feedback.

It is a bit like closed-loop control - and indeed, many control and decision-making problems are reinforcement learning problems, such as keeping a drone in stable flight or getting an AI to score highly in a video game.
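A minimal sketch of "try actions, then improve on feedback" is the two-armed bandit below (not a full reinforcement learning algorithm; the reward probabilities are made up). The learner keeps a running estimate of each action's value and gradually prefers the better one:

```python
import random

true_reward_prob = [0.3, 0.7]   # hidden from the learner (made-up values)
values = [0.0, 0.0]             # learner's estimated value of each action
counts = [0, 0]

for step in range(1000):
    # Epsilon-greedy: mostly exploit the best estimate, sometimes explore.
    if random.random() < 0.1:
        action = random.randrange(2)
    else:
        action = values.index(max(values))
    reward = 1.0 if random.random() < true_reward_prob[action] else 0.0
    counts[action] += 1
    # Update the running mean estimate from the environment's feedback.
    values[action] += (reward - values[action]) / counts[action]

print(values)   # the estimate for action 1 should approach 0.7
```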

2. Regression

Regression is a supervised learning algorithm, a way of analyzing the relationship between variables - for example, house price versus floor area, or global warming versus carbon emissions.

The term "regression" was first proposed by Francis Galton: whether in pea size or human height, there is a phenomenon of "regression to the mean". Mathematically, "regression" is now usually associated with data prediction, although the word itself carries no predictive meaning; it has simply been retained for historical reasons.

Algorithms of this kind generally serve two purposes: ① explain existing patterns - find an appropriate equation for the known data; ② predict the unknown and the future - the fitted equation not only expresses the association but also makes predictions for new samples. Let's take linear regression as an example:

Univariate linear regression studies the relationship between one independent variable X and one dependent variable Y. Suppose we have a set of data with the two variables X and Y. Plotting the data gives a scatter plot, and the points appear to gather near a straight line; this hidden straight line is the regression equation to be solved.

① Assume Y and X are linearly related: $Y = \theta_0 + \theta_1 X + \varepsilon$, where $\varepsilon$ is the random error - the combined influence of all uncertain factors, whose value is usually unobservable. Mathematically it is treated as random noise and assumed to follow a normal distribution. As for the parameters $\theta_0$ and $\theta_1$, the idea is to find a straight line (a plane, if the data are multidimensional) such that the sum of squared distances between the sample data and the line is as small as possible - the least squares method.
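A minimal sketch of such a fit, using NumPy's least squares solver on synthetic data (the "true" coefficients and noise level below are made up):

```python
import numpy as np

# Synthetic data: Y roughly linear in X plus Gaussian noise,
# with "true" theta0 = 2.0 and theta1 = 0.5 (invented for the demo).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=50)

# Least squares: find theta minimizing ||A @ theta - Y||^2.
A = np.column_stack([np.ones_like(X), X])   # design matrix [1, X]
theta, *_ = np.linalg.lstsq(A, Y, rcond=None)
theta0, theta1 = theta
print(f"fitted: Y = {theta0:.3f} + {theta1:.3f} * X")
```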

② After the expression relating Y and X is determined, a hypothesis test must be performed on the regression equation, because the linear relationship was assumed before the calculation. With the help of statistical methods, hypothesis tests on the equation's parameters verify whether Y and X really are linearly related.

Extension:

Linear regression finds the relationship between two variables, but real scenarios are multidimensional. To predict housing prices, besides historical price data you need per-capita income, floor area, distance from the city center, medical and educational resources, and so on. With multiple variables at play we get multiple linear regression, whose mathematics is similar to the univariate case.

Multiple linear regression may run into multicollinearity: some variables in the sample data are strongly linearly correlated. For example, a house's building area and usable area are in such a relationship. When predicting housing prices it is best not to use both at once - doing so increases computational complexity, worsens interpretability, and may make the solution of the equations unstable.

Improvements and optimizations of linear regression have produced ridge regression, Lasso regression, elastic net regression, and others.
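As a sketch of why these variants exist (assuming scikit-learn is available; the features and coefficients are synthetic), the snippet below fits plain, ridge, and Lasso regression on two nearly collinear features - the regularized variants keep the coefficients tamer than the plain fit:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(1)
building_area = rng.uniform(50, 150, size=100)
usable_area = 0.8 * building_area + rng.normal(0, 1, size=100)  # nearly collinear
X = np.column_stack([building_area, usable_area])
price = 1.2 * building_area + rng.normal(0, 5, size=100)        # synthetic target

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=1.0)):
    model.fit(X, price)
    print(type(model).__name__, np.round(model.coef_, 3))
```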

3. Classification

Classification is a supervised learning task - the most widely applied in machine learning. It assigns input data to a limited number of pre-defined categories according to their features; both input and output are discrete variables. Taking text classification as an example, the input is a text's feature vector and the output is the text's category.

Common classification algorithms: artificial neural networks, decision trees, support vector machines, k-nearest neighbors, naive Bayes, logistic regression, etc.

① Classification evaluation methods

The first step in building a classification model is deciding how to evaluate classification performance. Sometimes what matters is the model's ability to judge correctly; sometimes the risk of a wrong decision matters more. This requires two important indicators - precision and recall.

Precision (also called the precision rate) measures how accurate the positive classifications themselves are; recall (also called the recall rate) measures how completely the correct items are covered. For a binary classification problem (results are only "true" and "false"), precision says how many of the samples judged "true" really are "true", while recall says how many of all the "true" samples were actually found. Sometimes the harmonic mean of precision and recall (the F1 score) is used as a combined indicator.

Different scenarios place different emphasis: police catching criminals should focus on precision, while disaster forecasting should focus more on recall. The two indicators usually trade off against each other.
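A minimal sketch of the two indicators and their harmonic mean (the labels below are made up for a binary problem):

```python
# y = 1 means "true", y = 0 means "false"; toy labels for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)   # of samples judged "true", how many really are
recall = tp / (tp + fn)      # of all "true" samples, how many were found
f1 = 2 * precision * recall / (precision + recall)           # harmonic mean
print(precision, recall, f1)  # -> 0.75 0.75 0.75
```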

② Classification algorithm 1 - the k-nearest neighbors algorithm

The k-nearest neighbors (KNN) algorithm classifies by measuring distance. Mathematics often uses "distance" to distinguish data; here it means Euclidean distance, which on a two-dimensional plane is simply the geometric distance between two points.

KNN selects the K data points closest to a new, unknown sample in feature space and then looks at their categories: if most of those points belong to a certain category, the new sample is assigned to that category too. The number K is user-defined (usually an integer < 20), and the K chosen neighbors are objects that have already been correctly classified. The algorithm decides the new sample's category purely from its K nearest neighbors - in plain words, classification by "the minority obeys the majority".
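A toy sketch of the idea in pure NumPy (no training step; the points and labels are made up):

```python
import numpy as np

# Already-classified samples in a 2-D feature space.
train_X = np.array([[1.0, 1.1], [1.2, 0.9], [0.1, 0.2], [0.0, 0.1]])
train_y = np.array(["A", "A", "B", "B"])

def knn_predict(x, K=3):
    dists = np.linalg.norm(train_X - x, axis=1)       # Euclidean distances
    nearest = train_y[np.argsort(dists)[:K]]          # labels of K closest
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]                  # majority vote

print(knn_predict(np.array([1.0, 1.0])))              # -> "A"
```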

Examples can be seen in the previous OpenCV blog: [OpenCV-Python] - Machine Learning kNN Algorithm & SVM Algorithm & k-Means Clustering Algorithm & Deep Learning Image Recognition & Object Detection

Advantages: the logic is simple and easy to understand, no parameters need to be estimated, no training is required, and it suits multi-class problems well.

Disadvantages: ① the majority-voting classification can be flawed when the class distribution of the original samples (the prospective neighbors) is skewed - much as when people make decisions they like to hear the opinions of elites, yet elites are a minority, and the people around you exert the larger, possibly misleading influence; ② the whole data set must be stored, which costs a lot of space, and the distance to every sample must be computed, which costs a lot of computation.

③ Classification algorithm 2 - the support vector machine algorithm

The support vector machine (SVM) is ingeniously designed and widely recognized as an algorithm that elegantly unites an engineering problem with a mathematical one. Suppose we have sample data represented as black dots and white dots, and we need to split them into two classes. There are several possible dividing lines - which one is best?

The algorithm tells us the optimal dividing line is the one with the largest margin to the points on both sides, as shown in the left figure. However, not every data set can be divided by a straight line, as the right figure shows.

The secret sauce is a mathematical transformation known as the "kernel trick". It remaps the data from the original sample space into a higher-dimensional space where separation is easier - essentially a "dimension-raising" treatment of the data. The mathematics involved is relatively deep - nonlinear programming, the Lagrange multiplier method, dual-problem optimization, the sequential minimal optimization (SMO) algorithm, kernel functions, and so on - and we will not expand on it here. Simply put, SVM takes a classification problem unsolvable in a low-dimensional space and solves it in a higher-dimensional one. Mathematics proves this "dimension raising" is feasible and need not bring extra computational overhead (which keeps the algorithm efficient).

The logic goes like this: a linearly inseparable problem in two-dimensional space is mapped into three-dimensional space, and the original 2-D classification boundary becomes the projection onto the plane of a classification surface in 3-D. Likewise, a linearly inseparable problem in three dimensions can be mapped into four, where a separating hyperplane completes the task, and so on.

A simpler example: the three points -1, 0, 1 on a line cannot be split into the two classes {-1, 1} and {0} with a single breakpoint, but lifting them to a higher dimension works; the process can be as follows:
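A sketch of that lifting step (assuming the two classes are {-1, 1} and {0}): mapping each x to (x, x²) raises the points into two dimensions, where the horizontal line y = 0.5 separates them:

```python
# One-dimensional points with made-up binary labels: outer vs. middle.
points = [-1.0, 0.0, 1.0]
labels = ["outer", "middle", "outer"]

# The "dimension-raising" feature map x -> (x, x**2).
lifted = [(x, x ** 2) for x in points]
for (x, x2), lab in zip(lifted, labels):
    side = "above" if x2 > 0.5 else "below"
    print(f"x={x:+.1f} -> ({x:+.1f}, {x2:.1f}) lies {side} y=0.5 ({lab})")
```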

SVM is an elegant algorithm: it has mathematical beauty, explains the engineering problem very well, and even carries a philosophical idea - when you hold data in more dimensions and observe things from a higher vantage point, everything becomes easy!

④ Classification algorithm 3 - the decision tree algorithm

The tree shape reflects a basic pattern of human thinking: the structure is easy to memorize, explain, and write down. Every choice a person makes can be decomposed into a series of small decisions, and one small decision after another forms a huge decision tree. The German mathematician Leibniz is said to have remarked that life can be decomposed into a series of successive binary decisions. In a sense, life is a decision tree algorithm - what a wonderful way to put it!

Tree structures are good at classification and retrieval. Database table indexes are generally tree structures, and operating systems use trees for memory and process management; a tree greatly speeds up search at small cost.

The decision tree algorithm is a decision procedure with a tree structure. A decision tree can be regarded as a collection of if-then rules, like those in a programming language. The whole decision process is intuitive and easy to follow, with good explanatory logic and good visualization!
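A sketch of a decision tree written out as if-then rules, reusing the fruit example (the features and rules are invented for illustration):

```python
def classify_fruit(color, shape):
    """A hand-written two-level decision tree as nested if-then rules."""
    if color == "red":
        if shape == "round":
            return "apple"
        return "unknown"
    if color == "yellow":
        if shape == "long":
            return "banana"
        return "unknown"
    return "unknown"

print(classify_fruit("red", "round"))    # -> apple
print(classify_fruit("yellow", "long"))  # -> banana
```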

Core logic - split a very complicated matter into efficient sub-decisions. Think of the "turtle soup" guessing game, which uncovers the truth of a story by asking only yes/no questions.

Purpose of the algorithm - to generate a decision tree with the strongest possible generalization ability, so that it handles unseen data well; that is, to find the way of splitting branches and leaves that best eliminates uncertainty.

Realization - after each candidate split, compare the information entropy before and after dividing the branches and leaves and compute the information gain; the greater the gain, the more uncertainty eliminated, and the better the split. In practice there are many feature-selection criteria for building a decision tree (the Gini index, among others), but whichever is used, the purpose is to select the best feature - the one whose split leaves the data as pure, i.e. as certain, as possible.
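A minimal sketch of information gain (entropy before a split minus the weighted entropy after it; the label lists are made up):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

parent = ["yes"] * 5 + ["no"] * 5                 # entropy = 1.0 bit
left = ["yes"] * 4 + ["no"]                       # one branch of a candidate split
right = ["yes"] + ["no"] * 4                      # the other branch

weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
gain = entropy(parent) - weighted
print(f"information gain: {gain:.3f} bits")       # larger gain = better split
```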

The pruning problem - decision trees easily over-divide the sample data. Pruning cuts redundant branches and leaves from a tree to avoid overfitting. There are generally two methods: ① pre-pruning - while generating the tree, estimate the effect of each node's split and decide whether to accept it; ② post-pruning - first grow a complete tree, then examine each non-leaf node one by one from the bottom up (the bottom nodes are the leaves) and evaluate whether it can be replaced by a leaf node.

Defects and remedies - a decision tree is a single predictor, and its biggest problem is instability: a tiny change in the training set can produce a very different tree. For this reason people combine hundreds of decision trees, each grown from random subsets of the data and features, and let them vote - the random forest algorithm. Each tree in a random forest has few layers and so only modest accuracy on its own, but the combination delivers good predictive results.
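A sketch of the combination (assuming scikit-learn is installed; the data are synthetic): two hundred shallow trees, each mediocre alone, vote together:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data for illustration only.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Many shallow trees (max_depth=3), each trained on random subsets
# of samples and features, combined by voting.
forest = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=0)
forest.fit(X_tr, y_tr)
print(f"test accuracy: {forest.score(X_te, y_te):.3f}")
```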


To be continued in the next article...

Regression and classification have been introduced above; the next part continues with clustering, dimensionality reduction, time series, and other algorithms!

Past highlights:

【AI Bottom Logic】—Chapter 3 (Part 2): Information Exchange & Information Encryption & Decryption & Noise in Information

【AI Bottom Logic】—Chapter 3 (Part 1): Data, Information and Knowledge & Shannon Information Theory & Information Entropy

[Machine Learning]—Continued: Convolutional Neural Network (CNN) and Parameter Training

[AI Bottom Logic]——Chapter 1&2: Statistics and Probability Theory & Data "Trap"


Origin blog.csdn.net/weixin_51658186/article/details/131452269