Machine Learning - A First Look

Small steps, accumulated, carry you a thousand miles; laziness, accumulated, leads to an abyss.

Note: This article is based mainly on Zhou Zhihua's book "Machine Learning".

Main Content

This article gives a brief introduction to machine learning and its basic concepts.

What is Machine Learning?

"The morning glow does not go out, the sunset travels a thousand miles"

From long-accumulated experience, people concluded that red clouds at sunrise foretell rain that day, while red clouds at sunset foretell a clear day tomorrow. In the same way, when we give computers (here we take "machine" to mean a computer) the ability to use experience to make effective decisions, we call it machine learning.

How to give computers the ability to "learn"

Definition: given experience E, a set of tasks T, and a performance measure P, if the computer's performance on the tasks in T, as measured by P, improves with the accumulation of experience E, then we say the computer has the ability to learn.

In computer systems, "experience" usually exists in the form of "data". By designing a "learning algorithm", the computer can discover the latent patterns and connections in the data (for example, between red clouds at sunrise and rain today, or red clouds at sunset and clear weather tomorrow); we call this system of patterns and connections a "model". First we provide the computer with empirical data; from these data and a learning algorithm, the computer produces a model and can then make effective decisions. When a new situation arises (such as red clouds at sunrise), the model provides the corresponding judgment (such as: it will rain today).
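
As a toy illustration of this data -> learning algorithm -> model -> decision pipeline, here is a minimal sketch in plain Python. The records and names are hypothetical; the "learning algorithm" is just majority voting over past (observation, outcome) pairs:

```python
from collections import Counter, defaultdict

# "Experience" E: past (observation, outcome) records -- hypothetical toy data.
experience = [
    ("red clouds at sunrise", "rain today"),
    ("red clouds at sunrise", "rain today"),
    ("red clouds at sunset", "clear tomorrow"),
    ("red clouds at sunset", "clear tomorrow"),
    ("red clouds at sunrise", "rain today"),
]

def learn(records):
    """A trivial 'learning algorithm': for each observation, remember the
    most frequent outcome. The resulting dict plays the role of the 'model'."""
    counts = defaultdict(Counter)
    for observation, outcome in records:
        counts[observation][outcome] += 1
    return {obs: c.most_common(1)[0][0] for obs, c in counts.items()}

model = learn(experience)

# A new situation arises: the model supplies the corresponding judgment.
print(model["red clouds at sunrise"])  # -> "rain today"
```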

Machine Learning Algorithms

Let us first review the two basic modes of scientific reasoning: induction and deduction. The former is a process of "generalization" from the specific to the general, in which general laws are derived from specific facts; the latter is a process of "specialization" from the general to the specific, in which specific conclusions are deduced from basic principles.

Suppose we have collected a batch of data about watermelons, such as (color = green, root = curled, knock sound = muffled), (color = black, root = slightly curled, knock sound = dull), (color = light white, root = stiff, knock sound = crisp), ..., where each parenthesized group is one record and "=" means "takes the value".

The process of machine learning is essentially the inductive process described above. Let us take watermelons as an example.

The process of machine learning is a process of "generalization" from a specific data set: by learning from the melons in the training set, we gain the ability to judge melons we have never seen. We can think of the learning process as a search through the space of all hypotheses. The goal of the search is to find a hypothesis that "fits" the training set, that is, a hypothesis that correctly classifies the melons in the training set.

For example, on the training set in Table 1.1 of the book, we can find the following hypotheses that are consistent with the training samples:

(color = *, root = curled, knock sound = *) -> good melon (1)

(color = *, root = *, knock sound = muffled) -> good melon (2)

(color = *, root = curled, knock sound = muffled) -> good melon (3)

Basis for Algorithm Selection

From the above we can see that multiple hypotheses may be consistent with the same training set. So, in the concrete process of machine learning, on what basis does an algorithm choose among them?

In real problems we often face a very large hypothesis space, while learning proceeds from a finite training set. There may therefore be several hypotheses consistent with the training set, that is, a "set of hypotheses" consistent with the training set, which we call the "version space". Hypotheses (1), (2), and (3) above form the version space "generalized" from the watermelon data set.
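
To make this concrete, here is a small sketch in plain Python. The three-record training set and attribute values are assumptions for illustration; it enumerates all conjunctive hypotheses over the three watermelon attributes, with "*" as a wildcard, and keeps exactly those consistent with the training set, i.e., the version space:

```python
from itertools import product

# Attribute values (plus "*" as a wildcard), following the watermelon example.
colors = ["green", "black", "light white", "*"]
roots  = ["curled", "slightly curled", "stiff", "*"]
knocks = ["muffled", "dull", "crisp", "*"]

# A hypothetical toy training set: (color, root, knock sound) -> good melon?
train = [
    (("green", "curled", "muffled"), True),
    (("black", "slightly curled", "dull"), False),
    (("light white", "stiff", "crisp"), False),
]

def matches(hypothesis, sample):
    """A sample satisfies a hypothesis if every attribute equals the
    hypothesis value or the hypothesis value is the wildcard '*'."""
    return all(h == s or h == "*" for h, s in zip(hypothesis, sample))

def consistent(hypothesis, data):
    """A hypothesis is consistent if it labels every training sample
    correctly: the sample matches exactly when it is a good melon."""
    return all(matches(hypothesis, x) == label for x, label in data)

version_space = [h for h in product(colors, roots, knocks) if consistent(h, train)]
for h in version_space:
    print(h, "-> good melon")
```

With only these three records the enumeration keeps more hypotheses than (1)-(3); adding the remaining samples of the book's Table 1.1 prunes the space further.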

So when we encounter a newly harvested melon with (color = green, root = curled, knock sound = dull), which hypothesis should we use to judge it? Under hypothesis (1) it is a good melon, but under hypotheses (2) and (3) it is not.

The preference exercised in this choice is called the "inductive bias" (also translated as "inductive preference"). Any effective machine learning algorithm must have an inductive bias; otherwise it would be paralyzed by the equally valid hypotheses in the hypothesis space and unable to produce a definite learning result.

A commonly used, most basic "correct" preference principle in the natural sciences is "Occam's razor":

Occam's Razor: If multiple hypotheses are consistent with observations, choose the simplest one

In fact, an inductive bias corresponds to the learning algorithm's own assumption about what kind of model is better. In concrete real-world problems, whether the algorithm's inductive bias matches the problem itself usually determines directly whether the algorithm can achieve good performance.

For example, in the regression setting of the book's Figure 1.3, each training sample is a point in the plane. Learning a model consistent with the training set amounts to finding a curve that passes through all the training points. Obviously there are many such curves. Under the Occam's razor preference, the smoother curve A would be judged better than curve B.

But in reality curve A is not necessarily better than curve B, because the training samples are only part of the full data, and we cannot know whether the true data lie closer to curve A or to curve B. As the book's Figure 1.4 shows, both situations are possible. In other words, if a learning algorithm a is better than a learning algorithm b on some problems, then there must be other problems on which b is better than a. This is the "No Free Lunch" (NFL) theorem.
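
The following toy sketch (NumPy; the data-generating function and polynomial degrees are assumptions chosen for illustration) contrasts a simple fit with an exact interpolating fit of the same training points, then scores both against the underlying truth, mirroring the curve A / curve B picture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: a smooth quadratic plus noise.
def truth(x):
    return 0.5 * x**2 - x + 2.0

x_train = np.linspace(-3, 3, 7)
y_train = truth(x_train) + rng.normal(0.0, 0.3, size=x_train.shape)

# Curve A: a simple degree-2 fit; curve B: a degree-6 polynomial
# that passes exactly through all 7 training points.
curve_a = np.polyfit(x_train, y_train, deg=2)
curve_b = np.polyfit(x_train, y_train, deg=6)

# Held-out points reveal which curve generalizes better *on this problem*.
x_test = np.linspace(-3, 3, 101)
err_a = np.mean((np.polyval(curve_a, x_test) - truth(x_test)) ** 2)
err_b = np.mean((np.polyval(curve_b, x_test) - truth(x_test)) ** 2)
print(f"test MSE, smooth curve A: {err_a:.4f}")
print(f"test MSE, wiggly curve B: {err_b:.4f}")
```

If the true function were itself highly wiggly, curve B could win instead; which bias helps depends on the problem, which is exactly the NFL point.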

But the NFL theorem has an important premise: all "problems" occur with equal chance, or all problems are equally important. That is not the case in practice; much of the time we only care about the problem we are trying to solve. For example, suppose we want a good way to get quickly from place A to place B. If A is Nanjing Drum Tower and B is Nanjing Xinjiekou, then "riding a bicycle" is a fine solution; if B were instead Xinjiekou in Beijing, riding a bicycle would obviously be a terrible solution, but we do not care about that problem.

So the most important lesson of the NFL theorem is that it makes us realize that talking about "which learning algorithm is better" apart from the specific problem is meaningless, because when all potential problems are considered, all algorithms are equally good. Choosing the algorithm and inductive bias that fit the specific problem is the right thing to do.

Basic Concepts

1) Basic Terminology

Dataset: the collection of such records; (i.e., the watermelon data above)

Instance or sample: each record, describing an event or object; (e.g., (color = black, root = slightly curled, knock sound = dull))

Attribute or feature: something that reflects the appearance or nature of an event or object in some respect; (e.g., "color", "root", "knock sound")

Attribute space: the space spanned by the attributes; (e.g., taking "color", "root", and "knock sound" as three coordinate axes yields a three-dimensional space for describing watermelons; also called the "sample space" or "input space")

Feature vector : the coordinate vector corresponding to each sample in the attribute space;

Learning or training : The process of learning a model from data, accomplished by executing a learning algorithm;

Training data : the data used in the training process;

Training samples : each sample in the training data;

Training set : a collection of training samples;

If we want to learn a model that can help us judge whether an uncut melon is a "good melon", the example data above are obviously not enough. To build such a "predictive" model, we need the "result" information of the training samples, such as "((color = green, root = curled, knock sound = muffled), good melon)".

Label: information about the result of an instance; (e.g., good melon)

Example: an instance together with its label information;

Testing : the process of making predictions with a learned model;

Test sample: the sample being predicted;

2) Algorithm Classification

Classification: the target label is a discrete category value; (e.g., "good melon", "bad melon")

Regression: the target label is a continuous numeric value; (e.g., watermelon ripeness 0.95, 0.37)

Supervised learning: uses samples of known categories (i.e., labeled samples whose categories are known) to adjust the parameters of a classifier and train an optimal model that achieves the required performance; the trained model then maps every input to a corresponding output, and by making simple judgments on the outputs we achieve classification, so that unknown data can be classified.

In layman's terms, we give the computer a pile of multiple-choice questions (training samples) and provide the standard answers at the same time. The computer keeps adjusting its model parameters, trying to make its guessed answers agree with the standard answers as much as possible, and thereby learns how to do this kind of question. We then have the computer answer multiple-choice questions (test samples) for which no answers are provided.
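
As a minimal supervised-learning sketch (using scikit-learn; the integer encoding and toy labels are assumptions for illustration), we can fit a decision tree on labeled watermelon records and then predict an unseen melon:

```python
from sklearn.tree import DecisionTreeClassifier

# Integer-encode the three attributes (a hypothetical encoding; note the
# simplification that the tree treats these codes as ordered numbers):
# color: green=0, black=1, light white=2
# root:  curled=0, slightly curled=1, stiff=2
# knock: muffled=0, dull=1, crisp=2
X_train = [
    [0, 0, 0],  # (green, curled, muffled)
    [1, 1, 1],  # (black, slightly curled, dull)
    [2, 2, 2],  # (light white, stiff, crisp)
]
y_train = ["good melon", "bad melon", "bad melon"]  # the "standard answers"

# The learning algorithm adjusts the model to fit the labeled samples.
model = DecisionTreeClassifier().fit(X_train, y_train)

# Predict an unseen melon: (green, curled, dull).
print(model.predict([[0, 0, 1]]))
```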

Unsupervised learning: for samples that are unlabeled and unclassified, we model the input data set directly, as in clustering. The most direct example is the saying "birds of a feather flock together": we simply group things with high similarity together, and for a new sample we compute its similarity and assign it to a group accordingly.

Generally speaking, we give the computer a pile of multiple-choice questions (training samples) but provide no standard answers. The computer tries to analyze the relationships among the questions and group them into categories. It knows nothing about the answers, but it assumes the questions within each category should share the same answer.
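
A minimal unsupervised sketch (scikit-learn again; the 2-D points and the choice of two clusters are assumptions) groups points purely by similarity, with no labels given:

```python
from sklearn.cluster import KMeans

# Unlabeled 2-D points: visually two loose groups, but no labels are provided.
X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
     [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]]

# Cluster purely by similarity (distance); k=2 is our assumption.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # the cluster assignment of each point

# A new sample is assigned to the nearest cluster.
print(kmeans.predict([[1.1, 0.9]]))
```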

Reinforcement learning: reinforcement learning is the learning, by an intelligent system, of a mapping from environment to behavior so as to maximize the value of a reward (reinforcement) signal. It differs from supervised learning chiefly in the teacher signal: in reinforcement learning, the reinforcement signal provided by the environment is an evaluation (usually a scalar signal) of the quality of the action taken, rather than an instruction telling the reinforcement learning system (RLS) how to produce the correct action.

In layman's terms, we give the computer a pile of multiple-choice questions (training samples) but no standard answers; the computer tries to answer them, and we, as the teacher, grade how well it did. The computer keeps adjusting its model parameters in the hope that its guessed answers earn more reward. Loosely speaking, it can be understood as unsupervised learning followed by supervision.
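
A tiny Q-learning sketch (plain Python; the corridor environment and all hyperparameters are made up for illustration) shows this evaluate-and-adjust loop: the environment only scores actions with a scalar reward, and the agent adjusts its value table to collect more of it:

```python
import random

# A toy corridor: states 0..4; reaching state 4 ends the episode with reward 1.
# Actions: 0 = left, 1 = right.
N_STATES, GOAL = 5, 4
alpha, gamma, epsilon = 0.5, 0.9, 0.1

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]

def step(state, action):
    """Environment: returns (next_state, reward). The reward merely
    *evaluates* the action taken; it never says which action was correct."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0)

def greedy(qs):
    """Pick the best-valued action, breaking ties at random."""
    best = max(qs)
    return random.choice([a for a, q in enumerate(qs) if q == best])

for episode in range(200):
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
        a = random.randrange(2) if random.random() < epsilon else greedy(Q[s])
        s2, r = step(s, a)
        # Q-learning update: nudge Q[s][a] toward reward + discounted future value.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned policy should be "right" in every non-goal state.
print([("left", "right")[greedy(q)] for q in Q[:-1]])
```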

Transfer learning: since most data and tasks are related, transfer learning lets us share learned parameters with a new model, speeding up and optimizing its learning instead of starting from zero as before; that is, we transfer the parameters of a trained model to a new model to help it train on a new dataset.
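
A minimal numerical sketch of the idea (NumPy; the two tasks, their sizes, and the learning rate are all assumptions): reuse parameters learned on a data-rich task A as the starting point for a related, data-poor task B, instead of starting from zero:

```python
import numpy as np

rng = np.random.default_rng(0)

def train(X, y, w_init, steps, lr=0.1):
    """Plain gradient descent on mean-squared error for a linear model."""
    w = w_init.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Task A: plenty of data for a related problem (hypothetical).
Xa = rng.normal(size=(200, 3))
wa_true = np.array([1.0, -2.0, 0.5])
ya = Xa @ wa_true + rng.normal(0.0, 0.1, 200)

# Task B: little data, but its true weights are close to task A's.
Xb = rng.normal(size=(10, 3))
yb = Xb @ (wa_true + 0.1) + rng.normal(0.0, 0.1, 10)

w_a = train(Xa, ya, np.zeros(3), steps=500)        # learn task A from scratch
w_scratch = train(Xb, yb, np.zeros(3), steps=20)   # task B from zero
w_transfer = train(Xb, yb, w_a, steps=20)          # task B warm-started from A

def mse(w):
    return np.mean((Xb @ w - yb) ** 2)

print(f"from scratch: {mse(w_scratch):.4f}, transferred: {mse(w_transfer):.4f}")
```

With the same small budget of update steps, the warm-started model typically lands closer to task B's solution, which is the core intuition behind sharing parameters.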

Machine Learning Applications

Speech recognition, autonomous driving, language translation, computer vision, recommender systems, drones, identifying spam

Summary

1. If computer science is the study of "algorithms", then, similarly, machine learning can be said to be the study of "learning algorithms".

2. The essence of machine learning is to build a model of the relationship between input and output, and to use this model to handle unseen situations.

3. The learning process of machine learning is a process of generalization from the data set.

4. There is no absolutely good machine learning algorithm. It is meaningless to discuss "which learning algorithm is better" apart from a specific problem, because when all potential problems are considered, all algorithms are equally good.

5. Choosing the algorithm and inductive bias that fit the specific problem is the correct approach.
