Machine Learning Chapter 1: Introduction

Please credit the source when reposting. Reposting of this article is permitted; plagiarism is prohibited.

Foreword

As artificial intelligence continues to develop, machine learning has become more and more important, and many people have started to learn it. This article introduces the basics of machine learning.

1. What is machine learning?

     Put simply: given a set of data, a learning algorithm produces a function f(x), and when given data it has never learned from, the function outputs the prediction we want.
     For example, suppose a child is shown a pile of fruit (10 apples, 10 pears, 8 plums, and 12 peaches); the child can be regarded as a learner. After learning from this pile of fruit, if he is handed another apple he has never seen before, he can still judge, with high probability, that it is an apple.

     Concept: in general, a prediction task establishes a mapping f: X -> Y from the input space X to the output space Y by learning from the training set {(x1, y1), (x2, y2), ..., (xm, ym)}. (This sentence is easier to understand when combined with the two sentences above and with point 2.)

     Goal: Make the learned model work well on "new samples".

     For binary classification tasks, usually Y = {-1, +1} or {0, 1}; for multi-class classification tasks, |Y| > 2 (more than two categories); for regression tasks, Y ⊆ R, where R is the set of real numbers. (A classification task assigns data to categories, while a regression task generally outputs a value.)
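
To make the mapping concrete, here is a minimal sketch of a binary classification task with Y = {0, 1}, assuming scikit-learn is installed; the feature values and labels are made up for illustration.

```python
# A minimal sketch of learning a mapping f: X -> Y for binary
# classification (Y = {0, 1}); scikit-learn assumed, data invented.
from sklearn.tree import DecisionTreeClassifier

# Training set {(x1, y1), ..., (xm, ym)}: each x is a feature vector.
X_train = [[0.7, 0.5], [0.9, 0.8], [0.2, 0.1], [0.4, 0.3]]
y_train = [1, 1, 0, 0]

# "Learning" builds the mapping f from the training set.
f = DecisionTreeClassifier().fit(X_train, y_train)

# Prediction: apply the learned mapping to an unseen sample.
print(f.predict([[0.8, 0.6]]))  # likely [1]
```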

2. Basic terms

Data:

  •      Form: (x1, x2, ..., xn), a record with n attributes.
         For example: (name = Zhang San, gender = male, profession = extrajudicial lunatic). What is inside the brackets is a record, and "=" means "takes the value".
         Each record is a description of an event or object and is called an "instance" or "sample". Something that reflects the performance or nature of an event or object in a certain respect, such as "name" or "gender", is called an "attribute" or "feature", and the values taken on an attribute are called attribute values.

Dataset:

  • A collection of multiple records is called a " data set ", which is often represented by D.
  • D = {x1, x2, ..., xm}: D has m records, and each record has n features. Each record can be understood as a column vector, and expanding the records over their features gives an m-by-n matrix.
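
As a small illustration of that layout, the following sketch (assuming NumPy) stores a dataset D with m = 4 records and n = 3 features as a 4-by-3 matrix; the values are made up.

```python
# Dataset D as an m-by-n matrix: m = 4 records, n = 3 features each
# (NumPy assumed; the attribute values are invented).
import numpy as np

D = np.array([
    [0.70, 0.46, 0.90],  # x1
    [0.77, 0.38, 0.85],  # x2
    [0.63, 0.26, 0.41],  # x3
    [0.59, 0.32, 0.77],  # x4
])

m, n = D.shape
print(m, n)   # 4 3
print(D[0])   # the first record x1: a vector of n attribute values
```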

Label:

  • The outcome information of a sample, sometimes called the ground truth, often denoted by y.
    For example: a watermelon ((color = green; root = curled; knock sound = dull), good melon), where "good melon" is the label.

Example:

  • An instance together with its label information.
  • Form: (x, y)
    For example: a watermelon ((color = green; root = curled; knock sound = dull), good melon).
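
For illustration only, the watermelon example above can be written in (x, y) form as a plain Python structure; the attribute names are ad-hoc English renderings.

```python
# The (x, y) form of a labeled example, written as a Python tuple;
# attribute names are ad-hoc renderings of the watermelon example.
x = {"color": "green", "root": "curled", "knock": "dull"}  # the instance
y = "good melon"                                           # the label
example = (x, y)
print(example)
```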

Sample space:

  • The space formed by the attributes is called the attribute space, sample space, or input space.
         Each attribute corresponds to one dimension. For example, take name as the x-axis, gender as the y-axis, and profession as the z-axis: together they form a sample space describing people, and everyone can find their own coordinates in this space.

Label space (output space):
Generally, the i-th example is written as (xi, yi), where yi ∈ Y is the label of instance xi, and Y is the set of all labels, also called the label space or output space.

Feature vector:

  •      Since each point in the space corresponds to a coordinate vector, an instance is also called a feature vector.

Dimensionality:

  • The number of attributes of each instance or sample.
    For example: the Zhang San instance has three attributes, so its dimensionality is 3.

Learning and training:

  • The process of learning a model from data, which is accomplished by executing a learning algorithm.

Training data:

  • Data used during training.

Training samples:

  • Each sample used during training is called a training sample

Training set:
     a collection of training samples

Hypothesis:
     The learned model corresponds to some underlying regularity of the data. This underlying regularity itself is called the "ground truth", and the learning process is about finding or approximating it.

Learner:
     the model; it can also be seen as an instantiation of a learning algorithm on given data and a given parameter space

Classification:

  • When the learning task predicts discrete values, such as good melon versus bad melon, the task is called classification.
  • Binary classification task:
    a classification task involving only two categories, usually with one called the positive class and the other the negative class.
  • Multi-class classification task:
    a classification task involving more than two classes.

Regression:

  • When the learning task predicts continuous values, such as the sweetness of a watermelon (0.92, 0.84, 0.48, and so on), the task is called regression.
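
As a sketch of the difference from classification, the following toy regression (assuming scikit-learn) predicts a continuous sweetness value; the features are invented, with the targets borrowed from the numbers above.

```python
# A toy regression sketch: predict a continuous sweetness value
# (scikit-learn assumed; features are invented, targets taken from
# the numbers quoted above).
from sklearn.linear_model import LinearRegression

X_train = [[0.70, 0.46], [0.77, 0.38], [0.63, 0.26]]
y_train = [0.92, 0.84, 0.48]  # continuous labels: Y is a subset of R

reg = LinearRegression().fit(X_train, y_train)
print(reg.predict([[0.72, 0.40]]))  # a real number, not a class
```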

Testing:

  • After a model has been learned, the process of using it to make predictions is called testing.

Test sample:

  • During testing, the samples to be predicted are called test samples

Clustering:

  • The watermelons in a training set can be divided into several groups, each group being called a "cluster". These automatically formed clusters may correspond to some underlying concept divisions, for example "light-colored melon", "dark-colored melon", "locally grown melon", or "melon from elsewhere", concepts that do not appear among the samples' attributes.
  • In clustering, these underlying concepts are not known to us in advance, and the training samples used during learning usually carry no label information. (A minimal sketch follows below.)
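
As mentioned above, here is a minimal clustering sketch (assuming scikit-learn): KMeans groups unlabeled feature vectors into clusters without using any label information; the values are made up.

```python
# A clustering sketch: KMeans groups unlabeled samples into clusters;
# no label information is used (scikit-learn assumed, values invented).
from sklearn.cluster import KMeans

X = [[0.70, 0.46], [0.77, 0.38], [0.25, 0.30], [0.20, 0.36]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # e.g. [1 1 0 0]: the cluster assigned to each sample
```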

Classification of learning tasks (roughly divided into two categories):

  • Supervised learning:
    representative tasks: classification and regression
  • Unsupervised learning:
    representative task: clustering

Generalization:

  • The ability of the learned model to apply to new samples

3. Hypothesis space

Hypothesis space:
     As the name suggests, the space composed of all hypotheses.

     Once the representation of hypotheses is fixed, the hypothesis space and its size are determined.

     We can view the learning process as a search through the space of all hypotheses, where the goal is to find a hypothesis that "fits" the training set, that is, one that judges all training samples correctly.

     In a training set, a watermelon sample takes the following form:
     watermelon ((color; root; knock sound), good melon)
     If each attribute can take three values, the hypothesis space would have size 3 × 3 × 3.
     But we must also allow for the possibility that an attribute can take any value, represented here by the wildcard "*", which brings the size to 4 × 4 × 4. Finally, there is the hypothesis that "good melon" is not a valid concept at all, i.e., the empty set ∅, so the total size is 4 × 4 × 4 + 1 = 65.
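
The counting above can be checked with a short sketch; the concrete attribute value names below are illustrative stand-ins, since only the counts matter.

```python
# Enumerate the hypothesis space: 3 attributes, each taking one of
# 3 concrete values or the wildcard "*", plus the empty-set hypothesis.
# The concrete value names below are illustrative stand-ins.
from itertools import product

colors = ["green", "black", "white", "*"]
roots = ["curled", "slightly curled", "straight", "*"]
knocks = ["dull", "muffled", "crisp", "*"]

hypotheses = list(product(colors, roots, knocks))  # 4 * 4 * 4 = 64
print(len(hypotheses) + 1)  # +1 for the empty set: 65
```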

     Many strategies can be used to search this hypothesis space, for example top-down (from general to specific) or bottom-up (from specific to general). During the search, hypotheses inconsistent with positive examples, and those consistent with negative examples, are continually discarded. Eventually we obtain a hypothesis consistent with the training set (that is, one that judges all training samples correctly), and this is the learned result.

     Note that in practical problems we usually face a very large hypothesis space, while learning is based on a finite set of training samples, so there may be multiple hypotheses consistent with the training set. This set of hypotheses consistent with the training set is called the "version space".

4. Inductive preference

     Inductive preference:
          A learning algorithm's preference for certain types of hypotheses during learning is called its "inductive preference" (also known as inductive bias), or simply its "preference".

     For a given set of training samples, it may be impossible to say which consistent hypothesis is "better". However, a specific algorithm must produce a specific model, and at that point the learning algorithm's own "preference" plays the key role.

     Any effective machine learning algorithm must have an inductive preference; otherwise it would be confused by hypotheses that look "equivalent" on the training set and could not produce a definite learning result.

     Inductive preference can be seen as the learning algorithm's own heuristic or "value system" for choosing among hypotheses in a potentially huge hypothesis space.

  • Among these, "Occam's razor" is a common and fundamental principle in natural science: if multiple hypotheses are consistent with the observations, choose the simplest one.
  • However, Occam's razor is not the only viable principle, and it admits different interpretations; applying it is not trivial. For example, given two hypotheses for a problem, which one is "simpler"? That question is itself not simple and may require other mechanisms to resolve.

     In fact, inductive preference corresponds to the assumptions the learning algorithm itself makes about "what kind of model is better". In a concrete problem, whether these assumptions hold, that is, whether the algorithm's inductive preference matches the problem, usually determines directly whether the algorithm can achieve good performance.
     Given two learning algorithms (one "clever" and one "clumsy"), a mathematical derivation (omitted here; see Zhou Zhihua, "Machine Learning", p. 8) shows that their total error over all samples outside the training set is independent of the learning algorithm. That is, no matter how different two learning algorithms are, their expected performance is the same. This is the "No Free Lunch" theorem (NFL theorem).
     The NFL theorem has an important premise: all problems are equally likely to occur, or all problems are equally important. Reality is not like that; in most cases we only care about the problem we are trying to solve, and not whether the resulting solution works well elsewhere. This is why the match between an algorithm's inductive preference and the problem at hand often plays a decisive role.

Summary

     This chapter mainly explains some basic concepts in machine learning. The basic terminology deserves particular attention, as does the format of a dataset; both will make later material easier to understand and code easier to write.


Source: blog.csdn.net/G_Shengn/article/details/127349154