Watermelon book study notes - Chapter 1: Introduction

1.1 Introduction

  • Definition of machine learning (proposed by Mitchell, 1997): Suppose we use P to evaluate the performance of a computer program on a class of tasks T; if the program improves its performance on tasks in T, as measured by P, by using experience E, then we say the program learns from experience E with respect to T and P.

1.2 Basic terminology

  • Dataset (data set) D: the collection of records
  • Instance (instance): each record in the dataset D
  • Attribute (attribute) / feature (feature): describes an event or object in some particular aspect (e.g., a watermelon's color or size)
  • Attribute space / sample space / input space: the space spanned by the attributes (for an instance, take each of its attributes as a coordinate axis; instances with different attribute values then correspond to different points in this space, so an instance is also called a "feature vector"; see the small sketch after this list)
  • Training (training): the process of learning a model from data
  • Hypothesis (hypothesis): the learned model (what we learn from data is an approximation of some underlying law, so the learned model is also called a hypothesis)
  • Label (label): information about the outcome of an instance (e.g., "good melon")
  • Example (example): an instance together with its label information
  • Categories of learning tasks:
    Classification and regression
    Supervised learning and unsupervised learning
  • Goal of machine learning: to learn a model that performs well on new samples, not just on the training samples
  • Generalization (generalization): the ability of the learned model to handle new samples. A model with strong generalization ability fits the entire sample space well.
    The training set is usually only a small sample of the sample space (in real tasks the sample space can be huge; e.g., with 20 attributes each taking 10 possible values, the sample space already has size 10^20). We want the training set to reflect the characteristics of the sample space well; otherwise, garbage in, garbage out.
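
The short sketch below (a toy illustration of my own, not from the book) ties these terms together: a dataset of instances, their attributes, labels, and a feature vector.

```python
# Toy illustration of the basic terms (not from the book).

# Dataset D: a collection of records; each record is an instance,
# described by its attributes (here: color, size).
D = [
    {"color": "green", "size": "large"},   # instance 1
    {"color": "black", "size": "small"},   # instance 2
]

# Labels: outcome information for each instance ("good melon" or not).
labels = [True, False]

# An example = an instance together with its label.
examples = list(zip(D, labels))

# A feature vector: an instance viewed as a point in attribute space,
# with one coordinate axis per attribute.
attributes = ["color", "size"]
feature_vector = [D[0][a] for a in attributes]
print(feature_vector)   # ['green', 'large']
```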

It is generally assumed that all samples in the sample space obey an unknown distribution, and that each sample we obtain is sampled independently from this distribution (i.e., the samples are "independent and identically distributed", i.i.d.).

In general: more training samples -> more information about the unknown distribution -> more likely to obtain a model with strong generalization ability.
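
A tiny illustration of the i.i.d. point (my own, not from the book): the more samples we draw from an unknown distribution, the more information we have about it, here measured by how well we estimate its mean.

```python
# Estimating the mean of an "unknown" distribution from i.i.d. samples:
# more samples -> a better estimate (illustration only, not from the book).
import random

random.seed(0)

def sample(n):
    # Pretend the unknown distribution is uniform on [0, 1); in a real task
    # we would not know it and could only observe samples drawn from it.
    return [random.random() for _ in range(n)]

for n in (10, 100, 10000):
    xs = sample(n)
    print(n, sum(xs) / n)   # the estimate approaches the true mean 0.5
```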

1.3 Hypothesis space

Induction (induction) and deduction (deduction) are the two basic means of scientific reasoning.

  • Induction: the "generalization" (generalization) process from the specific to the general (deriving general laws from particular facts)
  • Deduction: the "specialization" (specialization) process from the general to the specific (deducing concrete situations from basic principles)

e.g., a mathematical axiom system:

  • Deducing theorems from a set of axioms and inference rules -> deduction
  • Learning (generalizing) from examples -> inductive learning (inductive learning)

Inductive learning in the narrow sense: learning concepts from training data, also called concept learning (because it is very hard to learn concepts that both generalize well and have clear semantics, there is relatively little research on and application of concept-learning techniques; the most basic form is Boolean concept learning, i.e., learning target concepts of "yes" / "no").

Learning process: a search in the hypothesis space; the goal of the search is to find a hypothesis that matches (fits) the training set, i.e., a hypothesis able to judge the melons in the training set correctly.

Hypothesis space: the set of hypotheses of the form "(attr1 = ?) ∧ (attr2 = ?) ∧ (attr3 = ?) ∧ ... ∧ (attrn = ?)", formed by all possible assignments of attribute values.

During the search of the hypothesis space, we can keep deleting hypotheses that are inconsistent with the positive examples and hypotheses that are consistent with the counterexamples; in the end we obtain the hypotheses consistent with the training set (i.e., able to judge all training samples correctly), and these are the result of our learning (a small sketch of this search follows below).
Because the hypothesis space is usually large while the learning process is based on a finite set of training samples, there may be multiple hypotheses consistent with the training set (they form a "set of hypotheses", i.e., the version space).
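
Below is a minimal sketch of this search (my own illustration, assuming the conjunctive hypothesis form from above plus a wildcard "*" meaning "any value", which the preference examples in 1.4 also use): enumerate all hypotheses and keep only those consistent with a toy training set; what remains is the version space.

```python
# Minimal version-space sketch (my own toy example, not from the book):
# enumerate all conjunctive hypotheses and keep those consistent with
# the training set.
from itertools import product

# Possible values per attribute (toy watermelon data).
attr_values = {
    "color": ["green", "black"],
    "root":  ["curled", "straight"],
}

# Training set: (instance, label); label True means "good melon".
train = [
    ({"color": "green", "root": "curled"},   True),
    ({"color": "black", "root": "straight"}, False),
]

def covers(hyp, x):
    """A hypothesis covers an instance if every non-wildcard attribute matches."""
    return all(v == "*" or x[a] == v for a, v in hyp.items())

def consistent(hyp, data):
    """Consistent = judges every training example correctly."""
    return all(covers(hyp, x) == y for x, y in data)

attrs = list(attr_values)
# All conjunctive hypotheses: each attribute takes a concrete value or "*".
all_hyps = [dict(zip(attrs, vals))
            for vals in product(*[attr_values[a] + ["*"] for a in attrs])]

version_space = [h for h in all_hyps if consistent(h, train)]
print(version_space)
# -> [{'color': 'green', 'root': 'curled'}, {'color': 'green', 'root': '*'},
#     {'color': '*', 'root': 'curled'}]
```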

Hypothesis space size: with n known attributes, each attribute having m possible values, the size of the hypothesis space for "being a good melon" is m^n + 1 (if attribute i may take m_i values, the size is m_1 * m_2 * ... * m_n + 1, which equals m^n + 1 when every m_i equals m).
Why the +1?
First, the 1 refers to the extreme case in which no "good melon" exists at all.
Second, note that the size discussed here, before the +1, is very close to the size of the sample space. So let us draw some conceptual distinctions among the hypothesis space, the sample space, and the version space (a quick numeric check follows after the list below):

  • Hypothesis space: given the known attributes and their possible values, an exhaustive set of hypotheses about every possible way the target (good melon) could be satisfied (each hypothesis states that melons with a certain combination of attribute values are good melons).
  • Sample space: the set of all possible, reasonable situations. It does not take hypotheses as its starting point; it is simply the set of all possible samples. Its size is m^n.
  • Version space (version space): the set of hypotheses consistent with the training set (reference: https://blog.csdn.net/csucsgoat/article/details/79598803 for an interpretation of the version space).
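
A quick numeric check of the sizes above (following this note's own counting convention, with no wildcard values; just an illustration):

```python
# Sample space = m_1 * m_2 * ... * m_n possible instances;
# hypothesis space = that product + 1, where the +1 is the extreme case
# "no good melon exists" (per this note's counting convention).
from math import prod

m = [10] * 20                        # e.g. 20 attributes, 10 values each
sample_space_size = prod(m)          # 10 ** 20, as in section 1.2
hypothesis_space_size = prod(m) + 1

print(sample_space_size, hypothesis_space_size)
```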

1.4 Inductive bias

The problem the version space brings us: the models corresponding to the different hypotheses in it may well produce different outputs when facing a new sample, so which model should we use? At this point the learning algorithm's own preferences play a key role.

  • Inductive bias (inductive bias): the preference of a machine learning algorithm for certain kinds of hypotheses during the learning process (it can be seen as the algorithm's own "values")
e.g., possible hypothesis preferences (a small sketch of the first two follows after this list):

  • Prefer hypotheses that are as specific as possible (applicable to as few cases as possible; fewer attributes take the wildcard "*")
  • Prefer hypotheses that are as general as possible (applicable to as many cases as possible; more attributes take the wildcard "*")
  • Prefer hypotheses in which some particular attribute takes a particular value (for example, in a model of what attracts girls, given two roughly equivalent hypotheses, the algorithm might prefer the one with a high value of the appearance attribute)
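
A small continuation of the version-space sketch from 1.3 (again my own illustration): the first two preferences above can be read directly off the number of "*" wildcards in each hypothesis.

```python
# Two possible inductive biases over the version space:
# "most specific" = fewest "*" wildcards, "most general" = most "*" wildcards.

def wildcard_count(hyp):
    """Number of attributes the hypothesis leaves unconstrained."""
    return sum(1 for v in hyp.values() if v == "*")

# The version space computed in the sketch in section 1.3.
version_space = [
    {"color": "green", "root": "curled"},
    {"color": "green", "root": "*"},
    {"color": "*",     "root": "curled"},
]

most_specific = min(version_space, key=wildcard_count)
most_general  = max(version_space, key=wildcard_count)
print(most_specific)  # {'color': 'green', 'root': 'curled'}
print(most_general)   # one of the single-wildcard hypotheses, e.g. {'color': 'green', 'root': '*'}
```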

Summing up the importance of inductive bias:
Whether an algorithm's inductive bias matches the problem itself usually directly determines whether the algorithm can achieve good performance.
(As in the attract-girls model: for two roughly equivalent hypotheses, if the algorithm prefers the hypothesis with a lower appearance value, then this algorithm's "values" do not match the values of most girls.)

"Smart" algorithm is always better than a "clumsy" algorithm b?

"There is no free lunch (No Free Lunch theorem, NFL)": In the same opportunity to all "problem" appears on the premise (assuming that the uniform distribution of the objective function), regardless of the learning algorithm "smart" or not, they have the same expectations . (See proof materials)

But in reality we only care about the problem we are trying to solve (e.g., a specific application task) and the solution we have worked out for it; we do not care how that solution performs on other problems.

So, to emphasize once more what was said before:
Whether an algorithm's inductive bias matches the problem itself often plays a decisive role.
