Machine learning: summarizing the basics

A review of some basic concepts of machine learning

1. Introduction

1. Machine learning is the discipline that studies how computers can use experience to improve their own performance. In 1997 Mitchell gave a more formal definition: suppose we use P to evaluate a computer program's performance on some class of tasks T; if the program's performance on tasks in T, as measured by P, improves with experience E, then we say the program learns from experience E with respect to T and P. For example, for a spam filter, T is classifying incoming email, P is the fraction of messages classified correctly, and E is a corpus of hand-labeled messages.

  1. Machine learning mainly studies algorithms that produce a "model" from data on a computer, i.e., learning algorithms (learning algorithm). The "data" is the "experience" drawn from real life.
  2. A model (model) refers to the result learned from data, though usage differs: in a paper published in 2001, Hand uses "model" for global results (such as a whole decision tree) and "pattern" for local results (such as a single rule).

2. Basic terminology

As an example, suppose we have collected some data about watermelons: (color = green; root = curled; knock sound = muffled), (color = dark; root = slightly curled; knock sound = dull), (color = pale; root = stiff; knock sound = crisp), ......

In the example above, each parenthesized group is a record about a watermelon (or, in general, about any event or object). We have the following terms:

Data set (data set): a collection of such records

Instance / sample (instance / sample): each individual record above may be called an instance or a sample

Attribute / feature (attribute / feature): something that reflects the nature or behavior of an event or object in some respect, e.g., "color", "root", and "knock sound" above

Attribute value (attribute value): the value an instance takes on an attribute, e.g., green, dark, etc.

Feature vector (feature vector): the attributes span a coordinate space in which each instance corresponds to a point; the vector from the origin to that point is the instance's feature vector

Dimensionality (dimensionality): the number of attributes contained in each sample

Learning / training (learning / training): the process of obtaining a model from data

Training data (training data): the data used during training

Training sample (training sample): a sample in the training data

Training set (training set): the set of training samples

Hypothesis (hypothesis): the learned model, so called because the model corresponds to some hypothesis about an underlying law governing the data

Ground truth (ground-truth): the underlying law itself

Label (label): information about the outcome of a training sample

Example (example): an instance together with its label information. In general, \((x_i, y_i)\) denotes the i-th example, where \(y_i\) is the label of the instance \(x_i\)

Label space / output space (label space): the set of all labels

Attribute space / sample space / input space (attribute / sample space): the space spanned by the attributes. For example, if "color", "root", and "knock sound" are taken as three coordinate axes, they span a three-dimensional space for describing watermelons, and every watermelon finds its own point in this space.
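To pin these terms down, here is a minimal sketch of the same watermelon records in code (the dictionary representation is an illustrative assumption, not anything the text prescribes): the data set is a collection of instances, each attribute spans one axis, and each instance sits at a point in the attribute space described by its feature vector.

```python
# A minimal sketch of the terminology above; the representation is an
# illustrative assumption.
data_set = [
    {"color": "green", "root": "curled",          "knock": "muffled"},
    {"color": "dark",  "root": "slightly curled", "knock": "dull"},
    {"color": "pale",  "root": "stiff",           "knock": "crisp"},
]

sample = data_set[0]                     # one instance / sample
dimensionality = len(sample)             # number of attributes: 3
feature_vector = tuple(sample.values())  # a point in the 3-D attribute space
print(dimensionality, feature_vector)
```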

Classification (classification): a learning task in which the model learned from the training set predicts a discrete value

Regression (regression): similar to classification, but the predicted values are continuous

\({\color{red}\text{Note: a prediction task aims to learn, from the training set } \{(x_1, y_1), (x_2, y_2), \dots\}\text{, a mapping from the input space to the output space.}}\)
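As a concrete illustration of learning such a mapping with a continuous output (i.e., regression), here is a minimal sketch; scikit-learn, the "sugar content to ripeness score" setup, and the toy numbers are all assumptions for illustration.

```python
# A minimal regression sketch: learn a mapping f from the input space to a
# continuous output space from (x_i, y_i) training pairs.
# NOTE: scikit-learn, the feature, and the numbers are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

X_train = np.array([[0.46], [0.37], [0.26], [0.09]])  # one attribute per sample
y_train = np.array([0.9, 0.7, 0.5, 0.1])              # continuous labels

f = LinearRegression().fit(X_train, y_train)  # training builds the mapping f
print(f.predict(np.array([[0.30]])))          # predicted value for a new sample
```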

Binary classification task (binary classification): a task whose predictions involve only two classes, one of which is usually called the positive class (positive class) and the other the negative class (negative class). A task involving more than two classes is called a multi-class classification (multi-class) task

Testing (testing): the process of using the learned model to make predictions

Testing sample (testing sample): a sample to be predicted; e.g., after learning \(f\), applying it to a test instance \(x_i\) yields the predicted label \(y = f(x_i)\)
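To make training and testing concrete, here is a minimal sketch of a binary classifier on watermelon-style records; the classifier choice, the integer encoding, and the toy labels are illustrative assumptions.

```python
# A minimal sketch of binary classification: train on labeled melons, then
# predict the label of an unseen testing sample (y = f(x_i)).
# NOTE: scikit-learn, the encoding, and the data are illustrative assumptions.
from sklearn.tree import DecisionTreeClassifier

# Encode (color, root, knock sound) as integers:
# color: 0=green, 1=dark, 2=pale; root: 0=curled, 1=slightly curled, 2=stiff;
# knock: 0=muffled, 1=dull, 2=crisp. Label: 1=ripe (positive), 0=unripe (negative).
X_train = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [0, 1, 0]]
y_train = [1, 0, 0, 1]

f = DecisionTreeClassifier().fit(X_train, y_train)  # learning / training

x_test = [[1, 0, 0]]      # a testing sample not seen during training
print(f.predict(x_test))  # the predicted label y = f(x_i)
```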

Clustering (clustering): dividing the samples in the training set into several groups, each group being called a cluster (cluster). These automatically formed clusters may correspond to some underlying concepts, e.g., in this example the watermelons might be divided into clusters of light-colored melons and dark-colored melons. Such concepts are not known in advance, and the training samples used in this kind of learning usually carry no label information
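And a minimal clustering sketch for contrast: the samples carry no labels, and the learner groups them into clusters on its own. K-means via scikit-learn and the two numeric features are assumptions; the text does not name an algorithm.

```python
# A minimal unsupervised sketch: group unlabeled melons into clusters.
# NOTE: k-means, the features, and the data are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled melons described by two hypothetical numeric features.
X = np.array([[0.70, 0.46], [0.74, 0.37], [0.24, 0.10], [0.34, 0.08]])

km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)  # one cluster index per sample; the clusters have no names
```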

Learning tasks can be roughly divided into two categories:

(1) Supervised learning (supervised learning): the training data are labeled, e.g., classification and regression

(2) Unsupervised learning (unsupervised learning): the training data are unlabeled, e.g., clustering

Generalization (generalization): the ability of the learned model to apply to new samples

(Machine learning currently still assumes that all samples in the sample space satisfy the independent and identically distributed (i.i.d.) assumption, so classical probability theory remains applicable.)

3. Hypothesis space

Introduction: induction (induction) and deduction (deduction) are the two basic means of scientific reasoning. The former is a process of generalization (generalization) from the particular to the general, i.e., deriving general laws from specific facts; the latter is a process of specialization (specialization) from the general to the particular, i.e., deducing concrete cases from basic principles. In machine learning, "learning from examples" is clearly an inductive process, so it is also called inductive learning (inductive learning)

Inductive learning in the broad sense: roughly equivalent to learning from examples

Inductive learning in the narrow sense: learning a concept (concept) from training data, hence also called "concept learning" or "concept formation". The most basic form of concept learning is Boolean concept learning, i.e., learning target concepts whose outcome can be expressed as "is" or "is not" (a 0/1 value)

Hypothesis space (hypothesis space): the space consisting of all hypotheses

The learning process can be viewed as a search through the space consisting of all hypotheses, where the target of the search is a hypothesis that matches the training set, i.e., one that judges the training samples correctly. Once the representation of hypotheses is fixed, the hypothesis space and its size are determined.

Many strategies can be applied to searching the hypothesis space; for example, the search may proceed by continually deleting hypotheses inconsistent with the positive examples, or hypotheses consistent with the negative examples. What remains in the end is the hypothesis consistent with the training set, which is the result of learning.

Version space (version space): in practice we often face a very large hypothesis space, while learning is based on a finite training set. Hence there may be several hypotheses consistent with the training set; the set of all hypotheses consistent with the training set is called the version space
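A small sketch of this idea (the wildcard representation and the two training samples are illustrative assumptions, and the empty hypothesis "no ripe melon exists" is omitted for brevity): enumerate a conjunctive hypothesis space for the Boolean watermelon concept, then keep only the hypotheses consistent with the training set to obtain the version space.

```python
# Enumerate a conjunctive hypothesis space over the three melon attributes,
# where "*" is a wildcard meaning "any value is acceptable", then filter it
# down to the version space: the hypotheses consistent with the training set.
from itertools import product

colors = ["green", "dark", "pale", "*"]
roots  = ["curled", "slightly curled", "stiff", "*"]
knocks = ["muffled", "dull", "crisp", "*"]
hypothesis_space = list(product(colors, roots, knocks))  # 4 * 4 * 4 hypotheses

def consistent(h, x, y):
    """h covers x if every non-wildcard attribute matches; h is consistent
    with the example (x, y) if its coverage equals the Boolean label y."""
    covers = all(hv in ("*", xv) for hv, xv in zip(h, x))
    return covers == y

# Hypothetical training set: (instance, is_ripe).
train = [(("green", "curled", "muffled"), True),
         (("pale", "stiff", "crisp"), False)]

version_space = [h for h in hypothesis_space
                 if all(consistent(h, x, y) for x, y in train)]
print(len(version_space), version_space[:3])
```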

4. Inductive bias

Inductive bias (inductive bias): a machine learning algorithm's preference for certain types of hypotheses during the learning process. The bias comes into play once the version space has formed: since the learning algorithm must still produce one particular model, its preference takes effect.

\({\color{red}\text{Any effective machine learning algorithm must have its inductive bias; otherwise it would be confused by the hypotheses in the hypothesis space that appear equivalent on the training set, and would be unable to produce a definite learning result.}}\) A more general view of inductive bias: it can be seen as the learning algorithm's own heuristic, or "values", for selecting hypotheses within a possibly very large hypothesis space. There is a general principle that guides algorithms toward establishing the "right" bias, namely

Occam's razor (Occam's razor): if multiple hypotheses are consistent with the observations, choose the simplest one. \({\color{red}\text{Occam's razor is not the only available principle.}}\)

In fact, inductive bias corresponds to the assumption made by the learning algorithm itself about "what kind of model is better". In a concrete real-world problem, whether this assumption holds, i.e., whether the algorithm's inductive bias matches the problem itself, usually plays a decisive role in whether the algorithm achieves good performance.

Suppose learning algorithm \(\zeta_a\) produces its models based on one kind of inductive bias, and learning algorithm \(\zeta_b\) based on another. If \(\zeta_a\) is better than \(\zeta_b\) on some problems, must there then be other problems on which \(\zeta_b\) is better than \(\zeta_a\)? According to the No Free Lunch (NFL) theorem, no matter how the two algorithms are produced, their expected errors summed over all problems are the same. (The premise of the NFL theorem, however, is that all problems occur with equal probability, or are all equally important, whereas in practice we only need to focus on the particular problem we are currently trying to solve, so \(\zeta_a\) and \(\zeta_b\) can still differ in practice. What the NFL theorem really shows is that it is meaningless to discuss "which learning algorithm is better" detached from the specific problem; whether a learning algorithm's own inductive bias matches the problem often plays a decisive role.)
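For reference, one common way to state the NFL result for binary classification (following the standard textbook derivation; the notation below is an assumption, since the post gives the theorem only informally): let \(\mathcal{X}\) be the finite sample space, \(X\) the training set, \(P(x)\) the probability of drawing \(x\), and \(E_{ote}\) the off-training-set error. Summing over all \(2^{|\mathcal{X}|}\) possible binary target functions \(f\),

\[
\sum_{f} E_{ote}(\zeta_a \mid X, f) \;=\; 2^{|\mathcal{X}|-1} \sum_{x \in \mathcal{X} - X} P(x) \;=\; \sum_{f} E_{ote}(\zeta_b \mid X, f),
\]

so the total error is independent of the learning algorithm.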


Source: www.cnblogs.com/my-python-learning/p/11827852.html