Table of Contents
Data set and example (sample, feature vector)
Training samples and the training set
Positive class and negative class
The difference between classification and clustering
Supervised learning and unsupervised learning
Independent and identically distributed
Definition of machine learning
Machine learning is the discipline that studies how, by computational means, to use experience to improve the performance of a system itself. "Experience" usually exists in the form of "data", so the main content of machine learning research is algorithms that generate a "model" from data on a computer, i.e. "learning algorithms" (learning algorithm). With a learning algorithm, we can supply it with empirical data, and it can generate a model based on those data; when facing a new situation (for example, a watermelon that has not been cut open), the model provides us with an appropriate judgment (for example, that it is a good melon). If computer science is the study of knowledge about "algorithms", then machine learning can similarly be said to be the study of knowledge about "learning algorithms".
Model and Pattern
"Model" refers to a global result learned from data (e.g. a decision tree), while "pattern" refers to a local result (e.g. a single rule).
Data set and example (sample, feature vector)
"Data set": a collection of data records.
"Example" (also called a sample or, as represented, a feature vector): each record in the data set, i.e. a description of an event or object.
//// Also related: attribute, attribute value, attribute space, sample space (input space). Only touched on briefly here; see the watermelon book, p. 2.
Dimension
That is, the number of attributes in a feature vector.
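The terms above can be sketched in a few lines; the attribute names and values below are hypothetical, loosely following the watermelon example:

```python
# A toy "watermelon" data set: each record is an example (a sample),
# represented as a feature vector over the attributes below.
# Attribute names and values are hypothetical.
attributes = ["color", "root", "knock_sound"]

data_set = [
    ("dark", "curled", "dull"),    # example 1
    ("light", "stiff", "crisp"),   # example 2
    ("dark", "curled", "crisp"),   # example 3
]

# The dimension is the number of attributes in each feature vector.
dimension = len(attributes)
print(dimension)         # 3
print(len(data_set))     # 3 examples in the data set
```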
Learning (training)
That is, the process of learning a model from data.
Training samples and the training set
"Training sample": each sample in the training data. "Training set": the set formed by the training samples.
Learner (learned model)
A learner can be viewed as an instance of a learning algorithm on given data and parameter space. Learning algorithms usually have parameters to be set; using different parameter values and/or different training data produces different results.
Testing and test samples
After the model has been learned, the process of using it to make predictions is called "testing", and a sample to be predicted is called a "test sample" (testing sample). For example, after learning f, for a test example x we obtain the predicted label y = f(x).
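A minimal sketch of the learn-then-test flow; the "model" here is a deliberately trivial, hypothetical threshold rule, only to show that testing means applying the learned f to an unseen x:

```python
# Training: "learn" a threshold rule f from labeled training samples.
# Each sample is (ripeness, label); the data and the rule are made up.
training_set = [(0.2, "bad"), (0.4, "bad"), (0.7, "good"), (0.9, "good")]

# A deliberately simple learner: take the midpoint between the ripest
# "bad" sample and the least ripe "good" sample as a decision threshold.
max_bad = max(r for r, y in training_set if y == "bad")
min_good = min(r for r, y in training_set if y == "good")
threshold = (max_bad + min_good) / 2  # the learned "model"

def f(x):
    """The learned model: predict a label for a new sample x."""
    return "good" if x >= threshold else "bad"

# Testing: apply f to a test sample the learner has never seen.
test_sample = 0.8
prediction = f(test_sample)   # y = f(x)
print(prediction)             # good
```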
Hypothesis and ground truth (truth)
"Hypothesis": the learned model, which corresponds to some potential law about the data. "Ground truth": that potential law itself. The learning process is a search for the truth, or an approximation to it.
Label and example
"Label": information about the result of a sample; for instance, in judging whether a watermelon is good or bad, "good melon" is a label. "Example": an instance that carries label information.
Label space (output space)
That is, the set of all possible labels.
Classification and Regression
"Classification": learning tasks where the value to be predicted is discrete, such as "good melon" vs "bad melon".
"Regression": learning tasks where the value to be predicted is continuous, such as a watermelon's ripeness of 0.95 or 0.37.
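The distinction can be sketched with two hypothetical stand-in models: the same kind of input, a discrete output for classification, a continuous output for regression:

```python
def classify(ripeness):
    """Classification: the predicted value is discrete ("good" or "bad")."""
    return "good" if ripeness >= 0.5 else "bad"

def regress(days_on_vine):
    """Regression: the predicted value is continuous (ripeness in [0, 1]).
    A hypothetical linear model, purely for illustration."""
    return min(1.0, 0.05 * days_on_vine)

print(classify(0.95))   # good  (a discrete label)
print(regress(14))      # a continuous value, about 0.7
```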
Positive class and negative class
A task involving only two classes is called "binary classification". One of the classes is usually called the "positive class",
and the other the "negative class".
Clustering
That is, dividing the training samples into several groups, each group called a "cluster". For example, in a watermelon-identification problem, the watermelons might be divided into "light-colored melons" and "dark-colored melons", or even "local melons" and "non-local melons". In clustering, concepts such as "light-colored melon" or "local melon" are not known to us in advance, and the training samples used in learning usually carry no label information.
The difference between classification and clustering
Classification: given samples with known class labels, train the machine in some way (i.e. learn some objective function) so that it can classify samples of unknown class. It belongs to supervised learning.
Clustering: when the class label of every sample is unknown in advance, we hope to divide a group of samples of unknown class into several classes by some algorithm. In clustering we do not care what each class actually is; the goal we need to achieve is simply to group similar things together. It belongs to unsupervised learning.
Supervised learning and unsupervised learning
Supervised learning: learn a function (model parameters) from a given training data set; when new data arrive, the result can be predicted from this function. Supervised learning requires the training set to include both inputs and outputs, i.e. features and targets, where the targets in the training set are labeled by humans. The most common supervised tasks are classification (note the distinction from clustering) and regression; common techniques include training neural networks and decision trees.
Unsupervised learning: the input data carry no labels, and there are no predetermined results. The classes of the samples are unknown, and the sample set must be clustered according to the similarity between samples, trying to minimize within-class differences and maximize between-class differences. In plain terms: in practical applications, in many cases the labels of the samples cannot be known in advance, i.e. there are no training samples with known classes, so a classifier can only be designed by learning from a sample set whose samples have no labels. For a detailed account of the difference between the two, see https://blog.csdn.net/zb1165048017/article/details/48579677
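The unsupervised side can be sketched with a tiny hand-rolled 1-D "2-means" that groups unlabeled points purely by similarity; no labels appear anywhere, and the data and the algorithm choice are made up for illustration (real code would typically use a library implementation):

```python
def two_means(xs, iters=10):
    """Minimal 1-D 2-means: group points so that within-cluster
    differences shrink and between-cluster differences grow.
    Assumes xs contains at least two distinct values."""
    c1, c2 = min(xs), max(xs)          # initialize the two cluster centers
    for _ in range(iters):
        g1 = [x for x in xs if abs(x - c1) <= abs(x - c2)]
        g2 = [x for x in xs if abs(x - c1) > abs(x - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)  # recenter
    return sorted(g1), sorted(g2)

# Unlabeled samples: the algorithm sees only the values, no classes.
points = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9]
low, high = two_means(points)
print(low)    # [0.1, 0.15, 0.2]
print(high)   # [0.8, 0.85, 0.9]
```

Note that the output is just "these points belong together"; what each cluster *means* is not part of the result, exactly as the text says.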
Generalization
The ability of the learned model to apply to new samples.
Independent and identically distributed
It is generally assumed that all samples in the sample space obey an unknown "distribution" D, and that each sample we obtain is drawn independently from this distribution, i.e. the samples are "independent and identically distributed" (i.i.d.). In general, the more training samples we have, the more information we obtain about D, and the more likely it is that learning will produce a model with high generalization ability.
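A sketch of the i.i.d. assumption: draw independent samples from one fixed distribution (known here only so the demo can check itself) and watch an estimate of its mean improve as the sample grows:

```python
import random

random.seed(0)  # fixed seed so the draws are reproducible

# Stand-in for the unknown distribution D (known here only for the demo):
# uniform on [0, 1], whose true mean is 0.5.
def draw():
    return random.random()   # each call: one independent, identical draw

for n in (10, 10_000):
    samples = [draw() for _ in range(n)]
    estimate = sum(samples) / n
    # More i.i.d. samples -> the estimate tends toward the true mean 0.5.
    print(n, round(estimate, 3))
```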
Rote learning
"Memorizing" the training samples is so-called "mechanical learning" or "rote learning".
Version space
In real problems we often face a very large hypothesis space, but the learning process is carried out on a finite training set, so there may be multiple hypotheses consistent with the training set. We call this set of hypotheses consistent with the training set the "version space".
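This can be sketched by enumerating a tiny, made-up hypothesis space and keeping every hypothesis consistent with a finite training set; the attributes and values are hypothetical:

```python
from itertools import product

# A tiny hypothesis space over two attributes (color, knock sound):
# each hypothesis fixes a value per attribute, or uses "*" for "any value".
colors = ["dark", "light", "*"]
sounds = ["dull", "crisp", "*"]

def matches(h, x):
    """Does hypothesis h cover example x?"""
    return all(hv in ("*", xv) for hv, xv in zip(h, x))

# Finite training set: (example, is it a good melon?)
train = [(("dark", "dull"), True), (("light", "crisp"), False)]

# The version space: every hypothesis consistent with ALL training samples.
version_space = [
    h for h in product(colors, sounds)
    if all(matches(h, x) == y for x, y in train)
]
print(version_space)   # several hypotheses fit the same finite training set
```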
Inductive bias (induction preference)
A machine learning algorithm's preference for certain types of hypotheses during the learning process; that is, which kinds of situations it pays more attention to.
Occam's Razor
That is: "if multiple hypotheses are consistent with the observations, choose the simplest one." For example, if two curves A and B both fit a finite training set, curve A is generally chosen because it is smoother.
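The curve example can be sketched numerically; the "ground truth" and both hypotheses below are made up for illustration. Two hypotheses agree on every training point but disagree off it, and the razor prefers the smoother one:

```python
import math

# Finite training set (integer x values); the "ground truth" is made up.
train_x = [0, 1, 2, 3]

def target(x):
    return 2 * x + 1

def h_simple(x):
    """Hypothesis A: smooth."""
    return 2 * x + 1

def h_complex(x):
    """Hypothesis B: also passes through every training point
    (sin(pi * k) = 0 for integer k), but wiggles in between."""
    return 2 * x + 1 + 5 * math.sin(math.pi * x)

# Both hypotheses are consistent with the observations:
for x in train_x:
    assert math.isclose(h_simple(x), target(x))
    assert math.isclose(h_complex(x), target(x), abs_tol=1e-9)

# Off the training set they disagree; the razor prefers the smoother A.
print(h_simple(2.5), h_complex(2.5))
```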
No Free Lunch Theorem (NFL)
For a learning algorithm a, if it performs better than learning algorithm b on some problems, then there must exist another set of problems on which b performs better than a. Interestingly, this conclusion holds for any pair of algorithms: no matter how clever learning algorithm a is, and no matter how clumsy learning algorithm b is, their expected performance is exactly the same! But note: the NFL theorem has an important premise, namely that all "problems" occur with equal probability, or that all problems are equally important, which is not the case in practice.
The most important lesson of the NFL theorem is that it is meaningless to talk in the abstract, detached from the specific problem, about "which learning algorithm is better", because if all potential problems are considered, all learning algorithms are equally good.