Watermelon Book: Chapter 1

1.1 Basic terminology
Assume we collect data about a number of watermelons, for example:
(color = green; pedicle = curled; knocking sound = muffled),
(color = black; pedicle = slightly curled; knocking sound = dull),
(color = light; pedicle = stiff; knocking sound = crisp),
......

Each pair of parentheses encloses one record.

  • The collection of records is called a "data set";

  • Each record describes an event or object (here, a watermelon) and is called an "instance" or a "sample";

  • Something that reflects the behavior or nature of an event or object in some respect is called an "attribute" or a "feature", for example "color" and "pedicle";

  • The value an attribute takes is called an "attribute value", for example "green" and "black";

  • The space spanned by the attributes is called the "attribute space", "sample space", or "input space". For example, taking "color", "pedicle", and "knocking sound" as three axes spans a three-dimensional space for describing watermelons, and every watermelon has its own position in this space;

  • Since every point in this space corresponds to a coordinate vector, an instance is also called a "feature vector";

  • An instance that carries label information is called an "example";
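The terms above can be made concrete with a small sketch. It is only an illustration, not code from the book: the English attribute values are an assumed rendering of the three records, the integer encoding of attribute values is one arbitrary choice among many, and the labels attached to the second and third melon are hypothetical.

```python
# Illustrative sketch only: names and encoding are assumptions, not from the text.
ATTRIBUTES = ("color", "pedicle", "knocking sound")

# The data set: a collection of records (instances / samples).
data_set = [
    {"color": "green", "pedicle": "curled", "knocking sound": "muffled"},
    {"color": "black", "pedicle": "slightly curled", "knocking sound": "dull"},
    {"color": "light", "pedicle": "stiff", "knocking sound": "crisp"},
]

def attribute_values(records, attribute):
    """Return the distinct values one attribute takes, in sorted order."""
    return sorted({record[attribute] for record in records})

def to_feature_vector(record, records):
    """Map a record to a point in attribute space, one coordinate per attribute.

    Each coordinate is the index of the record's value among the sorted
    attribute values (an arbitrary encoding chosen for this sketch).
    """
    return tuple(
        attribute_values(records, a).index(record[a]) for a in ATTRIBUTES
    )

feature_vectors = [to_feature_vector(r, data_set) for r in data_set]

# Attaching label information turns an instance into an example.
# The second and third labels are hypothetical.
labels = ["good melon", "bad melon", "bad melon"]
examples = list(zip(feature_vectors, labels))

for vector, label in examples:
    print(vector, label)
```

With three attributes, every feature vector has three coordinates, which matches the three-dimensional attribute space described above.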

If the value to be predicted is discrete, such as "good melon" or "bad melon", the learning task is called "classification";

If the value to be predicted is continuous, such as a watermelon maturity of 0.95 or 0.37, the learning task is called "regression";
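The difference between the two tasks lies in the kind of label being predicted, which a short sketch can show. The nearest-neighbour rule below is just one simple predictor picked for illustration, and the two-dimensional feature vectors are made-up numbers; only the label values ("good melon", "bad melon", 0.95, 0.37) echo the text.

```python
def nearest_label(train, x):
    """Return the label of the training example closest to x.

    Closeness is squared Euclidean distance; this 1-nearest-neighbour rule
    is an illustrative choice, not a method prescribed by the text.
    """
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    _, label = min(train, key=lambda example: dist2(example[0], x))
    return label

# Hypothetical feature vectors (made-up measurements).
X = [(0.9, 0.8), (0.2, 0.3), (0.8, 0.9), (0.1, 0.2)]

# Classification: labels are discrete categories.
classification_train = list(
    zip(X, ["good melon", "bad melon", "good melon", "bad melon"])
)

# Regression: labels are continuous values (maturity).
regression_train = list(zip(X, [0.95, 0.37, 0.90, 0.30]))

new_melon = (0.85, 0.8)
print(nearest_label(classification_train, new_melon))  # a category
print(nearest_label(regression_train, new_melon))      # a number
```

The same predictor serves both tasks here; what changes is whether the labels it returns are categories or real numbers.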

We can also perform "clustering" on watermelons: the watermelons in the training set are divided into several groups, each of which is called a "cluster". These automatically formed clusters may correspond to underlying concepts, such as "light-colored melon" and "dark-colored melon". This kind of learning helps us understand the inherent structure of the data and lays a foundation for more in-depth analysis. Note that in clustering, concepts such as "light-colored melon" and "dark-colored melon" are not known in advance, and the training samples used in learning usually carry no label information. There are exceptions, of course.
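As a rough illustration of clustering, the sketch below groups melons by a single hypothetical "lightness" measurement using a basic two-cluster k-means loop. Both the measurement and the choice of k-means are assumptions made for this example; the text does not name a particular clustering algorithm.

```python
def two_means_1d(values, iterations=10):
    """Split 1-D values into two clusters with a basic k-means loop (k = 2)."""
    # Deterministic initialization: start the centers at the extremes.
    centers = [min(values), max(values)]
    clusters = ([], [])
    for _ in range(iterations):
        clusters = ([], [])
        for v in values:
            # Assign each value to the nearer center.
            nearer = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearer].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return clusters

# Hypothetical lightness values for six melons; no labels are provided.
lightness = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]
dark, light = two_means_1d(lightness)
print("dark-colored cluster:", dark)
print("light-colored cluster:", light)
```

The names "dark" and "light" are attached only after the fact, by a person inspecting the groups; the algorithm itself receives no such concepts.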

Depending on whether the training data carry label information, learning tasks can be broadly divided into two categories: "supervised learning" and "unsupervised learning". Classification and regression are representative of the former, and clustering is representative of the latter.

The ability of a learned model to apply to new samples is called its "generalization" ability. A model with strong generalization ability works well across the whole sample space.
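One common way to get a feel for generalization is to evaluate a model on samples it did not see during training. The sketch below assumes a single hypothetical numeric feature per melon and a deliberately simple threshold model; the data and the model are invented for illustration.

```python
def fit_threshold(train):
    """Learn a threshold: the midpoint between the two class means."""
    good = [x for x, y in train if y == "good melon"]
    bad = [x for x, y in train if y == "bad melon"]
    return (sum(good) / len(good) + sum(bad) / len(bad)) / 2

def predict(threshold, x):
    """Predict "good melon" when the feature exceeds the learned threshold."""
    return "good melon" if x > threshold else "bad melon"

def accuracy(threshold, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(threshold, x) == y for x, y in samples)
    return correct / len(samples)

# Hypothetical (feature value, label) pairs.
train = [(0.9, "good melon"), (0.8, "good melon"),
         (0.2, "bad melon"), (0.3, "bad melon")]
# New samples that were not used for training.
unseen = [(0.7, "good melon"), (0.1, "bad melon"),
          (0.6, "good melon"), (0.4, "bad melon")]

threshold = fit_threshold(train)
print("accuracy on unseen samples:", accuracy(threshold, unseen))
```

Accuracy on the unseen samples, rather than on the training set, is what indicates how well the model generalizes.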

Origin blog.csdn.net/LOVEYSUXIN/article/details/104055142