Machine Learning 1 - Basic Concepts

Reference: https://www.jianshu.com/p/cbe8e0fe7b2c


Data set:

(color = green; root = curled; knock = dull)
(color = dark green; root = slightly curled; knock = muffled)
(color = pale white; root = stiff; knock = crisp)
...
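The toy data set above can be sketched as Python structures (the English attribute names and values are my translations of the original Chinese table, chosen for illustration):

```python
# Each parenthesized record above becomes one sample: a mapping from
# attribute names to attribute values.
samples = [
    {"color": "green",      "root": "curled",          "knock": "dull"},
    {"color": "dark green", "root": "slightly curled", "knock": "muffled"},
    {"color": "pale white", "root": "stiff",           "knock": "crisp"},
]

# Each sample is a point in the attribute space; the dimensionality is
# the number of attributes (coordinate axes).
dimensionality = len(samples[0])
print(dimensionality)  # 3
```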


Basic concepts:

1. Sample - each parenthesized record in this batch of data.
2. Data set - a collection of samples.
3. Feature, attribute - an observable aspect that reflects some property of an object, such as color, root, and knock.
4. Attribute value - green, dark green, curled, dull sound, etc. are the values the attributes take.
5. Attribute space, sample space, input space - the space spanned by the attributes. Taking each attribute as a coordinate axis forms a space, and each sample is then a point in this space. For example, taking "color", "root", and "knock" as coordinate axes gives a three-dimensional space, and each watermelon is a point in it.
6. Dimensionality - the number of features in the data set. The dimensionality in this example is 3.
7. Hypothesis - also called the hypothesis function; the function (prediction model) obtained by the computer after learning.
8. Label - information about the outcome of a sample. For example, if the watermelon (color = green; root = curled; knock = dull) is a good melon, then "good melon" is the label of the sample (color = green; root = curled; knock = dull).
9. Example - a sample together with its label, such as ((color = green; root = curled; knock = dull), good melon).
10. Label space, output space - the set of all labels. In this example it is {good melon, bad melon}.
11. Generalization - if a model (hypothesis function) trained on samples from one data set can also be applied to new sample data, the model is said to have generalization ability. The more new data the model applies to, the better its generalization ability.
12. Hypothesis space - the space composed of all hypotheses. Suppose color has 2 values (green, dark green), root has 2 values (curled, slightly curled), and knock has 2 values (dull, muffled). A good melon might turn out not to depend on color at all, so color contributes 3 possibilities (each of its 2 values, plus "any value"), and the same holds for the other two attributes. Adding the hypothesis that no good melon exists at all, there are 3 * 3 * 3 + 1 = 28 hypotheses in total.
    Generalizing: if an object has 2 features taking x and y possible values respectively, the hypothesis space contains (x + 1) * (y + 1) + 1 hypotheses.
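The 3 * 3 * 3 + 1 = 28 count can be verified by enumerating the hypotheses directly. A minimal sketch, assuming each attribute may take one of its concrete values or the wildcard "*" ("any value is fine"), plus one extra empty hypothesis ("no good melon exists"):

```python
from itertools import product

# The two possible values of each attribute (hypothetical English names
# for the watermelon attributes in the text).
attribute_values = {
    "color": ["green", "dark green"],
    "root":  ["curled", "slightly curled"],
    "knock": ["dull", "muffled"],
}

# Each attribute contributes its values plus the wildcard "*".
choices = [values + ["*"] for values in attribute_values.values()]

hypotheses = list(product(*choices))  # (2+1) * (2+1) * (2+1) = 27
total = len(hypotheses) + 1           # +1 for the empty hypothesis
print(total)  # 28
```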

Classification: predicting discrete values, such as good melon vs. bad melon.

Regression: predicting continuous values, such as the relationship between house price and area.


Inductive bias (inductive preference): a learning algorithm's preference for certain types of hypotheses over others.

Overfitting: machine learning trains a model on the training set so that it can be applied to new samples. When the model treats peculiarities of the training set as if they were general rules, it fails to transfer to new data; this is overfitting.
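A hypothetical sketch of overfitting: the underlying rule is y = x, the training labels carry noise, and a needlessly complex model (a degree-5 polynomial through 6 points) fits the noise as if it were a general rule:

```python
import numpy as np

# Training data: true rule y = x, plus noise.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 6)
y_train = x_train + rng.normal(0.0, 0.1, size=6)

line  = np.polyfit(x_train, y_train, deg=1)   # simple model: general trend
curve = np.polyfit(x_train, y_train, deg=5)   # complex model: hits every point

train_err_line  = np.mean((np.polyval(line,  x_train) - y_train) ** 2)
train_err_curve = np.mean((np.polyval(curve, x_train) - y_train) ** 2)

# The degree-5 curve drives training error to (almost) zero by also fitting
# the noise; that memorised noise is exactly what fails to carry over to
# new samples.
print(train_err_curve < train_err_line)  # True
```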
