Python3 fun machine learning (1)

Fundamental Concepts of Machine Learning

Data

  • The famous iris data set: https://en.wikipedia.org/wiki/Iris_flower_data_set

Iris setosa    Iris versicolor    Iris virginica

 

Here is the data for iris:

 

  • The data as a whole is called a data set
  • Each row of data is called a sample
  • Except for the last column, each column expresses a feature of the sample
  • The last column is called the label
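The terms above can be illustrated with a minimal sketch in plain Python. It uses only a handful of rows from the iris data set rather than all 150 samples, and the names `data_set`, `X`, and `y` are illustrative choices, not something the original prescribes.

```python
# A handful of rows from the iris data set (not the full 150 samples).
# Columns: sepal length, sepal width, petal length, petal width (cm), species.
data_set = [
    [5.1, 3.5, 1.4, 0.2, "setosa"],
    [4.9, 3.0, 1.4, 0.2, "setosa"],
    [7.0, 3.2, 4.7, 1.4, "versicolor"],
    [6.4, 3.2, 4.5, 1.5, "versicolor"],
    [6.3, 3.3, 6.0, 2.5, "virginica"],
    [5.8, 2.7, 5.1, 1.9, "virginica"],
]

# Every column except the last is a feature; the last column is the label.
X = [row[:-1] for row in data_set]  # one feature row per sample
y = [row[-1] for row in data_set]   # one label per sample

n_samples = len(X)
n_features = len(X[0])
print(n_samples, n_features)  # 6 4
print(y[0])                   # setosa
```

Keeping features and labels in two separate containers mirrors the way most machine learning libraries expect their input.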

The i-th sample row is written $X^{(i)}$, also called the feature vector. The j-th feature value of the i-th sample is written $X^{(i)}_j$. The label of the i-th sample is written $y^{(i)}$.
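As a rough sketch, the three quantities named here (the feature vector of the i-th sample, its j-th feature value, and its label) map onto list indexing as follows. The helper names are hypothetical, and the 1-based indices are an assumption made to match the usual mathematical convention; Python itself counts from 0.

```python
# Three iris samples: feature rows and their labels.
X = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]
y = ["setosa", "versicolor", "virginica"]


def feature_vector(X, i):
    """Feature vector of the i-th sample (i is 1-based)."""
    return X[i - 1]


def feature_value(X, i, j):
    """j-th feature value of the i-th sample (both 1-based)."""
    return X[i - 1][j - 1]


def label(y, i):
    """Label of the i-th sample (i is 1-based)."""
    return y[i - 1]


print(feature_vector(X, 2))    # [7.0, 3.2, 4.7, 1.4]
print(feature_value(X, 2, 3))  # 4.7
print(label(y, 2))             # versicolor
```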

To make the features easy to visualize, we take only the first two features, using the sepal length as the horizontal axis and the sepal width as the vertical axis.
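The extraction step might look like the sketch below, which keeps only the first two features and groups the resulting 2-D points by label so that each class can be drawn in its own color. The sample rows are a small excerpt of the iris data, and `points_by_label` is an illustrative name.

```python
# (features, label) pairs; features are sepal length, sepal width,
# petal length, petal width in cm.
samples = [
    ([5.1, 3.5, 1.4, 0.2], "setosa"),
    ([4.9, 3.0, 1.4, 0.2], "setosa"),
    ([7.0, 3.2, 4.7, 1.4], "versicolor"),
    ([6.4, 3.2, 4.5, 1.5], "versicolor"),
]

# Keep only the first two features and group the points by label.
points_by_label = {}
for features, label in samples:
    sepal_length, sepal_width = features[0], features[1]
    points_by_label.setdefault(label, []).append((sepal_length, sepal_width))

for label, points in points_by_label.items():
    xs = [p[0] for p in points]  # horizontal axis: sepal length
    ys = [p[1] for p in points]  # vertical axis: sepal width
    print(label, xs, ys)
```

Each `xs`, `ys` pair can then be handed to a plotting library, for example one `matplotlib.pyplot.scatter(xs, ys)` call per label.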

Plotting these two features gives the following image:

Each sample is represented as a point in the coordinate system. With three features, the samples can be drawn in three-dimensional space; likewise, with 1000 features they live in a 1000-dimensional space. The space in which the samples are drawn is called the feature space.

Once the sample points are plotted, it is easy to draw a straight line so that the red samples fall on one side of the line and the blue samples on the other.

The essence of a classification task is to partition the feature space, and the same holds in high-dimensional spaces.
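Splitting a two-dimensional feature space with a straight line can be sketched as follows. A line is written as w1*x1 + w2*x2 + b = 0, and the sign of the left-hand side tells which side a point is on. The weights `w` and offset `b` below are hand-picked for these four points only (an assumption for illustration); they are not learned from data and are not the line from the original figure.

```python
def side_of_line(point, w, b):
    """Return +1 or -1 depending on which side of the line
    w[0]*x1 + w[1]*x2 + b = 0 the point lies."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return 1 if score > 0 else -1


# (sepal length, sepal width) for two classes of iris.
setosa = [(5.1, 3.5), (4.9, 3.0)]
versicolor = [(7.0, 3.2), (6.4, 3.2)]

# Hand-picked line: x1 - x2 - 2.5 = 0 (illustrative, not fitted).
w = (1.0, -1.0)
b = -2.5

print([side_of_line(p, w, b) for p in setosa])      # [-1, -1]
print([side_of_line(p, w, b) for p in versicolor])  # [1, 1]
```

The same rule carries over to more features: with n features the line becomes a hyperplane, and the score is a sum over n terms instead of two.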

The iris data has 4 features, so it should really be analyzed in a 4-dimensional feature space.

Features can be very abstract

  • For an image, each pixel is a feature
  • A 28*28 image has 28*28 = 784 features
  • A color image has even more features
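To make the pixel count concrete, here is a small sketch that flattens an image into a feature vector. The pixel values are all-zero placeholders, and the color case assumes three channel values (such as red, green, blue) per pixel, which the original does not spell out.

```python
height, width = 28, 28

# Grayscale image: one value per pixel (placeholder zeros).
gray_image = [[0 for _ in range(width)] for _ in range(height)]

# Flatten row by row: every pixel becomes one feature.
gray_features = [pixel for row in gray_image for pixel in row]
print(len(gray_features))  # 784

# Color image: assumed to hold three channel values per pixel.
color_image = [[(0, 0, 0) for _ in range(width)] for _ in range(height)]

# Every channel of every pixel becomes one feature.
color_features = [
    channel for row in color_image for pixel in row for channel in pixel
]
print(len(color_features))  # 2352
```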
