Starfruit Python Machine Learning 2: Labels and Features

My CSDN blog column: https://blog.csdn.net/yty_7

Github address: https://github.com/yot777/

 

Now that we are getting into machine learning itself, let's start with examples from everyday life.

What is a label

A label, in layman's terms, is an experience-based classification of something.

Everyone knows that people can be tall, short, fat, or thin. So how is "tall" actually defined?

I looked it up, and the dictionary explains "high" (the same word Chinese uses for "tall") as: the distance from bottom to top is large; far from the ground (as opposed to "low"). Why explain "high" by pointing at "low"?

To make sense of "high", I then looked up "low": the distance from bottom to top is small; close to the ground (as opposed to "high").

I really want to quote Chen Meijia's famous line from the TV show "Love Apartment" at the dictionary: "I'll spray salt soda all over your face!"

In short, the dictionary cannot tell us what "tall" means. Yet each of us genuinely perceives some people as tall and others as short. Why?

What is a feature

If you ask Yao Ming who counts as tall, he might say: 2 meters or more is tall!

If you ask a female model what kind of man counts as tall, she might say: taller than 1.75 meters!

If you ask an ordinary man what kind of woman counts as tall, he might say: taller than 1.6 meters!

If you ask a member of a tribe in a small African country, he might say that anyone one meter or taller is a giant!

See? Everyone has their own yardstick in mind! Behind every label there is some indicator, quantifiable as a number, that supports it. That indicator is called a feature.

Matching features and labels

Now that features and labels have been explained, we can pair them up. Continuing the example above:

People as Yao Ming sees them:

Person   Feature (height, in meters)   Label
A        1.51                          short
B        1.61                          short
C        1.76                          short
D        2.1                           tall

People as the model sees them:

Person   Feature (height, in meters)   Label
A        1.51                          short
B        1.61                          short
C        1.76                          tall
D        2.1                           tall

People as an ordinary man sees them:

Person   Feature (height, in meters)   Label
A        1.51                          short
B        1.61                          tall
C        1.76                          tall
D        2.1                           tall

People as the tribe member sees them:

Person   Feature (height, in meters)   Label
A        1.51                          tall
B        1.61                          tall
C        1.76                          tall
D        2.1                           tall

As you can see, the same label means different things to different people. From here on, we continue the discussion using the ordinary man's view.
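
To make this concrete, here is a minimal sketch that applies the four viewpoints to the same heights. The names `thresholds` and `label_heights` are made up for illustration, the cutoffs are simply read off the quotes above, and treating a height exactly equal to the cutoff as tall is an assumption on my part.

```python
# Heights of persons A-D from the tables above, in meters.
heights = {"A": 1.51, "B": 1.61, "C": 1.76, "D": 2.1}

# Hypothetical cutoffs (in meters) read off each observer's quote.
thresholds = {
    "Yao Ming": 2.0,
    "model": 1.75,
    "ordinary man": 1.6,
    "tribe member": 1.0,
}


def label_heights(heights, cutoff):
    """Label each person "tall" if height >= cutoff, otherwise "short"."""
    return {
        name: ("tall" if h >= cutoff else "short")
        for name, h in heights.items()
    }


# Same features, different labels, depending on who is judging.
for observer, cutoff in thresholds.items():
    print(observer, label_heights(heights, cutoff))
```

The feature values never change between observers; only the rule that turns a feature into a label does.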

We already know the ordinary man's standard for height: 1.6 meters or more is tall.

So when people of other heights come along, we can easily work out their labels too:

Person   Feature (height, in meters)   Label
A        1.51                          short
B        1.61                          tall
C        1.76                          tall
D        2.1                           tall
E        1.58                          short
F        1.68                          tall

To make the data easier for a computer to process, we usually replace the labels with numbers such as 0/1/2. In this example we use 0 for "short" and 1 for "tall", so the table above simplifies to:

Person   Feature (height, in meters)   Label
A        1.51                          0
B        1.61                          1
C        1.76                          1
D        2.1                           1
E        1.58                          0
F        1.68                          1

This gives us the feature-label matrix for human height. Generally speaking, the labels sit in the last column of the matrix.
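
The 0/1 encoding does not have to be typed in by hand. As a rough sketch, it could be computed from the heights with NumPy. The constant name `CUTOFF` and the use of `>=` (so that exactly 1.6 m would count as tall) are assumptions for illustration.

```python
import numpy as np

# Heights of persons A-F from the table above, in meters.
heights = np.array([1.51, 1.61, 1.76, 2.1, 1.58, 1.68])

# Assumed cutoff for the ordinary man's view: 1.6 m or more is tall.
CUTOFF = 1.6

# Boolean comparison, then cast to integers: 1 = tall, 0 = short.
labels = (heights >= CUTOFF).astype(int)
print(labels)  # [0 1 1 1 0 1]

# Feature in the first column, label in the last column.
S = np.column_stack([heights, labels])
print(S.shape)  # (6, 2)
```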

Implementing the feature-label matrix in Python

The code is as follows:

import numpy as np
S = np.array([[1.51,0],[1.61,1],[1.76,1],[2.1,1],[1.58,0],[1.68,1]])
print(S)

Output:
[[1.51 0.  ]
 [1.61 1.  ]
 [1.76 1.  ]
 [2.1  1.  ]
 [1.58 0.  ]
 [1.68 1.  ]]
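
Notice that the labels print as 0. and 1. rather than 0 and 1. A NumPy array holds a single data type, so mixing float heights with integer labels turns every entry into a float. The short check below illustrates this. Casting the label column back with `astype(int)` is just one possible way to recover integers, not something the example itself requires.

```python
import numpy as np

S = np.array([[1.51, 0], [1.61, 1], [1.76, 1], [2.1, 1], [1.58, 0], [1.68, 1]])

# The whole array shares one dtype; the integer labels were upcast to float.
print(S.dtype)   # float64
print(S.shape)   # (6, 2): six people, one feature column plus one label column

# Cast the last column back to integers if integer labels are needed.
labels = S[:, -1].astype(int)
print(labels)    # [0 1 1 1 0 1]
```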

Extracting the labels and features:

import numpy as np
S = np.array([[1.51,0],[1.61,1],[1.76,1],[2.1,1],[1.58,0],[1.68,1]])
# The original feature-label matrix
print("Original feature-label matrix:\n",S)
# Extract the labels (last column)
print("Labels:\n",S[:,-1])
# Extract the features (every column except the last)
print("Features:\n",S[:,0:-1])

Output:
Original feature-label matrix:
 [[1.51 0.  ]
 [1.61 1.  ]
 [1.76 1.  ]
 [2.1  1.  ]
 [1.58 0.  ]
 [1.68 1.  ]]
Labels:
 [0. 1. 1. 1. 0. 1.]
Features:
 [[1.51]
 [1.61]
 [1.76]
 [2.1 ]
 [1.58]
 [1.68]]

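Note that the labels sit in the last column of the matrix, so we use the array column-indexing syntax described earlier:

Inside the [], an index that begins with a colon and a comma (that is, :,) selects whole columns, so S[:,-1] is the first column counting from the right, that is, the labels.

Now look at how the features are taken in this example: S[:,0:-1], which means every column from the first one up to, but not including, the rightmost column (if this is unclear, please review Starfruit's Advanced Python Lecture 16: Arrays (6), Indexing and Slicing 1-D and 2-D Arrays). It might seem that simply writing S[:,0] would work just as well?

Note, however, that this example has only one feature column. In real applications, several feature columns usually correspond to one label, so S[:,0:-1] is the general way to take the features.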
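
To see why the general form matters, here is a small sketch with two feature columns. The second feature (weight in kilograms) and all of its values are made up purely for illustration; they are not part of the example above.

```python
import numpy as np

# Hypothetical matrix: height (m), weight (kg, made-up values), label.
S2 = np.array([
    [1.51, 48.0, 0],
    [1.61, 55.0, 1],
    [1.76, 70.0, 1],
])

labels = S2[:, -1]       # last column, however many features there are
features = S2[:, 0:-1]   # every column except the last
first_only = S2[:, 0]    # only the first feature, and as a 1-D array

print(features.shape)    # (3, 2)
print(first_only.shape)  # (3,)
```

With more than one feature, S2[:, 0] silently drops the weight column, while S2[:, 0:-1] keeps every feature and stays two-dimensional.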

 

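Summary

A label is an experience-based classification of a person or thing.

An indicator that can be quantified as a number is called a feature.

Labels are usually converted to numbers such as 0/1/2 so that a computer can process them easily.

Labels and features can be paired to form a feature-label matrix. Generally, the labels sit in the last column of that matrix.

With the feature-label matrix S implemented in Python, S[:,-1] gives the labels and S[:,0:-1] gives the features.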

 

If you found this article helpful, you are welcome to follow, comment, and like! Follows and stars on Github are welcome too!

Origin blog.csdn.net/yty_7/article/details/105003781