My CSDN blog column: https://blog.csdn.net/yty_7
Github address: https://github.com/yot777/
All right, our machine learning series begins. Let's start with examples from everyday life.
What is a label
A label, in layman's terms, is an experience-based classification of someone or something.
Everyone knows that people can be tall, short, fat, or thin. But how is "tall" actually defined?
I looked it up. The dictionary explains "high" (高, the same word Chinese uses for "tall") as: a large distance from bottom to top; far from the ground (the opposite of "low"). Why explain "high" in terms of "low"?
To make sense of "high", I then looked up "low": a small distance from bottom to top; close to the ground (the opposite of "high").
I really want to borrow Chen Meijia's famous line from Love Apartment and tell the dictionary: "I'll spray you with salt soda!"
In short, the dictionary cannot tell us what "tall" means. Yet each of us genuinely feels that some people are tall and some are short. Why?
What is a feature
If you ask Yao Ming who counts as tall, he might say: 2 meters or more!
If you ask a fashion model what kind of man is tall, she might say: taller than 1.75 meters!
If you ask an ordinary man what kind of woman is tall, he might say: taller than 1.6 meters!
If you ask a member of a tribe in a small African country, he might say that anyone 1 meter or taller is a giant!
Do you see? Everyone carries their own yardstick in their head! Behind every label there is some indicator that can be quantified as a number. That indicator is called a feature.
Pairing features with labels
Now that features and labels have been explained, we can pair them up. Continuing the example above:
People as Yao Ming sees them:
Person | Feature (height, in meters) | Label |
A | 1.51 | short |
B | 1.61 | short |
C | 1.76 | short |
D | 2.1 | tall |
People as fashion models see them:
Person | Feature (height, in meters) | Label |
A | 1.51 | short |
B | 1.61 | short |
C | 1.76 | tall |
D | 2.1 | tall |
People as ordinary men see them:
Person | Feature (height, in meters) | Label |
A | 1.51 | short |
B | 1.61 | tall |
C | 1.76 | tall |
D | 2.1 | tall |
People as a tribe in Africa sees them:
Person | Feature (height, in meters) | Label |
A | 1.51 | tall |
B | 1.61 | tall |
C | 1.76 | tall |
D | 2.1 | tall |
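The four tables can be reproduced by comparing each height with a per-viewer threshold. The sketch below is only an illustration: the names thresholds, heights, and label_height are made up here, the cut-off values are read off the quotes above, and treating a height exactly at the cut-off as tall is an assumption (no person in the tables sits exactly on a boundary).

```python
# Cut-offs in meters, read off the four viewpoints quoted in the text.
# Treating "exactly at the cut-off" as tall is an assumption.
thresholds = {
    "Yao Ming": 2.0,
    "fashion models": 1.75,
    "ordinary men": 1.6,
    "tribe": 1.0,
}

# The feature: height in meters for persons A-D.
heights = {"A": 1.51, "B": 1.61, "C": 1.76, "D": 2.1}


def label_height(height, threshold):
    """Return "tall" if height reaches the threshold, otherwise "short"."""
    return "tall" if height >= threshold else "short"


# One dictionary of labels per viewer: same features, different labels.
labels_by_viewer = {
    viewer: {person: label_height(h, t) for person, h in heights.items()}
    for viewer, t in thresholds.items()
}

for viewer, labels in labels_by_viewer.items():
    print(viewer, labels)
```

The features never change between viewers; only the rule that turns a feature into a label does.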
As you can see, the same label means different things to different people. Below we continue the discussion from the ordinary men's point of view.
We already know the ordinary men's standard for height: anyone taller than 1.6 meters is tall.
Given people of other heights, we can then easily work out their labels as well:
Person | Feature (height, in meters) | Label |
A | 1.51 | short |
B | 1.61 | tall |
C | 1.76 | tall |
D | 2.1 | tall |
E | 1.58 | short |
F | 1.68 | tall |
To make the labels easier for a computer to handle, we usually replace them with numbers such as 0/1/2. In this example we use 0 for "short" and 1 for "tall", so the table above simplifies to:
Person | Feature (height, in meters) | Label |
A | 1.51 | 0 |
B | 1.61 | 1 |
C | 1.76 | 1 |
D | 2.1 | 1 |
E | 1.58 | 0 |
F | 1.68 | 1 |
This forms a feature-label matrix for human height. Generally speaking, the labels are placed in the last column of the matrix.
Implementing the feature-label matrix in Python
The code is as follows:
import numpy as np

# Each row is one person: [height in meters, label (0 = short, 1 = tall)]
S = np.array([[1.51, 0], [1.61, 1], [1.76, 1], [2.1, 1], [1.58, 0], [1.68, 1]])
print(S)
# Output:
[[1.51 0.  ]
 [1.61 1.  ]
 [1.76 1.  ]
 [2.1  1.  ]
 [1.58 0.  ]
 [1.68 1.  ]]
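The same matrix can also be built from the rule instead of being typed in by hand. The following is a minimal sketch, assuming the ordinary men's rule that anyone taller than 1.6 meters is labeled 1; the variable names heights, threshold, labels, and matrix are chosen here for illustration only.

```python
import numpy as np

# The feature: heights in meters for persons A-F.
heights = np.array([1.51, 1.61, 1.76, 2.1, 1.58, 1.68])

# Assumed rule from the text: taller than 1.6 m means "tall".
threshold = 1.6

# The comparison yields True/False; astype(int) turns that into 1/0.
labels = (heights > threshold).astype(int)

# Feature in the first column, label in the last column.
matrix = np.column_stack([heights, labels])

print(labels)        # [0 1 1 1 0 1]
print(matrix.shape)  # (6, 2)
```

Because a NumPy array holds a single data type, the integer labels are stored as floats (0. and 1.) once they share a matrix with the heights.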
Extracting the labels and the features:
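import numpy as np

S = np.array([[1.51, 0], [1.61, 1], [1.76, 1], [2.1, 1], [1.58, 0], [1.68, 1]])
# The original feature-label matrix
print("The original feature-label matrix is\n", S)
# Extract the labels (the last column)
print("The labels are\n", S[:, -1])
# Extract the features (every column except the last)
print("The features are\n", S[:, 0:-1])
# Output:
The original feature-label matrix is
 [[1.51 0.  ]
 [1.61 1.  ]
 [1.76 1.  ]
 [2.1  1.  ]
 [1.58 0.  ]
 [1.68 1.  ]]
The labels are
 [0. 1. 1. 1. 0. 1.]
The features are
 [[1.51]
 [1.61]
 [1.76]
 [2.1 ]
 [1.58]
 [1.68]]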
Note that the labels sit in the last column of the matrix, so we use the array column-slicing syntax described earlier:
Inside the square brackets, an index that begins with a colon and a comma (that is, :, ) selects columns across all rows, so S[:,-1] is the first column counting from the right, i.e. the labels.
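Now look at how the features are extracted in this example: S[:,0:-1] means counting from the left up to, but not including, the first column from the right (if this is unclear, see: Yangtao's Advanced Python Lecture 16: Arrays (6), Indexing and Accessing 1-D and 2-D Arrays). It seems that simply writing S[:,0] would work too?
Note that this example has only one feature column. In practice, several feature columns usually correspond to one label, so S[:,0:-1] is the general way to extract the features. It also keeps the result two-dimensional, whereas S[:,0] returns a one-dimensional array.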
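The difference between the two ways of writing it can be checked with a short sketch. The second matrix, S2, is hypothetical: its weight column is invented purely to show what happens with more than one feature and does not come from the article's data.

```python
import numpy as np

S = np.array([[1.51, 0], [1.61, 1], [1.76, 1], [2.1, 1], [1.58, 0], [1.68, 1]])

# An integer index drops the column axis; a slice keeps it.
print(S[:, 0].shape)     # (6,)
print(S[:, 0:-1].shape)  # (6, 1)

# Hypothetical matrix with two feature columns: height (m) and weight (kg).
# The weight values are made up for illustration only.
S2 = np.array([[1.51, 50.0, 0],
               [1.61, 55.0, 1],
               [1.76, 70.0, 1]])

features = S2[:, 0:-1]  # every column except the last
labels = S2[:, -1]      # the last column only

print(features.shape)   # (3, 2)
print(labels.shape)     # (3,)
```

With two feature columns, S2[:,0] would silently drop the weight, while S2[:,0:-1] still returns all the features without any change to the code.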
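Summary
A label is an experience-based classification of someone or something.
An indicator that can be quantified as a number is called a feature.
Labels are usually replaced with numbers such as 0/1/2 so that a computer can process them easily.
Labels and features can be paired to form a feature-label matrix; generally, the labels are placed in the last column of that matrix.
When the feature-label matrix S is implemented in Python, S[:,-1] gives the labels and S[:,0:-1] gives the features.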