Article: http://t.csdn.cn/ndFcq
1. Why use one-hot encoding?
Problem:
In machine learning algorithms, we often encounter categorical features, for example: the gender of a person is male and female, and the motherland is China, the United States, France, etc. These eigenvalues are not continuous, but discrete and disordered.
Purpose:
如果要作为机器学习算法的输入,通常我们需要对其进行特征数字化。什么是特征数字化呢?例如:
性别特征:["男","女"]
祖国特征:["中国","美国,"法国"]
运动特征:["足球","篮球","羽毛球","乒乓球"]
bottleneck:
If a sample (someone) is characterized by ["Male", "China", "Table Tennis"], we can use [0,0,4] to represent, but such feature processing cannot be directly placed into the machine learning algorithm. Because the categories are unordered.