Machine learning data preprocessing 1: One-Hot encoding (One-Hot) and its code

Article: http://t.csdn.cn/ndFcq

1. Why use one-hot encoding?

Problem:
In machine learning algorithms, we often encounter categorical features, for example: the gender of a person is male and female, and the motherland is China, the United States, France, etc. These eigenvalues ​​are not continuous, but discrete and disordered.

Purpose:

       如果要作为机器学习算法的输入,通常我们需要对其进行特征数字化。什么是特征数字化呢?例如:

       性别特征:["男","女"]

       祖国特征:["中国","美国,"法国"]

       运动特征:["足球","篮球","羽毛球","乒乓球"]

bottleneck:

If a sample (someone) is characterized by ["Male", "China", "Table Tennis"], we can use [0,0,4] to represent, but such feature processing cannot be directly placed into the machine learning algorithm. Because the categories are unordered.

Guess you like

Origin blog.csdn.net/qq_45583898/article/details/126886293