Using NumPy in Python to one-hot encode categorical features, and restoring the one-hot encoding to the original categories
Here, one_hot_label holds the features after one-hot encoding, and label holds the original categorical feature.
Steps:
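- Generate an n×n identity matrix (ones on the diagonal, zeros elsewhere), where n is the number of categories
- Input the categories as integer indices in the range 0 to n-1
- One-hot encode by selecting the row of the identity matrix for each category
- Restore the categories from the one-hot encoding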
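The steps assume the categories are already integer indices from 0 to n-1. If a raw feature holds strings instead, one way to get those indices (a minimal sketch, assuming a hypothetical `colors` feature) is `np.unique` with `return_inverse=True`, which also gives you n and the lookup table needed to map the restored indices back to the original values:

```python
import numpy as np

# Hypothetical example feature with string categories.
colors = np.array(["red", "green", "blue", "green", "red"])

# Map each string to an integer index. np.unique returns the sorted distinct
# values and, with return_inverse=True, each element's index into that array.
categories, label = np.unique(colors, return_inverse=True)
n = len(categories)  # number of categories (3 here)

# Step 1: an n x n identity matrix, one row per category.
one_hot = np.eye(n)

# Step 3: one-hot encode by selecting the row for each label.
one_hot_label = one_hot[label]

# Step 4: restore the integer labels (column of the single 1 in each row),
# then map them back to the original strings.
restored = one_hot_label.argmax(axis=1)
restored_colors = categories[restored]

print(categories)  # ['blue' 'green' 'red']
print(one_hot_label)
print(restored_colors)
```

Because `np.unique` sorts the distinct values, the column order follows alphabetical order ("blue" is column 0), not the order in which values first appear.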
import numpy as np
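one_hot = np.eye(28)  # identity matrix whose size equals the number of categories; 28 categories in this example
label = np.array([1, 4, 8, 9, 5, 0])  # input categories (values range from 0 to 27)
# One-hot encode
one_hot_label = one_hot[label.astype(np.int32)]  # pick the row of the identity matrix indexed by each label
# Restore
label = [one_label.tolist().index(1) for one_label in one_hot_label]  # position of the 1 in each row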
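# The commented-out code below prints this process step by step
# for one_label in one_hot_label:
#     print(one_label)
#     print('*' * 50)
#     print(one_label.tolist())  # convert to a one-dimensional list
#     print('-' * 50)
#     print(one_label.tolist().index(1))  # index of the first occurrence of 1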
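The list comprehension converts each row to a Python list and searches it for 1. A vectorized alternative (a sketch, not the approach used above) is `np.argmax` along `axis=1`, which returns the column of the largest value in each row; since every one-hot row contains exactly one 1, that column is the original category:

```python
import numpy as np

one_hot = np.eye(28)                  # 28 categories, as in the example above
label = np.array([1, 4, 8, 9, 5, 0])  # original category indices
one_hot_label = one_hot[label]        # one-hot encoding, shape (6, 28)

# Vectorized restore: column index of the maximum (the single 1) in each row.
restored = np.argmax(one_hot_label, axis=1)

# The list-comprehension approach, for comparison.
restored_list = [row.tolist().index(1) for row in one_hot_label]

print(restored)                            # [1 4 8 9 5 0]
print(np.array_equal(restored, label))     # True
print(restored.tolist() == restored_list)  # True
```

One behavioral difference: for a row of all zeros, `.index(1)` raises `ValueError`, while `np.argmax` silently returns 0, so the list-comprehension version is stricter if malformed rows are possible.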