Python's sklearn: Introduction to the LabelEncoder function (encoding and decoding labels), how to use it, and a detailed guide to specific cases
Table of Contents
Introduction to LabelEncoder function (encoding and encoding restoration)
How to use the LabelEncoder function
Specific case of LabelEncoder function
Introduction to LabelEncoder function (encoding and encoding restoration)
class LabelEncoder Found at: sklearn.preprocessing._label

class LabelEncoder(TransformerMixin, BaseEstimator):
    """Encode target labels with value between 0 and n_classes-1.

    This transformer should be used to encode target values, *i.e.* `y`,
    and not the input `X`.

    Read more in the :ref:`User Guide`.

    .. versionadded:: 0.12

    Attributes
    ----------
    classes_ : array of shape (n_class,)
        Holds the label for each class.

    Examples
    --------
    `LabelEncoder` can be used to normalize labels.

    >>> from sklearn import preprocessing
    >>> le = preprocessing.LabelEncoder()
    >>> le.fit([1, 2, 2, 6])
    LabelEncoder()
    >>> le.classes_
    array([1, 2, 6])
    >>> le.transform([1, 1, 2, 6])
    array([0, 0, 1, 2]...)
    >>> le.inverse_transform([0, 0, 1, 2])
    array([1, 1, 2, 6])

    It can also be used to transform non-numerical labels (as long as they are
    hashable and comparable) to numerical labels.

    >>> le = preprocessing.LabelEncoder()
    >>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
    LabelEncoder()
    >>> list(le.classes_)
    ['amsterdam', 'paris', 'tokyo']
    >>> le.transform(["tokyo", "tokyo", "paris"])
    array([2, 2, 1]...)
    >>> list(le.inverse_transform([2, 2, 1]))
    ['tokyo', 'tokyo', 'paris']

    See also
    --------
    sklearn.preprocessing.OrdinalEncoder : Encode categorical features
        using an ordinal encoding scheme.
    sklearn.preprocessing.OneHotEncoder : Encode categorical features
        as a one-hot numeric array.
    """
Methods
fit(y): Fit label encoder.
fit_transform(y): Fit label encoder and return encoded labels.
get_params([deep]): Get parameters for this estimator.
inverse_transform(y): Transform labels back to original encoding.
set_params(**params): Set the parameters of this estimator.
transform(y): Transform labels to normalized encoding.
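The encode and restore steps described above can be sketched in plain Python. This is only a simplified illustration of the idea (sorted unique labels mapped to 0 .. n_classes-1), not scikit-learn's actual implementation; the names fit_classes, encode, and decode are made up for this sketch.

```python
def fit_classes(labels):
    """Return the sorted unique labels; each label's position is its code."""
    return sorted(set(labels))


def encode(classes, labels):
    """Map each label to its index in classes (0 .. n_classes - 1)."""
    index = {label: i for i, label in enumerate(classes)}
    # A label that was not seen when building classes raises KeyError here.
    return [index[label] for label in labels]


def decode(classes, codes):
    """Map integer codes back to the original labels."""
    return [classes[code] for code in codes]


classes = fit_classes(["paris", "paris", "tokyo", "amsterdam"])
codes = encode(classes, ["tokyo", "tokyo", "paris"])
restored = decode(classes, codes)
print(classes, codes, restored)
```

Because the classes are sorted before codes are assigned, the integer codes follow the sort order of the labels rather than the order in which they first appear.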
How to use the LabelEncoder function
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# Helper functions from the DataScienceNYY package (not part of scikit-learn)
from DataScienceNYY.DataAnalysis import dataframe_fillAnyNull, Dataframe2LabelEncoder

# Construct the data
train_data_dict = {'Name': ['张三', '李四', '王五', '赵六', '张七', '李八', '王十', 'un'],
                   'Age': [22, 23, 24, 25, 22, 22, 22, None],
                   'District': ['北京', '上海', '广东', '深圳', '山东', '河南', '浙江', ' '],
                   'Job': ['CEO', 'CTO', 'CFO', 'COO', 'CEO', 'CTO', 'CEO', '']}
test_data_dict = {'Name': ['张三', '李四', '王十一', None],
                  'Age': [22, 23, 22, 'un'],
                  'District': ['北京', '上海', '广东', ''],
                  'Job': ['CEO', 'CTO', 'UFO', ' ']}
train_data_df = pd.DataFrame(train_data_dict)
test_data_df = pd.DataFrame(test_data_dict)
print(train_data_df, '\n', test_data_df)

# Fill in the missing data, column by column
for col in train_data_df.columns:
    train_data_df[col] = dataframe_fillAnyNull(train_data_df, col)
    test_data_df[col] = dataframe_fillAnyNull(test_data_df, col)
print(train_data_df, '\n', test_data_df)

# Label-encode the data
train_data, test_data = Dataframe2LabelEncoder(train_data_df, test_data_df)
print(train_data, '\n', test_data)
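The source of dataframe_fillAnyNull and Dataframe2LabelEncoder is not shown in this article, so the following is only a guess at what such helpers might do, written with pandas and scikit-learn. The names fill_any_null and label_encode_frames, the 'Unknown' placeholder, and the treatment of blank strings as missing are all assumptions made for illustration, and the sample data is simplified to string columns.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def fill_any_null(df, col, placeholder="Unknown"):
    """Hypothetical stand-in: return column `col` as stripped strings, with
    None/NaN and blank strings replaced by `placeholder`."""
    values = df[col].astype(object)
    values = values.where(values.notna(), placeholder)  # None/NaN -> placeholder
    values = values.astype(str).str.strip()
    return values.mask(values == "", placeholder)       # blank -> placeholder


def label_encode_frames(train_df, test_df, unknown="Unknown"):
    """Hypothetical stand-in: fit one LabelEncoder per column on the train
    data and encode both frames, mapping unseen test values to `unknown`."""
    train_out, test_out = train_df.copy(), test_df.copy()
    for col in train_df.columns:
        encoder = LabelEncoder().fit(train_df[col])
        known = set(encoder.classes_)
        test_col = test_df[col].map(lambda v: v if v in known else unknown)
        if unknown not in known:
            # Give the placeholder its own code after the fitted classes.
            encoder.classes_ = np.append(encoder.classes_, unknown)
        train_out[col] = encoder.transform(train_df[col])
        test_out[col] = encoder.transform(test_col)
    return train_out, test_out


train_df = pd.DataFrame({"District": ["Beijing", "Shanghai", " "],
                         "Job": ["CEO", "CTO", "CEO"]})
test_df = pd.DataFrame({"District": ["Beijing", "Hangzhou", ""],
                        "Job": ["CTO", "UFO", None]})
for col in train_df.columns:
    train_df[col] = fill_any_null(train_df, col)
    test_df[col] = fill_any_null(test_df, col)

train_codes, test_codes = label_encode_frames(train_df, test_df)
print(train_codes, "\n", test_codes)
```

In this sketch every test value that never occurred in the train data, including a filled-in missing value, ends up sharing the single 'Unknown' code.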
Specific case of LabelEncoder function
1. Basic example
LabelEncoder can be used to normalize labels.
>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2]...)
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])
It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.
>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1]...)
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']
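Note that transform only accepts labels that were present when the encoder was fitted. The short check below, reusing the city labels from the example above, shows that an unseen label ("berlin", chosen here purely for illustration) makes transform raise a ValueError.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(["paris", "paris", "tokyo", "amsterdam"])

try:
    le.transform(["tokyo", "berlin"])  # "berlin" was not seen during fit
    unseen_error = None
except ValueError as exc:
    unseen_error = str(exc)

print(unseen_error)
```

This is why test data that may contain new values needs some preparation before it is passed to transform.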
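2. Label-encoding data when values are missing and the test data contains new values (ones that never appeared in the train data)
Reference article: Python's sklearn: How to use the LabelEncoder function: necessary steps before using LabelEncoder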
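import numpy as np
from sklearn.preprocessing import LabelEncoder

# train_df and test_df are pandas DataFrames and col is the name of a
# string-valued column present in both; they are defined elsewhere.

# Fit the encoder on the train data
LE = LabelEncoder()
LE.fit(train_df[col])

# Replace test values that never appeared in train with 'Unknown',
# then add 'Unknown' to LE.classes_ so that it can be encoded
known_classes = set(LE.classes_)
test_df[col] = test_df[col].map(lambda s: s if s in known_classes else 'Unknown')
LE.classes_ = np.append(LE.classes_, 'Unknown')

# Transform the train and test data separately
train_df[col] = LE.transform(train_df[col])
test_df[col] = LE.transform(test_df[col])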
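For feature columns, a different technique from the one above is the OrdinalEncoder listed in the docstring's "See also" section, which can assign a fixed code to unseen categories without editing classes_ by hand. The sketch below assumes scikit-learn 0.24 or later (where the handle_unknown and unknown_value parameters are available); the small DataFrames and the choice of -1 as the unknown code are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

train_df = pd.DataFrame({"Job": ["CEO", "CTO", "CFO", "CEO"]})
test_df = pd.DataFrame({"Job": ["CTO", "UFO"]})  # "UFO" never appears in train

# Categories unseen during fit are encoded as unknown_value instead of raising.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_codes = encoder.fit_transform(train_df[["Job"]])
test_codes = encoder.transform(test_df[["Job"]])

print(train_codes.ravel(), test_codes.ravel())
```

Unlike LabelEncoder, OrdinalEncoder expects 2D input (hence the double brackets) and returns floating-point codes by default.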