Python sklearn: introduction to the LabelEncoder function (encoding and decoding), how to use it, and a detailed guide to specific cases

Table of Contents

Introduction to LabelEncoder function (encoding and encoding restoration)

Methods

How to use the LabelEncoder function

Specific case of LabelEncoder function

1. Basic case

2. LabelEncoding data when values are missing and the test data contains new values (not seen in the train data)

Introduction to LabelEncoder function (encoding and encoding restoration)

class LabelEncoder Found at: sklearn.preprocessing._label
class LabelEncoder(TransformerMixin, BaseEstimator):
"""Encode target labels with value between 0 and n_classes-1.
This transformer should be used to encode target values, *i.e.* `y`, and not the input `X`.
Read more in the :ref:`User Guide <preprocessing_targets>`.

.. versionadded:: 0.12

Attributes
----------
classes_ : array of shape (n_class,)
Holds the label for each class.

Examples
--------
`LabelEncoder` can be used to normalize labels.

>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2]...)
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])

It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.

>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1]...)
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']

See also
--------
sklearn.preprocessing.OrdinalEncoder : Encode categorical features using an ordinal encoding scheme.
sklearn.preprocessing.OneHotEncoder : Encode categorical features as a one-hot numeric array.

"""
    def fit(self, y):
        """Fit label encoder

        Parameters
        ----------
        y : array-like of shape (n_samples,)
            Target values.

        Returns
        -------
        self : returns an instance of self.
        """
        y = column_or_1d(y, warn=True)
        self.classes_ = _encode(y)
        return self

    def fit_transform(self, y):
        """Fit label encoder and return encoded labels

        Parameters
        ----------
        y : array-like of shape [n_samples]
            Target values.

        Returns
        -------
        y : array-like of shape [n_samples]
        """
        y = column_or_1d(y, warn=True)
        self.classes_, y = _encode(y, encode=True)
        return y

    def transform(self, y):
        """Transform labels to normalized encoding.

        Parameters
        ----------
        y : array-like of shape [n_samples]
            Target values.

        Returns
        -------
        y : array-like of shape [n_samples]
        """
        check_is_fitted(self)
        y = column_or_1d(y, warn=True)
        # transform of empty array is empty array
        if _num_samples(y) == 0:
            return np.array([])
        _, y = _encode(y, uniques=self.classes_, encode=True)
        return y

    def inverse_transform(self, y):
        """Transform labels back to original encoding.

        Parameters
        ----------
        y : numpy array of shape [n_samples]
            Target values.

        Returns
        -------
        y : numpy array of shape [n_samples]
        """
        check_is_fitted(self)
        y = column_or_1d(y, warn=True)
        # inverse transform of empty array is empty array
        if _num_samples(y) == 0:
            return np.array([])
        diff = np.setdiff1d(y, np.arange(len(self.classes_)))
        if len(diff):
            raise ValueError(
                "y contains previously unseen labels: %s" % str(diff))
        y = np.asarray(y)
        return self.classes_[y]

    def _more_tags(self):
        return {'X_types': ['1dlabels']}

Methods

`fit`(y)	Fit label encoder
`fit_transform`(y)	Fit label encoder and return encoded labels
`get_params`([deep])	Get parameters for this estimator.
`inverse_transform`(y)	Transform labels back to original encoding.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(y)	Transform labels to normalized encoding.
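
The methods listed above can be tried together in a few lines. This is only a minimal sketch: the city labels are borrowed from the docstring example, and the label "berlin" is an assumption added here to show what happens with a value that was not seen during fit.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["paris", "tokyo", "paris", "amsterdam"]

le = LabelEncoder()

# fit_transform learns the unique classes and encodes the labels in one step
codes = le.fit_transform(labels)
print(list(le.classes_))  # the learned classes, stored in sorted order
print(codes.tolist())

# inverse_transform maps the integer codes back to the original labels
restored = le.inverse_transform(codes)
print(list(restored))

# LabelEncoder has no constructor parameters, so get_params returns an empty dict
params = le.get_params()

# transform raises ValueError for a label that was not seen during fit
try:
    le.transform(["berlin"])
    unseen_error = None
except ValueError as exc:
    unseen_error = str(exc)
print(unseen_error)
```

The ValueError at the end is the reason the second case later in this article needs extra handling for new values in the test data.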

How to use the LabelEncoder function

import pandas as pd
from sklearn.preprocessing import LabelEncoder
# DataScienceNYY is not part of scikit-learn; it appears to be the author's own helper package
from DataScienceNYY.DataAnalysis import dataframe_fillAnyNull, Dataframe2LabelEncoder


# Build the data
train_data_dict = {'Name': ['张三', '李四', '王五', '赵六', '张七', '李八', '王十', 'un'],
                   'Age': [22, 23, 24, 25, 22, 22, 22, None],
                   'District': ['北京', '上海', '广东', '深圳', '山东', '河南', '浙江', ' '],
                   'Job': ['CEO', 'CTO', 'CFO', 'COO', 'CEO', 'CTO', 'CEO', '']}
test_data_dict = {'Name': ['张三', '李四', '王十一', None],
                  'Age': [22, 23, 22, 'un'],
                  'District': ['北京', '上海', '广东', ''],
                  'Job': ['CEO', 'CTO', 'UFO', ' ']}
train_data_df = pd.DataFrame(train_data_dict)
test_data_df = pd.DataFrame(test_data_dict)
print(train_data_df, '\n', test_data_df)


# Fill in the missing data, column by column
for col in train_data_df.columns:
    train_data_df[col] = dataframe_fillAnyNull(train_data_df, col)
    test_data_df[col] = dataframe_fillAnyNull(test_data_df, col)
print(train_data_df, '\n', test_data_df)


# LabelEncode the data
train_data, test_data = Dataframe2LabelEncoder(train_data_df, test_data_df)
print(train_data, '\n', test_data)
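
The two helpers imported from DataScienceNYY are not shown in this article, so the snippet above cannot be run without that package. The following is a rough sketch of what such helpers might do, not the author's implementation. The names `fill_any_null` and `encode_frames`, the placeholder tokens "missing" and "Unknown", and the rule that blank strings count as missing are all assumptions made here for illustration. Unlike the approach in case 2 below, this sketch includes the "Unknown" token when fitting, so `classes_` stays sorted.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

MISSING = "missing"  # hypothetical placeholder for empty cells
UNKNOWN = "Unknown"  # hypothetical placeholder for values unseen in train


def fill_any_null(df, col, fill_value=MISSING):
    """Hypothetical stand-in: return df[col] as strings, with None/NaN
    and blank strings replaced by fill_value."""
    def clean(value):
        if pd.isna(value):
            return fill_value
        text = str(value).strip()
        return text if text else fill_value
    return df[col].map(clean)


def encode_frames(train_df, test_df):
    """Hypothetical stand-in: label-encode every column, fitting on train only."""
    train_out, test_out = train_df.copy(), test_df.copy()
    for col in train_df.columns:
        le = LabelEncoder()
        # Include the UNKNOWN token while fitting so it gets its own code
        le.fit(list(train_df[col]) + [UNKNOWN])
        known = set(le.classes_)
        # Values that never appeared in train are mapped to UNKNOWN
        test_col = test_df[col].map(lambda v: v if v in known else UNKNOWN)
        train_out[col] = le.transform(train_df[col])
        test_out[col] = le.transform(test_col)
    return train_out, test_out


# A reduced version of the data above (string columns only)
train_df = pd.DataFrame({'District': ['北京', '上海', '广东', ' '],
                         'Job': ['CEO', 'CTO', 'CFO', '']})
test_df = pd.DataFrame({'District': ['北京', '上海', ''],
                        'Job': ['CEO', 'UFO', ' ']})

for col in train_df.columns:
    train_df[col] = fill_any_null(train_df, col)
    test_df[col] = fill_any_null(test_df, col)

train_enc, test_enc = encode_frames(train_df, test_df)
print(train_enc, '\n', test_enc)
```

The Age column is left out of the reduced data on purpose: it mixes numbers, None and the string 'un', and how that should be cleaned depends on choices the original helpers make that are not visible here.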

Specific case of LabelEncoder function

1. Basic case

LabelEncoder can be used to normalize labels.

>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2]...)
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])

It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.

>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1]...)
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']
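
The docstring quoted earlier says that LabelEncoder is meant for the target `y` rather than the input `X`, and points to OrdinalEncoder for categorical features. A small sketch of that split might look like the following; the feature values and the yes/no target are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Two categorical feature columns (X) and one target column (y); illustrative data
X = np.array([["paris", "CEO"],
              ["tokyo", "CTO"],
              ["paris", "CFO"]], dtype=object)
y = ["yes", "no", "yes"]

# OrdinalEncoder works on 2D feature data and learns categories per column
oe = OrdinalEncoder()
X_enc = oe.fit_transform(X)

# LabelEncoder works on the 1D target
le = LabelEncoder()
y_enc = le.fit_transform(y)

print(X_enc)
print(y_enc)
```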

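2. LabelEncoding data when values are missing and the test data contains new values (not seen in the train data)

Reference article: Python sklearn: how to use the LabelEncoder function, necessary steps before using LabelEncoder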

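import numpy as np
from sklearn.preprocessing import LabelEncoder

# train_df and test_df (pandas DataFrames) and col (the name of a string column)
# are assumed to exist already, with missing values filled in beforehand

# Fit the encoder on the train data
LE = LabelEncoder()
LE.fit(train_df[col])

# Replace values in the test data that did not appear in train with 'Unknown',
# then add 'Unknown' to LE.classes_ (unless it is already a class)
known = set(LE.classes_)
test_df[col] = test_df[col].map(lambda s: s if s in known else 'Unknown')
if 'Unknown' not in known:
    LE.classes_ = np.append(LE.classes_, 'Unknown')

# Transform the train and test data separately
train_df[col] = LE.transform(train_df[col])
test_df[col] = LE.transform(test_df[col])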
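
To try the approach above end to end, here is a small self-contained sketch. The two tiny DataFrames are made up for illustration (the Job values are borrowed from the usage section), and `test_col`, `train_codes` and `test_codes` are names introduced here.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative data: 'UFO' appears only in the test data
train_df = pd.DataFrame({"Job": ["CEO", "CTO", "CFO", "COO", "CEO"]})
test_df = pd.DataFrame({"Job": ["CEO", "CTO", "UFO"]})
col = "Job"

# Fit on the train data only
LE = LabelEncoder()
LE.fit(train_df[col])

# Map values unseen in train to 'Unknown', then register 'Unknown' as a class
known = set(LE.classes_)
test_col = test_df[col].map(lambda s: s if s in known else "Unknown")
LE.classes_ = np.append(LE.classes_, "Unknown")

# Encode both sets with the same encoder
train_codes = LE.transform(train_df[col])
test_codes = LE.transform(test_col)
print(train_codes.tolist())
print(test_codes.tolist())

# Round trip: the unseen value comes back as 'Unknown', not as 'UFO'
print(list(LE.inverse_transform(test_codes)))
```

Appending to `classes_` by hand bypasses `fit`, so `classes_` is no longer guaranteed to be sorted. Whether that matters may depend on the scikit-learn version and on the dtype of the column, so a round trip through `inverse_transform`, as in the last line, is a cheap sanity check.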