Hot encoded text attributes treatment

Learning Source: the Click here Wallpaper

When there is a text attribute data, machine learning algorithm is not easy to handle text attribute, this time the need to convert into digital text attribute. Conversion, if the sequence relationships among attributes, e.g. :( cold, warm, heat) can be used directly to integer coding; but when there is no sequence relationship between attributes, e.g. :( red, green, blue), heat alone may be used coding.

Hot encoded: encoded attribute is 1, the remaining value of the attribute 0

First, the artificial one-hot encoding

from numpy improt argmax

data = 'hello world'
alphabet = 'abcdefghigklmnopqrstuvwxyz '
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
#整数编码
integer_encoded = [char_to_int[char] for char in data]
print(integer_encoded)
#独热编码
OneHot_Encoder = list()
for i ininteger_encoded: 
     Letter = [0 for _ in Range (len (Alphabet))] 
     Letter [I] =. 1 
     OneHot_Encoder.append (Letter) 
Print (OneHot_Encoder)
 # recover data from a hot encoded 
Inverted = int_to_char [the argmax (OneHot_Encoder [0] )]
 Print (Inverted)
 
#output:

Two, Scikit-Learn hot encoded

from numpy import argmax
from numpy import array
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder

#整数编码
data = array(['cold', 'cold', 'warm', 'hot', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold'])
label_encoder = LabelEncoder()
label_encoded = label_encoder.fit_transform(data)
print(label_encoded)
#独热编码
onehot_encoder = OneHotEncoder(categories='auto')
onehot_encoded = onehot_encoder.fit_transform(label_encoded.reshape(-1, 1))
onehot = onehot_encoded.toarray()
print(onehot)
#恢复编码
state = label_encoder.inverse_transform([argmax(onehot[0, :])])
print(state)

#output:
 

 

Guess you like

Origin www.cnblogs.com/pineapple-chicken/p/12402273.html