Learning Source: the Click here Wallpaper
When there is a text attribute data, machine learning algorithm is not easy to handle text attribute, this time the need to convert into digital text attribute. Conversion, if the sequence relationships among attributes, e.g. :( cold, warm, heat) can be used directly to integer coding; but when there is no sequence relationship between attributes, e.g. :( red, green, blue), heat alone may be used coding.
Hot encoded: encoded attribute is 1, the remaining value of the attribute 0
First, the artificial one-hot encoding
from numpy improt argmax data = 'hello world' alphabet = 'abcdefghigklmnopqrstuvwxyz ' char_to_int = dict((c, i) for i, c in enumerate(alphabet)) int_to_char = dict((i, c) for i, c in enumerate(alphabet)) #整数编码 integer_encoded = [char_to_int[char] for char in data] print(integer_encoded) #独热编码 OneHot_Encoder = list() for i ininteger_encoded: Letter = [0 for _ in Range (len (Alphabet))] Letter [I] =. 1 OneHot_Encoder.append (Letter) Print (OneHot_Encoder) # recover data from a hot encoded Inverted = int_to_char [the argmax (OneHot_Encoder [0] )] Print (Inverted)
#output:
Two, Scikit-Learn hot encoded
from numpy import argmax from numpy import array from sklearn.preprocessing import LabelEncoder from sklearn.preprocessing import OneHotEncoder #整数编码 data = array(['cold', 'cold', 'warm', 'hot', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold']) label_encoder = LabelEncoder() label_encoded = label_encoder.fit_transform(data) print(label_encoded) #独热编码 onehot_encoder = OneHotEncoder(categories='auto') onehot_encoded = onehot_encoder.fit_transform(label_encoded.reshape(-1, 1)) onehot = onehot_encoded.toarray() print(onehot) #恢复编码 state = label_encoder.inverse_transform([argmax(onehot[0, :])]) print(state)
#output: