1. One-Hot Encoding
One-Hot encoding, also known as one-bit-effective encoding, uses an N-bit state register to encode N states. Each state has its own register bit, and only one bit is valid at any time.
In practical machine learning tasks, features are not always continuous values; some are categorical. Gender, for example, can be divided into "male" and "female". Such features usually need to be converted to numbers before a model can use them.
Take the following three categorical features as an example:
- Gender: ["male", "female"]
- Region: ["Europe", "US", "Asia"]
- Browser: ["Firefox", "Chrome", "Safari", "Internet Explorer"]
2. The processing method of One-Hot Encoding
For the above problem, gender has two possible values, so it is encoded in two dimensions. Similarly, region is three-dimensional and browser is four-dimensional. We can therefore apply One-Hot encoding to the sample ["male", "US", "Internet Explorer"]: "male" corresponds to [1, 0], "US" corresponds to [0, 1, 0], and "Internet Explorer" corresponds to [0, 0, 0, 1]. Concatenating these vectors gives the complete numerical representation: [1, 0, 0, 1, 0, 0, 0, 0, 1]. One consequence is that the data becomes very sparse, because each feature contributes exactly one nonzero entry regardless of how many categories it has.
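The encoding described above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the `FEATURES` dictionary and the helper names `one_hot` and `encode_sample` are assumptions introduced here, and the position of each category follows the order of the lists in the example.

```python
# Hypothetical category lists taken from the example above; the order of each
# list fixes which position is set to 1.
FEATURES = {
    "gender": ["male", "female"],
    "region": ["Europe", "US", "Asia"],
    "browser": ["Firefox", "Chrome", "Safari", "Internet Explorer"],
}


def one_hot(value, categories):
    """Return a list with a 1 at the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if category == value else 0 for category in categories]


def encode_sample(sample):
    """Concatenate the one-hot vectors of every feature in `sample`."""
    encoded = []
    # Dictionaries keep insertion order, so features are visited as declared.
    for value, categories in zip(sample, FEATURES.values()):
        encoded.extend(one_hot(value, categories))
    return encoded


vector = encode_sample(["male", "US", "Internet Explorer"])
print(vector)  # [1, 0, 0, 1, 0, 0, 0, 0, 1]
```

Only 3 of the 9 entries are nonzero, one per feature, which is the sparsity mentioned above.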
3. The actual Python code
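from sklearn import preprocessing

# Each row is a sample; each column is a categorical feature coded as an integer.
enc = preprocessing.OneHotEncoder()
enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])

# transform() returns a sparse matrix; toarray() converts it to a dense array.
array = enc.transform([[0, 1, 3]]).toarray()
print(array)  # [[1. 0. 0. 1. 0. 0. 0. 0. 1.]]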
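
To see why the fitted data yields a 9-dimensional vector, the following pure-Python sketch mimics the idea conceptually: collect the distinct values of each column during fitting, then encode a row against them. It is not scikit-learn's implementation, and the names `fit_categories` and `transform_row` are hypothetical helpers introduced only for illustration.

```python
def fit_categories(rows):
    """Collect the sorted distinct values of each column (feature)."""
    columns = zip(*rows)
    return [sorted(set(column)) for column in columns]


def transform_row(row, categories):
    """One-hot encode a single row against the learned categories."""
    encoded = []
    for value, column_categories in zip(row, categories):
        if value not in column_categories:
            raise ValueError(f"value {value!r} was not seen during fitting")
        encoded.extend(1.0 if value == c else 0.0 for c in column_categories)
    return encoded


training_rows = [[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]
categories = fit_categories(training_rows)
print(categories)  # [[0, 1], [0, 1, 2], [0, 1, 2, 3]]
print(transform_row([0, 1, 3], categories))
# [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
```

The three columns contain 2, 3, and 4 distinct values, so the encoded row has 2 + 3 + 4 = 9 entries, the same layout as the gender, region, and browser example.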