One-Hot Encoding

1. One-Hot Encoding

    One-Hot encoding, also called one-bit-effective encoding, uses an N-bit state register to encode N states: each state has its own register bit, and exactly one bit is set at any time.
    In practical machine learning tasks, features are not always continuous values. Some are categorical, such as gender, which takes the values "male" and "female". Such features usually have to be converted to numbers first, as in the following example.
Consider the following three categorical features:
  • Gender: ["male","female"]
  • Region: ["Europe","US","Asia"]
  • Browser: ["Firefox","Chrome","Safari","Internet Explorer"]
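For a sample such as ["male","US","Internet Explorer"], the most direct approach is to replace each value with its index in the corresponding list, giving [0,1,3]. However, features encoded this way should not be fed directly into many machine learning algorithms, because the integers imply an ordering and a distance between categories (for example, that "Asia" is further from "Europe" than "US" is) that do not actually exist.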
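The index-based encoding described above can be sketched in plain Python. This is only an illustration: the `categories` dictionary and the `integer_encode` helper are hypothetical names introduced here, and the integer codes simply follow the order in which the values are listed in the example.

```python
# Category lists copied from the example; list order defines each integer code.
# (The names `categories` and `integer_encode` are illustrative, not from a library.)
categories = {
    "gender": ["male", "female"],
    "region": ["Europe", "US", "Asia"],
    "browser": ["Firefox", "Chrome", "Safari", "Internet Explorer"],
}


def integer_encode(sample):
    """Replace each categorical value with its index in the matching category list."""
    return [values.index(value) for values, value in zip(categories.values(), sample)]


codes = integer_encode(["male", "US", "Internet Explorer"])
print(codes)  # [0, 1, 3]
```

Note that the codes 0, 1 and 2 for the region feature look like ordered quantities, even though the regions themselves have no natural order.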

2. How One-Hot Encoding Works

    In the example above, gender has two possible values, so it is encoded in two dimensions. Similarly, region takes three dimensions and browser takes four. Encoding the sample ["male","US","Internet Explorer"] with One-Hot encoding, "male" becomes [1,0], "US" becomes [0,1,0], and "Internet Explorer" becomes [0,0,0,1]. Concatenating these gives the complete encoded feature vector: [1,0,0,1,0,0,0,0,1]. One consequence of this is that the data becomes very sparse.
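The procedure can be written out by hand as a minimal sketch, assuming the three category lists from the example. The names `CATEGORIES`, `one_hot` and `one_hot_encode` are hypothetical helpers used only for illustration.

```python
# Category lists from the example, in feature order: gender, region, browser.
CATEGORIES = [
    ["male", "female"],
    ["Europe", "US", "Asia"],
    ["Firefox", "Chrome", "Safari", "Internet Explorer"],
]


def one_hot(value, values):
    """Return a list with 1 at the position of `value` and 0 everywhere else."""
    return [1 if candidate == value else 0 for candidate in values]


def one_hot_encode(sample):
    """Concatenate the one-hot vectors of every feature in the sample."""
    encoded = []
    for value, values in zip(sample, CATEGORIES):
        encoded.extend(one_hot(value, values))
    return encoded


vector = one_hot_encode(["male", "US", "Internet Explorer"])
print(vector)  # [1, 0, 0, 1, 0, 0, 0, 0, 1]
```

The output has 2 + 3 + 4 = 9 entries, of which only three are non-zero, which is the sparsity mentioned above.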


3. The actual Python code

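from sklearn import preprocessing

# Each row is one sample; each column is an integer-coded categorical feature.
enc = preprocessing.OneHotEncoder()
# fit() learns the distinct values in each column: 2, 3 and 4 categories here.
enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])

# transform() returns a sparse matrix; toarray() converts it to a dense array.
array = enc.transform([[0, 1, 3]]).toarray()

print(array)  # [[1. 0. 0. 1. 0. 0. 0. 0. 1.]]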
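As a variation, the encoder can also be given the string values directly instead of integer codes. The sketch below assumes a scikit-learn version in which `OneHotEncoder` accepts the `categories` parameter (0.20 or later). Passing the category lists explicitly is a choice made here so that the column order matches the lists in the example rather than alphabetical order.

```python
from sklearn.preprocessing import OneHotEncoder

# Explicit category order per feature, matching the lists in the example.
categories = [
    ["male", "female"],
    ["Europe", "US", "Asia"],
    ["Firefox", "Chrome", "Safari", "Internet Explorer"],
]

enc = OneHotEncoder(categories=categories)
# fit() is still required, even though the categories are supplied up front.
enc.fit([["male", "Europe", "Firefox"]])

# Encode the sample from the text and convert the sparse result to a dense array.
row = enc.transform([["male", "US", "Internet Explorer"]]).toarray()
print(row)
```

Under these assumptions the printed row should match the vector [1,0,0,1,0,0,0,0,1] derived by hand in section 2.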

