sklearn OneHotEncoder()

For example, a person could have the following features:

["male", "female"]

["from Europe", "from US", "from Asia"]

["uses Firefox", "uses Chrome", "uses Safari", "uses Internet Explorer"].

Such features can be efficiently coded as integers:

for instance, ["male", "from US", "uses Internet Explorer"] could be expressed as [0, 1, 3],

while ["female", "from Asia", "uses Chrome"] would be [1, 2, 1].
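The integer coding above amounts to replacing each value by its position in that feature's list of possible values. The sketch below shows this in plain Python, assuming the category lists are ordered exactly as listed earlier; `CATEGORIES` and `encode_as_integers` are hypothetical names for illustration, not part of scikit-learn.

```python
# Hypothetical category lists, in the order given above.
# A value's index within its list is its integer code.
CATEGORIES = [
    ["male", "female"],
    ["from Europe", "from US", "from Asia"],
    ["uses Firefox", "uses Chrome", "uses Safari", "uses Internet Explorer"],
]


def encode_as_integers(sample):
    """Map each string feature to its index in the matching category list."""
    return [cats.index(value) for cats, value in zip(CATEGORIES, sample)]


print(encode_as_integers(["male", "from US", "uses Internet Explorer"]))  # [0, 1, 3]
print(encode_as_integers(["female", "from Asia", "uses Chrome"]))         # [1, 2, 1]
```

The integer codes are the input format that the older OneHotEncoder discussed here expects.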

The first thing to understand is how the input array is laid out.

By default, features are arranged by column: each column is one feature (the first column is one feature, the second column another) and each row is one sample.

Calling fit analyzes the input array and determines n_values (stored as the fitted n_values_ attribute in older scikit-learn releases), that is, how many binary columns each feature needs. For example, if the first column ranges over 0-1, it needs two columns; if the second column ranges over 0-2, it needs three; and if the third column ranges over 0-3, it needs four.
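A minimal sketch of how n_values can be inferred, assuming every feature takes integer values starting at 0 so that the width of a feature is its column maximum plus one. This approximates what the older `n_values='auto'` setting did with the training data; `infer_n_values` and the sample array `X` are hypothetical.

```python
def infer_n_values(X):
    """Columns needed per feature, assuming values run from 0 to the column max."""
    n_features = len(X[0])
    return [max(row[j] for row in X) + 1 for j in range(n_features)]


# Hypothetical training data: 4 samples (rows), 3 features (columns).
X = [[0, 0, 3],
     [1, 1, 0],
     [0, 2, 1],
     [1, 0, 2]]

print(infer_n_values(X))  # [2, 3, 4]
```

Column 0 reaches at most 1, column 1 at most 2 and column 2 at most 3, which gives the 2, 3 and 4 columns described above.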

So the array [0, 1, 3] holds three feature values and is encoded as [1, 0, 0, 1, 0, 0, 0, 0, 1], which is the blocks [1, 0], [0, 1, 0] and [0, 0, 0, 1] concatenated.
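The concatenation can be written out directly. This is a plain-Python sketch of the encoding, not scikit-learn's implementation; `one_hot` is a hypothetical helper that assumes every value lies in `0 .. n_values[j] - 1`.

```python
def one_hot(sample, n_values):
    """Concatenate one block of n_values[j] slots per feature, with a 1 at the value's index."""
    encoded = []
    for value, size in zip(sample, n_values):
        block = [0] * size
        block[value] = 1  # assumes 0 <= value < size
        encoded.extend(block)
    return encoded


print(one_hot([0, 1, 3], [2, 3, 4]))  # [1, 0, 0, 1, 0, 0, 0, 0, 1]
print(one_hot([1, 2, 1], [2, 3, 4]))  # [0, 1, 0, 0, 1, 0, 1, 0, 0]
```

The output length is always sum(n_values), here 2 + 3 + 4 = 9, regardless of the sample.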

 

We can also pass n_values directly as an array with one entry per feature, for example n_values=[2, 3, 4].

If n_values is already known and passed in, fit has essentially no effect on the encoding: in my tests, fitting on different arrays gave the same results.
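The n_values parameter was deprecated in scikit-learn 0.20 and later removed; in current releases the closest equivalent is to list each feature's categories explicitly with `categories`. The sketch below assumes a recent scikit-learn and reproduces the observation above: two encoders fitted on different arrays produce the same output. The sample arrays are hypothetical.

```python
from sklearn.preprocessing import OneHotEncoder

# Fix every feature's categories up front instead of inferring them in fit.
# This plays roughly the role of n_values=[2, 3, 4] in older releases.
categories = [[0, 1], [0, 1, 2], [0, 1, 2, 3]]

enc_a = OneHotEncoder(categories=categories).fit([[0, 1, 3]])
enc_b = OneHotEncoder(categories=categories).fit([[1, 2, 1], [0, 0, 0]])

sample = [[0, 1, 3]]
# transform returns a SciPy sparse matrix by default; toarray() densifies it.
print(enc_a.transform(sample).toarray())  # [[1. 0. 0. 1. 0. 0. 0. 0. 1.]]
print(enc_b.transform(sample).toarray())  # same output: the fit data did not matter
```

Fit still checks the data: with the default handle_unknown='error', a value outside the listed categories raises an error.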

 

