For example, a person could have the features
["male", "female"],
["from Europe", "from US", "from Asia"],
["uses Firefox", "uses Chrome", "uses Safari", "uses Internet Explorer"].
Such features can be efficiently coded as integers: for instance,
["male", "from US", "uses Internet Explorer"] could be expressed as [0, 1, 3],
while ["female", "from Asia", "uses Chrome"] would be [1, 2, 1].
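The integer coding described above can be sketched in a few lines of plain Python. This is only an illustration: the names FEATURES and to_integer_codes are made up here, and the code assumes each label is coded as its position in the feature's list, which matches the two examples in the text.

```python
# Category lists for each feature, in the order used in the text.
FEATURES = [
    ["male", "female"],
    ["from Europe", "from US", "from Asia"],
    ["uses Firefox", "uses Chrome", "uses Safari", "uses Internet Explorer"],
]


def to_integer_codes(sample):
    """Map each label to its position in that feature's category list."""
    return [categories.index(label) for categories, label in zip(FEATURES, sample)]


codes_a = to_integer_codes(["male", "from US", "uses Internet Explorer"])
codes_b = to_integer_codes(["female", "from Asia", "uses Chrome"])
print(codes_a)  # [0, 1, 3]
print(codes_b)  # [1, 2, 1]
```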
The first thing to understand is how the input array is read. By default, features are laid out by column: the first column is one feature, the second column is another, and so on.
The fit method analyzes the input array to determine n_values, that is, how many distinct values each feature can take and therefore how many positions its one-hot code needs. For example, if the first column ranges over 0-1, it needs two positions; the second column, ranging over 0-2, needs three; the third column, ranging over 0-3, needs four.
So the array [0, 1, 3] holds three feature values and is equivalent to [1, 0, 0, 1, 0, 0, 0, 0, 1]: [1, 0] for the first feature, [0, 1, 0] for the second, and [0, 0, 0, 1] for the third.
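To make the two steps concrete, here is a minimal pure-Python sketch. It is not the library's implementation: fit_n_values and one_hot are hypothetical helper names, the training rows are made-up sample data, and the sketch assumes that n_values for a column is its maximum value plus one.

```python
def fit_n_values(rows):
    """Return, per column, the number of values: the column maximum plus one."""
    return [max(column) + 1 for column in zip(*rows)]


def one_hot(row, n_values):
    """Concatenate one block per feature, with a single 1 at the value's index."""
    encoded = []
    for value, size in zip(row, n_values):
        block = [0] * size
        block[value] = 1
        encoded.extend(block)
    return encoded


# Illustrative data: columns range over 0-1, 0-2 and 0-3 respectively.
training_rows = [[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]
n_values = fit_n_values(training_rows)
print(n_values)                      # [2, 3, 4]
print(one_hot([0, 1, 3], n_values))  # [1, 0, 0, 1, 0, 0, 0, 0, 1]
```

The encoded row has 2 + 3 + 4 = 9 positions, one block per feature.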
We can also pass n_values in directly as an array.
If n_values is already known, the data passed to fit no longer matters (in my tests, fitting on different arrays gave the same results).
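The behaviour can be imitated with a small stand-in class. TinyOneHot is a hypothetical name invented for this sketch, not a real library class; it only assumes that an explicit n_values takes precedence over whatever fit would infer from the data.

```python
class TinyOneHot:
    """Hypothetical stand-in for a one-hot encoder; not a real library class."""

    def __init__(self, n_values=None):
        self.n_values = n_values  # explicit sizes, or None to infer them in fit

    def fit(self, rows):
        if self.n_values is None:
            # Infer sizes from the data: column maximum plus one.
            self.sizes_ = [max(column) + 1 for column in zip(*rows)]
        else:
            # Sizes were given up front, so the fit data is not consulted.
            self.sizes_ = list(self.n_values)
        return self

    def transform(self, row):
        encoded = []
        for value, size in zip(row, self.sizes_):
            block = [0] * size
            block[value] = 1
            encoded.extend(block)
        return encoded


# Same explicit n_values, different fit data: the encodings agree.
fit_a = TinyOneHot(n_values=[2, 3, 4]).fit([[0, 0, 0]])
fit_b = TinyOneHot(n_values=[2, 3, 4]).fit([[1, 2, 3], [0, 1, 2]])
same = fit_a.transform([0, 1, 3]) == fit_b.transform([0, 1, 3])
print(same)  # True

# Without n_values, the sizes depend entirely on the data seen in fit.
inferred = TinyOneHot().fit([[0, 0, 0]])
print(inferred.sizes_)  # [1, 1, 1]
```

The last two lines show the contrast: when n_values is left out, a fit array that happens to contain only zeros yields one position per feature, so the choice of fit data does matter in that case.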