The meaning of np_utils.to_categorical(y_train, 2)

Original: https://www.cnblogs.com/lhuser/p/9073012.html
Why use np_utils.to_categorical(y_train, 2)? It converts the original label vector, for example [1, 0, 0, 0, 1, ...], into one-hot codes. Each label becomes one row with two columns, so label 0 becomes [1, 0] and label 1 becomes [0, 1].
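As a rough illustration of what such a call produces, the sketch below mimics the conversion with plain NumPy. The helper name one_hot and the sample y_train values are assumptions made for this example; it is a stand-in, not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical stand-in for to_categorical: integer labels -> one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Put a 1 in the column given by each label, one row per sample.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

y_train = [1, 0, 0, 0, 1]        # assumed sample labels
y_onehot = one_hot(y_train, 2)
print(y_onehot.shape)            # (5, 2)
print(y_onehot[0])               # [0. 1.]
```

The second argument fixes the number of columns, so the result always has shape (number of samples, number of classes).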
Example, using the iris flower data:
sample, label
1, Iris-setosa
2, Iris-versicolor
3, Iris-virginica
After one-hot encoding, the labels become:

sample, Iris-setosa, Iris-versicolor, Iris-virginica
1, 1, 0, 0
2, 0, 1, 0
3, 0, 0, 1
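
The table above can be reproduced with a few lines of NumPy. This is only a sketch: the variable names are illustrative, and the column order is the sorted order returned by np.unique, which happens to match the table.

```python
import numpy as np

labels = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica"])

# np.unique returns the sorted class names and, with return_inverse=True,
# the integer class index of every sample.
classes, indices = np.unique(labels, return_inverse=True)

# Row i of the identity matrix is the one-hot code for class i.
one_hot = np.eye(len(classes), dtype=int)[indices]

print(classes.tolist())
print(one_hot.tolist())  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```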

  1. The multi-class classification problem is similar to the binary classification problem: the categorical output labels need to be converted into numerical variables. In binary classification the labels are converted directly to (0, 1) (when the output layer uses the sigmoid function) or (-1, 1) (when the output layer uses the tanh function). In multi-class classification we convert them to dummy variables instead: one-hot encoding turns the label vector into a Boolean matrix with 1 in the column of the corresponding label and 0 everywhere else.
  2. Be careful not to convert the labels directly into a single numerical variable such as 1, 2, 3. Doing so makes the task look more like a regression problem than a classification problem, and the regression problem is the harder of the two. (With more categories the span of the output values grows, and the output layer can then only use a linear activation function.)
  3. We can use the np_utils.to_categorical function in Keras to perform this conversion (recent Keras releases expose the same function as keras.utils.to_categorical).
  4. Keras is a simple, modular neural network framework built on top of Theano or TensorFlow, so building a network structure with Keras is easier than with TensorFlow directly. Here we use the KerasClassifier class provided by Keras. This class can be used as an Estimator in the scikit-learn package (Estimator is the basic model type in sklearn), so we can conveniently call sklearn functions for data preprocessing and result evaluation.
    For the network structure, we use a 3-layer fully connected network with 4 nodes in the input layer, 10 nodes in the hidden layer, and 3 nodes in the output layer. The activation function of the hidden layer is relu (rectifier), and the activation function of the output layer is softmax. For the loss function we accordingly choose categorical_crossentropy (this function comes from Theano or TensorFlow). For binary classification, activation='sigmoid' and loss='binary_crossentropy' are generally selected instead.
    PS: For a multi-class classification network, enlarging the intermediate hidden layer can improve the training accuracy, but the required computation time and memory also increase, so an appropriate size has to be found by testing; here we set it to 10. The dropout rate of each layer also needs to be tuned (too high tends to underfit, too low tends to overfit); here we set it to 0.2.
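
To make the 4-10-3 structure and the softmax / categorical_crossentropy pairing concrete, here is a minimal NumPy sketch of a single forward pass. It is not the Keras model from the text: the weights are random, dropout and training are omitted, and the function names and the toy batch are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the text: 4 inputs, 10 hidden nodes, 3 output classes.
W1, b1 = rng.normal(scale=0.1, size=(4, 10)), np.zeros(10)
W2, b2 = rng.normal(scale=0.1, size=(10, 3)), np.zeros(3)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(x):
    hidden = relu(x @ W1 + b1)
    return softmax(hidden @ W2 + b2)

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    # Mean over samples of -sum(y_true * log(y_pred)).
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1)))

# Assumed toy batch: two samples with four features each, one-hot targets.
x = np.array([[5.1, 3.5, 1.4, 0.2],
              [6.3, 3.3, 6.0, 2.5]])
y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])

probs = forward(x)
loss = categorical_crossentropy(y_true, probs)
print(probs.sum(axis=1))  # each row of class probabilities sums to 1
print(loss)
```

Because the targets are one-hot rows, only the predicted probability of the true class contributes to each sample's loss, which is why the labels have to be one-hot encoded before this loss is used.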

Origin blog.csdn.net/qq_37706433/article/details/113923854