Classification (7): Multilabel Classification and Multioutput Classification

Multi-label classification

Until now, each instance was always assigned to exactly one class. In some cases, however, we may want the classifier to output multiple classes for each instance. For example, consider a face classifier: what should it output if it recognizes several people in the same picture? Obviously, it should attach one label for each person it recognizes.

Suppose the face classifier has been trained to recognize three faces: Xiao Ming, Xiao Hong, and Xiao Qiang. When it is given a picture containing Xiao Ming and Xiao Qiang, it should output [1, 0, 1]. A classification system that outputs multiple binary labels like this is called a multilabel classification system.

We won't go into face recognition here; instead, let's look at a simpler example just to illustrate the idea:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# First label: is the digit a large number (7, 8 or 9)?
y_train_large = (y_train >= 7)
# Second label: is the digit odd?
y_train_odd = (y_train % 2 == 1)
# Stack the two labels into one target array with two labels per instance
y_multilabel = np.c_[y_train_large, y_train_odd]

knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_multilabel)

 

This code creates a y_multilabel array containing two target labels for each digit image: the first indicates whether the digit is a large number (7, 8 or 9), and the second indicates whether it is odd. We then create a KNeighborsClassifier instance (KNN supports multilabel classification, though of course not all classifiers do) and train it on this multilabel target array. Finally, we use the model to make a prediction; notice that it outputs two labels:

knn_clf.predict([X_train[0]])
>array([[False,  True]])

y_train[0]
>5

 

We can see that the first instance is the digit 5, and the prediction is correct: 5 is not a large number (False) and it is odd (True).

There are many ways to evaluate a multilabel classifier; which metric to use depends entirely on the needs of the project. One approach is to compute the F1 score (or any of the other binary-classifier metrics discussed earlier) for each individual label, and then average those scores. As follows:

from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score

# Macro averaging: compute the F1 score of each label, then take the unweighted mean
y_train_knn_pred = cross_val_predict(knn_clf, X_train, y_multilabel, cv=3)
f1_score(y_multilabel, y_train_knn_pred, average="macro")
>0.976410265560605

 

This assumes that all labels are equally important, which may not be true in a real project. Suppose that in the face data there are far more pictures of Xiao Ming than of Xiao Hong or Xiao Qiang; we might then want to give more weight to the score on Xiao Ming's label. One simple option is to give each label a weight equal to its support (the number of instances with that label). To do this, just set average="weighted" in the code above, as shown below.
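A minimal sketch of this variant, reusing y_multilabel and y_train_knn_pred from the snippet above (the resulting score will generally differ from the macro-averaged one):

from sklearn.metrics import f1_score

# Weight each label's F1 score by its support (number of true instances of that label)
f1_score(y_multilabel, y_train_knn_pred, average="weighted")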

 

Multioutput classification

The last classification task to introduce is multioutput-multiclass classification (sometimes simply called multioutput classification). It is a generalization of multilabel classification in which each label can itself be multiclass (i.e., it can take more than two possible values).

For example, suppose we want to build a system that removes noise from images. It takes a noisy digit image as input and outputs a clean image, represented as an array of pixel intensities (just like an MNIST image). Note that the classifier's output is multilabel (one label per pixel), and each label can take many values (pixel intensities from 0 to 255). This is therefore an example of a multioutput classification system.

We can create the training and test sets by taking the MNIST images and adding noise to their pixel intensities with NumPy's randint() function. The target images are the original, clean images:

# Add random noise to the pixel intensities of every training and test image
noise = np.random.randint(0, 100, (len(X_train), 784))
X_train_mod = X_train + noise
noise = np.random.randint(0, 100, (len(X_test), 784))
X_test_mod = X_test + noise
# The targets are the original, clean images
y_train_mod = X_train
y_test_mod = X_test
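To make the multioutput structure concrete, here is a quick sanity check (a sketch, assuming the usual 60,000/10,000 MNIST split and raw pixel intensities in 0-255): each instance now has 784 target labels, one per pixel, and each label can take any intensity value.

X_train_mod.shape   # (60000, 784): one noisy image per row
y_train_mod.shape   # (60000, 784): 784 target labels per instance
y_train_mod.max()   # 255: the labels range over the original pixel intensities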

 

Let's look at one image from the test set:
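The figure itself is not reproduced here, but a minimal sketch for displaying the pair (assuming matplotlib is available and X_test_mod / y_test_mod are NumPy arrays of flattened 28x28 images) could look like this:

import matplotlib.pyplot as plt

# Noisy input on the left, clean target on the right
plt.subplot(1, 2, 1)
plt.imshow(X_test_mod[0].reshape(28, 28), cmap="binary")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(y_test_mod[0].reshape(28, 28), cmap="binary")
plt.axis("off")
plt.show()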

 

The image on the left is the noisy input, and the one on the right is the clean target. Now let's train a classifier to clean up this image:

# Train KNN to map noisy images to clean images, then clean up the test image
knn_clf.fit(X_train_mod, y_train_mod)
clean_digit = knn_clf.predict([X_test_mod[0]])

 

The prediction and the target image are shown below (the prediction is on the left), and the result still looks quite satisfactory:
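Again, the figure is not reproduced here; the cleaned-up prediction can be displayed with the same approach as before (a sketch, assuming matplotlib and the clean_digit array from the previous snippet):

# clean_digit has shape (1, 784), so it can be reshaped directly to 28x28
plt.imshow(clean_digit.reshape(28, 28), cmap="binary")
plt.axis("off")
plt.show()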

 

That concludes our coverage of classification. By now, you should be able to do the following:

  • Choose an appropriate metric for a classification task
  • Select a suitable precision/recall tradeoff
  • Compare different classifiers
  • Build well-performing classification systems for a variety of tasks

 
