[Machine Learning] Multi-classification and multi-label classification algorithm (including source code)

1. Single-label binary classification problem

1.1 Principle of single-label binary classification algorithm

The single-label binary classification problem is the most common kind of classification task. It means that the label takes only two possible values and there is only one label to predict; in other words, each instance can belong to one of only two classes (A or B). The classification algorithm in this case essentially constructs a decision boundary that separates the data into the two classes. Common algorithms: Logistic regression, SVM, KNN, decision trees, etc.

Logistic algorithm principle:

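Logistic regression models p(y=1|x) = 1 / (1 + exp(-(w·x + b))) and predicts the positive class when this probability is at least 0.5. As a concrete illustration, here is a minimal binary-classification sketch with scikit-learn's LogisticRegression on the breast-cancer dataset (the dataset choice and parameters are illustrative assumptions, not from the original text):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# binary classification: the breast-cancer dataset has exactly two classes (0/1)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)

# predicted probabilities come from the sigmoid; predict() thresholds them at 0.5
print(clf.predict_proba(X_test[:3]))
print(clf.predict(X_test[:3]))
print("Accuracy: %.3f" % clf.score(X_test, y_test))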

2. Single-label multi-classification problem

The single-label multi-classification problem also has only one label to predict, but that label can take many possible values; in other words, each instance may belong to one of K classes (t_1, t_2, ⋯, t_K, with K ≥ 3). Common algorithms: Softmax, KNN, decision trees, etc.

In practice, a multi-classification problem can be handled as an extension of binary classification, that is, the multi-class task is split into several binary classification tasks. The specific strategies are as follows:

  • One-Versus-One (OvO): one vs. one
  • One-Versus-All / One-Versus-the-Rest (OvA/OvR): one vs. the rest
  • Error Correcting Output Codes (ECOC): many vs. many


2.1 ovo

Principle:

Combine the data of every two categories among the K categories and train one model on each pairwise combination, which yields K(K−1)/2 classifiers. The outputs of these classifiers are then fused, and the final prediction is produced by a majority vote over the classifiers' predictions.


2.1.1 Handwritten code

def ovo(datas,estimator):
    '''datas[:,-1] is the target attribute'''
    import numpy as np
    Y = datas[:,-1]
    X = datas[:,:-1]
    y_value = np.unique(Y)

    # number of classes
    k = len(y_value)
    modles = []
    # combine every pair of the K classes and recode the y values as +1/-1
    for i in range(k-1):
        c_i = y_value[i]
        for j in range(i+1,k):
            c_j = y_value[j]
            new_datas = []
            for x,y in zip(X,Y):
                if y == c_i or y == c_j:
                    new_datas.append(np.hstack((x,np.array([2*float(y==c_i)-1]))))
            new_datas = np.array(new_datas)
            algo = estimator()
            modle = algo.fit(new_datas)
            modles.append([(c_i,c_j),modle])
    return modles
def argmaxcount(seq):
    '''Return the element that appears most often in a sequence'''
    '''extremely simple way'''
    # from collections import Counter
    # return Counter(seq).most_common(1)[0][0]

    '''slightly more verbose way'''
    # dict_num = {}
    # for item in seq:
    #     if item not in dict_num.keys():
    #         dict_num[item] = seq.count(item)
    # # sort by count
    # import operator
    # return sorted(dict_num.items(), key=operator.itemgetter(1))[-1][0]

    '''dict comprehension'''
    dict_num = {i: seq.count(i) for i in set(seq)}
    return max(dict_num, key=dict_num.get)

def ovo_predict(X,modles):
    import operator
    result = []
    for x in X:
        pre = []
        for cls,modle in modles:
            pre.append(cls[0] if modle.predict(x) else cls[1])
        d = {i: pre.count(i) for i in set(pre)}  # use a set to deduplicate before counting votes
        result.append(sorted(d.items(),key=operator.itemgetter(1))[-1][0])
    return result

You can also call the encapsulated sklearn implementation directly:

2.1.2 Call API

class sklearn.multiclass.OneVsOneClassifier(estimator, n_jobs=1)

The code is as follows:

from sklearn import datasets
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier

# load the data
iris = datasets.load_iris()

# get X and y
X, y = iris.data, iris.target
print("Number of samples: %d, number of features: %d" % X.shape)
# set the last sample's label to 3 just to add an extra class and show the difference between ovo and ovr
y[-1] = 3

# build the model
clf = OneVsOneClassifier(LinearSVC(random_state=0))
# clf = OneVsOneClassifier(KNeighborsClassifier())
# train the model
clf.fit(X, y)

# output the predictions
print(clf.predict(X))
print("Score: {}".format(clf.score(X, y)))

# output the model attributes
k = 1
for item in clf.estimators_:
    print("Model %d: " % k, end="")
    print(item)
    k += 1
print(clf.classes_)

2.2 ovr

Principle:

In one-vs-rest training, instead of combining categories pairwise, each category in turn is treated as the positive class and all remaining samples as negative examples, so K models are trained. At prediction time, if exactly one of the K models outputs a positive result, the final prediction is that model's class; if several models output positive results, the classifier with the highest confidence is chosen as the final result. Common confidence measures: precision, recall.


2.2.1 Handwritten code

def ovr(datas,estimator):
    '''datas[:,-1] is the target attribute'''
    import numpy as np
    Y = datas[:,-1]
    X = datas[:,:-1]
    y_value = np.unique(Y)

    # number of classes
    k = len(y_value)
    modles = []
    # prepare the training data for the K models and recode the y values as +1/-1
    for i in range(k):
        c_i = y_value[i]
        new_datas = []
        for x,y in zip(X,Y):
            new_datas.append(np.hstack((x,np.array([2*float(y==c_i)-1]))))
        new_datas = np.array(new_datas)
        algo = estimator()
        modle = algo.fit(new_datas)
        confidence = modle.score(new_datas)  # compute the confidence
        modles.append([(c_i,confidence),modle])
    return modles

def ovr_predict(X,modles):
    import operator
    result = []
    for x in X:
        pre = []
        cls_confi = []
        for cls,modle in modles:
            cls_confi.append(cls)
            pre.append(modle.predict(x))
        pre_res = []
        for c,p in zip(cls_confi,pre):
            if p == 1:
                pre_res.append(c)
        if not pre_res:
            pre_res = cls_confi
        result.append(sorted(pre_res,key=operator.itemgetter(1))[-1][0])
    return result

2.2.2 Call API

sklearn.multiclass.OneVsRestClassifier

The code is as follows:

from sklearn import datasets
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# load the data
iris = datasets.load_iris()
X, y = iris.data, iris.target
print("Number of samples: %d, number of features: %d" % X.shape)
# set the last sample's label to 3 just to add an extra class and show the difference between ovo and ovr
y[-1] = 3

# create the model
clf = OneVsRestClassifier(LinearSVC(random_state=0))
# train the model
clf.fit(X, y)

# output the predictions
print(clf.predict(X))

# output the model attributes
k = 1
for item in clf.estimators_:
    print("Model %d: " % k, end="")
    print(item)
    k += 1
print(clf.classes_)

2.3 The difference between OvO and OvR

OvO trains K(K−1)/2 pairwise models, each on only the data of two classes, so the individual training sets are small but the number of models grows quadratically with K. OvR trains only K models, but each one is fit on all of the data and its positive/negative classes are usually imbalanced. With few classes the two approaches cost roughly the same; with many classes OvO needs far more (but smaller) models.
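A quick way to see the difference in model count (a small sketch on the digits dataset, which has 10 classes; the dataset choice is just an illustrative assumption):

from sklearn import datasets
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

# digits has K = 10 classes
X, y = datasets.load_digits(return_X_y=True)

ovo = OneVsOneClassifier(LinearSVC(random_state=0)).fit(X, y)
ovr = OneVsRestClassifier(LinearSVC(random_state=0)).fit(X, y)

print(len(ovo.estimators_))  # 45 = K(K-1)/2 pairwise sub-models
print(len(ovr.estimators_))  # 10 = K one-vs-rest sub-models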

2.4 Error Correcting

Rationale: model construction is divided into two phases, an encoding phase and a decoding phase.

In the encoding phase, M divisions are made over the K categories; each division marks part of the classes as positive and the rest as negative, and one model is trained per division. Taken together, the M model outputs define a codeword (a point in the output space) for each class.

In the decoding phase, the M trained models are applied to a test sample, the distance between the sample's output code and each class's codeword is computed, and the class with the smallest distance is chosen as the final prediction.
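The decoding step can be illustrated with a toy sketch (the code matrix and model outputs below are made-up values, only meant to show the distance-based decoding):

import numpy as np

# hypothetical code matrix: K = 3 classes, M = 5 binary divisions (one column per sub-model)
code_book = np.array([
    [ 1, -1,  1,  1, -1],   # codeword of class 0
    [-1,  1, -1,  1,  1],   # codeword of class 1
    [ 1,  1,  1, -1, -1],   # codeword of class 2
])

# outputs of the M trained sub-models for one test sample
outputs = np.array([1, -1, 1, 1, -1])

# decoding: pick the class whose codeword is closest (Hamming distance here)
distances = (code_book != outputs).sum(axis=1)
print(distances)             # [0 4 2]
print(np.argmin(distances))  # predicted class: 0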


sklearn.multiclass.OutputCodeClassifier

The class signature is:

class sklearn.multiclass.OutputCodeClassifier(estimator, code_size=1.5, random_state=None, n_jobs=1)

The code is as follows:

from sklearn import datasets
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# load the data
iris = datasets.load_iris()
X, y = iris.data, iris.target
print("Number of samples: %d, number of features: %d" % X.shape)

# create the model object
# code_size: controls how many sub-models are used; the actual number of sub-models = int(code_size * label_number)
# code_size = 1 gives the same number of sub-models as ovr;
# a value between 0 and 1 uses fewer divisions and tends to work worse than ovr;
# a value greater than 1 introduces some redundant sub-models
clf = OutputCodeClassifier(LinearSVC(random_state=0), code_size=30, random_state=0)
# train the model
clf.fit(X, y)

# output the predictions
print(clf.predict(X))
print("Accuracy: %.3f" % accuracy_score(y, clf.predict(X)))

# output the model attributes
k = 1
for item in clf.estimators_:
    print("Model %d: " % k, end="")
    print(item)
    k += 1
print(clf.classes_)


3. Multi-label algorithm problem

Multi-Label Machine Learning (MLL) refers to prediction models in which there are multiple y values. It covers two different situations:

  • there are multiple y values to be predicted;
  • in a classification model, a sample may belong to multiple, non-fixed categories.

According to the complexity of the business problem, multi-label problems can be divided into two classes:

  • the values to be predicted depend on each other;
  • the values to be predicted are independent of each other.

Solutions to these problems also fall into two categories:

  • Problem Transformation Methods (strategy/problem transformation);
  • Algorithm Adaptation.

Note: in multi-label learning it is generally assumed that each label takes only two values, namely +1 and −1. If a label has more than two categories, it should be decomposed into several new labels, each taking the value +1 or −1.
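To make the setting concrete, here is what a multi-label target matrix looks like (a minimal sketch using sklearn's synthetic generator; the parameters are illustrative):

from sklearn.datasets import make_multilabel_classification

# each row of Y is one sample, each column is one label, every entry is 0/1 (equivalently -1/+1)
X, Y = make_multilabel_classification(n_samples=5, n_classes=4, random_state=0)
print(Y)
# a row such as [1 0 1 0] means the sample carries labels 0 and 2 at the same time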

3.1 Problem Transformation Methods

Problem Transformation Methods, also known as strategy transformation or problem transformation, convert the multi-label classification problem into several single-label model construction problems and then merge the resulting models. The main methods are:

  • Binary Relevance(first-order)
  • Classifier Chains(high-order)
  • Calibrated Label Ranking(second-order)

3.1.1 Binary Relevance

The core idea of Binary Relevance is to decompose the multi-label classification problem into q binary classification problems, where each binary classifier corresponds to one label to be predicted.


def Binary_Relevance(X,Y,estimator):
    '''Y is an indicator array containing only 0s and 1s'''
    import numpy as np

    # number of labels
    q = Y.shape[1]
    Y_label = [i for i in range(q)]

    modles = []
    # prepare the training data for the q models and recode each y value as +1/-1
    for j in Y_label:
        D_j = []
        for x,y in zip(X,Y):
            D_j.append(np.hstack((x,np.array([1 if y[j] == 1 else -1]))))
        new_datas = np.array(D_j)
        algo = estimator()
        g_j = algo.fit(new_datas)
        modles.append(g_j)

    # Y = Y.replace(0,-1)  # replace every 0 with -1
    # for j in Y_label:
    #     new_datas = np.hstack((X,Y[:,j].reshape(-1,1)))
    #     new_datas = np.array(new_datas)
    #     algo = estimator()
    #     g_j = algo.fit(new_datas)
    #     modles.append(g_j)

    return modles

def Binary_Relevance_predict(X,modles,label_name):
    import numpy as np
    result = []
    for x in X:
        pre_res = []
        for g_j in modles:
            pre_res.append(g_j.predict(x))
        # take every label predicted positive; always include the label with the largest score
        Y = set(np.array(label_name)[np.array(pre_res)>0]).union({label_name[pre_res.index(max(pre_res))]})
        result.append(Y)
    return result
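For comparison, a Binary Relevance-style model can also be sketched with sklearn's MultiOutputClassifier, which fits one independent classifier per label (the dataset and base estimator below are illustrative choices):

from sklearn.datasets import make_multilabel_classification
from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import LogisticRegression

X, Y = make_multilabel_classification(n_classes=4, random_state=0)

# one independent LogisticRegression per label column of Y
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
print(clf.predict(X[:3]))  # one 0/1 prediction per label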

The advantages of the Binary Relevance method are as follows:

  • The implementation method is simple and easy to understand;
  • The model works well when there are no relevant dependencies between the y values.

The disadvantages are as follows:

  • If the y values do have mutual dependencies, the generalization ability of the final model is relatively weak;
  • q binary classifiers must be constructed, where q is the number of y values to be predicted; when q is large, many models have to be built.

3.1.2 Classifier Chains

The core idea of Classifier Chains is to decompose the multi-label classification problem into a chain of binary classifiers, where each binary classifier in the chain is built on the prediction results of the classifiers before it. When building the model, the label order is first shuffled, and then a model is built for each label in turn, from the first to the last.

(Figures: Classifier Chains model construction and Classifier Chains model prediction.)
The advantages of the Classifier Chains method are as follows:

  • The implementation method is relatively simple and easy to understand;
  • Considering the dependencies between labels, the generalization ability of the final model is better than that of the model constructed by Binary Relevance.

The main disadvantage: it is difficult to find a suitable label ordering, i.e., an appropriate dependency structure among the labels.

sklearn.multioutput.ClassifierChain
class sklearn.multioutput.ClassifierChain(base_estimator, order=None, cv=None, random_state=None)

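A minimal usage sketch of ClassifierChain on synthetic multi-label data (the dataset, base estimator, and random chain order are illustrative assumptions):

from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_classes=4, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# each classifier in the chain sees the original features plus the previous labels' predictions
chain = ClassifierChain(LogisticRegression(max_iter=1000), order='random', random_state=0)
chain.fit(X_train, Y_train)

print(chain.predict(X_test)[:3])
print("Exact-match score: %.3f" % chain.score(X_test, Y_test))

The two scripts below visualize multi-label decision boundaries, first with MultiOutputClassifier and then with OneVsRestClassifier.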

import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import make_multilabel_classification
from sklearn.multioutput import MultiOutputClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import LabelBinarizer
from sklearn.decomposition import PCA

def plot_hyperplane(clf, min_x, max_x, linestyle, label):
    # plot the decision boundary of a fitted linear classifier
    w = clf.coef_[0]
    a = -w[0] / w[1]
    xx = np.linspace(min_x - 5, max_x + 5)  
    yy = a * xx - (clf.intercept_[0]) / w[1]
    plt.plot(xx, yy, linestyle, label=label)

def plot_subfigure(X, Y, subplot, title):
    # reduce X to two dimensions with PCA
    X = PCA(n_components=2).fit_transform(X)
    
    min_x = np.min(X[:, 0])
    max_x = np.max(X[:, 0])

    min_y = np.min(X[:, 1])
    max_y = np.max(X[:, 1])

    classif = MultiOutputClassifier(SVC(kernel='linear'))
    classif.fit(X, Y)

    plt.subplot(2, 2, subplot)
    plt.title(title)

    zero_class = np.where(Y[:, 0])
    one_class = np.where(Y[:, 1])
    plt.scatter(X[:, 0], X[:, 1], s=40, c='gray')
    plt.scatter(X[zero_class, 0], X[zero_class, 1], s=160, edgecolors='b',
               facecolors='none', linewidths=2, label='Class 1')
    plt.scatter(X[one_class, 0], X[one_class, 1], s=80, edgecolors='orange',
               facecolors='none', linewidths=2, label='Class 2')

    plot_hyperplane(classif.estimators_[0], min_x, max_x, 'r--',
                    'Boundary\nfor class 1')
    plot_hyperplane(classif.estimators_[1], min_x, max_x, 'k-.',
                    'Boundary\nfor class 2')
    plt.xticks(())
    plt.yticks(())

    plt.xlim(min_x - .5 * max_x, max_x + .5 * max_x)
    plt.ylim(min_y - .5 * max_y, max_y + .5 * max_y)
    if subplot == 1:
        plt.xlabel('First principal component')
        plt.ylabel('Second principal component')
        plt.legend(loc="upper left")

plt.figure(figsize=(8, 6))

X, Y = make_multilabel_classification(n_classes=2,
                                      allow_unlabeled=False,  # False means no sample is left without any label
                                      random_state=1)

plot_subfigure(X, Y, 1, "Without unlabeled samples + PCA")


plt.subplots_adjust(.04, .02, .97, .94, .09, .2)
plt.show()
A second, similar script uses OneVsRestClassifier instead:

import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import make_multilabel_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import LabelBinarizer
from sklearn.decomposition import PCA

def plot_hyperplane(clf, min_x, max_x, linestyle, label):
    # plot the decision boundary of a fitted linear classifier
    w = clf.coef_[0]
    a = -w[0] / w[1]
    xx = np.linspace(min_x - 5, max_x + 5)  
    yy = a * xx - (clf.intercept_[0]) / w[1]
    plt.plot(xx, yy, linestyle, label=label)

def plot_subfigure(X, Y, subplot, title):
    # reduce X to two dimensions with PCA
    X = PCA(n_components=2).fit_transform(X)
    min_x = np.min(X[:, 0])
    max_x = np.max(X[:, 0])

    min_y = np.min(X[:, 1])
    max_y = np.max(X[:, 1])

    classif = OneVsRestClassifier(SVC(kernel='linear'))
    classif.fit(X, Y)

    plt.subplot(2, 2, subplot)
    plt.title(title)

    zero_class = np.where(Y[:, 0])
    one_class = np.where(Y[:, 1])
    plt.scatter(X[:, 0], X[:, 1], s=40, c='gray')
    plt.scatter(X[zero_class, 0], X[zero_class, 1], s=160, edgecolors='b',
               facecolors='none', linewidths=2, label='Class 1')
    plt.scatter(X[one_class, 0], X[one_class, 1], s=80, edgecolors='orange',
               facecolors='none', linewidths=2, label='Class 2')

    plot_hyperplane(classif.estimators_[0], min_x, max_x, 'r--',
                    'Boundary\nfor class 1')
    plot_hyperplane(classif.estimators_[1], min_x, max_x, 'k-.',
                    'Boundary\nfor class 2')
    plt.xticks(())
    plt.yticks(())

    plt.xlim(min_x - .5 * max_x, max_x + .5 * max_x)
    plt.ylim(min_y - .5 * max_y, max_y + .5 * max_y)
    if subplot == 1:
        plt.xlabel('First principal component')
        plt.ylabel('Second principal component')
        plt.legend(loc="upper left")


plt.figure(figsize=(8, 6))

X, Y = make_multilabel_classification(n_classes=2, n_labels=1,
                                      allow_unlabeled=False,  # False means no sample is left without any label
                                      random_state=1)

plot_subfigure(X, Y, 1, "Without unlabeled samples + PCA")


plt.subplots_adjust(.04, .02, .97, .94, .09, .2)
plt.show()

Multi-label classification problem (OneVsRestClassifier)

3.2 Algorithm Adaptation

Algorithm Adaptation, also known as the algorithm adaptation strategy, directly extends existing single-label algorithms to the multi-label setting. The main methods are described below.

The idea of the k-nearest-neighbor algorithm (k-Nearest Neighbor, KNN): if most of the k samples most similar to a given sample (i.e., its nearest neighbors in feature space) belong to a certain category, then the sample belongs to that category as well.

The idea of ML-kNN: for each instance, first find its k nearest neighbors, and then use the label sets of those neighbors to decide the instance's predicted label set through maximum a posteriori (MAP) estimation.

Maximum a posteriori (MAP) estimation: essentially, maximum likelihood estimation (MLE) with the prior distribution of the quantity being estimated added in.
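In symbols, MLE picks θ̂ = argmax_θ P(D | θ), while MAP picks θ̂ = argmax_θ P(D | θ) P(θ). As a rough stand-in for ML-kNN, sklearn's KNeighborsClassifier accepts a multi-label indicator target directly (this is plain per-label kNN voting, not the full ML-kNN MAP procedure; the dataset and parameters are illustrative):

from sklearn.datasets import make_multilabel_classification
from sklearn.neighbors import KNeighborsClassifier

X, Y = make_multilabel_classification(n_classes=4, random_state=0)

# kNN with a multi-label indicator matrix: each label is voted on by the k neighbors
knn = KNeighborsClassifier(n_neighbors=5).fit(X, Y)
print(knn.predict(X[:3]))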

ML-DT uses a decision tree to handle multi-label data; the core is to define a finer-grained, multi-label information gain criterion for building the decision tree.
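A related sketch: sklearn's DecisionTreeClassifier also accepts a multi-label indicator matrix directly (its splitting criterion is the standard per-output one, not the ML-DT criterion from the literature, so treat this only as an approximation):

from sklearn.datasets import make_multilabel_classification
from sklearn.tree import DecisionTreeClassifier

X, Y = make_multilabel_classification(n_classes=4, random_state=0)

# a single tree predicting all label columns of Y at once
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, Y)
print(tree.predict(X[:3]))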

Origin blog.csdn.net/wzk4869/article/details/128748406