Ensemble Learning - Unpaired Diversity Metrics - Personal Summary

1. Introduction

        Ensemble learning accomplishes learning tasks by building and combining multiple learners. The general structure is to first generate a group of "individual learners" and then combine them using some strategy. The main combination strategies are averaging, voting, and learning-based combination.

        In ensemble learning, the difference among individual learners is called "ensemble diversity". Understanding ensemble diversity is the "holy grail" problem of this learning paradigm, that is, an elusive yet meaningful goal. Existing ways to measure ensemble diversity fall into two main categories: pairwise diversity measures between individual learners, and non-pairwise (unpaired) diversity measures. This post mainly discusses and summarizes the latter category.

2. Preparation

        This section introduces some basic notation, since the measures below are computed from the individual learners. Set of individual learners: \{h_1, h_2, ..., h_T\}; data set: D=\{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), ..., (\mathbf{x}_m, y_m)\}, where \mathbf{x}_i and y_i are a sample and its class label respectively, and y_i \in \{-1, +1\}.

3. Unpaired diversity measures

        1. Kohavi-Wolpert variance, referred to as the KW measure, was proposed by Kohavi and Wolpert in 1996. It is computed as

KW=\frac{1}{mT^2}\sum_{k=1}^{m}\rho(\mathbf{x}_k)(T-\rho(\mathbf{x}_k))

where m is the number of samples, T is the number of individual learners, and \rho(\mathbf{x}) is the number of individual learners (out of T) that classify sample \mathbf{x} correctly, so 0 \leqslant \rho(\mathbf{x}) \leqslant T.

        From the equation, m and T are constants, so the key quantity is \rho(\mathbf{x}): when \rho(\mathbf{x}) equals T/2 for every sample, the KW measure reaches its maximum and diversity is largest; when \rho(\mathbf{x}) is 0 or T for every sample, the KW measure reaches its minimum and diversity is smallest. This is easy to understand: if \rho(\mathbf{x}) is 0 or T for every sample, all individual learners give the same prediction on it; conversely, if \rho(\mathbf{x}) equals T/2 for every sample, the individual learners' predictions may differ. Note that they may differ, not that they must differ, so the KW measure has certain limitations as a diversity measure.
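        To make the computation concrete, here is a minimal Python sketch of the KW measure. The 0/1 matrix correct is made-up data purely for illustration (not from the original post): correct[i, k] = 1 means learner h_i classifies sample \mathbf{x}_k correctly.

```python
import numpy as np

# Hypothetical T x m 0/1 matrix: correct[i, k] = 1 iff learner h_i classifies x_k correctly.
correct = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 1, 1, 1],
])
T, m = correct.shape

rho = correct.sum(axis=0)                    # rho(x_k): number of learners correct on x_k
kw = np.sum(rho * (T - rho)) / (m * T**2)    # Kohavi-Wolpert variance
print(kw)
```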

        2. Interrater agreement, i.e., the \kappa measure. The \kappa measure is used to analyze the consistency of a set of classifiers; it is defined as

\kappa = 1 - \frac{\frac{1}{T}\sum_{k=1}^{m}\rho(\mathbf{x}_k)(T-\rho(\mathbf{x}_k))}{m(T-1)\bar{p}(1-\bar{p})}

where \bar{p}=\frac{1}{mT}\sum_{i=1}^{T}\sum_{k=1}^{m}\mathbb{I}(h_i(\mathbf{x}_k)=y_k) is the average classification accuracy of the individual learners, and \mathbb{I}(\cdot) is the indicator function, which returns 1 when the condition in parentheses holds and 0 otherwise.

        The \kappa measure mainly reflects the agreement among the individual learners' predictions. When the predictions are completely consistent, \kappa is 1; if the learners agree less than would be expected by chance (the most extreme case being that each sample is correctly classified by exactly half of the individual learners and the average accuracy is 0.5), then \kappa \leqslant 0. Therefore, the larger \kappa is, the more consistent the individual learners' predictions are and the smaller the diversity, and vice versa.
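        A minimal sketch of the \kappa measure, reusing the same kind of hypothetical correct matrix as above (made-up data for illustration):

```python
import numpy as np

# Hypothetical T x m 0/1 matrix: correct[i, k] = 1 iff learner h_i classifies x_k correctly.
correct = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 1, 1, 1],
])
T, m = correct.shape

rho = correct.sum(axis=0)       # rho(x_k)
p_bar = correct.mean()          # average accuracy over all learners and samples
kappa = 1 - (np.sum(rho * (T - rho)) / T) / (m * (T - 1) * p_bar * (1 - p_bar))
print(kappa)
```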

        3. Entropy. The entropy measure proposed by Cunningham and Carney in 2000 is computed as

\mathrm{Ent}_{\mathrm{cc}}=\frac{1}{m}\sum_{k=1}^{m}\sum_{y\in\{-1,+1\}}-P(y|\mathbf{x}_k)\log P(y|\mathbf{x}_k)

where P(y|\mathbf{x}_k)=\frac{1}{T}\sum_{i=1}^{T}\mathbb{I}(h_i(\mathbf{x}_k)=y) is the proportion of individual learners (out of T) that predict \mathbf{x}_k as class y. Clearly, computing \mathrm{Ent}_{\mathrm{cc}} does not require knowing the accuracy of the individual learners.
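        A minimal sketch of \mathrm{Ent}_{\mathrm{cc}}, assuming a hypothetical T x m matrix preds of predicted labels in \{-1, +1\} (made-up data); note that it only needs the predictions, not the true labels:

```python
import numpy as np

# Hypothetical predictions: preds[i, k] is the label learner h_i assigns to sample x_k.
preds = np.array([
    [ 1, -1,  1,  1, -1],
    [ 1,  1, -1,  1, -1],
    [-1, -1,  1,  1,  1],
])
T, m = preds.shape

ent_cc = 0.0
for k in range(m):
    for y in (-1, +1):
        p = np.mean(preds[:, k] == y)   # P(y | x_k): fraction of learners predicting y
        if p > 0:                       # convention: 0 * log 0 = 0
            ent_cc -= p * np.log(p)
ent_cc /= m
print(ent_cc)
```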

        The entropy measure proposed by Shipp and Kuncheva in 2002 is computed as

\mathrm{Ent}_{\mathrm{sk}}=\frac{1}{m}\sum_{k=1}^{m}{\frac{\min(\rho(\mathbf{x}_k), T-\rho(\mathbf{x}_k))}{T-\left\lceil T/2\right\rceil}}

where \left\lceil x \right\rceil denotes rounding up: if x is an integer, \left\lceil x \right\rceil = x; otherwise \left\lceil x \right\rceil is the integer part of x plus 1. The value of \mathrm{Ent}_{\mathrm{sk}} ranges over [0, 1]: 0 means the learners are completely consistent, and 1 means diversity is largest. It is worth noting that \mathrm{Ent}_{\mathrm{sk}} uses no logarithm, so it is not a classical entropy. Nevertheless, this form is used more often because it is easier to implement and faster to compute.
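        A minimal sketch of \mathrm{Ent}_{\mathrm{sk}} under the same hypothetical correct matrix as before (made-up data):

```python
import numpy as np

# Hypothetical T x m 0/1 matrix: correct[i, k] = 1 iff learner h_i classifies x_k correctly.
correct = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 1, 1, 1],
])
T, m = correct.shape

rho = correct.sum(axis=0)
ent_sk = np.mean(np.minimum(rho, T - rho) / (T - np.ceil(T / 2)))
print(ent_sk)
```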

        4. Difficulty. Let X be the random variable giving the proportion of individual learners that classify a sample \mathbf{x} correctly. The difficulty measure is then computed as

\theta = \mathrm{variance}(X)

where the random variable X takes values in \{0, \frac{1}{T}, \frac{2}{T}, ..., 1\}, and its probability distribution can be estimated by running the T classifiers on the data set D. The distribution of X is therefore

X: 0, \frac{1}{T}, ..., 1
P: \frac{|\{\mathbf{x} \mid \rho(\mathbf{x})=0\}|}{m}, \frac{|\{\mathbf{x} \mid \rho(\mathbf{x})=1\}|}{m}, ..., \frac{|\{\mathbf{x} \mid \rho(\mathbf{x})=T\}|}{m}

        \theta measures how difficult the samples are to classify: the smaller \theta is, the greater the diversity. If a histogram is used to visualize the distribution above, then when the samples are hard to classify the mass of the histogram concentrates on the left, and when the samples are easy to classify it concentrates on the right.
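        A minimal sketch of the difficulty measure \theta under the same hypothetical correct matrix (made-up data):

```python
import numpy as np

# Hypothetical T x m 0/1 matrix: correct[i, k] = 1 iff learner h_i classifies x_k correctly.
correct = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 1, 1, 1],
])
T, m = correct.shape

X = correct.sum(axis=0) / T    # proportion of learners correct on each sample
theta = np.var(X)              # difficulty = variance of X over the data set
print(theta)
```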

        5. Generalized diversity. This measure is computed as

\mathrm{gd}=1-\frac{p(2)}{p(1)}

where p(1)=\sum_{i=1}^{T}\frac{i}{T}p_i, p(2)=\sum_{i=1}^{T}\frac{i}{T}\frac{i-1}{T-1}p_i, and p_i is the probability that exactly i of the T classifiers fail on a randomly drawn sample \mathbf{x}; thus p(1) is the probability that one randomly selected classifier fails on a random sample, and p(2) is the probability that two randomly selected classifiers both fail. The value of \mathrm{gd} ranges over [0, 1], and diversity is smallest when \mathrm{gd}=0. This measure captures the idea that diversity is greatest when one classifier's error is accompanied by another classifier being correct. As for why this works, I haven't fully figured it out yet; if you understand it, please leave a message and let me know.
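        A minimal sketch of \mathrm{gd}, again on a hypothetical correct matrix (made-up data); here p_i is estimated as the fraction of samples on which exactly i learners fail:

```python
import numpy as np

# Hypothetical T x m 0/1 matrix: correct[i, k] = 1 iff learner h_i classifies x_k correctly.
correct = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 1, 1, 1],
])
T, m = correct.shape

failures = T - correct.sum(axis=0)                            # learners wrong on each sample
p = np.array([np.mean(failures == i) for i in range(T + 1)])  # p_i, i = 0..T

i = np.arange(1, T + 1)
p1 = np.sum(i / T * p[1:])                        # P(a random learner fails)
p2 = np.sum(i / T * (i - 1) / (T - 1) * p[1:])    # P(two random learners both fail)
gd = 1 - p2 / p1
print(gd)
```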

        6. Coincident failure diversity. This measure is a modified version of generalized diversity; it is computed as

\mathrm{cfd} = \left\{\begin{matrix} 0, & p_0=1\\ \frac{1}{1-p_0}\sum_{i=1}^{T}\frac{T-i}{T-1}p_i, & p_0<1 \end{matrix}\right.

When all classifiers always give the same predictions at the same time, \mathrm{cfd}=0; when every misclassified sample is misclassified by exactly one classifier (failures never coincide), \mathrm{cfd}=1. Sorry, I still haven't fully understood this one either.
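        A minimal sketch of \mathrm{cfd} under the same hypothetical setup (made-up data):

```python
import numpy as np

# Hypothetical T x m 0/1 matrix: correct[i, k] = 1 iff learner h_i classifies x_k correctly.
correct = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 1, 1, 1],
])
T, m = correct.shape

failures = T - correct.sum(axis=0)
p = np.array([np.mean(failures == i) for i in range(T + 1)])  # p_i, i = 0..T

if p[0] == 1.0:          # all learners correct on every sample: no coincident failures
    cfd = 0.0
else:
    i = np.arange(1, T + 1)
    cfd = np.sum((T - i) / (T - 1) * p[1:]) / (1 - p[0])
print(cfd)
```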

4. Summary

        The diversity measures above are all computed from the classifiers' predictions. Among them, except for the interrater agreement \kappa and the difficulty \theta, the measures are directly proportional to ensemble diversity: larger values mean greater diversity.

        In fact, I am just getting started in the field of ensemble learning, and there are still many things I don't understand. If any experts see this, please feel free to give me advice. If something is unclear, you are also welcome to leave a message in the comment area to discuss these unpaired diversity measures for ensemble learning.

5. References

        1. Baidu Encyclopedia: Ensemble Learning

        2. Zhou Zhihua. Ensemble Learning: Foundations and Algorithms [M]. Electronic Industry Press, 2020.
