Algorithm Deep Dive — Boosting Trees (Part 4)

Multiclass Classification

For multiclass problems the setup is similar to the binary case; only the loss function changes. For K classes, the original paper chooses the cross-entropy loss:

$$L\left(\{y_k, F_k(x)\}_1^K\right) = -\sum_{k=1}^{K} y_k \log p_k(x)$$
Multiclass models typically compute the class probabilities with the softmax function, where k denotes the current class and K the number of classes:

$$p_k(x) = \frac{\exp\left(F_k(x)\right)}{\sum_{l=1}^{K} \exp\left(F_l(x)\right)}$$
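As a quick illustration, here is a minimal NumPy sketch of the softmax computation (the helper name `softmax` and the `(n_samples, K)` score layout are conventions of this sketch, not from the original post):

```python
import numpy as np

def softmax(F):
    """Row-wise softmax over the class axis.

    F: array of shape (n_samples, K) holding the raw scores F_k(x).
    Returns p_k(x) with the same shape.
    """
    # Subtracting the row max leaves p_k(x) unchanged but keeps exp()
    # from overflowing for large scores.
    Z = F - F.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)
```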
As before, we take the negative gradient (the residual):

$$\tilde{y}_{ik} = -\left[\frac{\partial L\left(\{y_{il}, F_l(x_i)\}_{l=1}^{K}\right)}{\partial F_k(x_i)}\right]_{\{F_l(x) = F_{l,m-1}(x)\}_1^K} = y_{ik} - p_{k,m-1}(x_i)$$
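In code the residual matrix is a one-liner; this sketch reuses the `softmax` helper above and assumes the labels are one-hot encoded:

```python
def multiclass_residuals(Y, F):
    """Y: one-hot labels (n_samples, K); F: current scores F_{m-1}."""
    P = softmax(F)   # p_{k,m-1}(x_i)
    return Y - P     # tilde{y}_{ik} = y_{ik} - p_{k,m-1}(x_i)
```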
Next we need the estimated values at the leaf nodes:

$$\{\gamma_{jkm}\} = \arg\min_{\{\gamma_{jk}\}} \sum_{i=1}^{N} \sum_{k=1}^{K} \phi\left(y_{ik},\, F_{k,m-1}(x_i) + \sum_{j=1}^{J} \gamma_{jk}\, I(x_i \in R_{jm})\right)$$
This minimization has no closed-form solution, so it is approximated with a single Newton-Raphson step:

$$\gamma_{jkm} = \frac{K-1}{K} \cdot \frac{\sum_{x_i \in R_{jkm}} \tilde{y}_{ik}}{\sum_{x_i \in R_{jkm}} \left|\tilde{y}_{ik}\right| \left(1 - \left|\tilde{y}_{ik}\right|\right)}$$
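A minimal sketch of that update, assuming the residuals of one leaf region have already been collected into an array (the function name and the epsilon guard against a zero denominator are my own additions):

```python
def leaf_value_newton(residuals, K):
    """One Newton-Raphson step for the leaf estimate gamma_{jkm}.

    residuals: 1-D array of tilde{y}_{ik} for the samples x_i
    falling into region R_{jkm}; K: number of classes.
    """
    abs_r = np.abs(residuals)
    numerator = residuals.sum()
    denominator = (abs_r * (1.0 - abs_r)).sum()
    return (K - 1) / K * numerator / (denominator + 1e-12)
```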

Regression

The original paper uses the Huber loss here; for simplicity we use the squared loss instead:

$$L(y, F) = \frac{(y - F)^2}{2}$$

Since $\partial L / \partial F = -(y - F)$, the negative gradient is just the ordinary residual:

$$\tilde{y}_i = -\left[\frac{\partial L\left(y_i, F(x_i)\right)}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)} = y_i - F_{m-1}(x_i)$$
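A toy numeric check of this rule (the values are invented for illustration):

```python
import numpy as np

y = np.array([3.0, -0.5, 2.0])       # targets
F_prev = np.array([2.5, 0.0, 2.0])   # current model outputs F_{m-1}(x_i)

# With squared loss the negative gradient is the plain residual.
residuals = y - F_prev               # -> [ 0.5, -0.5,  0. ]
```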
Estimating the leaf values:

$$\begin{aligned} \gamma_{jm} &= \arg\min_{\gamma} \sum_{x_i \in R_{jm}} \frac{1}{2}\left(y_i - \left(F_{m-1}(x_i) + \gamma\right)\right)^2 \\ &= \arg\min_{\gamma} \sum_{x_i \in R_{jm}} \frac{1}{2}\left(y_i - F_{m-1}(x_i) - \gamma\right)^2 \\ &= \arg\min_{\gamma} \sum_{x_i \in R_{jm}} \frac{1}{2}\left(\tilde{y}_i - \gamma\right)^2 \end{aligned}$$
Therefore, taking the mean of the $\tilde{y}_i$ within each region minimizes the loss:

$$\gamma_{jm} = \operatorname{average}_{x_i \in R_{jm}}\, \tilde{y}_i$$
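Putting the regression case together, here is a minimal boosting loop sketched with scikit-learn's DecisionTreeRegressor: since a regression tree's leaves already predict the mean of their training targets, fitting a tree to the residuals realizes $\gamma_{jm}$ automatically. The function names and the shrinkage factor `learning_rate` are assumptions of this sketch, not part of the derivation above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_fit(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Squared-loss gradient boosting, following the derivation above.

    Each leaf of a fitted DecisionTreeRegressor predicts the mean of
    its training targets, so fitting a tree to the residuals makes
    every leaf output average_{x_i in R_jm} tilde{y}_i = gamma_{jm}.
    """
    f0 = y.mean()                      # F_0: constant minimizer of squared loss
    F = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residuals = y - F              # tilde{y}_i = y_i - F_{m-1}(x_i)
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        F += learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def gbdt_predict(X, f0, trees, learning_rate=0.1):
    F = np.full(X.shape[0], f0)
    for tree in trees:
        F += learning_rate * tree.predict(X)
    return F
```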



Reposted from blog.csdn.net/qq_33357094/article/details/105066780