The formula for cross entropy is $-\sum_k^{K} p(y_k)\log p(\hat{y}_k)$.
Differentiating with respect to $\hat{y}$ gives $-\sum_k^{K}\frac{p(y_k)}{p(\hat{y}_k)}$.
This is reflected in the following code:
`numpy_ml.trees.losses.CrossEntropyLoss.grad`:

```python
def grad(self, y, y_pred):
    # Differentiate with respect to y_pred; eps avoids division by zero
    eps = np.finfo(float).eps
    return -y * 1 / (y_pred + eps)
```
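A minimal standalone sketch of the same computation (the function name `cross_entropy_grad` and the example values are illustrative, not from the library):

```python
import numpy as np

def cross_entropy_grad(y, y_pred):
    # Gradient of -sum_k y_k * log(y_pred_k) with respect to y_pred,
    # with eps added to avoid division by zero.
    eps = np.finfo(float).eps
    return -y / (y_pred + eps)

# One-hot target and a softmax-like prediction
y = np.array([0.0, 1.0, 0.0])
y_pred = np.array([0.2, 0.5, 0.3])
print(cross_entropy_grad(y, y_pred))
```

Because `y` is one-hot, the gradient is nonzero only at the true class, where it equals $-1/p(\hat{y})$.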
Note that for a classification task with $K$ classes, we essentially train $K$ trees; one-hot encoding (OHE) turns the labels $y \in [0, K)$ into $K$ 0/1 column vectors. So for the $k$-th component, the cross entropy degenerates to $-p(y)\log p(\hat{y})$.
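The one-hot encoding step can be sketched as follows (the helper `one_hot` is illustrative, not the library's own encoder):

```python
import numpy as np

def one_hot(y, K):
    # Turn integer labels y in [0, K) into K-column 0/1 indicator vectors.
    out = np.zeros((len(y), K))
    out[np.arange(len(y)), y] = 1.0
    return out

labels = np.array([0, 2, 1])
print(one_hot(labels, 3))  # one row per sample, one column per class
```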
The negative gradient the decision tree fits at each step is $\frac{p(y)}{p(\hat{y})}$:
| | $y = 0$ | $y = 1$ |
|---|---|---|
| $\hat{y} = 0$ | 0 | $\infty$ |
| $\hat{y} = 1$ | 0 | 1 |
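The table can be checked numerically. With the library's eps trick, the $\infty$ cell shows up as a very large finite value ($1/\text{eps}$); the helper `neg_grad` below is a hypothetical name for the per-component quantity $y/(\hat{y}+\text{eps})$:

```python
import numpy as np

eps = np.finfo(float).eps

def neg_grad(y, y_pred):
    # Per-component negative gradient the tree fits: y / (y_pred + eps)
    return y / (y_pred + eps)

for y in (0.0, 1.0):
    for y_hat in (0.0, 1.0):
        print(f"y={y:.0f}, y_hat={y_hat:.0f}: {neg_grad(y, y_hat):.3g}")
```

The $y=1, \hat{y}=0$ cell is where the gradient blows up: a confidently wrong prediction produces an enormous correction signal.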