Why does the bias of the KNN algorithm increase as k increases, while the bias of RF stays unchanged as the number of trees increases, and the bias of GBDT keeps decreasing as the number of trees increases?

For the KNN algorithm, the larger k is, the weaker the model's learning ability: a large k makes the prediction an average over a broad neighborhood ("judging by the crowd") rather than a judgment tailored to the specific sample, so the bias keeps growing as k increases.
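A minimal sketch of this effect with scikit-learn (the toy sine dataset and the k values are illustrative assumptions, not from the original text): measuring the fit against the noiseless target approximates the bias term, and it grows as k grows.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, 200)).reshape(-1, 1)
y_true = np.sin(X).ravel()                    # noiseless target
y = y_true + rng.normal(0, 0.3, X.shape[0])   # noisy observations

for k in (1, 5, 50, 150):
    model = KNeighborsRegressor(n_neighbors=k).fit(X, y)
    pred = model.predict(X)
    # Error against the noiseless target: with large k the prediction averages
    # over distant, dissimilar samples, the fit flattens out, and bias grows.
    bias_like = np.mean((pred - y_true) ** 2)
    print(f"k={k:>3}  mean squared error vs. true function: {bias_like:.3f}")
```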

For RF, we partially achieve the effect of training many times and taking the mean. The tree from each training run is a strong learner, so each individual tree has a relatively large variance, but their average has a much smaller one. As an analogy: one strong learner trains while a west wind blows and adjusts its aim accordingly; another trains while an east wind blows (the west and east winds stand for the noise in different training sets) and adjusts its aim the other way. At test time one errs to the west and the other to the east, and the errors largely cancel, so the variance of the average is relatively small. But since every tree has a similar bias, averaging changes the bias very little.
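A minimal numpy sketch of this point (the numbers are illustrative assumptions): averaging learners whose errors are independent shrinks the variance roughly by the number of learners, while the shared bias is untouched.

```python
import numpy as np

rng = np.random.default_rng(1)
target = 10.0
n_learners, n_trials = 50, 10_000

# Each learner has the same bias (+0.5) but independent noise (std 2.0),
# like the "east wind / west wind" example above.
estimates = target + 0.5 + rng.normal(0, 2.0, size=(n_trials, n_learners))
averaged = estimates.mean(axis=1)

print("single learner: bias %.3f, variance %.3f"
      % (estimates[:, 0].mean() - target, estimates[:, 0].var()))
print("averaged      : bias %.3f, variance %.3f"
      % (averaged.mean() - target, averaged.var()))
# The variance drops by roughly a factor of n_learners; the bias stays near +0.5.
```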

Why is this only a partial realization of "train many times and take the mean", and not a full one? Because each tree is trained on a bootstrap sample of the data, and different training sets inevitably overlap, so the training runs cannot be regarded as independent. The errors of the resulting trees therefore become correlated: the more samples two training sets share, the stronger the correlation between the two trees' errors, and the harder it is for those errors to cancel each other out.
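A minimal sketch of how correlation limits the cancellation (the equal-correlation construction and the numbers are illustrative assumptions): for n equally correlated estimators with per-estimator variance sigma^2 and pairwise correlation rho, the variance of the average is rho * sigma^2 + (1 - rho) * sigma^2 / n, so it stops shrinking toward zero as n grows.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, n_learners, n_trials = 2.0, 50, 20_000

for rho in (0.0, 0.3, 0.7):
    # Equally correlated errors: a shared component plus an independent one.
    shared = rng.normal(0, sigma * np.sqrt(rho), size=(n_trials, 1))
    indiv = rng.normal(0, sigma * np.sqrt(1 - rho), size=(n_trials, n_learners))
    averaged = (shared + indiv).mean(axis=1)
    print(f"rho={rho:.1f}  variance of the averaged prediction: {averaged.var():.3f}")
```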

For GBDT, the N trees are not in a "train many times and take the mean" relationship at all; instead they form one tightly coupled, layer-by-layer super learner, and it is easy to see that its variance must be relatively large. But because its learning ability is strong, its bias is very small, and the more trees there are, the stronger the learning ability and the smaller the bias: given enough boosting rounds, the prediction gets arbitrarily close to the target on the training data. Simply put, the N trees of GBDT form one organically connected model and cannot be regarded as N separate models.
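A minimal sketch with scikit-learn (the dataset and hyperparameters are illustrative assumptions): as the number of boosting rounds grows, the ensemble's fit to the training target keeps tightening, i.e. the bias keeps falling, even though every added tree depends on the ones before it.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, (300, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 300)

for n_trees in (1, 10, 100, 500):
    model = GradientBoostingRegressor(n_estimators=n_trees, max_depth=2,
                                      learning_rate=0.1).fit(X, y)
    # Training error shrinks steadily with more boosting rounds (falling bias),
    # which is also why the variance of such a deep, coupled learner can be large.
    train_err = np.mean((model.predict(X) - y) ** 2)
    print(f"n_estimators={n_trees:>3}  training MSE: {train_err:.4f}")
```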
