This derivation of the logistic regression cost function is easier to understand

Excellent reference notes: http://scuel.gitee.io/ml-andrewng-notes/week3.html#header-n109

6.5 Simplified Cost Function and Gradient Descent

The derivation of the logistic regression cost function given in the notes linked above is the easier one to follow.


Note: in the penultimate step of the derivation, the last factor is the partial derivative of z with respect to θj (i.e. x_j), which is then pulled inside the summation.
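For reference, a minimal sketch of that derivation, using the standard notation from the course: h_θ(x) = g(θᵀx) is the sigmoid hypothesis, z^(i) = θᵀx^(i), and m is the number of training examples.

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log h_\theta(x^{(i)}) + \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big)\Big]$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\big(1-h_\theta(x^{(i)})\big) - \big(1-y^{(i)}\big)\,h_\theta(x^{(i)})\Big]\frac{\partial z^{(i)}}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)}$$

Since ∂z^(i)/∂θ_j = x_j^(i), the last factor in the penultimate expression is exactly that partial derivative, moved inside the summation as the note says.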


7.2 Cost Function
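In the tips below, λ refers to the regularization parameter in the regularized cost function. Restated here for reference (standard course form, for logistic regression), together with the gradient descent update, which shows how the learning rate α and λ jointly determine the effective step size; θ_0 is updated without the λ term:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log h_\theta(x^{(i)}) + \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big)\Big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

$$\theta_j := \theta_j\Big(1-\alpha\frac{\lambda}{m}\Big) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)}, \qquad j = 1,\dots,n$$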

Tips for tuning the learning rate and the regularization parameter (a code sketch illustrating this procedure follows these notes):

    First, adjust the learning rate so that the number of iterations stays within an acceptable range; if the learning rate is too small, the number of iterations required will be large. (Should the initial lambda be set to 0?)

1. With the learning rate fixed, adjust the regularization parameter on the validation set; the main task is to judge whether there is a variance problem or a bias problem.

(1) If lambda is too small, there may be a variance problem: the training error is small, but the error on the validation set is large. Increasing lambda makes the step size (the product of the learning rate and the gradient increment) smaller, so learning slows down.

(2) If lambda is too large, there may be a bias problem: both the training error and the validation error are large, and the gap between them is small. Reducing lambda makes the step size larger, so convergence is faster.

2. The goal is for both the training error and the validation error to be relatively small and close to each other, with the training error reaching (approximately) its minimum within the limited number of iterations.
(1) Increasing the learning rate increases the step size, which has an effect similar to increasing the regularization parameter.

(2) Reducing the learning rate reduces the step size, which has an effect similar to reducing the regularization parameter.

3. Building on this, since increasing lambda slows down learning, in the case of mild overfitting it can help to increase the learning rate instead; this can further reduce the overfitting (increasing the learning rate implicitly acts like increasing the regularization parameter). In other words, when there is a slight overfitting problem and learning is too slow, increase the learning rate rather than the regularization parameter lambda. However, if the learning rate becomes too large, the step size will be too large, the training error will grow, and this in turn affects the validation error; that is, increasing the learning rate may increase the validation error.

4. If lambda is too large, there will be a bias problem; in that case reduce lambda. Note that this also makes the step size larger, which speeds up convergence.

Note that during the iterations, the error on the validation set typically first decreases and then increases; it seems that we only need to focus on convergence on the training set.

     Due to the limited number of iterations, there will be some loss in accuracy (the training error is only approximately optimal), but it should be within an acceptable range.
     In the case of overfitting, the optimal value of the training error will be relatively small, while under normal circumstances the training error will be larger and comparable to the validation error. Therefore, during parameter tuning the training error will gradually increase (moving from over-learning to normal fitting).
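
As a concrete illustration of the tuning procedure described above, here is a minimal Python/NumPy sketch (the data, the α value, and the λ grid are placeholders invented for the example, not values from these notes): fix the learning rate first, then sweep λ and compare training and validation error to spot variance (train ≪ val) or bias (both large).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, lam):
    """Regularized logistic regression cost (theta[0] is the bias, not regularized)."""
    m = len(y)
    h = sigmoid(X @ theta)
    reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)) + reg

def gradient_descent(X, y, alpha, lam, iters):
    """Plain batch gradient descent with L2 regularization."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)
        grad = (X.T @ (h - y)) / m
        grad[1:] += (lam / m) * theta[1:]   # regularize every parameter except the bias
        theta -= alpha * grad               # step = alpha * gradient (lambda term included)
    return theta

# Placeholder data: in practice use your own training / validation split.
rng = np.random.default_rng(0)
X_train = np.c_[np.ones(80), rng.normal(size=(80, 2))]
y_train = (X_train[:, 1] + X_train[:, 2] > 0).astype(float)
X_val = np.c_[np.ones(20), rng.normal(size=(20, 2))]
y_val = (X_val[:, 1] + X_val[:, 2] > 0).astype(float)

alpha, iters = 0.1, 500                      # first fix alpha so iters stays acceptable
for lam in [0.0, 0.01, 0.1, 1.0, 10.0]:      # then sweep lambda on the validation set
    theta = gradient_descent(X_train, y_train, alpha, lam, iters)
    print(f"lambda={lam:<5} train={cost(theta, X_train, y_train, 0):.4f} "
          f"val={cost(theta, X_val, y_val, 0):.4f}")
```

Following point 2 above, the λ to keep is the one where the validation error is smallest while the training error remains small and close to it.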
