Gradient Descent in Logistic Regression

Gradient descent method: a first-order optimization algorithm that finds a local minimum of a function by iteratively stepping a specified distance from the current point in the direction opposite to the gradient (or an approximate gradient) at that point.

Take a function of a single variable as an example to illustrate gradient descent:

The parameter w is updated in the direction in which the cost function J(w) decreases. Training w means repeating the following process:

repeat{

w:=w-\alpha \frac{dJ(w)}{dw}

}

:= denotes an update (assignment), \alpha is the learning rate, and \frac{dJ(w)}{dw} is the derivative of J(w) with respect to w.
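
As a concrete illustration, here is a minimal sketch of this update rule in Python, assuming a toy cost J(w) = (w - 3)^2 whose derivative is known in closed form (the cost, starting point, learning rate, and iteration count are all made up for illustration):

```python
def dJ_dw(w):
    # Derivative of the toy cost J(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w = 0.0        # initial guess
alpha = 0.1    # learning rate
for _ in range(100):             # "repeat { ... }"
    w = w - alpha * dJ_dw(w)     # w := w - alpha * dJ(w)/dw

print(w)  # approaches 3.0, the minimizer of the toy cost
```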

Gradient descent in logistic regression:

In logistic regression, we need to train two parameters, w and b. For why the parameters are w and b, see the previous article on understanding and applying deep learning.

For these two parameters, repeat {

w:=w-\alpha \frac{\partial J(w,b)}{\partial w}

b:=b-\alpha \frac{\partial J(w,b)}{\partial b}

}

\frac{\partial J(w,b)}{\partial w} represents the partial derivative of the cost function J(w,b) with respect to w.
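
Before deriving those partial derivatives, here is a sketch of the update loop itself in Python. The helpers dJ_dw and dJ_db, the learning rate alpha, and the iteration count num_iters are hypothetical placeholders; how to actually compute the gradients for logistic regression is worked out below.

```python
def train(w, b, dJ_dw, dJ_db, alpha=0.01, num_iters=1000):
    # dJ_dw(w, b) and dJ_db(w, b) are assumed to return the partial
    # derivatives of the cost J(w, b); they are placeholders here.
    for _ in range(num_iters):      # repeat { ... }
        dw = dJ_dw(w, b)
        db = dJ_db(w, b)
        w = w - alpha * dw          # w := w - alpha * dJ(w,b)/dw
        b = b - alpha * db          # b := b - alpha * dJ(w,b)/db
    return w, b
```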

How do we calculate these partial derivatives to implement gradient descent for logistic regression?

Assuming a sample has two features, x1 and x2, logistic regression computes:

z=w_1x_1+w_2x_2+b   ;   a=\sigma (z)=\frac{1}{1+e^{-z}}   ;   L(a,y)=-(y\log a+(1-y)\log (1-a))

where y is the true value and a is the predicted value.
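
To make these definitions concrete, here is a small sketch of the forward computation for one two-feature sample in Python (all numeric values are made up for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One sample with two features and its true label (illustrative values).
x1, x2, y = 1.0, 2.0, 1.0
# Current parameters (illustrative values).
w1, w2, b = 0.1, -0.2, 0.0

z = w1 * x1 + w2 * x2 + b                                # linear part
a = sigmoid(z)                                           # predicted value a = sigma(z)
loss = -(y * math.log(a) + (1 - y) * math.log(1 - a))    # L(a, y)
print(a, loss)
```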

Implementing logistic regression involves a forward pass that computes the predicted value a and a backward pass that uses the loss function L(a, y) to update w and b.

The update formula is:

w_1:=w_1-\alpha \frac{\partial L}{\partial w_1}   ;   w_2:=w_2-\alpha \frac{\partial L}{\partial w_2}   ;   b:=b-\alpha \frac{\partial L}{\partial b}

Taking w1 as an example, updating it requires calculating \frac{\partial L}{\partial w_1}, which is done as follows:

\frac{\partial L}{\partial w_1}=\frac{\partial L}{\partial z}\cdot \frac{\partial z}{\partial w_1}=x_1\cdot \frac{\partial L}{\partial z}

\frac{\partial L}{\partial z}=\frac{\partial L}{\partial a}\cdot \frac{\partial a}{\partial z}=\left(-\frac{y}{a}+\frac{1-y}{1-a}\right)\cdot a(1-a)=a-y

Note: a=\sigma (z)=\frac{1}{1+e^{-z}} is the sigmoid function, and its derivative with respect to z is \frac{\partial a}{\partial z}=\frac{e^{-z}}{(1+e^{-z})^{2}}=a(1-a).

Therefore, \frac{\partial L}{\partial w_1}=x_1\cdot (a-y).

In the same way, \frac{\partial L}{\partial w_2}=x_2\cdot (a-y) and \frac{\partial L}{\partial b}=a-y.

The parameter updates can then be computed from the formulas above.
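
Putting the pieces together, here is a minimal end-to-end sketch in Python of one forward pass, the backward gradient computation derived above, and a single gradient-descent update for one two-feature sample (the sample values, initial parameters, and learning rate alpha are all assumed for illustration):

```python
import math

# Sample, label, initial parameters, and learning rate (illustrative values).
x1, x2, y = 1.0, 2.0, 1.0
w1, w2, b = 0.1, -0.2, 0.0
alpha = 0.01

# Forward pass
z = w1 * x1 + w2 * x2 + b
a = 1.0 / (1.0 + math.exp(-z))      # a = sigma(z)

# Backward pass using the derivatives derived above
dz = a - y                          # dL/dz = a - y
dw1 = x1 * dz                       # dL/dw1 = x1 * (a - y)
dw2 = x2 * dz                       # dL/dw2 = x2 * (a - y)
db = dz                             # dL/db  = a - y

# One gradient-descent update
w1 = w1 - alpha * dw1
w2 = w2 - alpha * dw2
b = b - alpha * db
print(w1, w2, b)
```

In practice, this step is repeated over many samples (or batches) until the cost stops decreasing.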
