Basic definition
Binomial logistic regression, usually just called logistic regression (also known as log-odds regression),
is a binary classification model.
It uses probability to study the relationship between class labels and features, and can be viewed as a kind of nonlinear regression.
Introduction of the log-odds function
Because the output label of a binary classification task takes values $y \in \{0,1\}$, while the linear-regression prediction $z = w^Tx + b$ is a real value.
We want to convert the real value z into a 0/1 output. The most ideal converter is the unit step function;
but because the step function is not continuous, we look for a surrogate that approximates it and is monotonic and differentiable. This leads to the log-odds (logistic) function:
$y = \frac{1}{1+e^{-z}}$
If the predicted value z is greater than zero, the sample is judged positive; if less than zero, negative; at the critical value z = 0 it may be assigned to either class.
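The function and the decision rule above can be sketched in a few lines. This is a minimal illustration, not library code; the names `sigmoid` and `classify` are our own:

```python
import math

def sigmoid(z):
    """Log-odds (logistic) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z):
    """Decision rule from the text: z > 0 -> positive (1), z < 0 -> negative (0).
    At the boundary z = 0 either class may be chosen; here we pick positive."""
    return 1 if sigmoid(z) >= 0.5 else 0
```

Note that thresholding `sigmoid(z)` at 0.5 is equivalent to thresholding z at 0, since `sigmoid(0) == 0.5` and the function is monotonic.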
Derivation of the log-odds function
Let y be the probability that sample x belongs to the positive class; then 1 − y is the probability that it belongs to the negative class.
Their ratio $\frac{y}{1-y}$ is called the odds, expressing the relative likelihood of the positive class.
Taking the logarithm gives the log-odds: $\ln\frac{y}{1-y}$.
Setting $z = \ln\frac{y}{1-y}$ and solving for y gives $y = \frac{1}{1+e^{-z}}$.
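A quick numerical check confirms that the log-odds transform and the logistic function are inverses of each other (the helper names here are illustrative):

```python
import math

def log_odds(y):
    """z = ln(y / (1 - y)) for a probability y in (0, 1)."""
    return math.log(y / (1.0 - y))

def sigmoid(z):
    """Inverse transform: y = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Round-tripping any probability through log_odds and sigmoid recovers it.
for y in (0.1, 0.5, 0.9):
    assert abs(sigmoid(log_odds(y)) - y) < 1e-12
```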
The significance of the log-odds function: its value is the probability that sample x is predicted to be positive.
Logistic regression model
Substituting $z = w^Tx+b$ into the log-odds function gives the logistic regression model:
$h = \frac{1}{1+e^{-(w^Tx+b)}}$
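The model is straightforward to express in code. A minimal sketch, assuming the parameters `w` and `b` are already known and `w` and `x` are plain equal-length lists:

```python
import math

def predict_proba(w, x, b):
    """Logistic regression model: h = 1 / (1 + e^{-(w^T x + b)}).
    Returns the predicted probability that x is positive."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

For example, with zero weights and bias the model is maximally uncertain: `predict_proba([0.0, 0.0], [1.0, 1.0], 0.0)` gives 0.5.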
Estimation of model parameters
Cost function (cross-entropy):
$J = -\frac{1}{n}\sum_{i=1}^{n}\left[\,y_i \ln h(x_i) + (1-y_i)\ln\big(1-h(x_i)\big)\right]$
Minimizing the cost function yields the parameters w and b.
Solution method: mini-batch gradient descent.
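The whole estimation procedure can be sketched as a mini-batch gradient descent loop. This is an illustrative implementation under the usual gradient of the cross-entropy cost, $\frac{\partial J}{\partial w} = \frac{1}{m}\sum_i (h(x_i)-y_i)x_i$; the function names, learning rate, and toy data are our own choices, not from the source:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, Y, lr=0.1, epochs=200, batch_size=4, seed=0):
    """Minimize the cross-entropy cost J by mini-batch gradient descent.
    Per-example gradient: (h(x_i) - y_i) * x_i for w, and (h(x_i) - y_i) for b."""
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)                       # new random batches each epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            gw = [0.0] * n_features
            gb = 0.0
            for i in batch:
                h = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
                err = h - Y[i]
                for j in range(n_features):
                    gw[j] += err * X[i][j]
                gb += err
            m = len(batch)
            for j in range(n_features):        # average gradient over the batch
                w[j] -= lr * gw[j] / m
            b -= lr * gb / m
    return w, b

# Toy 1-D data: label is positive exactly when x > 0.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
Y = [0, 0, 0, 1, 1, 1]
w, b = train(X, Y)
```

After training, the learned model separates the two classes: `sigmoid(w[0] * 2.0 + b)` is above 0.5 and `sigmoid(w[0] * -2.0 + b)` is below it.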