Classification Algorithms: Logistic Regression

1. Binary classification

Suppose a hospital wants to analyze its patients' conditions, and one of the tasks is to judge whether a tumor is benign or malignant. A data set of tumor sizes is available, and the task is to decide from the size of a tumor whether it is benign or malignant. This is a typical binary classification problem: the output takes only two values, benign and malignant (usually represented by the numbers 0 and 1). From Figure 1 we can see at a glance that a tumor of size greater than 5 is malignant (output 1), while one of size 5 or less is benign (output 0).

2. The nature of classification problems

Classification is essentially a form of supervised learning: we are given a data set whose classes are already known, the computer learns from it with a classification algorithm, and it can then make predictions on new data. In the tumor example, given the existing data set shown in Figure 1, diagnosing a new patient only requires the computer to compare the patient's tumor size with 5 and infer malignant or benign accordingly. Classification and regression have much in common: both learn from a data set in order to predict an unknown result. They differ in their output values. The output of regression is continuous (e.g. house prices), whereas the output of classification is discrete (e.g. malignant or benign). Given this similarity, can we build classification on top of regression? The answer is yes. One possible idea is to fit a line and then quantize its predicted value, that is, map the continuous value to a discrete one.
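The quantization idea can be sketched in a few lines of Python. This is only an illustration: the tumor sizes and labels are made up, the 0.5 cut-off is an assumption, and the fit is an ordinary least-squares line rather than anything prescribed by this article.

```python
# Hypothetical training data: tumor sizes and labels (0 = benign, 1 = malignant).
sizes = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = theta0 + theta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def quantize(value, threshold=0.5):
    """Turn the continuous regression output into a discrete class (assumed cut-off)."""
    return 1 if value >= threshold else 0


theta0, theta1 = fit_line(sizes, labels)
predictions = [quantize(theta0 + theta1 * x) for x in sizes]
print(theta0, theta1, predictions)
```

On this small, symmetric data set the quantized line separates the two classes, which is why the idea looks plausible at first.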

3. The hypothesis function for classification

Although classification and regression are similar, we cannot directly use a regression hypothesis function as the hypothesis function of a classification problem. Taking Figure 1 as the example again, if we fit a simple linear function (i.e. \(h_\theta(x) = \theta_0 + \theta_1 x\)), the result could be something like $h_\theta(x) = \dfrac{5}{33}x - \dfrac{1}{3}$. The output of such a line is unbounded, whereas a value read as the probability that y=1 should stay between 0 and 1. Logistic regression therefore passes the linear combination through the sigmoid (logistic) function, which gives the hypothesis:

$h_\theta(x)=\dfrac{1}{1+e^{-\theta^Tx}}=\dfrac{1}{1+e^{-\sum_{i=0}^n\theta_ix_i}} \tag{3.1}$
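A minimal sketch of hypothesis (3.1) in plain Python follows. The parameter values are hypothetical, chosen so that the 0.5 boundary sits at a tumor size of 5, and the two-branch form of the sigmoid is just one common way to avoid overflow for large negative inputs.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return exp_z / (1.0 + exp_z)


def hypothesis(theta, x):
    """h_theta(x) as in (3.1); x must include the constant feature x_0 = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


# Hypothetical parameters: z = -5 + size, so h = 0.5 exactly at size 5.
theta = [-5.0, 1.0]
for size in (2.0, 5.0, 8.0):
    print(size, hypothesis(theta, [1.0, size]))
```

Whatever the input, the output stays between 0 and 1, so it can be read as the probability that y=1.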

4. The cost function of logistic regression

$J(\theta)= \dfrac{1}{m}\sum_{i=1}^m\left[-y^{(i)}\ln(h_\theta(x^{(i)}))-(1-y^{(i)})\ln(1-h_\theta(x^{(i)}))\right] \tag{4.1}$

$\mathrm{Cost}(h_\theta(x),y)=\begin{cases} -\ln(h_\theta(x)),\quad &y = 1 \\ -\ln(1-h_\theta(x)), &y=0 \end{cases} \tag{4.2}$

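Equation 4.1 is the average over all samples of the per-sample cost in 4.2, so the two are equivalent. From 4.2 it is easy to see that when y=1, predicting y=1 (i.e. $$h_\theta(x) = 1$$) gives an error of 0, while wrongly predicting y=0 (i.e. $$h_\theta(x) = 0$$) gives an error of positive infinity. When y=0, predicting y=0 (i.e. $$h_\theta(x) = 0$$) gives an error of 0, while wrongly predicting y=1 (i.e. $$h_\theta(x) = 1$$) gives an error of positive infinity. (Note: $$h_\theta(x) = 1$$ means that the probability of y being 1 is 1, which amounts to asserting y=1; $$h_\theta(x) = 0$$ means that the probability of y being 1 is 0, which amounts to asserting y=0.)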

$J(\theta)=-\dfrac{1}{m}\left[Y^T\ln(h_\theta(X))+(E-Y)^T\ln(E-h_\theta(X))\right] \tag{4.3}$

Here $$E$$ is the all-ones column vector and $$\ln$$ is applied element-wise.
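The behavior of the cost can be checked numerically with a small sketch. The function names and the clipping constant `eps` are assumptions made for this example; clipping keeps the logarithm finite when a prediction is exactly 0 or 1.

```python
import math


def sample_cost(h, y, eps=1e-12):
    """Per-sample cost (4.2); h is the predicted probability that y = 1."""
    # Clip h away from 0 and 1 so the logarithm stays finite (eps is an assumption).
    h = min(max(h, eps), 1.0 - eps)
    return -math.log(h) if y == 1 else -math.log(1.0 - h)


def total_cost(hs, ys):
    """Average cost over all samples, as in (4.1)."""
    return sum(sample_cost(h, y) for h, y in zip(hs, ys)) / len(ys)


# Confident and correct: cost near 0. Confident and wrong: cost is large.
print(sample_cost(0.999, 1))
print(sample_cost(0.001, 1))
print(total_cost([0.9, 0.2, 0.7], [1, 0, 1]))
```

The closer a wrong prediction gets to full confidence, the faster the cost grows, which is the "positive infinity" limit of 4.2.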

5. Gradient descent for logistic regression

$\dfrac{\partial J(\theta)}{\partial\theta_i} = \dfrac{1}{m}\sum_{j=1}^m(h_\theta(x^{(j)})-y^{(j)})x_i^{(j)} = \dfrac{1}{m}\sum_{j=1}^m\left(\dfrac{1}{1+e^{-\sum_{k=0}^n\theta_kx_k^{(j)}}}-y^{(j)}\right)x_i^{(j)}\quad (i=0,1,\dots,n)\tag{5.1}$
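For readers who want the intermediate step, the gradient above can be derived for a single sample $$(x, y)$$ as follows. The single-sample notation is introduced here only for illustration; it relies on the fact that the derivative of the sigmoid function $$g(z)$$ is $$g(z)(1-g(z))$$.

$\dfrac{\partial h_\theta(x)}{\partial\theta_i} = h_\theta(x)\,(1-h_\theta(x))\,x_i$

$\dfrac{\partial}{\partial\theta_i}\left[-y\ln(h_\theta(x))-(1-y)\ln(1-h_\theta(x))\right] = \left[-\dfrac{y}{h_\theta(x)}+\dfrac{1-y}{1-h_\theta(x)}\right]h_\theta(x)\,(1-h_\theta(x))\,x_i = (h_\theta(x)-y)\,x_i$

Averaging this expression over the $$m$$ samples gives the sum over $$j$$ in 5.1.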

$\theta_i = \theta_i-\alpha\dfrac{\partial J(\theta)}{\partial\theta_i} = \theta_i-\alpha\dfrac{1}{m}\sum_{j=1}^m(h_\theta(x^{(j)})-y^{(j)})x_i^{(j)}\quad (i=0,1,\dots,n)\tag{5.2}$

$\dfrac{\partial J(\theta)}{\partial\theta} = \dfrac{1}{m}X^T(h_\theta(X)-Y),\quad \theta=\theta-\alpha\dfrac{1}{m}X^T(h_\theta(X)-Y) \tag{5.3}$
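The update rule can be sketched as a short batch gradient descent loop. Everything concrete here is an assumption made for the example: the data set is made up, and the learning rate `alpha` and the fixed number of steps were picked by hand rather than tuned.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient(theta, xs, ys):
    """Gradient of J as in (5.1): (1/m) * sum_j (h(x_j) - y_j) * x_j."""
    m = len(xs)
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        error = sigmoid(sum(t * xi for t, xi in zip(theta, x))) - y
        for i, xi in enumerate(x):
            grad[i] += error * xi / m
    return grad


def gradient_descent(xs, ys, alpha=0.1, steps=5000):
    """Repeat update (5.2) for a fixed number of steps (both values are assumptions)."""
    theta = [0.0] * len(xs[0])
    for _ in range(steps):
        grad = gradient(theta, xs, ys)
        theta = [t - alpha * g for t, g in zip(theta, grad)]  # simultaneous update
    return theta


# Hypothetical data; each row is [x_0 = 1, tumor size].
sizes = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
xs = [[1.0, s] for s in sizes]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

theta = gradient_descent(xs, ys)
predictions = [1 if sigmoid(theta[0] * x[0] + theta[1] * x[1]) >= 0.5 else 0 for x in xs]
print(theta, predictions)
```

All components of theta are updated from the same gradient, which matches the simultaneous update implied by 5.2 and the vector form 5.3.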

6. Multiclass logistic regression

For multiclass logistic regression, one possible idea is to reduce the problem to binary ones. For example, suppose the data set contains three classes: 1, 2 and 3. To decide whether a sample belongs to class 1, we can regard the data as having only two classes, class 1 and non-class 1 (classes 2 and 3 together), which gives us a hypothesis function for class 1, \(h^{(1)}_\theta(x)\). In the same way we obtain \(h^{(2)}_\theta(x)\) and \(h^{(3)}_\theta(x)\). Our decision rule then becomes:

$\text{if}\quad \max_i\{h^{(i)}_\theta(x)\} = h^{(j)}_\theta(x),\ \text{then}\quad y = j\quad(i,j=1,2,3) \tag{6.1}$
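This one-vs-rest scheme can be sketched as follows. The three clusters of 2-D points are hypothetical, the helper names are invented for this example, and the binary trainer is a plain gradient descent loop with hand-picked `alpha` and `steps`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_binary(xs, ys, alpha=0.1, steps=3000):
    """Plain gradient descent for one binary logistic regression model."""
    theta = [0.0] * len(xs[0])
    m = len(xs)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x, y in zip(xs, ys):
            error = sigmoid(sum(t * xi for t, xi in zip(theta, x))) - y
            for i, xi in enumerate(x):
                grad[i] += error * xi / m
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


def train_one_vs_rest(xs, labels, classes):
    """Train one 'class k vs. everything else' model per class."""
    return {k: train_binary(xs, [1 if label == k else 0 for label in labels])
            for k in classes}


def predict(models, x):
    """Decision rule (6.1): pick the class whose hypothesis is largest."""
    scores = {k: sigmoid(sum(t * xi for t, xi in zip(theta, x)))
              for k, theta in models.items()}
    return max(scores, key=scores.get)


# Hypothetical 2-D data with three well-separated classes; x_0 = 1 is the bias.
xs = [[1.0, 0.0, 0.0], [1.0, 0.5, 0.5], [1.0, 4.0, 0.0],
      [1.0, 4.5, 0.5], [1.0, 0.0, 4.0], [1.0, 0.5, 4.5]]
labels = [1, 1, 2, 2, 3, 3]
models = train_one_vs_rest(xs, labels, classes=[1, 2, 3])
predictions = [predict(models, x) for x in xs]
print(predictions)
```

Each class gets its own parameter vector, so a problem with K classes costs K binary training runs.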

7. Summary

Although logistic regression has the word "regression" in its name, it is in fact a classification algorithm. The idea behind logistic regression is very similar to that of discriminant functions in pattern recognition, so the two can be studied together.