4 Logistic Regression
The hypothesis of logistic regression is a sigmoid function, which squashes inputs over an arbitrarily large range into the interval (0, 1); because of this squashing it is also called the logistic function
\[ h_\theta(x)=\frac{1}{1+e^{-\theta^Tx}} \tag{4.1} \]
\(h_\theta(x)\) represents the probability that y = 1 for the input x
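A minimal NumPy sketch of the hypothesis in Eq. (4.1); the parameter values below are only illustrative:

```python
import numpy as np

def sigmoid(z):
    """Squash any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Hypothesis of Eq. (4.1): the estimated P(y = 1 | x)."""
    return sigmoid(theta @ x)

theta = np.array([0.0, 1.0])   # hypothetical parameters
x = np.array([1.0, 0.0])       # leading 1 is the bias feature
p = h(theta, x)                # theta^T x = 0, so p = 0.5
```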
4.1 Decision Boundary
If we predict y = 1 when \(h_\theta(x)\ge0.5\) and y = 0 when \(h_\theta(x)<0.5\), it follows that y = 1 when \(\theta^Tx\ge0\) and y = 0 when \(\theta^Tx<0\)
After the parameters \(\theta\) have been fitted, the equation \(\theta^Tx=0\) defines the decision boundary
- The decision boundary is a property of the hypothesis, not of the training set: once the parameters \(\theta\) are given, the boundary is determined
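A small sketch of this equivalence, using hypothetical fitted parameters; the boundary below is the line \(x_1+x_2=3\):

```python
import numpy as np

def predict(theta, x):
    """y = 1 exactly when theta^T x >= 0, i.e. h_theta(x) >= 0.5."""
    return 1 if theta @ x >= 0 else 0

# Hypothetical fitted parameters: decision boundary is x1 + x2 = 3
theta = np.array([-3.0, 1.0, 1.0])
p1 = predict(theta, np.array([1.0, 2.0, 2.0]))  # 2 + 2 >= 3 -> class 1
p0 = predict(theta, np.array([1.0, 1.0, 1.0]))  # 1 + 1 < 3  -> class 0
```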
4.2 Cost Function for a Single Sample
If the cost function of linear regression were reused, the sigmoid function would make it non-convex, and gradient descent could get stuck in a local optimum.
\[ \text{Cost}(h_\theta(x),y)=\begin{cases} -\log(h_\theta(x)), & \text{if } y=1 \\ -\log(1-h_\theta(x)), & \text{if } y=0 \end{cases} \tag{4.2} \]
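Equation (4.2) can be sketched directly; the sample values below are only illustrative:

```python
import numpy as np

def cost_single(h_x, y):
    """Eq. (4.2): cost of a single example, given h_theta(x) and label y."""
    return -np.log(h_x) if y == 1 else -np.log(1.0 - h_x)

c_match = cost_single(0.9, 1)   # confident and correct -> small cost
c_miss = cost_single(0.9, 0)    # confident and wrong   -> large cost
```

The cost approaches 0 when the prediction agrees with the label and grows without bound as a confident prediction turns out wrong.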
4.3 Logistic Regression Cost Function
\[ \begin{aligned} J(\theta) &=\frac{1}{m} \sum_{i=1}^{m} \operatorname{cost}(h_{\theta}(x^{(i)}), y^{(i)}) \\ &=-\frac{1}{m}[\sum_{i=1}^{m} y^{(i)} \log h_{\theta}(x^{(i)})+(1-y^{(i)}) \log (1-h_{\theta}(x^{(i)}))] \end{aligned}\tag{4.3} \]
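A vectorized sketch of Eq. (4.3); the tiny data set is hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def J(theta, X, y):
    """Eq. (4.3): average cross-entropy cost over m examples.
    X is (m, n) with a leading column of ones; y is (m,) of 0/1 labels."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

# Tiny hypothetical data set
X = np.array([[1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0])
cost_at_zero = J(np.zeros(2), X, y)   # h = 0.5 everywhere -> cost = ln 2
```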
Different algorithms can then be used to minimize this cost function
4.3.1 Gradient Descent
\[ \begin{aligned} \theta_j&=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)\\ &=\theta_j-\frac{\alpha}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} \end{aligned}\tag{4.4} \]
- The update rule has the same form as in multivariate linear regression; the only difference is the hypothesis function \(h_\theta(x)\)
- When features span very different ranges, feature scaling can likewise be used to make gradient descent converge faster
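The update (4.4) can be sketched as batch gradient descent; the data set and hyperparameters below are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=5000):
    """Batch gradient descent for Eq. (4.4); X has a leading ones column."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m   # dJ/dtheta
        theta -= alpha * grad
    return theta

# Hypothetical 1-D data: y flips to 1 once the feature exceeds ~2
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
preds = (sigmoid(X @ theta) >= 0.5).astype(int)
```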
4.3.2 Other Advanced Algorithms
- Conjugate Gradient Method
- BFGS
- L-BFGS
These require no manual choice of the learning rate and usually converge faster than gradient descent, but they are more complex algorithms
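In practice these optimizers come off the shelf; for example, SciPy's `scipy.optimize.minimize` with `method='BFGS'` only needs the cost and its gradient (the small non-separable data set below is hypothetical):

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def J(theta, X, y):
    """Cost of Eq. (4.3)."""
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / len(y)

def grad(theta, X, y):
    """Gradient of J, the summation term of Eq. (4.4)."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Hypothetical (non-separable) data set
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])
res = minimize(J, np.zeros(2), args=(X, y), jac=grad, method='BFGS')
theta = res.x   # fitted parameters; no learning rate was chosen by hand
```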
4.4 Multiclass Classification
Take each class in turn as the positive class and treat all remaining classes as negative; repeating this for every class yields one hypothesis function, i.e. one classifier, per class
To predict on a new sample, run every classifier on it and choose the class whose classifier reports the highest probability as the prediction
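A sketch of one-vs-all prediction, with hypothetical already-fitted parameter vectors for K = 3 classes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(thetas, x):
    """thetas is (K, n), one parameter vector per class; return the class
    whose classifier reports the highest probability h_theta(x)."""
    probs = sigmoid(thetas @ x)
    return int(np.argmax(probs))

# Hypothetical fitted parameters for K = 3 classes
thetas = np.array([[ 2.0, -1.0],
                   [ 0.0,  0.5],
                   [-2.0,  1.0]])
label = predict_one_vs_all(thetas, np.array([1.0, 5.0]))  # class 2 wins
```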