【Machine Learning】2 Logistic Regression

1 Logistic Regression

  • It is a classification algorithm, despite having "regression" in its name

1.1 Differences between Logistic Regression and Linear Regression

| Logistic Regression | Linear Regression |
| --- | --- |
| classification algorithm | regression algorithm |
| $0 \le h_\theta(x) \le 1$ | $h_\theta(x)$ can be $>1$ or $<0$ |

1.2 Model

  • Hypothesis
$$
\begin{aligned}
h_\theta(x) &= P(y=1 \mid x;\theta) = g(\theta^T x) && \text{(estimated probability that $y=1$, given $x$, parameterized by $\theta$)}\\
g(z) &= \frac{1}{1+e^{-z}} && \text{(Sigmoid Function / Logistic Function)}
\end{aligned}
$$
    [Figure: the Sigmoid Function curve]

Suppose we predict:
"$y=1$" if $h_\theta(x) \ge 0.5$ (i.e. $\theta^T x \ge 0$)
"$y=0$" if $h_\theta(x) < 0.5$ (i.e. $\theta^T x < 0$)

import numpy as np

def sigmoid(z):
    # Sigmoid / logistic function g(z) = 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))
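The 0.5 threshold above maps directly to code; a minimal sketch reusing the sigmoid helper (the predict name and array shapes are illustrative, not from the original notes):

import numpy as np

def predict(theta, X):
    # Predict y = 1 where h_theta(x) >= 0.5, i.e. theta^T x >= 0; otherwise y = 0
    return (sigmoid(X @ theta) >= 0.5).astype(int)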
  • Parameters: $\theta$
  • Decision Boundary: a property of the hypothesis and its parameters, not of the training set (e.g. for $h_\theta(x)=g(\theta_0+\theta_1x_1+\theta_2x_2)$ with $\theta=(-3,1,1)^T$, we predict $y=1$ whenever $x_1+x_2\ge3$, so the boundary is the line $x_1+x_2=3$)
  • Cost Function (the squared-error cost used for linear regression would make $J(\theta)$ non-convex here, so the following log cost is used instead):
$$
\begin{aligned}
J(\theta)&=\frac{1}{m}\sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}),y^{(i)})\\
\mathrm{Cost}(h_\theta(x),y)&=\begin{cases}-\log(h_\theta(x))&\text{if }y=1\\-\log(1-h_\theta(x))&\text{if }y=0\end{cases}\\
\text{Equivalent form:}\quad \mathrm{Cost}(h_\theta(x),y)&=-y\log(h_\theta(x))-(1-y)\log(1-h_\theta(x))&&\text{($y=1$ or $0$)}
\end{aligned}
$$
import numpy as np

def cost(theta, X, y):
    # Logistic-regression cost J(theta): mean of the per-example log cost
    theta = np.matrix(theta)
    X = np.matrix(X)
    y = np.matrix(y)
    first = np.multiply(-y, np.log(sigmoid(X * theta.T)))
    second = np.multiply(1 - y, np.log(1 - sigmoid(X * theta.T)))
    return np.sum(first - second) / len(X)
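A quick usage sketch of the cost function above (the toy numbers are illustrative; X already contains a leading column of ones for the intercept term):

import numpy as np

# Toy data: a column of ones (intercept) plus one feature; labels y in {0, 1}
X = np.array([[1.0, 0.5],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([[0], [0], [1], [1]])
theta = np.zeros(2)
print(cost(theta, X, y))  # about 0.693 = -log(0.5), since theta = 0 predicts 0.5 everywhere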
  • Goal (objective function): $\mathop{\text{minimize}}\limits_{\theta}\, J(\theta)$

1.3 Gradient Descent for $J(\theta)$

  • repeat {
        $\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)\cdot x_j^{(i)}$
    (simultaneously update all $\theta_j$)
    }
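A minimal vectorized sketch of this update loop in Python, reusing the sigmoid helper above (the function name gradient_descent and the default alpha/iters values are illustrative, not from the original notes):

import numpy as np

def gradient_descent(X, y, theta, alpha=0.01, iters=1000):
    # Batch gradient descent: simultaneously update all theta_j on every iteration
    m = len(X)
    for _ in range(iters):
        error = sigmoid(X @ theta) - y              # h_theta(x^(i)) - y^(i) for every example
        theta = theta - (alpha / m) * (X.T @ error)
    return theta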

1.4 Advanced Optimization (alternatives to Gradient Descent)

  • Optimization Algorithms:
  1. Conjugate gradient
  2. BFGS (Broyden–Fletcher–Goldfarb–Shanno)
  3. L-BFGS (limited-memory BFGS)
  • Advantages:
  1. No need to manually pick $\alpha$
  2. Often faster than gradient descent
  • Disadvantages: more complex
Octave code:
function [jVal, gradient] = costFunction(theta)
	jVal = [...code to compute J(theta)...];
	gradient = [...code to compute derivative of J(theta)...];
end

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);

[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
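For comparison, a rough Python counterpart using SciPy's general-purpose optimizer; this is an illustrative sketch, not part of the original notes (it assumes X, with an intercept column, and y are already defined, and reuses the sigmoid and cost functions above):

import numpy as np
from scipy.optimize import minimize

def gradient(theta, X, y):
    # Vectorized gradient of the logistic-regression cost J(theta)
    theta = np.asarray(theta).ravel()
    error = sigmoid(np.asarray(X) @ theta) - np.asarray(y).ravel()
    return np.asarray(X).T @ error / len(X)

initial_theta = np.zeros(2)   # counterpart of zeros(2,1) in the Octave snippet
result = minimize(cost, initial_theta, args=(X, y), jac=gradient,
                  method='BFGS', options={'maxiter': 100})
opt_theta, function_val = result.x, result.fun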

2 Multi-class Classification: One-vs-all

  • One-versus-all Classification / One-versus-rest
  • Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class $i$ to predict the probability that $y=i$
  • On a new input $x$, make a prediction by picking the class $i$ that maximizes $h_\theta^{(i)}(x)$, i.e. $\mathop{\text{max}}\limits_{i}\, h_\theta^{(i)}(x)$

In other words, when solving a multi-class classification problem, each classifier separates a single class A from all the remaining classes, which are treated together as one class B.
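A minimal one-vs-all sketch in Python (illustrative only; it reuses the sigmoid helper and the gradient_descent sketch from section 1.3, and assumes integer class labels 0..K-1):

import numpy as np

def train_one_vs_all(X, y, num_classes, alpha=0.1, iters=1000):
    # Fit one logistic-regression classifier per class: class i vs. all the rest
    all_theta = np.zeros((num_classes, X.shape[1]))
    for i in range(num_classes):
        y_i = (y == i).astype(float)   # relabel: class i -> 1, every other class -> 0
        all_theta[i] = gradient_descent(X, y_i, np.zeros(X.shape[1]), alpha, iters)
    return all_theta

def predict_one_vs_all(all_theta, X):
    # For each example, pick the class whose classifier outputs the highest probability
    return np.argmax(sigmoid(X @ all_theta.T), axis=1)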

3 References

Andrew Ng (吴恩达), Machine Learning, Coursera
Huang Haiguang (黄海广), Machine Learning Notes
