Classification and Logistic Regression

Copyright notice: This is an original post by the author, released under the "Attribution-NonCommercial-NoDerivatives 2.5 China Mainland" license. Reposting is welcome, but please credit the author and link to the original article: https://blog.csdn.net/njit_77/article/details/84452045

I went through the Stanford open course on machine learning (taught by Professor Andrew Ng) and took some notes, which I am writing up here for future reference. If you spot any mistakes in the notes, please let me know.
Other notes in this series:
Linear Regression
Classification and Logistic Regression
Generalized Linear Models


1 Logistic Regression

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1+e^{-\theta^T x}}, \qquad g(z) = \frac{1}{1+e^{-z}} \quad \text{(logistic function / sigmoid function)}$$
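As a small, runnable illustration of these two definitions (a sketch in NumPy; the names `sigmoid`, `h`, `theta`, and `x` are my own, not from the notes):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x) for a single example x."""
    return sigmoid(np.dot(theta, x))
```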

$$p(y=1 \mid x;\theta) = h_\theta(x)$$

$$p(y=0 \mid x;\theta) = 1 - h_\theta(x)$$

$$p(y \mid x;\theta) = \bigl(h_\theta(x)\bigr)^{y}\bigl(1 - h_\theta(x)\bigr)^{1-y}$$
$$
\begin{aligned}
L(\theta) &= p(\vec{y} \mid X;\theta) \\
&= \prod_{i=1}^{m} p\bigl(y^{(i)} \mid x^{(i)}; \theta\bigr) \\
&= \prod_{i=1}^{m} \bigl(h_\theta(x^{(i)})\bigr)^{y^{(i)}} \bigl(1 - h_\theta(x^{(i)})\bigr)^{1-y^{(i)}} \\[6pt]
\ell(\theta) &= \log L(\theta) \\
&= \sum_{i=1}^{m} \log\Bigl[\bigl(h_\theta(x^{(i)})\bigr)^{y^{(i)}} \bigl(1 - h_\theta(x^{(i)})\bigr)^{1-y^{(i)}}\Bigr] \\
&= \sum_{i=1}^{m} \Bigl[ y^{(i)} \log h_\theta(x^{(i)}) + \bigl(1-y^{(i)}\bigr) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr]
\end{aligned}
$$
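To make the final expression concrete, here is a hypothetical NumPy sketch that evaluates $\ell(\theta)$ over a design matrix `X` (one example per row) and a 0/1 label vector `y`; the vectorized form is my own restatement of the sum above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    """ell(theta) = sum_i [ y_i * log h(x_i) + (1 - y_i) * log(1 - h(x_i)) ]."""
    h = sigmoid(X @ theta)  # h_theta(x^(i)) for every example at once
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```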
To maximize $L(\theta)$ (equivalently $\ell(\theta)$), use gradient ascent: $\theta := \theta + \alpha\,\nabla_\theta \ell(\theta)$. (The sign here is $+$, unlike the $-$ in the gradient descent rule from the earlier notes, because here we are maximizing the log-likelihood rather than minimizing a cost.)
$$
\begin{aligned}
\frac{\partial}{\partial\theta_j}\ell(\theta)
&= \frac{\partial}{\partial\theta_j} \sum_{i=1}^{m} \Bigl[ y^{(i)} \log h_\theta(x^{(i)}) + \bigl(1-y^{(i)}\bigr) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr] \\
&= \sum_{i=1}^{m} \Bigl[ \frac{y^{(i)}}{h_\theta(x^{(i)})} \frac{\partial}{\partial\theta_j} h_\theta(x^{(i)}) + \frac{1-y^{(i)}}{1 - h_\theta(x^{(i)})} \frac{\partial}{\partial\theta_j}\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr] \\
&= \sum_{i=1}^{m} \Bigl[ \frac{y^{(i)}}{h_\theta(x^{(i)})} - \frac{1-y^{(i)}}{1 - h_\theta(x^{(i)})} \Bigr] \frac{\partial}{\partial\theta_j} h_\theta(x^{(i)}) \\
&= \sum_{i=1}^{m} \frac{y^{(i)} - h_\theta(x^{(i)})}{h_\theta(x^{(i)})\bigl(1 - h_\theta(x^{(i)})\bigr)} \frac{\partial}{\partial\theta_j} h_\theta(x^{(i)}) \\
&= \sum_{i=1}^{m} \bigl(y^{(i)} - h_\theta(x^{(i)})\bigr) x_j^{(i)}
\end{aligned}
$$

(Note: since $g'(z) = g(z)\bigl(1-g(z)\bigr)$, we have $\frac{\partial}{\partial\theta_j} h_\theta(x^{(i)}) = h_\theta(x^{(i)})\bigl(1 - h_\theta(x^{(i)})\bigr) \frac{\partial}{\partial\theta_j}\theta^T x^{(i)} = h_\theta(x^{(i)})\bigl(1 - h_\theta(x^{(i)})\bigr) x_j^{(i)}$.)
$$\theta_j := \theta_j + \alpha \sum_{i=1}^{m} \bigl(y^{(i)} - h_\theta(x^{(i)})\bigr) x_j^{(i)}$$
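A minimal batch gradient ascent sketch of this update rule, assuming a design matrix `X` with one example per row; the learning rate and iteration count are illustrative choices, not values from the notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient ascent: theta_j += alpha * sum_i (y_i - h(x_i)) * x_ij."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        error = y - sigmoid(X @ theta)  # y^(i) - h_theta(x^(i)) for all i
        theta += alpha * (X.T @ error)  # one full-batch ascent step on ell(theta)
    return theta

# Toy usage: two well-separated groups, with a column of ones for the intercept.
X = np.array([[1.0, -2.0], [1.0, -1.5], [1.0, 1.5], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = fit_logistic(X, y)
```

Because the loop maximizes $\ell(\theta)$, the sign in `theta += ...` matches the $+$ in the update rule above.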

2 Digression: The Perceptron Learning Algorithm

Define the function $g(z)$ as:
$$g(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{if } z < 0 \end{cases}$$
If we let $h_\theta(x) = g(\theta^T x)$ with this threshold $g$, then applying the same update rule, $\theta_j := \theta_j + \alpha\bigl(y^{(i)} - h_\theta(x^{(i)})\bigr) x_j^{(i)}$, gives the perceptron learning algorithm.
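A hedged sketch of that update as code, using the threshold $g$ above; processing one example per call is my own framing of the rule, not something stated in the notes:

```python
import numpy as np

def perceptron_step(theta, x_i, y_i, alpha=1.0):
    """One perceptron update: theta_j += alpha * (y_i - g(theta^T x_i)) * x_ij."""
    prediction = 1.0 if np.dot(theta, x_i) >= 0 else 0.0  # g(z) = 1 if z >= 0, else 0
    return theta + alpha * (y_i - prediction) * x_i
```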

3 Another Algorithm for Maximizing $\ell(\theta)$: Newton's Method

Given a function $f(\theta)$, to find a $\theta$ such that $f(\theta) = 0$, Newton's method repeats the update:
$$\theta := \theta - \frac{f(\theta)}{f'(\theta)}$$
So how do we find a $\theta$ that maximizes $\ell(\theta)$? At a maximum of $\ell(\theta)$ (just as at a minimum) the first derivative vanishes, i.e. $\ell'(\theta) = 0$, so we apply Newton's method to the equation $\ell'(\theta) = 0$, which gives:
$$\theta := \theta - \frac{\ell'(\theta)}{\ell''(\theta)}$$
In the logistic regression setting, $\theta$ is a vector, so the Newton update must be generalized to the vector case (the Newton-Raphson method), where $H$ is the Hessian matrix:
$$\theta := \theta - H^{-1}\nabla_\theta\ell(\theta), \qquad H_{ij} = \frac{\partial^{2}\ell(\theta)}{\partial\theta_{i}\partial\theta_{j}}$$
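As a sketch of this vector update for logistic regression (NumPy): the gradient is $\nabla_\theta\ell(\theta) = X^T(y - h)$, which follows from the derivative computed in Section 1, and the Hessian of $\ell$ is $H = -X^T S X$ with $S = \operatorname{diag}\bigl(h^{(i)}(1-h^{(i)})\bigr)$; the notes do not derive this Hessian form, so treat it as an assumption stated here for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_newton(X, y, n_iters=10):
    """Newton-Raphson for logistic regression: theta := theta - H^{-1} grad ell."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (y - h)               # nabla_theta ell(theta)
        H = -(X.T * (h * (1 - h))) @ X     # Hessian of ell(theta): -X^T S X
        theta -= np.linalg.solve(H, grad)  # theta := theta - H^{-1} grad
    return theta
```

In practice Newton's method converges in far fewer iterations than gradient ascent, at the cost of building and solving an $n \times n$ system at every step.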
