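Andrew Ng's Machine Learning --- Logistic Classification

Logistic classification applies to binary classification problems: given the features $X$, the label $y$ is either 0 or 1.

Model hypothesis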

$h_\theta(x) = g(\theta^Tx)$
$z = \theta^Tx$
$g(z) = \frac{1}{1 + e^{-z}}$
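The function $g(z)$ looks like this:
[Figure: plot of the sigmoid function $g(z)$]
$h_\theta(x)$ takes values in $(0, 1)$ and is interpreted as the probability that $y = 1$, i.e. $h_\theta(x) = P(y = 1|x, \theta) = 1- P(y = 0|x, \theta)$.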
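
As a minimal sketch of the hypothesis above, the following NumPy code evaluates $h_\theta(x)$ for a couple of examples. The parameter values and inputs are made up for illustration, and the first column of `X` is assumed to be the constant 1 that multiplies $\theta_0$.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta^T x) for every row x of X."""
    return sigmoid(X @ theta)

# Hypothetical parameters and inputs, chosen only for illustration.
theta = np.array([0.0, 1.0, -1.0])
X = np.array([[1.0, 2.0, 2.0],   # theta^T x = 0
              [1.0, 3.0, 1.0]])  # theta^T x = 2
p = hypothesis(theta, X)
print(p)  # estimated P(y = 1 | x, theta) for each row
```

The first example sits exactly at $\theta^Tx = 0$, so its estimated probability is 0.5.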

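Decision boundary

As noted above, $h_\theta(x)$ is the probability that $y = 1$. When it is greater than 0.5 we predict $y = 1$; when it is less than 0.5 we predict $y = 0$; when it equals 0.5 we adopt the convention $y = 1$. From the plot of $g(z)$ above, $g(z) \geq 0.5$ exactly when $z \geq 0$, so we predict $y = 1$ when $\theta^Tx \geq 0$ and $y = 0$ when $\theta^Tx < 0$.
In the linear case, $\theta^Tx = \theta_0 + \theta_1x_1 + \theta_2x_2$ gives a straight-line decision boundary:
[Figure: linear decision boundary]
In the nonlinear case, $\theta^Tx = \theta_0 + \theta_1x_1 + \theta_2x_2 + \theta_3x_1^2 + \theta_4x_2^2$ gives a curved decision boundary such as a circle or an ellipse.

More complex boundaries can be obtained by using higher-order polynomial terms in $\theta^Tx$.
The goal of logistic classification is to find the decision boundary.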
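
The decision rule can be sketched in a few lines. The example below assumes the quadratic feature map from the nonlinear case and a hypothetical parameter vector $\theta = (-1, 0, 0, 1, 1)$, for which the boundary $\theta^Tx = 0$ is the unit circle $x_1^2 + x_2^2 = 1$.

```python
import numpy as np

def predict(theta, features):
    """Predict y = 1 where theta^T x >= 0, otherwise y = 0."""
    return (features @ theta >= 0).astype(int)

def quadratic_features(x1, x2):
    """Map (x1, x2) to the row [1, x1, x2, x1^2, x2^2]."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 ** 2, x2 ** 2])

# Hypothetical parameters: the boundary is the circle x1^2 + x2^2 = 1.
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])
x1 = np.array([0.0, 0.5, 1.0, 2.0])
x2 = np.array([0.0, 0.5, 0.0, 2.0])
labels = predict(theta, quadratic_features(x1, x2))
print(labels)  # points inside the circle get 0, points on or outside get 1
```

The point $(1, 0)$ lies exactly on the boundary and is assigned $y = 1$, following the convention for $h_\theta(x) = 0.5$.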

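Cost function

[Figure: per-example cost for the cases $y = 1$ and $y = 0$]
This gives:
$\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x))$ if $y = 1$, and $\mathrm{Cost}(h_\theta(x), y) = -\log(1 - h_\theta(x))$ if $y = 0$
Combining the two cases:
$\mathrm{Cost}(h_\theta(x), y) = -y\log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$
The resulting cost function over $m$ training examples is:
$J(\theta) = -\frac{1}{m}\sum_{i=1}^m\left(y_i\log(h_\theta(x_i)) + (1 - y_i)\log(1 - h_\theta(x_i))\right)$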
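
As a rough sketch, the cross-entropy cost that the gradient derivation below differentiates, $J(\theta) = -\frac{1}{m}\sum_i (y_i\log h_\theta(x_i) + (1-y_i)\log(1-h_\theta(x_i)))$, can be evaluated as follows. The four-point data set is invented for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum(y*log(h) + (1 - y)*log(1 - h))."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Toy data (assumed): a column of ones plus a single feature.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0, 0, 1, 1])

j_zero = cost(np.zeros(2), X, y)            # h = 0.5 for every example
j_good = cost(np.array([0.0, 2.0]), X, y)   # parameters that separate the classes
print(j_zero, j_good)
```

With $\theta = 0$ every prediction is 0.5, so the cost equals $\log 2$; parameters that separate the two classes give a smaller value.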

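Gradient descent

Gradient descent is used to minimize the cost function. With learning rate $\alpha$, every $\theta_j$ is updated simultaneously:
$\theta_j := \theta_j - \alpha\frac{\partial J(\theta)}{\partial \theta_j}$
Next we derive $\frac{\partial J(\theta)}{\partial \theta_j}$.
Before the main derivation, we first need the derivative of $g(z) = \frac{1}{1 + e^{-z}}$:
$\frac{\partial g(z)}{\partial z} = -\frac{1}{(1 + e^{-z})^2} \cdot e^{-z} \cdot (-1) = \frac{e^{-z}}{(1 + e^{-z})^2} = g(z)(1-g(z))$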
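
The identity $g'(z) = g(z)(1 - g(z))$ is easy to sanity-check numerically. The sketch below compares it against a central finite difference; the step size `eps` and the grid of test points are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5.0, 5.0, 11)
eps = 1e-6  # finite-difference step (arbitrary small value)

numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid(z) * (1 - sigmoid(z))

max_diff = np.max(np.abs(numeric - analytic))
print(max_diff)  # should be tiny if the closed form is right
```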
Now the main derivation:

$\begin{aligned}
\frac{\partial J(\theta)}{\partial \theta_j} &= -\frac{1}{m}\sum_{i=1}^m\left(y_i\frac{1}{h_\theta(x_i)}h_\theta(x_i)(1-h_\theta(x_i))x_i^j+(1-y_i)\frac{1}{1-h_\theta(x_i)}\left(-h_\theta(x_i)(1-h_\theta(x_i))\right)x_i^j\right) \\
&= -\frac{1}{m}\sum_{i=1}^m\left(y_i(1-h_\theta(x_i))x_i^j+(1-y_i)(-h_\theta(x_i))x_i^j\right) \\
&= -\frac{1}{m}\sum_{i=1}^m\left(y_ix_i^j-y_ih_\theta(x_i)x_i^j-h_\theta(x_i)x_i^j+y_ih_\theta(x_i)x_i^j\right) \\
&= \frac{1}{m}\sum_{i=1}^m(h_\theta(x_i)-y_i)x_i^j
\end{aligned}$
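The gradient descent update is therefore (repeat until convergence, updating all $\theta_j$ simultaneously):
$\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x_i)-y_i)x_i^j$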
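
A minimal batch gradient descent sketch built on the gradient $\frac{1}{m}\sum_i (h_\theta(x_i) - y_i)x_i^j$ derived above. The learning rate, iteration count, and synthetic one-feature data set are all assumptions made for the example, not values from the course.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(theta, X, y):
    """Vector of (1/m) * sum_i (h(x_i) - y_i) * x_i^j over all j."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # One vectorized step updates every theta_j simultaneously.
        theta = theta - alpha * gradient(theta, X, y)
    return theta

# Synthetic data (assumed): y tends to be 1 when the feature is positive.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (x + 0.3 * rng.normal(size=200) > 0).astype(float)
X = np.column_stack([np.ones_like(x), x])

theta = gradient_descent(X, y)
accuracy = np.mean((X @ theta >= 0) == (y == 1))
print(theta, accuracy)
```

A fixed number of iterations is used here for simplicity; checking the change in $J(\theta)$ between steps would be a more careful stopping rule.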

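Optimization algorithms

More advanced methods such as conjugate gradient are, in Professor Ng's view, beyond the scope of the course. Their implementation is best left to numerical computing specialists; what we need is to know how to call the interface.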
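
As one possible illustration of "calling the interface", the sketch below hands the cost and its gradient to SciPy's `scipy.optimize.minimize` with `method="CG"` (conjugate gradient). Using SciPy is an assumption of this example, not something the course prescribes, and the data set is synthetic. The cost is written with `np.logaddexp` so that it stays finite when the optimizer tries large parameter values.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # numerically stable sigmoid

def cost(theta, X, y):
    """Cross-entropy cost, using -log(h) = log(1 + exp(-z))."""
    z = X @ theta
    return np.mean(y * np.logaddexp(0, -z) + (1 - y) * np.logaddexp(0, z))

def gradient(theta, X, y):
    """(1/m) * X^T (h - y), the gradient of the cost above."""
    return X.T @ (expit(X @ theta) - y) / len(y)

# Synthetic data (assumed), not perfectly separable so the minimum is finite.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (x + 0.3 * rng.normal(size=200) > 0).astype(float)
X = np.column_stack([np.ones_like(x), x])

result = minimize(cost, x0=np.zeros(2), args=(X, y), jac=gradient, method="CG")
print(result.x, result.fun)
```

No learning rate has to be chosen here; the optimizer picks its own step sizes.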

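Multi-class classification

Logistic classification handles binary problems directly and extends to multi-class problems through the 1-vs-all method: for each class, split the training set into "this class" and "not this class" and train a logistic classifier on that binary problem. With k classes this yields k classifiers. Any input is fed to all k classifiers, and the class whose classifier reports the highest probability is chosen.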
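
The 1-vs-all procedure can be sketched with plain NumPy. The helper `fit_binary` is a hypothetical name for a small gradient descent trainer, and the three Gaussian clusters are synthetic data made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_binary(X, y, alpha=0.1, n_iters=2000):
    """Train one binary logistic classifier by batch gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta -= alpha * X.T @ (sigmoid(X @ theta) - y) / len(y)
    return theta

# Synthetic data (assumed): three clusters, 100 points each.
rng = np.random.default_rng(0)
centers = np.array([[-5.0, 0.0], [0.0, 5.0], [5.0, 0.0]])
y = np.repeat(np.arange(3), 100)
points = centers[y] + rng.normal(size=(300, 2))
X = np.column_stack([np.ones(len(points)), points])

# One classifier per class: "this class" (1) versus "not this class" (0).
thetas = np.array([fit_binary(X, (y == k).astype(float)) for k in range(3)])

# Column k holds classifier k's probability; pick the most probable class.
probs = sigmoid(X @ thetas.T)
y_pred = np.argmax(probs, axis=1)
accuracy = np.mean(y_pred == y)
print(accuracy)
```

The k probabilities come from independent classifiers, so they need not sum to 1; only their ranking is used.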

softmax

To be filled in…

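Overfitting

When too many features are used, the cost function becomes very small on the training set, but the model generalizes poorly to new examples. Regularization can be used to counter overfitting: add $\lambda$ times the squared 2-norm or the 1-norm of $\theta$ (excluding $\theta_0$) to the cost function.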
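
A small sketch of a regularized cost along these lines. It adds `lam` times the penalty directly; the exact scaling is an assumption, since conventions differ (the course, for instance, scales the squared term by $\frac{\lambda}{2m}$). The data and parameter values are invented for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_cost(theta, X, y, lam, penalty="l2"):
    """Cross-entropy cost plus lam times an L2 or L1 penalty on theta[1:]."""
    h = sigmoid(X @ theta)
    data_term = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    weights = theta[1:]  # theta_0 is not penalized
    if penalty == "l2":
        reg = np.sum(weights ** 2)
    elif penalty == "l1":
        reg = np.sum(np.abs(weights))
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return data_term + lam * reg

# Toy data and parameters (assumed).
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0, 0, 1, 1])
theta = np.array([3.0, 2.0])

base = regularized_cost(theta, X, y, lam=0.0)
with_l2 = regularized_cost(theta, X, y, lam=0.1, penalty="l2")
with_l1 = regularized_cost(theta, X, y, lam=0.1, penalty="l1")
print(base, with_l2, with_l1)
```

Only $\theta_1 = 2$ is penalized, so the L2 penalty adds $0.1 \cdot 4$ and the L1 penalty adds $0.1 \cdot 2$; the intercept $\theta_0 = 3$ contributes nothing.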

sklearn logistic

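The regularized cost functions implemented in sklearn are as follows (in this form the labels $y_i$ take values in $\{-1, 1\}$):
L2 penalty: $\min_{w, c} \frac{1}{2}w^Tw + C\sum_{i=1}^n \log(\exp(-y_i(X_i^Tw + c)) + 1)$
L1 penalty: $\min_{w, c} \|w\|_1 + C\sum_{i=1}^n \log(\exp(-y_i(X_i^Tw + c)) + 1)$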

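In the formulas above, the larger C is, the less the penalty on w matters and the weaker the regularization; conversely, the smaller C is, the stronger the regularization.
In LogisticRegression, the parameter C sets the regularization strength (as its inverse), penalty selects whether the 2-norm or the 1-norm is added to the cost function, and tol is the convergence tolerance of the optimizer.
Example code: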

import numpy as np 
import matplotlib.pyplot as plt 

from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

digits  = datasets.load_digits()

X, y = digits.data, digits.target
X = StandardScaler().fit_transform(X)

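# Turn the 10-digit task into a binary one: digits 0-4 vs. 5-9.
# (np.int was removed from NumPy; the built-in int does the same job.)
y = (y > 4).astype(int)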

for i, C in enumerate((100, 1, 0.01)):
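    # liblinear supports both penalties; the current default solver (lbfgs) rejects penalty='l1'.
    clf_l1_LR = LogisticRegression(C=C, penalty='l1', tol=0.01, solver='liblinear')
    clf_l2_LR = LogisticRegression(C=C, penalty='l2', tol=0.01, solver='liblinear')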
    clf_l1_LR.fit(X, y)
    clf_l2_LR.fit(X, y)

    coef_l1_LR = clf_l1_LR.coef_.ravel()
    coef_l2_LR = clf_l2_LR.coef_.ravel()

    sparsity_l1_LR = np.mean(coef_l1_LR == 0) * 100
    sparsity_l2_LR = np.mean(coef_l2_LR == 0) * 100

    print("C = %.2f" % C)
    print("Sparsity with L1 penalty: %.2f%%" % sparsity_l1_LR)
    print("Score with L1 penalty: %.4f" % clf_l1_LR.score(X, y))
    print("Sparsity with L2 penalty: %.2f%%" % sparsity_l2_LR)
    print("Score with L2 penalty: %.4f" % clf_l2_LR.score(X, y))

    l1_plot = plt.subplot(3, 2, 2 * i + 1)
    l2_plot = plt.subplot(3, 2, 2 * (i + 1))

    if i == 0:
        l1_plot.set_title("L1 penalty")
        l2_plot.set_title("L2 penalty")

    l1_plot.imshow(np.abs(coef_l1_LR.reshape(8, 8)), interpolation='nearest',
                   cmap='binary', vmax=1, vmin=0)
    l2_plot.imshow(np.abs(coef_l2_LR.reshape(8, 8)), interpolation='nearest',
                   cmap='binary', vmax=1, vmin=0)
    plt.text(-8, 3, "C = %.2f" % C)

    l1_plot.set_xticks(())
    l1_plot.set_yticks(())
    l2_plot.set_xticks(())
    l2_plot.set_yticks(())

plt.show()

Result:
[Figure: absolute coefficient maps for the L1 and L2 penalties at C = 100, 1, and 0.01]
Multi-class example
Multinomial logistic regression (softmax regression) and 1-vs-rest:

import numpy as np 
import matplotlib.pyplot as plt 
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

centers = [[-5, 0], [0, 1.5], [5, -1]]
X, y = make_blobs(n_samples = 1000, centers = centers, random_state = 40)
transformation = [[0.4, 0.2], [-0.4, 1.2]]
X = np.dot(X, transformation)

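# Note: the multi_class parameter is deprecated as of scikit-learn 1.5;
# this loop assumes a release that still accepts it.
for multi_class in ('multinomial', 'ovr'):
    clf = LogisticRegression(solver='sag', max_iter=100, random_state=42, multi_class=multi_class).fit(X, y)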
    print("training score : %.3f(%s)" % (clf.score(X, y), multi_class))

    h = .02
    x_min, x_max = X[:, 0].min()-1, X[:, 0].max()+1
    y_min, y_max = X[:, 1].min()-1, X[:, 1].max()+1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z =Z.reshape(xx.shape)
    plt.figure()
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
    plt.title("Decision surface of LogisticRegression (%s)" % multi_class)
    plt.axis('tight')

    colors = 'bry'
    for i, color in zip(clf.classes_, colors):
        idx = np.where(y == i)[0]  # row indices of the samples in class i
        # cmap is dropped: it is ignored when c is a single named color
        plt.scatter(X[idx, 0], X[idx, 1], c=color, edgecolor='black', s=20)

    xmin, xmax = plt.xlim()
    ymin, ymax = plt.ylim()
    coef = clf.coef_
    intercept = clf.intercept_

    def plot_hyperplane(c, color):
        def line(x0):
            return (-(x0 * coef[c, 0])- intercept[c])/coef[c,1]
        plt.plot([xmin, xmax], [line(xmin), line(xmax)], ls="--", color=color)

    for i, color in zip(clf.classes_, colors):
        plot_hyperplane(i, color)

plt.show()

[Figure: decision surfaces of LogisticRegression for the multinomial and ovr settings]