A Detailed Look at Logistic Regression

Python implementation of Logistic Regression: hand-written code for Logistic Regression as a binary classifier vs. calling the sklearn library

One sample in the dataset: $X^{(i)} = (x_{0}, x_{1}, x_{2}, ..., x_{n})$, with its corresponding output $Y^{(i)}$, for $i = 1, 2, ..., m$

The parameters to learn are: $\Theta = (\theta_{0}, \theta_{1}, \theta_{2}, ..., \theta_{n})$

$Z=\theta_{0}x_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}+...+\theta_{n}x_{n}$, or in vector form $Z=\Theta^{T}X$

Sigmoid function: $g(Z)=\frac{1}{1+e^{-Z}}$

Prediction function: $h_{\theta}(X)=g(\Theta^{T}X)=\frac{1}{1+e^{-\Theta^{T}X}}$

From the properties of the Sigmoid function, it follows that $0< h_{\theta}(X)< 1$
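The definitions above can be checked numerically. The following is a minimal sketch, not the article's own code: the helper `predict_proba`, the parameter values in `theta`, and the convention that `x[0]` is the constant 1 (so that `theta[0]` plays the role of an intercept) are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    """h_theta(x) = g(theta^T x); x[0] is assumed to be the constant 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

# Hypothetical parameters (theta_0, theta_1), chosen only for illustration
theta = [-1.0, 2.0]

for x1 in (-2.0, 0.0, 0.5, 3.0):
    # Every printed value lies strictly between 0 and 1
    print(x1, predict_proba(theta, [1.0, x1]))
```

For moderate inputs the output stays strictly inside (0, 1). For very large |Z| the floating-point result can round to exactly 0 or 1, which matters once a logarithm is applied to it.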

In terms of probability: $P(Y=1|X;\Theta)=h_{\theta}(X)$ and $P(Y=0|X;\Theta)=1-h_{\theta}(X)$

Loss function: $cost(h_{\theta}(X),Y)=\begin{cases} -\log(h_{\theta}(X)) & \text{if } Y=1 \\ -\log(1-h_{\theta}(X)) & \text{if } Y=0 \end{cases}$, where log is the natural logarithm (base e)

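If the classification is correct, the cost should be small. Take $Y=1$: $h_{\theta}(X)$ is increasing in $Z$, $\log(h_{\theta}(X))$ is increasing in $h_{\theta}(X)$, and so $-\log(h_{\theta}(X))$ is decreasing. The closer the estimate $h_{\theta}(X)$ is to the correct value 1, the smaller the cost (it tends to 0); the further away it is, the larger the cost (it grows without bound as $h_{\theta}(X)\to 0$). The case $Y=0$ is symmetric: $-\log(1-h_{\theta}(X))$ is small when $h_{\theta}(X)$ is close to 0 and large when it is close to 1.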

The two cases can be merged into a single expression: $cost(h_{\theta}(X),Y)=-[Y\log(h_{\theta}(X))+(1-Y)\log(1-h_{\theta}(X))]$
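As a quick sanity check, the piecewise and single-expression forms of the cost can be compared numerically. This is only an illustrative sketch; the function names `cost_piecewise` and `cost_combined` and the sample values of h are assumptions, not part of the original text.

```python
import math

def cost_piecewise(h, y):
    """Cost written as two cases: -log(h) if y == 1, else -log(1 - h)."""
    return -math.log(h) if y == 1 else -math.log(1.0 - h)

def cost_combined(h, y):
    """Cost written as one expression: -[y*log(h) + (1 - y)*log(1 - h)]."""
    return -(y * math.log(h) + (1 - y) * math.log(1.0 - h))

# Hypothetical predicted probabilities, from confident-wrong to confident-right for y = 1
for h in (0.1, 0.5, 0.9):
    for y in (0, 1):
        print(f"h={h}, y={y}: piecewise={cost_piecewise(h, y):.4f}, "
              f"combined={cost_combined(h, y):.4f}")
```

Both forms give the same number for every (h, y) pair, and for y = 1 the cost shrinks as h moves toward 1, matching the argument above.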

Over the whole training set, the total loss is the sum of the per-sample costs: $J(\Theta)=\sum_{i=1}^{m}cost(h_{\theta}(X^{(i)}),Y^{(i)})$

$J(\Theta)=-\sum_{i=1}^{m}\left[Y^{(i)}\log(h_{\theta}(X^{(i)}))+(1-Y^{(i)})\log(1-h_{\theta}(X^{(i)}))\right]$

The next step is to minimize the loss function $J(\Theta)$:

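Gradient descent: $\theta_{j}:=\theta_{j}-\alpha \frac{\partial J(\Theta)}{\partial \theta_{j}}$, where α is the learning rate, i.e. the step size taken on each update. For the $J(\Theta)$ above, the partial derivative works out to $\frac{\partial J(\Theta)}{\partial \theta_{j}}=\sum_{i=1}^{m}(h_{\theta}(X^{(i)})-Y^{(i)})x_{j}^{(i)}$. The code below additionally divides this gradient by m, which amounts to minimizing the average loss $J(\Theta)/m$ and only rescales α.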
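One way to gain confidence in the gradient used by the update rule is to compare the closed form $\sum_{i}(h_{\theta}(X^{(i)})-Y^{(i)})x_{j}^{(i)}$ with a finite-difference approximation of $J(\Theta)$. The sketch below does this in plain Python; the toy data, the starting `theta`, and all function names are hypothetical, and `x[0] = 1` is assumed for every sample.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def total_cost(theta, X, Y):
    """J(theta): cross-entropy summed over all samples (no 1/m factor)."""
    J = 0.0
    for x, y in zip(X, Y):
        h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        J -= y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return J

def analytic_gradient(theta, X, Y):
    """dJ/dtheta_j = sum_i (h(x_i) - y_i) * x_ij."""
    grad = [0.0] * len(theta)
    for x, y in zip(X, Y):
        h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        for j, xj in enumerate(x):
            grad[j] += (h - y) * xj
    return grad

def numeric_gradient(theta, X, Y, eps=1e-6):
    """Central finite differences of J, one coordinate at a time."""
    grad = []
    for j in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[j] += eps
        minus[j] -= eps
        grad.append((total_cost(plus, X, Y) - total_cost(minus, X, Y)) / (2 * eps))
    return grad

# Hypothetical toy data; the first component of each sample is the constant 1
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.1]]
Y = [1, 0, 1, 0]
theta = [0.3, -0.7]

print("analytic:", analytic_gradient(theta, X, Y))
print("numeric: ", numeric_gradient(theta, X, Y))

# One gradient-descent step with a hypothetical learning rate
alpha = 0.1
step = [t - alpha * g for t, g in zip(theta, analytic_gradient(theta, X, Y))]
print("J before:", total_cost(theta, X, Y), "J after:", total_cost(step, X, Y))
```

The two gradients should agree to several decimal places, and a single small step along the negative gradient should lower J on this data.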

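import numpy as np

def sigmoid(x):
    # Logistic function g(x) = 1 / (1 + e^(-x)), applied element-wise
    return 1.0 / (1 + np.exp(-x))

def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(numIterations):
        hypothesis = sigmoid(np.dot(x, theta))   # h_theta(X) for every sample
        loss = hypothesis - y                    # prediction error h - y
        gradient = np.dot(xTrans, loss) / m      # gradient averaged over the m samples
        theta = theta - alpha * gradient         # gradient-descent update
    return theta
# x: input samples, one row per sample
# y: output labels
# theta: θ, the parameters to learn
# alpha: α, the learning rate
# m: number of samples
# numIterations: number of descent iterations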
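To show how the function might be called, here is a small usage sketch. It repeats `sigmoid` and `gradientDescent` so that it runs on its own. The toy data, the constant column `x_0 = 1`, the learning rate, the iteration count, and the 0.5 decision threshold are all assumptions chosen for illustration, not values from the original article.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))

def gradientDescent(x, y, theta, alpha, m, numIterations):
    # Same update rule as the function above, repeated so this sketch is self-contained
    xTrans = x.transpose()
    for i in range(numIterations):
        hypothesis = sigmoid(np.dot(x, theta))
        loss = hypothesis - y
        gradient = np.dot(xTrans, loss) / m
        theta = theta - alpha * gradient
    return theta

# Hypothetical 1-D toy data. The labels at -0.5 and 0.5 overlap on purpose,
# so the classes are not perfectly separable and theta stays finite.
feature = np.array([-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0])

# Prepend the constant column x_0 = 1 so that theta[0] acts as the intercept
x = np.column_stack([np.ones_like(feature), feature])
m = x.shape[0]

theta = gradientDescent(x, y, np.zeros(x.shape[1]), alpha=0.1, m=m, numIterations=2000)

# Classify as 1 when the predicted probability is at least 0.5 (assumed threshold)
predictions = (sigmoid(np.dot(x, theta)) >= 0.5).astype(float)
accuracy = float(np.mean(predictions == y))
print("theta:", theta)
print("accuracy:", accuracy)
```

Because two of the eight labels overlap, no linear boundary classifies every point correctly here; the learned slope `theta[1]` is positive, so larger feature values map to higher predicted probability.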
