03 Logistic Regression

Binary Classification

Definitions

  1. Sigmoid function (logistic function)
    \[ h_\theta(x) = g(\theta^Tx) \]
    \[ z = \theta^Tx \]
    \[ 0 \le g(z) = \frac{1}{1 + e^{-z}} \le 1 \]

  2. \( h_\theta(x) \) is the estimated probability that the output is 1.

  3. \( h_\theta(x) = P(y = 1 | x; \theta) \)

  4. \( P(y = 0 | x; \theta) + P(y = 1 | x; \theta) = 1 \)

  5. The decision boundary is set at 0.5: \( h_\theta(x) = 0.5 \iff \theta^Tx = 0 \). A short Octave sketch of the hypothesis and this decision rule follows this list.
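
A minimal Octave sketch of the hypothesis and the 0.5 decision rule (the helper name sigmoid, the design matrix X, and the parameter vector theta are illustrative assumptions, not code from these notes):

%sigmoid: elementwise logistic function, 0 <= g(z) <= 1
function g = sigmoid(z)
    g = 1 ./ (1 + exp(-z));
end

h = sigmoid(X * theta);      %m x 1 vector of P(y = 1 | x; theta)
predictions = (h >= 0.5);    %predict y = 1 exactly when theta' * x >= 0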

Cost Function

  • \[J(\theta) = \dfrac{1}{m} \sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}),y^{(i)})\]
  • \[ \mathrm{Cost}(h_\theta(x),y) = -\log(h_\theta(x)) ,\text{if y = 1} \]
  • \[ \mathrm{Cost}(h_\theta(x),y) = -\log(1-h_\theta(x)), \text{if y = 0} \]
  • \[ \mathrm{Cost}(h_\theta(x), y) = -y\log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x)) \] (a vectorized Octave sketch follows this list)
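
A vectorized Octave sketch of this combined cost (X, y, m, and the sigmoid helper above are assumed names; an illustration, not code from the notes):

m = length(y);              %number of training examples
h = sigmoid(X * theta);     %m x 1 vector of predictions
J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));   %scalar cost J(theta)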

Algorithm

\(\begin{align*} & Repeat \; \lbrace \newline & \; \theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} \newline & \rbrace \end{align*}\)

  • Although the update rule has the same form as the one for linear regression, the hypothesis \(h_\theta(x)\) is defined differently, so the two algorithms are not the same (a vectorized Octave sketch of the update follows this list).
  • Feature scaling can also be applied to logistic regression to speed up convergence.
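
A vectorized sketch of this update loop in Octave (alpha, num_iters, X, y, and sigmoid are assumed names, shown only as an illustration of the rule above):

m = length(y);
for iter = 1:num_iters
    h = sigmoid(X * theta);            %current predictions h_theta(x)
    grad = (1 / m) * X' * (h - y);     %(n+1) x 1 gradient vector
    theta = theta - alpha * grad;      %simultaneous update of every theta_j
end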

  1. Algorithms that can be used to compute \(\theta\):

    • Gradient descent
    • Conjugate gradient
    • BFGS (a quasi-Newton / variable metric method)
    • L-BFGS (limited-memory BFGS)
    • Characteristics of the last three algorithms:

    Advantages:
    a. No need to manually pick \(\alpha\)
    b. Often faster than gradient descent
    Disadvantages:
    More complex

  2. Octave optimization example (fminunc)

%exitFlag: 1 means the algorithm converged
%fminunc requires theta to be at least 2-dimensional (theta in R^d, d >= 2)
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);
[optTheta, functionVal, exitFlag] ...
    = fminunc(@costFunction, initialTheta, options);
    
%costFunction skeleton (a filled-in sketch follows below):
function [jVal, gradient] = costFunction(theta)
    jVal = ...                  %value of the cost function J(theta)
    gradient = zeros(n, 1);     %gradient vector, n = length(theta)

    gradient(1) = ...           %partial derivative of J with respect to theta(1)
    ...
    gradient(n) = ...           %partial derivative of J with respect to theta(n)
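
One possible way to fill in the skeleton above, passing X and y through an anonymous function; the names, the extra arguments, and the unregularized cost are assumptions of this sketch rather than part of the original notes:

function [jVal, gradient] = costFunction(theta, X, y)
    %X: m x (n+1) design matrix, y: m x 1 vector of 0/1 labels
    m = length(y);
    h = sigmoid(X * theta);                                   %predictions
    jVal = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));  %cost J(theta)
    gradient = (1 / m) * X' * (h - y);                        %(n+1) x 1 gradient
end

%wrap the extra arguments so fminunc still sees a function of theta only
[optTheta, functionVal, exitFlag] ...
    = fminunc(@(t) costFunction(t, X, y), initialTheta, options);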

Multi-class classification

One-vs-all (one-vs-rest)

  • Train a logistic regression classifier \(h_\theta^{(i)}(x)\) for each class \(i\) to predict the probability that \(y = i\).
  • On a new input \(x\), to make a prediction, pick the class \(i\) that maximizes \(h_\theta^{(i)}(x)\), i.e. predict \(\arg\max\limits_i h_\theta^{(i)}(x)\). A short Octave sketch of this prediction step follows this list.
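
A minimal Octave sketch of this prediction step, where all_theta is an assumed K x (n+1) matrix whose i-th row holds the parameters of classifier \(h_\theta^{(i)}\):

probs = sigmoid(X * all_theta');         %m x K matrix, column i holds h_theta^(i)(x)
[~, predictions] = max(probs, [], 2);    %for each example, pick the class with the highest probability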

Source: www.cnblogs.com/QQ-1615160629/p/03-Logistic-Regression.html