Machine Learning - Andrew Ng Study Notes (2)

Linear regression with one variable

Model Representation

Supervised Learning

Given the "right answer" for each example in the data, i.e. for every training example we are told the correct output.

Regression Problem

Predict real-valued output: based on previous data, we predict a continuous-valued output.

Training set

Notation

  1. m = number of training examples
  2. x = "input" variable / feature
  3. y = "output" variable / "target" variable
  4. (x, y) = one training example
  5. \((x^i, y^i)\) = the i-th training example

Hypothesis (a function)

  1. Training set -> Learning Algorithm -> h. The training set is fed to the learning algorithm, and the learning algorithm outputs a function h (the hypothesis).

  2. x -> h -> y. The hypothesis h is a map from \(x\) to \(y\), i.e. a function that takes x and produces a prediction for y.

  3. How do we represent h?

    \(h_{\theta}(x) = \theta_0 + \theta_1 \times x\).

Summary

Role of the training set and the hypothesis: predict y as a linear function of x.
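As a small illustration (my own sketch, not from the lecture), the hypothesis can be written as a plain Python function; the parameter values below are arbitrary placeholders:

```python
def h(x, theta0, theta1):
    """Hypothesis for univariate linear regression: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# With theta0 = 1.0 and theta1 = 0.5 (illustrative values), x = 2 gives y = 2.0.
print(h(2, 1.0, 0.5))  # 2.0
```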

Cost function

How do we fit the best possible straight line to our data?

Idea

Choose \(\theta_0, \theta_1\) so that \(h_{\theta}(x)\) is close to \(y\) for our training examples (\(x, y\)).

Squared error function

\(J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^m(h_{\theta}(x^i) - y^i)^2\)

Goal: find \(\theta_0, \theta_1\) so that \(J(\theta_0, \theta_1)\) is minimized. \(J(\theta_0, \theta_1)\) is called the cost function.
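The squared error cost translates directly into code. A minimal sketch, assuming the training set is given as two equal-length Python lists x and y:

```python
def compute_cost(x, y, theta0, theta1):
    """J(theta0, theta1) = 1/(2m) * sum over i of (h_theta(x^i) - y^i)^2."""
    m = len(x)
    total = 0.0
    for xi, yi in zip(x, y):
        prediction = theta0 + theta1 * xi  # h_theta(x^i)
        total += (prediction - yi) ** 2
    return total / (2 * m)
```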

Cost function intuition

Review

  1. Hypothesis: \(h_{\theta}(x) = \theta_0 + \theta_1 \times x\)
  2. Parameters: \(\theta_0, \theta_1\)
  3. Cost Function: \(J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^m(h_{\theta}(x^i) - y^i)^2\)
  4. Goal: find \(\theta_0,\theta_1\) to minimize \(J(\theta_0, \theta_1)\)

Simplified

\(\theta_0 = 0 \rightarrow h_{\theta}(x) = \theta_1x\)

\(J(\theta_1) = \frac{1}{2m} \sum_{i=1}^m(h_{\theta}(x^i) - y^i)^2\)

Goal: find \(\theta_1\) to minimize \(J(\theta_1)\)

Example: for the sample points (1, 1), (2, 2), (3, 3), plot the hypothesis and the cost function.

(figure: plot of the hypothesis and the cost function for these sample points)
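For these sample points the simplified cost can also be checked numerically; this sketch reuses the compute_cost helper defined above with \(\theta_0 = 0\):

```python
x = [1, 2, 3]
y = [1, 2, 3]

# J(theta1) for a few values of theta1; the minimum is at theta1 = 1, where J = 0.
for theta1 in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(theta1, compute_cost(x, y, 0.0, theta1))
```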

Gradient descent

Background

Have some function \(J(\theta_0, \theta_1, \theta_2, \ldots, \theta_n)\)

Want to find \(\theta_0, \theta_1, \theta_2, \ldots, \theta_n\) to minimize \(J(\theta_0, \theta_1, \theta_2, \ldots, \theta_n)\)

Simplify -> \(\theta_0, \theta_1\)

Outline

  1. Start with some \(\theta_0, \theta_1\) (e.g. \(\theta_0 = 0, \theta_1 = 0\)). Initialization.
  2. Keep changing \(\theta_0, \theta_1\) to reduce \(J(\theta_0, \theta_1)\) until we hopefully end up at a minimum. Keep searching for a better solution until a local optimum is reached (starting from different initial points may lead to different local optima).

(figure: gradient descent on the surface of \(J(\theta_0, \theta_1)\))

Gradient descent algorithm

  1. repeat until convergence {

    \(\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta_0, \theta_1)\) (for \(j= 0\) and \(j = 1\))

    }

    Meaning of the symbols: \(\alpha\) is the learning rate; it controls how big a step we take when updating the parameter \(\theta_j\).

  2. Correct: simultaneous update, i.e. the right way to update both parameters at the same time (see the sketch after this list)

    \(temp0 := \theta_0 - \alpha\frac{\partial}{\partial\theta_0}J(\theta_0, \theta_1)\)

    \(temp1 := \theta_1 - \alpha\frac{\partial}{\partial\theta_1}J(\theta_0, \theta_1)\)

    \(\theta_0 := temp0\)

    \(\theta_1 := temp1\)

  3. Incorrect: not a simultaneous update

    \(temp0 := \theta_0 - \alpha\frac{\partial}{\partial\theta_0}J(\theta_0, \theta_1)\)

    \(\theta_0 := temp0\)

    \(temp1 := \theta_1 - \alpha\frac{\partial}{\partial\theta_1}J(\theta_0, \theta_1)\)

    \(\theta_1 := temp1\)
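A minimal sketch of the simultaneous update in Python, assuming a hypothetical helper grad_j(j, theta0, theta1) that returns \(\frac{\partial}{\partial\theta_j}J(\theta_0, \theta_1)\) (the concrete expressions for linear regression are derived below):

```python
def gradient_descent_step(theta0, theta1, alpha, grad_j):
    """One gradient descent step with a simultaneous update of theta0 and theta1.

    grad_j(j, theta0, theta1) is assumed to return dJ/dtheta_j.
    """
    # Compute both temporaries from the *old* parameter values ...
    temp0 = theta0 - alpha * grad_j(0, theta0, theta1)
    temp1 = theta1 - alpha * grad_j(1, theta0, theta1)
    # ... and only then assign them, so neither update sees a half-updated value.
    return temp0, temp1
```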

Gradient descent intuition

Meaning of the derivative term

  1. When \(\frac{\partial}{\partial\theta_1}J(\theta_0, \theta_1) > 0\) (the function is increasing at the current point), since \(\alpha > 0\), the term subtracted in \(\theta_1 := \theta_1 - \alpha\frac{\partial}{\partial\theta_1}J(\theta_0, \theta_1)\) is positive, so \(\theta_1\) decreases, i.e. it moves toward the minimum.
  2. When \(\frac{\partial}{\partial\theta_1}J(\theta_0, \theta_1) < 0\) (the function is decreasing at the current point), since \(\alpha > 0\), the term subtracted is negative, so \(\theta_1\) increases, again moving toward the minimum.

Learning rate \(\alpha\)

  1. If \(\alpha\) is too small, gradient descent can be slow.
  2. If \(\alpha\) is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge. (See the sketch after this list.)
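A small sketch (my own example, not from the lecture) of both failure modes on the one-dimensional cost \(J(\theta_1) = \theta_1^2\), whose derivative is \(2\theta_1\); the step count and \(\alpha\) values are arbitrary:

```python
def run(alpha, steps=10, theta1=1.0):
    """Gradient descent on J(theta1) = theta1^2 with a fixed learning rate."""
    for _ in range(steps):
        theta1 = theta1 - alpha * 2 * theta1  # derivative of theta1^2 is 2*theta1
    return theta1

print(run(alpha=0.01))  # too small: still far from the minimum at 0 after 10 steps
print(run(alpha=0.1))   # reasonable: close to 0
print(run(alpha=1.5))   # too large: overshoots every step and diverges
```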

Things to think about

  1. Suppose \(\theta_1\) is initialized at a local minimum. The slope of the tangent there is 0, so the derivative term is 0, and the gradient descent update leaves the parameter unchanged: \(\theta_1 = \theta_1\).

  2. Gradient descent can converge to a local minimum, even with the learning rate \(\alpha\) fixed.

    As we approach a local minimum, gradient descent will automatically take smaller steps (because the slope itself shrinks), so there is no need to decrease \(\alpha\) over time.

Gradient descent for linear regression

Simplifying the formula

$\frac{\partial}{\partial\theta_j}J(\theta_0, \theta_1) = \frac{\partial}{\partial\theta_j} \frac{1}{2m} \sum_{i = 1}^m(h_\theta(x^i) - y^i)^2 = \frac{\partial}{\partial\theta_j} \frac{1}{2m} \sum_{i = 1}^m(\theta_0 + \theta_1x^i - y^i)^2$

Taking the partial derivative with respect to \(\theta_0\) and \(\theta_1\) separately (a numerical sanity check follows the list):

  1. \(j = 0\) : $ \frac{\partial}{\partial\theta_0}J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^m(h_{\theta}(x^i) - y^i) $
  2. \(j = 1\) : $ \frac{\partial}{\partial\theta_1}J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^m(h_{\theta}(x^i) - y^i) \times x^i $
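One way to sanity-check the two formulas (my addition, not part of the original notes) is to compare them against central finite differences of \(J\), reusing compute_cost from earlier; the test point (0.5, 0.5) is arbitrary:

```python
def analytic_gradients(x, y, theta0, theta1):
    """Partial derivatives of J with respect to theta0 and theta1 (formulas above)."""
    m = len(x)
    errors = [(theta0 + theta1 * xi) - yi for xi, yi in zip(x, y)]
    d_theta0 = sum(errors) / m
    d_theta1 = sum(e * xi for e, xi in zip(errors, x)) / m
    return d_theta0, d_theta1

def numeric_gradients(f, a, b, eps=1e-6):
    """Central finite differences of f(a, b) in both arguments."""
    da = (f(a + eps, b) - f(a - eps, b)) / (2 * eps)
    db = (f(a, b + eps) - f(a, b - eps)) / (2 * eps)
    return da, db

x, y = [1, 2, 3], [1, 2, 3]
print(analytic_gradients(x, y, 0.5, 0.5))
print(numeric_gradients(lambda a, b: compute_cost(x, y, a, b), 0.5, 0.5))
```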

Plugging these derivatives back into the gradient descent algorithm above:

repeat until convergence {

    $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^m(h_{\theta}(x^i) - y^i)$

    $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^m(h_{\theta}(x^i) - y^i) \times x^i$

}

"Batch" Gradient Descent

"Batch": each step of gradient descent uses all the training examples, i.e. every iteration processes the entire training set.

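Putting the pieces together, a minimal sketch of batch gradient descent for univariate linear regression; the learning rate, iteration count, and sample data are arbitrary illustrative choices:

```python
def batch_gradient_descent(x, y, alpha=0.1, iterations=1000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent on the squared error cost."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # "Batch": every step uses all m training examples.
        errors = [(theta0 + theta1 * xi) - yi for xi, yi in zip(x, y)]
        grad0 = sum(errors) / m
        grad1 = sum(e * xi for e, xi in zip(errors, x)) / m
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Data generated from y = 1 + 2x, so the fit should come out close to (1, 2).
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
print(batch_gradient_descent(x, y))
```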