4.10 Day 2 of learning artificial intelligence: linear regression, gradient descent, normal equations

notes

General procedure for solving linear regression problems

1 Collect data (x1, x2, ..., xn, y): n feature variables xi and one target value y
2 Design the regression equation h = sum_i theta_i * xi
3 Choose a function that measures the error, using least squares: J(theta) = 1/2 * sum (h(x) - y)^2
4 Find a set of theta that makes J as small as possible

One of the ways to implement step 4 is gradient descent

1 Initialize theta
2 Adjust theta: subtract the gradient multiplied by the learning rate

Two strategies for adjusting theta according to the samples (a short sketch of both follows):
1 Each update computes J from all the samples: batch gradient descent
2 Split the samples into m groups (one example per group) and use one group per update of theta: stochastic gradient descent
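
As a rough sketch of the two update strategies (the toy data, learning rate, and NumPy code are illustrative assumptions, not part of the notes):

import numpy as np

# Toy data: 4 examples, 3 features each (first column is the intercept x0 = 1).
X = np.array([[1.0, 2104.0, 3.0],
              [1.0, 1600.0, 3.0],
              [1.0, 2400.0, 3.0],
              [1.0, 1416.0, 2.0]])
y = np.array([400.0, 330.0, 369.0, 232.0])
theta = np.zeros(X.shape[1])
alpha = 1e-8   # learning rate (arbitrary small value for this sketch)

# Strategy 1: batch gradient descent -- one update uses every sample.
grad = X.T @ (X @ theta - y)          # sum over all samples of (h(x) - y) * x
theta_batch = theta - alpha * grad

# Strategy 2: stochastic gradient descent -- one update per sample.
theta_sgd = theta.copy()
for j in range(len(y)):
    err = X[j] @ theta_sgd - y[j]     # h(x(j)) - y(j)
    theta_sgd = theta_sgd - alpha * err * X[j]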

Method 2 for implementing step 4: the normal equations

Record

1. Linear regression

Supervised learning: autonomous driving
First a human drives while the AI learns (from the road conditions and how the human turns the steering wheel), then the AI drives.
The first supervised learning algorithm
House price prediction
    Data: the size of the house and its price (plus the number of bedrooms)
    Notation
        m: the number of training examples
        x: the input features; here x1 is the size of the house and x2 is the number of bedrooms
        y: the output variable, i.e. the target variable
        (x, y): one training example
        (x(i), y(i)): the i-th training example
        n: the number of input features
        theta: the parameters (coefficients), all real numbers
    Design flow of supervised learning
        obtain a training set
        choose a learning algorithm
        output a hypothesis function h
        use h to predict the output for new data
    We assume h(x) = theta0*x0 + theta1*x1 + theta2*x2   // taking x0 = 1
        h(x) = sum_i theta_i * xi
    What we need to do is pick theta so the predictions are as accurate as possible, i.e. make J(theta) = 1/2 * sum over the training examples of (h(x) - y)^2 as small as possible (a small sketch of h and J follows below)
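
A minimal NumPy sketch of this hypothesis and cost function (the data values are made up for illustration):

import numpy as np

def h(theta, x):
    # h(x) = sum_i theta_i * x_i  (x is assumed to include x0 = 1)
    return theta @ x

def J(theta, X, y):
    # J(theta) = 1/2 * sum over the training examples of (h(x) - y)^2
    residuals = X @ theta - y
    return 0.5 * residuals @ residuals

# Two made-up examples: [x0, size, bedrooms] and their prices.
X = np.array([[1.0, 2104.0, 3.0],
              [1.0, 1600.0, 3.0]])
y = np.array([400.0, 330.0])
print(J(np.zeros(3), X, y))   # cost with all parameters set to zero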

The first way to choose theta: a search algorithm

start with some value of the parameter vector theta (it can be the zero vector)
then keep changing the parameter vector theta
    to reduce J of theta a little bit

2. Gradient descent can implement the above algorithm

batch gradient descent
    on every step of gradient descent you look at your entire
    training set

This is a 3D surface, like a hill in some park.
So imagine you're actually standing physically at the position of that star,
or that cross. Imagine you can stand on that hill, look all 360 degrees
around you, and ask: if I were to take a small step, what would allow me to
go downhill the most?

If you try again with a new starting point,
you may actually end up at a completely different local optimum.

gradient descent can sometimes
    depend on where you initialize your parameters

theta_i := theta_i - alpha * (partial derivative of J(theta) with respect to theta_i)
theta_i := theta_i - alpha * (h(x) - y) * x_i
// alpha is a parameter of the algorithm called the learning rate;
// it controls how large a step you take
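
A minimal sketch of batch gradient descent using this rule, updating the whole parameter vector at once (the function name, learning rate, and iteration count are my own choices for illustration):

import numpy as np

def batch_gradient_descent(X, y, alpha=1e-8, iters=1000):
    # Each step uses the full training set:
    # theta_i := theta_i - alpha * sum_j (h(x(j)) - y(j)) * x_i(j)
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ theta - y)   # vector of partial derivatives of J
        theta = theta - alpha * grad
    return theta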

Question: is the direction of steepest descent given by the partial derivatives (the gradient)? What does that mean?

stochastic (incremental) gradient descent

Repeat until convergence{
    For j=1 to m{                // loop over the training examples
        for i=0 to n{            // update every parameter theta_i
            theta_i := theta_i - alpha*(h(x(j)) - y(j))*x_i(j)
        }
    }
}
In order to start learning, in order to start modifying the parameters,
you only need to look at your first training example:
you look at your first training example
and perform an update using the derivative of the error with respect to
just that first training example,
and then you look at your second training example, and so on.

For large data sets, stochastic gradient descent is often much faster.
What happens, though, is that stochastic gradient descent won't actually
converge to the global minimum exactly; it oscillates around it instead.
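
A corresponding NumPy sketch of stochastic (incremental) gradient descent, assuming the same data layout as before (one row of X per training example; the epoch count and learning rate are arbitrary):

import numpy as np

def stochastic_gradient_descent(X, y, alpha=1e-8, epochs=10):
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):             # "repeat until convergence"
        for j in range(len(y)):         # one parameter update per training example
            err = X[j] @ theta - y[j]   # h(x(j)) - y(j)
            theta = theta - alpha * err * X[j]
    return theta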

3. The normal equations

Notation
    J is a function of the vector of parameters theta
    define the derivative of J with respect to theta as an (n+1)-dimensional vector whose i-th entry is the partial derivative of J with respect to theta_i

theta := theta - alpha * (derivative of J with respect to theta)

Suppose you have a function f
f: R^(m*n) -> R,  with A belonging to R^(m*n)
the derivative of f with respect to A is an m*n matrix whose (i, j) entry is the partial derivative of f with respect to A_ij
if A is an n by n matrix,
    define the trace of A, tr A, to be the sum of A's diagonal elements

Facts
tr AB = tr BA
tr ABC = tr CAB = tr BCA
if f(A) = tr AB, then the derivative with respect to the matrix A
    of this function, trace AB,
is B transposed: d/dA tr AB = B^T
tr A = tr A^T
if a is a real number, tr a = a
d/dA tr(A*B*A^T*C) = C*A*B + C^T*A*B^T   // A^T denotes the transpose of A
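
These trace identities can be spot-checked numerically; a small NumPy sketch (random matrices and a single finite-difference check, purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))          # tr AB = tr BA
print(np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B)))  # tr ABC = tr CAB

# d/dA tr(AB) = B^T: perturb one entry of A and compare with B^T.
eps = 1e-6
E = np.zeros_like(A)
E[0, 1] = eps
fd = (np.trace((A + E) @ B) - np.trace(A @ B)) / eps
print(np.isclose(fd, B.T[0, 1]))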


X = [x(1)^T; x(2)^T; ...; x(m)^T]   // the design matrix: row i is x(i)^T, where ^T denotes transpose and x(i) is the i-th training example
X*theta = [h(x(1)), h(x(2)), ..., h(x(m))]^T
Y = [y(1), y(2), ..., y(m)]^T

Take the inner product of (X*theta - Y) with itself:

1/2 * (X*theta - Y)^T * (X*theta - Y) = J(theta)

Set the derivative of J(theta) with respect to theta to 0; this gives the normal equations X^T*X*theta = X^T*Y, so theta = (X^T*X)^(-1)*X^T*Y.
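
In NumPy, solving this linear system directly might look like the following (the data is the same made-up example as earlier; np.linalg.solve is used rather than forming the inverse explicitly):

import numpy as np

# Made-up design matrix (first column x0 = 1) and target vector.
X = np.array([[1.0, 2104.0, 3.0],
              [1.0, 1600.0, 3.0],
              [1.0, 2400.0, 3.0],
              [1.0, 1416.0, 2.0]])
Y = np.array([400.0, 330.0, 369.0, 232.0])

# Setting the derivative of J to zero gives X^T X theta = X^T Y.
theta = np.linalg.solve(X.T @ X, X.T @ Y)
print(theta)
print(0.5 * np.sum((X @ theta - Y) ** 2))   # J(theta) at the minimizer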

