Machine Learning (2): Regression Algorithms


What is a regression algorithm?

  • A supervised learning algorithm.
  • It models the relationship between the explanatory variables (x) and the observed values (the dependent variable y).
  • The final output is a continuous value; the input (the attribute values) is a d-dimensional numeric vector.

Linear Regression


  • The goal is to compute the parameter vector θ, i.e., to select the optimal θ for the model equation.

  • The model can be written as

    $y^{(i)} = \theta^T x^{(i)} + \varepsilon^{(i)}$

    where $\varepsilon^{(i)}$ is the error term; the errors are independent and identically distributed, following a Gaussian distribution with mean 0 and a fixed variance $\sigma^2$, that is

    $p(\varepsilon^{(i)}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(\varepsilon^{(i)})^2}{2\sigma^2}\right)$

Likelihood function / log-likelihood function

  • The likelihood function
    (for a refresher on the concept of a likelihood function, see: https://segmentfault.com/a/1190000014373677?utm_source=channel-hottest)

    $L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)};\theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right)$

    Note: a normal distribution is used inside the likelihood function. In practical problems, many random phenomena can be seen as the combined effect of a large number of independent factors, and such phenomena tend to follow a normal distribution.

  • Log-likelihood function

    $\ell(\theta) = \log L(\theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^{m} (y^{(i)} - \theta^T x^{(i)})^2$

The objective function / loss function

  • The loss function measures the relationship between the actual and predicted values. The value of θ is determined by minimizing the loss function, which is obtained from the log-likelihood function above:

    $J(\theta) = \frac{1}{2} \sum_{i=1}^{m} (y^{(i)} - \theta^T x^{(i)})^2$

  • Taking the derivative of the loss function and setting it equal to zero gives (a NumPy sketch of this closed-form solution follows below):

    $\theta = (X^T X)^{-1} X^T Y$

    Note: X is the matrix whose rows are the samples x^(i), and Y is the vector of the y^(i); this solution requires the matrix X^T X to be invertible.
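
A minimal NumPy sketch of this closed-form (least squares) solution, on a small synthetic dataset (the data and values below are purely illustrative):

```python
import numpy as np

# Synthetic data: y = 3 + 2*x plus Gaussian noise (illustrative values only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
y = 3 + 2 * x[:, 0] + rng.normal(0, 1, size=100)

# Design matrix X with a leading column of ones for the intercept term
X = np.hstack([np.ones((x.shape[0], 1)), x])

# Normal equation: theta = (X^T X)^{-1} X^T y  (requires X^T X to be invertible)
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)  # should be close to [3, 2]
```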

Other common loss functions

(Figure: a table of other common loss functions; image not available.)

Locally weighted regression - loss function

    $J(\theta) = \frac{1}{2} \sum_{i=1}^{m} w^{(i)} \left(\theta^T x^{(i)} - y^{(i)}\right)^2$

  • w^(i) is a weight: each point in the dataset is weighted according to its distance from the point to be predicted. The farther a point is from the prediction point, the smaller its weight; the closer it is, the larger its weight. A commonly used choice is:

    $w^{(i)} = \exp\!\left(-\frac{(x^{(i)} - x)^2}{2k^2}\right)$

    This is called an exponential decay function, where k is the bandwidth parameter controlling how quickly the weight falls off with distance (see the sketch below).
    Note: this approach is mainly used when the similarity between samples needs to be taken into account.
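
A minimal sketch of locally weighted regression under the formulas above; the function and parameter names (`lwr_predict`, `x_query`, `k`) are illustrative, not from the original post:

```python
import numpy as np

def lwr_predict(X, y, x_query, k=1.0):
    """Predict y at x_query with locally weighted linear regression.

    X is the design matrix (first column of ones), y the targets,
    k the bandwidth of the exponential decay weights.
    """
    # Weight each training point by its distance to the query point
    diffs = X - x_query                       # row-wise differences
    w = np.exp(-np.sum(diffs ** 2, axis=1) / (2 * k ** 2))
    W = np.diag(w)                            # m x m diagonal weight matrix
    # Weighted normal equation: theta = (X^T W X)^{-1} X^T W y
    theta = np.linalg.pinv(X.T @ W @ X) @ X.T @ W @ y
    return x_query @ theta
```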

Overfitting in linear regression

  • To prevent overfitting, that is, to keep the θ values from becoming too large or too small in the sample space, a sum-of-squares penalty can be added on top of the objective function:

    $J(\theta) = \frac{1}{2} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^{n} \theta_j^2$

    where $\lambda \sum_{j=1}^{n} \theta_j^2$ is the regularization term (norm); this particular regularizer is called the L2-norm.

Ridge regression

  • A linear regression model that uses L2 regularization is called Ridge regression:

    $J(\theta) = \frac{1}{2} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^{n} \theta_j^2, \quad \lambda > 0$

    The Ridge model has relatively high accuracy, robustness, and stability. A scikit-learn sketch follows below.
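
A minimal scikit-learn sketch of Ridge regression; `alpha` plays the role of λ here, and the data is purely illustrative:

```python
import numpy as np
from sklearn.linear_model import Ridge

X = np.random.rand(100, 3)                 # illustrative features
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * np.random.randn(100)

model = Ridge(alpha=1.0)                   # alpha is the L2 penalty strength (lambda)
model.fit(X, y)
print(model.coef_, model.intercept_)
```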

LASSO regression

  • A linear regression model that uses L1 regularization is called LASSO regression (Least Absolute Shrinkage and Selection Operator):

    $J(\theta) = \frac{1}{2} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^{n} |\theta_j|, \quad \lambda > 0$

    The LASSO model is relatively fast to solve and tends to produce sparse solutions, i.e., coefficients that are exactly 0. A sketch follows below.
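
A minimal scikit-learn sketch of LASSO; with a large enough `alpha` (λ), some coefficients shrink exactly to 0 (the data is illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

X = np.random.rand(100, 5)
# Only the first two features actually matter in this illustrative data
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.05 * np.random.randn(100)

model = Lasso(alpha=0.1)                   # alpha is the L1 penalty strength (lambda)
model.fit(X, y)
print(model.coef_)                         # irrelevant features tend to be exactly 0
```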

Elastic Net

  • A linear regression model that uses both L1 and L2 regularization is called the Elastic Net:

    $J(\theta) = \frac{1}{2} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \left( p \sum_{j=1}^{n} |\theta_j| + (1 - p) \sum_{j=1}^{n} \theta_j^2 \right), \quad \lambda > 0,\ p \in [0, 1]$

    When both stability and solving speed matter, use the Elastic Net. A sketch follows below.
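
A minimal scikit-learn sketch of the Elastic Net; `l1_ratio` corresponds to the mixing parameter p above and `alpha` to λ (illustrative data):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

X = np.random.rand(100, 5)
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.05 * np.random.randn(100)

# alpha ~ lambda, l1_ratio ~ p: l1_ratio=1 is pure LASSO, l1_ratio=0 is pure Ridge
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y)
print(model.coef_, model.intercept_)
```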

Gradient descent

  • Objective function to solve for θ:

    $J(\theta) = \frac{1}{2} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$

  • Initialize θ (randomly; it can also be initialized to 0).
  • Iterate along the negative gradient direction, so that each updated θ makes J(θ) smaller:

    $\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}$

    α: the learning rate (step size).

Batch gradient descent (BGD)

  • With m samples, each BGD iteration updates the parameters once, using all m samples (see the sketch below).
  • BGD is guaranteed to reach a local optimum (for the linear regression model it reaches the global optimum).
  • It is relatively slow to compute.

    $\theta_j := \theta_j + \alpha \sum_{i=1}^{m} (y^{(i)} - h_\theta(x^{(i)})) x_j^{(i)}$
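
A minimal NumPy sketch of batch gradient descent for linear regression under the update rule above; the learning rate and iteration count are illustrative:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, n_iters=1000):
    """Batch gradient descent for linear regression (X includes a bias column)."""
    m, n = X.shape
    theta = np.zeros(n)                          # theta could also be initialized randomly
    for _ in range(n_iters):
        error = y - X @ theta                    # (y^(i) - h_theta(x^(i))) for all samples
        # Averaging over m keeps the step size scale-independent
        # (equivalent to the summed update up to a rescaling of alpha)
        theta = theta + alpha * (X.T @ error) / m
    return theta
```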

Stochastic gradient descent (SGD)

  • With m samples, each pass of SGD updates the parameters m times, once per sample. The SGD result does not fully converge; instead it oscillates around the point of convergence.
  • In some cases (when multiple local optima exist / J(θ) is not quadratic), SGD may escape small local optima, so it is not necessarily worse than BGD.
  • Because of its randomness, SGD may also end up with a worse result than BGD.
  • SGD is especially well suited to large sample sizes and to online machine learning (Online ML).
  • Note: prefer SGD (a sketch follows below).

    $\theta_j := \theta_j + \alpha \, (y^{(i)} - h_\theta(x^{(i)})) \, x_j^{(i)}$  (for each sample i)
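
A minimal NumPy sketch of stochastic gradient descent under the same update rule, updating θ once per sample; the epoch count and learning rate are illustrative:

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, n_epochs=10):
    """SGD for linear regression: one parameter update per training sample."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        for i in np.random.permutation(m):       # shuffle samples each epoch
            error = y[i] - X[i] @ theta
            theta = theta + alpha * error * X[i]  # update using a single sample
    return theta
```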

Mini-batch gradient descent (MBGD)

  • MBGD keeps training reasonably fast while preserving the accuracy of the final parameters. Instead of updating the gradient after every single sample, it uses the average gradient of b samples (b is typically around 10) as the update direction (see the sketch below):

    $\theta_j := \theta_j + \alpha \, \frac{1}{b} \sum_{k=i}^{i+b-1} (y^{(k)} - h_\theta(x^{(k)})) x_j^{(k)}$
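
A minimal NumPy sketch of mini-batch gradient descent, using the average gradient of b samples per update; b=10 as in the text, other values are illustrative:

```python
import numpy as np

def minibatch_gradient_descent(X, y, alpha=0.01, b=10, n_epochs=10):
    """Mini-batch gradient descent: each update uses the average gradient of b samples."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        idx = np.random.permutation(m)            # shuffle sample indices each epoch
        for start in range(0, m, b):
            batch = idx[start:start + b]
            error = y[batch] - X[batch] @ theta
            theta = theta + alpha * (X[batch].T @ error) / len(batch)
    return theta
```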

Gradient descent tuning strategies

  • Choice of learning rate: if the learning rate is too large, each update changes the parameters a lot and may jump over the optimum; if it is too small, each update changes them very little, making the iteration too slow to finish in a reasonable time.
  • Choice of initial parameter values: different initial values may lead to different minima, because gradient descent finds a local optimum. It is therefore common to run the algorithm several times with different initial values and return the result with the smallest loss.
  • Standardization: because different features have different value ranges, the iteration speed may differ across parameters. To reduce this effect, the features can be standardized (see the sketch below).
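
A minimal sketch of the standardization step with scikit-learn's StandardScaler; the array below is illustrative:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])                # features with very different scales

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)          # each column now has mean 0 and unit variance
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```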

Logistic regression

  • It is mainly used for binary classification: the model outputs a probability between 0 and 1, predicting 1 when the probability is greater than 0.5 and 0 when it is less than 0.5.
  • The Logistic/sigmoid function and hypothesis:

    $g(z) = \frac{1}{1 + e^{-z}}, \quad h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$

  • Assumption:

    $P(y=1 \mid x;\theta) = h_\theta(x), \quad P(y=0 \mid x;\theta) = 1 - h_\theta(x)$

  • This gives the likelihood function:

    $L(\theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}}$

  • The regression parameters θ (obtained by a gradient-descent-style update):

    $\theta_j := \theta_j + \alpha \sum_{i=1}^{m} (y^{(i)} - h_\theta(x^{(i)})) x_j^{(i)}$

  • The logistic regression loss function (derived from the log-likelihood function):

    $J(\theta) = -\sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log \left(1 - h_\theta(x^{(i)})\right) \right]$

    A short scikit-learn sketch follows below.
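
A minimal sketch of logistic regression: the sigmoid function from the formulas above, plus a scikit-learn fit on purely illustrative data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Logistic/sigmoid function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative binary classification data
X = np.random.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = LogisticRegression()
clf.fit(X, y)
proba = clf.predict_proba(X[:5])[:, 1]      # probabilities of class 1
print((proba > 0.5).astype(int), clf.predict(X[:5]))
```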

Softmax regression

  • Softmax regression is a generalization of logistic regression, suitable for K-class classification problems. The parameters for class k form a vector θ_k, and together they form a two-dimensional matrix θ of size k×n.
  • In essence, the softmax function compresses (maps) an arbitrary K-dimensional vector of real numbers into another K-dimensional vector of real numbers in which every element lies in (0, 1).
  • The softmax probability function is:

    $p(y = k \mid x;\theta) = \frac{e^{\theta_k^T x}}{\sum_{l=1}^{K} e^{\theta_l^T x}}, \quad k = 1, \ldots, K$

  • Algorithm principle:
    (Figure: derivation of the softmax hypothesis; image not available.)
  • Loss function:

    $J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \mathbf{1}\{y^{(i)} = k\} \log p(y^{(i)} = k \mid x^{(i)};\theta)$

  • The regression parameters θ (obtained by a gradient-descent-style update):
    (Figure: gradient update formula; image not available.)

    A NumPy softmax sketch follows below.
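
A minimal NumPy sketch of the softmax mapping described above; the score vector is illustrative:

```python
import numpy as np

def softmax(scores):
    """Map a K-dimensional real vector to a K-dimensional vector in (0, 1) summing to 1."""
    shifted = scores - np.max(scores)       # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

print(softmax(np.array([2.0, 1.0, 0.1])))   # e.g. ~[0.66, 0.24, 0.10]
```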

Evaluating model performance

    $MSE = \frac{1}{m} \sum_{i=1}^{m} (y^{(i)} - \hat{y}^{(i)})^2, \quad RMSE = \sqrt{MSE}, \quad R^2 = 1 - \frac{RSS}{TSS}$

  • MSE: the mean squared error; the closer it is to 0, the better the model fits the training data.
  • RMSE: the square root of the MSE; it plays the same role as the MSE.
  • R²: its range is (-∞, 1]. A larger value means the model fits the training data better; the optimal value is 1. When the model predicts at random, the value can be negative; if the model always predicts the sample mean, R² is 0.
  • TSS: the total sum of squares, TSS = $\sum_{i=1}^{m} (y^{(i)} - \bar{y})^2$; it measures the spread among the samples and equals m times the (biased) variance.
  • RSS: the residual sum of squares, RSS = $\sum_{i=1}^{m} (y^{(i)} - \hat{y}^{(i)})^2$; it measures the difference between the predicted and actual values and equals m times the MSE.

A short scikit-learn metrics sketch follows below.
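
A minimal sketch of these metrics with scikit-learn; the arrays are illustrative:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)               # 1 - RSS/TSS
print(mse, rmse, r2)
```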

Machine learning hyperparameter tuning

  • In practice, for the various algorithm models (linear regression and its variants), we need to obtain the values of θ, λ, and p. Solving for θ generally does not require the developer's involvement (the algorithms are already implemented); what mainly needs to be chosen are the values of λ and p. This process is called hyperparameter tuning.
  • Cross-validation: the training data is split into several parts, one of which is used for validation, and the hyperparameters λ and p are chosen to be optimal on it. Examples: ten-fold cross-validation, five-fold cross-validation (the scikit-learn default), and leave-one-out cross-validation. A GridSearchCV sketch follows below.
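
A minimal sketch of cross-validated hyperparameter search with scikit-learn's GridSearchCV, tuning λ (`alpha`) and p (`l1_ratio`) for an Elastic Net on illustrative data:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

X = np.random.rand(100, 5)
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.05 * np.random.randn(100)

param_grid = {
    "alpha": [0.01, 0.1, 1.0],              # candidate lambda values
    "l1_ratio": [0.2, 0.5, 0.8],            # candidate p values
}
search = GridSearchCV(ElasticNet(), param_grid, cv=5)   # 5-fold cross-validation
search.fit(X, y)
print(search.best_params_)
```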


Source: www.cnblogs.com/tankeyin/p/12123695.html