Andrew Ng Machine Learning Introductory Notes 5: Regularization

5 Regularization

Regularization adds a penalty term on the parameters to the cost function, which simplifies the hypothesis function and reduces overfitting.

5.1 Regularized Linear Regression

5.1.1 Regularized Cost Function

\[ J(\theta)=\frac{1}{2 m}\left[\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})^{2}+\lambda \sum_{j=1}^{n} \theta_{j}^{2}\right]\tag{5.1} \]

The term added on the right is called the regularization term, and \(\lambda\) is called the regularization parameter. Minimizing this cost pursues two objectives (a short code sketch of equation (5.1) follows this list):

  1. Fit the training set well
  2. Keep the parameters small while satisfying objective 1, so the hypothesis stays simple and overfitting is avoided
  • By convention, \(\theta_0\) is not regularized
  • If \(\lambda\) is set too large, all parameters except \(\theta_0\) are driven close to 0, leaving the hypothesis with only the \(\theta_0\) term, i.e., a horizontal line; an appropriate regularization parameter must therefore be chosen
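A minimal NumPy sketch of equation (5.1), assuming `X` is an \(m\times(n+1)\) matrix with a leading column of ones; the function name is illustrative, not from the original notes:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost, equation (5.1)."""
    m = len(y)
    residuals = X @ theta - y               # h_theta(x^(i)) - y^(i) for every sample
    penalty = lam * np.sum(theta[1:] ** 2)  # theta_0 is excluded by convention
    return (np.sum(residuals ** 2) + penalty) / (2 * m)
```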

5.1.2 Regularized Gradient Descent

[Figure: regularized gradient descent (original image link broken)]
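The missing figure showed the regularized gradient-descent update; reconstructed here from the standard lecture material:

\[ \begin{aligned} \theta_{0} &:=\theta_{0}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) x_{0}^{(i)} \\ \theta_{j} &:=\theta_{j}-\alpha\left[\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) x_{j}^{(i)}+\frac{\lambda}{m} \theta_{j}\right], \quad j=1, \ldots, n \end{aligned} \]

The \(\theta_j\) update can equivalently be written as \(\theta_{j}:=\theta_{j}\left(1-\alpha \frac{\lambda}{m}\right)-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) x_{j}^{(i)}\).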

Because the learning rate \(\alpha\) is small and the sample count \(m\) is large, the factor \(1-\alpha\frac{\lambda}{m}\) is only slightly less than 1, so each update shrinks \(\theta_j\) a little toward 0 before taking the usual gradient step.

5.1.3 Regularized Normal Equation

\[ \theta=\left(X^{T} X+\lambda\left[\begin{array}{ccccc}{0} \\ {} & {1} \\ {} & {} & {1} \\ {} & {} & {} & {\ddots} \\ {} & {} & {} & {} & {1}\end{array}\right]\right)^{-1} X^{T} y\tag{5.2} \]

The matrix added here is (n+1)×(n+1) dimensional.

  • If the number of samples \(m\) is less than the number of features \(n\), then \(X^TX\) is non-invertible (singular); but as long as \(\lambda>0\), the sum of the two matrices is guaranteed to be non-singular
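A minimal NumPy sketch of equation (5.2); the function name and the bias-column convention are assumptions for illustration:

```python
import numpy as np

def regularized_normal_equation(X, y, lam):
    """Closed-form solution of equation (5.2); X has a leading column of ones."""
    L = np.eye(X.shape[1])   # (n+1) x (n+1) identity
    L[0, 0] = 0.0            # zero out the theta_0 entry so it is not penalized
    # np.linalg.solve is numerically safer than forming the inverse explicitly
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```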

5.2 Regularized Logistic Regression

5.2.1 Regularized Cost Function

\[ \begin{aligned} J(\theta)=-[\frac{1}{m}\sum_{i=1}^{m} y^{(i)} \log h_{\theta}(x^{(i)})+(1-y^{(i)}) \log (1-h_{\theta}(x^{(i)}))]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2 \end{aligned}\tag{5.3} \]

  • Remember that the sum in the last term starts at \(j=1\), because \(\theta_0\) is not regularized (see the sketch below)
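A minimal NumPy sketch of equation (5.3), with the penalty sum starting at \(j=1\) as noted above; names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_cost(theta, X, y, lam):
    """Regularized logistic-regression cost, equation (5.3)."""
    m = len(y)
    h = sigmoid(X @ theta)
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    penalty = lam * np.sum(theta[1:] ** 2) / (2 * m)  # sum starts at j = 1
    return cross_entropy + penalty
```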

5.2.2 Regularized Gradient Descent

[Figure: regularized gradient descent for logistic regression (original image link broken)]
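The missing figure showed the update rule; it has the same form as the linear-regression case, except that the hypothesis is now \(h_{\theta}(x)=\frac{1}{1+e^{-\theta^{T} x}}\):

\[ \theta_{j}:=\theta_{j}-\alpha\left[\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) x_{j}^{(i)}+\frac{\lambda}{m} \theta_{j}\right], \quad j=1, \ldots, n \]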

5.2.3 Advanced Algorithms with Regularization

[Figure: regularization with advanced optimization algorithms (original image link broken)]
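The missing figure illustrated handing the regularized cost and gradient to an advanced optimizer (Octave's fminunc in the course). A hedged Python analogue using scipy.optimize.minimize, reusing sigmoid and regularized_logistic_cost from 5.2.1 on a tiny synthetic dataset:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])  # bias column + 2 features
y = (X[:, 1] + X[:, 2] > 0).astype(float)
lam = 1.0

def regularized_gradient(theta, X, y, lam):
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += lam * theta[1:] / m   # theta_0 stays unregularized
    return grad

# BFGS chooses its own step sizes, so no learning rate alpha is needed
result = minimize(regularized_logistic_cost, np.zeros(X.shape[1]),
                  args=(X, y, lam), jac=regularized_gradient, method="BFGS")
print(result.x)
```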

5.3 Regularization and the Bias-Variance Trade-off

[Figure: training and validation error as functions of \(\lambda\) (original image link broken)]

The larger \(\lambda\) is, the higher the bias on both the training and validation sets (underfitting); the smaller \(\lambda\) is, the lower the training error but the higher the variance on the validation set (overfitting).
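A minimal sketch of choosing \(\lambda\) with a validation set, reusing regularized_normal_equation from 5.1.3; the dataset and candidate values are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(60), rng.normal(size=(60, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=60)
X_train, y_train, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

lambdas = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]
val_errors = []
for lam in lambdas:
    theta = regularized_normal_equation(X_train, y_train, lam)
    # validation error is measured without the penalty term
    val_errors.append(np.mean((X_val @ theta - y_val) ** 2) / 2)
best_lam = lambdas[int(np.argmin(val_errors))]
print(best_lam, val_errors)
```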
