5 Regularization
Adding a penalty term to the cost function shrinks the parameters, yielding a simpler hypothesis and reducing overfitting.
5.1 Regularized linear regression
5.1.1 Regularized cost function
\[ J(\theta)=\frac{1}{2 m}\left[\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})^{2}+\lambda \sum_{j=1}^{n} \theta_{j}^{2}\right]\tag{5.1} \]
The added term on the right is called the regularization term, and \(\lambda\) is called the regularization parameter. The cost function now balances two objectives:
- Fit the training set well (the first sum)
- Keep the parameters small, so that the hypothesis stays simple and overfitting is avoided (the second sum)
- By convention, \(\theta_0\) is not regularized
- If \(\lambda\) is set too large, every parameter except \(\theta_0\) is driven toward 0, leaving only the \(\theta_0\) term, i.e. the hypothesis degenerates into a horizontal line; an appropriate regularization parameter must therefore be chosen
5.1.2 Regularized gradient descent
For \(j \ge 1\) the update becomes \(\theta_j := \theta_j\left(1-\alpha \frac{\lambda}{m}\right)-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) x_{j}^{(i)}\). Because the learning rate \(\alpha\) is small and the sample size \(m\) is large, the factor \(1-\alpha\frac{\lambda}{m}\) is slightly below 1, so on every iteration the regularization nudges \(\theta_j\) toward 0 before the usual gradient step
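The update above can be sketched in NumPy; this is a minimal illustration (the function name and hyperparameter defaults are my own), assuming `X` already carries a leading column of ones so that `theta[0]` plays the role of the unregularized \(\theta_0\):

```python
import numpy as np

def gradient_descent_reg(X, y, alpha=0.5, lam=1.0, iters=5000):
    """Gradient descent for regularized linear regression (eq. 5.1).

    X is assumed to include a leading column of ones;
    theta[0] is not regularized, matching the convention above.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        error = X @ theta - y              # h_theta(x^(i)) - y^(i) for all i
        grad = (X.T @ error) / m           # unregularized gradient
        grad[1:] += (lam / m) * theta[1:]  # penalty gradient, skipping theta_0
        theta -= alpha * grad
    return theta
```

With `lam=0` this reduces to plain gradient descent; increasing `lam` shrinks the slope coefficients toward 0, as described above.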
5.1.3 Regularized normal equation
\[ \theta=\left(X^{T} X+\lambda\left[\begin{array}{ccccc}{0} & & & & \\ & {1} & & & \\ & & {1} & & \\ & & & {\ddots} & \\ & & & & {1}\end{array}\right]\right)^{-1} X^{T} y\tag{5.2} \]
The added matrix is \((n+1)\times(n+1)\)
- If the sample size \(m\) is less than the number of features \(n\), \(X^{T}X\) is not invertible (it is singular); but as long as \(\lambda>0\), the sum of the two matrices is guaranteed to be non-singular
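A minimal NumPy sketch of eq. (5.2) (the function name is my own). The penalty matrix is the identity with its \((0,0)\) entry zeroed, so \(\theta_0\) is left unpenalized:

```python
import numpy as np

def normal_eq_reg(X, y, lam=1.0):
    """Regularized normal equation (eq. 5.2).

    L = identity with the (0, 0) entry zeroed, so theta_0 is not penalized.
    For lam > 0, X^T X + lam * L is invertible even when m < n.
    """
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```

Note that `np.linalg.solve` is used instead of explicitly forming the inverse, which is the standard numerically preferable choice.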
5.2 Regularized logistic regression
5.2.1 Regularized cost function
\[ \begin{aligned} J(\theta)=-[\frac{1}{m}\sum_{i=1}^{m} y^{(i)} \log h_{\theta}(x^{(i)})+(1-y^{(i)}) \log (1-h_{\theta}(x^{(i)}))]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2 \end{aligned}\tag{5.3} \]
- When computing the last term, remember to start the sum at \(j=1\), since \(\theta_0\) is not regularized
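Eq. (5.3) can be sketched directly in NumPy; this is an illustrative implementation (names are my own), with the penalty sum deliberately starting at `theta[1:]` per the note above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_logistic_reg(theta, X, y, lam):
    """Regularized logistic cost (eq. 5.3); the penalty skips theta[0]."""
    m = len(y)
    h = sigmoid(X @ theta)
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    return cross_entropy + penalty
```

At \(\theta=0\) the cost is \(\log 2\) regardless of \(\lambda\), a handy sanity check.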
5.2.2 Regularized gradient descent
5.2.3 Regularization with advanced optimization algorithms
5.3 Regularization and bias/variance
The larger \(\lambda\), the larger the bias (error) on both the training and validation sets; the smaller \(\lambda\), the smaller the training error but the larger the variance on the validation set
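This trade-off can be observed numerically. Below is a small sketch on synthetic data (the data-generating setup is entirely hypothetical): a linear signal is fit with degree-5 polynomial features via the closed form of eq. (5.2), and the training and validation errors are printed for several values of \(\lambda\); the training error only grows with \(\lambda\), while the validation error is typically U-shaped:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: noisy linear signal, degree-5 polynomial features,
# so lambda = 0 can overfit and a large lambda underfits.
x = rng.uniform(-1, 1, 60)
y = 1.5 * x + rng.normal(0, 0.2, 60)
X = np.vander(x, 6, increasing=True)   # columns: 1, x, ..., x^5
X_tr, y_tr = X[:40], y[:40]
X_val, y_val = X[40:], y[40:]

def fit(X, y, lam):
    """Closed-form regularized fit (eq. 5.2); theta_0 unpenalized."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

def mse(theta, X, y):
    return np.mean((X @ theta - y) ** 2)

for lam in [0.0, 0.01, 0.1, 1.0, 10.0]:
    theta = fit(X_tr, y_tr, lam)
    print(f"lambda={lam}: train={mse(theta, X_tr, y_tr):.4f} "
          f"val={mse(theta, X_val, y_val):.4f}")
```

In practice, \(\lambda\) is chosen by picking the value with the lowest validation error.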