# Regularization for linear regression and logistic regression

## First, why regularize?

The discussion of regularization starts from the overfitting problem.

When we have a large number of features, the learned hypothesis may fit the training set very well yet fail to achieve good results on a new test set. This is what we usually call overfitting.

One common way to avoid overfitting is to discard some of the features, but doing so also discards the information those features carry. When we need to keep all of the feature variables, we use regularization instead. With regularization we retain every feature but reduce the magnitude of the parameters. Regularization can also effectively help us simplify the model.

## Second, the cost function

Suppose we have 100 features. It is hard to know in advance which features are only weakly correlated with the target, that is, which parameters should be shrunk. So, taking linear regression as the example, we add a regularization term to the linear regression cost function that shrinks the value of every coefficient:

\[ J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda\sum_{j=1}^n \theta_j^2\right] \]

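The regularization parameter \(\lambda\) should not be too large: if it is, every parameter \(\theta_j\) (j = 1, ..., n) is pushed toward zero and the model underfits.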
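The cost function above can be sketched in a few lines of plain Python. This is only an illustration, not a reference implementation: the function name `regularized_cost`, the argument name `lam`, and the convention that every row of `X` starts with a constant 1 (so that `theta[0]` is the intercept) are assumptions made for the example.

```python
def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    theta: list of n+1 parameters; theta[0] is the intercept (assumption)
    X:     list of m rows, each with n+1 features and X[i][0] == 1 (assumption)
    y:     list of m target values
    lam:   regularization parameter lambda, assumed non-negative
    """
    m = len(y)

    # Squared-error term: sum over examples of (h_theta(x) - y)^2
    squared_error = 0.0
    for row, target in zip(X, y):
        prediction = sum(t * x for t, x in zip(theta, row))
        squared_error += (prediction - target) ** 2

    # Penalty term: lambda * sum of theta_j^2 for j = 1..n (theta_0 is skipped)
    penalty = lam * sum(t ** 2 for t in theta[1:])

    return (squared_error + penalty) / (2 * m)
```

With `lam = 0` the penalty vanishes and the function reduces to the ordinary squared-error cost; increasing `lam` raises the cost of large coefficients while leaving the intercept unpenalized.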

## Third, regularized linear regression

### 1. Gradient descent

Without regularization, gradient descent minimizes the cost function with the following update:

\[ \theta_j = \theta_j - \alpha\frac{1}{m}\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_j \quad (j=0,1,2,...,n) \]

Following the cost function from the second part, it is easy to write the regularized update for linear regression:

\[ \theta_0 = \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_0 \]

\[ \theta_j=\theta_j-\alpha[\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x^{(i)}_j+\frac{\lambda}{m}\theta_j](j=1,2,...,n) \]
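As a rough illustration of the two update rules, here is a minimal sketch of one regularized gradient descent step in plain Python. The function name `gradient_step`, the argument names, and the assumption that each row of `X` begins with a constant 1 (so that `theta[0]` plays the role of \(\theta_0\)) are choices made for this example only.

```python
def gradient_step(theta, X, y, alpha, lam):
    """One simultaneous gradient descent update for regularized linear regression.

    theta: list of n+1 parameters; theta[0] is the intercept (assumption)
    X:     list of m rows, each with n+1 features and X[i][0] == 1 (assumption)
    y:     list of m target values
    alpha: learning rate
    lam:   regularization parameter lambda
    """
    m = len(y)

    # Prediction errors h_theta(x^(i)) - y^(i), computed once with the old theta
    errors = [
        sum(t * x for t, x in zip(theta, row)) - target
        for row, target in zip(X, y)
    ]

    new_theta = []
    for j, theta_j in enumerate(theta):
        # Unregularized gradient: (1/m) * sum_i error_i * x_j^(i)
        grad = sum(e * row[j] for e, row in zip(errors, X)) / m
        # The (lambda/m) * theta_j term applies only to j = 1..n, not to theta_0
        if j >= 1:
            grad += (lam / m) * theta_j
        new_theta.append(theta_j - alpha * grad)

    return new_theta
```

Calling `gradient_step` repeatedly until the parameters stop changing gives the full descent loop. Rearranging the update for j = 1, ..., n shows what the extra term does: each step first shrinks \(\theta_j\) by a constant factor and then applies the usual gradient move.

\[ \theta_j = \theta_j\left(1-\alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_j \]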