A review of the linear regression algorithm

1. Machine learning concepts: supervised vs. unsupervised learning, generalization, overfitting and underfitting (bias, variance, and their remedies), cross-validation

  • Supervised: the data set has known y values (labels).
  • Unsupervised: the data set has no y values; samples are grouped into clusters by similarity, and the cluster assignment serves as the value to be estimated.
  • Generalization: the degree to which the patterns a learning algorithm extracts apply to data outside the training set, i.e. its ability to adapt to new samples.
  • Overfitting: the model performs well on the training samples but fits them too finely, so generalization suffers and performance on the validation and test sets is poor. On a plot, the fitted curve fluctuates too much and is unstable. Also known as high variance.
    Remedies: reduce the number of parameters; add a regularization penalty, L1 (absolute value) or L2 (squared); adjust the learning rate; enlarge and diversify the training data; and so on.
  • Underfitting: the opposite case. The model is too simple or the training is insufficient (e.g. too few features), so it performs poorly on the training, validation, and test sets and fails to capture the pattern. On a plot, the fitted line misses the ups and downs of the data. Also known as high bias.
    Remedies: use cross-validation so that, even with few features, the training and validation sets alternate over multiple iterations of optimization; add further features such as interaction or polynomial terms; reduce the regularization parameter. For neural networks, add nodes or layers.
  • Cross-validation: split the data into several parts, using some as the training set and the rest as the validation set, then swap roles so the former validation data trains and the former training data validates; repeating this alternation makes full use of the data for training and validation.
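A minimal sketch of the cross-validation idea above, using scikit-learn's `cross_val_score` on synthetic data (the data set and coefficients here are made up for illustration):

```python
# K-fold cross-validation: split the data into k parts, train on k-1 folds
# and validate on the held-out fold, rotating roles each round.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

model = LinearRegression()
# cv=5: five alternating train/validation splits; the score is R^2 by default.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())  # average validation R^2 across the five folds
```

Averaging the fold scores gives a more stable estimate of generalization than a single train/validation split.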

2. The principle of linear regression

Linear regression uses regression analysis from mathematical statistics to determine the quantitative relationship between two or more interdependent variables; it is very widely used. It takes the form y = w'x + e, where e is a normally distributed error with mean 0.
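A minimal sketch of this principle: fit y = w'x + e by ordinary least squares on synthetic data (the coefficients and noise level are assumptions for illustration):

```python
# Fit y = w'x + e by ordinary least squares; e is Gaussian noise with mean 0.
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
true_w = np.array([3.0, -1.0])
y = X @ true_w + rng.normal(scale=0.05, size=200)  # e ~ N(0, 0.05^2)

# w = (X'X)^{-1} X'y, solved stably with lstsq instead of an explicit inverse
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # close to [3.0, -1.0]
```

With enough samples and small noise, the estimated w recovers the true coefficients.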

3. Linear regression: loss function, cost function, objective function

  • Loss function: defined on a single sample; the error of one sample.
  • Cost function: defined on the entire training set; the average error over all samples, i.e. the mean of the loss function.
  • Objective function: defined as the cost function plus a regularization term.
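The three definitions above can be sketched for squared-error linear regression (the toy data and the L2 penalty weight `lam` are assumptions for illustration):

```python
# Loss (one sample), cost (mean loss over the set), objective (cost + L2 penalty).
import numpy as np

def loss(w, x_i, y_i):
    """Squared error on a single sample."""
    return (x_i @ w - y_i) ** 2

def cost(w, X, y):
    """Mean loss over the whole training set."""
    return np.mean((X @ w - y) ** 2)

def objective(w, X, y, lam=0.1):
    """Cost plus an L2 regularization term (ridge-style penalty)."""
    return cost(w, X, y) + lam * np.sum(w ** 2)

X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([5.0, 11.0])
w = np.array([1.0, 2.0])
print(cost(w, X, y))       # 0.0: w fits both samples exactly
print(objective(w, X, y))  # 0.5: only the penalty term remains
```

Note that even a perfect fit has a nonzero objective: the regularization term penalizes large weights regardless of the fit.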

4. Optimization methods (gradient descent, Newton's method, quasi-Newton methods, etc.)
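A minimal sketch of the first method listed, gradient descent, applied to the linear-regression cost (the learning rate and iteration count are assumptions; noiseless synthetic data is used so convergence is easy to see):

```python
# Gradient descent on J(w) = mean((Xw - y)^2): repeatedly step opposite
# the gradient until the weights stop changing.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -3.0])  # noiseless target for clarity

w = np.zeros(2)
lr = 0.1  # learning rate (step size)
for _ in range(500):
    grad = 2 / len(y) * X.T @ (X @ w - y)  # gradient of the mean squared error
    w -= lr * grad

print(w)  # converges toward [2.0, -3.0]
```

Newton and quasi-Newton methods replace the fixed step with curvature (Hessian) information, trading more work per step for faster convergence.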

5. Linear regression evaluation metrics

Common metrics for evaluating a linear regression model: MSE, RMSE, MAE, and R-squared (R²).
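The four metrics above can be computed directly with `sklearn.metrics` (the small y_true/y_pred arrays are made up for illustration):

```python
# MSE, RMSE, MAE, and R^2 for a regression prediction.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = mean_squared_error(y_true, y_pred)   # mean of squared errors
rmse = np.sqrt(mse)                        # same units as y
mae = mean_absolute_error(y_true, y_pred)  # mean of absolute errors
r2 = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot

print(mse, rmse, mae, r2)
```

MSE and RMSE penalize large errors more heavily than MAE; R² measures the fraction of variance in y explained by the model (1.0 is a perfect fit).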

6. sklearn parameter description

Parameters of sklearn's linear regression model.
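A hedged sketch of `sklearn.linear_model.LinearRegression` and its main constructor parameters (names follow the scikit-learn API; defaults and available parameters can vary between versions):

```python
# Fit sklearn's LinearRegression on a trivially linear data set.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])  # exactly y = 2x

model = LinearRegression(
    fit_intercept=True,  # learn the bias term b in y = wx + b
    copy_X=True,         # work on a copy of X rather than overwriting it
    n_jobs=None,         # parallelism, mainly for multi-target problems
)
model.fit(X, y)
print(model.coef_, model.intercept_)  # slope ~2.0, intercept ~0.0
```

After fitting, the learned weights are exposed as `coef_` and `intercept_`, and `model.predict(X_new)` applies y = w'x + b to new samples.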


Source: www.cnblogs.com/robindong/p/11317191.html