Machine Learning - Supplemental


This post collects a few fragmentary pieces of supplementary knowledge; since the content is relatively small, no detailed introduction is given.

Model evaluation

MSE = \frac{1}{m} \sum_{i=1}^m (y_i - \hat{y}_i)^2
RMSE = \sqrt{MSE} = \sqrt{\frac{1}{m} \sum_{i=1}^m (y_i - \hat{y}_i)^2}
R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^m (y_i - \hat{y}_i)^2}{\sum_{i=1}^m (y_i - \bar{y})^2}, \qquad \bar{y} = \frac{1}{m} \sum_{i=1}^m y_i
MSE: the mean squared error; the closer it is to 0, the better the model fits the training data.
RMSE: the square root of the MSE; same interpretation as the MSE, but in the same units as y.
R²: range (−∞, 1]; a larger value indicates a better fit to the training data. The optimum is 1, i.e., every predicted value equals the true value. When the model's predictions are no better than random, R² can be negative; if the model always predicts the sample mean ȳ, R² is 0.
For classification, accuracy is commonly used as the evaluation metric, but regression has no notion of accuracy, so R² often plays the analogous role, with 1 corresponding to 100%. In the scikit-learn API, LinearRegression's score method returns R².
TSS: the total sum of squares (Total Sum of Squares); it measures the spread among the samples and equals m times the variance of the true values.
RSS: the residual sum of squares (Residual Sum of Squares); it measures the discrepancy between the predicted values and the sample values and equals m times the MSE.
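As a sanity check, the three metrics above can be computed from scratch in a few lines. This is a minimal pure-Python sketch; scikit-learn exposes the same quantities via `mean_squared_error` and `r2_score`, and the toy data here is an illustrative assumption.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    m = len(y_true)
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / m

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of MSE, in the units of y."""
    return math.sqrt(mse(y_true, y_pred))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - RSS/TSS."""
    y_bar = sum(y_true) / len(y_true)
    rss = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    tss = sum((yt - y_bar) ** 2 for yt in y_true)
    return 1 - rss / tss

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mse(y_true, y_pred))   # 0.375
print(rmse(y_true, y_pred))  # ≈ 0.6124
print(r2(y_true, y_pred))    # ≈ 0.9486
```

Note that predicting the constant ȳ for every sample makes RSS equal TSS, giving R² = 0, which matches the description above.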

Machine Learning Tuning - Cross-Validation

Here we introduce a commonly used model selection method: cross-validation.

If sufficient sample data are given, a simple approach to model selection is to randomly split the dataset into three parts: a training set, a validation set, and a test set. The training set is used to train the model, the validation set is used for model selection, and the test set is used for the final evaluation of the learned model. Among models of different complexity, we select the one with the smallest prediction error on the validation set. Since the validation set has enough data, selecting a model with it is effective.

However, in many practical applications the data are not sufficient. To choose a good model in that case, cross-validation can be employed. The basic idea of cross-validation is to reuse the data: split the given data, combine the splits into different training and test sets, and on that basis repeatedly train, test, and select models.

In practice, for many algorithm models (linear regression, for example) we need to determine values such as θ, λ, and p. Solving for θ generally does not require the developer's involvement (the algorithm is already implemented); what mainly needs to be determined are λ and p. This process is called hyperparameter tuning.
Cross-validation: divide the training data into several folds, use one fold in turn for validation, and select the hyperparameters λ and p with the best validation performance. Examples include ten-fold cross-validation, five-fold cross-validation (the scikit-learn default), and so on.
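As a sketch of the tuning loop just described, the following selects a ridge-style penalty λ by k-fold cross-validation. The model here is an illustrative assumption, not from the original post: a one-feature linear model with the closed-form solution θ = Σxᵢyᵢ / (Σxᵢ² + λ), and toy noiseless data.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cv_error(x, y, lam, k=5):
    """Average test MSE of the penalized one-feature model over k folds."""
    folds = kfold_indices(len(x), k)
    total = 0.0
    for fold in folds:
        train = [i for i in range(len(x)) if i not in fold]
        # Closed-form fit on the training folds only.
        theta = (sum(x[i] * y[i] for i in train) /
                 (sum(x[i] ** 2 for i in train) + lam))
        # Evaluate on the held-out fold.
        total += sum((y[i] - theta * x[i]) ** 2 for i in fold) / len(fold)
    return total / k

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1 * xi for xi in x]  # noiseless toy data
best_lam = min([0.0, 0.1, 1.0, 10.0], key=lambda lam: cv_error(x, y, lam))
print(best_lam)  # on noiseless data the unpenalized model wins: 0.0
```

In scikit-learn, `GridSearchCV` automates this loop over a hyperparameter grid; the sketch above only shows the mechanics.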

1. Simple cross-validation
The simple cross-validation method works as follows: first, randomly divide the given data into two parts, one as the training set and the other as the test set (for example, 70% of the data for training and 30% for testing); then train models under various conditions (for example, different numbers of parameters) on the training set; finally, evaluate the test error of each model on the test set and select the model with the smallest test error.
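The holdout split described above can be sketched as follows; scikit-learn's `train_test_split` implements the same idea, and the fixed seed here is only for reproducibility of the example.

```python
import random

def train_test_split(data, test_ratio=0.3, seed=0):
    """Shuffle the indices and split: ~70% train, ~30% test."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(data) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return [data[i] for i in train_idx], [data[i] for i in test_idx]

train, test = train_test_split(list(range(10)))
print(len(train), len(test))  # 7 3
```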

2. S-fold cross-validation
The most widely used method is S-fold cross-validation, which works as follows: first, randomly partition the data into S disjoint subsets of the same size; then train the model on S − 1 of the subsets and test it on the remaining subset; repeat this for each of the S possible choices of held-out subset; finally, select the model with the smallest average test error over the S evaluations.
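The S-fold procedure can be sketched as a generator of (train, test) index pairs; scikit-learn's `KFold` produces equivalent splits.

```python
def s_fold_splits(n, s):
    """Yield (train_indices, test_indices) for each of the S folds.

    The n samples are cut into s disjoint, near-equal subsets; each
    subset serves as the test set exactly once.
    """
    fold_size, rem = divmod(n, s)
    start = 0
    for i in range(s):
        size = fold_size + (1 if i < rem else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

for train, test in s_fold_splits(10, 5):
    print(test)  # [0, 1] then [2, 3] ... up to [8, 9]
```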

3. Leave-one-out cross-validation
The special case S = N of S-fold cross-validation is called leave-one-out cross-validation, and it is often used when data are scarce. Here, N is the size of the given dataset.
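Leave-one-out is then simply the S = N case, where each sample serves as the test set exactly once (scikit-learn provides `LeaveOneOut` for this):

```python
def leave_one_out(n):
    """Yield (train_indices, test_indices) with one held-out sample each."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

print(sum(1 for _ in leave_one_out(4)))  # 4 splits for N = 4
```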



Origin blog.csdn.net/zhanglianhai555/article/details/104171320