Andrew Ng Machine Learning — Model Assessment, Introductory Notes (2)

2 Model Assessment

2.1 Splitting the Data Set into Training and Test Sets

Split according to a 7:3 ratio; if the data is randomly ordered, the first 70% of the samples can be taken directly as the training set.

2.1.1 Hold-out Method

Directly divide the data set D into two mutually exclusive sets.

  • The division should keep the data distribution consistent between the two sets
  • Since there are many possible divisions, repeat the experiment with several random splits and report the average as the evaluation result
  • Common practice: use 2/3 to 4/5 of the samples for training and the rest for testing; the test set should contain at least 30 samples
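The hold-out split can be sketched with the standard library alone (the function name `holdout_split` is my own; a real pipeline would usually also stratify by class to keep the distributions consistent):

```python
import random

def holdout_split(data, train_frac=0.7, seed=0):
    """Randomly split `data` into two mutually exclusive sets (hold-out method)."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)      # shuffle so the split is random
    cut = int(len(data) * train_frac)
    train = [data[i] for i in idx[:cut]]
    test = [data[i] for i in idx[cut:]]
    return train, test

train, test = holdout_split(list(range(100)), train_frac=0.7)
print(len(train), len(test))  # 70 30
```

Repeating this with different seeds and averaging the resulting test errors gives the repeated-hold-out estimate described above.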

2.1.2 Cross-Validation (k-Fold Cross-Validation)

Divide the data set D into k mutually exclusive subsets of similar size. In each round, use k−1 subsets as the training set and the remaining subset as the test set; after k rounds of training and testing, return the mean of the k test results.

  • Typically k = 10
  • The division itself can be done in different ways, so it is common to repeat it, e.g., "10 times 10-fold cross-validation"
  • Leave-one-out: with m samples, k = m (each training set has m − 1 samples); the computational cost is prohibitive when the sample size is large
  • A typical split: 60% as the training set, 20% as the validation set, 20% as the test set; select the model with the smallest validation error
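A minimal sketch of the k-fold procedure, using only the standard library (function names `kfold_indices` and `cross_validate` are my own; `evaluate` stands in for whatever trains a model and returns its test error):

```python
import random

def kfold_indices(m, k=10, seed=0):
    """Partition sample indices 0..m-1 into k mutually exclusive folds of similar size."""
    idx = list(range(m))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]  # round-robin gives near-equal fold sizes

def cross_validate(m, k, evaluate):
    """Run k rounds; each round holds out one fold as the test set and
    returns the mean of the k test results."""
    folds = kfold_indices(m, k)
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        errors.append(evaluate(train_idx, test_idx))
    return sum(errors) / k
```

Leave-one-out is the special case `k = m`, where each fold holds exactly one sample.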

2.1.3 Bootstrap

Each round, randomly draw one sample from the data set D, copy it into the data set D′, and then put it back into D; repeating this process m times yields the final training set D′. About 36.8% of the samples in D never appear in D′, and these samples are used as the test set.

  • The resulting test estimate is called the out-of-bag estimate
  • Bootstrapping changes the initial distribution of the data set and introduces estimation bias; it is used when the amount of data is insufficient
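The sampling-with-replacement step can be sketched as follows (`bootstrap_sample` is a hypothetical name); the fraction of indices never drawn comes out close to 1/e ≈ 0.368, matching the 36.8% figure above:

```python
import random

def bootstrap_sample(m, seed=0):
    """Draw m indices from 0..m-1 with replacement (the sample goes back into D).
    Indices never drawn form the out-of-bag test set."""
    rng = random.Random(seed)
    train_idx = [rng.randrange(m) for _ in range(m)]
    oob_idx = sorted(set(range(m)) - set(train_idx))
    return train_idx, oob_idx

train_idx, oob_idx = bootstrap_sample(10000)
print(len(oob_idx) / 10000)  # close to 1/e ≈ 0.368
```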

2.2 Parameter Tuning

  • Common practice: search over a given range with a given step size
  • Two kinds of parameters: algorithm parameters (hyperparameters) and model parameters; the former are few and set manually, the latter are many and produced by learning
  • Model evaluation and selection use only part of the samples as the training set; once the model is chosen, retrain it on all the samples to obtain the final model

2.3 Performance Measures

The most commonly used performance measure for regression tasks is the mean squared error:
\[ E(f;D)=\frac{1}{m}\sum_{i=1}^{m}(f(x_i)-y_i)^2 \tag{2.1} \]
More generally, for a data distribution \(\mathcal{D}\) and probability density function \(p(\cdot)\), the mean squared error is described as
\[ E(f;\mathcal{D})=\int_{x \sim \mathcal{D}}(f(x)-y)^2 \, p(x) \, dx \tag{2.2} \]
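Equation (2.1) translates directly into code (a minimal sketch; `mean_squared_error` is my own name):

```python
def mean_squared_error(f, X, y):
    """Eq. (2.1): average squared difference between predictions f(x_i) and labels y_i."""
    m = len(X)
    return sum((f(x) - yi) ** 2 for x, yi in zip(X, y)) / m

f = lambda x: 2 * x
print(mean_squared_error(f, [1, 2, 3], [2, 4, 7]))  # ((0)^2 + (0)^2 + (-1)^2) / 3 = 1/3
```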

2.3.1 F1 Measure

F1 is the harmonic mean of precision and recall.
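Written out, with \(P\) denoting precision and \(R\) denoting recall:

\[ F1 = \frac{2 \times P \times R}{P + R} \]

Being a harmonic mean, F1 is pulled toward the smaller of the two values, so it is high only when precision and recall are both high.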

2.4 0/1 Test Error

\[ err(h_\theta(x),y)= \begin{cases} 1, & \mbox{if }h_\theta(x)\ge0.5,y=0\\&\mbox{ or if }h_\theta(x)<0.5,y=1 \\ 0, & \mbox{otherwise}\end{cases}\tag{2.3} \]

\[ \text{Test error}=\frac{1}{m_{test}}\sum_{i=1}^{m_{test}}err(h_\theta(x_{test}^{(i)}),y_{test}^{(i)})\tag{2.4} \]

2.5 Bias, Variance, Noise

  • Bias: measures the deviation between the learning algorithm's expected prediction and the true result; it characterizes the fitting ability of the learning algorithm itself

  • Variance: measures the change in learning performance caused by different training sets of the same size; it characterizes the impact of perturbations in the data

  • Noise: expresses the lower bound on the expected generalization error achievable by any learning algorithm on the current task; it characterizes the difficulty of the learning problem itself

  • Bias and variance are in conflict (the bias-variance trade-off)

  • If both the training error and the cross-validation error are large, the model has high bias and is underfitting.

    If the training error is small but the cross-validation error is large, the model has high variance and is overfitting.

[Figure 2.5: bias and variance]
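The two diagnostic rules above can be sketched as a small helper (a hypothetical function; the error threshold `tol` is arbitrary and problem-dependent):

```python
def diagnose(train_err, cv_err, tol=0.1):
    """Rough bias/variance diagnosis from training and cross-validation error."""
    if train_err > tol and cv_err > tol:
        return "high bias (underfitting)"      # both errors large
    if train_err <= tol and cv_err > tol:
        return "high variance (overfitting)"   # train small, cv large
    return "ok"

print(diagnose(0.30, 0.32))  # high bias (underfitting)
print(diagnose(0.02, 0.25))  # high variance (overfitting)
```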

2.6 Learning Curves

The abscissa is the number of training samples and the ordinate is the error; when computing both the training error and the validation error, set the regularization term to 0.

  • Learning curve for a good fit: as the number of samples grows, the training set error increases and the validation set error decreases, because more data constrains the trained parameters
  • In code: train the parameters on a subset of the training set each time, then compute the cost function with that model on all the validation-set samples
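The procedure in the second bullet can be sketched as follows, using a toy 1-parameter model \(h(x) = \theta x\) with a closed-form least-squares fit so the example stays self-contained (all names here are my own; in Ng's course the fit would use regularized linear regression, with the regularization term dropped when computing the plotted errors):

```python
def fit_theta(X, y):
    """Closed-form least-squares fit of the 1-parameter model h(x) = theta * x."""
    return sum(x * yi for x, yi in zip(X, y)) / sum(x * x for x in X)

def mse(theta, X, y):
    """Unregularized mean squared error of h(x) = theta * x."""
    return sum((theta * x - yi) ** 2 for x, yi in zip(X, y)) / len(X)

def learning_curve(X_train, y_train, X_val, y_val):
    """For each training-set size i: fit on the first i samples, then report
    (i, training error on those i samples, validation error on ALL validation samples)."""
    points = []
    for i in range(1, len(X_train) + 1):
        theta = fit_theta(X_train[:i], y_train[:i])
        points.append((i, mse(theta, X_train[:i], y_train[:i]),
                          mse(theta, X_val, y_val)))
    return points
```

Plotting the second and third components of each tuple against the first gives the learning curve.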

[Figure 2.6: learning curve for a good fit]

  • When the learning curve shows high bias, increasing the number of samples does not help; the error stabilizes at a high value

[Figure 2.6: high-bias learning curve]

  • When the learning curve shows high variance, the cross-validation error is much larger than the training error; increasing the number of samples can reduce the validation-set error, because more cases are covered during training and the probability that the validation set contains unseen kinds of samples decreases

[Figure 2.6: high-variance learning curve]

2.7 Ways to Improve the Model

  • Increase the number of training samples: solves high variance
  • Reduce the number of features: prevents overfitting, solves high variance
  • Increase the number of features: increases model complexity, solves high bias
  • Add polynomial features: synthesizes new features from existing ones, solves high bias
  • Decrease the regularization parameter \(\lambda\): i.e., let the feature parameters grow, prevents underfitting, solves high bias
  • Increase the regularization parameter \(\lambda\): i.e., shrink the feature parameters, prevents overfitting, solves high variance

2.8 Decision Boundary

All the algorithms find the optimal parameters \(\theta\) according to some criterion; the set of points where \(\theta^T x = 0\) is the decision boundary.
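For a linear hypothesis this can be sketched as follows (the \(\theta\) values are hypothetical, and the first component of \(x\) is the usual intercept term \(x_0 = 1\)):

```python
def predict(theta, x):
    """Classify by the sign of theta^T x: the set theta^T x = 0 is the decision boundary."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if z >= 0 else 0

# Hypothetical parameters: the boundary is the line -3 + x1 + x2 = 0, i.e. x1 + x2 = 3
theta = [-3.0, 1.0, 1.0]
print(predict(theta, [1.0, 2.0, 2.0]))  # 1: point (2, 2) lies on the positive side
print(predict(theta, [1.0, 0.5, 0.5]))  # 0: point (0.5, 0.5) lies on the negative side
```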


Origin www.cnblogs.com/jestland/p/11548465.html