"Machine learning" watermelon book Exercises Chapter 2

Exercises

  • \(2.1\) A data set contains \(1000\) samples, of which \(500\) are positive examples and \(500\) are negative examples. It is to be split by the hold-out method into a training set containing \(70\%\) of the samples and a test set containing the remaining \(30\%\), to be used for evaluation. Estimate how many different ways the split can be made.

  If the split is required to keep the proportion of positive and negative examples the same in the training and test sets (stratified sampling), the training set contains \(350\) positive and \(350\) negative examples, so the number of possible splits \(n\) is
\[\begin{aligned} n &= C^{500 \times 70\%}_{500} \times C^{500 \times 70\%}_{500} \\ &= \left(C^{350}_{500}\right)^{2} \end{aligned}\]
  If the class proportions are not required to be preserved, then
\[ n = C^{1000 \times 70\%}_{1000} = C^{700}_{1000} \]
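  As a quick sanity check, both counts can be evaluated with Python's `math.comb`; this is only a minimal sketch of the arithmetic above, and the variable names are ad hoc.

```python
from math import comb

# Stratified hold-out: choose 350 of the 500 positives and 350 of the 500
# negatives for the 70% training set; the remaining samples form the test set.
n_stratified = comb(500, 350) ** 2

# Without the class-balance constraint: choose any 700 of the 1000 samples.
n_unconstrained = comb(1000, 700)

print(f"stratified splits:    {n_stratified:.3e}")
print(f"unconstrained splits: {n_unconstrained:.3e}")
```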


  • \(2.2\) A data set contains \(100\) samples, half positive examples and half negative examples. Suppose the model produced by the learning algorithm predicts every new sample as belonging to the class with more training samples (guessing at random when the counts are equal). Work out the error rates obtained when this learner is evaluated with \(10\)-fold cross-validation and with the leave-one-out method, respectively.

   With \(10\)-fold cross-validation the partition is taken to be arbitrary; by symmetry each training set is as likely to contain more positive as more negative examples (and with stratified folds the two counts are exactly equal), so on the corresponding test fold the learner is effectively guessing at random, i.e. the error rate is \(50\%\).
  With leave-one-out, the held-out sample is either a positive or a negative example. In either case the remaining \(99\) training samples contain \(49\) examples of the held-out sample's class and \(50\) of the other class, so the learner always predicts the opposite of the true label: the error rate is \(100\%\) (accuracy \(0\%\)).
  This shows that leave-one-out is not necessarily 'better' than cross-validation. How many subsets to divide the data into (i.e. how many folds; leave-one-out is just the special case of cross-validation in which the number of subsets equals the number of samples, so each subset contains a single sample) depends on the specific situation.
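  A small simulation illustrates both estimates; it is a minimal sketch assuming stratified folds and a random tie-break, with all function names made up for the example.

```python
import random

labels = [1] * 50 + [0] * 50          # 50 positive and 50 negative samples

def majority_predict(train):
    """Predict the class with more training samples; guess randomly on a tie."""
    pos, neg = train.count(1), train.count(0)
    if pos == neg:
        return random.choice([0, 1])
    return 1 if pos > neg else 0

def ten_fold_error(labels, seed=0):
    """Stratified 10-fold CV: each fold holds 5 positives and 5 negatives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(pos); rng.shuffle(neg)
    errors = 0
    for k in range(10):
        test = pos[5 * k:5 * k + 5] + neg[5 * k:5 * k + 5]
        train = [labels[i] for i in range(len(labels)) if i not in test]
        pred = majority_predict(train)          # 45 vs 45 -> random guess
        errors += sum(pred != labels[i] for i in test)
    return errors / len(labels)

def loo_error(labels):
    """Leave-one-out: the training set always favours the opposite class."""
    errors = 0
    for i, y in enumerate(labels):
        train = labels[:i] + labels[i + 1:]     # 49 of y's class, 50 of the other
        errors += majority_predict(train) != y
    return errors / len(labels)

print("10-fold CV error:", ten_fold_error(labels))   # 0.5
print("leave-one-out error:", loo_error(labels))     # 1.0
```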


  • \(2.3\) If learner \(A\)'s \(F1\) value is higher than learner \(B\)'s, analyse whether \(A\)'s \(BEP\) value must also be higher than \(B\)'s.

  The \(F1\) value and the \(BEP\) value are not directly tied to each other, and it is easy to construct a counter-example.
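  One hypothetical counter-example (all numbers invented for illustration): \(F1\) is computed at a single reported operating point, whereas \(BEP\) is the point on the whole \(P\)-\(R\) curve where \(P = R\), so the two measures need not rank two learners the same way.

```python
def f1(p, r):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Assumed operating points: A reports P=0.7, R=0.9; B reports P=0.9, R=0.6.
# Their P-R curves are assumed to cross the P = R diagonal at 0.75 and 0.85.
f1_a, bep_a = f1(0.7, 0.9), 0.75
f1_b, bep_b = f1(0.9, 0.6), 0.85

print(f"F1(A) = {f1_a:.3f} > F1(B) = {f1_b:.3f}")   # about 0.79 vs 0.72
print(f"BEP(A) = {bep_a} < BEP(B) = {bep_b}")       # yet A's BEP is lower
```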


  • \(2.4\) Describe the relationships among the true positive rate (\(TPR\)), the false positive rate (\(FPR\)), precision (\(P\)) and recall (\(R\)).

Table \(2.1\) Confusion matrix of classification results

| Ground truth | Predicted positive | Predicted negative |
| --- | --- | --- |
| Positive example | \(TP\) (true positive) | \(FN\) (false negative) |
| Negative example | \(FP\) (false positive) | \(TN\) (true negative) |

Then
\[\begin{aligned} TPR &= \frac{TP}{TP+FN}\\ FPR &= \frac{FP}{TN + FP}\\ P &= \frac{TP}{TP + FP}\\ R &= \frac{TP}{TP + FN} \end{aligned}\]
In particular, recall and the true positive rate are the same quantity (\(R = TPR\)), whereas precision and the false positive rate are not directly related: \(P\) is computed over the samples predicted positive, while \(FPR\) is computed over the truly negative samples.
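A minimal sketch that computes all four quantities from a confusion matrix; the counts are made up for the example.

```python
# Illustrative confusion-matrix counts (not from the book).
TP, FN, FP, TN = 40, 10, 5, 45

TPR = TP / (TP + FN)   # true positive rate: share of real positives detected
FPR = FP / (TN + FP)   # false positive rate: share of real negatives misflagged
P   = TP / (TP + FP)   # precision: share of predicted positives that are real
R   = TP / (TP + FN)   # recall: same formula as TPR

print(f"TPR={TPR:.3f}  FPR={FPR:.3f}  P={P:.3f}  R={R:.3f}")
assert TPR == R        # recall and the true positive rate coincide
```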


  • \(2.5\) Prove equation (\(2.22\)).

\[\ell_{rank} = \frac{1}{m^{+} m^{-}} \sum_{\boldsymbol{x}^{+} \in D^{+}} \sum_{\boldsymbol{x}^{-} \in D^{-}} \Big( \mathbb{I}\big(f(\boldsymbol{x}^{+}) < f(\boldsymbol{x}^{-})\big) + \frac{1}{2}\, \mathbb{I}\big(f(\boldsymbol{x}^{+}) = f(\boldsymbol{x}^{-})\big) \Big) \tag{2.21}\]
\[AUC = 1 - \ell_{rank} \tag{2.22}\]
  In fact, expanding \((2.21)\) shows that it computes, for each vertical segment of the \(ROC\) curve, the area lying to the left of that segment, i.e. the area above the curve. Here \(\frac{1}{m^{+} m^{-}}\) is the area of one unit rectangle; \(\sum\limits_{\boldsymbol{x}^{-} \in D^{-}} \mathbb{I}\big(f(\boldsymbol{x}^{+}) < f(\boldsymbol{x}^{-})\big)\) counts how many unit rectangles lie to the left of the segment corresponding to \(\boldsymbol{x}^{+}\); \(\sum\limits_{\boldsymbol{x}^{+} \in D^{+}}\) sums this over all segments; and the term \(\frac{1}{2}\,\mathbb{I}\big(f(\boldsymbol{x}^{+}) = f(\boldsymbol{x}^{-})\big)\) accounts for the diagonal segments that arise when a positive and a negative example receive the same predicted score. Since the whole \(ROC\) plot has unit area, the area under the curve is \(AUC = 1 - \ell_{rank}\).
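  The identity can also be checked numerically: the sketch below computes \(\ell_{rank}\) directly from \((2.21)\) and the area under the \(ROC\) curve by the trapezoid rule (as in the book's AUC estimate, equation \((2.20)\)), on made-up scores that deliberately contain ties.

```python
import random

rng = random.Random(0)
grid = [i / 10 for i in range(11)]                 # coarse scores so ties occur
pos = [rng.choice(grid) for _ in range(30)]        # predicted scores of positives
neg = [rng.choice(grid) for _ in range(40)]        # predicted scores of negatives

# l_rank as in (2.21): reversed pairs count 1, tied pairs count 1/2.
l_rank = sum((sp < sn) + 0.5 * (sp == sn) for sp in pos for sn in neg) / (len(pos) * len(neg))

# AUC by the trapezoid rule along the ROC curve: one point per threshold,
# where FPR and TPR are the fractions of negatives/positives scored >= threshold.
thresholds = sorted(set(pos + neg), reverse=True)
points = [(0.0, 0.0)] + [
    (sum(s >= t for s in neg) / len(neg), sum(s >= t for s in pos) / len(pos))
    for t in thresholds
]
auc = sum(0.5 * (x2 - x1) * (y1 + y2) for (x1, y1), (x2, y2) in zip(points, points[1:]))

print(f"AUC = {auc:.6f}, 1 - l_rank = {1 - l_rank:.6f}")   # the two values agree
```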



Origin: www.cnblogs.com/cloud--/p/12122258.html