Machine Learning Foundations, Lecture 8 Notes

Lecture 8: Noise and Error

8-1 Noise and Feasible Solutions

Sources of noise: mislabeled examples (wrong y), identical feature vectors x that carry different labels, and incorrectly recorded features (wrong x).

Does the VC bound still hold in the presence of noise?

Answer: Replace the fixed, ideal target function f(x) with a target distribution P(y|x); the label of each point can then be viewed as "the ideal prediction f(x)" plus "noise". As long as the data are drawn i.i.d. from the same distribution, the entire VC bound is unaffected and can still be used.

Fun time: if you already know the data is linearly separable, you do not need to run PLA; conversely, linear separability of the data does not tell you whether the target function f is linear (because of the existence of noise).


8-2 Measurement of Error

Error measure: E(g, f); when evaluated at a single point x as err(g(x), f(x)), it is called a pointwise error measure.

0/1 error: err(g(x), y) = [[g(x) ≠ y]], usually used for classification.

Squared error: err(g(x), y) = (g(x) − y)^2, usually used for regression (real-valued outputs).
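The two pointwise error measures above can be written directly as small functions. This is a minimal sketch; the function names err_01 and err_sq are my own, chosen to mirror the notes' notation.

```python
def err_01(y_hat, y):
    """0/1 error: 1 if the prediction differs from the label, else 0."""
    return float(y_hat != y)

def err_sq(y_hat, y):
    """Squared error: penalizes large deviations quadratically."""
    return float((y_hat - y) ** 2)

# Pointwise errors on single examples
print(err_01(+1, -1))    # 1.0 (misclassified)
print(err_01(+1, +1))    # 0.0 (correct)
print(err_sq(3.0, 1.0))  # 4.0
```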

The ideal target f(x), the one with minimum expected error, is jointly determined by P(y|x) and the error measure. The machine learning framework diagram is redrawn with the error measure err added:
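A small numeric check of this point: for the same P(y|x), the 0/1-optimal prediction is the most probable y (the mode), while the squared-error-optimal prediction is the conditional mean E[y|x]. The distribution below is a made-up example, not from the lecture.

```python
import numpy as np

# Hypothetical P(y|x) for one fixed x: y takes three values with these probabilities
ys = np.array([1.0, 2.0, 3.0])
p = np.array([0.2, 0.7, 0.1])

# Ideal target under 0/1 error: the most probable value of y (the mode)
f_01 = float(ys[np.argmax(p)])

# Ideal target under squared error: the conditional mean E[y|x]
f_sq = float(np.dot(p, ys))  # 0.2*1 + 0.7*2 + 0.1*3 = 1.9

print(f_01)  # 2.0
print(f_sq)
```

Different error measures thus pick out different "ideal" targets from the very same P(y|x), which is why f is defined jointly by P(y|x) and err.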



8-3 Error Measurement of Algorithms

There are two kinds of classification errors: false accept (predicting +1 when the true label is −1) and false reject (predicting −1 when the true label is +1).


The severities of false reject and false accept differ by application: in a supermarket's discount problem a false reject (denying a loyal customer) is the costly mistake, while in the CIA's confidentiality problem a false accept (granting access to an intruder) is the costly one.

The algorithm handles this by optimizing its own error measure, written err^ ("err hat"), chosen as a plausible or friendly approximation of the true cost.
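A cost-weighted 0/1 error makes the asymmetry concrete. This is a minimal sketch; the weights below (1000 vs. 1) are illustrative, echoing the CIA-style setting where a false accept is far more serious.

```python
def weighted_err(y_hat, y, w_fa, w_fr):
    """Weighted 0/1 error for binary labels in {+1, -1}.

    w_fa: cost of a false accept (predict +1 when y = -1)
    w_fr: cost of a false reject (predict -1 when y = +1)
    """
    if y_hat == y:
        return 0.0
    return w_fa if y == -1 else w_fr

# CIA-style costs: a false accept is 1000x worse than a false reject
print(weighted_err(+1, -1, w_fa=1000.0, w_fr=1.0))  # 1000.0
print(weighted_err(-1, +1, w_fa=1000.0, w_fr=1.0))  # 1.0
```

Swapping the two weights would instead model the supermarket setting, where annoying a loyal customer is the expensive mistake.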


8-4 Different weights for errors

Continuing from 8-3: how can the weighted Ein be made as small as possible?

The naive ideas are: 1. PLA (works when the data is linearly separable, since a separating hypothesis has zero error under any weighting); 2. pocket (replace the stored hypothesis whenever the new one has smaller weighted error).

The systematic idea is the Weighted Pocket algorithm.

An example with weight 1000 is treated as if it were copied 1000 times (virtual copying): 1. examples with large weight are randomly checked for errors proportionally more often; 2. if the new hypothesis has a smaller weighted error, it replaces the stored one. The quantity used for the comparison is the weighted Ein.
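The two modifications above can be sketched as follows. This is an illustrative implementation under my own assumptions (random PLA-style updates, sampling misclassified points with probability proportional to weight to mimic virtual copying), not the exact algorithm from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_ein(w, X, y, weights):
    """Weighted in-sample error: total weight of misclassified points."""
    pred = np.sign(X @ w)
    return float(np.sum(weights[pred != y]))

def weighted_pocket(X, y, weights, iters=1000):
    """Sketch of a weighted pocket algorithm.

    1. Pick a misclassified example with probability proportional to its
       weight: a weight of 1000 behaves like 1000 virtual copies.
    2. After each PLA update, keep the hypothesis with the smallest
       weighted Ein seen so far (the pocket).
    """
    n, d = X.shape
    w = np.zeros(d)
    best_w, best_e = w.copy(), weighted_ein(w, X, y, weights)
    for _ in range(iters):
        pred = np.sign(X @ w)
        wrong = np.flatnonzero(pred != y)
        if wrong.size == 0:
            break
        p = weights[wrong] / weights[wrong].sum()
        i = rng.choice(wrong, p=p)
        w = w + y[i] * X[i]                     # PLA update on the sampled mistake
        e = weighted_ein(w, X, y, weights)
        if e < best_e:                          # pocket: keep the best so far
            best_w, best_e = w.copy(), e
    return best_w, best_e
```

On linearly separable data this behaves like PLA and drives the weighted Ein to zero; on noisy data the pocket keeps the best weighted hypothesis found.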
