Andrew Ng Machine Learning course notes -- Empirical Risk Minimization

Bias-variance trade-off: the tension between a model that is too simple to capture the structure in the data (high bias) and a model that is so flexible it fits the noise in the training set (high variance).

Underfit: the high-bias case; the model is too simple to fit the training data well, so both training error and generalization error are high.

Overfit: the high-variance case; the model fits the idiosyncrasies of the training set and generalizes poorly to new examples.

Linear classification: in particular, logistic regression fits the parameters θ of the model by maximizing the log likelihood. But in order to understand learning algorithms more deeply, I am going to assume a simplified model of machine learning. I am going to define the training error of a hypothesis h_θ, written ε̂(h_θ), or ε̂_S(h_θ) if I want to make the dependence on the training set S explicit. It is the sum of indicator functions for whether the hypothesis misclassifies the i-th example, divided by m:

ε̂(h_θ) = (1/m) Σ_{i=1}^{m} 1{h_θ(x^(i)) ≠ y^(i)},

so this is just the fraction of training examples that your hypothesis misclassifies. That is defined as the training error, which is also called the empirical risk. The simplified model of machine learning I am going to talk about is called empirical risk minimization, and in particular, I am going to assume that the way my learning algorithm works is that it chooses the parameters θ that minimize the training error. It turns out that if you actually want to do this, it is a non-convex optimization problem. It also turns out to be useful to think of our learning algorithm not as choosing a set of parameters, but as choosing a function. Let me define the hypothesis class, script H, as the class of all hypotheses the learning algorithm considers; for linear classification, this is the set of all linear classifiers, and h_θ is one particular linear classifier in it. Each of these hypotheses is a function mapping from the input domain X to the labels {0, 1}. I can now redefine empirical risk minimization as choosing the hypothesis in the class script H that minimizes the training error.
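To make the definitions concrete, here is a minimal Python sketch, not from the lecture: the 1-D threshold hypothesis class, the toy data, and all names are illustrative assumptions. It computes the 0-1 training error and performs empirical risk minimization over a small finite hypothesis class.

```python
import numpy as np

def training_error(h, X, y):
    """Empirical risk under 0-1 loss: fraction of training examples that h misclassifies."""
    predictions = np.array([h(x) for x in X])
    return np.mean(predictions != y)

# Illustrative finite hypothesis class: 1-D threshold classifiers h_t(x) = 1{x >= t}.
def make_threshold_classifier(t):
    return lambda x: int(x >= t)

hypothesis_class = [make_threshold_classifier(t) for t in np.linspace(-2, 2, 21)]

# Toy training set drawn from some distribution D (label is 1 when x >= 0.5, with 10% label noise).
rng = np.random.default_rng(0)
X_train = rng.normal(size=100)
y_train = ((X_train >= 0.5) ^ (rng.random(100) < 0.1)).astype(int)

# Empirical risk minimization: pick the hypothesis with the lowest training error.
errors = [training_error(h, X_train, y_train) for h in hypothesis_class]
h_hat = hypothesis_class[int(np.argmin(errors))]
print("lowest training error:", min(errors))
```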

Generalization error: the probability that a hypothesis misclassifies a new example (x, y) drawn from the same distribution D that generated the training set; written ε(h).

Empirical risk minimization: let us say script H is a class of k hypotheses. Empirical risk minimization picks the hypothesis ĥ with the lowest training error, and what I would like to do is prove a bound on the generalization error of ĥ.
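In symbols (reconstructed from the verbal description here; this is the standard CS229 notation rather than a formula shown in the notes), the ERM hypothesis and the quantity we want to bound are:

```latex
\hat{h} = \arg\min_{h \in \mathcal{H}} \hat{\epsilon}(h),
\qquad
\epsilon(\hat{h}) = P_{(x,y)\sim\mathcal{D}}\big(\hat{h}(x) \neq y\big).
```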

Show that the training error approximates the generalization error: my training set is drawn i.i.d. from some distribution D. Fix a hypothesis h_j and define Z_i = 1{h_j(x^(i)) ≠ y^(i)}, so depending on which training examples I got, each Z_i will be either zero or one. So let us figure out the probability distribution of Z_i. Z_i takes on the value zero or one, so what is the probability that Z_i is equal to one? In other words, for the fixed hypothesis h_j, when I sample my training set i.i.d. from the distribution D, what is the chance that my hypothesis misclassifies an example? Well, by definition, that is just the generalization error of my hypothesis h_j. So Z_i is a Bernoulli random variable with mean given by the generalization error of this hypothesis, and the training error ε̂(h_j) is the average of m such i.i.d. Bernoulli variables.
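The reasoning step the notes compress here is Hoeffding's inequality applied to the average of these Bernoulli variables, followed by a union bound over all k hypotheses; sketched in the standard form:

```latex
P\big(|\epsilon(h_j) - \hat{\epsilon}(h_j)| > \gamma\big) \le 2\exp(-2\gamma^{2} m)
\quad\Longrightarrow\quad
P\big(\exists\, h \in \mathcal{H} : |\epsilon(h) - \hat{\epsilon}(h)| > \gamma\big) \le 2k\exp(-2\gamma^{2} m).
```

The event that no hypothesis deviates by more than γ is the uniform convergence event.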

Give a bound on the generalization error of the hypothesis output by empirical risk minimization: assuming uniform convergence holds, the generalization error of ĥ is at most 2γ worse than that of the best hypothesis in the class, as sketched below.
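On the event that uniform convergence holds, i.e. |ε(h) − ε̂(h)| ≤ γ for every h in H, the bound follows from a three-step chain against the best hypothesis in the class, h* = argmin_{h∈H} ε(h):

```latex
\epsilon(\hat{h})
\;\le\; \hat{\epsilon}(\hat{h}) + \gamma
\;\le\; \hat{\epsilon}(h^{*}) + \gamma
\;\le\; \epsilon(h^{*}) + 2\gamma.
```

The first and third inequalities use uniform convergence; the second uses the fact that ĥ was chosen to minimize the training error.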

Another question: so what we proved was stated in terms of γ and the probability δ of making a large error; how large a training set size do you need in order to guarantee a uniform convergence bound with parameters γ and δ?

Sample complexity bound: how large a training set you need in order to achieve a certain bound on the error.
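Setting the failure probability 2k·exp(−2γ²m) from the union bound equal to δ and solving for m gives the usual form of the sample complexity bound:

```latex
m \;\ge\; \frac{1}{2\gamma^{2}} \log\frac{2k}{\delta},
```

so the training set size needed grows only logarithmically in the number of hypotheses k.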

Error bound: fixing the training set size instead and solving for the error, the result is essentially that uniform convergence holds true with high probability, so the training error of every hypothesis in the class is close to its generalization error.
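Conversely, fixing m and δ and solving for γ gives γ = sqrt((1/(2m)) log(2k/δ)). A minimal Python sketch of both forms of the result (the function names are mine, for illustration):

```python
import math

def sample_complexity(gamma, delta, k):
    """Training set size m guaranteeing |eps(h) - eps_hat(h)| <= gamma for all k
    hypotheses simultaneously, with probability at least 1 - delta."""
    return math.ceil(math.log(2 * k / delta) / (2 * gamma ** 2))

def error_bound(m, delta, k):
    """Uniform convergence margin gamma achievable with m examples and confidence 1 - delta."""
    return math.sqrt(math.log(2 * k / delta) / (2 * m))

print(sample_complexity(gamma=0.05, delta=0.05, k=1000))   # examples needed for gamma = 0.05
print(error_bound(m=10000, delta=0.05, k=1000))            # gamma achievable with m = 10000
```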

 


Reposted from blog.csdn.net/weixin_43218659/article/details/88424139