Semi-Supervised Learning (Five): Semi-Supervised Support Vector Machines

Semi-Supervised Support Vector Machines (S3VMs)

  Today we introduce the SVM classifier and its semi-supervised form, the S3VM. With this post, our introduction to the basic semi-supervised learning algorithms comes to a temporary close; later, I will share some relatively new semi-supervised learning algorithms in the form of paper readings. Let's get started!

Introduction

  Support vector machines (SVMs) should already be familiar to most readers. But what if the data set also contains a large amount of unlabeled data (as in panel (b) below): how should the decision boundary be chosen then? A boundary learned from the labeled data alone (panel (a) below) would cut straight through dense regions of unlabeled data. If we instead assume that the two classes are well separated, the boundary we actually want is the solid black line in panel (b).

[Figure: (a) the decision boundary learned from the labeled data only; (b) the preferred decision boundary (solid black line) once the dense unlabeled data are taken into account.]

  The new decision boundary cleanly splits the unlabeled data into two groups while still classifying the labeled data correctly (although its distance to the nearest labeled instances is smaller than that of the supervised SVM boundary).


Support Vector Machines (SVMs)

  Let us first review SVMs; this lays the groundwork for the S3VM algorithm introduced afterwards. For simplicity we consider binary classification, i.e. y ∈ {-1, 1}. For a feature vector x, the decision function is defined as f(x) = w^T x + b, where the parameter vector w determines the direction of the decision boundary (and has the same dimension as x), and b is an offset. For example, for a particular choice of w with b = -1, the decision boundary is the blue line in the figure below; the decision boundary is always perpendicular to w.

[Figure: a linear decision boundary w^T x + b = 0 (blue line), perpendicular to the vector w; the green segment marks the distance from the origin to the boundary.]

  Our model is f(x) = w^T x + b, the decision boundary is the set of points with f(x) = 0, and the label of x is predicted by sign(f(x)). We are often interested in how far an instance x is from the decision boundary; in absolute value this distance is |f(x)| / ||w||. For example, the distance from the origin x = (0, 0) to the decision boundary is |b| / ||w||, the green segment in the figure above. For a labeled instance (x, y) we define the signed distance to the decision boundary as y·f(x) / ||w|| = y(w^T x + b) / ||w||. Assume for now that the training data are linearly separable, i.e. there is at least one linear decision boundary that classifies all labeled data correctly. The geometric margin of a decision boundary is its signed distance to the nearest labeled instance:

    min_{i=1..l} y_i (w^T x_i + b) / ||w||

If a decision boundary separates the labeled training sample, its geometric margin is positive. We find the desired decision boundary by maximizing the geometric margin:

    max_{w,b} min_{i=1..l} y_i (w^T x_i + b) / ||w||

This form is hard to optimize directly, so we rewrite it as an equivalent problem. First note that the parameters (w, b) can be rescaled to (cw, cb) for any c > 0 without changing the decision boundary, so we may require the instance closest to the boundary to satisfy y_i (w^T x_i + b) = 1. The objective can then be rewritten as the constrained optimization problem

    max_{w,b} 1 / ||w||   s.t.   y_i (w^T x_i + b) >= 1,  i = 1, ..., l

and maximizing 1 / ||w|| is equivalent to minimizing ||w||^2:

    min_{w,b} ||w||^2   s.t.   y_i (w^T x_i + b) >= 1,  i = 1, ..., l

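To make these definitions concrete, here is a minimal NumPy sketch (the toy points and the candidate w, b are illustrative, not taken from the figure above) that computes signed distances and the geometric margin of a linear classifier:

    import numpy as np

    # Toy labeled data (illustrative values only).
    X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
    y = np.array([1, 1, -1, -1])

    # A candidate linear classifier f(x) = w^T x + b.
    w = np.array([1.0, 1.0])
    b = -1.0

    f = X @ w + b                              # f(x_i) for every instance
    signed_dist = y * f / np.linalg.norm(w)    # y_i f(x_i) / ||w||
    geometric_margin = signed_dist.min()       # distance to the closest labeled point

    print(signed_dist)        # all positive iff (w, b) separates the labeled data
    print(geometric_margin)   # the quantity a hard-margin SVM maximizes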
  So far we have assumed that the training sample is linearly separable. We now relax this assumption, in which case the constraints above can no longer all be satisfied. We allow the constraints to be violated on some instances by introducing slack variables ξ_i >= 0, and penalize these violations in the objective. The problem is rewritten as

    min_{w,b,ξ} Σ_{i=1..l} ξ_i + λ ||w||^2
    s.t.  y_i (w^T x_i + b) >= 1 - ξ_i,  ξ_i >= 0,  i = 1, ..., l        (*)

  It is useful, however, to convert this objective into the regularized risk minimization form (this is also how we will extend it to S3VMs later). Consider the auxiliary optimization problem

    min_ξ ξ   s.t.   ξ >= z,  ξ >= 0

  It is easy to see that the optimal value of this problem is 0 when z <= 0 and z otherwise, which can be written compactly as max(z, 0).

  Note that the constraints in (*) can be rewritten as ξ_i >= 1 - y_i (w^T x_i + b) together with ξ_i >= 0. Using the observation above, (*) can be rewritten as the unconstrained problem

    min_{w,b} Σ_{i=1..l} max(1 - y_i (w^T x_i + b), 0) + λ ||w||^2

  Here the first term is the hinge loss, which plays the role of the loss function, and the second term is the regularizer.

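As a sanity check on this regularized-risk form, the following is a minimal sketch (not code from the article; the data, learning rate, and λ are arbitrary) that minimizes the hinge loss plus the L2 penalty by subgradient descent:

    import numpy as np

    def hinge_risk(w, b, X, y, lam):
        # Sum of hinge losses max(1 - y_i f(x_i), 0) plus lam * ||w||^2.
        margins = y * (X @ w + b)
        return np.maximum(1.0 - margins, 0.0).sum() + lam * (w @ w)

    def train_linear_svm(X, y, lam=0.1, lr=0.01, epochs=200):
        # Subgradient descent on the primal hinge-loss objective.
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        for _ in range(epochs):
            margins = y * (X @ w + b)
            viol = margins < 1.0                   # points violating the margin
            grad_w = -(y[viol, None] * X[viol]).sum(axis=0) + 2.0 * lam * w
            grad_b = -y[viol].sum()
            w -= lr * grad_w
            b -= lr * grad_b
        return w, b

    # Tiny illustrative dataset.
    X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    w, b = train_linear_svm(X, y)
    print(hinge_risk(w, b, X, y, lam=0.1), np.sign(X @ w + b))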
  We will not go deeper into the dual form of SVMs or the kernel trick (mapping the features into a higher-dimensional space to handle nonlinear problems). Both topics are central to SVMs, but they are not the focus of our introduction to S3VMs (of course, the kernel trick can be applied directly to S3VMs as well); interested readers can consult the relevant papers.


Semi-Supervised Support Vector Machines (S3VMs)

  Above, the hinge loss pushes the classifier to label the labeled data as accurately as possible. But what about unlabeled data? Without a label we cannot know the correct class, let alone penalize a misclassification.

  The label predictor we have learned is sign(f(x)). For an unlabeled instance x, its predicted label is ŷ = sign(f(x)). If we treat this predicted value as the label of x, we can apply the hinge loss to x:

    max(1 - ŷ f(x), 0) = max(1 - sign(f(x)) f(x), 0) = max(1 - |f(x)|, 0)

  We call this the hat loss.

  Since the pseudo-label is generated from f(x) itself, an unlabeled instance is always "correctly classified" by construction, yet the hat loss can still penalize some unlabeled instances. From the formula, the hat loss prefers f(x) >= 1 or f(x) <= -1 (penalty 0, relatively far from the decision boundary) and penalizes instances with -1 < f(x) < 1 (especially those with f(x) close to 0), since such instances are likely to be misclassified. We can now write the S3VM objective over both the labeled and the unlabeled data:

    min_{w,b} Σ_{i=1..l} max(1 - y_i (w^T x_i + b), 0) + λ1 ||w||^2 + λ2 Σ_{j=l+1..l+u} max(1 - |w^T x_j + b|, 0)        (**)

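A short sketch of the two loss functions side by side (hypothetical helper names, written only to mirror the formulas above):

    import numpy as np

    def hinge_loss(y, f):
        # Hinge loss max(1 - y f(x), 0) for labeled instances.
        return np.maximum(1.0 - y * f, 0.0)

    def hat_loss(f):
        # Hat loss max(1 - |f(x)|, 0): the hinge loss with the predicted
        # label sign(f(x)) substituted for the unknown true label.
        return np.maximum(1.0 - np.abs(f), 0.0)

    f_vals = np.array([-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0])
    print(hat_loss(f_vals))   # [0. 0. 0.5 1. 0.5 0. 0.] -- largest near the boundary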
  We can see that the S3VM objective favors keeping the unlabeled data as far from the decision boundary as possible, i.e. it routes the decision boundary through low-density regions of the unlabeled data (in this respect the unlabeled data play a role much like they do in a clustering algorithm). In fact, the objective above is again a regularized risk minimization (with the hinge loss on the labeled data), in which the regularization term is

    λ1 ||w||^2 + λ2 Σ_{j=l+1..l+u} max(1 - |w^T x_j + b|, 0)

  It should also be noted that, in practice, the solution of (**) tends to be unbalanced: most, if not all, of the unlabeled data may be assigned to a single class. The cause of this phenomenon is not entirely clear. A heuristic correction is to constrain the predicted class proportion on the unlabeled data to match the class proportion observed in the labeled data: (1/u) Σ_{j=l+1..l+u} y_j = (1/l) Σ_{i=1..l} y_i. Because the predicted labels are discrete, this constraint is hard to satisfy directly, so it is relaxed into a constraint on the continuous function f:

    (1/u) Σ_{j=l+1..l+u} (w^T x_j + b) = (1/l) Σ_{i=1..l} y_i

The complete S3VM objective can therefore be written as:

    min_{w,b} Σ_{i=1..l} max(1 - y_i (w^T x_i + b), 0) + λ1 ||w||^2 + λ2 Σ_{j=l+1..l+u} max(1 - |w^T x_j + b|, 0)
    s.t.  (1/u) Σ_{j=l+1..l+u} (w^T x_j + b) = (1/l) Σ_{i=1..l} y_i

  However, the S3VM objective is not convex; that is, it has multiple local optima, which is the main computational difficulty in solving it. A learning algorithm may get stuck in a suboptimal local optimum rather than reaching the global optimum, and finding good near-optimal solutions efficiently is an active research topic for S3VMs.

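One simple (and far from state-of-the-art) way to cope with the non-convexity is to run a generic local optimizer from several random starting points and keep the best local optimum. The sketch below only illustrates the objective under stated assumptions: the λ values, the quadratic penalty weight rho used in place of the hard class-balance constraint, and the random toy data are all made up, and real S3VM solvers use much more refined strategies.

    import numpy as np
    from scipy.optimize import minimize

    def s3vm_objective(theta, Xl, yl, Xu, lam1=1.0, lam2=1.0, rho=10.0):
        # Hinge loss on labeled data + lam1 * ||w||^2
        # + lam2 * hat loss on unlabeled data
        # + rho * squared violation of the relaxed class-balance constraint.
        w, b = theta[:-1], theta[-1]
        fl = Xl @ w + b
        fu = Xu @ w + b
        hinge = np.maximum(1.0 - yl * fl, 0.0).sum()
        hat = np.maximum(1.0 - np.abs(fu), 0.0).sum()
        balance = (fu.mean() - yl.mean()) ** 2
        return hinge + lam1 * (w @ w) + lam2 * hat + rho * balance

    def fit_s3vm(Xl, yl, Xu, restarts=10, seed=0):
        # Multi-restart local optimization; keep the best local optimum found.
        rng = np.random.default_rng(seed)
        d = Xl.shape[1]
        best = None
        for _ in range(restarts):
            theta0 = rng.normal(scale=0.5, size=d + 1)
            res = minimize(s3vm_objective, theta0, args=(Xl, yl, Xu),
                           method="Nelder-Mead")
            if best is None or res.fun < best.fun:
                best = res
        return best.x[:-1], best.x[-1]

    # Illustrative data: one labeled point per class plus unlabeled clusters.
    rng = np.random.default_rng(1)
    Xl = np.array([[2.0, 2.0], [-2.0, -2.0]])
    yl = np.array([1.0, -1.0])
    Xu = np.vstack([rng.normal(2.0, 0.5, (10, 2)), rng.normal(-2.0, 0.5, (10, 2))])

    w, b = fit_s3vm(Xl, yl, Xu)
    print(np.sign(Xu @ w + b))   # predicted labels for the unlabeled points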

Entropy Regularization

  SVMs and S3VMs are not probabilistic models; that is, they do not classify by computing the posterior probability of each class. In statistical machine learning there are many models that classify by computing the probability p(y | x). Interestingly, these probabilistic models have a direct analogue of S3VMs, called entropy regularization. To make the discussion concrete, we first introduce a specific probabilistic model, logistic regression, and then extend it to semi-supervised learning via entropy regularization.

Logistic regression models the posterior probability by squashing f(x) = w^T x + b into [0, 1]:

    p(y = 1 | x) = 1 / (1 + exp(-(w^T x + b)))

The model parameters are w and b. Given the labeled training sample, the conditional log-likelihood is Σ_{i=1..l} log p(y_i | x_i, w, b). If we further place a zero-mean Gaussian prior on w whose covariance is a diagonal (scaled identity) matrix I, then training the logistic regression model amounts to maximizing the posterior probability:

    max_{w,b} Σ_{i=1..l} log p(y_i | x_i, w, b) + log p(w)

  This is equivalent to the following regularized risk minimization problem:

    min_{w,b} Σ_{i=1..l} log(1 + exp(-y_i (w^T x_i + b))) + λ ||w||^2

  Logistic regression as described above does not use unlabeled data. We can bring unlabeled data in using the following intuition: if the two classes are well separated, then the classification of any unlabeled instance should be confident, i.e. it clearly belongs either to the positive class or to the negative class. Equivalently, the posterior probability should be close to either 1 or 0. One way to measure this confidence is the entropy. For a Bernoulli random variable with success probability p, the entropy is defined as

    H(p) = -p log p - (1 - p) log(1 - p)

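A quick numerical illustration of the entropy as a confidence measure (a minimal sketch; the probability values are arbitrary):

    import numpy as np

    def bernoulli_entropy(p):
        # H(p) = -p log p - (1 - p) log(1 - p), clipping p to avoid log(0).
        p = np.clip(p, 1e-12, 1.0 - 1e-12)
        return -p * np.log(p) - (1.0 - p) * np.log(1.0 - p)

    for p in [0.5, 0.9, 0.99, 0.999]:
        print(p, bernoulli_entropy(p))   # entropy shrinks as the prediction becomes confident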
  Given the unlabeled sample {x_j}, j = l+1, ..., l+u, the entropy regularizer for logistic regression is defined as

    Ω(w, b) = Σ_{j=l+1..l+u} H( p(y = 1 | x_j, w, b) )

  The more confidently the unlabeled instances are classified, the smaller this value becomes. Applying it in direct analogy to S3VMs, we can define the semi-supervised logistic regression model:

    min_{w,b} Σ_{i=1..l} log(1 + exp(-y_i (w^T x_i + b))) + λ1 ||w||^2 + λ2 Σ_{j=l+1..l+u} H( p(y = 1 | x_j, w, b) )

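A minimal sketch of this semi-supervised logistic regression objective, minimized with a generic optimizer (the hyperparameter values and toy data are illustrative, not from the article; like the S3VM objective, this one is non-convex, so only a local optimum is found):

    import numpy as np
    from scipy.optimize import minimize

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def ss_logreg_objective(theta, Xl, yl, Xu, lam1=0.1, lam2=1.0):
        # Logistic loss on labeled data + lam1 * ||w||^2
        # + lam2 * entropy of p(y=1|x) on unlabeled data.
        w, b = theta[:-1], theta[-1]
        # log(1 + exp(-y_i f(x_i))), computed stably with logaddexp.
        log_loss = np.logaddexp(0.0, -yl * (Xl @ w + b)).sum()
        p = np.clip(sigmoid(Xu @ w + b), 1e-12, 1.0 - 1e-12)
        entropy = (-p * np.log(p) - (1.0 - p) * np.log(1.0 - p)).sum()
        return log_loss + lam1 * (w @ w) + lam2 * entropy

    # Illustrative data, same shape as in the S3VM sketch above.
    rng = np.random.default_rng(2)
    Xl = np.array([[2.0, 2.0], [-2.0, -2.0]])
    yl = np.array([1.0, -1.0])
    Xu = np.vstack([rng.normal(2.0, 0.5, (10, 2)), rng.normal(-2.0, 0.5, (10, 2))])

    theta0 = np.zeros(Xl.shape[1] + 1)
    res = minimize(ss_logreg_objective, theta0, args=(Xl, yl, Xu), method="BFGS")
    w, b = res.x[:-1], res.x[-1]
    print(sigmoid(Xu @ w + b))   # posteriors pushed towards 0 or 1 by the entropy term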

The Assumption Behind S3VMs

  The assumption shared by S3VMs and entropy regularization is that the classes are well separated, so that the decision boundary falls into a low-density region of the feature space instead of cutting through dense unlabeled data. If this assumption does not hold, S3VMs may perform poorly.

 

Summary

  Today we introduced the SVM classifier and its semi-supervised form, the S3VM. Unlike the semi-supervised learning techniques discussed in earlier posts, S3VMs look for a decision boundary that passes through a low-density region of the unlabeled data. We also introduced entropy regularization, which carries the same idea over to the logistic regression model. With this, our introduction to the basic semi-supervised learning algorithms comes to a temporary close; next we will share some relatively new semi-supervised learning algorithms in the form of paper readings. Later posts will also explore the connections between semi-supervised learning in humans and in machines, and discuss the potential impact of semi-supervised learning research on cognitive science, so stay tuned!

 

Please support the official account: scan the QR code to follow, and let's learn and improve together!


Source: www.cnblogs.com/PJQOOO/p/11775047.html