CHANG's Machine Learning Study Notes 12: Semi-supervised Learning

Machine learning is the core of artificial intelligence. It is commonly divided into four categories:

1. Supervised learning

In supervised learning, each training example has both features and a label; the machine learns the relationship between features and labels so that it can assign the correct label even to unseen, unlabeled inputs. An analogy: before the college entrance exam, practice papers come with standard answers, so while studying you can check your work and learn how to analyze each type of problem; in the real exam, where no answers are given, you can still produce correct solutions. That is supervised learning.
In one sentence: given labeled data, predict the labels of new data.
From the correspondence between existing input and output data, a function is learned that maps inputs to the appropriate outputs, as in classification.

2. Unsupervised learning

Unsupervised learning has only features and no labels. An analogy: some mock papers before the exam come with no standard answers, so there is no way to tell right from wrong, but you can still sort the problems into Chinese, mathematics, and English according to the connections among them; this process is called clustering. Given a training set that has features but no labels, the data are divided into several groups according to their intrinsic relationships and similarities.
In one sentence: given unlabeled data, find its hidden structure.
The data set is modeled directly.
The difference between the two: supervised learning learns only from labeled samples, while unsupervised learning uses only unlabeled samples.

3. Semi-Supervised learning

Semi-supervised learning uses data of which only a part is labeled, with the majority unlabeled. Compared with supervised learning, semi-supervised learning costs less to prepare, yet can still achieve high accuracy.
It builds a suitable classification function by exploiting both the labeled examples and the large pool of unlabeled data.
Background of semi-supervised learning: in practical problems there is usually only a small amount of labeled data, because labeling is sometimes expensive. In biology, for example, determining the structure or function of a single protein may cost a biologist years of work, while large amounts of unlabeled data are very easy to obtain.

4. Reinforcement learning

Reinforcement learning also uses unlabeled data, but through some mechanism (a reward function) it can tell whether you are getting closer to or farther from the correct answer. The reward can be seen as a delayed, sparse form of the correct answer: you receive only delayed feedback hinting whether you are moving toward or away from it.

The content above is adapted from the blogger @awake dreamer.

Semi-supervised learning comes in four flavors. Each makes some assumption (a guess) about the unlabeled data, and how well the method works depends on how good that guess is.

1. Semi-supervised Learning for Generative Models

First, compute the posterior probability that each unlabeled data point belongs to class C1; second, use these posteriors to re-estimate the prior probability of C1 and the class parameters, then update the model. Repeat until convergence.
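The two steps above form an EM-style loop. Below is a minimal numpy sketch for a two-class generative model with 1-D Gaussians; the function name and all details (shared variance, initialization from the labeled set) are illustrative assumptions, not the lecture's exact formulation.

```python
import numpy as np

def semi_supervised_em(x_lab, y_lab, x_unlab, n_iter=20):
    """EM-style updates for a two-class 1-D Gaussian generative model.

    x_lab:   labeled inputs, with y_lab in {0, 1}
    x_unlab: unlabeled inputs
    Returns class priors, class means, and a (fixed) shared variance.
    """
    # Initialise from the labeled data only
    prior = np.array([np.mean(y_lab == 0), np.mean(y_lab == 1)])
    mu = np.array([x_lab[y_lab == 0].mean(), x_lab[y_lab == 1].mean()])
    var = x_lab.var() + 1e-6

    for _ in range(n_iter):
        # E-step: posterior P(class | x) for every unlabeled point
        lik = np.stack([np.exp(-(x_unlab - m) ** 2 / (2 * var)) for m in mu])
        post = prior[:, None] * lik
        post /= post.sum(axis=0)                  # shape (2, n_unlab)

        # M-step: update priors and means from labeled counts + soft counts
        n_lab = np.array([(y_lab == 0).sum(), (y_lab == 1).sum()], float)
        n_soft = post.sum(axis=1)
        prior = (n_lab + n_soft) / (len(x_lab) + len(x_unlab))
        mu = np.array([
            (x_lab[y_lab == c].sum() + (post[c] * x_unlab).sum())
            / (n_lab[c] + n_soft[c])
            for c in (0, 1)
        ])
    return prior, mu, var
```

Note how the unlabeled points contribute fractionally to both classes through their posteriors, instead of being assigned hard to one class.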

2. Low-density Separation

The low-density separation assumption is "black or white": there is an obvious gap between the two classes of data, i.e., the boundary between the classes lies where the data density is low.

2.1 Self-training

First train a model on the labeled data, then feed the unlabeled data in as test data, obtaining a pseudo-label for each unlabeled point. Next, move some of the pseudo-labeled data into the labeled set and train again, repeating the cycle. Which pseudo-labeled points to move is up to you to define; you can even give each one a weight.
In a classification problem, an input might belong to class a with probability 0.7 and class b with probability 0.3. Such soft labels do not work in self-training: if the labels are left unchanged, putting these points into the labeled data improves the output not at all, since the model just reproduces what it already predicts. When a point has probability 0.7 of being a, its label should be set hard to a (black or white).
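The self-training loop above can be sketched as follows. The model here is a toy 1-D nearest-centroid classifier, and the confidence rule (distance gap between the two centroids) is an illustrative assumption; any classifier and selection rule could be substituted.

```python
import numpy as np

def self_train(x_lab, y_lab, x_unlab, n_rounds=5, n_move=10):
    """Self-training with hard pseudo-labels (two classes, 1-D inputs).

    Each round: fit on the labeled set, pseudo-label the unlabeled set,
    then move the n_move most confident points into the labeled set.
    """
    x_lab, y_lab, x_unlab = map(np.asarray, (x_lab, y_lab, x_unlab))
    for _ in range(n_rounds):
        if len(x_unlab) == 0:
            break
        # "Train": class centroids from the current labeled set
        c0, c1 = x_lab[y_lab == 0].mean(), x_lab[y_lab == 1].mean()
        d0, d1 = np.abs(x_unlab - c0), np.abs(x_unlab - c1)
        pseudo = (d1 < d0).astype(int)          # hard (black-or-white) labels
        conf = np.abs(d0 - d1)                  # confidence of each pseudo-label
        keep = np.argsort(conf)[::-1][:n_move]  # indices of most confident points
        # Move them into the labeled pool, drop them from the unlabeled pool
        x_lab = np.concatenate([x_lab, x_unlab[keep]])
        y_lab = np.concatenate([y_lab, pseudo[keep]])
        x_unlab = np.delete(x_unlab, keep)
    return x_lab, y_lab
```

The `pseudo` line is where the "black or white" step happens: the soft 0.7/0.3-style preference is collapsed into a hard label before the point joins the labeled pool.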

2.2 Entropy-based Regularization (an advanced version of self-training)

We want each data point to belong clearly to one class, rather than looking a bit like every label. To measure how disordered a prediction is, we can compute its entropy, E(y) = -sum_m y_m ln(y_m), which is small when the distribution concentrates on one class.
The unlabeled-data term added to the loss L can be treated as a regularization term, and the two terms of L can also be weighted to emphasize either the labeled or the unlabeled data.

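A small numeric sketch of the idea, for a hypothetical two-class case; `semi_loss`, its signature, and the weight `lam` are my own names for the regularized loss described above.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Entropy E(y) = -sum_m y_m ln y_m of one predicted distribution."""
    p = np.asarray(p, float)
    return -np.sum(p * np.log(p + eps))

def semi_loss(pred_lab, y_lab, pred_unlab, lam=0.5):
    """Cross-entropy on labeled predictions plus lam times the summed
    entropy of the unlabeled predictions (the regularization term)."""
    ce = -sum(np.log(p[y] + 1e-12) for p, y in zip(pred_lab, y_lab))
    ent = sum(entropy(p) for p in pred_unlab)
    return ce + lam * ent
```

A sharp prediction like [1, 0] has entropy 0, while a uniform [0.5, 0.5] has the maximum ln 2, so minimizing this loss pushes unlabeled predictions toward one class.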

3. Smoothness Assumption

The assumption is that if two points x1 and x2 are close to each other within a high-density region, their labels y1 and y2 are the same.

3.1 Cluster and then Label

Although x2 is closer to x3 in raw distance, x2 and x1 lie in the same high-density region. Data within one high-density region can be considered well connected to each other and therefore share the same label, while different high-density regions cannot reach each other, so their labels differ.
An intuitive method is to first cluster the data, see which cluster each unlabeled point falls into, and then label it accordingly.
However, grouping images of the same class together is difficult; the result is acceptable only if the clustering is good enough.
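The cluster-then-label idea can be sketched with a small 1-D k-means; the clustering algorithm and the majority-vote labeling rule are illustrative choices, not prescribed by the notes.

```python
import numpy as np

def cluster_then_label(x_lab, y_lab, x_unlab, k=2, n_iter=50, seed=0):
    """Run k-means over ALL points, then give each unlabeled point the
    majority label of the labeled points in its cluster (labels in {0,1})."""
    x_lab, y_lab, x_unlab = map(np.asarray, (x_lab, y_lab, x_unlab))
    x_all = np.concatenate([x_lab, x_unlab])
    rng = np.random.default_rng(seed)
    centers = x_all[rng.choice(len(x_all), k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center, then recompute centers
        assign = np.argmin(np.abs(x_all[:, None] - centers[None, :]), axis=1)
        centers = np.array([x_all[assign == j].mean() if np.any(assign == j)
                            else centers[j] for j in range(k)])
    lab_assign = assign[:len(x_lab)]
    unlab_assign = assign[len(x_lab):]
    # Majority label of the labeled points inside each cluster
    cluster_label = {}
    for j in range(k):
        members = y_lab[lab_assign == j]
        cluster_label[j] = int(np.round(members.mean())) if len(members) else 0
    return np.array([cluster_label[j] for j in unlab_assign])
```

As the text warns, the predictions are only as good as the clustering: if a cluster mixes the two classes, every unlabeled point in it inherits the wrong majority label.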

3.2 Graph-based Approach

Qualitative description

Build a graph over all the data points: if two points are connected on the graph, they share the same label. How is such a graph formed? Some graphs arise naturally, for example the hyperlinks between web pages or the citations between papers, but sometimes you have to construct the graph yourself.
The quality of the result depends on the quality of the graph, so how do we build it?

  • First, compute the similarity s(x1, x2) between pairs of points.
  • Then build a graph from the similarities; there are several kinds:
    The first is K Nearest Neighbor: after computing the similarities, with k = 3 each point is connected to the 3 points most similar to itself.
    The second is e-Neighborhood: connect the pairs whose similarity is greater than e.
    On top of this, a weighting function can be applied so that points close to each other (the orange dots in the figure) are linked together, while points that differ a lot (orange vs. green) are not.
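A minimal sketch of the KNN graph construction, using an RBF (Gaussian) similarity as the weighting function; the RBF kernel is one common choice, assumed here for illustration.

```python
import numpy as np

def build_knn_graph(x, k=3, gamma=1.0):
    """Weighted KNN graph over 1-D points.

    Similarity s(xi, xj) = exp(-gamma * (xi - xj)^2); each point keeps
    edges only to its k most similar neighbours, then the adjacency
    matrix W is symmetrised so the graph is undirected.
    """
    x = np.asarray(x, float)
    n = len(x)
    dist2 = (x[:, None] - x[None, :]) ** 2   # pairwise squared distances
    sim = np.exp(-gamma * dist2)
    W = np.zeros((n, n))
    for i in range(n):
        # The k most similar points other than i itself
        order = np.argsort(sim[i])[::-1]
        nbrs = [j for j in order if j != i][:k]
        W[i, nbrs] = sim[i, nbrs]
    return np.maximum(W, W.T)
```

With the RBF weighting, nearby points get edge weights near 1 while far-apart points get weights near 0, matching the "orange linked, orange-green not linked" picture described above.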
Next, the quantitative description:

The smoothness s is computed by the equation S = (1/2) * sum_{i,j} w_ij * (y_i - y_j)^2 = y^T L y; the smaller S is, the smoother the labeling.
R: the labeled data
U: the unlabeled data
L = D - W (the graph Laplacian)
W: w_11 is the weight from x1 to x1, w_12 the weight from x1 to x2, and so on
D: a diagonal matrix with d_nn = w_n1 + w_n2 + ... (the sum of row n of W)

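The quantities W, D, L = D - W, and the smoothness S can be checked numerically; a small numpy sketch that verifies the two equivalent forms of S agree:

```python
import numpy as np

def smoothness(W, y):
    """S = 1/2 * sum_ij w_ij (y_i - y_j)^2 = y^T L y, with L = D - W."""
    W = np.asarray(W, float)
    y = np.asarray(y, float)
    D = np.diag(W.sum(axis=1))   # diagonal: d_nn = w_n1 + w_n2 + ...
    L = D - W                    # graph Laplacian
    direct = 0.5 * np.sum(W * (y[:, None] - y[None, :]) ** 2)
    quad = y @ L @ y
    assert np.isclose(direct, quad)  # the two forms agree
    return quad
```

For two connected points with different labels S = 1, while identical labels give S = 0, so minimizing S does push connected points toward the same label.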
The remaining approach will be covered in the notes on unsupervised learning.

Origin blog.csdn.net/qq_44157281/article/details/98315178