Machine Learning Chapter 13 (Semi-Supervised Learning)

13.1 Unlabeled samples

If there are too few labeled samples, the learned model often generalizes poorly because of insufficient training.

Active learning brings in an external expert who can be queried for the labels of selected samples. Its goal is to obtain the best possible performance with as few queries as possible.

Although unlabeled samples carry no label information directly, they are drawn independently from the same distribution as the labeled samples, so what they reveal about the data distribution can help build a better model.

13.2 Generative methods

Generative methods are based directly on generative models. They assume that all data, labeled or unlabeled, are generated by the same underlying model.
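As an illustration, here is a minimal sketch of one common instance of this idea: a one-dimensional Gaussian mixture with one component per class, fitted by EM. The labeled samples contribute with fixed class membership, and the unlabeled samples contribute with soft memberships estimated in the E-step. The function names, the 1-D setting, and the small variance floor are assumptions made for the example, not something taken from the chapter.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_semi_supervised_gmm(labeled, unlabeled, n_classes, n_iter=30):
    """labeled: list of (x, y) with y in 0..n_classes-1; unlabeled: list of x.
    Every class needs at least one labeled sample."""
    m = len(labeled) + len(unlabeled)
    means, variances, weights = [], [], []
    # Initialise each component from the labeled samples of its class.
    for k in range(n_classes):
        xs = [x for x, y in labeled if y == k]
        mu = sum(xs) / len(xs)
        means.append(mu)
        variances.append(sum((x - mu) ** 2 for x in xs) / len(xs) + 1e-3)
        weights.append(len(xs) / len(labeled))
    for _ in range(n_iter):
        # E-step: soft class membership of every unlabeled sample.
        resp = []
        for x in unlabeled:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k])
                 for k in range(n_classes)]
            total = sum(p) or 1.0  # guard against numerical underflow
            resp.append([pk / total for pk in p])
        # M-step: labeled samples count fully, unlabeled ones by their membership.
        for k in range(n_classes):
            lab_k = [x for x, y in labeled if y == k]
            n_k = len(lab_k) + sum(r[k] for r in resp)
            means[k] = (sum(lab_k)
                        + sum(r[k] * x for r, x in zip(resp, unlabeled))) / n_k
            sq = (sum((x - means[k]) ** 2 for x in lab_k)
                  + sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, unlabeled)))
            variances[k] = sq / n_k + 1e-3  # small floor keeps the variance positive
            weights[k] = n_k / m
    return means, variances, weights

def predict(x, means, variances, weights):
    """Class with the largest posterior probability for x."""
    scores = [w * gaussian_pdf(x, mu, var)
              for mu, var, w in zip(means, variances, weights)]
    return max(range(len(scores)), key=lambda k: scores[k])

labeled = [(0.0, 0), (1.0, 0), (5.0, 1), (6.0, 1)]
unlabeled = [0.2, 0.8, 1.1, -0.3, 5.2, 5.9, 6.3, 4.8]
params = fit_semi_supervised_gmm(labeled, unlabeled, n_classes=2)
print(predict(0.4, *params), predict(5.6, *params))
```

If the assumed model family does not match the true data distribution, the unlabeled samples can hurt rather than help, which is the main caveat of this family of methods.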

13.3 Semi-supervised SVM

The Semi-Supervised Support Vector Machine (S3VM) is the extension of the support vector machine to semi-supervised learning. When unlabeled samples are not considered, the support vector machine tries to find the maximum-margin separating hyperplane. Once unlabeled samples are taken into account, S3VM tries to find a separating hyperplane that both separates the two classes of labeled samples and passes through a low-density region of the data.
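The low-density idea can be made concrete with a small sketch. It evaluates a TSVM-style objective (a well-known S3VM formulation that the text above does not name, so treat it as an assumption): the usual margin term plus hinge losses on labeled and unlabeled samples. For a fixed hyperplane, each unlabeled sample is given whichever label suits it best, so its loss is the hinge of |f(x)|. The class-balance constraint and the actual optimization are left out, and the weights c_l and c_u are hypothetical values.

```python
def hinge(margin):
    """Hinge loss for a signed margin."""
    return max(0.0, 1.0 - margin)

def tsvm_objective(w, b, labeled, unlabeled, c_l=1.0, c_u=0.5):
    """labeled: list of (x, y) with y in {-1, +1}; unlabeled: list of x.
    Each x is a list of floats with the same length as w."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    margin_term = 0.5 * sum(wi * wi for wi in w)
    labeled_loss = sum(hinge(y * score(x)) for x, y in labeled)
    # Best pseudo-label for a fixed hyperplane: min over y of hinge(y * f(x)).
    unlabeled_loss = sum(hinge(abs(score(x))) for x in unlabeled)
    return margin_term + c_l * labeled_loss + c_u * unlabeled_loss

labeled = [([-1.0], -1), ([3.0], +1)]
unlabeled = [[-1.5], [-1.0], [-0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]

# Boundary at x = 1.0: the supervised max-margin choice, but it cuts
# through the dense group of unlabeled samples between 1.0 and 3.0.
through_dense = tsvm_objective([0.5], -0.5, labeled, unlabeled)
# Boundary at x = 0.25: lies in the gap between the two groups.
through_gap = tsvm_objective([1.0], -0.25, labeled, unlabeled)
print(through_dense, through_gap)
```

In this toy data the hyperplane through the gap gets the lower objective value even though its margin term is larger, because far fewer unlabeled samples fall inside its margin.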

13.4 Graph Semi-Supervised Learning

Given a data set, we can regard it as a graph. Each sample in the data set corresponds to a node in the graph. If the similarity between two samples is high, there is an edge between the corresponding nodes. The edge strength is proportional to the similarity between samples.

Graph semi-supervised learning methods are conceptually clear, but they have two notable drawbacks. The first is storage cost: if the number of samples is O(m), the matrices involved have size O(m^2), which makes large-scale data sets hard to handle. The second is that the graph is built only from the given samples, so it is unclear where a new sample belongs in it. When a new sample arrives, either it is added to the original data set, the graph is rebuilt and the labels are propagated again, or an additional prediction mechanism has to be introduced.
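The following sketch shows the idea on a toy data set. It builds a dense m x m Gaussian affinity matrix (which is where the quadratic storage cost comes from) and then runs an iterative label propagation of the form F <- alpha * S * F + (1 - alpha) * Y, with S the symmetrically normalized affinity matrix. The Gaussian kernel, the parameters sigma and alpha, and the function names are assumptions chosen for the example.

```python
import math

def gaussian_affinity(points, sigma=1.0):
    """Dense m x m edge-weight matrix; no self-loops."""
    m = len(points)
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W

def label_propagation(W, labels, n_classes, alpha=0.9, n_iter=200):
    """labels[i] is a class index, or None for an unlabeled sample.
    Assumes every node has at least one edge with non-zero weight."""
    m = len(W)
    deg = [sum(row) for row in W]
    # Symmetric normalisation: S = D^(-1/2) W D^(-1/2).
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(m)]
         for i in range(m)]
    # One-hot rows for labeled samples, zero rows for unlabeled ones.
    Y = [[1.0 if labels[i] == k else 0.0 for k in range(n_classes)]
         for i in range(m)]
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        F = [[alpha * sum(S[i][j] * F[j][k] for j in range(m))
              + (1.0 - alpha) * Y[i][k]
              for k in range(n_classes)]
             for i in range(m)]
    return [max(range(n_classes), key=lambda k: F[i][k]) for i in range(m)]

points = [[0.0], [0.5], [1.0], [5.0], [5.5], [6.0]]
labels = [0, None, None, 1, None, None]
W = gaussian_affinity(points)
print(label_propagation(W, labels, n_classes=2))  # prints [0, 0, 0, 1, 1, 1]
```

Note that predicting for a point outside `points` would require rebuilding `W` and running the propagation again, which matches the limitation described above.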

13.5 Disagreement-Based Methods

Disagreement-based methods use multiple learners, and the disagreement between the learners is crucial for exploiting unlabeled data.
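A well-known representative of this family is co-training, sketched below under strong simplifying assumptions: each sample has two scalar "views", each view gets its own learner (here a hypothetical nearest-centroid classifier), and in every round each learner pseudo-labels the unlabeled sample it is most confident about and hands it to the other learner. The helper names and the confidence measure are made up for the example.

```python
def centroid_fit(samples):
    """samples: list of (x, y) with scalar x and y in {0, 1}.
    Both classes must be present. Returns the mean of each class."""
    model = {}
    for c in (0, 1):
        xs = [x for x, y in samples if y == c]
        model[c] = sum(xs) / len(xs)
    return model

def centroid_predict(model, x):
    """Returns (label, confidence); confidence is the gap between distances."""
    d0, d1 = abs(x - model[0]), abs(x - model[1])
    return (0 if d0 <= d1 else 1), abs(d0 - d1)

def co_train(labeled, unlabeled, rounds=3):
    """labeled: list of ((v1, v2), y); unlabeled: list of (v1, v2)."""
    # One training set per view, both starting from the labeled data.
    train = [[(x[0], y) for x, y in labeled],
             [(x[1], y) for x, y in labeled]]
    pool = list(unlabeled)
    for _ in range(rounds):
        models = [centroid_fit(train[0]), centroid_fit(train[1])]
        for view in (0, 1):
            if not pool:
                break
            # The learner on this view picks its most confident sample...
            best = max(pool,
                       key=lambda x: centroid_predict(models[view], x[view])[1])
            label, _ = centroid_predict(models[view], best[view])
            pool.remove(best)
            # ...and passes it, pseudo-labeled, to the learner on the other view.
            other = 1 - view
            train[other].append((best[other], label))
    return [centroid_fit(train[0]), centroid_fit(train[1])]

labeled = [((0.0, 0.1), 0), ((1.0, 0.9), 1)]
unlabeled = [(0.1, 0.2), (0.9, 1.0), (0.2, 0.0), (0.8, 0.8)]
models = co_train(labeled, unlabeled)
print(centroid_predict(models[0], 0.05)[0], centroid_predict(models[1], 0.95)[0])
```

The exchange only helps when the two learners differ enough to teach each other something, which is why the disagreement between them matters.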

13.6 Semi-supervised clustering

Supervision information in a clustering task comes in roughly two types. The first is pairwise constraints: a must-link constraint means that two samples must belong to the same cluster, and a cannot-link constraint means that two samples must not belong to the same cluster. The second is a small number of labeled samples.
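A minimal sketch of how such constraints can be enforced during the assignment step of a k-means-style algorithm: each sample goes to the nearest centroid whose choice does not violate any constraint against the samples already assigned. The function names and the 1-D data are hypothetical, and the centroid update step is omitted.

```python
def violates_constraints(i, cluster, assignment, must_link, cannot_link):
    """True if putting sample i into `cluster` breaks a constraint
    against a sample that has already been assigned."""
    for a, b in must_link:
        other = b if a == i else (a if b == i else None)
        if other is not None and other in assignment \
                and assignment[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == i else (a if b == i else None)
        if other is not None and assignment.get(other) == cluster:
            return True
    return False

def assign_with_constraints(points, centroids, must_link, cannot_link):
    """points, centroids: lists of floats; constraints: lists of index pairs.
    Returns a list of cluster indices, or None if no feasible assignment is found."""
    assignment = {}
    for i, x in enumerate(points):
        # Try clusters from nearest to farthest centroid.
        order = sorted(range(len(centroids)), key=lambda k: abs(x - centroids[k]))
        for k in order:
            if not violates_constraints(i, k, assignment, must_link, cannot_link):
                assignment[i] = k
                break
        else:
            return None  # every cluster violates some constraint
    return [assignment[i] for i in range(len(points))]

points = [0.0, 0.2, 0.9, 1.0]
centroids = [0.0, 1.0]
result = assign_with_constraints(points, centroids,
                                 must_link=[(2, 3)], cannot_link=[(0, 1)])
print(result)  # prints [0, 1, 1, 1]
```

Sample 1 is closest to the first centroid, but the cannot-link constraint with sample 0 pushes it to the second cluster.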

Origin blog.csdn.net/jinhualun911/article/details/108901185