A preliminary understanding of manifold learning

Recently I revisited Chapter 13 on semi-supervised learning in the "watermelon book" (Zhou Zhihua's Machine Learning). In that chapter the author mentions active learning, which requires only a small number of label queries, k-means clustering, and manifold learning. Manifold learning was new to me, so this post is a brief first look at the topic.

1. Semi-supervised learning

Semi-supervised learning (SSL) relies on model assumptions: only when these assumptions hold can unlabeled examples help improve learning performance. SSL commonly rests on three assumptions:
1) Smoothness Assumption: two examples that are close to each other in a dense data region are likely to share the same class label; that is, if two examples are connected by a path through a dense region, they have the same label with high probability. Conversely, if two examples are separated by a sparse region, their labels tend to differ.
2) Cluster Assumption: when two samples lie in the same cluster, they have the same class label with high probability. An equivalent formulation is the Low Density Separation Assumption: the decision boundary should pass through sparse data regions and avoid splitting the samples of a dense region onto opposite sides of the boundary.
In other words, samples that are close to each other tend to share a class, so the classification boundary should pass through regions where the data is sparse and avoid dividing dense groups of samples across the boundary. Under this assumption, a learning algorithm can use large amounts of unlabeled data to analyze how the samples are distributed in the input space and adjust the decision boundary so that it lies in relatively sparse regions. For example, in the transductive support vector machine (TSVM) proposed by Joachims, the algorithm repeatedly adjusts the separating hyperplane during training and swaps the labels of some unlabeled samples on either side of it, so that the margin is maximized over all training data; the result is a hyperplane that passes through a relatively sparse region of the data while classifying the labeled samples as accurately as possible.
3) Manifold Assumption: the high-dimensional data can be embedded in a low-dimensional manifold, and when two examples lie in a small local neighborhood on that manifold, they have similar class labels.
The main idea of the manifold assumption is that samples within the same local neighborhood have similar properties, so their labels should also be similar; this reflects the local smoothness of the decision function. The main difference from the cluster assumption is that the cluster assumption focuses on global structure, while the manifold assumption considers local structure. Under this assumption, unlabeled samples make the data space denser, which helps characterize local regions more accurately and lets the decision function fit the data more completely. The manifold assumption can also be applied directly in semi-supervised algorithms. For example, Zhu et al. used Gaussian random fields and harmonic functions for semi-supervised learning: they first build a graph from the training samples, with each node representing a sample, and then obtain the optimal labels of the unlabeled samples by optimizing a decision function defined under the manifold assumption. Zhou et al. build a graph from the pairwise similarities between samples and then let label information propagate along the edges to neighboring samples until the graph reaches a globally stable state (a minimal label-propagation sketch in this spirit follows below).
Essentially, the three assumptions express the same idea with different emphases; among them, the manifold assumption is the most general.
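To make the manifold assumption in graph-based semi-supervised learning concrete, here is a minimal label-propagation sketch in the spirit of Zhou et al.'s consistency method. It is not the published algorithm itself: the RBF bandwidth gamma, the clamping factor alpha, the fixed iteration count, and the two-moons toy data are all assumptions made only for this example.

```python
import numpy as np
from sklearn.datasets import make_moons

# Toy data: 200 points on two "moons", with only one labeled seed per class.
X, y_true = make_moons(n_samples=200, noise=0.05, random_state=0)
labels = np.full(len(X), -1)                 # -1 means "unlabeled"
for c in (0, 1):
    labels[np.where(y_true == c)[0][0]] = c  # one labeled example per class

# Affinity matrix W from an RBF kernel, then the normalized graph S = D^{-1/2} W D^{-1/2}.
gamma = 20.0                                 # assumed bandwidth for this toy example
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-gamma * sq_dists)
np.fill_diagonal(W, 0.0)
d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

# Initial label matrix Y: one-hot rows for labeled points, zeros for unlabeled ones.
n_classes = 2
Y = np.zeros((len(X), n_classes))
for i, l in enumerate(labels):
    if l >= 0:
        Y[i, l] = 1.0

# Spread labels along the graph: F <- alpha * S @ F + (1 - alpha) * Y,
# i.e. neighbors exchange label information until the state stabilizes.
alpha = 0.99
F = Y.copy()
for _ in range(1000):
    F = alpha * S @ F + (1 - alpha) * Y

pred = F.argmax(axis=1)
print("accuracy on unlabeled points:",
      (pred[labels == -1] == y_true[labels == -1]).mean())
```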

2. Manifold Learning

Manifold learning is a very broad concept. Here I mainly discuss the notion of manifold learning and its main representative methods as they have developed since 2000, when manifold learning came to be regarded as a branch of nonlinear dimensionality reduction. As is well known, the rapid development of this field was triggered by two papers published in Science in 2000: Isomap and LLE (Locally Linear Embedding).

2.1. The main idea of manifold learning is to map high-dimensional data nonlinearly to a low-dimensional representation that captures the essential structure of the high-dimensional data. The premise is that the high-dimensional observations have an underlying manifold structure. Its advantages are that it is non-parametric and nonlinear, and the solution procedure is simple.

2.2. Manifold learning is plausible for two reasons: 1. from the perspective of cognitive psychology, psychologists believe that human cognition is based on cognitive manifolds and topological continuity; 2. many kinds of high-dimensional data are actually governed by a small number of latent variables, so a few low-dimensional coordinates suffice to describe the high-dimensional data.

2.3. Mathematical background knowledge required for manifold learning: differential manifolds, Riemannian manifolds, differential geometry, tangent vector fields, topological spaces, smooth maps, etc.

2.4. Classical manifold learning algorithms (two illustrative code sketches follow this list):

Isomap: isometric mapping. The underlying assumption is that Euclidean distance in the low-dimensional space should equal geodesic distance in the high-dimensional space. In practice, the geodesic distance between nearby points is approximated by their Euclidean distance, while the geodesic distance between faraway points is approximated by the shortest path through the neighborhood graph.

LLE: Locally Linear Embedding. The premise is that the low-dimensional manifold on which the data lie is locally linear, so each sample point can be linearly reconstructed from its neighbors.

LE: Laplacian Eigenmaps. The premise is that points that are very close in the high-dimensional space should be mapped to points that are also very close in the low-dimensional space.

HLLE: Hessian LLE (local isometry). The premise is that if a manifold is locally isometric to an open subset of Euclidean space, then the map from the manifold to that open subset is locally linear; the second-order mixed partial derivatives of a linear function are zero, so the quadratic form built from the Hessian is also zero.

LPP: Locality Preserving Projections. Building on the LE algorithm, it assumes a linear projection matrix P from the original space to the manifold space, solves for P, and thereby obtains an explicit projection mapping.

LTSA: Local Tangent Space Alignment. The basic idea is to represent the local geometry of the manifold with tangent-space coordinates: the tangent space at each point of the manifold is isomorphic to an open subset of Euclidean space via the tangent map, and these local coordinates are then aligned into a global embedding.

MVU: Maximum Variance Unfolding (local isometry). It constructs a sparse matrix of local Euclidean distances and learns a kernel matrix that preserves those local distances.

Logmap: based on geodesic distance and direction. The idea is that, knowing the coordinates and direction at a point on the manifold, one can recover normal coordinates through the tangent plane, thereby inverting the exponential map.

……
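To make the geodesic-distance idea behind Isomap concrete, here is a minimal from-scratch sketch under the assumptions described above: Euclidean distances within a k-nearest-neighbor graph stand in for local geodesic distances, shortest paths approximate long-range geodesics, and classical MDS produces the embedding. The neighborhood size k and the Swiss-roll test data are choices made only for this example, not part of any reference implementation.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph

X, _ = make_swiss_roll(n_samples=800, noise=0.05, random_state=0)

# 1) k-NN graph: Euclidean distances to the k nearest neighbors approximate
#    local geodesic distances on the manifold.
k = 10
G = kneighbors_graph(X, n_neighbors=k, mode="distance")

# 2) Shortest paths through the graph approximate geodesic distances
#    between faraway points.
D = shortest_path(G, method="D", directed=False)

# 3) Classical MDS on the geodesic distance matrix: double-center the squared
#    distances and keep the top eigenvectors.
n = D.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * H @ (D ** 2) @ H
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1][:2]            # two largest eigenvalues
Y = eigvecs[:, order] * np.sqrt(eigvals[order])  # 2-D Isomap-style embedding

print(Y.shape)  # (800, 2)
```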
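For everyday use, several of the algorithms listed above (Isomap, LLE, HLLE, LTSA, and Laplacian Eigenmaps) have off-the-shelf implementations in scikit-learn. The short comparison below is only a sketch; the neighborhood size and the Swiss-roll data are again chosen purely for illustration.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding, SpectralEmbedding

X, color = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)
n_neighbors, n_components = 12, 2

embeddings = {
    "Isomap": Isomap(n_neighbors=n_neighbors, n_components=n_components),
    "LLE": LocallyLinearEmbedding(n_neighbors=n_neighbors,
                                  n_components=n_components, method="standard"),
    "HLLE": LocallyLinearEmbedding(n_neighbors=n_neighbors,
                                   n_components=n_components, method="hessian"),
    "LTSA": LocallyLinearEmbedding(n_neighbors=n_neighbors,
                                   n_components=n_components, method="ltsa"),
    # Laplacian Eigenmaps is available in scikit-learn as SpectralEmbedding.
    "LE": SpectralEmbedding(n_neighbors=n_neighbors, n_components=n_components),
}

for name, model in embeddings.items():
    Y = model.fit_transform(X)
    print(f"{name}: embedded shape {Y.shape}")
```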


2.5. Problems in manifold learning:

Poor robustness to noise; the dimensionality of the low-dimensional space is hard to determine; a manifold structure must be assumed; dense sampling is required; and there is the out-of-sample problem for test data.

2.6. Future directions for manifold learning:

Improving robustness, improving visualization methods, determining the dimensionality of the low-dimensional space, combining manifold learning with statistical learning, etc.

