Dimensionality Reduction (3): LLE and Other Dimensionality Reduction Techniques

LLE

Locally Linear Embedding (LLE) is another powerful nonlinear dimensionality reduction (NLDR) technique. It is a manifold learning technique that does not rely on projections. In a nutshell, LLE works by first measuring how each training instance linearly relates to its closest neighbors (c.n.), and then looking for a low-dimensional representation of the training set in which these local relationships are best preserved. This approach is particularly good at unrolling twisted manifolds, especially when there is not too much noise in the dataset.

The following code first generates a Swiss roll dataset, then uses sk-learn's LocallyLinearEmbedding class to unroll it:

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# X is the 3D Swiss-roll dataset; the make_swiss_roll parameters here are illustrative
X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
X_reduced = lle.fit_transform(X)

 

The resulting 2D dataset is shown below:

[Figure: the Swiss roll unrolled into 2D by LLE]

We can see that the Swiss roll is completely unrolled and the distances between instances are well preserved locally. On a larger scale, however, distances are not preserved: the left part of the unrolled Swiss roll is stretched, while the right part is squeezed. Nevertheless, LLE does a very good job of modeling the manifold.
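To reproduce a plot like the one above, a minimal matplotlib sketch (assuming the X_reduced and t variables from the code above) could look like this:

import matplotlib.pyplot as plt

# Minimal sketch to draw the unrolled Swiss roll; t (the position along the
# roll returned by make_swiss_roll) is only used to color the points.
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=t, cmap="hot")
plt.xlabel("$z_1$")
plt.ylabel("$z_2$")
plt.title("Unrolled Swiss roll using LLE")
plt.show()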

Here is how LLE works: for each training instance x^(i), the algorithm identifies its k nearest neighbors (k = 10 in the code above), then tries to reconstruct x^(i) as a linear function of these neighbors. More specifically, it finds the weights w_{i,j} such that the squared distance between x^(i) and \sum_{j=1}^{m} w_{i,j} x^(j) is as small as possible, assuming w_{i,j} = 0 if x^(j) is not one of the k nearest neighbors of x^(i). Thus the first step of LLE is the constrained optimization problem below, where W is the weight matrix containing all the weights w_{i,j}. The second constraint simply normalizes the weights for each training instance x^(i):

\hat{W} = \operatorname*{argmin}_{W} \sum_{i=1}^{m} \left\| x^{(i)} - \sum_{j=1}^{m} w_{i,j}\, x^{(j)} \right\|^2

subject to:
  w_{i,j} = 0  if x^{(j)} is not one of the k nearest neighbors of x^{(i)}
  \sum_{j=1}^{m} w_{i,j} = 1  for i = 1, 2, ..., m
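To make this first step concrete, here is a minimal NumPy sketch of the classic LLE weight computation (an illustrative reconstruction of the standard recipe, not sk-learn's actual implementation; the reg parameter is a small regularization term added for numerical stability):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def reconstruction_weights(X, n_neighbors=10, reg=1e-3):
    # Step 1 of LLE: express each x^(i) as a weighted sum of its k nearest
    # neighbors, with the weights of each row summing to 1.
    m = X.shape[0]
    W = np.zeros((m, m))
    knn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
    _, idx = knn.kneighbors(X)              # the first neighbor is the point itself
    for i in range(m):
        neighbors = idx[i, 1:]              # drop the point itself
        G = X[neighbors] - X[i]             # neighbors centered on x^(i)
        C = G @ G.T                         # local Gram matrix (k x k)
        C += reg * np.trace(C) * np.eye(n_neighbors)  # regularize for stability
        w = np.linalg.solve(C, np.ones(n_neighbors))  # solve C w = 1
        W[i, neighbors] = w / w.sum()       # normalize so the weights sum to 1
    return W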

After this step, the weight matrix W (containing the weights w_{i,j}) encodes the local linear relationships between the training instances. The second step is to map the training instances into a d-dimensional space (where d < n) while preserving these local relationships as much as possible. If z^(i) is the image of x^(i) in this d-dimensional space, then we want the squared distance between z^(i) and \sum_{j=1}^{m} w_{i,j} z^(j) to be as small as possible. This idea leads to the optimization problem below. It looks very similar to the first step, but instead of keeping the instances fixed and finding the optimal weights, it does the reverse: it keeps the weights fixed and finds the optimal positions of the instances' images in the low-dimensional space. Note that Z is the matrix containing all the z^(i):

\hat{Z} = \operatorname*{argmin}_{Z} \sum_{i=1}^{m} \left\| z^{(i)} - \sum_{j=1}^{m} \hat{w}_{i,j}\, z^{(j)} \right\|^2
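Continuing the illustrative sketch (again, not sk-learn's actual implementation): once the weights are fixed, the optimal low-dimensional coordinates are given by the eigenvectors of M = (I - W)^T (I - W) associated with its smallest nonzero eigenvalues:

import numpy as np

def lle_embedding(W, n_components=2):
    # Step 2 of LLE: keep the weights fixed and find the low-dimensional
    # coordinates Z that minimize the same reconstruction error.
    m = W.shape[0]
    M = np.eye(m) - W
    M = M.T @ M
    eigenvalues, eigenvectors = np.linalg.eigh(M)   # eigenvalues in ascending order
    # Discard the first eigenvector (eigenvalue ~ 0, a constant vector) and keep
    # the next n_components eigenvectors as the embedding.
    return eigenvectors[:, 1:n_components + 1]

Running Z = lle_embedding(reconstruction_weights(X)) on the Swiss roll should produce a 2D embedding comparable in spirit to the X_reduced obtained from LocallyLinearEmbedding above.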

Regarding sk-learn's LLE implementation, its computational complexity is O(m log(m) n log(k)) for finding the k nearest neighbors, O(m n k^3) for optimizing the weights, and O(d m^2) for constructing the low-dimensional representations. Unfortunately, the m^2 in the last term makes this algorithm scale poorly to very large datasets.

 

Other dimensionality reduction techniques

There are many other dimensionality reduction techniques, several of which are available in sk-learn. Here are some of the most popular ones (a short usage sketch follows the list):

  • Random Projection
    • As its name suggests, this projects the data to a lower-dimensional space using a random linear projection. It may sound strange, but it turns out that such a random projection actually preserves distances fairly well. The quality of the dimensionality reduction depends on the number of instances and the target dimensionality, but surprisingly not on the initial dimensionality. See the sklearn.random_projection package for more details
  • Multidimensional Scaling (MDS)
    • Reduces dimensionality while trying to preserve the distances between instances
  • Isomap
    • Creates a graph by connecting each instance to its nearest neighbors, then reduces dimensionality while trying to preserve the geodesic distances between instances
  • t-Distributed Stochastic Neighbor Embedding (t-SNE)
    • Reduces dimensionality while trying to keep similar instances close and dissimilar instances apart. It is mostly used for visualization, in particular for visualizing clusters of instances in high-dimensional space (for example, visualizing the MNIST images in 2D).
  • Linear Discriminant Analysis (LDA)
    • LDA is actually a classification algorithm, but during training it learns the most discriminative axes between the classes, and these axes can then be used to define a hyperplane onto which to project the data. The benefit is that the projection keeps the classes as far apart as possible, so LDA is a good dimensionality reduction technique to apply before running another classification algorithm such as an SVM.
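As a quick illustration of how these estimators are used, here is a minimal, hedged sketch (the Swiss roll dataset and the hyperparameters are assumptions for demonstration; LDA is omitted because it is supervised and needs class labels):

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import MDS, Isomap, TSNE
from sklearn.random_projection import GaussianRandomProjection

# Assumed toy dataset for demonstration purposes only.
X, t = make_swiss_roll(n_samples=500, noise=0.2, random_state=42)

reducers = {
    "Random projection": GaussianRandomProjection(n_components=2, random_state=42),
    "MDS": MDS(n_components=2, random_state=42),
    "Isomap": Isomap(n_components=2, n_neighbors=10),
    "t-SNE": TSNE(n_components=2, random_state=42),
}
for name, reducer in reducers.items():
    X2d = reducer.fit_transform(X)   # each technique produces an (m, 2) embedding
    print(name, X2d.shape)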

The following figure shows the results of several of these techniques:

[Figure: results of several of these dimensionality reduction techniques]
