Implementation details of the LLE algorithm

Author: Zen and the Art of Computer Programming

Introduction

The LLE (Locally Linear Embedding) algorithm is an unsupervised dimensionality reduction method. It learns a low-dimensional representation of high-dimensional data, and it is local: for each sample, it considers only the relationships among the data points in that sample's neighborhood, not the relationships across the entire data set. The algorithm is often used for visualization, data compression, and as a preprocessing step for classifying high-dimensional data.

This article introduces the LLE algorithm in detail and uses concrete code examples to build an understanding of how it works and how it is implemented. All code in the article is written in Python.

1. Background introduction

In many scenarios we need to reduce the dimensionality of high-dimensional data so that it is easier to display, analyze, and process. There are many dimensionality reduction methods, such as PCA and SVD, but they have serious limitations: first, they are linear, so they implicitly assume that the data lie near a flat subspace; second, they find only a single, globally optimal set of projection directions and ignore irregular internal structure and complex local information.
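To make the "single global projection" limitation concrete, here is a small sketch that computes PCA through the SVD of the centered data with NumPy. The function name `pca_project` and the three-quarter-circle test data are illustrative assumptions, not part of the original article. Because every point is projected onto the same direction, a curve that bends through more than 180 degrees cannot be flattened without folding: points far apart along the arc end up close together on the line.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal directions (PCA via SVD)."""
    Xc = X - X.mean(axis=0)                 # center the data
    # Economy SVD: Xc = U @ diag(S) @ Vt; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]          # one set of directions for ALL points
    return Xc @ components.T, components

# Three quarters of a circle: a 1-D curve embedded in 2-D.
t = np.linspace(0.0, 1.5 * np.pi, 50)
X = np.column_stack([np.cos(t), np.sin(t)])
Y, comps = pca_project(X, n_components=1)

# Walking along the arc, the 1-D coordinate rises and then falls (or vice
# versa), so distant parts of the curve overlap after projection.
steps = np.diff(Y[:, 0])
print((steps > 0).any() and (steps < 0).any())   # → True
```

Any linear projection of this arc gives a coordinate of the form cos(t − φ) plus a constant, which cannot be monotonic over a range longer than π, so the fold does not depend on which direction PCA happens to choose.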

LLE emerged to address these limitations by using local information for dimensionality reduction. It assumes that each sample can be approximately reconstructed as a weighted linear combination of its neighboring samples, and it looks for a low-dimensional embedding in which the same weights still reconstruct each sample from its neighbors. Because neighborhoods overlap, preserving this local geometry also recovers the global shape of a curved (nonlinear) manifold, which overcomes the limitations of PCA and SVD described above. LLE also requires no feature selection; its main parameters are the number of neighbors k and the target dimension of the embedding.
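The idea above can be sketched as the usual three steps of LLE: find each sample's neighbors, solve for reconstruction weights that sum to one, then find the low-dimensional points that are best reconstructed by those same weights. This is a minimal dense NumPy sketch, not the author's implementation; the function name `lle`, the parameter names (`n_neighbors` for k, `n_components` for the target dimension, `reg` for the regularization strength), the trace-scaled regularization, and the Swiss-roll test data are all assumptions chosen for illustration.

```python
import numpy as np

def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Minimal dense LLE sketch: returns the embedding Y and weight matrix W."""
    n = X.shape[0]

    # Step 1: k nearest neighbours from pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D2, np.inf)              # exclude each point itself
    nbrs = np.argsort(D2, axis=1)[:, :n_neighbors]

    # Step 2: weights that best reconstruct x_i from its neighbours,
    # subject to the weights summing to one.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                 # neighbours relative to x_i
        C = Z @ Z.T                           # local Gram matrix, k x k
        tr = np.trace(C)
        # Regularise: C is singular when k exceeds the input dimension.
        C += (reg * tr if tr > 0 else reg) * np.eye(n_neighbors)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()           # enforce the sum-to-one constraint

    # Step 3: find Y minimising sum_i ||y_i - sum_j W_ij y_j||^2.
    # The solution is given by the bottom eigenvectors of M = (I - W)^T (I - W).
    I_W = np.eye(n) - W
    M = I_W.T @ I_W
    _, eigvecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    # Drop the first (constant) eigenvector, whose eigenvalue is ~0.
    return eigvecs[:, 1:n_components + 1], W

# Hypothetical test data: a "Swiss roll", a 2-D sheet rolled up in 3-D.
rng = np.random.default_rng(0)
t = 1.5 * np.pi * (1.0 + 2.0 * rng.random(300))
height = 20.0 * rng.random(300)
X = np.column_stack([t * np.cos(t), height, t * np.sin(t)])

Y, W = lle(X, n_neighbors=12, n_components=2)
print(Y.shape)   # → (300, 2)
```

The constant vector is always an eigenvector of M with eigenvalue 0 (each row of W sums to one), which is why it is skipped. The dense `eigh` call costs O(n³) time, so this sketch suits small examples only; scikit-learn's `sklearn.manifold.LocallyLinearEmbedding` (with `n_neighbors`, `n_components`, and `reg` parameters) is a production implementation of the same method.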

2. Explanation of basic concepts and terms

  • Sample point: a point in the original data set, which may be a two- or three-dimensional point or, more generally, a high-dimensional vector.
  • Neighborhood: the group of points closest to a sample point, typically its k nearest neighbors. Closeness is usually measured with the Euclidean distance, but any other distance measure can be used.
  • Distance matrix: an n × n matrix whose entry (i, j) is the distance between sample points i and j; the neighborhoods are read off from it.
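The terms above can be tied together with a short NumPy sketch that builds the Euclidean distance matrix and reads each sample's k-nearest-neighbor set from it. The function names `distance_matrix` and `k_nearest_neighbors` and the four toy points are hypothetical, chosen only for illustration. The broadcasting approach uses O(n²·D) memory, which is fine for small examples but not for large data sets.

```python
import numpy as np

def distance_matrix(X):
    """Pairwise Euclidean distances between the rows of X (n x D)."""
    diff = X[:, None, :] - X[None, :, :]    # shape (n, n, D)
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def k_nearest_neighbors(D, k):
    """Indices of the k closest points to each sample, excluding itself."""
    D = D.copy()
    np.fill_diagonal(D, np.inf)             # a point is not its own neighbour
    return np.argsort(D, axis=1)[:, :k]

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
D = distance_matrix(X)
nbrs = k_nearest_neighbors(D, k=2)
print(nbrs.tolist())   # → [[1, 2], [0, 2], [0, 1], [2, 1]]
```

Note that the outlying point (5, 5) still receives two neighbors even though both are far away: a fixed-k neighborhood always has exactly k members, whereas a distance-threshold neighborhood could be empty.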


Origin blog.csdn.net/universsky2015/article/details/132493529